Do AI detectors actually work?

They work in the sense that they can flag AI-heavy text, but they also flag plenty of human-written text. No detector is accurate enough to serve as the sole evidence that someone cheated.

Which AI detector is the most accurate?

No independent, large-scale study has identified a single detector as consistently the most accurate. The tools that have been independently tested have all shown meaningful false positive rates. We recommend treating any result as a prompt for a conversation, not a conclusion.

Can AI detectors be fooled?

Yes, easily. Asking AI to rewrite in a casual tone, editing a few sentences, or running output through a paraphrase tool typically drops the detection score significantly.

Why do detectors flag non-native English speakers?

Non-native speakers often write in a formal, careful, structured style, which matches the statistical patterns detectors look for. This is one of the best-documented and most serious problems with current tools.

Should schools use AI detectors to catch cheating?

Not as a primary tool. A detector result that is taken as evidence without a conversation or other context is unfair to students and likely to produce wrongful accusations.

AI Detectors Tested: Accuracy, False Positives and What Teachers Should Know

When ChatGPT became widely available in late 2022, a new category of software appeared almost immediately: AI detectors. The promise was appealing — paste in text, get a percentage, know if a human or machine wrote it. Schools started subscribing. Parents started checking homework. Employers started screening job applications.

The reality turned out to be much messier. Several years of research and real-world use have made one thing clear: these tools are genuinely useful for understanding statistical patterns in text, and genuinely unreliable for judging any single piece of writing. Before using one, the most important thing is to understand why, and what the tools actually measure.

How AI Detectors Work

AI detectors generally analyze text for statistical properties that tend to differ between AI output and human writing. Two of the most commonly discussed signals are perplexity and burstiness.

Perplexity measures how surprising a text's word choices are, on average, to a language model, given the words that come before each one. AI tends to pick highly probable, predictable words. Human writers make more unexpected choices: a metaphor, a slang term, a long word where a short one would do. Low perplexity suggests machine-like predictability.
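As a rough illustration of the idea, here is a minimal Python sketch that scores perplexity with a tiny bigram model trained on a two-sentence toy corpus. Real detectors rely on large neural language models, not bigram counts; the corpus, the add-one smoothing, and the function names `train_bigram` and `perplexity` are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count single words and adjacent word pairs in a training corpus."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(tokens, unigrams, bigrams):
    """Perplexity of `tokens` under an add-one-smoothed bigram model.

    Lower values mean each word was easier to predict from the word before it.
    """
    vocab_size = len(unigrams) + 1  # +1 reserves probability for unseen words
    log_prob_sum = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one (Laplace) smoothing: unseen pairs get a small, nonzero probability.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob_sum += math.log(p)
    predicted = len(tokens) - 1  # the first word has no context, so it is not scored
    return math.exp(-log_prob_sum / predicted)


# Hypothetical toy corpus standing in for a real language model's training data.
corpus = "the cat sat on the mat . the dog sat on the mat .".split()
unigrams, bigrams = train_bigram(corpus)

predictable = "the cat sat on the mat .".split()
surprising = "the mat sat on a metaphor .".split()

print(f"predictable: {perplexity(predictable, unigrams, bigrams):.2f}")
print(f"surprising:  {perplexity(surprising, unigrams, bigrams):.2f}")
```

The predictable sentence uses only word pairs the model has already seen, so it gets the lower score; the surprising one introduces unfamiliar pairs and scores higher.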

Burstiness measures how much sentence length varies. Humans tend to mix very short sentences with longer ones in an uneven rhythm. AI tends toward more uniform sentence lengths, especially in formal writing.
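One simple way to put a number on burstiness, used here as an assumption rather than any particular detector's formula, is the coefficient of variation of sentence lengths: the standard deviation divided by the mean. The Python sketch below uses a deliberately naive punctuation-based sentence splitter and two made-up passages; `sentence_lengths` and `burstiness` are hypothetical helper names.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on ., ! or ? and count words per sentence (a deliberately naive splitter)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence length: standard deviation / mean.

    0 means every sentence is the same length; larger values mean a more uneven rhythm.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is not meaningful for a single sentence
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = (
    "The study examined three factors. Each factor was measured twice. "
    "The results were broadly consistent. The authors noted two limitations."
)
varied = (
    "I hated it. Then, halfway through the second chapter, something shifted "
    "and I could not put the book down until three in the morning. Weird."
)

print(f"uniform: {burstiness(uniform):.2f}")
print(f"varied:  {burstiness(varied):.2f}")
```

The uniform passage has four five-word sentences and scores 0; the varied passage mixes sentences of 3, 21 and 1 words and scores well above it.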

These are reasonable ideas. The problem is that many humans write in ways that score as low-perplexity and low-burstiness — especially people who write carefully, formally, or in English as a second language.

Comparison: What to Look for in a Detector

The table below compares the main categories of AI detection tools across criteria that matter for practical use. It does not include invented accuracy percentages — those vary too much by use case and prompt style to be meaningful. The qualitative ratings reflect patterns widely reported in independent testing and published research.

Criterion	Free browser tools	School / LMS integrations	API-based tools	Open-source tools
False positive risk on ESL text	High	High	Moderate to High	Varies widely
False positive risk on formal human writing	High	Moderate to High	Moderate	Varies
Detection of lightly edited AI text	Low	Low to Moderate	Moderate	Low
Detection of paraphrased AI text	Very Low	Low	Low	Very Low
Explains why it flagged text	Rarely	Sometimes	Sometimes	Depends on tool
Audit log / evidence trail	No	Sometimes	Yes	No
Cost	Free	Subscription (per school)	Pay per use	Free
Suitable as sole evidence of cheating	No	No	No	No

The last row is the same across every category, because no tool currently available meets the standard of evidence needed to accuse someone of academic dishonesty on its own.

The False Positive Problem

False positives — cases where the detector flags human writing as AI — are the central failure mode. They are well-documented, widely reported, and serious.

Some groups are more likely to be flagged than others:

Non-native English speakers often write in patterns that match AI statistical signatures more closely. Formal vocabulary, careful grammar, and structured paragraphs are all traits that score as low-perplexity.

Students who write formally for academic assignments — the way they are often taught to write — produce text that many detectors find suspicious.

Writers who draft carefully and edit tend to produce smoother, more predictable text than writers who dash things off.

There is no way to know from the outside whether a false positive is happening in any given case. That is the core problem. A result of "98% AI" tells you that the text scores similarly to AI-generated text. It does not tell you that AI generated it.
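To see why a flag is weak evidence on its own, here is a hypothetical base-rate calculation in Python. Every number in it (1,000 essays, 10% written with AI, a detector that catches 90% of AI essays and wrongly flags 5% of human ones) is invented for illustration and is not a measured accuracy figure for any real tool; `share_of_flags_that_are_human` is a hypothetical helper.

```python
def share_of_flags_that_are_human(n_essays, ai_share, detection_rate, false_positive_rate):
    """Of all essays a detector flags, return the fraction that humans actually wrote.

    All inputs are hypothetical; none describe a real detector.
    """
    ai_essays = n_essays * ai_share
    human_essays = n_essays - ai_essays
    flagged_ai = ai_essays * detection_rate             # correct flags
    flagged_human = human_essays * false_positive_rate  # false positives
    return flagged_human / (flagged_ai + flagged_human)


# Hypothetical class set: 1,000 essays, 10% written with AI,
# a detector that catches 90% of AI essays and wrongly flags 5% of human ones.
share = share_of_flags_that_are_human(1000, 0.10, 0.90, 0.05)
print(f"{share:.0%} of flagged essays were written by humans")
```

Under these assumed rates, roughly one in three flagged essays would be human-written, even though the detector classifies most essays correctly. A higher false positive rate for some groups of writers would push that share higher still.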

What Detectors Cannot Catch

Modern AI models, when prompted to write conversationally, with imperfections, or in a specific person's style, can produce text that most detectors score as human. Anyone motivated to evade detection can do so easily:

Ask the AI to "write like a high school student" or "make it sound casual"
Edit a few sentences manually after generating
Run the text through a free paraphrase tool
Ask the AI to vary sentence lengths and include contractions

This means a student who is determined to cheat is unlikely to be caught by a detector, while a student who made no attempt to evade detection might be. The tools end up being slightly better at catching careless use of AI than deliberate misuse.

What Actually Helps

For teachers, the more durable approaches involve the writing process rather than the final product:

Ask students to submit drafts at multiple stages, not just a final document
Include in-class writing components that mirror out-of-class assignments
Ask students to discuss their work: what sources they used, what was difficult, what they would change
Look for inconsistencies between a student's verbal explanation and what the essay argues

A student who used AI to write an essay will typically struggle to explain it. A student who wrote it — even with AI assistance for research or editing — will have something to say about their own thinking process.

For parents, the same principle applies. If you are curious whether your child used AI for an assignment, ask them to walk you through what they did. The conversation is more informative than any detector.

What to Try Next

To understand what patterns actually show up in AI writing — beyond what a detector measures — read How to Tell If a Text Was Written by AI. If you want a practical guide for talking to your kid about AI and homework, My Kid Uses ChatGPT for Homework — A Parent's Guide has a calm, step-by-step approach.
