AI detectors are not reliable enough to use as proof of anything. Every major tool tested produces false positives — flagging real human writing as AI — often at rates that make the results close to meaningless for individual cases. They can be a starting point, but never a verdict.
When ChatGPT became widely available in late 2022, AI detectors went from a niche research tool to a mass-market product almost overnight. The promise was appealing: paste in text, get a percentage, know if a human or machine wrote it. Schools started subscribing. Parents started checking homework. Employers started screening job applications.
The reality turned out to be much messier. Several years of research and real-world use have made one thing clear: these tools are genuinely useful for understanding statistical patterns in text, and genuinely unreliable for judging any single piece of writing. Before using one, it is worth understanding why, and what the tools actually measure.
How AI Detectors Work
Most AI detectors analyze text for statistical properties that tend to differ between AI output and human writing. Two of the most commonly cited signals are perplexity and burstiness.
Perplexity measures how surprising a text's word choices are to a language model, with each word judged against the words before it. AI tends to pick highly probable, predictable words. Human writers make more unexpected choices: a metaphor, a slang term, a long word where a short one would do. Low perplexity suggests machine-like predictability.
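In language-model terms, perplexity is usually computed as the exponential of the average negative log-probability the model assigns to each token. The sketch below assumes those per-token probabilities are already available (a real detector would get them from its own language model); the function name and the probability values are hypothetical, invented only to illustrate the arithmetic.

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """Return exp of the average negative log-probability of the tokens.

    token_probs[i] is the probability a language model assigned to the
    i-th token given the tokens before it. A real detector would get these
    from a model; here they are supplied by hand (hypothetical values).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0 or p > 1 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Invented probabilities for illustration only, not output from any real model.
predictable = [0.9, 0.8, 0.85, 0.9, 0.75]  # every word was an expected choice
surprising = [0.9, 0.05, 0.6, 0.02, 0.7]   # two words the model did not expect

print(round(perplexity(predictable), 2))
print(round(perplexity(surprising), 2))
```

The first list yields a perplexity of about 1.19 and the second about 4.84. The lower number is the kind of result a detector reads as machine-like, even though nothing in the calculation knows who actually wrote the words.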
Burstiness measures how much sentence length varies. Humans tend to mix very short sentences with longer ones in an uneven rhythm. AI tends toward more uniform sentence lengths, especially in formal writing.
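There is no single standard formula for burstiness, and tools define it differently. One simple way to quantify the idea is the coefficient of variation of sentence length: the standard deviation divided by the mean. The sketch below is a hypothetical illustration, not any detector's actual method, and its sentence splitter is deliberately naive (it breaks on `.`, `!` and `?`, so abbreviations like "e.g." would confuse it).

```python
import re
import statistics

def sentence_lengths(text: str) -> list[int]:
    """Split on ., ! or ? and count the words in each non-empty sentence."""
    sentences = re.split(r"[.!?]+", text)
    return [len(s.split()) for s in sentences if s.strip()]

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (population stdev / mean).

    0.0 means every sentence has the same length; higher values mean a
    more uneven mix of short and long sentences.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = ("The report covers three topics. Each topic has its own section. "
           "The sections follow a fixed order.")
uneven = ("It failed. Nobody on the team expected the migration to take three "
          "full weekends and most of a holiday. Lesson learned.")

print(round(burstiness(uniform), 2))
print(round(burstiness(uneven), 2))
```

The uniform passage (sentences of 5, 6 and 6 words) scores about 0.08, while the uneven one (2, 17 and 2 words) scores about 1.01. A careful human writer producing evenly paced prose would land near the first number.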
These are reasonable ideas. The problem is that many humans write in ways that score as low-perplexity and low-burstiness — especially people who write carefully, formally, or in English as a second language.
Comparison: What to Look for in a Detector
The table below compares the main categories of AI detection tools across criteria that matter for practical use. It does not include invented accuracy percentages — those vary too much by use case and prompt style to be meaningful. The qualitative ratings reflect patterns widely reported in independent testing and published research.
| Criterion | Free browser tools | School / LMS integrations | API-based tools | Open-source tools |
|---|---|---|---|---|
| False positive risk on ESL text | High | High | Moderate to High | Varies widely |
| False positive risk on formal human writing | High | Moderate to High | Moderate | Varies |
| Detection of lightly edited AI text | Low | Low to Moderate | Moderate | Low |
| Detection of paraphrased AI text | Very Low | Low | Low | Very Low |
| Explains why it flagged text | Rarely | Sometimes | Sometimes | Depends on tool |
| Audit log / evidence trail | No | Sometimes | Yes | No |
| Cost | Free | Subscription (per school) | Pay per use | Free |
| Suitable as sole evidence of cheating | No | No | No | No |
The last row is the same across every category, because no tool currently available meets the standard of evidence needed to accuse someone of academic dishonesty on its own.
The False Positive Problem
False positives — cases where the detector flags human writing as AI — are the central failure mode. They are well-documented, widely reported, and serious.
Some groups are more likely to be flagged than others:
- Non-native English speakers write in patterns that match AI statistical signatures more closely. Formal vocabulary, careful grammar, and structured paragraphs are all traits that score as low-perplexity.
- Students who write formally for academic assignments, the way they are often taught to write, produce text that many detectors find suspicious.
- Writers who draft carefully and edit tend to produce smoother, more predictable text than writers who dash things off.
There is no way to know from the outside whether a false positive is happening in any given case. That is the core problem. A result of "98% AI" tells you that the text scores similarly to AI-generated text. It does not tell you that AI generated it.
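A small worked example shows why even a modest false positive rate matters when judging individual cases. The sketch below applies Bayes' rule to estimate what fraction of flagged texts were actually AI-written. The function name and every input rate are hypothetical, chosen only to show the arithmetic; they are not measured rates for any real detector.

```python
def share_of_flags_that_are_ai(ai_share: float, detection_rate: float,
                               false_positive_rate: float) -> float:
    """Bayes' rule: P(actually AI-written | flagged).

    ai_share            - fraction of all submitted texts that were AI-written
    detection_rate      - fraction of AI-written texts the detector flags
    false_positive_rate - fraction of human-written texts the detector flags
    """
    flagged_ai = ai_share * detection_rate
    flagged_human = (1 - ai_share) * false_positive_rate
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0:
        raise ValueError("with these rates nothing is ever flagged")
    return flagged_ai / total_flagged

# Purely hypothetical inputs: NOT measurements of any real tool.
essays = 1000
ai_share, detection_rate, false_positive_rate = 0.10, 0.80, 0.05

ppv = share_of_flags_that_are_ai(ai_share, detection_rate, false_positive_rate)
flagged_human = essays * (1 - ai_share) * false_positive_rate
print(f"{ppv:.0%} of flagged essays were AI-written")
print(f"{flagged_human:.0f} human-written essays were flagged anyway")
```

With these made-up inputs, 125 of 1,000 essays are flagged and 45 of them (36%) were written by a human. Lowering the share of AI-written work in the pool, or raising the false positive rate, makes that proportion worse, and nothing in an individual score reveals which group a flagged essay belongs to.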
What Detectors Cannot Catch
Modern AI, when prompted to write conversationally, with imperfections, or in a specific person's style, can produce text that scores as fully human on nearly every detector. Anyone motivated to evade detection can do so easily:
- Ask the AI to "write like a high school student" or "make it sound casual"
- Edit a few sentences manually after generating
- Run the text through a free paraphrase tool
- Ask the AI to vary sentence lengths and include contractions
This means a student who deliberately sets out to evade detection is unlikely to be caught, while a student who made no attempt to hide AI use might be. The tools end up being better at catching careless use of AI than deliberate misuse.
What Actually Helps
For teachers, the more durable approaches involve the writing process rather than the final product:
- Ask students to submit drafts at multiple stages, not just a final document
- Include in-class writing components that mirror out-of-class assignments
- Ask students to discuss their work: what sources they used, what was difficult, what they would change
- Look for inconsistencies between a student's verbal explanation and what the essay argues
A student who used AI to write an essay will typically struggle to explain it. A student who wrote it — even with AI assistance for research or editing — will have something to say about their own thinking process.
For parents, the same principle applies. If you are curious whether your child used AI for an assignment, ask them to walk you through what they did. The conversation is more informative than any detector.
What to Try Next
To understand what patterns actually show up in AI writing — beyond what a detector measures — read How to Tell If a Text Was Written by AI. If you want a practical guide for talking to your kid about AI and homework, My Kid Uses ChatGPT for Homework — A Parent's Guide has a calm, step-by-step approach.



