How Does AI Detection Work?

AI detectors work by analyzing text for two key statistical signals: perplexity (how predictable each word choice is) and burstiness (how much sentence length varies). AI-generated text tends to have low perplexity and low burstiness, with uniformly structured sentences. Detectors are trained on millions of AI and human writing samples to recognize these patterns and assign a probability score.

The Two Core Detection Signals

1. Perplexity

Perplexity measures how surprised a language model would be by each word choice in a piece of text. When a word choice is very predictable (the obvious, high-probability word for the context), perplexity is low. When a writer chooses something unexpected but fitting, perplexity is high.

AI language models are trained to predict the next token, and they typically generate text by choosing from the most probable candidates at each step. This makes their output inherently low-perplexity: nearly every word is the "expected" choice. Human writers, by contrast, make more varied decisions: they choose more specific words, use idioms, make analogies, or opt for a less common phrasing that better captures what they mean.

Example:

AI (low perplexity): "This has a significant impact on the overall performance of the system."

Human (higher perplexity): "This quietly chokes the system's throughput, especially at scale."
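To make the definition concrete, here is a minimal Python sketch of the usual perplexity formula: the exponential of the average negative log-probability a model assigned to each token. The function name and the probability lists are assumptions made up for illustration. A real detector would get these probabilities from a language model, and no vendor's exact scoring is implied.

```python
import math

def perplexity(token_probs):
    """Perplexity of a text, given the probability a model assigned to each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token, then exponentiate.
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Made-up probabilities for illustration only, not measured from any real model.
predictable = [0.60, 0.50, 0.70, 0.55, 0.65]  # every token was a likely choice
surprising = [0.60, 0.05, 0.70, 0.02, 0.65]   # two unexpected word choices

print(perplexity(predictable))
print(perplexity(surprising))
```

The second list scores higher because two low-probability tokens pull the average log-probability down, which is the effect an unexpected word choice has.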

2. Burstiness

Burstiness measures how much sentence lengths vary across a piece of text. Human writing "bursts": it mixes short, punchy sentences with longer, explanatory ones in no fixed pattern. AI writing tends to be more uniform: most sentences are similar in length, reflecting the model's tendency to produce balanced, templated structures.

This signal is useful because it is structural: it doesn't depend on any specific vocabulary or topic. Even if an AI model avoids "AI words," its sentence lengths will often still cluster in a narrow range.

Example:

AI (low burstiness): All sentences are 15-20 words. Each paragraph has three sentences. Every section is balanced in length.

Human (high burstiness): Short. Then a longer sentence that develops the idea with specific context. Then a really long sentence that explains the nuance, provides an example, and maybe introduces a counterpoint before closing with the main point.
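One simple way to put a number on this is the coefficient of variation of sentence length (standard deviation divided by mean). The Python sketch below assumes that definition. Detectors do not publish their exact formulas, so the function names, the naive sentence splitter, and the sample strings are illustrative choices rather than any tool's actual method.

```python
import re
import statistics

def sentence_lengths(text):
    """Word count of each sentence, splitting after ., ! or ? followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s]

def burstiness(text):
    """Coefficient of variation of sentence length: standard deviation / mean."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.stdev(lengths) / statistics.mean(lengths)

# Illustrative samples: three equal-length sentences versus mixed lengths.
uniform = (
    "The model writes one even sentence here. "
    "The model writes one more sentence here. "
    "The model writes a third sentence here."
)
varied = (
    "Short. Then a longer sentence that develops the idea with specific context. "
    "Then a really long sentence that explains the nuance, provides an example, "
    "and maybe introduces a counterpoint before closing with the main point."
)

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), burstiness(varied))
```

A value near zero means the sentences are almost all the same length, and larger values mean more variation. The regex splitter will mishandle abbreviations such as "e.g.", so a production tool would need a proper sentence tokenizer.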

How Each Major AI Detector Works

GPTZero

Pioneered the perplexity + burstiness approach. Provides sentence-level highlighting showing which sentences are AI-flagged. Also reports document-level AI probability.

Turnitin

Integrates AI detection with plagiarism checking. Uses a trained classifier on top of perplexity signals. Reports a percentage AI-writing indicator alongside the plagiarism originality score.

Originality.ai

Uses a combination of perplexity analysis and a fine-tuned classifier trained on GPT, Claude, and other models. Provides sentence-level scoring and a document-level percentage.

Copyleaks

Enterprise-grade detection using perplexity and n-gram distribution analysis. Integrated into enterprise workflows for content verification at scale.

ZeroGPT

Free consumer tool using a perplexity-based model. Reports percentage AI content and highlights suspected AI sentences. Less precise than paid tools but widely used.

Winston AI

Used by media organizations and academic institutions. Provides a human/AI score with document-level probability. Trains specifically on the latest AI model outputs.
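Across the tools above, the common pattern is a trained classifier that turns statistical signals into a probability score. The Python sketch below shows the general shape of that step with a logistic function over two features. The weights are hypothetical and hand-picked for illustration. A real detector would learn its parameters from labeled samples and would likely use many more features.

```python
import math

# Hypothetical weights, chosen by hand for illustration only. A real detector
# would learn its parameters from labeled human and AI writing samples.
WEIGHTS = {"bias": 4.0, "log_perplexity": -1.0, "burstiness": -3.0}

def ai_probability(perplexity, burstiness):
    """Map two features to a score between 0 and 1 with a logistic function."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["log_perplexity"] * math.log(perplexity)
        + WEIGHTS["burstiness"] * burstiness
    )
    return 1.0 / (1.0 + math.exp(-z))

# Made-up feature values: a predictable, uniform text and a varied one.
print(ai_probability(perplexity=5.0, burstiness=0.1))
print(ai_probability(perplexity=60.0, burstiness=0.8))
```

Both weights are negative, so lower perplexity and lower burstiness push the score toward "AI", matching the signals described earlier.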

Limitations of AI Detection

AI detectors are useful but imperfect. Several limitations are well-documented:

False positives

Human-written academic or technical writing is often more uniform in structure than creative writing and can trigger AI detection flags. Non-native English speakers are disproportionately flagged because their writing tends to be more grammatically predictable.

Model specificity

Detectors trained primarily on GPT output may perform differently on Claude, Gemini, or other models. As new AI models are released, detectors need retraining.

Short text

Statistical methods require sufficient text to be meaningful. Detectors are unreliable on very short passages (under ~100 words) because the sample size is too small for confident analysis.
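The sample-size problem can be illustrated with the standard error of the mean sentence length, which shrinks as more sentences are measured. The Python sketch below uses made-up sentence lengths, and the longer sample simply repeats the short one to keep the spread comparable. It illustrates the statistical point and is not how any particular detector estimates confidence.

```python
import math
import statistics

def mean_length_standard_error(sentence_lengths):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    n = len(sentence_lengths)
    if n < 2:
        raise ValueError("need at least two sentences")
    return statistics.stdev(sentence_lengths) / math.sqrt(n)

# Made-up sentence lengths (in words) for illustration only.
short_sample = [8, 21, 5, 14]     # a four-sentence passage
long_sample = short_sample * 10   # the same spread over forty sentences

print(mean_length_standard_error(short_sample))
print(mean_length_standard_error(long_sample))
```

With only four sentences the estimate of a typical sentence length is much less certain, so any score built on it is correspondingly less trustworthy.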

Evolving arms race

As humanization tools improve, detectors must improve in response. Detection accuracy on humanized text has historically been lower than on raw AI output.

How Humanizing Bypasses AI Detection

Since AI detection relies primarily on perplexity and burstiness, bypassing it requires genuinely altering these properties, not just masking them at the surface level.

TextHumanizer rewrites text to raise perplexity (introducing more varied, contextually appropriate vocabulary choices) and increase burstiness (varying sentence lengths and structures naturally). The result exhibits human-typical statistical properties because it has been rewritten to genuinely match human writing patterns.

This is why TextHumanizer produces more reliable bypass results than simple word-substitution tools: it targets what detectors actually measure.

FAQ

How accurate are AI detectors?

AI detectors vary in accuracy. GPTZero and Originality.ai report high accuracy on clearly AI-generated text. However, all detectors produce false positives (human-written text that scores as AI-generated), particularly for technical writing, formal academic prose, and non-native English speakers whose writing is naturally more uniform. No detector is 100% accurate.

What is perplexity in AI detection?

In AI detection, perplexity measures how predictable each word choice is. Language models like GPT-4 are trained to predict the most probable next word, which makes their output very low-perplexity (each word was the expected choice). Human writers make more varied, sometimes surprising word choices, producing higher perplexity. Detectors use low perplexity as a signal of AI authorship.

Can AI detectors be fooled?

AI detectors can be bypassed by humanizing the text, that is, rewriting it to exhibit human-typical statistical properties (higher perplexity, higher burstiness). Tools like TextHumanizer are specifically designed for this. They don't 'fool' detectors through tricks; they produce text that genuinely exhibits human writing characteristics.

Does Turnitin's AI detector work differently from GPTZero?

Turnitin and GPTZero use similar underlying approaches: both analyze statistical properties of text. The main differences are in their training data, weighting of specific signals, and how they present scores. Turnitin integrates AI detection with plagiarism checking, reporting an AI-writing percentage alongside the originality score; GPTZero provides sentence-level AI probability highlighting. Both look for the same fundamental signals.
