Microsoft researchers have discovered that frontier AI models systematically corrupt documents during multi-step workflows, introducing errors that users rarely detect. The study examined how leading language models handle iterative processing tasks across professional domains.
The research developed a benchmark simulating autonomous workflows spanning 52 professional fields. Rather than simply deleting content, the models actively rewrite document sections, introducing factual errors, contradictions, and misrepresentations that compound over successive iterations. Top-tier frontier models corrupt approximately 25% of processed documents on average.
The corruption pattern proves particularly insidious because errors accumulate silently. Users delegating knowledge work to AI systems expect faithful document processing. Instead, models introduce subtle inaccuracies that compound across multiple processing rounds. The researchers created automated measurement methods to track content degradation in real time, revealing how extensively models drift from source material.
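The study's actual measurement methods are not detailed here, but the idea of tracking drift from source material can be sketched with a simple similarity score. The following is an illustrative stand-in, not the researchers' metric: `drift`, the sample sentences, and the use of `difflib.SequenceMatcher` are all assumptions for demonstration.

```python
# Hypothetical sketch: score how far each processed revision drifts from the
# source document. SequenceMatcher is a stand-in for whatever automated
# measure the researchers actually used; the revisions below are invented.
from difflib import SequenceMatcher

def drift(source: str, revision: str) -> float:
    """Drift score in [0, 1]: 0.0 = identical to source, higher = more rewritten."""
    return 1.0 - SequenceMatcher(None, source, revision).ratio()

source = "The contract expires on 1 March 2025 and renews annually."
revisions = [
    "The contract expires on 1 March 2025 and renews annually.",  # faithful copy
    "The contract expires in March 2025 and renews each year.",   # subtle rewording
    "The agreement ends in 2025 and may be renewed.",             # heavy reinterpretation
]

for i, rev in enumerate(revisions, start=1):
    print(f"iteration {i}: drift = {drift(source, rev):.2f}")
```

Run after each processing round, a score like this makes silent degradation visible: the faithful copy scores zero, while each successive reinterpretation pushes the score higher.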
This finding challenges the common assumption that language models preserve document fidelity when processing information. The errors span factual misstatements, logical inconsistencies, and content reinterpretation rather than simple omissions. Even advanced models struggle with maintaining accuracy across iterative document workflows, which represent an increasingly common use case as organizations adopt AI for document analysis, summarization, and processing tasks.
The implications extend beyond individual users. Enterprises relying on AI systems to handle sensitive documents such as contracts, compliance materials, or technical specifications face significant risk. A 25% corruption rate means one in four processed documents contains introduced errors, and organizations cannot easily identify which documents have been compromised without manual review.
This research underscores a critical gap between perceived model capability and actual performance on real-world document tasks. The problem intensifies in autonomous workflows where humans intervene minimally. The researchers recommend building verification systems and limiting document iteration depth until models demonstrate greater content preservation reliability.
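The two recommended safeguards can be combined into a single gate around any processing loop. This is a minimal sketch under assumptions: `MAX_ITERATIONS`, `DRIFT_THRESHOLD`, and `process_step` are illustrative names and values, not prescriptions from the study.

```python
# Hypothetical sketch of the recommended safeguards: cap the number of AI
# processing rounds and reject any revision that drifts too far from the
# original source. Threshold and depth limit are assumed, not from the study.
from difflib import SequenceMatcher

MAX_ITERATIONS = 3      # assumed cap on iteration depth
DRIFT_THRESHOLD = 0.30  # assumed cutoff for routing to manual review

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def run_workflow(source: str, process_step) -> str:
    """Apply process_step up to MAX_ITERATIONS times, verifying fidelity
    against the original source after every round."""
    doc = source
    for i in range(MAX_ITERATIONS):
        candidate = process_step(doc)
        if 1.0 - similarity(source, candidate) > DRIFT_THRESHOLD:
            raise ValueError(
                f"iteration {i + 1}: drift exceeds threshold; "
                "route document to manual review"
            )
        doc = candidate
    return doc

# Usage with a benign step that only normalizes whitespace, which passes:
result = run_workflow("Clause 4.2  applies to all  vendors.",
                      lambda d: " ".join(d.split()))
```

The key design choice is comparing every candidate against the original source rather than the previous iteration: per-round comparisons let small errors compound unnoticed, which is exactly the failure mode the study describes.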
