LLMs edited 86 human essays toward a semantic cluster not occupied by any human writer [D]

Researchers from Berkeley, UCSD, UW, and Google DeepMind studied three datasets: a 100-person controlled user study, 86 pre-LLM human-written essays (collected in 2021) revised by GPT-5-mini, Gemini 2.5 Flash, and Claude Haiku, and 18,000 ICLR 2026 peer reviews.

The clearest finding: when mapped in embedding space, LLM-revised essays cluster tightly in a region occupied by none of the human-written essays, which are spread broadly. The LLMs push every essay in the same direction regardless of the revision instruction; even "fix grammar only" produces this shift. Each writer's unique lexical fingerprint is overwritten by the LLM's preferred vocabulary.
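The "tight cluster vs. broad spread" claim can be made concrete as a dispersion comparison in embedding space. A minimal sketch, using random vectors as stand-ins for real essay embeddings (the paper's actual embedding model and metric are not specified here):

```python
import numpy as np

def dispersion(embeddings: np.ndarray) -> float:
    """Mean Euclidean distance of each embedding from the set's centroid."""
    centroid = embeddings.mean(axis=0)
    return float(np.linalg.norm(embeddings - centroid, axis=1).mean())

# Toy stand-ins for essay embeddings (in practice these would come from a
# sentence-embedding model): human essays spread widely, LLM revisions
# drawn tightly toward one shifted region of the space.
rng = np.random.default_rng(0)
human_essays = rng.normal(loc=0.0, scale=1.0, size=(86, 384))  # broad spread
llm_revised = rng.normal(loc=2.0, scale=0.2, size=(86, 384))   # tight, shifted cluster

print(f"human dispersion: {dispersion(human_essays):.2f}")
print(f"LLM   dispersion: {dispersion(llm_revised):.2f}")
```

The finding corresponds to `dispersion(llm_revised)` being much smaller than `dispersion(human_essays)`, with the LLM centroid displaced away from every human essay.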

The stance shift is measurable: users given LLM assistance wrote significantly more neutral essays and avoided definitive positions. LLMs increased nouns and adjectives and decreased pronouns, making the prose more formal and statistical, and less personal. Arguments from personal experience were replaced with statistical and expert-citation arguments.
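The pronoun decrease is the easiest of these shifts to measure. A rough sketch (a wordlist heuristic, not the paper's tagger; a real analysis would use a proper POS tagger):

```python
import re

PRONOUNS = {"i", "me", "my", "we", "our", "you", "your", "he", "she",
            "it", "they", "them", "his", "her", "its", "their", "us"}

def pronoun_rate(text: str) -> float:
    """Fraction of word tokens that are personal/possessive pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in PRONOUNS for t in tokens) / max(len(tokens), 1)

# Hypothetical before/after pair illustrating the reported direction of change.
original = "I think my experience shows we should trust our instincts."
revised = "Research indicates that statistical evidence supports structured decisions."
print(pronoun_rate(original), pronoun_rate(revised))
```

Run over essay pairs before and after revision, a consistent drop in this rate would reflect the depersonalization the study reports.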

The ICLR 2026 finding is the sharpest institutional data point: 21% of peer reviews were AI-generated. AI reviewers scored papers 10% higher, were 136% more likely to emphasize reproducibility, and 84% more likely to emphasize scalability. Humans were more likely to comment on clarity as both a strength and a weakness. The criteria being rewarded in peer review are already shifting.

The user study paradox: heavy LLM users recognized the loss of voice but reported equivalent satisfaction. The efficiency gain is immediate; the cultural cost is diffuse.

Is there a writing task where you've noticed LLM revision consistently pulling you away from what you actually wanted to say?

submitted by /u/jimmytoan
