Getting Claude to rank the Inkhaven bloggers

With apologies to those who didn't make this post: it seems you need to up your game.

Yesterday, Alexander Wales published a post entitled "Can an LLM have taste? Inkhaven Week 1, ranked by Claude". I found this very entertaining.

He took Claude, used it to compare a bunch of Inkhaven posts, ranked them, and gave us this wonderful list of the top ten posts so far:

Three Stones are Enough: The Case Against Leaves, in Particular, Anna Mattinger
An open letter to 21 people I know who died, Layla Hughes
endometrial biopsy, kaylee
Softhead, macroraptor
Every Lighthaven Writing Residency, Layla Hughes
The largest manufacturer of feelings in human history, Natalie Cargill
The one that loved me most, MLL
I did it. I found the worst poem in the world., Natalie Cargill
“Love, Mum” - What AIs can’t see about abuse, Natalie Cargill
Lost Mesoamerican Technologies, Lost Futures

I thought this was great, so I decided to make my own version, but better. Rather than producing a straight ranking, I (or rather, Claude, who assisted me with the actual implementation of this hare-brained scheme) decided to use a Bradley-Terry model, which turns lots of small head-to-head comparisons into a score for each post, and which Claude informs me is rather like the Elo system used to rank chess players.
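In a Bradley-Terry model each post gets a strength p, and post i is predicted to beat post j with probability p_i / (p_i + p_j). Below is a minimal sketch of fitting one, not the actual script used here. It assumes each ranking is split into all of its pairwise results (an 8-post ranking gives 28 pairs) and uses the standard minorize-maximize (MM) update. The function names and the small `prior` pseudo-count are illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def rankings_to_wins(rankings):
    """Split each ranking (best first) into pairwise results: wins[(a, b)] = times a beat b."""
    wins = defaultdict(float)
    for ranking in rankings:
        for a, b in combinations(ranking, 2):  # combinations keeps order, so a ranked above b
            wins[(a, b)] += 1.0
    return wins


def fit_bradley_terry(rankings, prior=0.1, iters=500):
    """Fit Bradley-Terry strengths with the MM algorithm and return log-strengths.

    `prior` is an assumed pseudo-count added to both directions of every compared pair,
    so a post that never lost still gets a finite score.
    """
    wins = rankings_to_wins(rankings)
    for a, b in {tuple(sorted(pair)) for pair in wins}:
        wins[(a, b)] += prior
        wins[(b, a)] += prior

    total_wins = defaultdict(float)  # W_i: total wins for post i
    games = defaultdict(float)       # n_ij: comparisons between posts i and j
    opponents = defaultdict(set)
    for (a, b), w in wins.items():
        total_wins[a] += w
        games[(a, b)] += w
        games[(b, a)] += w
        opponents[a].add(b)
        opponents[b].add(a)

    strength = {post: 1.0 for post in opponents}
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        updated = {
            i: total_wins[i] / sum(games[(i, j)] / (strength[i] + strength[j]) for j in opponents[i])
            for i in strength
        }
        # Strengths are only defined up to a common factor; pin the geometric mean to 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {i: v / math.exp(log_mean) for i, v in updated.items()}
    return {i: math.log(v) for i, v in strength.items()}
```

For example, `fit_bradley_terry([["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]])` scores A above B above C. Working with log-strengths is what makes the Elo comparison apt: in both systems the predicted win probability is a logistic function of the difference between two ratings.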

Using the Anthropic API, we gave Claude Opus 4.6 the following prompt (also written by Claude, but edited by me), including 8 posts for it to rank:

You are judging posts from Inkhaven, a writing residency where participants commit to publishing one blog post every day for 30 days. The residents are a mix of AI safety researchers, rationalists, fiction writers, and generally thoughtful people. The audience skews heavily rationalist — LessWrong regulars, EA-adjacent, people who take ideas seriously but also appreciate a good joke.
You will be shown 8 Inkhaven posts. Rank them by quality, from best to worst.
The question to ask yourself for each post: "Would a typical rationalist vote to read more of this sort of thing?" You're not rating a single post in isolation — you're judging whether the author, writing in this mode, should keep going. Insight, craft, honest thinking, and distinctive voice all count.
So does being funny — humour is a genuine virtue here, not a tiebreaker.
A few things to keep in mind:
- Do NOT be generous or encouraging. Predict the actual taste of the rationalist audience. Many of these posts will be mediocre and that's fine to say.
- Fiction, essays, rants, reviews, and technical posts are all on the same scale — judge each by whether it succeeds at what it's trying to do.
- Length is not quality. A tight 500 words can beat a bloated 3000.
- Weird and niche is fine, often good. Idiosyncrasy is often a feature, not a bug.
=== POST {i} ===
title + first 4000 chars of body.
Rank all 8 posts from best to worst. Think through your reasoning, then give your final answer as a comma-separated list of post numbers inside <answer> tags.
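The placeholder and the answer format in that prompt suggest two small helpers. One renders a batch into the `=== POST {i} ===` sections, keeping the first 4000 characters of each body. The other pulls the ranking back out of the `<answer>` tags. This is a sketch with hypothetical function names, and the actual script may have done it differently:

```python
import re


def format_posts(posts, max_chars=4000):
    """Render (title, body) pairs as the numbered '=== POST {i} ===' sections of the prompt."""
    sections = []
    for i, (title, body) in enumerate(posts, start=1):
        sections.append(f"=== POST {i} ===\n{title}\n\n{body[:max_chars]}")
    return "\n\n".join(sections)


def parse_answer(reply, n_posts=8):
    """Return the 1-based ranking from the last <answer>...</answer> block, or None if malformed."""
    matches = re.findall(r"<answer>(.*?)</answer>", reply, re.DOTALL)
    if not matches:
        return None
    try:
        ranking = [int(token) for token in matches[-1].split(",")]  # int() tolerates spaces
    except ValueError:
        return None
    # Reject answers that skip or repeat a post rather than guessing what was meant.
    if sorted(ranking) != list(range(1, n_posts + 1)):
        return None
    return ranking
```

Post numbers in the answer are 1-based, so `[batch[k - 1] for k in ranking]` maps them back to post identifiers. Discarding malformed replies, rather than patching them, keeps a garbled answer from quietly skewing the fit, at the cost of a few wasted calls.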

We did five iterations:

Get baseline estimates for each post
Get more accurate estimates for posts liable to be in the top 10
Get proper estimates for the posts we'd accidentally imported in the wrong format
Try to push my post from 2nd to 1st place (instead, it ended up in 10th).
Realise that we were missing a bunch of posts and add those in (it didn't change much)
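The post doesn't say how the 8-post batches were drawn. One simple scheme, shown purely as an illustration, is to shuffle the pool each round and cut it into batches of 8, so every post is judged once per round. Under that assumption, the "more accurate estimates for the top 10" pass is just the same function run on a smaller pool of candidates. The name `make_batches` and its parameters are hypothetical:

```python
import random


def make_batches(post_ids, rounds, batch_size=8, seed=0):
    """Illustrative batching: each round, shuffle the pool and cut it into batches.

    A short final batch is topped up with posts already used earlier in that round,
    so every batch holds exactly `batch_size` distinct posts.
    """
    if len(post_ids) < batch_size:
        raise ValueError("need at least batch_size posts")
    rng = random.Random(seed)
    batches = []
    for _ in range(rounds):
        order = list(post_ids)
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            if len(batch) < batch_size:
                # order[:start] never overlaps this batch, so the fillers stay distinct.
                batch += rng.sample(order[:start], batch_size - len(batch))
            batches.append(batch)
    return batches
```

Fixing the seed makes a run reproducible. This matters when you are spending real API credits and want to rerun only the fitting step.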

$40 in burned API credits later, we got the following table:

#	Score	Author	Title
1	+2.77σ	Natalie Cargill	How to invent a disease
2	+2.55σ	Alec Thompson	More Legal Systems Very Different From Ours 1
3	+2.54σ	Avi	The Smell
4	+2.53σ	Aaron Gertler	Posts I Will Not Be Writing
5	+2.49σ	Smitty	How the Claude Mythos leak happened
6	+2.49σ	Natalie Cargill	The largest manufacturer of feelings in human history
7	+2.49σ	viv	The phenomenology of being hungry while pregnant
8	+2.47σ	Alec Thompson	More Legal Systems Very Different From Ours 2: Nazi Private Law
9	+2.46σ	Anna Mattinger	Three Stones are Enough: The Case Against Leaves, in Particular
10	+2.45σ	Sean Herrington	The quest for general intelligence is hitting a wall
11	+2.39σ	Alec Thompson	Why did Hitler hate Roman law?
12	+2.35σ	Austen	Forgotten 18th Century Chinese Republics
13	+2.33σ	Alec Thompson	Finding Jack O'Neil
14	+2.28σ	Vishal Prasad	When the buffalo went away...
15	+2.28σ	viv	Late pregnancy is pretty bizarre
16	+2.21σ	Itsi Weinstock	Sin as a physical particle
17	+2.20σ	Bill Jackson	Two critiques of Rethink Priorities' Moral Weights project
18	+2.16σ	Natalie Cargill	I did it. I found the worst poem in the world.
19	+2.12σ	viv	How many genders are there?
20	+1.99σ	Benjamin Sturgeon	Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?
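The σ column isn't defined above. One plausible reading, assumed here, is that each post's fitted log-strength is expressed as standard deviations from the mean across all posts. A sketch of that conversion and of the table's formatting:

```python
import statistics


def to_sigma(log_strengths):
    """Re-express fitted log-strengths as standard deviations from the mean post."""
    values = list(log_strengths.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD over all posts, not a sample estimate
    return {post: (v - mu) / sd for post, v in log_strengths.items()}


def format_sigma(z):
    """Format a z-score the way the tables do, e.g. +2.77σ."""
    return f"{z:+.2f}σ"
```

Under this reading, gaps near the top of the table are small relative to the spread of all posts. A raw log-strength gap d corresponds to a Bradley-Terry head-to-head win probability of 1 / (1 + e^(-d)), so +2.55σ versus +2.54σ is close to a coin flip.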

Claude did a diligent bootstrapping check to make sure we had the right posts in the top 20. It found that the 19th-placed post, viv's How many genders are there?, made the top 20 in 90% of resamples, while Ben Sturgeon's Revisiting GSM-Symbolic made it in a mere 22%. You're on thin ice, Ben.
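A bootstrap check like that one usually means resampling the judge calls with replacement, refitting, and counting how often each post lands in the top 20. The post doesn't give the details, so this is a sketch under that assumption. `borda_scores` is a crude stand-in scorer included only to keep the block runnable. In practice you would pass a Bradley-Terry fit as `fit` instead.

```python
import random
from collections import Counter


def borda_scores(rankings):
    """Stand-in scorer: average number of posts each post beat per ranking it appeared in."""
    beaten, seen = Counter(), Counter()
    for ranking in rankings:
        for pos, post in enumerate(ranking):
            beaten[post] += len(ranking) - 1 - pos
            seen[post] += 1
    return {post: beaten[post] / seen[post] for post in seen}


def bootstrap_top_k(rankings, fit, k=20, n_boot=200, seed=0):
    """Fraction of bootstrap refits in which each post lands in the top k.

    A post missing from a resample gets no score, so it counts as not making the top k.
    """
    rng = random.Random(seed)
    hits = Counter()
    all_posts = {post for ranking in rankings for post in ranking}
    for _ in range(n_boot):
        sample = [rng.choice(rankings) for _ in rankings]  # resample judge calls with replacement
        scores = fit(sample)
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        hits.update(top)
    return {post: hits[post] / n_boot for post in all_posts}
```

Resampling whole judge calls, rather than individual pairwise results, keeps the 28 pairs from one ranking together. Those pairs are not independent, because they all come from one reply.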

Averaging the scores of each author's posts also lets us rank the authors. The top 20 authors at Inkhaven right now are... [drum roll]:

#	Score	Posts included	Author	Best post
1	+2.60σ	10	viv	The phenomenology of being hungry while pregnant
2	+2.52σ	9	Natalie Cargill	How to invent a disease
3	+2.23σ	9	Alec Thompson	More Legal Systems Very Different From Ours 1
4	+1.82σ	9	Aaron Gertler	Posts I Will Not Be Writing
5	+1.56σ	9	Steven K	Prosaic License
6	+1.34σ	9	Katja Grace	Eggs, rooms, puzzles, and talking about AI
7	+1.06σ	9	capsuletime	Fuck Blogging
8	+1.05σ	10	Kevin Z Wu	(box|bag) in (box|bag) in (box|bag)
9	+1.03σ	9	Austen	Forgotten 18th Century Chinese Republics
10	+0.82σ	10	Drew Schorno	2035
11	+0.68σ	9	Justis Mills (Writing Advisor)	Why No Wheel Bus Again?
12	+0.58σ	9	Bill Jackson	Two critiques of Rethink Priorities' Moral Weights project
13	+0.49σ	9	Lawrence Chan	We're actually running out of benchmarks to upper bound AI capabilities
14	+0.44σ	9	Avi	The Smell
15	+0.43σ	9	Derek Razo	How to Pay to Change the Law
16	+0.37σ	9	conq	19th century poet UTTERLY DESTROYS critics (NO MERCY!)
17	+0.32σ	9	Alicorn (Writing Advisor)	Dogs Are Rude
18	+0.31σ	6	Remy	You Know What They Say About Assuming
19	+0.29σ	7	Layla Hughes	Every Lighthaven Writing Residency
20	+0.22σ	9	Henry Stanley	Inkhavening
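The author table can be reproduced from per-post scores with a plain average. Here is a minimal sketch that also picks each author's best post. It assumes a mapping from post to author, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_authors(post_scores, post_author):
    """Average each author's post scores and sort authors best-first.

    Returns (author, mean score, posts included, best post) tuples.
    """
    by_author = defaultdict(list)
    for post, score in post_scores.items():
        by_author[post_author[post]].append((score, post))
    rows = []
    for author, scored in by_author.items():
        _, best_post = max(scored)  # highest-scoring post for this author
        rows.append((author, mean(s for s, _ in scored), len(scored), best_post))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A plain mean rewards consistency: one brilliant post among many average ones pulls an author up less than steady good work does. That is roughly the "should this author keep going" question the prompt asks.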

I should probably note that I filtered out anyone who hadn't published on at least two-thirds of the days. So Vishal Prasad (+1.89σ, 2 posts), Robert Mushkatblat (+1.41σ, 4 posts), A.G.G Liu (+0.7σ, 1 post), Justin Kuiper (+0.32σ, 1 post) and Georgia Ray (+0.3σ, 1 post) didn't make the cut on volume, despite the quality of their posts.
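The two-thirds cut-off is easy to get subtly wrong with floating point right at the boundary. This sketch uses integer cross-multiplication instead. The post gives the rule but not the day count, so the number of days each author posted (a hypothetical `days_posted` mapping) and the days elapsed are both inputs here.

```python
def meets_posting_bar(days_posted, days_so_far, num=2, den=3):
    """True if an author posted on at least num/den of the days so far.

    Integer cross-multiplication avoids float rounding exactly at the 2/3 boundary.
    """
    return days_posted * den >= days_so_far * num


def filter_authors(rows, days_posted, days_so_far):
    """Keep ranking rows (author first) whose author clears the posting bar."""
    return [row for row in rows if meets_posting_bar(days_posted[row[0]], days_so_far)]
```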

Alexander Wales (-0.23σ, 4 posts), whose post inspired this one, is also, sadly, left out of the rankings. (Sorry.)
