Getting Claude to rank the Inkhaven bloggers

With apologies to those who didn't make this post: it seems you need to up your game.


Yesterday, Alexander Wales published a post entitled "Can an LLM have taste? Inkhaven Week 1, ranked by Claude". I found this very entertaining.

He had Claude compare a bunch of Inkhaven posts, ranked them, and provided us with this wonderful list of the top ten posts so far:

  1. Three Stones are Enough: The Case Against Leaves, in Particular, Anna Mattinger
  2. An open letter to 21 people I know who died, Layla Hughes
  3. endometrial biopsy, kaylee
  4. Softhead, macroraptor
  5. Every Lighthaven Writing Residency, Layla Hughes
  6. The largest manufacturer of feelings in human history, Natalie Cargill
  7. The one that loved me most, MLL
  8. I did it. I found the worst poem in the world., Natalie Cargill
  9. “Love, Mum” - What AIs can’t see about abuse, Natalie Cargill
  10. Lost Mesoamerican Technologies, Lost Futures

I thought this was great, so I decided to make my own version, but better. Rather than using pure ranking, I (or rather, Claude, who assisted me with the actual implementation of this hare-brained scheme) decided to use a Bradley-Terry model, which Claude informs me is rather like the Elo system used to rank chess players.
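For the curious, here is a minimal sketch of what fitting a Bradley-Terry model looks like. It is an illustration, not the code Claude wrote for this post. Each post gets a positive strength, and the model says post i beats post j with probability strength_i / (strength_i + strength_j). The function names, the `prior` smoothing term, and the conversion to σ units (z-scores of the log-strengths) are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_bradley_terry(pairs, iters=500, prior=0.1):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the standard minorise-maximise update. `prior` (an assumption of
    this sketch) gives every item a fractional virtual win and loss against
    a dummy opponent of strength 1, so unbeaten items stay finite.
    """
    wins = defaultdict(float)
    games = defaultdict(lambda: defaultdict(float))
    for winner, loser in pairs:
        wins[winner] += 1
        games[winner][loser] += 1
        games[loser][winner] += 1

    strength = {item: 1.0 for item in games}
    for _ in range(iters):
        updated = {}
        for i, opponents in games.items():
            # Virtual games against the dummy opponent.
            denom = 2 * prior / (strength[i] + 1.0)
            # Real games against every opponent i has met.
            for j, n in opponents.items():
                denom += n / (strength[i] + strength[j])
            updated[i] = (wins[i] + prior) / denom
        strength = updated
    return strength


def to_sigma(strength):
    """Express log-strengths as z-scores (mean 0, standard deviation 1)."""
    logs = {item: math.log(s) for item, s in strength.items()}
    mean = sum(logs.values()) / len(logs)
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs.values()) / len(logs)) or 1.0
    return {item: (v - mean) / sd for item, v in logs.items()}
```

Calling `to_sigma(fit_bradley_terry(pairs))` on a list of (winner, loser) tuples gives one score per post, where higher is better. The resemblance to Elo is that both turn a difference in ratings into a win probability; Bradley-Terry fits all the ratings at once rather than updating them game by game.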

Using the Anthropic API, we gave Claude Opus 4.6 the following prompt (also written by Claude, but edited by me), including 8 posts for it to rank:

You are judging posts from Inkhaven, a writing residency where participants commit to publishing one blog post every day for 30 days. The residents are a mix of AI safety researchers, rationalists, fiction writers, and generally thoughtful people. The audience skews heavily rationalist — LessWrong regulars, EA-adjacent, people who take ideas seriously but also appreciate a good joke.

You will be shown 8 Inkhaven posts. Rank them by quality, from best to worst.

The question to ask yourself for each post: "Would a typical rationalist vote to read more of this sort of thing?" You're not rating a single post in isolation — you're judging whether the author, writing in this mode, should keep going. Insight, craft, honest thinking, and distinctive voice all count.

So does being funny — humour is a genuine virtue here, not a tiebreaker.

A few things to keep in mind:

- Do NOT be generous or encouraging. Predict the actual taste of the rationalist audience. Many of these posts will be mediocre and that's fine to say.

- Fiction, essays, rants, reviews, and technical posts are all on the same scale — judge each by whether it succeeds at what it's trying to do.

- Length is not quality. A tight 500 words can beat a bloated 3000.

- Weird and niche is fine, often good. Idiosyncrasy is often a feature, not a bug.

=== POST {i} ===

title + first 4000 chars of body.

Rank all 8 posts from best to worst. Think through your reasoning, then give your final answer as a comma-separated list of post numbers inside <answer> tags.
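The prompt asks for the ranking inside <answer> tags, so each response has to be parsed before it can feed the model. Here is a sketch of one way to do that, under two assumptions that the post does not confirm: that posts are numbered 1 to 8 within a batch, and that a ranking is turned into pairwise results by treating every higher-ranked post as having beaten every lower-ranked one. The function names are hypothetical.

```python
import re
from itertools import combinations


def parse_ranking(response_text, n_posts=8):
    """Pull the ranking out of the <answer> tags; return None if malformed."""
    match = re.search(r"<answer>(.*?)</answer>", response_text, re.DOTALL)
    if match is None:
        return None
    try:
        ranking = [int(token) for token in match.group(1).split(",")]
    except ValueError:
        return None
    # Every post number must appear exactly once.
    if sorted(ranking) != list(range(1, n_posts + 1)):
        return None
    return ranking


def ranking_to_pairs(ranking):
    """Expand a best-to-worst ranking into (winner, loser) pairs."""
    # combinations() keeps the input order, so the first element of each
    # pair is always the higher-ranked post.
    return list(combinations(ranking, 2))
```

A ranking of 8 posts yields 28 pairs, which is part of why ranking in batches is cheaper than asking for one comparison per call. Returning `None` for malformed answers lets the caller retry or skip that batch rather than feeding junk into the fit.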

We did five iterations:

  1. Get baseline estimates for each post
  2. Get more accurate estimates for posts liable to be in the top 10
  3. Get proper estimates for the posts we'd accidentally imported in the wrong format
  4. Try to push my post from 2nd to 1st place (instead, it ended up in 10th)
  5. Realise that we were missing a bunch of posts and add those in (it didn't change much)

$40 in burned API credits later, we got the following table:

#   Score   Author             Title
--  ------  -----------------  -----
1   +2.77σ  Natalie Cargill    How to invent a disease
2   +2.55σ  Alec Thompson      More Legal Systems Very Different From Ours 1
3   +2.54σ  Avi                The Smell
4   +2.53σ  Aaron Gertler      Posts I Will Not Be Writing
5   +2.49σ  Smitty             How the Claude Mythos leak happened
6   +2.49σ  Natalie Cargill    The largest manufacturer of feelings in human history
7   +2.49σ  viv                The phenomenology of being hungry while pregnant
8   +2.47σ  Alec Thompson      More Legal Systems Very Different From Ours 2: Nazi Private Law
9   +2.46σ  Anna Mattinger     Three Stones are Enough: The Case Against Leaves, in Particular
10  +2.45σ  Sean Herrington    The quest for general intelligence is hitting a wall
11  +2.39σ  Alec Thompson      Why did Hitler hate Roman law?
12  +2.35σ  Austen             Forgotten 18th Century Chinese Republics
13  +2.33σ  Alec Thompson      Finding Jack O'Neil
14  +2.28σ  Vishal Prasad      When the buffalo went away...
15  +2.28σ  viv                Late pregnancy is pretty bizarre
16  +2.21σ  Itsi Weinstock     Sin as a physical particle
17  +2.20σ  Bill Jackson       Two critiques of Rethink Priorities' Moral Weights project
18  +2.16σ  Natalie Cargill    I did it. I found the worst poem in the world.
19  +2.12σ  viv                How many genders are there?
20  +1.99σ  Benjamin Sturgeon  Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?

Claude did a diligent bootstrapping check to ensure we had the right posts in the top 20, and found that the 19th-placed post (viv's How many genders are there?) stayed in the top 20 in 90% of resamples, while Benjamin Sturgeon's Revisiting GSM-Symbolic managed a mere 22%. You're on thin ice, Ben.
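A bootstrapping check of this kind can be sketched roughly as follows. I am assuming the unit being resampled is a whole ranking (one API response), which the post does not specify, and `win_rate` is only a simple stand-in scorer to keep the example short; in practice you would pass in the Bradley-Terry fit. All names here are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def win_rate(pairs):
    """Stand-in scorer: fraction of pairwise comparisons each item won."""
    wins, games = Counter(), Counter()
    for winner, loser in pairs:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {item: wins[item] / games[item] for item in games}


def top_k_frequency(rankings, score_fn, k=20, n_boot=200, seed=0):
    """Estimate how often each item lands in the top k under resampling.

    `rankings` is a list of best-to-worst lists. Each bootstrap round draws
    the same number of rankings with replacement, rescores, and records
    which items made the top k.
    """
    rng = random.Random(seed)
    items = {item for ranking in rankings for item in ranking}
    counts = Counter()
    for _ in range(n_boot):
        sample = rng.choices(rankings, k=len(rankings))
        pairs = [pair for ranking in sample for pair in combinations(ranking, 2)]
        scores = score_fn(pairs)
        counts.update(sorted(scores, key=scores.get, reverse=True)[:k])
    return {item: counts[item] / n_boot for item in items}
```

A post that shows up in the top 20 in nearly every round is safely there; one that shows up in about a fifth of them mostly got lucky with which comparisons happened to be drawn.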

Averaging the scores of the individual posts also enables us to give a ranking of the authors. The top 20 authors at Inkhaven right now are... [drum roll]:

#   Score   Posts included  Author                          Best post
--  ------  --------------  ------------------------------  ---------
1   +2.60σ  10              viv                             The phenomenology of being hungry while pregnant
2   +2.52σ  9               Natalie Cargill                 How to invent a disease
3   +2.23σ  9               Alec Thompson                   More Legal Systems Very Different From Ours 1
4   +1.82σ  9               Aaron Gertler                   Posts I Will Not Be Writing
5   +1.56σ  9               Steven K                        Prosaic License
6   +1.34σ  9               Katja Grace                     Eggs, rooms, puzzles, and talking about AI
7   +1.06σ  9               capsuletime                     Fuck Blogging
8   +1.05σ  10              Kevin Z Wu                      (box|bag) in (box|bag) in (box|bag)
9   +1.03σ  9               Austen                          Forgotten 18th Century Chinese Republics
10  +0.82σ  10              Drew Schorno                    2035
11  +0.68σ  9               Justis Mills (Writing Advisor)  Why No Wheel Bus Again?
12  +0.58σ  9               Bill Jackson                    Two critiques of Rethink Priorities' Moral Weights project
13  +0.49σ  9               Lawrence Chan                   We're actually running out of benchmarks to upper bound AI capabilities
14  +0.44σ  9               Avi                             The Smell
15  +0.43σ  9               Derek Razo                      How to Pay to Change the Law
16  +0.37σ  9               conq                            19th century poet UTTERLY DESTROYS critics (NO MERCY!)
17  +0.32σ  9               Alicorn (Writing Advisor)       Dogs Are Rude
18  +0.31σ  6               Remy                            You Know What They Say About Assuming
19  +0.29σ  7               Layla Hughes                    Every Lighthaven Writing Residency
20  +0.22σ  9               Henry Stanley                   Inkhavening

I should probably note that I filtered out anyone who hadn't published on at least two-thirds of the days, so Vishal Prasad (+1.89σ, 2 posts), Robert Mushkatblat (+1.41σ, 4 posts), A.G.G Liu (+0.7σ, 1 post), Justin Kuiper (+0.32σ, 1 post) and Georgia Ray (+0.3σ, 1 post) didn't make the cut on post count, despite having the quality.

Alexander Wales (-0.23σ, 4 posts), whose post inspired this one, is also, sadly, left out of the rankings. (Sorry.)
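The author ranking and the two-thirds filter can be sketched like this. The input layout, the function name and the `days_elapsed` parameter are assumptions for illustration, as is counting one post per day. The sketch takes a plain mean of post scores; the author scores in the table may have been rescaled across authors afterwards, which is not reproduced here.

```python
from collections import defaultdict
from fractions import Fraction


def rank_authors(posts, days_elapsed, min_fraction=Fraction(2, 3)):
    """Average post scores per author, dropping authors with too few posts.

    `posts` is an iterable of (author, title, score) tuples. Returns rows of
    (author, mean_score, n_posts, best_title), sorted best first.
    """
    by_author = defaultdict(list)
    for author, title, score in posts:
        by_author[author].append((score, title))

    rows = []
    for author, entries in by_author.items():
        # Fraction keeps the two-thirds threshold exact (no float rounding).
        if len(entries) < min_fraction * days_elapsed:
            continue
        mean = sum(score for score, _ in entries) / len(entries)
        best_title = max(entries)[1]  # title of the highest-scoring post
        rows.append((author, mean, len(entries), best_title))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

With `days_elapsed=9`, for example, the threshold works out to 6 posts, so an author with 4 posts would be dropped however high their average.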


