The loss curve said tie. The judges said otherwise. Seeking replication for an early LLM training result [R]

TL;DR - I've written two novel functions that shape the training signal for LLMs. In early tests, blind judges preferred responses from a model trained with my functions in ~59.9% of decisive pairwise comparisons, but I'm just one guy with one GPU. Hoping someone with more resources can prove me right or wrong.

The functions:

Per-token gain: Each token's loss gets scaled by how surprising it is. Confident-correct tokens coast, surprising ones get amplified, and the weights are renormalized so the average loss comes out unchanged and the total gradient budget is preserved.
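
A minimal PyTorch sketch of how I read this; the name precision_weighted_loss and the choice of raw surprisal as the weight are my assumptions, not necessarily the author's exact formula:

```python
import torch
import torch.nn.functional as F

def precision_weighted_loss(logits, targets, ignore_index=-100):
    # Per-token cross-entropy = the surprisal of each target token (nats).
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        targets.view(-1),
        ignore_index=ignore_index,
        reduction="none",
    )
    mask = (targets.view(-1) != ignore_index).float()
    # Gain = detached surprisal: confident-correct tokens coast,
    # surprising ones get amplified. (Assumed form, not the author's code.)
    gain = per_token.detach() * mask
    # Renormalize the weights to mean 1 over real tokens so the average
    # loss, and hence the total gradient budget, is unchanged.
    weights = gain * (mask.sum() / gain.sum().clamp_min(1e-8))
    return (weights * per_token).sum() / mask.sum().clamp_min(1.0)
```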

Per-layer divergence scaling: Each transformer block's gradients get scaled by how much that block actually changed the representation during the forward pass. Actively-revising layers get amplified, settled layers get attenuated, again normalized so the overall scale is preserved.
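
Again a hedged sketch: DivergenceScaledBlock and rescale_block_grads are hypothetical names, and measuring divergence as the relative norm of the residual update is my guess at "how much the block changed the representation":

```python
import torch
import torch.nn as nn

class DivergenceScaledBlock(nn.Module):
    """Wraps a transformer block and records how far it moved the
    residual stream on the last forward pass."""
    def __init__(self, block):
        super().__init__()
        self.block = block
        self.divergence = torch.tensor(1.0)

    def forward(self, x, *args, **kwargs):
        out = self.block(x, *args, **kwargs)
        y = out[0] if isinstance(out, tuple) else out
        # Relative change this block made to its input representation.
        self.divergence = ((y - x).norm() / x.norm().clamp_min(1e-8)).detach()
        return out

def rescale_block_grads(blocks):
    """Call after loss.backward(): scale each block's parameter gradients
    by its divergence, normalized to mean 1 across blocks so the overall
    gradient scale is preserved."""
    divs = torch.stack([b.divergence for b in blocks])
    scales = divs / divs.mean().clamp_min(1e-8)
    for block, scale in zip(blocks, scales):
        for p in block.parameters():
            if p.grad is not None:
                p.grad.mul_(scale)
```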

The test:

Two 1.2B-parameter language models trained on identical data in identical order, same seed, 30,000 steps, 3.9B tokens. One uses standard cross-entropy. The other uses precision-weighted per-token gain plus per-layer divergence-scaled gradients. The smoothed validation losses of the two were statistically identical.
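
For concreteness, a hypothetical training step wiring the two sketches above into the gain-trained run; model.blocks (a list of wrapped blocks) and the model's call signature are assumptions, and the baseline would just use plain cross-entropy instead:

```python
# One optimizer step for the gain-trained model, using the sketches above.
logits = model(inputs)                    # wrapped blocks record divergence
loss = precision_weighted_loss(logits, targets)
loss.backward()
rescale_block_grads(model.blocks)         # per-layer divergence scaling
optimizer.step()
optimizer.zero_grad()
```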

The results:

42 blind judges made 1,181 pairwise judgments: 29 humans (myself plus 28 volunteers, some recruited via r/SampleSize posts, some people I know in person) and 13 foundation-model judges spanning eleven vendors. The gain-trained model was preferred in 59.9% of 784 decisive comparisons; two-sided binomial p = 2.80e-8 (calculated under per-judgment independence; per-judge sensitivity below). Ties were 33.6% of total judgments and were excluded from the binomial test.
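
For reference, the reported p-value can be reproduced with an exact binomial test, assuming 59.9% of 784 corresponds to 470 gain wins (my inference from the rounded percentage):

```python
from scipy.stats import binomtest

# 470 / 784 ≈ 59.9% decisive comparisons favoring the gain-trained model.
result = binomtest(470, n=784, p=0.5, alternative="two-sided")
print(f"{result.pvalue:.2e}")  # ≈ 2.8e-8, matching the reported value
```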

Humans and foundation models agreed: 60.5% vs. 59.0% decisive gain preference, within 1.5 points of each other. They also agreed on which prompts favored gain: 26 of 32 questions had the same human-majority and FM-majority direction (81.2%), and per-question gain rates correlated at Pearson r = 0.78 between the two judge types. The direction survives every sensitivity filter I ran (excluding speed-clickers, tie-biased judges, partial completions, the author, and all of the above simultaneously). The strictest filter leaves 26 judges and 832 judgments at 62.9% decisive gain preference.

Limitations:

Single seed at 1.2B, no multi-seed replication. 16.4% of Chinchilla-optimal training. Token gain and layer gain were not ablated separately at 1.2B. A/B prompts are short-form because both models are too undertrained for coherent long-form output. No preregistration.

The repo has the PDF, the method code (two short files), the eval webapp, the per-judge JSON, and the full sensitivity analysis: Precision Weighted Training GitHub Repo

I'm an independent part-time researcher and need a cs.LG endorser to put this on arXiv. If anyone here has prior cs.LG submissions, reads the paper, and feels it meets the bar, an endorsement would be appreciated. An honest pass is also fine.

submitted by /u/ScreamingAmish