Back again with another training problem I keep running into while building dataset slices for smaller LLMs

Hey, I’m back with another one from the pile of model behaviors I’ve been trying to isolate and turn into trainable dataset slices.

This time the problem is reliable JSON extraction from financial-style documents.

I keep seeing the same pattern:

You can prompt a smaller/open model hard enough that it looks good in a demo.
It gives you JSON.
It extracts the right fields.
You think you’re close.
Then a slightly messier document comes in and the structure drifts: keys disappear, shapes change, and failure cases come back as free text instead of JSON.

That’s the part that keeps making me think this is not just a prompt problem.

It feels more like a training problem.

A lot of what I’m building right now is around this idea that model quality should be broken into very narrow behaviors and trained directly, instead of hoping a big prompt can hold everything together.

For this one, the behavior is basically:

Can the model stay schema-first, even when the input gets messy?

Not just:
“can it produce JSON once?”

But:

  • can it keep the same structure every time
  • can it make success and failure outputs equally predictable
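The second bullet is the one that's easiest to make concrete with a tiny sketch: one envelope shape for both success and failure, so downstream code never has to branch on "what kind of thing did I get back?" Everything here is an illustrative assumption on my part, not from a real pipeline: the `ok`/`data`/`error` keys and the invoice field are just placeholder names.

```python
import json

def extract_invoice_total(raw_text: str) -> dict:
    """Return the same top-level shape whether extraction works or not."""
    try:
        payload = json.loads(raw_text)
        total = payload["invoice"]["total"]
        return {"ok": True, "data": {"total": total}, "error": None}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"ok": False, "data": None, "error": str(exc)}

good = extract_invoice_total('{"invoice": {"total": 129.95}}')
bad = extract_invoice_total("not json at all")

# Both outcomes carry exactly the same keys, so callers can rely on the shape.
assert set(good) == set(bad) == {"ok", "data", "error"}
```

The point of the sketch is the invariant in the last assert: a consumer can always read `ok` first and then either `data` or `error`, which is the "equally predictable" property I'd want the model itself to internalize.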

One of the row patterns I’ve been looking at has this kind of training signal built into it:

{
  "sample_id": "lane_16_code_json_spec_mode_en_00000001",
  "assistant_response": "Design notes: - Storage: a local JSON file with explicit load and save steps. - Bad: vague return values. Good: consistent shapes for success and failure."
}
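For a slice built from rows like this, a minimal sanity check might look like the following. Only the two field names come from the example row above; the helper itself and its return convention are hypothetical:

```python
import json

# Field names taken from the example row; the validator is an assumed design.
REQUIRED_KEYS = {"sample_id", "assistant_response"}

def validate_row(line: str) -> tuple[bool, str]:
    """Check one JSONL line for parseability and required fields."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return False, f"invalid json: {exc}"
    if not isinstance(row, dict):
        return False, "row is not a JSON object"
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        return False, f"missing keys: {sorted(missing)}"
    return True, "ok"
```

Note the check practices what the row preaches: it returns the same `(bool, str)` shape for every outcome instead of raising on some failures and returning on others.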

What I like about this kind of row is that it does not just show the model a format.

It teaches the rule:

  • vague output is bad
  • stable structured output is good

That feels especially relevant for stuff like:

  • financial statement extraction
  • invoice parsing

So this is one of the slices I’m working on right now while building out behavior-specific training data.

Curious how other people here think about this.

submitted by /u/JayPatel24_