The decline in LLM reasoning and catastrophic forgetting might share the same root cause.

When we look at LLMs, we can see them as structures that generate and sustain a consistent reasoning path during inference, based on the specific premises, rules, and context they are given. During the LoRA-based continual learning phase, they act as knowledge structures that constantly reorganize the dependencies between old and new information as premises are updated.

Taking this perspective, I began to suspect that the degradation of reasoning performance and the problem of catastrophic forgetting might actually be two sides of the same coin, and that solving one might lead to solving the other.

The core of the issue is that preserving any structure requires continuously satisfying the specific conditions that maintain it.
I formalized this as a minimal model of structural persistence and then tested it in two settings.

LLM reasoning degradation
This is an experiment showing that as contradictory information accumulates within a conversation, it becomes increasingly difficult for an LLM to maintain logical reasoning.

When those contradictions were organized externally, sorted into what was true before versus what is true now, performance became much more stable than when they were left unorganized.

In other words, the takeaway is that the breakdown may not be caused by the length of the text itself, but rather by the accumulation of unresolved contradictions.
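To make the "organized externally" idea concrete, here is a minimal sketch of how such a ledger might work. The post does not describe its actual implementation, so everything here (the `PremiseLedger` class and its method names) is a hypothetical illustration: each premise keeps one currently-true statement, contradicted statements are demoted to a superseded list, and the resolved summary is what gets fed back into the model's context.

```python
# Hypothetical sketch, not the author's actual setup: keep a per-premise
# ledger so contradictions are resolved outside the conversation instead
# of accumulating unresolved inside it.

class PremiseLedger:
    def __init__(self):
        self.current = {}     # premise key -> currently-true statement
        self.superseded = {}  # premise key -> statements that used to be true

    def record(self, key, statement):
        """Register a statement; if it contradicts the current one, demote the old value."""
        old = self.current.get(key)
        if old is not None and old != statement:
            self.superseded.setdefault(key, []).append(old)
        self.current[key] = statement

    def render(self):
        """Produce a resolved summary suitable for prepending to the model's context."""
        lines = ["Currently true:"]
        lines += [f"- {s}" for s in self.current.values()]
        if self.superseded:
            lines.append("Previously true (now superseded):")
            for old_statements in self.superseded.values():
                lines += [f"- {s}" for s in old_statements]
        return "\n".join(lines)


ledger = PremiseLedger()
ledger.record("meeting_room", "The meeting is in room A.")
ledger.record("meeting_room", "The meeting has moved to room B.")
print(ledger.render())
```

The point of the sketch is only that the sorting happens outside the model: the raw transcript can contain both statements, but the context the model reasons over contains exactly one current value per premise.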

LLM catastrophic forgetting
When an LLM is taught new things, it tends to overwrite old knowledge rather than gradually accumulating the new information alongside it.

The model's coherence breaks down most severely during updates in which a change to one premise forces all knowledge dependent on that premise to be revised.

In my experiments, having the model relearn related knowledge all at once improved the results slightly. However, it remains extremely difficult to update the model while keeping previous knowledge fully intact.
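The "relearn related knowledge all at once" step could be sketched as a batch-construction rule: when a premise changes, gather every stored example that depends on that premise into the same update, instead of fine-tuning on the new fact alone. The function and the dependency index below are hypothetical illustrations, not the post's actual experimental code.

```python
# Hypothetical sketch: collect the new fact plus every stored example that
# needs revisiting because it depends on a changed premise. The actual
# experiments in the post are not specified at this level of detail.

def build_update_batch(new_examples, knowledge_base, depends_on):
    """Return new examples plus stored examples tied to the premises they touch."""
    touched = {ex["premise"] for ex in new_examples}
    related = [
        ex for ex in knowledge_base
        if ex["premise"] in touched or touched & set(depends_on.get(ex["premise"], []))
    ]
    # Deduplicate while preserving order: new facts first, then dependents.
    batch, seen = [], set()
    for ex in new_examples + related:
        key = (ex["premise"], ex["text"])
        if key not in seen:
            seen.add(key)
            batch.append(ex)
    return batch

knowledge_base = [
    {"premise": "hq_city", "text": "The HQ is in Osaka."},
    {"premise": "hq_timezone", "text": "The HQ runs on Osaka local time."},
    {"premise": "mascot", "text": "The mascot is a red panda."},
]
depends_on = {"hq_timezone": ["hq_city"]}  # timezone knowledge depends on the city premise
new_examples = [{"premise": "hq_city", "text": "The HQ has moved to Sapporo."}]

batch = build_update_batch(new_examples, knowledge_base, depends_on)
# The batch contains the new HQ fact plus both HQ-dependent examples,
# but not the unrelated mascot fact.
```

This only identifies what must be revised together; the hard part the post describes, actually training on the revised set without disturbing everything else, is exactly what remains unsolved.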

My view is that these problems may be substantially reduced not just by better training tricks, but by changing the architecture itself.

---
Sorry if the English is a little awkward—this was originally written in Japanese.

submitted by /u/IndividualBluebird80