The Old Playbook Is Gone. Here Is What Actually Replaced It.

I spent the first decade of my career in banking watching the same pattern repeat itself. A new business problem surfaces, the data team gets involved, months pass, and eventually a model gets deployed that does exactly one thing reasonably well. Then the requirements change, and the cycle starts over.
In forex operations and international payments, this was especially visible. You had one model for anomaly detection on SWIFT transactions, a separate one for currency risk scoring, another for flagging correspondent banking failures. Each of them was its own engineering project. Maintaining them felt like running a fleet of aging vehicles, each with different mechanics, different fuel requirements, different failure modes.
Then something changed around 2022 and 2023. Not gradually. Quite abruptly.
If you work in AI or data, you have obviously heard about large language models by now. But I want to write about this shift from a specific vantage point, that of someone who has actually lived through both worlds, built systems on both sides, and watched how the ground moved beneath the feet of engineering teams in financial services. Because the hype around LLMs and the practical reality of that shift are two very different conversations, and most articles only tell one of them.
What Traditional ML Actually Looked Like in Practice
The Narrow Task Trap
Let me give you a concrete picture. In trade finance and invoice finance, one common requirement is extracting structured data from uploaded documents: invoice numbers, buyer and seller names, payment terms, currency amounts. Before LLMs, a team would typically build a pipeline involving OCR, named entity recognition, regex-based extraction, and a rule layer on top to handle edge cases.
That pipeline would work well enough for a narrow document format. Change the template, add a new geography with different invoice conventions, and you were back to retraining and retuning. The model did not understand the document. It matched patterns in the data it had seen.
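To make that brittleness concrete, here is a minimal sketch of the kind of regex rule such pipelines relied on. The pattern and the invoice strings are invented for illustration; real pipelines layered OCR and NER on top of dozens of rules like this one.

```python
import re

# A rule written for one known template: "Invoice No: INV-1234"
INVOICE_NO_PATTERN = re.compile(r"Invoice No:\s*(INV-\d+)")

def extract_invoice_number(text: str):
    """Return the invoice number if the text matches the expected template."""
    match = INVOICE_NO_PATTERN.search(text)
    return match.group(1) if match else None

# Works on the format the rule was written for...
print(extract_invoice_number("Invoice No: INV-1234"))  # INV-1234
# ...and silently fails when a new geography uses a different convention.
print(extract_invoice_number("Rechnungsnummer: 2024/0784"))  # None
```

Every new template means another rule, and every missed rule is a silent extraction failure rather than an error.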
In commercial banking, credit underwriting teams wanted models that could flag unusual clauses in loan agreements. Building that required annotated training data, a classification model, and significant legal domain expertise baked into the labeling process. It took time to build, time to validate, and was fragile when the clause types evolved, which they always do.
The True Cost Nobody Talks About
The direct cost of building an ML model is often discussed. The indirect costs usually are not.
Every narrow model needs labeled data. Labeling costs money and time. Every model needs a feature engineering process, which requires someone who understands both the domain and the statistical requirements. Every model needs to be monitored after deployment because data distributions shift, business rules change, regulatory requirements evolve.
In a large bank running dozens of such models across different product lines, the maintenance burden alone becomes a full-time commitment for an entire team. And yet, each model remained brittle. Useful within its lane, but only within its lane.
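As an illustration of that monitoring burden, here is a sketch of a population stability index (PSI) check, one common way teams detect the data distribution shift mentioned above. The bucket proportions are toy numbers for the example; the 0.2 threshold is a widely used rule of thumb, not a universal standard.

```python
import math

def psi(expected, actual):
    """Population stability index between two bucketed distributions.
    Inputs are lists of bucket proportions that each sum to 1."""
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0
    )

# Transaction-amount distribution at training time vs today (toy numbers)
baseline = [0.25, 0.35, 0.25, 0.15]
current = [0.10, 0.30, 0.30, 0.30]

score = psi(baseline, current)
# PSI above ~0.2 is commonly treated as drift significant enough to retrain
print(f"PSI = {score:.3f}, drift detected: {score > 0.2}")
```

Multiply a check like this across dozens of models and features, and the monitoring workload alone justifies a dedicated team.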
What Changed When LLMs Arrived
A Different Kind of Capability
Here is what struck me when I first properly experimented with GPT-3 and later GPT-4 in a banking context. These models did not just do one thing. You could give them an invoice in natural language and ask structured questions about it. You could give them a loan agreement and ask whether a specific clause was unusual relative to market standards. You could feed them a SWIFT message and ask them to explain it in plain English for a compliance officer.
None of this required me to label training data. None of it required me to engineer features. I was guiding the model through prompts and getting usable outputs within hours rather than months.
That is not hype. That is a genuine workflow change.
The Pretrained Model Advantage
The reason LLMs can do this is that they have already been trained on vast corpora of text that includes legal documents, financial reports, technical manuals, and much else. The knowledge is already inside the model. What you are doing when you write a prompt or fine-tune is directing that existing capability toward your specific use case.
In traditional ML, you started from scratch for every problem. With LLMs, you start from a foundation that already understands language, context, and a wide range of domain knowledge. You are customizing, not constructing.
The time-to-useful-output shrinks dramatically. A task that previously required a three-month ML project can often be prototyped in a few days using prompt engineering, and deployed with confidence after a few weeks of testing.
A Practical Example from Financial Services
Let me show you something concrete. Below is a simplified Python example of how you might use an LLM via the OpenAI API to extract structured information from an invoice description. This mirrors the kind of task that used to require a dedicated NLP pipeline.
import json
import os

import openai

# In production, load the key from an environment variable or a secrets manager
client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "your-api-key-here"))

invoice_text = """
Invoice Number: INV-2024-00784
Date: 15 March 2024
Seller: Meridian Exports Ltd, Mumbai, India
Buyer: GlobalTech GmbH, Frankfurt, Germany
Description: Supply of industrial components - 500 units @ USD 120 each
Total Amount: USD 60,000
Payment Terms: 60 days from date of invoice
Currency: USD
"""

prompt = f"""
You are a trade finance document analyst. Extract the following fields from the invoice text below and return them as a JSON object:
- invoice_number
- invoice_date
- seller_name
- seller_country
- buyer_name
- buyer_country
- total_amount
- currency
- payment_terms_days
Return only the JSON object. No explanation.

Invoice Text:
{invoice_text}
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output for extraction tasks
)

raw_output = response.choices[0].message.content
extracted_data = json.loads(raw_output)
print(json.dumps(extracted_data, indent=2))
Output:
{
  "invoice_number": "INV-2024-00784",
  "invoice_date": "15 March 2024",
  "seller_name": "Meridian Exports Ltd",
  "seller_country": "India",
  "buyer_name": "GlobalTech GmbH",
  "buyer_country": "Germany",
  "total_amount": 60000,
  "currency": "USD",
  "payment_terms_days": 60
}

What you are seeing here is the model reading an unstructured text block, understanding its meaning, and returning a cleanly structured JSON output with zero custom training. In a real trade finance system, you would feed this into a downstream validation layer, check the payment terms against credit limits, flag the cross-border transaction for compliance review, and so on.
A traditional ML approach to the same problem would have required a named entity recognition model trained on annotated invoice data, a separate date normalization step, a regex layer for currency and amount extraction, and significant engineering to handle variations in invoice format. The LLM approach handles format variation out of the box because it understands language, not just patterns.
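One practical caveat before moving on: models do not always return the bare JSON you asked for. They sometimes wrap it in markdown fences or add a sentence of preamble. A defensive parser like the sketch below (my own helper, not part of the OpenAI SDK) makes the `json.loads` step far less fragile in production.

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Extract the first JSON object from an LLM response, tolerating
    markdown fences and surrounding commentary."""
    # Strip ```json ... ``` fences if present
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    # Fall back to grabbing the outermost {...} span
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in model output")
    return json.loads(cleaned[start : end + 1])

# Handles the ideal case and the fenced case alike
print(parse_llm_json('{"currency": "USD"}'))
print(parse_llm_json('```json\n{"currency": "USD"}\n```'))
```

In the extraction pipeline above, you would call this in place of the bare `json.loads(raw_output)`.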
Now Let Us Add a Layer: Risk Flagging with Context
Here is a second example. This time, we are checking whether an invoice from a cross-border transaction should be flagged for additional due diligence based on common trade-based money laundering indicators.
def check_trade_risk(invoice_data: dict) -> dict:
    """Ask the model to flag trade-based money laundering indicators."""
    risk_prompt = f"""
You are a trade compliance analyst at an international bank.
Review the following invoice details and assess whether any trade-based money laundering
or fraud risk indicators are present. Consider factors like:
- Unusual payment terms for the transaction size
- Mismatched buyer/seller country risk levels
- Round number amounts that may indicate price manipulation
- Missing or incomplete counterparty information

Invoice Details:
{json.dumps(invoice_data, indent=2)}

Return a JSON object with:
- risk_level: one of LOW, MEDIUM, HIGH
- flags: list of specific concerns (empty list if none)
- recommendation: a one-line action recommendation
Return only the JSON object.
"""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": risk_prompt}],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

risk_result = check_trade_risk(extracted_data)
print(json.dumps(risk_result, indent=2))
Output:
{
  "risk_level": "MEDIUM",
  "flags": [
    "Round number transaction amount (USD 60,000) may warrant additional verification",
    "60-day payment terms on a USD 60,000 cross-border transaction is within normal range but should be validated against buyer credit profile",
    "India to Germany corridor: standard risk, no elevated country risk flags"
  ],
  "recommendation": "Proceed with standard KYC verification; validate buyer credit history and confirm goods description matches HS code classification."
}

This is the kind of contextual, reasoning-based output that traditional ML models cannot produce. A classification model might have given you a binary fraud/not-fraud label. The LLM gives you a risk level, the specific reasons behind it, and a recommended action. That is the difference between a pattern matcher and a reasoning system.
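Because this output feeds a compliance workflow, it is worth validating its shape before acting on it. A lightweight check like the sketch below (illustrative, not from any particular library) catches malformed model responses before they reach downstream systems.

```python
VALID_RISK_LEVELS = {"LOW", "MEDIUM", "HIGH"}

def validate_risk_result(result: dict) -> list:
    """Return a list of schema problems; an empty list means the result is usable."""
    problems = []
    if result.get("risk_level") not in VALID_RISK_LEVELS:
        problems.append(f"unexpected risk_level: {result.get('risk_level')!r}")
    if not isinstance(result.get("flags"), list):
        problems.append("flags must be a list")
    if not isinstance(result.get("recommendation"), str):
        problems.append("recommendation must be a string")
    return problems

good = {"risk_level": "MEDIUM", "flags": [], "recommendation": "Proceed with standard KYC."}
bad = {"risk_level": "SEVERE", "flags": "none"}

print(validate_risk_result(good))  # []
print(validate_risk_result(bad))
```

A failed validation should route the case to a human reviewer rather than silently retrying, especially in a regulated workflow.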
Does This Mean Traditional ML Is Dead?
No. And anyone who says otherwise is simplifying the story to make it sharper than it is.
Traditional ML is still the right tool for a specific class of problems. If you are building a real-time fraud scoring engine that needs to evaluate thousands of transactions per second with sub-millisecond latency, you are not going to route that through an LLM API. A gradient boosted tree or a neural network with a fixed feature set will outperform on speed and cost at that scale.
If you are doing time series forecasting on FX rates using proprietary trading data, structured ML methods remain competitive. If you need a model that is fully explainable to a regulator, the interpretability of traditional models is still a genuine advantage.
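To illustrate why latency favours traditional models in that real-time scoring case, consider a logistic scoring function over a fixed feature vector. The weights below are invented for the example; a real engine would load coefficients learned from historical fraud data. The point is that scoring is a handful of multiplications in-process, with no network round trip to an LLM API.

```python
import math

# Hypothetical learned coefficients for three transaction features
WEIGHTS = {"amount_zscore": 1.2, "new_beneficiary": 0.8, "night_hours": 0.5}
BIAS = -3.0

def fraud_score(features: dict) -> float:
    """Logistic score in [0, 1]; microseconds per call, no API latency."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

score = fraud_score({"amount_zscore": 3.5, "new_beneficiary": 1.0, "night_hours": 1.0})
print(f"{score:.3f}")
```

Even a gradient boosted tree, which replaces the weighted sum with a few hundred comparisons, stays comfortably inside a sub-millisecond budget at thousands of transactions per second.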
The shift is not about replacement. It is about the default starting point. When a new problem lands on your team’s desk, the first question used to be: what data do I need to train a model? The question now is: can a pretrained LLM solve this with the right prompting or fine-tuning? Often the answer is yes, and that changes the economics of the whole project significantly.
What This Means for People in Financial Services
If you are working in banking, payments, or any adjacent domain and you have not yet built something with an LLM, the gap between where you are and where the industry is heading is growing quietly every quarter.
The practical skills that matter right now are not exotic. You need to understand prompt engineering well enough to get reliable, structured outputs from a model. You need to know when to fine-tune versus when to prompt. You need to understand retrieval-augmented generation, which allows you to connect LLMs to your proprietary data without exposing it to external training processes. And you need to think clearly about evaluation, because LLMs can sound confident while being wrong, and building robust evaluation frameworks is one of the less glamorous but most important parts of deploying them responsibly.
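To ground the retrieval-augmented generation idea, here is a stripped-down sketch: embed your proprietary snippets and the query, retrieve the closest snippet, and build only that snippet into the prompt. The toy bag-of-words `embed` function is a stand-in for a real embedding model; only the retrieval logic is the point, and the policy texts are invented.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9+]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Proprietary policy snippets stay in-house; only the retrieved one enters the prompt
documents = [
    "Payments above USD 50,000 to new beneficiaries require dual approval.",
    "FX trades settle on a T+2 basis for major currency pairs.",
]

def retrieve(query: str) -> str:
    q = embed(query)
    return max(documents, key=lambda d: cosine(q, embed(d)))

context = retrieve("dual approval requirements for payments to new beneficiaries")
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
print(context)
```

The pattern matters in banking precisely because the documents never leave your infrastructure: the model sees only the retrieved snippet at inference time, not your whole corpus.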
None of this requires a PhD. It requires curiosity, hands-on practice, and a willingness to rethink some assumptions about how AI projects get built.
Final Thoughts
The shift from traditional ML to LLMs is not a technology story. It is an engineering economics story.
Traditional ML built powerful tools, but the cost of entry was high and the returns were narrow. You needed the right data, the right expertise, the right time horizon, and you got something that did one job well. LLMs lowered the cost of entry dramatically and expanded the scope of what is possible from a single model. That changes what a small team can build. It changes what a bank or a fintech can prototype in a quarter. It changes the baseline expectation of what AI can do for a business problem.
I have seen this shift play out across forex operations, payments reconciliation, compliance reporting, and document processing. The teams that are moving fastest are not the ones with the biggest budgets. They are the ones who understood early that the workflow had changed and started building with the new tools without waiting for a perfect strategy.
The old playbook served us well. But you would not navigate an international payment with a fax machine either.
What has your experience been with transitioning from traditional ML workflows to LLM-based approaches? If you have worked in financial services or any regulated industry, I would be particularly interested in how you have handled explainability and compliance requirements alongside LLM adoption. Share your experiences in the comments below.
If you found value in this, I would really appreciate your support. A like helps this reach more people, a comment starts meaningful conversations, and a share can help someone else who is trying to understand AI but does not know where to start.
If you enjoy content that simplifies complex concepts into practical insights, consider following me. I am consistently working on breaking down data science, AI, and real-world systems in a way that is useful, relatable, and easy to apply.
Why the Industry Shifted from Traditional ML to LLMs: A Practitioner’s View from Banking was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.