How to Integrate an LLM in a VS Code Extension? (Beginner Friendly Guide with Code)

Integrating an LLM in a VS Code Extension

Learn how to add AI to your VS Code extension: first using the built-in vscode.lm API, then by calling external LLMs like OpenAI and Claude directly.

Why should you care?

Every developer uses VS Code. And almost every developer now uses some kind of AI tool. But here is the thing — you don’t have to wait for someone else to build the AI tool you need. You can build your own.

Maybe you want a small button that explains selected code. Maybe you want to ask questions about a file without leaving the editor. Maybe your team has a private LLM and you want to use it nicely.

A VS Code extension lets you do all of this. And it is not as hard as it looks.

In this blog I will walk you through:

  1. What a VS Code extension really is
  2. Method 1: The modern vscode.lm API (uses GitHub Copilot, no API key needed)
  3. Method 2: Calling external LLMs like OpenAI, Claude, or Ollama directly
  4. How to choose between the two
  5. Some tips I wish I knew earlier

Let’s go.

First, the basics

What is a VS Code extension?

Think of VS Code as a house. It is nice, but kind of empty. An extension is like furniture you add to the house. Some furniture makes the house prettier (themes). Some makes it more useful (linters, formatters). Some adds a new room (like a chat panel).

Under the hood, a VS Code extension is just a Node.js program. You write JavaScript or TypeScript, VS Code runs it. That’s it.

What is an LLM?

LLM means Large Language Model. These are the AI models behind ChatGPT, Claude, Gemini, and similar tools. For our purpose, you can think of an LLM as a function:

you send text → LLM thinks → you get text back
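In TypeScript terms (the language used throughout this post), you can picture that contract as one async function type. This is purely a mental model; the type name and fake implementation below are made up for illustration.

```typescript
// A mental model only: every LLM call in this post fits this shape.
type AskLLM = (prompt: string) => Promise<string>;

// A fake "model" that just echoes - a real one would actually think.
const fakeLLM: AskLLM = async (prompt) => {
  return `You asked: ${prompt}`;
};

// usage
fakeLLM('What does this function do?').then(answer => console.log(answer));
```

Both methods below are just different ways of obtaining a function with this shape.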

Two ways to use LLMs in your extension

This is the most important part of the blog. There are two main ways to add an LLM to your extension, and many tutorials online only show one of them.

Method 1 — vscode.lm API (the built-in way): VS Code itself gives you an API to talk to language models. This API uses the user's GitHub Copilot subscription. You don't need to manage API keys. You don't pay for anything. This was added in July 2024 and it is the official modern approach.

Method 2 — External API (the direct way): You call OpenAI, Anthropic, or any other LLM provider directly from your extension. You (or your user) need an API key. You have full control over which model to use, including local ones like Ollama.

Which one is better? It depends. I will cover both, and at the end I’ll help you decide.

What you need before starting

Make sure you have these installed:

  • Node.js (version 18 or above)
  • VS Code (obviously)
  • Yeoman and the VS Code generator — these help create the starter project

Run this in your terminal:

```shell
npm install -g yo generator-code
```

Step 1: Create the extension project

Open your terminal and run:

```shell
yo code
```

It will ask you some questions. Here is what I usually pick:

  • Type of extension: New Extension (TypeScript)
  • Name: mule-to-graphql
  • Identifier: mule-to-graphql
  • Description: AI helper inside VS Code
  • Bundler: webpack
  • Package manager: npm

After a minute, you will have a folder with your new extension. Open it in VS Code:

```shell
cd mule-to-graphql
code .
```

Open src/extension.ts. You will see something like this:

```typescript
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  const disposable = vscode.commands.registerCommand('mule-to-graphql.helloWorld', () => {
    vscode.window.showInformationMessage('Hello World from mule-to-graphql!');
  });
  context.subscriptions.push(disposable);
}

export function deactivate() {}
```

This is already a working extension. Press F5 to run it. A new VS Code window opens. Press Ctrl+Shift+P (or Cmd+Shift+P on Mac), type "Hello World", and hit enter. You should see the message.

Now let’s make it useful.

Method 1: Using the Built-In vscode.lm API

This is the modern, official way. No API keys. No billing setup. The user just needs GitHub Copilot installed.

How it works (in simple words)

  1. Your extension asks VS Code: “give me a chat model”
  2. VS Code asks the user: “this extension wants to use your Copilot, ok?”
  3. User clicks allow
  4. Your extension sends messages, gets a response
  5. Done

The first time each user uses your extension, they will see a consent popup. After that, it just works.

Step 2: Add a command that uses vscode.lm

First, update package.json to add a new command:

```json
"contributes": {
  "commands": [
    {
      "command": "mule-to-graphql.explainWithCopilot",
      "title": "Mule to graphql: Explain Code (using Copilot)"
    }
  ]
}
```

Now replace the code in src/extension.ts:

```typescript
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  const explainCmd = vscode.commands.registerCommand('mule-to-graphql.explainWithCopilot', async () => {
    const editor = vscode.window.activeTextEditor;
    if (!editor) {
      vscode.window.showWarningMessage('Please open a file first.');
      return;
    }
    const selectedText = editor.document.getText(editor.selection);
    if (!selectedText) {
      vscode.window.showWarningMessage('Please select some code first.');
      return;
    }

    // ask VS Code for a chat model
    const [model] = await vscode.lm.selectChatModels({
      vendor: 'copilot',
      family: 'gpt-4o'
    });
    if (!model) {
      vscode.window.showErrorMessage('No language model found. Is Copilot installed and signed in?');
      return;
    }

    // build the messages to send
    const messages = [
      vscode.LanguageModelChatMessage.User(
        'You are a coding helper. Explain code in simple, short words. Use bullet points when useful.'
      ),
      vscode.LanguageModelChatMessage.User(`Please explain this code:\n\n${selectedText}`)
    ];

    // show a progress popup while we wait
    vscode.window.withProgress({
      location: vscode.ProgressLocation.Notification,
      title: 'Asking Copilot...',
      cancellable: true
    }, async (progress, token) => {
      try {
        const response = await model.sendRequest(messages, {}, token);

        // the response comes in as small pieces (streaming),
        // so we collect them into one string
        let fullText = '';
        for await (const chunk of response.text) {
          fullText += chunk;
        }

        // open the answer in a new tab as markdown
        const doc = await vscode.workspace.openTextDocument({
          content: fullText,
          language: 'markdown'
        });
        vscode.window.showTextDocument(doc, { preview: true });
      } catch (err) {
        if (err instanceof vscode.LanguageModelError) {
          vscode.window.showErrorMessage(`LLM error: ${err.message}`);
        } else {
          vscode.window.showErrorMessage(`Something went wrong: ${(err as Error).message}`);
        }
      }
    });
  });
  context.subscriptions.push(explainCmd);
}

export function deactivate() {}
```

Let me explain what is happening here, line by line (the important bits):

vscode.lm.selectChatModels(...) This asks VS Code for a model. We asked for vendor: 'copilot' and family: 'gpt-4o'. If no matching model is found, the array will be empty.
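Since not every user has the same models, a common pattern is to request a broad set (for example, just `vendor: 'copilot'`) and then pick the first family you prefer. The helper below is a hypothetical sketch of that selection logic; only `vscode.lm.selectChatModels` itself comes from the API.

```typescript
// Hypothetical helper: given the models VS Code returned, prefer families
// in order, falling back to whatever is available.
function pickByFamily<T extends { family: string }>(
  models: T[],
  preferred: string[]
): T | undefined {
  for (const family of preferred) {
    const match = models.find(m => m.family === family);
    if (match) {
      return match;
    }
  }
  // nothing preferred matched - fall back to the first available model
  return models[0];
}

// usage (inside your command handler):
// const models = await vscode.lm.selectChatModels({ vendor: 'copilot' });
// const model = pickByFamily(models, ['gpt-4o', 'gpt-4o-mini']);
```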

vscode.LanguageModelChatMessage.User(...) This creates a message from the user's side. The first one acts like a system prompt telling the model how to behave. The second one is the actual request.

model.sendRequest(...) Sends the messages. Returns a response object where response.text is an async iterable. You read pieces one by one — this is called streaming.

Streaming loop:

```typescript
for await (const chunk of response.text) {
  fullText += chunk;
}
```

Here I just collect all chunks into one string. But if you want a nicer experience, you can show text piece by piece as it arrives. More on that later.

Step 3: Test it

Press F5. In the new window, open a file, select some code, and run the command "Mule to graphql: Explain Code (using Copilot)".

The very first time, VS Code will show a popup asking for consent. Click allow. Then you will see the progress notification, and after a few seconds, a new tab with the explanation.

That is it. You built an AI extension without writing a single line of authentication code.

A small tip: showing streamed text nicely

Instead of waiting for the full answer, you can update a document as text arrives. Here is a rough idea:

```typescript
// create an empty markdown doc first
const doc = await vscode.workspace.openTextDocument({ content: '', language: 'markdown' });
const editor = await vscode.window.showTextDocument(doc);

// then append each chunk at the end of the document as it arrives
for await (const chunk of response.text) {
  await editor.edit(editBuilder => {
    const lastLine = doc.lineCount - 1;
    const lastCol = doc.lineAt(lastLine).text.length;
    editBuilder.insert(new vscode.Position(lastLine, lastCol), chunk);
  });
}
```

It feels much faster even though the total time is the same.

Method 2: Calling an External LLM API

Now let’s look at the other way. This one is useful when:

  • You don’t want to depend on the user having Copilot
  • You want to use a specific model like Claude or a local Ollama model
  • You are building something for your company’s private LLM
  • You want to bring your own API key

Step 4: Install the SDK

For OpenAI:

```shell
npm install openai
```

Step 5: Write the helper

I like to keep API calls in their own file. Create src/llm.ts:

```typescript
import OpenAI from 'openai';

export async function explainWithAI(code: string, apiKey: string): Promise<string> {
  const client = new OpenAI({ apiKey });
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: 'You are a helpful coding assistant. Explain code in simple words. Keep it short.'
      },
      {
        role: 'user',
        content: `Explain this code:\n\n${code}`
      }
    ]
  });
  return response.choices[0].message.content ?? 'No response.';
}
```

Step 6: Add the command

Add a new command in package.json:

```json
{
  "command": "mule-to-graphql.explainWithOpenAI",
  "title": "Mule to graphql: Explain Code (using OpenAI)"
}
```

Now add the handler in src/extension.ts:

```typescript
import { explainWithAI } from './llm';

// ... inside activate():
const openAiCmd = vscode.commands.registerCommand('mule-to-graphql.explainWithOpenAI', async () => {
  const editor = vscode.window.activeTextEditor;
  if (!editor) return;
  const selectedText = editor.document.getText(editor.selection);
  if (!selectedText) {
    vscode.window.showWarningMessage('Please select some code first.');
    return;
  }

  // safer way to store API keys - uses the OS keychain
  let apiKey = await context.secrets.get('muleToGraphql.openaiKey');
  if (!apiKey) {
    // ask the user to enter their key (only the first time)
    apiKey = await vscode.window.showInputBox({
      prompt: 'Enter your OpenAI API key',
      password: true,
      ignoreFocusOut: true
    });
    if (!apiKey) {
      vscode.window.showErrorMessage('API key is required.');
      return;
    }
    // save it for next time
    await context.secrets.store('muleToGraphql.openaiKey', apiKey);
  }

  vscode.window.withProgress({
    location: vscode.ProgressLocation.Notification,
    title: 'Thinking...',
    cancellable: false
  }, async () => {
    try {
      const answer = await explainWithAI(selectedText, apiKey!);
      const doc = await vscode.workspace.openTextDocument({
        content: answer,
        language: 'markdown'
      });
      vscode.window.showTextDocument(doc, { preview: true });
    } catch (err) {
      vscode.window.showErrorMessage(`API error: ${(err as Error).message}`);
    }
  });
});
context.subscriptions.push(openAiCmd);
```
Notice a few things I did:

  • Used context.secrets.store() instead of regular settings. This saves keys in the OS keychain (macOS Keychain, Windows Credential Manager). Never hardcode secrets and don't put them in plain settings.
  • Asked for the key only once using showInputBox with password: true. Next time, we read it from storage silently.
  • Wrapped the API call in try/catch. Networks fail, keys expire — handle it.
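Because networks fail transiently, a small retry wrapper around the API call often pays off. The helper below is purely illustrative (it is not part of any SDK); it retries a failing async call a few times with a growing delay.

```typescript
// Illustrative retry wrapper - not from any SDK.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // wait a bit longer after each failure: 500ms, 1000ms, ...
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * (i + 1)));
    }
  }
  throw lastError;
}

// usage:
// const answer = await withRetry(() => explainWithAI(selectedText, apiKey));
```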

Using Claude instead of OpenAI

Install the Anthropic SDK:

```shell
npm install @anthropic-ai/sdk
```

And the helper looks almost the same:

```typescript
import Anthropic from '@anthropic-ai/sdk';

export async function explainWithClaude(code: string, apiKey: string): Promise<string> {
  const client = new Anthropic({ apiKey });
  const msg = await client.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: `Explain this code in simple words:\n\n${code}` }
    ]
  });
  // Claude's response is in a content array
  const block = msg.content[0];
  return block.type === 'text' ? block.text : 'No text response.';
}
```

Using a local model (Ollama)

If you run Ollama locally, you don’t need any SDK. Just use fetch:

```typescript
export async function explainWithOllama(code: string): Promise<string> {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3',
      prompt: `Explain this code:\n\n${code}`,
      stream: false
    })
  });

  if (!res.ok) {
    throw new Error(`Ollama returned status ${res.status}`);
  }
  // fetch types the body as unknown, so cast to the shape Ollama returns
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

Same pattern every time: send text, get text.

So which one should I use?

Here is a simple breakdown to help you choose:

Pick vscode.lm if…

  • Your users are developers who already use GitHub Copilot
  • You don’t want to deal with API keys, billing, or rate limits
  • You want the fastest path to a working AI extension
  • You want your extension to feel native to VS Code
  • You are okay with the models Copilot supports (GPT-4o, Claude, etc.)

Pick an External API if…

  • Your users do not have Copilot (or you can’t assume they do)
  • You need a specific model — for example, a fine-tuned one or a local Ollama model
  • You are building for a company with a private LLM
  • You want full control over which model runs, how much it costs, and how it behaves
  • You want to offer “bring your own key” so power users can use their own OpenAI or Claude account

My honest advice: If you are building for the general public and your users are likely to have Copilot, go with vscode.lm. It is less work and feels more native. If you are building for a specific team, an internal tool, or want model flexibility, go with the external API.

You can also do both — let users pick in settings. That is what many popular extensions do.
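If you support both, the switch can be as simple as one contributed setting plus a tiny dispatch function. The setting key `muleToGraphql.provider` and the value names below are hypothetical; adapt them to your extension.

```typescript
// Hypothetical provider names - adapt to your extension.
type Provider = 'copilot' | 'openai' | 'ollama';

// Map the raw setting value to a provider, defaulting to the
// zero-setup option when the value is missing or unrecognized.
function resolveProvider(configValue: string | undefined): Provider {
  switch (configValue) {
    case 'openai':
      return 'openai';
    case 'ollama':
      return 'ollama';
    default:
      return 'copilot'; // sensible default: no API key needed
  }
}

// usage (inside your command handler):
// const raw = vscode.workspace.getConfiguration('muleToGraphql').get<string>('provider');
// const provider = resolveProvider(raw);
// if (provider === 'copilot') { /* Method 1 */ } else { /* Method 2 */ }
```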

Things I learned the hard way

Some things nobody tells you until you trip over them:

Token limits are real. If the user selects a 2000-line file, your request will either fail or cost a lot. Add a check for length. I usually cut off around 8000 characters and warn the user.
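A guard for this can be a few lines. The 8000-character cutoff below is just my rule of thumb from the paragraph above; tune it for your model and budget.

```typescript
// Simple guard against huge selections before sending them to the model.
const MAX_CHARS = 8000; // rough rule of thumb, not a hard API limit

function truncateForPrompt(
  text: string,
  maxChars = MAX_CHARS
): { text: string; truncated: boolean } {
  if (text.length <= maxChars) {
    return { text, truncated: false };
  }
  return { text: text.slice(0, maxChars), truncated: true };
}

// usage:
// const { text, truncated } = truncateForPrompt(selectedText);
// if (truncated) vscode.window.showWarningMessage('Selection was truncated to 8000 characters.');
```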

Streaming feels much better. Waiting 10 seconds for the full answer feels slow. Showing tokens as they arrive feels instant. Both ways (vscode.lm and external APIs) support streaming; use it.

Cache when you can. If a user runs the same question twice, don’t pay for it twice. A simple Map keyed by the input text works fine for a start.
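A minimal version of that Map-based cache might look like this. It only lives in memory, so it resets when the extension host restarts, which is fine for a start; the function and parameter names are my own.

```typescript
// Minimal in-memory cache keyed by the exact input text.
const answerCache = new Map<string, string>();

async function explainCached(
  code: string,
  callLLM: (code: string) => Promise<string>
): Promise<string> {
  const hit = answerCache.get(code);
  if (hit !== undefined) {
    return hit; // no second API call, no second bill
  }
  const answer = await callLLM(code);
  answerCache.set(code, answer);
  return answer;
}

// usage:
// const answer = await explainCached(selectedText, c => explainWithAI(c, apiKey));
```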

Handle offline gracefully. The internet dies. A clear message like “Cannot reach the AI service, check your connection” is better than a red error with a stack trace.

Test your prompt. The biggest difference between an extension that feels smart and one that feels dumb is the system prompt. Spend time on it. Try different versions.

With vscode.lm, always check if the model exists. Not every user has the same models. If you ask for gpt-4o and it's not there, selectChatModels returns an empty array. Handle it.

What to build next

Once you have the basic version working, you can grow it in many directions:

  • Add a sidebar chat panel using the Webview API
  • Auto-generate doc comments as you type
  • Create unit tests from a selected function
  • Translate error messages into plain English
  • Summarize git diffs before a commit
  • Register your extension as a Chat Participant in the Copilot chat panel (this is a whole new level and uses vscode.lm under the hood)

Each one is just a different command with a different prompt. The hard part, getting the LLM call working inside an extension, is something you have already done.

Publishing (when you are ready)

When you want to share your extension:

```shell
npm install -g @vscode/vsce
vsce package
vsce publish
```

You will need a publisher account on the VS Code Marketplace. It is free and takes about ten minutes to set up.

Final thoughts

Building an extension felt scary to me at first. I thought there was some secret VS Code magic I needed to learn. There isn’t. It is just JavaScript, a config file, and one or two APIs.

And once you add an LLM to it — whether through vscode.lm or by calling an API directly — you suddenly have a personal assistant inside the tool you already use every day. That is a really nice place to be.

If you got stuck somewhere, drop a comment. Happy coding.

If this helped you, a clap on Medium means a lot. And if you built something cool with this, I would love to see it.


How to Integrate an LLM in a VS Code Extension? (Beginner Friendly Guide with Code) was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
