APIs in the Age of AI Agents

Why Every AI Builder Needs to Understand the Basics

Image created by the author using ChatGPT.

Introduction

Software systems require defined interfaces to exchange data and invoke functionality. An application programming interface (API) provides that interface. In AI systems, APIs connect language models, databases, and external tools.

Agent-based systems rely on repeated interactions with such interfaces. An agent receives a task, decomposes it, retrieves context, invokes tools, and returns a result. Each of these steps may involve one or more API calls. In this setting, APIs define how components interact and how work is executed.

If you are building agentic systems, orchestrating tools, or deploying LLM-powered applications, understanding APIs is essential: they are a core part of the architecture.

Why You Should Care

AI Agents Are the Driver

As agentic workflows mature and reach more industries, progress is no longer just about larger models. Increasingly, it is about orchestrating systems.

A typical agent loop includes task evaluation, tool selection, context retrieval, and response generation. Each tool invocation is a call to a defined interface. When the tool is remote, the interface is usually an HTTP API.

The agent tool loop concept. Inside an AI agent, an LLM runs in a loop, making decisions based on the results of its actions (tool calls) to achieve a task. On each iteration, the context is updated with the status of the task and the result of the action. Image created by the author.

At the core, an AI agent typically operates in a cycle when it receives a task:

1. Checks if the task is completed
2. Breaks it into steps
3. Calls tools
4. Retrieves context
5. Updates its reasoning
6. Returns a result
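The cycle above can be sketched in a few lines of Python. The planner, the evaluator, and the two tools below are illustrative stand-ins, not part of any real framework:

```python
# Minimal sketch of the agent loop. A real agent would call an LLM to plan
# and real APIs as tools; here both are replaced by trivial stand-ins.

def run_agent(task, tools, max_steps=5):
    context = {"task": task, "history": [], "result": None}
    for _ in range(max_steps):
        if context["result"] is not None:            # 1. task completed?
            break
        tool_name, tool_input = plan(context)        # 2. decide the next step
        output = tools[tool_name](tool_input)        # 3. call a tool (often an HTTP API)
        context["history"].append(output)            # 4./5. update context and reasoning
        if output.get("final"):
            context["result"] = output["value"]      # 6. return a result
    return context["result"]

def plan(context):
    # Trivial stand-in planner: first gather context, then summarize it.
    if not context["history"]:
        return "retrieve", context["task"]
    return "summarize", context["history"][-1]["value"]

tools = {
    "retrieve": lambda q: {"value": f"notes for: {q}", "final": False},
    "summarize": lambda text: {"value": f"summary of {text}", "final": True},
}
```

Calling `run_agent("patient 123", tools)` runs two tool invocations and returns the summary produced by the second one.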

Consider an agent that summarizes clinical notes, queries a vector database, computes a risk score, writes results to a database, and triggers a notification. If any of these components are external services, the agent must coordinate multiple API calls. The behavior of the system depends on latency, error handling, authentication, and response structure across those calls.

Understanding APIs allows a developer to measure and control latency across chained requests, handle partial failures, restrict access to sensitive data, and limit unnecessary model invocations. These concerns determine whether a system operates reliably under load.

Without API literacy, you can build prototypes. With API literacy, you can build production systems.

The Basics: What Is an API

API stands for Application Programming Interface. It is a structured contract that allows one software system (a client) to communicate with another (a server). The client sends a request in a defined format. The server returns a response in a defined format.

In HTTP-based APIs, the main elements are endpoint, method, headers, request body, and response.

1. Endpoint

An endpoint is a URL that identifies a resource or capability, for example POST /v1/summarize or GET /v1/patients/123.

Example:

POST /v1/summarize
GET /v1/patients/123
POST /v1/risk-assessment

In agent systems, endpoints often represent tools.

2. HTTP Method

The HTTP method indicates the action.

  • GET retrieves information without changing state.
  • POST sends data to be processed. Most AI inference calls use POST.
  • PUT updates existing resources.
  • DELETE removes resources.

Agents usually rely heavily on POST because they send structured inputs to models or tools.
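As a sketch of how the methods look in code, here is how Python's standard library constructs a GET and a POST request. The requests are only built, not sent, and the endpoint URLs are placeholders:

```python
import json
import urllib.request

# Placeholder base URL; no network call is made in this sketch.
base = "https://api.example.com"

# GET: retrieve information without changing state.
get_req = urllib.request.Request(f"{base}/v1/patients/123", method="GET")

# POST: send structured input to a tool or model.
payload = json.dumps({"text": "Patient presents with bleeding on probing..."}).encode()
post_req = urllib.request.Request(
    f"{base}/v1/summarize",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

Sending either request would be a call to `urllib.request.urlopen(...)`; the point here is only the shape of the request.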

3. Headers

Headers contain metadata about the request.

Common headers:

- Content-Type defines the format of the payload, usually application/json.
- Authorization carries credentials such as API keys or tokens.

Example:

Content-Type: application/json
Authorization: Bearer sk-123456

Headers are critical in regulated systems where traceability and authentication matter.

4. Request Body

This is the actual data being sent. Example for a summarization model:

{
  "text": "Patient presents with bleeding on probing...",
  "max_tokens": 200,
  "temperature": 0.2
}

The body contains both task data and configuration parameters.
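Conceptually, the body is just serialized JSON. A small Python sketch using the fields from the example above:

```python
import json

# "text" is task data; "max_tokens" and "temperature" are configuration
# parameters for the model.
body = {
    "text": "Patient presents with bleeding on probing...",
    "max_tokens": 200,     # cap on output length
    "temperature": 0.2,    # low temperature: more deterministic output
}

encoded = json.dumps(body)     # what actually travels over the wire
decoded = json.loads(encoded)  # what the server reconstructs
```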

5. Response

The response contains structured output. Well-designed APIs return predictable JSON objects so that agents can parse results without ambiguity.

{
  "summary": "Moderate periodontal risk with localized inflammation.",
  "tokens_used": 145
}


A complete request includes all of these elements. The server returns an HTTP status code and a response body. A status code of 200 indicates success. Other codes indicate client or server errors.
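A sketch of how an agent might classify status codes before parsing a body. The ranges follow standard HTTP semantics; the retry policy shown is an illustrative assumption:

```python
# Classify an HTTP status code into an action for the caller.
def classify(status: int) -> str:
    if 200 <= status < 300:
        return "success"
    if status == 429 or 500 <= status < 600:
        return "retry"          # rate limited or server error: retrying may help
    if 400 <= status < 500:
        return "client_error"   # fix the request; retrying will not help
    return "unexpected"
```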

Putting It All Together

Here is what a complete request looks like conceptually:

POST /v1/summarize HTTP/1.1
Host: api.example.com
Content-Type: application/json
Authorization: Bearer sk-123456

{
  "text": "Patient presents with bleeding on probing...",
  "max_tokens": 200,
  "temperature": 0.2
}

And the response:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "summary": "Moderate periodontal risk with localized inflammation.",
  "tokens_used": 145
}

What happens when the requested service takes longer than a few seconds to respond?

Synchronous and Asynchronous Requests

In a synchronous request, the client waits for the server to respond. This model is simple and works for short operations such as small inference calls.

In an asynchronous request, the client submits a job and continues execution. The result is retrieved later through polling, callbacks, or a message queue. This model is used for long-running tasks such as processing large documents or coordinating multiple steps.

Agent systems often mix both approaches. Short tool calls may be synchronous. Long-running processes are typically asynchronous to avoid blocking the entire system.
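The asynchronous pattern can be sketched with an in-memory job store standing in for a real task queue or jobs API. The job format and polling behavior below are assumptions for illustration:

```python
import time

# In-memory stand-in for a jobs API: submit, then poll until done.
jobs = {}

def submit_job(payload):
    job_id = f"job-{len(jobs) + 1}"
    # A real system would run the work in the background; here we simply
    # pretend the job finishes after two polls.
    jobs[job_id] = {"status": "running", "result": None, "polls": 0, "payload": payload}
    return job_id

def poll_job(job_id):
    job = jobs[job_id]
    job["polls"] += 1
    if job["polls"] >= 2:  # simulate completion on the second poll
        job["status"] = "done"
        job["result"] = f"processed: {job['payload']}"
    return job["status"], job["result"]

def wait_for(job_id, interval=0.01, timeout=1.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, result = poll_job(job_id)
        if status == "done":
            return result
        time.sleep(interval)  # back off between polls
    raise TimeoutError(job_id)
```

The client submits and continues; `wait_for` is only called when the result is actually needed.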

Communication Protocols: REST, WebSockets, and WebRTC

Most APIs use HTTP with a request-response model, often described as REST. But AI agents do more than single request-response calls. They stream tokens, maintain sessions, coordinate voice conversations, and push updates in real time. Protocol choice matters.

REST

REST stands for Representational State Transfer. It is an architectural style built on top of HTTP. REST APIs are stateless: each request contains all necessary information. In practice, REST means:

- HTTP based endpoints
- Stateless communication
- Request and response model
- Structured data exchange, usually JSON

Each request is independent. The server does not store session state unless explicitly designed to.

REST is ideal for:
- Model inference calls
- Structured tool invocation
- Retrieving records from a database
- Triggering background jobs

If your agent calls POST /v1/risk-assessment and waits for a result, REST is the right tool. REST is simple, scalable, and widely supported.

WebSockets

WebSockets enable persistent, bidirectional communication between client and server. Unlike REST, where the transaction ends once the response is received, the connection remains open. Both client and server can send messages at any time.

This is ideal for:

- Streaming LLM tokens
- Real time chat interfaces
- Progress updates for long running tasks
- Live dashboards

Instead of waiting for the full model output, tokens stream incrementally. WebSockets improve user experience by reducing perceived latency. However, they require connection management, careful scaling strategies and load balancing considerations. They are powerful but add architectural complexity.
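A minimal sketch of incremental streaming, with a Python generator standing in for messages arriving over an open connection:

```python
# Each yield approximates one message pushed over a WebSocket; a real
# deployment would send tokens over the open connection instead.
def stream_tokens(text):
    for token in text.split():
        yield token

received = []
for token in stream_tokens("Moderate periodontal risk detected"):
    received.append(token)  # the client renders tokens as they arrive

full_output = " ".join(received)
```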

WebRTC

WebRTC stands for Web Real Time Communication. It is designed for peer-to-peer, real-time audio, video, and data exchange, and it is optimized for low latency, adaptive bitrate, and media streaming. This protocol shines for:

- Voice based AI agents
- AI powered telehealth
- Real time conversational systems
- Interactive tutoring platforms

If you are building a voice assistant that processes speech and responds instantly, WebRTC is often the correct choice. Note that it is not designed for typical JSON based inference calls.

Choosing the Right Protocol

As a simple guide: many systems use more than one protocol. A system may use REST for tool invocation, WebSockets for streaming responses, and WebRTC for audio interaction.

Authentication and Authorization

APIs require a method to verify the identity of the caller and determine what actions are permitted for each caller.

For agents, this determines which agent is allowed to call which tool, which user can access which patient record, and which backend service can invoke which model.

In AI agent systems, authentication is part of architecture. Let us unpack the major methods.

1. Bearer Tokens

The Simplest Model of Trust

Bearer tokens are a common method. The client includes a token in the Authorization header. The server validates the token and grants access if it is valid. You often see this:

Authorization: Bearer sk-123456

What does Bearer mean? It literally means, whoever bears this token is allowed access. The token itself is the proof. There is no additional identity check in the request. If the token is valid, access is granted.

Think of a backstage concert pass. If you hold the pass, security lets you in. They do not ask your name. The pass is enough. But if someone steals your pass, they can enter as you. That is the tradeoff.

Bearer tokens are often popular because they are:

  • Simple
  • Stateless
  • Easy to validate
  • Easy to scale

This is why model providers and internal microservices commonly use them. This method of authentication is often chosen for:

  • Server-to-server communication
  • Internal tools
  • Model inference endpoints
  • Systems without complex user roles

Limitation

Bearer tokens do not inherently define permissions or identity; they only prove possession. If you need fine-grained control or user delegation, bearer tokens alone are not enough.
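Server-side bearer validation can be sketched as follows. The token value is a made-up placeholder, and `hmac.compare_digest` is used so the comparison does not leak timing information:

```python
import hmac

# Placeholder token; a real deployment would load this from secure config.
VALID_TOKEN = "sk-123456"

def authorize(headers):
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme != "Bearer" or not token:
        return False
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(token, VALID_TOKEN)
```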

2. API Keys in Custom Headers

A Variation of Bearer

API keys are similar. They are often sent in custom headers such as x-api-key. They are used in public or metered APIs. Example:

x-api-key: abc123

This is conceptually similar to bearer tokens. The main difference is convention. Using Authorization with Bearer follows standardized HTTP semantics and integrates better with middleware and security tooling.

API keys are often used for:

  • Public APIs
  • Rate limited services
  • Metered usage

They share the same weakness as bearer tokens: If leaked, access is compromised.

3. Basic Authentication

Username and Password Over HTTP

Basic authentication sends a username and password with each request. It is rarely used in new systems.

Basic authentication sends:

Authorization: Basic base64(username:password)

This is an older method. It is like writing your username and password on a note, encoding it, and attaching it to every request. Even though the connection is usually protected by HTTPS, this is not ideal for modern distributed systems. It survives mainly in legacy systems and internal low-risk environments, and it is rarely recommended for AI agent architectures.
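For completeness, here is how the Basic header is constructed. The credentials are placeholders, and note that base64 is an encoding, not encryption:

```python
import base64

def basic_auth_header(username, password):
    # base64-encode "username:password"; anyone can decode this.
    raw = f"{username}:{password}".encode()
    return "Basic " + base64.b64encode(raw).decode()
```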

4. OAuth2

Delegated Authorization

OAuth stands for Open Authorization. It is an authorization framework that allows a user to grant an application limited access without sharing credentials. It goes beyond authentication: it defines delegated authorization.

An authorization server issues tokens with defined scopes. These scopes specify what actions are allowed. This model is used in systems where users delegate access to applications.

Why It Exists

Imagine you want an AI app to access your Google Drive. You do not give the app your Google password. Instead, you log into Google, Google asks whether you approve the access, Google gives the app a token, and the app uses that token to access only the allowed resources.

What Makes OAuth Different

Bearer tokens prove possession. OAuth defines permission scopes and delegation. Scopes define what is allowed. For example, imagine we have

read:patients
write:notes
invoke:risk-model

In this case, we have a user who wants to calculate a patient's risk of developing a certain condition:

User → Client App → Authorization Server → Resource Server

In an OAuth setting, it would look like this:

User          Client App         Auth Server        API Server
 |                |                  |                  |
 |     Login      |                  |                  |
 |--------------->|                  |                  |
 |                |  Request token   |                  |
 |                |----------------->|                  |
 |                |   Issue token    |                  |
 |                |<-----------------|                  |
 |                |        Call API with token          |
 |                |------------------------------------>|
 |                |                  |  Validate token  |
 |                |                  |<-----------------|
 |                |                  |----------------->|
 |                |<------------------------------------|

In this way you can make sure that the right person is identified and authorized to use only the designated resources.
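A minimal sketch of scope checking after delegation. The token object and scope names follow the examples above and are purely illustrative:

```python
# Stand-in for what the authorization server issues after user approval.
token = {"sub": "user123", "scopes": {"read:patients", "invoke:risk-model"}}

def can(token, scope):
    # Grant access only if the requested scope was delegated to the client.
    return scope in token["scopes"]
```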

5. JWT

JSON Web Tokens Explained

JWT stands for JSON Web Token. Why JSON? Because the token contains structured claims encoded as JSON. Why Web? Because it is designed for web based systems and distributed services. Why Token? Because it represents a verifiable credential.

A JWT has three parts:

  • Header
  • Payload
  • Signature

The payload may include the user ID, role, expiration and permissions.

Example payload conceptually:

{
"sub": "user123",
"role": "clinician",
"exp": 1735689600
}

The signature ensures the token has not been tampered with.

Think of a digitally signed passport. It contains your information. It has a government signature. Anyone who trusts that government can verify it without calling the government every time. That is the key advantage.

This is powerful because it enables stateless authentication and horizontal scaling in microservice architectures.

The server does not need to check a database every time. It verifies the signature and trusts the claims. Therefore, it is preferred for distributed AI systems, microservices, agentic architectures and role based access control systems.

JWT is often used together with OAuth2. OAuth2 handles delegation. JWT carries the actual claims.
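To make the signature mechanics concrete, here is an HS256 sign-and-verify sketch using only the standard library. The secret and claims are placeholders, the sketch checks only the signature (not claims such as exp), and a production system should use a vetted JWT library instead:

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> bool:
    header, body, sig = token.split(".")
    expected = _b64url(
        hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    )
    # Signature check only; real verification also checks exp and other claims.
    return hmac.compare_digest(sig, expected)
```

Because verification needs only the shared secret, any service holding it can validate tokens without calling a central database.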

6. Mutual TLS

Certificate Based Trust

Mutual TLS uses certificates to authenticate both client and server. It is used in environments that require strong guarantees about the identity of both parties.

It is like two embassies verifying each other’s official seals before exchanging documents. This method is commonly used for healthcare systems, financial systems, inter organization communication and highly regulated environments. It is strong but operationally complex.

This matters for AI agents because not every agent should call every tool, not every user should access every dataset, and not every service should invoke every model.

Bottom line: authentication verifies identity, and authorization defines permissions. In agentic systems, where components operate with different levels of autonomy, this distinction becomes crucial.

How to Build a Minimal API

A simple API can be implemented with a web framework such as FastAPI. The following example defines an endpoint that computes a risk score from input data and requires a bearer token for access.

Install dependencies:

pip install fastapi uvicorn

Create app.py

The application defines input and output schemas, a function to compute a score, and a route that validates the request and returns a structured response. Running the server exposes an OpenAPI specification and an interactive interface for testing.

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel
from typing import Optional

app = FastAPI()
API_KEY = "secret-key"

class RiskRequest(BaseModel):
    age: int
    smoking: bool
    probing_depth: float

class RiskResponse(BaseModel):
    risk_level: str
    score: float

def simple_risk_model(age, smoking, probing_depth):
    score = probing_depth * 2
    if smoking:
        score += 3
    if age > 50:
        score += 1
    return score

@app.post("/v1/risk-assessment", response_model=RiskResponse)
def assess_risk(request: RiskRequest, authorization: Optional[str] = Header(None)):
    if authorization != f"Bearer {API_KEY}":
        raise HTTPException(status_code=401, detail="Unauthorized")
    score = simple_risk_model(
        request.age,
        request.smoking,
        request.probing_depth,
    )
    if score > 8:
        level = "High"
    elif score > 4:
        level = "Moderate"
    else:
        level = "Low"
    return RiskResponse(risk_level=level, score=score)

This type of endpoint can be invoked by an agent as a tool. The agent sends structured input and receives structured output.

Run the server:

uvicorn app:app --reload

Visit:

http://127.0.0.1:8000/docs

FastAPI generates interactive documentation using OpenAPI.

API endpoints can be tested with HTTP clients such as Postman or Bruno before integrating them into an agent system.

Screenshot of Postman. Image edited by the author.

Design Principles for AI Agent APIs

Explicit Schemas for deterministic outputs

Agent systems depend on predictable interfaces. Input and output schemas should be explicit. Responses should be structured so that they can be parsed without additional interpretation.

Prefer structured JSON over ambiguous text when tools must be parsed by agents.

Versioning

Versioning allows changes without breaking existing clients. Routes such as /v1 and /v2 indicate different versions of the API, so the interface can evolve safely as development continues or complexity grows.

Idempotency

Idempotency means that repeating the same request has the same effect as making it once, with no additional side effects. Many inference endpoints are designed to be idempotent.
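One common way to make state-changing calls idempotent is an idempotency key: replaying the same key returns the stored response instead of repeating the side effect. The key scheme and store below are illustrative assumptions:

```python
# In-memory stand-ins for a response cache and an audited side effect.
results = {}
side_effects = []

def handle(idempotency_key, payload):
    if idempotency_key in results:
        return results[idempotency_key]     # replay: no new side effect
    side_effects.append(payload)            # the side effect runs exactly once
    response = {"status": "created", "echo": payload}
    results[idempotency_key] = response
    return response
```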

Observability

Observability includes logging request identifiers, latency, and errors. These records are used to trace failures and measure performance.

Separation of Concerns

Separate authentication, business logic, inference, and logging layers. Clean architecture keeps agent systems maintainable as complexity grows.

Conclusion

APIs define how components in an AI system interact. In agent-based systems, most actions involve one or more API calls. The structure of those calls determines how data is exchanged, how failures are handled, and how access is controlled.

Understanding endpoints, protocols, authentication methods, and response formats allows a developer to design systems that operate predictably under load.


If you found this post interesting or relevant to your work, please consider leaving a clap or a comment. Thanks 😁


APIs in the Age of AI Agents was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
