Governing AI Agents in Entra ID: Clearing the Legacy Estate

You cannot govern what you cannot classify. And in most enterprise tenants, the largest population of agents was never governed to begin with.

The first post of this series established the foundation: why a complete picture of the agent estate in Entra ID requires querying four distinct identity surfaces, and why the Microsoft Graph API is the right tool for building that observability layer.

This post addresses what you find when you look at that estate in a large enterprise that has been running Copilot Studio or Power Virtual Agents for any meaningful length of time: a significant backlog of agents that predate Microsoft’s blueprint-governed identity model and therefore sit outside the formal governance structure that model provides.

This is not a failure of governance. It is a direct consequence of Microsoft’s own schema evolution. Blueprints were introduced after many organizations were already running production agents. Legacy agents cannot be retroactively assigned a blueprint. They exist as tag-based application registrations — queryable through Surface 4 of the collection strategy (the four identity surfaces are a framework introduced in the first post of this series; Surface 4 covers the full historical estate of tag-based app registrations, Surface 2 covers formally provisioned agent identities).

The question is not how to wish them away. The question is how to work down the debt systematically without disrupting production.

The Nature of the Problem

In a mature enterprise tenant, the tag-based application registration population — agents tagged with AIAgentBuilder, AgentCreatedBy:, power-virtual-agents, and related markers — will typically dwarf the formally typed agent identity population. These registrations are not uniform. Within that legacy population there will be:

  • Active production agents that real users and workflows depend on daily
  • Agents created during pilots or proofs of concept that were never decommissioned
  • Agents from Frontier experiments — Microsoft’s early-access program for tenants piloting pre-release agentic capabilities — that were never published to the Agent 365 registry
  • Duplicates created when agents were rebuilt or republished under a new name
  • Agents whose creating team no longer exists in the organization

Treating all of them the same way — either ignoring them or attempting mass remediation — is the wrong approach. The right approach starts with classification.

Step 1: Classify Before Acting

The data collected through Surface 4 already contains the signals needed to segment the legacy population into actionable buckets. Two fields do most of the work:

  • lastSignInDateTime — null or >90 days indicates dormancy
  • hasOwner — False indicates no accountability anchor recorded in Entra

Combining these produces four segments:

  • Active + owned — lowest immediate risk; periodic review only
  • Active + unowned — moderate risk; ownership assignment is the priority
  • Dormant + owned — schedule a decommissioning conversation with the owner
  • Dormant + unowned — highest cleanup value, lowest disruption risk; act here first
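
A minimal sketch of the two-field segmentation above, assuming each collected record exposes lastSignInDateTime as a timezone-aware datetime (or None) and hasOwner as a boolean. The function name and label strings are illustrative, not part of the collection strategy, and a null timestamp is simply counted as dormant at this stage.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DORMANCY_THRESHOLD = timedelta(days=90)  # default from the text; tune per exception register


def classify_segment(last_sign_in: Optional[datetime], has_owner: bool,
                     now: datetime) -> str:
    """Return one of the four segment labels for a legacy registration."""
    # A null timestamp, or one older than the threshold, counts as dormant here.
    dormant = last_sign_in is None or (now - last_sign_in) > DORMANCY_THRESHOLD
    activity = "dormant" if dormant else "active"
    ownership = "owned" if has_owner else "unowned"
    return f"{activity}+{ownership}"


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
print(classify_segment(now - timedelta(days=10), True, now))  # active+owned
print(classify_segment(None, False, now))                     # dormant+unowned
```

Passing the current time in as a parameter keeps the classification reproducible when the same snapshot is re-evaluated later.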

A fifth segment sits outside the active/dormant axis entirely: never activated — agents with no lastSignInDateTime recorded at all. These are not dormant; they were provisioned but never exercised. A dormant agent was once active and has a traceable owner history. A never-activated agent has neither. You cannot confirm whether it was intentionally deployed, whether its permissions are correctly scoped, or whether the provisioning team is still active. Treat never-activated agents as a distinct remediation category: verify intent, confirm permission scoping, and either exercise or decommission within 60 days.

On the 90-day threshold: it is a reasonable default but not universally appropriate. Financial close agents, annual compliance workflows, and seasonal HR processes legitimately operate on longer cadences. Before applying the filter, check whether permission scopes or display name conventions suggest a periodic workflow — AuditLog.Read.All or FinancialData.Read combined with an ownership record are signals that a longer threshold is appropriate. Maintain a formal exception register for agents approved for extended thresholds, reviewed annually.

A less obvious failure mode: agents invoked by other agents or by scheduled platform jobs where sign-in telemetry lands on the platform service principal rather than on the agent registration itself. An agent that appears to have no sign-in activity for 90 days may in fact be active — its activity is simply invisible at the registration level. Step 4 covers this in more detail, but the practical implication for the 90-day filter is that lastSignInDateTime on the agent's app registration is a necessary signal, not a sufficient one. Before flagging an agent as dormant, cross-reference against the platform SP's sign-in activity for the same period. An agent whose registration shows no activity but whose platform SP shows consistent activity is active, not dormant, and should not enter the decommissioning pipeline.
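
The cross-reference can be sketched as a dormancy check that looks at both timestamps. This assumes the pipeline already knows which platform service principal fronts a given agent and has its last sign-in time; how that mapping is obtained is outside this sketch, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_dormant(agent_last_sign_in: Optional[datetime],
               platform_sp_last_sign_in: Optional[datetime],
               now: datetime, threshold_days: int = 90) -> bool:
    """Dormant only if neither the agent nor its platform SP shows recent activity."""
    cutoff = now - timedelta(days=threshold_days)

    def recent(ts: Optional[datetime]) -> bool:
        return ts is not None and ts >= cutoff

    return not (recent(agent_last_sign_in) or recent(platform_sp_last_sign_in))


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
# No activity on the registration, but the platform SP signed in last week.
print(is_dormant(None, now - timedelta(days=7), now))  # False
```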

Step 2: Decommission Dormant Unowned Agents First

The dormant and unowned segment is the easiest win — no active users, no owner to consult, and the lowest risk of disrupting a production workflow.

Stage 1 — Qualify the candidate. Confirm all three conditions:

  • lastSignInDateTime is null or older than 90 days
  • hasOwner = False in Entra
  • createdDateTime is older than 180 days

If any condition fails, the agent moves to a different path: agents with recent sign-ins and no owner go to Step 3, recently provisioned agents go to the never-activated category, and agents with a recorded owner follow the owned segments from Step 1.
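
One way to express the Stage 1 gate and its routing is a single function over the three fields. This is a sketch under assumptions: the route labels are invented for illustration, and owned agents are sent to the segments described in Step 1.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def route_candidate(last_sign_in: Optional[datetime], has_owner: bool,
                    created: datetime, now: datetime) -> str:
    """Apply the three Stage 1 conditions and return the path the agent takes."""
    signed_in_recently = (last_sign_in is not None
                          and now - last_sign_in <= timedelta(days=90))
    if signed_in_recently:
        return "periodic_review" if has_owner else "step3_resolve_ownership"
    if has_owner:
        return "owner_decommissioning_conversation"
    if now - created <= timedelta(days=180):
        # Too new to qualify for decommissioning.
        return "never_activated_category"
    return "stage2_disable_and_monitor"


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
print(route_candidate(None, False, now - timedelta(days=400), now))
# stage2_disable_and_monitor
```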

Stage 2 — Disable and monitor. Set accountEnabled = False. This prevents new sign-ins while preserving the object and its audit history. Run a 30-day monitoring window, querying sign-in activity against the agent's appId.
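
As an illustration, the disable step can be expressed as a Microsoft Graph PATCH against the service principal object. The sketch below only builds the request description and sends nothing. It assumes the pipeline holds the service principal's object ID and attaches its own bearer token; the helper name and the placeholder ID are hypothetical.

```python
import json

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_disable_request(service_principal_object_id: str) -> dict:
    """Describe the PATCH that sets accountEnabled to false on a service principal."""
    return {
        "method": "PATCH",
        "url": f"{GRAPH_BASE}/servicePrincipals/{service_principal_object_id}",
        "headers": {"Content-Type": "application/json"},  # Authorization header added by caller
        "body": json.dumps({"accountEnabled": False}),
    }


request = build_disable_request("00000000-0000-0000-0000-000000000000")  # placeholder ID
print(request["method"], request["url"])
```

Keeping request construction separate from sending makes it easy to log exactly what the pipeline intends to change before anything is applied.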

One operational note: Entra’s signInActivity has a documented reporting lag of up to 24 hours. For automated pipelines, query both signInActivity and auditLogs/signIns at deletion time rather than relying on the scheduled collection run alone, and add a 48-hour pre-deletion hold with a final sign-in check as a further safety layer.

On the window length: the common advice is to enforce 33 days internally while communicating 30 externally. That works only if the extra three days are never treated as a grace period. If a team arrives on day 31 with a valid claim and you honor it, which you should, the effective window becomes 33 days in practice and stakeholders will treat it as the floor going forward. The cleaner framing is to commit to 30 days as the external policy, the number stakeholders see in notifications, communications, and policy documents, and to enforce 33 days in the automation pipeline purely as a buffer against telemetry lag. The distinction matters: 33 days is an implementation detail for whoever builds the deletion job, not a commitment made to business owners or environment admins.

  • If a sign-in occurs during the window: re-enable, move to active unowned path
  • If no sign-in occurs by the end of the window: proceed to Stage 3
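
The window arithmetic and the two outcomes above can be sketched as follows. The constant and function names are illustrative, and the 48-hour pre-deletion hold is left out for brevity.

```python
from datetime import date, timedelta

EXTERNAL_WINDOW_DAYS = 30  # the number communicated to stakeholders
INTERNAL_WINDOW_DAYS = 33  # enforced by the pipeline as a telemetry-lag buffer


def window_dates(disabled_on: date) -> dict:
    """Return the communicated deadline and the earliest date the job may delete."""
    return {
        "communicated_deadline": disabled_on + timedelta(days=EXTERNAL_WINDOW_DAYS),
        "earliest_deletion": disabled_on + timedelta(days=INTERNAL_WINDOW_DAYS),
    }


def next_action(disabled_on: date, today: date, sign_in_seen: bool) -> str:
    """Decide what the pipeline does with a disabled agent on a given day."""
    if sign_in_seen:
        return "re-enable and move to active unowned path"
    if today >= window_dates(disabled_on)["earliest_deletion"]:
        return "proceed to Stage 3"
    return "keep monitoring"


print(window_dates(date(2026, 1, 1)))
print(next_action(date(2026, 1, 1), date(2026, 2, 1), sign_in_seen=False))  # keep monitoring
```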

Stage 3 — Delete. The object has been confirmed dormant, unowned, and non-responsive to the disable period. Deletion is the correct outcome.

Step 3: Resolve Ownership on Active Unowned Agents

The notes field is the first lookup — for many Copilot Studio auto-registered agents it contains the creator's UPN set at provisioning time, though not all agents populate this field.

Where that field is empty or the creator has left the organization, ownership resolution becomes the hardest step in the program and the one most likely to stall it. Four concrete heuristics narrow the search:

  • Display name conventions — team codes embedded in agent names (FIN-, HR-, MKTG-) often route directly to the right group without further investigation
  • Graph permission scopes: Calendars.ReadWrite and Mail.Send together point to a scheduling workflow; Files.ReadWrite.All on a specific SharePoint site points to document processing. Scopes imply the consuming workflow even when the owning team is unknown
  • Sign-in IP and location — lastSignInIp can narrow the originating business unit in organizations with consistent network topology
  • createdDateTime correlation — cross-referencing creation date against ITSM change records or Copilot Studio environment audit logs can surface the deployment event that created the agent
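
The display-name heuristic is the simplest of the four to automate. A small sketch, assuming the organization maintains its own prefix-to-team mapping; the team names below are invented for illustration, and only the prefixes come from the example above.

```python
from typing import Optional

# Hypothetical mapping; replace with the organization's real team codes.
TEAM_PREFIXES = {
    "FIN-": "Finance",
    "HR-": "Human Resources",
    "MKTG-": "Marketing",
}


def team_from_display_name(display_name: str) -> Optional[str]:
    """Return the team implied by a known prefix, or None if nothing matches."""
    normalized = display_name.strip().upper()
    for prefix, team in TEAM_PREFIXES.items():
        if normalized.startswith(prefix):
            return team
    return None


print(team_from_display_name("fin-close-assistant"))  # Finance
print(team_from_display_name("Helpdesk Bot"))         # None
```

A None result is the signal to fall through to the next heuristic rather than a conclusion in its own right.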

When none of these signals resolve to a clear owner, the agent enters a time-boxed challenge process. This is the most politically sensitive step in the program.

Who to notify: reach four audiences at the same time; the order below is the priority for follow-up if no one responds:

  1. The creating environment’s admin group — but verify the environment still exists and has active admins. For the legacy population, deprecated environments with no active admins are common. If the environment is decommissioned, skip directly to broader channels.
  2. The IT or identity governance distribution list
  3. The business unit mapped to the agent’s last known sign-in IP
  4. A tenant-wide governance announcement channel — newsletter, intranet post, or widely monitored collaboration channel — as the channel of last resort, framing the message as a named-date deletion notice

Notification should include the agent’s display name, appId, createdDateTime, last sign-in date, and a clear 30-day deadline.

What constitutes a valid claim — a responding team must do at least one of:

  • Name the business workflow and an accountable owner willing to be recorded in Entra
  • Provide a change record or approval artifact tracing the agent to an authorized deployment
  • Demonstrate active dependency through sign-in activity during the challenge window

“We think we built this but are not sure what it does” is not a valid claim. The agent proceeds to deletion.
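
The "at least one of" rule can be captured in a small check so that every claim is assessed against the same criteria. The Claim fields below are assumptions about what an intake form might record, and a human reviewer still has to judge whether the evidence itself is credible.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    workflow_name: str = ""              # business workflow named by the team
    accountable_owner: str = ""          # owner willing to be recorded in Entra
    change_record_id: str = ""           # change record or approval artifact
    sign_in_during_window: bool = False  # observed dependency during the challenge


def is_valid_claim(claim: Claim) -> bool:
    """True when at least one of the three accepted forms of evidence is present."""
    names_workflow_and_owner = bool(
        claim.workflow_name.strip() and claim.accountable_owner.strip()
    )
    has_artifact = bool(claim.change_record_id.strip())
    return names_workflow_and_owner or has_artifact or claim.sign_in_during_window


print(is_valid_claim(Claim(workflow_name="Invoice triage")))  # False: no owner named
print(is_valid_claim(Claim(change_record_id="CHG-EXAMPLE")))  # True
```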

Why executive sponsorship is required — ambiguous claims generate escalation pressure immediately. Without a pre-established mandate, governance teams almost always capitulate. Before the challenge process launches, three things must be in place: a named executive sponsor at VP level or above who has approved the deletion policy and will absorb escalations; a documented policy with a visible approval chain that makes escalation responses a policy reference rather than a judgment call; and a single 30-day extension option for teams demonstrating active investigation — granted once, documented, non-renewable.

Escalations should route to the executive sponsor, not back to the governance team. Once criteria are relaxed for one escalation they become negotiable for all subsequent ones.

After a valid claim — record the owner in Entra and add the agent to the next access review cycle before re-enabling. Re-enabling without both steps converts a resolved challenge into a deferred governance gap.

Step 4: Apply Compensating Controls — Do Not Force Migration

Microsoft has not deprecated the legacy tag-based model. Forcing active production agents through re-provisioning is technically complex, risks breaking dependent workflows, and provides limited marginal governance value if compensating controls can be applied instead.

Govern legacy agents in place:

  • Owner assignment — every active legacy agent should have an identifiable owner recorded in Entra, even if it is a team rather than an individual
  • Access review cycles — include legacy app registrations in periodic Entra access reviews so continued operation is explicitly reaffirmed on a schedule
  • Credential auditing — hasSecret = True is the highest-risk finding in this population. Unlike blueprint-governed agents where credential policy is inherited from the parent template, legacy agents manage credentials independently. Every secret needs a documented holder and a rotation schedule.
  • Federated credential exposure — a separate risk applies to legacy agents re-registered or migrated to Azure AI Foundry. AI Foundry blueprints use federated identity credentials — a secretless model where trusted platform identities exchange tokens directly. These agents show hasSecret = False and hasCertificate = False while remaining active authentication endpoints. The signal to check is federated credentials on the blueprint's app registration, queryable via GET /v1.0/applications/{id}/federatedIdentityCredentials. Governance programs checking only secret and certificate presence will miss this surface entirely.
  • Sign-in monitoring — track lastSignInDateTime week-over-week. An agent that becomes dormant is a decommissioning candidate. Note that sign-in activity on the agent's app registration does not represent the full access picture: for legacy Copilot Studio and PVA agents, effective API access flows through the platform service principal rather than through the agent registration itself. Correlating agent sign-in data with platform SP activity is the more complete audit approach, though no single API surface exposes that correlation today.
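
A sketch of a credential audit that covers all three surfaces mentioned above. It assumes `app` is the JSON for an application object, whose passwordCredentials and keyCredentials arrays hold secrets and certificates, and that `federated_credentials` is the `value` array returned by the federatedIdentityCredentials endpoint. The finding labels are illustrative.

```python
from typing import Dict, List


def credential_findings(app: Dict, federated_credentials: List[Dict]) -> List[str]:
    """List every authentication surface present on a registration."""
    findings = []
    if app.get("passwordCredentials"):
        findings.append("secret")        # needs a documented holder and rotation schedule
    if app.get("keyCredentials"):
        findings.append("certificate")
    if federated_credentials:
        findings.append("federated")     # invisible to secret and certificate checks
    return findings


# A secretless registration that is still an active authentication endpoint.
app = {"passwordCredentials": [], "keyCredentials": []}
print(credential_findings(app, [{"name": "example-fic"}]))  # ['federated']
```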

What this step implies for the overall program: compensating controls are not a path to zero legacy agents — they are a path to a stable, governed legacy population. The goal is not elimination of the tag-based estate but elimination of the ungoverned tag-based estate. A target of “zero ungoverned legacy agents” is achievable and defensible to any audit function. “Zero legacy agents” is neither.

Step 5: Freeze New Creation Outside the Blueprint Flow

The legacy estate is a bounded problem if you stop it from growing. Every new agent provisioned outside the blueprint-governed flow adds to the backlog.

The highest-priority control: admin consent at creation time. Requiring admin consent for any new app registration that requests Graph permissions above a defined threshold forces the governance conversation before the agent exists — before sign-in history accumulates, before a production dependency forms. Configure this under Enterprise Applications → Consent and permissions → User consent settings. A reasonable threshold for agentic workloads is any *.All scope or high-privilege permission: Mail.ReadWrite, Files.ReadWrite.All, Calendars.ReadWrite, Directory.ReadWrite.All.

For reviewer assignment, avoid centralizing all decisions in the identity governance team — in most mid-size organizations that team is two people and will become the bottleneck that kills adoption. A delegation model works better: business-unit reviewers handle routine scopes (Mail.Read, Files.Read); identity governance countersigns anything above the high-privilege threshold. This keeps requests moving without concentrating every decision in a team that cannot absorb the volume.

Pair admin consent with an ownership-at-creation requirement — no app registration passes the admin consent review without a named owner recorded in Entra. This single requirement, applied consistently, eliminates the ownership resolution problem in Step 3 for all future registrations. The legacy estate exists precisely because this requirement was not in place when those agents were created.
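
The threshold, the delegation model, and the ownership requirement can be combined into one routing decision. This is a sketch only: the high-privilege set repeats the scopes listed above, and the outcome strings are illustrative rather than any Entra setting.

```python
from typing import Iterable, Optional

HIGH_PRIVILEGE = {
    "Mail.ReadWrite",
    "Files.ReadWrite.All",
    "Calendars.ReadWrite",
    "Directory.ReadWrite.All",
}


def is_high_privilege(scope: str) -> bool:
    """Any *.All scope or an explicitly listed permission crosses the threshold."""
    return scope.endswith(".All") or scope in HIGH_PRIVILEGE


def route_consent_request(scopes: Iterable[str], named_owner: Optional[str]) -> str:
    """Decide who reviews a new registration, rejecting requests without an owner."""
    if not named_owner:
        return "rejected: no named owner"
    if any(is_high_privilege(s) for s in scopes):
        return "business-unit review + identity governance countersign"
    return "business-unit review"


print(route_consent_request(["Mail.Read", "Files.Read"], "owner@example.com"))
print(route_consent_request(["Sites.Read.All"], "owner@example.com"))
```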

Additional controls:

  • Environment routing policies in the Copilot Studio admin center — restrict publication to managed environments with explicit DLP policies
  • Maker permissions — scope creation to approved groups or roles
  • App registration restriction — set “Users can register applications” to No in Entra ID to prevent ad hoc registrations that would appear on Surface 4
  • Alert on new agentic tags: the collection strategy’s weekly diff surfaces new Surface 4 registrations within one collection cycle of creation, catching agents created outside controlled channels before they age into the backlog. Detection within 24 hours of creation requires running the diff daily
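
The diff itself is a set comparison between two collection snapshots. A minimal sketch, assuming each snapshot is a list of records keyed by appId; the record shape is an assumption about the collection output.

```python
from typing import Dict, List


def new_registrations(previous: List[Dict], current: List[Dict]) -> List[Dict]:
    """Return records present in the current snapshot but not in the previous one."""
    known_app_ids = {record["appId"] for record in previous}
    return [record for record in current if record["appId"] not in known_app_ids]


last_run = [{"appId": "a1", "displayName": "FIN-close"}]
this_run = [
    {"appId": "a1", "displayName": "FIN-close"},
    {"appId": "b2", "displayName": "Untitled agent"},
]
for record in new_registrations(last_run, this_run):
    print("new Surface 4 registration:", record["appId"], record["displayName"])
```

Keying on appId rather than display name avoids false alerts when an existing agent is renamed.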

Step 6: Track Convergence as a Metric

State metrics describe where the program is. Velocity metrics describe whether it is moving. Both are required — a program that shows the same numbers week after week may appear stable while actually being stalled.

Four state metrics:

  • Surface 4 active count — owned percentage trending toward 100%
  • Surface 4 dormant count — trending toward zero
  • Surface 4 never-activated challenge completion rate — trending toward 100%
  • Surface 2 count — trending upward as the governed flow matures

Never-activated agents are excluded from the Surface 4 / Surface 2 governance debt ratio. Including them would allow a program to show misleading progress by decommissioning dormant agents while the never-activated population grows unchecked.

Four velocity metrics:

  • Dormant count weekly delta — should be consistently negative. Near zero for two consecutive weeks means the decommissioning pipeline has stalled.
  • Never-activated challenge completion velocity — agents completing the challenge process per week as a percentage of total. Near zero means the challenge process is not reaching the right channels.
  • Ownership resolution rate — active unowned agents acquiring a recorded owner per week. Near zero means resolution heuristics are not surfacing viable candidates.
  • New Surface 4 creation rate — net-new tag-based registrations per weekly diff. Should trend toward zero as Step 5 controls take effect. Persistent elevation means controls are incomplete or being bypassed.

Reading state and velocity together:

State     | Velocity  | Status
Improving | Positive  | Working as designed
Improving | Near zero | Decelerating: easy candidates exhausted
Flat      | Near zero | Stalled: escalate, audit pipeline
Flat      | Negative  | Losing ground: review Step 5 controls
Worsening | Any       | Immediate escalation required
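
For a dashboard or weekly report, the state and velocity combinations above can be encoded as a lookup. This is a sketch; how "Improving", "Flat", and "Near zero" are derived from the raw metrics is left to the reporting pipeline and is not defined here.

```python
STATUS_TABLE = {
    ("Improving", "Positive"): "Working as designed",
    ("Improving", "Near zero"): "Decelerating: easy candidates exhausted",
    ("Flat", "Near zero"): "Stalled: escalate, audit pipeline",
    ("Flat", "Negative"): "Losing ground: review Step 5 controls",
}


def program_status(state: str, velocity: str) -> str:
    """Map a (state, velocity) reading to the program status."""
    if state == "Worsening":
        # Worsening state escalates regardless of velocity.
        return "Immediate escalation required"
    return STATUS_TABLE.get((state, velocity), "Combination not covered")


print(program_status("Flat", "Near zero"))    # Stalled: escalate, audit pipeline
print(program_status("Worsening", "Positive"))  # Immediate escalation required
```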

State metrics should be reported weekly to the governance team and monthly to the executive sponsor as a direction indicator — accelerating, holding steady, or decelerating — rather than raw numbers.

Rolling Out the Program

The technical design is the easier half. The organizational rollout is where these programs fail.

Before touching anything: establish executive sponsorship, a documented and approved policy with a visible sign-off chain, and a stakeholder communication plan. Surprise is the enemy of a smooth rollout.

Pilot on the lowest-risk population first — dormant and unowned agents in a bounded scope: one environment, one business unit, or one creation-date cohort. The pilot validates collection data accuracy, generates real escalations for the governance team and sponsor to rehearse, and produces a reference case that makes the broader rollout easier to approve.

Expand in order of increasing risk: dormant unowned at scale → never-activated unowned → dormant owned → active unowned. Each stage should reach a defined steady state before the next begins.

Define rollback criteria before starting — conditions under which the program pauses:

  • False positive rate above 5%, measured as the share of disabled agents that go on to generate valid claims
  • A confirmed production outage caused by a disabled agent
  • Executive sponsor withdrawal
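
The pause conditions are simple enough to evaluate automatically on every reporting cycle. A sketch, assuming the pipeline tracks how many agents it has disabled and how many of those produced valid claims; the parameter names are illustrative.

```python
def should_pause(disabled_count: int, valid_claim_count: int,
                 production_outage: bool, sponsor_withdrawn: bool,
                 max_false_positive_rate: float = 0.05) -> bool:
    """True when any of the three rollback criteria is met."""
    # False positive rate: disabled agents that turned out to have a valid claim.
    rate = valid_claim_count / disabled_count if disabled_count else 0.0
    return rate > max_false_positive_rate or production_outage or sponsor_withdrawn


print(should_pause(200, 8, False, False))   # False: 4% is under the threshold
print(should_pause(200, 12, False, False))  # True: 6% exceeds 5%
```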

A program without defined stopping conditions either runs under increasing friction or collapses abruptly. Neither serves the governance goal.

Resource Requirements

Requirements vary significantly by estate size, automation depth, and organizational context. The figures below are illustrative reference points derived from the scope of work described in this post — classification, decommissioning, ownership resolution, challenge process management, and metric reporting — not benchmarks from a formal study. They are intended to help teams size an internal proposal and identify which model most closely matches their situation, not to serve as budget commitments. Actual requirements will depend on the maturity of existing identity governance tooling, the degree of pipeline automation achieved, and how much of the challenge process can be absorbed by existing operational teams.

Model 1, small-to-medium estate (under 500 legacy registrations): 2–4 weeks engineering build time; 2–4 hours per week steady-state operations; one part-time governance analyst and one engineer during build. Peak load is the first 90 days of the initial classification wave.

Model 2, large estate (500–5,000 registrations): 4–8 weeks build time; 1–2 days per week ongoing; one dedicated governance analyst, one part-time engineer, and a program manager. The program manager role is consistently underestimated: stakeholder communication at this scale is a sustained effort, not a setup task.

Model 3, very large estate (5,000+ registrations): 8–16 weeks including a design phase before engineering begins. Peak remediation staffing, covering the initial classification wave through the first full decommissioning cycle, is five to seven people. This is the figure most likely to surface in budget conversations, so be precise: it is peak load, not ongoing headcount. Once the initial backlog clears, typically 12–18 months in, steady-state operations drop to roughly Model 2 levels: one dedicated analyst, one part-time engineer, and a program manager. Integrate this into an existing identity governance function rather than standing it up as a parallel capability; the steady-state workload is a natural extension of what that function already does.

The difference between these models is automation depth as much as estate size. An organization with 3,000 registrations and a well-built automated pipeline can operate closer to Model 1 resourcing than Model 3. The parts that cannot be automated — claim validity assessment, escalation handling, exception register management — are where human time should be concentrated.

Tooling: Build, Buy, or Extend

The Graph API implementation in this series is a reference implementation, not a permanent posture. The right mix of custom and commercial tooling depends on what coverage gaps exist today.

What commercial IGA tools cover: access reviews, lifecycle workflows, and permission analytics on generic service principal objects. Saviynt, SailPoint, and One Identity can enroll legacy agent app registrations in access review campaigns and trigger decommissioning workflows — this covers Step 4’s access review cycles control without custom tooling.

Where the gap is: the agent-specific data model. As of April 2026, no commercial IGA connector traverses all four surfaces as a unified model — typed subtypes, blueprint relationships, agent registry queries, and linkedAgentCount derivation are not covered. This is the layer that requires custom engineering.

Microsoft’s native tooling worth evaluating first:

  • Entra ID Governance access reviews (P2 license) for service principal review campaigns
  • Entra Permissions Management for over-privilege detection
  • Power Platform CoE Starter Kit for Copilot Studio inventory if already deployed
  • Microsoft Purview as an analytics and alerting layer over the collection output

The framework: buy or use native tooling for access reviews, lifecycle workflows, and audit retention. Build for the agent-specific collection layer. Extend existing ITSM, SIEM, or reporting platforms for diffing, alerting, and metric dashboards — a ServiceNow workflow or a Power BI report consuming the collection output is a lighter build than a standalone application.

Evaluate commercial tools again in six to twelve months. Coverage gaps that require custom engineering today may be closed by then. Build the custom layer to be replaceable: clean interfaces, documented schemas, no business logic that cannot migrate to a commercial tool when support matures.

The Bigger Picture

The legacy estate problem is not unique to AI agents. It mirrors what happened with service principal sprawl, app registration accumulation, and every previous wave of platform adoption that moved faster than governance frameworks could absorb. The pattern is consistent: adoption precedes governance, schema matures after adoption, and the resulting backlog requires a systematic burn-down strategy rather than a one-time audit.

What is different about the agent estate is the rate of accumulation. Copilot Studio makes it trivially easy for non-technical users to create agents. The governance team does not have the same visibility or approval authority over agent creation that it typically has over application deployments. The backlog grows faster and with less organizational visibility than previous analogues.

That is the argument for building the observability foundation described in the first post before the backlog becomes unmanageable — and for treating legacy remediation not as a project with an end date, but as an ongoing operational discipline with a tracked burn-down rate.

What’s Next

The next post in this series will explore permission drift detection: how to use periodic snapshots of the agent estate to surface newly added delegated permissions and app role assignments, and what a week-over-week diff of the governed agent population can reveal about how permissions are changing over time.

Tags: Microsoft Entra ID, Microsoft Graph API, AI Governance, Copilot Studio, Power Virtual Agents, Identity Management, Agentic AI, Zero Trust


Governing AI Agents in Entra ID: Clearing the Legacy Estate was originally published in Towards AI on Medium.
