Token Management for AI Agents: Lifetimes, Rotation, and Revocation at Machine Speed
Updated 2026-05-15 · 10 min read · By @guptadeepak
Key takeaways
- Default to 5-15 minute access-token lifetimes for agents. The token theft window equals the lifetime, and agents act at machine speed.
- Refresh-token rotation per use (RFC 6749bis) plus client-binding (mTLS or DPoP) is the production-default refresh scheme for agents.
- Scope tokens per tool, not per agent. A token that authorizes one tool's API surface bounds the blast radius when that tool's credentials leak.
- Token caching by the agent runtime is where most token-management bugs live. Cache per (user, agent, audience) tuple; never cache across users.
- Anomaly detection for agent tokens is volume + pattern based: unexpected scope use, unexpected destinations, rate spikes. SIEM tooling is starting to ship agent-aware detection.
What makes agent token management different
The chain of decisions, in the order you make them:
- Access-token lifetime: 5-15 minutes.
- Refresh-token rotation: per use.
- Sender-constraint: mTLS or DPoP.
- Scope: per tool, not per agent.
- Caching: per (user, agent, audience) tuple.
- Revocation: refresh-token revocation, short access-token lifetime, no real-time access-token revocation in hot paths.
- Anomaly detection: volume + pattern.
The rest of the guide expands each.
Access-token lifetime
The strongest single lever for reducing agent-token risk is the lifetime. Shorter = smaller window after theft, but more refresh overhead.
The numbers that work in production:
- 5 minutes: aggressive but reasonable for agents calling few APIs per session. Refresh roundtrip becomes visible in latency-sensitive paths.
- 15 minutes: the common default. Good balance between theft window and refresh overhead.
- 30-60 minutes: defensible when paired with sender-constrained tokens (mTLS or DPoP) — the token is useless without the key, so the lifetime matters less.
- >1 hour: too long for agents without sender-constraints. Tolerable only with compensating controls (strict scope, anomaly detection, killable refresh tokens).
The latency math: a 5-minute token means the agent refreshes 12 times per hour per session. At 100ms per refresh, that's 1.2 seconds of refresh overhead per hour — invisible unless calls are blocking on the refresh. The right pattern is to refresh proactively (a minute before expiration) on a background thread, so foreground calls never wait.
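The arithmetic and the proactive-refresh pattern above can be sketched in a few lines. This is a minimal illustration, not a production client: `ProactiveToken`, `fetch`, and `tick` are hypothetical names, `fetch` stands in for whatever call your runtime makes to the token endpoint, and the 60-second margin is the "a minute before expiration" figure from the text.

```python
import threading
import time


def refresh_overhead_seconds(lifetime_s, refresh_latency_s, window_s=3600.0):
    """Time spent refreshing in window_s if the token is refreshed once per lifetime."""
    return (window_s / lifetime_s) * refresh_latency_s


def needs_refresh(expires_at, now, margin_s=60.0):
    """True once the token is within margin_s of expiry (or already expired)."""
    return now >= expires_at - margin_s


class ProactiveToken:
    """Holds one access token and renews it ahead of expiry.

    `fetch` is a hypothetical callable returning (token, lifetime_seconds).
    """

    def __init__(self, fetch, clock=time.monotonic, margin_s=60.0):
        self._fetch = fetch
        self._clock = clock
        self._margin_s = margin_s
        self._lock = threading.Lock()
        self._token, lifetime_s = fetch()
        self._expires_at = clock() + lifetime_s

    def get(self):
        """Foreground path: return the current token without any network call."""
        with self._lock:
            return self._token

    def tick(self):
        """Background path: call periodically; refreshes only when close to expiry."""
        with self._lock:
            due = needs_refresh(self._expires_at, self._clock(), self._margin_s)
        if not due:
            return False
        token, lifetime_s = self._fetch()  # the slow call happens outside the lock
        with self._lock:
            self._token = token
            self._expires_at = self._clock() + lifetime_s
        return True
```

With a 300-second lifetime and 0.1 seconds per refresh, `refresh_overhead_seconds(300, 0.1)` comes to roughly 1.2 seconds per hour, matching the figure above. A background thread would call `tick()` every few seconds while request handlers only ever call `get()`.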
Refresh-token rotation
Per RFC 6749bis §4.13, refresh tokens for public clients should rotate on every use, and many production deployments rotate them for confidential clients too. The flow:
- Agent presents refresh token RT1 to the token endpoint.
- OAuth server issues a new access token AT2 and a new refresh token RT2.
- RT1 is invalidated immediately.
- Agent stores RT2; next refresh uses RT2.
Two security properties this provides:
- Bounded replay window: a stolen refresh token can be used at most once before it is invalidated.
- Theft detection: if RT1 is presented again after RT2 has been issued, the server knows there are two parties holding what should be a single secret. The conventional response is to revoke the entire token family (RT1, RT2, and anything derived from them) and force re-authentication.
Modern OAuth servers (Auth0, Curity, Keycloak, Ory Hydra, WorkOS) support this behavior, but whether it is on by default varies by product and client type. Check the setting for your deployment and enable refresh-token rotation explicitly where it is off.
Combine with sender-constraints to make even the one-use theft window unusable: the refresh token can only be presented by the client holding the corresponding mTLS cert or DPoP key.
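The rotation flow and the reuse-detection rule described in this section can be sketched as a small in-memory issuer. This is an illustration under assumptions, not how any particular OAuth server implements it: `RefreshTokenStore`, `issue_family`, and `rotate` are hypothetical names, and a real server would persist this state and check client authentication or sender-constraints as well.

```python
import secrets


class InvalidToken(Exception):
    """The presented refresh token is unknown or its family was revoked."""


class ReuseDetected(Exception):
    """An already-rotated refresh token was presented again."""


class RefreshTokenStore:
    def __init__(self):
        self._family_of = {}   # refresh token -> family id
        self._current = {}     # family id -> the single currently valid refresh token
        self._revoked = set()  # family ids that require re-authentication

    def issue_family(self):
        """Start a new token family (for example, after user authorization)."""
        family = secrets.token_hex(8)
        refresh_token = secrets.token_urlsafe(32)
        self._family_of[refresh_token] = family
        self._current[family] = refresh_token
        return refresh_token

    def rotate(self, presented):
        """Exchange a refresh token for (new_access_token, new_refresh_token)."""
        family = self._family_of.get(presented)
        if family is None or family in self._revoked:
            raise InvalidToken("unknown or revoked refresh token")
        if self._current[family] != presented:
            # An old token came back: two parties hold the secret.
            # Revoke the whole family and force re-authentication.
            self._revoked.add(family)
            raise ReuseDetected("refresh token reuse; family revoked")
        new_refresh = secrets.token_urlsafe(32)
        self._family_of[new_refresh] = family
        self._current[family] = new_refresh  # the presented token is now invalid
        return secrets.token_urlsafe(32), new_refresh
```

In this sketch, presenting RT1 a second time after RT2 has been issued raises `ReuseDetected`, and every later call with RT2 then fails with `InvalidToken`, which is the family-wide revocation described above.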
Sender-constraints: mTLS vs DPoP
Bearer tokens are replayable by design: anyone who holds the token can use it. Sender-constrained tokens bind the token to a key the client holds, so theft of the token alone is useless.
The two production-grade mechanisms:
- mTLS-bound tokens (RFC 8705): token is valid only when presented over a TLS connection where the client presents a specific certificate. The right pick when the agent runs in a server-side environment with mTLS infrastructure already (service mesh, internal cluster, server-running agent).
- DPoP (RFC 9449): the client holds a public/private key pair and signs a per-request DPoP proof; the authorization server binds the access token to the thumbprint of that key. Same property as mTLS, lighter to deploy. Right for agents in environments without mTLS: browser-running agents, mobile, edge functions.
Adoption of either is a one-time platform investment that pays back continuously. Adopting before a token-theft incident is the prudent move.
The detail on mTLS lives in mTLS Explained; the OAuth specifics in OAuth 2.1 Explained.
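To make the DPoP binding concrete, the sketch below computes the two hashes that tie a proof to a token: the JWK SHA-256 thumbprint (RFC 7638) that a DPoP-bound access token carries in its `cnf.jkt` claim, and the `ath` value that a proof carries when it accompanies an access token. It covers only the hashing; signature verification of the proof and the token is assumed to happen elsewhere, the function names are hypothetical, and only EC keys are handled.

```python
import base64
import hashlib
import json


def b64url(data):
    """Base64url without padding, as used throughout JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk):
    """SHA-256 thumbprint of an EC public JWK.

    Only the required members are hashed, in lexicographic order,
    serialized without whitespace.
    """
    required = {name: jwk[name] for name in ("crv", "kty", "x", "y")}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def access_token_hash(access_token):
    """Value of the proof's `ath` claim: hash of the ASCII access token."""
    return b64url(hashlib.sha256(access_token.encode("ascii")).digest())


def proof_key_matches_token(proof_jwk, token_claims):
    """True if the key in the DPoP proof is the key the token was bound to."""
    bound = token_claims.get("cnf", {}).get("jkt")
    return bound is not None and bound == jwk_thumbprint(proof_jwk)
```

A resource server that runs a check like `proof_key_matches_token` rejects a stolen token presented with any other key, which is the property that makes longer lifetimes defensible.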
Scope per tool, not per agent
The architectural choice that bounds blast radius: tokens scope per tool, not per agent.
The wrong pattern: an agent has one OAuth client with scopes [calendar:rw, mail:rw, drive:rw, contacts:rw]. Every API call uses the same token. The token, once stolen, can do all four. This is the easy default and the wrong default.
The right pattern: the agent has multiple OAuth clients, one per tool surface. The calendar tool uses a token scoped calendar:rw; the mail tool uses a separately-obtained token scoped mail:rw. Token theft from the calendar tool gives the attacker calendar access only.
In practice this looks like: each "tool" the agent uses is registered as its own OAuth client; the agent's tool dispatcher obtains tokens scoped per tool; the tokens never cross tool boundaries inside the agent runtime.
For MCP-based architectures, this maps naturally: each MCP server is its own OAuth client; the agent has tokens for each MCP server, scoped to that server's exposed tools. See MCP Server Identity Model.
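A per-tool dispatcher along these lines might look like the sketch below. Everything named here is an assumption made for illustration: the client IDs, the audiences, the `TOOL_CLIENTS` registry, and `request_token`, which stands in for the real call to the token endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolClient:
    client_id: str  # one OAuth client per tool surface
    scope: str      # only the scope that tool needs
    audience: str   # the API the token is meant for


# Hypothetical registry; the names and URLs are placeholders.
TOOL_CLIENTS = {
    "calendar": ToolClient("agent-calendar", "calendar:rw", "https://calendar.example"),
    "mail": ToolClient("agent-mail", "mail:rw", "https://mail.example"),
}


class ToolDispatcher:
    """Hands each tool its own token; tokens never cross tool boundaries."""

    def __init__(self, request_token):
        # request_token(ToolClient) -> token is a stand-in for the
        # token-endpoint call your runtime actually makes.
        self._request_token = request_token
        self._tokens = {}

    def token_for(self, tool):
        client = TOOL_CLIENTS[tool]  # KeyError for an unregistered tool
        if tool not in self._tokens:
            self._tokens[tool] = self._request_token(client)
        return self._tokens[tool]
```

The point of the design is that the calendar tool has no code path that can reach the mail token, so a leak from one tool stays inside that tool's scope. A real dispatcher would also key tokens by user and handle expiry, which this sketch omits.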
Token caching: where the bugs live
Agent runtimes cache tokens. Done wrong, this is the source of cross-user data leaks. The rules:
- Cache key: (user_id, agent_id, audience). Anything less granular is a leak waiting to happen.
- TTL: shorter than the token expiration. A token expiring in 15 minutes should evict from cache at 12-13 minutes to avoid using a token that expires mid-call.
- Eviction on user change: when the agent context switches users, evict cached tokens for the previous user immediately. Long-running agent processes that serve many users concurrently are the worst-case here.
- Isolation in shared environments: an agent platform serving many tenants should isolate the cache per tenant. A token leak in the cache is a tenant-data leak.
- No persistence by default: in-memory cache only, unless the runtime restarts often enough that the cost of re-authentication is operationally prohibitive. If you must persist, encrypt at rest with a key that is stored separately from the cached tokens.
The recurring bug pattern: a refactor introduces a more "efficient" global cache; per-tuple keying is forgotten; for a brief window before the bug is found, every request uses one user's token. Test for this explicitly with adversarial integration tests that exercise concurrent users.
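The caching rules above fit in a short class. This is a sketch under assumptions rather than a drop-in component: `TokenCache` and its method names are hypothetical, the 150-second safety margin is chosen so that a 15-minute token is evicted at 12.5 minutes, and per-tenant isolation would mean one instance per tenant.

```python
import threading
import time


class TokenCache:
    """In-memory token cache keyed by (user_id, agent_id, audience)."""

    def __init__(self, clock=time.monotonic, safety_margin_s=150.0):
        self._clock = clock
        self._margin_s = safety_margin_s
        self._lock = threading.Lock()
        self._entries = {}  # (user_id, agent_id, audience) -> (token, evict_at)

    def put(self, user_id, agent_id, audience, token, lifetime_s):
        # Evict before the token itself expires so no call starts with a
        # token that dies mid-request.
        evict_at = self._clock() + max(lifetime_s - self._margin_s, 0.0)
        with self._lock:
            self._entries[(user_id, agent_id, audience)] = (token, evict_at)

    def get(self, user_id, agent_id, audience):
        key = (user_id, agent_id, audience)
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            token, evict_at = entry
            if self._clock() >= evict_at:
                del self._entries[key]
                return None
            return token

    def evict_user(self, user_id):
        """Drop every cached token for a user, e.g. on a context switch."""
        with self._lock:
            for key in [k for k in self._entries if k[0] == user_id]:
                del self._entries[key]
```

Because the user is part of the key, a lookup for one user can never return another user's token, and an adversarial concurrent-user test can assert exactly that.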
Revocation
True real-time revocation requires the resource server to introspect the token (RFC 7662) on every call — an OAuth server roundtrip per API call. That kills the latency story.
The hybrid pattern, which is what most production deployments use:
- Access tokens: short-lived, validated statelessly at the resource server (JWT signature check, no issuer roundtrip). Revocation is "wait for expiration" — bounded by the short lifetime.
- Refresh tokens: stateful at the issuer, revocable instantly. Compromise detected → revoke the refresh token → access token expires within minutes → access stops.
For sensitive operations where access-token revocation must be immediate, introspection on those specific operations is a reasonable cost. Bank transfers, key changes, admin actions — fine to take the roundtrip. Ordinary read traffic — keep it stateless.
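The hybrid decision can be expressed as a single authorization function at the resource server. The sketch assumes the JWT signature has already been verified and that `claims` is the decoded payload; the operation names are made up, and `introspect` is a hypothetical stand-in for an RFC 7662 introspection call that returns whether the token is still active.

```python
# Hypothetical names for operations that justify an issuer roundtrip.
SENSITIVE_OPERATIONS = frozenset({"transfer_funds", "rotate_key", "admin_action"})


def authorize(operation, raw_token, claims, now, introspect):
    """Decide whether a call may proceed.

    claims:     decoded token payload (signature assumed verified upstream)
    now:        current time in the same units as claims["exp"]
    introspect: callable(raw_token) -> bool, True if the issuer still
                considers the token active
    """
    # Stateless path: expiry alone bounds how long a revoked token survives.
    if claims["exp"] <= now:
        return False
    # Stateful path: only sensitive operations pay for the roundtrip.
    if operation in SENSITIVE_OPERATIONS:
        return introspect(raw_token)
    return True
```

Ordinary reads never touch the issuer in this sketch, while a transfer made with a token the issuer has already revoked is refused even though the token has not yet expired.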
Anomaly detection for agent tokens
Account-takeover detection for humans uses signals like unfamiliar geolocation, unusual login time, behavioral biometrics. None of those work for agents. Agent token anomaly detection is volume + pattern based:
- Unusual API call rate: the agent normally calls 10 APIs per session; suddenly 1,000. Either the agent is doing something new or its token is being amplified by an attacker.
- Unexpected scope use: the token has 5 scopes; the agent normally uses 2. A spike in the other 3 is suspicious.
- Destination drift: the token is normally used from one set of IP ranges or one container fleet; sudden use from elsewhere is a signal.
- Cross-tool burst: the agent normally uses tool A then tool B; a burst of every tool simultaneously suggests automated abuse.
Modern SIEM tools (Datadog, Splunk, Sumo Logic, Snowflake-based security stacks) are starting to ship agent-aware detection rules. CIAM products with strong audit-log structure (Auth0, Descope, Curity, WorkOS) feed those rules with the right primitives.
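As a rough illustration of volume + pattern detection, the sketch below checks one session's audit-log records against a per-agent baseline. The record shape (`scope` and `source` keys), the `Baseline` fields, and the 10x rate threshold are all assumptions for the example; a real rule set would live in the SIEM and use its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    typical_calls_per_session: int
    usual_scopes: frozenset   # scopes the agent normally exercises
    usual_sources: frozenset  # IP ranges or fleets the token is normally used from


def find_anomalies(baseline, calls, rate_multiplier=10.0):
    """Return the names of the signals that fired for one session.

    calls: list of dicts with "scope" and "source" keys (assumed log shape).
    """
    findings = []
    if len(calls) > baseline.typical_calls_per_session * rate_multiplier:
        findings.append("rate_spike")
    if any(call["scope"] not in baseline.usual_scopes for call in calls):
        findings.append("unexpected_scope")
    if any(call["source"] not in baseline.usual_sources for call in calls):
        findings.append("destination_drift")
    return findings
```

Any single signal is weak evidence on its own. Several firing together on one token is a stronger reason to revoke its refresh token and let the access token expire.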
Implementation guidance
- Default to 15-minute access tokens for agents. Adjust based on measured refresh latency and the threat model.
- Refresh-token rotation per use. Supported by modern OAuth servers and often on by default. Verify it is enabled for your deployment.
- Sender-constrained tokens (mTLS or DPoP) wherever the runtime supports them. The platform investment pays back in every future token-theft scenario you do not have.
- Scope tokens per tool, not per agent. Multiple OAuth clients are cheap; cross-tool blast-radius is expensive.
- Cache tokens per (user, agent, audience). Adversarial concurrent-user tests in CI.
- Refresh proactively in the background, not lazily on the first failing call. Latency invisibility matters.
- Hybrid revocation: stateful refresh tokens, stateless access tokens.
- Volume + pattern anomaly detection on the audit-log stream. Feed it into your SIEM.
Related vendors
Auth0
Auth0 remains the safest mid-market default for B2C plus B2B Enterprise SSO when developer velocity matters more than long-run TCO. Below 50k MAU it is hard to beat. Above 500k MAU, cost and Actions-driven lock-in make alternatives like FusionAuth (self-host), Cognito (AWS-native), or Stytch plus Corbado (passkey-first) increasingly attractive.
Curity
Curity is the standards-purist enterprise CIAM in 2026: among the most spec-correct OAuth 2.0 / OIDC implementations available, with strong FAPI and Open Banking support that suits financial services and regulated workloads. The configuration-as-code model treats identity like infrastructure-as-code, which appeals to engineering-mature enterprises. Outside the standards-correctness or FAPI use cases, the enterprise pricing and learning curve make broader-scope CIAM (Auth0, Ping) more practical.
Descope
Descope is the orchestration-first CIAM in 2026. Its Flows visual editor is the most capable no-code auth designer in the market, paired with above-average passkey orchestration and an early MCP-native posture for AI agents. For mid-market B2C and B2B SaaS that wants modern auth without writing the orchestration layer, Descope is one of the strongest picks. Compliance breadth and ecosystem maturity still favor Auth0 above 500k MAU.
WorkOS
WorkOS is the strongest B2B-first CIAM in 2026 by deliberate scope choice: every product surface assumes the buyer is selling to enterprise IT, not to consumers. AuthKit's 1M MAU free tier makes it a credible Auth0 alternative for B2B SaaS that doesn't need adaptive risk or B2C consumer flows. For pure B2B SSO, SCIM, and audit logs, WorkOS is hard to beat at any price point.
FAQ
- Why are agent tokens different from human tokens?
- Three differences. First, agents act at machine speed — a 1-hour stolen token is hundreds of API calls of damage before any human notices. Second, agents typically combine permissions a human would hold across separate sessions (read calendar AND send email AND access docs, in one token). Third, agents do not notice anomalous activity in their own sessions, so account-takeover signals that work for humans (unfamiliar locations, sudden language changes) do not apply. Tokens for agents need shorter lifetimes, narrower scopes, and behavioral monitoring that works on the agent's side.
- How short is too short for an access-token lifetime?
- If the agent has to refresh more than once per request batch, you are paying refresh overhead that does not buy security. 5 minutes is the practical floor for agents calling APIs in tight loops; 15 minutes is a typical default; 1 hour is the longest reasonable lifetime for an agent and only when paired with sender-constrained tokens. Below 5 minutes the refresh-roundtrip per call dominates latency without much marginal security benefit.
- Should agents share tokens across users?
- Never. Token caching keyed by anything other than (user_id, agent_id, audience) is the bug. The pattern that breaks is a long-running agent process serving many users via concurrent requests; if the cache is keyed by audience only, the wrong user's token gets used for the wrong request. Use per-tuple caches with a strict eviction policy and a small TTL.
- What's the right rotation scheme for refresh tokens?
- Per-use rotation: every time the agent presents a refresh token to get a new access token, the OAuth server invalidates the old refresh token and issues a new one. This bounds the replay window for a stolen refresh token to one use and lets the issuer detect parallel use as a theft signal (RFC 6749bis §4.13). Combine with client-binding via mTLS or DPoP for layered defense.
- How do you revoke an agent token immediately?
- Two paths. Short token lifetimes let revocation happen by waiting (most pragmatic). Token introspection on every API call gives immediate revocation but at the cost of every call hitting the issuer. The hybrid: short access tokens that the resource server validates statelessly, plus refresh-token revocation that is enforced at the issuer; a compromised refresh token is killable in seconds, and the corresponding access token expires within minutes.
Sources
- OAuth 2.1 (IETF draft)
- RFC 6749bis — OAuth 2.0 Framework (refresh-token rotation guidance)
- RFC 8693 — OAuth 2.0 Token Exchange
- RFC 9449 — OAuth 2.0 Demonstrating Proof of Possession (DPoP)
- RFC 8705 — OAuth 2.0 Mutual-TLS Client Authentication