SSO troubleshooting: do's and don'ts
Updated 2026-05-07
SSO failures are common, mostly debuggable, and often resolved by the customer's IT admin without escalation when the SaaS provides good debugging UX. The patterns here distill what the production-grade Admin Portal experiences (Frontegg, WorkOS, Auth0 Organizations) ship by default.
For broader B2B SSO context, see the Enterprise SSO guide and the B2B Enterprise SSO onboarding playbook.
Do
Capture and surface the IdP assertion when validation fails
SSO failures are often assertion-level problems: a malformed assertion, the wrong audience, expired timestamps, or a signature mismatch. Surfacing the parsed assertion (with sensitive fields redacted) lets the customer's IdP admin diagnose the failure without engineering involvement.
Standard B2B SSO debugging UX. Modern CIAM Admin Portals (Frontegg, WorkOS, Auth0 Organizations) ship an 'SSO debugger' that shows the last assertion and validation result; debugging time drops from days to minutes.
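The redaction step can be sketched as follows, assuming the assertion has already been parsed into a dictionary. The field names, the SENSITIVE_KEYS set, and the redact_assertion helper are hypothetical and do not reflect any vendor's API.

```python
from typing import Any, Dict

# Hypothetical keys whose values should never reach a debugging UI.
SENSITIVE_KEYS = {"signature_value", "x509_certificate", "name_id", "email", "session_index"}


def redact_assertion(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed assertion with sensitive values masked."""
    redacted: Dict[str, Any] = {}
    for key, value in parsed.items():
        if key.lower() in SENSITIVE_KEYS:
            redacted[key] = "[REDACTED]"
        elif isinstance(value, dict):
            # Recurse into nested sections such as conditions or attributes.
            redacted[key] = redact_assertion(value)
        elif isinstance(value, list):
            redacted[key] = [redact_assertion(v) if isinstance(v, dict) else v for v in value]
        else:
            redacted[key] = value
    return redacted


# Illustrative data only; not a real assertion.
example = {
    "issuer": "https://idp.example.com",
    "audience": "https://app.example.com/saml",
    "name_id": "alice@customer.example",
    "conditions": {
        "not_before": "2026-05-07T10:00:00Z",
        "not_on_or_after": "2026-05-07T10:05:00Z",
    },
    "signature_value": "placeholder-signature",
}
safe_view = redact_assertion(example)
```

Issuer, audience, and the validity window stay visible because those are the fields an IdP admin usually needs to compare against their own configuration. A deny-list is the simplest form; an allow-list of displayable fields is the more conservative design.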
Log clock skew explicitly during assertion validation
SAML assertions and OIDC ID tokens carry signed validity timestamps. Clock skew between the IdP and the SaaS causes 'token expired' or 'token not yet valid' failures that are mysterious without explicit logging.
RFC 7523 (JWT assertions) explicitly allows for clock skew between systems when timestamps are checked, and SAML 2.0 implementations commonly apply a similar tolerance; neither mandates a specific window. Modern CIAM platforms log the actual vs expected timestamps when validation fails; production deployments without this logging often catch clock-skew bugs only after weeks of intermittent failures.
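A minimal sketch of explicit skew logging, assuming the validity window has already been parsed into timezone-aware datetimes. The check_validity_window function, the logger name, and the 5-minute ALLOWED_SKEW value are illustrative assumptions.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

logger = logging.getLogger("sso.validation")

# Assumed tolerance for this sketch; tune per deployment.
ALLOWED_SKEW = timedelta(minutes=5)


def check_validity_window(
    not_before: datetime,
    not_on_or_after: datetime,
    now: Optional[datetime] = None,
) -> Tuple[bool, str]:
    """Check an assertion's validity window and log the observed offset on failure."""
    now = now or datetime.now(timezone.utc)
    skew_s = ALLOWED_SKEW.total_seconds()
    if now + ALLOWED_SKEW < not_before:
        early_by = (not_before - now).total_seconds()
        detail = (
            f"not yet valid: server_time={now.isoformat()} "
            f"not_before={not_before.isoformat()} "
            f"early_by={early_by:.0f}s allowed_skew={skew_s:.0f}s"
        )
        logger.warning(detail)
        return False, detail
    if now - ALLOWED_SKEW >= not_on_or_after:
        late_by = (now - not_on_or_after).total_seconds()
        detail = (
            f"expired: server_time={now.isoformat()} "
            f"not_on_or_after={not_on_or_after.isoformat()} "
            f"late_by={late_by:.0f}s allowed_skew={skew_s:.0f}s"
        )
        logger.warning(detail)
        return False, detail
    return True, "within validity window"
```

Logging both the server time and the assertion's own timestamps, plus the difference in seconds, is what lets an admin tell a drifting clock apart from a genuinely stale assertion.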
Document the customer-facing troubleshooting runbook
Customer IT admins debug SSO at 2 AM when the engineer who set it up isn't available. A runbook with screenshots, common errors, and fix steps reduces back-and-forth dramatically.
Customer support data at major CIAM vendors (Auth0, Okta, WorkOS) shows that runbook-driven self-service resolves 60-80% of SSO support tickets without engineering escalation.
Test SSO end-to-end after every IdP cert rotation
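IdP signing certificates expire and rotate on a schedule set by the IdP and the tenant's configuration (Okta and Entra both do this; check each IdP's documentation for its default certificate lifetime). After rotation, the SaaS must trust the new cert. Without automatic rotation handling, every rotation breaks SSO until the cert is manually updated.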
SAML metadata refresh and OIDC JWKS rotation are standard CIAM features. Most modern CIAM platforms refresh automatically; legacy or custom-built integrations require manual cert updates and break on rotation.
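One way to handle rotation automatically is to refetch the IdP's published keys whenever an unknown key ID appears. The sketch below assumes keys are looked up by an identifier (the OIDC `kid`, or for SAML something like a certificate fingerprint). The SigningKeyCache class and the injected fetch_keys callable are hypothetical; a real integration would download the JWKS or SAML metadata inside that callable.

```python
import time
from typing import Callable, Dict, Optional


class SigningKeyCache:
    """Caches IdP signing keys by key ID and refetches when an unknown ID appears."""

    def __init__(
        self,
        fetch_keys: Callable[[], Dict[str, str]],
        min_refresh_interval: float = 60.0,
    ) -> None:
        self._fetch_keys = fetch_keys  # hypothetical downloader for JWKS or metadata
        self._min_refresh_interval = min_refresh_interval
        self._keys: Dict[str, str] = {}
        self._last_refresh: Optional[float] = None

    def get_key(self, key_id: str) -> Optional[str]:
        if key_id not in self._keys and self._may_refresh():
            self._keys = self._fetch_keys()
            self._last_refresh = time.monotonic()
        return self._keys.get(key_id)

    def _may_refresh(self) -> bool:
        # Rate-limit refetches so bogus key IDs cannot trigger a fetch per request.
        if self._last_refresh is None:
            return True
        return time.monotonic() - self._last_refresh >= self._min_refresh_interval


# Simulated rotation: the IdP replaces its published key between two logins.
published = {"key-2025": "old-public-key"}
cache = SigningKeyCache(lambda: dict(published), min_refresh_interval=0.0)
before_rotation = cache.get_key("key-2025")
published.clear()
published["key-2026"] = "new-public-key"
after_rotation = cache.get_key("key-2026")
```

An end-to-end test after each rotation would exercise this path: a login signed with the new key should succeed without anyone editing the SaaS configuration.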
Don't
Don't expose internal stack traces to the customer's admin
Stack traces leak implementation detail and confuse non-engineering admins. Translate errors to actionable customer-facing messages with next-step guidance.
B2B SaaS UX research consistently flags opaque error messages as a barrier to customer onboarding. Production CIAM Admin Portals translate error codes to plain-English explanations with linked runbook entries.
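The translation layer can be as small as a lookup table. In this sketch the error codes, message wording, and runbook anchors are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdminMessage:
    summary: str         # plain-English explanation shown to the customer's admin
    next_step: str       # what the admin should try next
    runbook_anchor: str  # hypothetical anchor into the troubleshooting runbook


# Hypothetical internal error codes mapped to admin-facing guidance.
ADMIN_MESSAGES = {
    "AUDIENCE_MISMATCH": AdminMessage(
        summary="The IdP sent this login to a different application identifier than expected.",
        next_step="Compare the Audience / Entity ID in your IdP app with the value in the SSO settings page.",
        runbook_anchor="#audience-mismatch",
    ),
    "SIGNATURE_INVALID": AdminMessage(
        summary="The login response was not signed with a certificate we trust.",
        next_step="Check whether the IdP signing certificate was rotated and re-upload or refresh the metadata.",
        runbook_anchor="#signature-invalid",
    ),
    "ASSERTION_EXPIRED": AdminMessage(
        summary="The login response arrived after its validity window closed.",
        next_step="Verify that the IdP server clock is synchronized, then retry the login.",
        runbook_anchor="#assertion-expired",
    ),
}

FALLBACK_MESSAGE = AdminMessage(
    summary="Single sign-on failed for a reason we could not classify.",
    next_step="Retry once, then contact support with the time of the attempt.",
    runbook_anchor="#unknown-error",
)


def to_admin_message(error_code: str) -> AdminMessage:
    """Map an internal error code to a customer-facing message; never return a stack trace."""
    return ADMIN_MESSAGES.get(error_code, FALLBACK_MESSAGE)
```

The full exception and stack trace would still be written to internal logs; only the mapped message is rendered in the admin-facing UI.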
Don't disable assertion signature verification to debug
Disabling signature verification is a backdoor: without it, assertions can be forged. The temptation arises when debugging signature failures; the fix is verifying the signing cert, not disabling the check.
Multiple production security incidents at SaaS companies have been traced to signature verification that was disabled in non-production environments and then accidentally promoted to production. The check should always be on.
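One defensive option is a startup guard that refuses to run if a configuration ever tries to turn verification off, so that a debug setting cannot silently reach production. This is a sketch only; the SSO_VERIFY_SIGNATURES setting name and both helpers are hypothetical.

```python
from typing import Mapping


class InsecureSsoConfigError(RuntimeError):
    """Raised when configuration attempts to disable assertion signature checks."""


def load_verify_signatures(env: Mapping[str, str]) -> bool:
    """Read a hypothetical SSO_VERIFY_SIGNATURES setting and refuse to turn it off."""
    raw = env.get("SSO_VERIFY_SIGNATURES", "true").strip().lower()
    if raw not in {"true", "1", "yes"}:
        # Fail loudly in every environment, including staging and local development.
        raise InsecureSsoConfigError(
            "Signature verification cannot be disabled; "
            "fix the trusted signing certificate instead."
        )
    return True
```

Calling `load_verify_signatures(os.environ)` at process start would apply the guard to real environment variables. Removing the setting entirely is the stricter design; the guard is useful when the flag already exists in older configuration.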
Don't accept assertions with a relaxed clock skew tolerance
Clock skew tolerance windows wider than ~5 minutes accept replayed assertions long after issuance. Standard tolerance is 5 minutes; extended values are an antipattern.
OASIS SAML and OAuth 2.0 implementation guidance. Production deployments that relaxed clock skew to 'fix' intermittent failures created replay-attack windows; the fix is NTP, not relaxed validation.
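A configuration check can keep the tolerance from creeping upward. The sketch below assumes the 5-minute ceiling described above; the MAX_CLOCK_SKEW constant and the validated_skew helper are hypothetical names.

```python
from datetime import timedelta

# Ceiling taken from the 5-minute tolerance discussed in this guide.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def validated_skew(requested_seconds: float) -> timedelta:
    """Return the requested skew tolerance, rejecting negative or over-limit values."""
    if requested_seconds < 0:
        raise ValueError("clock skew tolerance cannot be negative")
    requested = timedelta(seconds=requested_seconds)
    if requested > MAX_CLOCK_SKEW:
        raise ValueError(
            f"clock skew tolerance {requested_seconds:.0f}s exceeds the "
            f"{MAX_CLOCK_SKEW.total_seconds():.0f}s limit; fix time sync instead"
        )
    return requested
```

Rejecting the value outright, rather than silently clamping it, makes the attempted workaround visible to whoever reviews the configuration change.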
Don't troubleshoot SSO via screen-share without redacting
Screen-sharing sessions used to debug SSO with the customer routinely leak IdP signing certs, attribute mappings, and customer-internal user identifiers. Treat screen-share content with the same data-handling discipline as written exchanges.
Standard customer-data-handling practice. There are several documented incidents at SaaS companies where customer screen-share content was inadvertently captured in support recordings or shared internally; the customer-data-isolation chain broke during the support interaction.