The multi-IP problem
A domain is not a single server. DNS round-robin, CDN anycast, and load balancers mean that a single hostname can resolve to dozens or hundreds of IP addresses, each potentially serving a different TLS configuration. The client connecting to example.com gets whichever IP the DNS resolver happens to return — and from a single vantage point, you only see one of them.
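The fan-out is easy to observe directly. A minimal sketch using the Python standard library (`resolve_all` is an illustrative helper name):

```python
import socket

def resolve_all(hostname, port=443):
    """Return every unique IPv4/IPv6 address the hostname resolves to.

    socket.getaddrinfo returns one entry per (family, type, proto)
    combination; sockaddr[0] holds the address string in every family.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# A CDN-fronted hostname will typically return several addresses, and
# repeated runs may return different sets as the resolver rotates answers.
# resolve_all("example.com")
```

Any per-IP check has to start from this full set; taking only the first answer reproduces exactly the single-vantage-point blindness described above.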
This becomes a problem when those IPs are not in identical states. Certificate deployments take time to propagate, CDN edge caches are not synchronized, and configuration changes roll out node by node. At any given moment during a deployment, some fraction of your traffic may be hitting a different certificate, a different TLS version, or a different protocol stack than the rest.
The fundamental issue is that standard monitoring checks one IP at a time. If your monitoring pings the IP that got the new cert, everything looks fine — while a significant fraction of real user traffic is hitting an older node. The inconsistency is real; your monitoring just cannot see it from a single perspective.
How mismatches happen
The most common cause is a rolling certificate deployment. When you push a new certificate — whether via your CDN's API, Ansible, or a deployment pipeline — nodes are updated sequentially, not atomically. During the rollout window, new and old certificates coexist across the fleet. If the rollout stalls (a node fails health checks, a deployment error on one region, a misconfigured Ansible play that exits early), the inconsistent state becomes permanent rather than transient.
CDN-specific causes are more subtle. CDNs operate globally distributed edge networks, and configuration changes — including certificate updates — propagate via an eventually-consistent control plane. The propagation SLA varies by CDN: Cloudflare typically converges in seconds, but some CDNs take minutes to tens of minutes. For routine renewals, such as Let's Encrypt certificates that are valid for 90 days and renewed well before expiry, this is not a practical problem. For emergency certificate revocation and reissuance after a key compromise, the propagation delay is operationally critical.
A distinct failure mode involves origin vs. edge configuration drift. The CDN edge serves its own certificate to clients (negotiated via SNI); the origin server has a separate certificate used for the CDN-to-origin connection. If the origin certificate expires or is replaced with one covering different SANs, the CDN-to-origin TLS may break while client-facing TLS remains unaffected — and vice versa. These two certificate paths are independently managed and independently monitored, which means errors in one are often invisible when checking the other.
Why it matters
Inconsistent TLS across IPs produces errors that are intermittent, hard to reproduce, and hard to diagnose. A user reporting a certificate warning or connection failure cannot tell you which IP they hit. Your support team's attempt to reproduce the issue hits a different IP and sees no error. The problem appears and disappears as DNS load balancing routes requests to different nodes.
The severity scales with the nature of the mismatch:
- Expired certificate on some nodes — users hitting those nodes see a hard browser error. Expired certificates cause immediate, visible failures, not degraded performance.
- Different SANs across nodes — if some nodes serve a certificate that does not cover the requested hostname, the client's hostname verification fails and the connection is rejected. This is a hard failure with no fallback.
- Different TLS versions — some nodes still negotiating TLS 1.0/1.1 while others use 1.3. Security scanners will flag the entire domain as supporting deprecated protocols even if most nodes do not.
- Different ALPN — some nodes advertising HTTP/2 (h2), others only HTTP/1.1. Clients connecting to HTTP/1.1 nodes get degraded performance without visible errors.
What to look for
A multi-IP consistency check compares the TLS handshake output across all IPs a domain resolves to. Key comparison dimensions:
- Certificate serial number and fingerprint — the most direct comparison. Different serial numbers across IPs confirm different certificates are being served.
- Certificate expiry date — identical certs will have the same expiry. Divergent expiry dates mean some nodes renewed and others did not.
- SAN list — should be identical across all nodes. Divergent SANs indicate nodes serving different certificates, which may cover different subdomains.
- Negotiated TLS version — using a client that supports TLS 1.3, check which version is actually negotiated on each IP. Nodes still on older TLS versions will negotiate 1.2 or lower.
- ALPN protocol — h2 vs. http/1.1 affects performance. Inconsistency here indicates nodes with different server configurations.
- Certificate chain — the server should send the full chain (leaf + intermediates). Some nodes may be misconfigured to send only the leaf, causing chain validation failures on clients that do not have the intermediate cached.
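Most of these dimensions fall straight out of a TLS handshake with the Python standard library. A sketch with hypothetical helper names `probe_ip` and `find_mismatches`; serial numbers and SANs would need a full certificate parser, so the fingerprint stands in here as the most direct comparison:

```python
import hashlib
import socket
import ssl

def probe_ip(ip, hostname, port=443, timeout=5):
    """Handshake one IP, always presenting the same SNI, and record the
    dimensions worth comparing across the fleet."""
    ctx = ssl.create_default_context()  # verifies cert and hostname
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            der = tls.getpeercert(binary_form=True)
            cert = tls.getpeercert()  # parsed fields (validated certs only)
            return {
                "fingerprint": hashlib.sha256(der).hexdigest(),
                "expiry": cert.get("notAfter"),
                "tls_version": tls.version(),
                "alpn": tls.selected_alpn_protocol(),
            }

def find_mismatches(results):
    """Given {ip: probe result}, return only the dimensions that differ."""
    mismatched = {}
    for key in ("fingerprint", "expiry", "tls_version", "alpn"):
        values = {ip: r[key] for ip, r in results.items()}
        if len(set(values.values())) > 1:
            mismatched[key] = values
    return mismatched
```

Run `probe_ip` against every resolved IP, feed the results to `find_mismatches`, and an empty result means the fleet is consistent on these dimensions; a non-empty result names the dimension and the offending IPs.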
CDN-specific patterns
CDNs offer two modes for TLS termination: a shared certificate (the CDN's own certificate covering many customer domains via SANs or SNI-based routing) and a custom certificate (your certificate, uploaded to or provisioned by the CDN). The failure modes differ by mode.
With shared certificates, the CDN manages the certificate and rotation. You have no control over the specific certificate details, and inconsistency across edge nodes is the CDN's operational problem, not yours. If you observe inconsistency, it is a CDN issue to report.
With custom certificates, you upload and manage the certificate. Push operations go to the CDN's control plane, which distributes to edge nodes. A partially completed push — interrupted by an API error, rate limit, or network issue — can leave some edge nodes on the old certificate. Most CDN dashboards provide a certificate status view that shows propagation state per region or PoP.
Certificate pinning interactions are a specific hazard. If a mobile application pins your certificate's fingerprint (or the issuer's), a certificate rotation that lands on some CDN nodes before others can cause the application to fail on connections to un-rotated nodes. If you use certificate pinning, all nodes must rotate atomically — or pinning must be done at the SPKI level with backup pins to accommodate rotation.
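The SPKI-plus-backup-pin scheme reduces to checking the presented key's hash against a small set. A sketch with hypothetical helper names; the byte strings in the comments are placeholders for real DER-encoded SubjectPublicKeyInfo structures:

```python
import base64
import hashlib

def spki_pin(spki_der):
    """HPKP-style pin: base64 of SHA-256 over the DER SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode()

def pin_ok(presented_spki_der, pinned):
    """Accept if the presented key matches ANY pin in the set. The backup
    pin (for the next key) is what lets a rotation land node by node
    without locking clients out of the un-rotated nodes."""
    return spki_pin(presented_spki_der) in pinned

# Pin set shipped in the app: current key plus a backup for the next key.
# current_spki, next_spki = ...  # DER SubjectPublicKeyInfo bytes
# pins = {spki_pin(current_spki), spki_pin(next_spki)}
```

Because the pin covers the public key rather than the certificate, a reissued certificate for the same key also passes, which is what makes mid-rotation fleets survivable for pinned clients.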
Detection and monitoring
Effective TLS consistency monitoring resolves all DNS A and AAAA records for a hostname, opens a TLS connection to each resulting IP (with the same SNI), and compares the handshake output. A test that checks only one IP provides false confidence in multi-IP deployments.
For proactive monitoring, increase the check frequency during and after any certificate deployment. Certificate expiry monitoring should check all IPs, not just one: an alert driven by a single probe IP can stay silent because that IP received the renewed certificate, while other IPs are serving one that has already expired.
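Keying the expiry alert off the worst node is a small amount of logic once every IP's notAfter value is collected. A sketch, assuming the date strings use the format Python's ssl module emits for that field:

```python
from datetime import datetime, timezone

# ssl.SSLSocket.getpeercert() formats notAfter like "Jun  1 12:00:00 2026 GMT"
CERT_DATE_FMT = "%b %d %H:%M:%S %Y %Z"

def min_days_remaining(not_afters, now=None):
    """Given the notAfter string from every probed IP, return the soonest
    expiry in whole days. The alert must key off the worst node, not
    whichever IP the monitoring probe happened to hit."""
    now = now or datetime.now(timezone.utc)
    deadlines = [
        datetime.strptime(s, CERT_DATE_FMT).replace(tzinfo=timezone.utc)
        for s in not_afters
    ]
    return min((d - now).days for d in deadlines)
```

Alert when the returned value drops below the renewal threshold; with divergent expiry dates across the fleet, the minimum is the only number that matters.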
For CDN deployments, also check the origin TLS separately from the edge TLS. A monitoring check against the CDN hostname validates the edge certificate; a check against the origin IP (bypassing the CDN) validates the origin-to-CDN path. Both are independently important.
Fixing mismatches
For CDN-propagated mismatches, the first step is verifying that the certificate update was fully propagated via the CDN's control plane. Most CDNs provide a propagation status API or dashboard. If propagation is stuck, re-triggering the deployment operation usually clears it.
For bare-metal or VM-based multi-server deployments, the fix depends on the deployment mechanism. If using Ansible, identify which hosts failed and re-run the certificate deployment play against only those hosts. If using a configuration management tool with convergence semantics (Chef, Puppet), a convergence run will resolve the drift automatically on the next check interval.
For ACME-based certificates on multi-node setups, ensure the ACME client is running and has valid credentials on every node. Shared certificate storage (a mounted NFS volume or a secrets manager) with a single point of renewal is simpler than per-node renewal and avoids the per-node ACME requests that can exhaust rate limits, though it requires careful access controls.
After any corrective action, run the multi-IP consistency check again to confirm all nodes are in the expected state before closing the incident.