We've recently experienced yet another case of a root certificate authority (CA from now on) losing control of its own certificates. And yet again, we have been waiting for either the CA or the browser to do something about it. This whole mess stems, once again, from both a governance and a technical problem. First, only the very same CA that issued a certificate can later revoke it. Second, although web browsers implement several techniques to check the certificate's revocation status, errors in the procedure are rarely considered hard failures.
OCSP stapling (verify here if your browser supports it) is the latest attempt at better managing the certificate revocation process. To address both privacy and scalability concerns, it lets the server itself provide the client with a verified statement of the certificate's validity. The client no longer contacts the CA. Instead, the server (the certificate holder) asks the CA for a time-stamped OCSP response, which is then "stapled" to any client-initiated TLS connection. The only drawback is that the protocol allows only one OCSP response, making web pages that use different certificates (for different resources) incompatible with this solution.
These three protocols often cooperate in a failover fashion. The client starts by sending a "certificate status request". If the server does not comply, the client queries the CA. If the CA does not comply either, the CRL is used. Note that this is just a best practice, and no standard dictates the intended behavior in the case of a failure (different browsers adopt different fallbacks, as discussed here). What is also appalling is that the user is never informed at any point in the process. In other words, the user may be relying on a CRL that is seven days old, yet still perceive the same level of security provided by the rather more advanced OCSP stapling.
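To make the failover chain above concrete, here is a minimal Python sketch. It is an illustration under assumptions, not how any particular browser is implemented: the names `Status`, `Verdict` and `check_revocation` are hypothetical, and each checker is a stand-in function rather than a real network call. A checker returns `None` when its source gives no usable answer, and the `hard_fail` flag decides what happens when none of the three mechanisms responds.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"


# A checker returns a Status, or None when its source gave no usable answer
# (no stapled response, OCSP responder unreachable, CRL download failed).
Checker = Callable[[], Optional[Status]]


@dataclass
class Verdict:
    accept: bool           # whether the connection would proceed
    source: Optional[str]  # which mechanism answered, None if none did


def check_revocation(checkers: List[Tuple[str, Checker]], hard_fail: bool) -> Verdict:
    """Try each mechanism in order and stop at the first one that answers."""
    for name, checker in checkers:
        status = checker()
        if status is None:
            continue  # this mechanism failed: fall through to the next one
        return Verdict(accept=(status is Status.GOOD), source=name)
    # Nobody answered: a soft-fail client accepts, a hard-fail client rejects.
    return Verdict(accept=not hard_fail, source=None)


def unavailable() -> Optional[Status]:
    """Stand-in for a mechanism that cannot be reached."""
    return None


# Hypothetical situation: stapling and OCSP both fail, only the CRL answers.
chain = [
    ("ocsp-stapling", unavailable),
    ("ocsp", unavailable),
    ("crl", lambda: Status.GOOD),
]
verdict = check_revocation(chain, hard_fail=False)
```

In this run the certificate is accepted on the word of the CRL alone, and the only trace of the downgrade is the `source` field, which is exactly the piece of information that never reaches the user. With every checker unavailable, the same call accepts the certificate when `hard_fail` is `False` and rejects it when `hard_fail` is `True`.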
Based on what we have discussed, it would be common sense to think that drafting a proper standard, updating all the implementations, and enforcing hard failures would somehow mitigate the risk of compromised TLS certificates. Unfortunately, this is hardly the case. The fundamental problem not addressed by any of these protocols is that, in some circumstances, we should stop trusting, albeit just temporarily, an entire CA (and not just a certificate). This may happen in two related scenarios:
- A certificate has been compromised, but the CA has not revoked it yet.
- The whole CA has been compromised, and we should revoke trust in all the certificates it issued.
Unfortunately, besides a discreet attempt known as Authority Revocation List, no protocol exists to revoke trust in a CA. The whole matter, in fact, is entirely delegated to the application (or operating system) that is in charge of setting up and updating the keychain of trusted CA certificates. The scenario grows grimmer if we consider that these updates do not rely on out-of-band distribution mechanisms (such as the OCSP protocol), but instead require the user to update the whole application (just check NSA's best practice on the matter to get the gist of it). In all recent CA fiascos, users have been left unprotected for several weeks due to this limitation. Further, although modern operating systems offer centralized certificate stores, each application (with some notable exceptions such as Google Chrome) typically relies on its own certificate store. The resulting scenario is rather surreal: if a CA is compromised, users cannot consider themselves secure until all application vendors release proper updates, and these updates are delivered and installed on their machines.
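The per-application nature of the problem is easy to see in a small Python sketch. Everything here is made up for illustration: the store names are hypothetical, the byte strings are placeholders rather than real DER-encoded certificates, and each trust store is reduced to a plain set of SHA-256 fingerprints.

```python
import hashlib
from typing import Dict, List, Set


def fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a certificate's DER encoding, as hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def still_trusting(stores: Dict[str, Set[str]], ca_fp: str) -> List[str]:
    """Names of the stores that still list the given CA as trusted."""
    return sorted(name for name, trusted in stores.items() if ca_fp in trusted)


# Placeholder byte strings stand in for real DER-encoded root certificates.
good_ca = fingerprint(b"placeholder: some healthy root CA")
bad_ca = fingerprint(b"placeholder: the compromised root CA")

# Hypothetical stores: the operating system's, plus one per application.
stores = {
    "operating system": {good_ca, bad_ca},
    "browser": {good_ca, bad_ca},
    "mail client": {good_ca, bad_ca},
}

# Assume only the browser vendor has shipped an update dropping the bad CA.
stores["browser"].discard(bad_ca)

# Every store listed here still accepts certificates issued by the bad CA.
exposed = still_trusting(stores, bad_ca)
```

After the browser update, `exposed` still names the mail client and the operating system: the user stays exposed until the last of these stores has been updated.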
Sadly, no practical solution is out there yet. We have been told that these incidents are so rare (well publicized, but rare) that replacing the current system as we know it is hardly justifiable. Personally, I only partially agree with that, as I believe some incremental updates are still possible, especially in the mitigation phase that follows such incidents. In a future article we will analyze a tentative solution and discuss its cost of deployment. Until then, so long!