The impact of pulling those certificates would be swift and severe. Once browsers like Chrome and Firefox found them missing, they would flash warnings to any visitors that the sites weren’t safe. Some browsers would block access altogether. A not insignificant chunk of the internet would effectively be taken out of commission. All because of this one small flaw in one niche corner of the Let’s Encrypt operation.
Within two minutes of confirming the bug, the Let’s Encrypt team stopped issuing any new certificates in a bid to stanch the bleeding. A little over two hours after that, they fixed the bug itself. And then they let everyone know what was coming.
“We can’t contact everybody, so we started contacting the largest subscribers, telling them about the situation, getting them as informed as possible,” says Aas. “And then we worked with them to get them to replace their certificates as quickly as possible.”
Once a site operator renewed a certificate, Let’s Encrypt could safely revoke the old one. No harm would befall the site. Which sounds like a simple enough solution—but nothing’s simple at this kind of scale.
Bigger organizations had an easier time fixing the problem, because they generally have the resources to monitor any signs of trouble that surface, and the tools to automate the renewal process. “If you’ve got a dozen or two dozen servers or something, that’s some poor sleepy-eyed soul in the middle of the night at a keyboard,” says MongoDB’s White. “We reissued a little over 15,000 certificates [for clients], and we did it in a few hours. There was some work involved but it wasn’t catastrophic. We had measures in place to be able to rotate quickly.”
Smaller sites got a big assist from the Electronic Frontier Foundation, which operates Certbot, a free software tool that automatically adds Let’s Encrypt certificates to sites and renews them every 60 days. In the last two months alone, Certbot has generated certificates for 19.2 million unique sites. “Fortunately we had anticipated the need to check revoked certificates for renewal in 2015,” says EFF engineering director Max Hunter. “Because Let’s Encrypt communicated the issue early and the code path for the query was already in place, our work was relatively straightforward.” By Tuesday a team from EFF, along with volunteers in Paris and Finland, had updated Certbot to renew any revoked certificates.
Meanwhile, Let’s Encrypt sent an email to every address it had on file. It created a searchable database of every affected domain so that hosting companies could see if they needed to act. “We marked those certificates as expired in our internal system, and then our normal automated processes kicked in to generate and deploy new certificates,” says Justin Samuel, CEO of Less Bits, a startup that operates hosting company ServerPilot.
On Tuesday night, 30 minutes before the deadline, Let’s Encrypt made another announcement. Of the 3 million potentially impacted sites, 1.7 million had managed to renew their certificates, an astonishing number given the short window of time. “No other CA comes close to making large-scale cert reissuing not only feasible but also fast,” says Samuel.
That success also emboldened Aas to make a difficult call. Let’s Encrypt would let the remaining certificates slide. “We made the decision that instead of breaking more than a million websites, potentially, we just aren’t going to revoke them by the deadline,” says Aas. “We think it’s the right decision for the health of the internet.”
It was the internet equivalent of a call from the governor minutes before midnight. Let’s Encrypt will continue to revoke certificates if it can confirm that the sites have renewed them, but otherwise is content to leave them be in their slightly broken form. The security risk is small, Aas says, and since Let’s Encrypt certificates are only viable for 90 days to begin with, any stragglers will have washed out of the ecosystem by summertime at the latest.
“If anything, this just reinforces that they are one of the most transparent, modern certificate authorities in the world,” says MongoDB’s White, who points to previous certificate snafus that for-profit companies like Symantec have badly mishandled. “It’s easy to armchair quarterback. But I think if people are overly critical that’s misplaced.”
The intricacies of internet infrastructure are generally ignored until something goes terribly wrong. This time, though, it’s useful to reflect on what went right. For once, the story is that nothing broke.