
A Cloudflare outage disrupted a significant portion of the internet on Tuesday, preventing users from reaching platforms and services such as X, ChatGPT, Spotify, YouTube, and Uber. The cybersecurity firm has published a blog post discussing the incident.
Cloudflare co-founder and CEO Matthew Prince expressed his regrets in the post late Tuesday, labeling this outage the most severe since 2019. “In the past 6+ years, we haven’t experienced another outage that resulted in the majority of core traffic ceasing to flow through our network,” stated Prince. “On behalf of the whole team at Cloudflare, I want to apologize for the disruption we caused to the Internet today.”
Prince clarified that the outage originated from a problem with the system designed to safeguard websites against DDoS attacks.
Cloudflare’s Bot Management system shields websites from harmful bot activity, including DDoS attacks, content scraping, and credential stuffing. The system uses an AI model to score each request, estimating how likely it is to have been generated by a bot. The model evaluates numerous characteristics of the request, and the list of those characteristics is kept in a “feature file.”
The problem arose with the feature file, which is updated every five minutes to keep up with changing bot behavior. A change to the underlying query that generates the file caused it to include many duplicate entries, inflating the file and triggering an error in the Bot Management system.
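The failure mode described here can be illustrated with a minimal sketch. It assumes the consumer of the file enforces a fixed cap on the number of features; the article does not say how the error was triggered, so the cap, its value, the feature names, and the functions `load_features` and `dedupe` are all hypothetical and do not represent Cloudflare's code.

```python
from typing import Iterable, List

MAX_FEATURES = 5  # hypothetical cap, kept small for illustration


class FeatureFileError(Exception):
    """Raised when a feature file has more entries than the consumer allows."""


def load_features(rows: Iterable[str], max_features: int = MAX_FEATURES) -> List[str]:
    """Load feature names, failing if the file exceeds the assumed cap."""
    features = list(rows)
    if len(features) > max_features:
        raise FeatureFileError(
            f"{len(features)} features exceeds limit of {max_features}"
        )
    return features


def dedupe(rows: Iterable[str]) -> List[str]:
    """Drop duplicate rows while preserving first-seen order."""
    return list(dict.fromkeys(rows))


# Hypothetical feature names; the real file's contents are not public here.
good = ["ua_entropy", "header_order", "tls_fingerprint"]
duplicated = good * 2  # simulates a query change that returns every row twice
```

In this sketch, `load_features(good)` succeeds, `load_features(duplicated)` raises `FeatureFileError` because six rows exceed the cap of five, and `load_features(dedupe(duplicated))` succeeds again. Deduplicating before loading is one possible safeguard, not necessarily the fix Cloudflare applied.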
Consequently, attempts to access websites protected by Cloudflare’s Bot Management system returned an error code. Cloudflare noted significant network issues roughly 15 minutes after the feature file was updated.
At first, Cloudflare feared a malicious attack, as its status page went offline even though it operates independently of the company’s infrastructure. However, Prince clarified that this was coincidental. “The issue was not caused, either directly or indirectly, by a cyber attack or any kind of malicious activity,” Prince emphasized. After initially suspecting a hyper-scale DDoS attack, the team identified the underlying problem, halted the distribution of the larger-than-expected feature file, and replaced it with an earlier version.
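The recovery step, replacing the bad file with an earlier version, follows a common "last known good" pattern. The sketch below is an automated variant of that idea under stated assumptions: the class `FeatureStore`, the function `is_valid`, and the validation rule (no duplicates, at most five entries) are hypothetical, and the article does not describe the procedure Cloudflare actually followed.

```python
from typing import Callable, List, Optional


class FeatureStore:
    """Holds the active feature list and swaps it only when validation passes."""

    def __init__(self, validate: Callable[[List[str]], bool]) -> None:
        self._validate = validate
        self.active: Optional[List[str]] = None

    def publish(self, candidate: List[str]) -> bool:
        """Activate the candidate if valid; otherwise keep the previous version."""
        if self._validate(candidate):
            self.active = list(candidate)
            return True
        return False


def is_valid(features: List[str]) -> bool:
    """Hypothetical rule: at most five entries and no duplicates."""
    return len(features) <= 5 and len(set(features)) == len(features)


store = FeatureStore(is_valid)
store.publish(["a", "b", "c"])                 # accepted, becomes active
accepted = store.publish(["a", "b", "c"] * 2)  # rejected: duplicates, too large
```

Here the second `publish` call is rejected, so `store.active` still holds the earlier version. The design choice is to validate before activating, so that a bad update never displaces a working one.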
A Cloudflare representative also highlighted that “there [was] no evidence that [the outage] was the result of an attack or prompted by malicious activity.”
Cloudflare’s services were mostly restored within three hours and fully recovered after around five hours. Prince said the company is considering steps to avert similar outages in the future, including preventing error reports from overwhelming its systems.