The Internet Archive, a non-profit organization, has been experiencing a series of cyberattacks throughout October 2024.
While cybersecurity discussions revolve around topics like quantum encryption and zero-trust architecture, the Internet Archive just lost 7TB of data through an exposed GitLab authentication token.
This token, reportedly publicly accessible since December 2022, led to one of the most significant breaches of 2024, affecting a reported 33 million users (haveibeenpwned.com lists 31,081,179 accounts).
What is the Internet Archive?
The Internet Archive is a non-profit online archive of web pages, images, historical documents and books. It was set up in 1996 by Brewster Kahle, a US IT specialist.
Small oversight, massive impact
It all started on October 9th with a data breach and a Distributed Denial-of-Service (DDoS) attack.
The first breach was reported on a development server (services-hls.dev.archive.org), where attackers made use of an exposed GitLab configuration file.
The exposed token had broad permissions: access to repository contents, CI/CD pipelines, and project configurations.
The leak affected millions of people, and of course we blame the attackers for acting the way they did. We also understand the value of the website and commend the non-profit organization for its work. Still, we hope that lessons are learned, because mistakes were made by the Internet Archive’s developers.
Finding a private key on a public-facing server is equivalent to finding admin credentials in plain text. Fundamental cybersecurity principles were ignored.
Elevating permissions to admin level, or the equivalent in the pre-SVN era of granting folders 777 permissions, contradicts an important principle in software development: the principle of least privilege.
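To make the 777 comparison concrete, here is a minimal Python sketch that flags files and folders anyone on the system can write to. The function names are our own and purely illustrative; nothing here reflects how the Internet Archive's servers were actually configured.

```python
import os
import stat


def is_world_writable(mode: int) -> bool:
    """Return True if the 'others' write bit is set in a file mode."""
    return bool(mode & stat.S_IWOTH)


def find_world_writable(root: str) -> list[str]:
    """Walk a directory tree and collect entries that any user can write to.

    Illustrative helper, not a complete audit: it ignores ACLs and
    skips symbolic links, whose own mode bits are not meaningful.
    """
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue
            if is_world_writable(mode):
                flagged.append(path)
    return sorted(flagged)
```

A mode of 777 sets the write bit for owner, group and everyone else, so `is_world_writable(0o777)` is true while a more restrictive 755 is not flagged. The same idea applies to tokens: grant only the scopes a job actually needs.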
The website remained offline on Friday and was later restored in a read-only state. Its founder confirmed the major cyberattack, and the stolen data was initially said to include email addresses, screen names, and bcrypt-hashed passwords.
What we know: DDOS attack–fended off for now; defacement of our website via JS library; breach of usernames/email/salted-encrypted passwords.
What we’ve done: Disabled the JS library, scrubbing systems, upgrading security.
Will share more as we know it.
— Brewster Kahle (@brewster_kahle) October 10, 2024
Troy Hunt, founder of Have I Been Pwned (HIBP), described the timeline of events on X.
Let me share more on the chronology of this:
30 Sep: Someone sends me the breach, but I'm travelling and didn't realise the significance
5 Oct: I get a chance to look at it – whoa!
6 Oct: I get in contact with someone at IA and send the data, advising it's our goal to load…— Troy Hunt (@troyhunt) October 9, 2024
Anonymous attackers
As of today we don’t know which person or group breached the website, although a group known as SN-Blackmeta took responsibility for the DDoS attacks.
We assume this was a black hat hacker, though we can’t exclude state-sponsored threat actors. Grey hat hackers are rarely any better, but in this breach the attacker, instead of cooperating and communicating, clearly went beyond the grey zone.
A continuation of the initial breach
Despite warnings about the compromise, some API tokens remained unchanged.
So the credentials sat exposed for almost two years. API tokens weren’t rotated, allowing attackers to maintain their access.
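A rotation policy can be checked mechanically. The sketch below assumes a simple inventory that maps token names to creation dates and an arbitrary 90-day limit; both the inventory format and the limit are our assumptions, not details from the incident.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy for illustration: rotate every token at least every 90 days.
MAX_TOKEN_AGE = timedelta(days=90)


def tokens_due_for_rotation(tokens, now=None, max_age=MAX_TOKEN_AGE):
    """Return the names of tokens older than max_age, oldest first.

    `tokens` maps a (hypothetical) token name to its timezone-aware
    creation time.
    """
    now = now or datetime.now(timezone.utc)
    overdue = [
        (created, name)
        for name, created in tokens.items()
        if now - created > max_age
    ]
    return [name for _, name in sorted(overdue)]
```

With a token created in December 2022 and a check run on October 9, 2024, the token is well over 600 days old and would be flagged long before an attacker could rely on it.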
The attackers then discovered more credentials hidden in the source code, including database access.
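Credentials committed to source code can often be caught before attackers find them. The following is a rough sketch of a line-based scanner, not a replacement for a dedicated secret scanning tool. The patterns are our own simplifications: `glpat-` is the default prefix of newer GitLab personal access tokens (older tokens may lack it, and we don't know the format of the token involved here), and the other two rules are generic heuristics.

```python
import re

# Simplified, illustrative patterns; real scanners use far larger rule sets.
PATTERNS = {
    "gitlab_personal_access_token": re.compile(r"glpat-[0-9A-Za-z_\-]{20,}"),
    "credentials_in_url": re.compile(r"https?://[^/\s:@]+:[^/\s:@]+@[^\s/]+"),
    "hardcoded_password": re.compile(
        r"(password|passwd|secret)\w*\s*[:=]\s*['\"][^'\"]+['\"]",
        re.IGNORECASE,
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Running `scan_text` over configuration files and source files in a pre-commit hook or CI job is one way to surface a token embedded in a Git remote URL before it reaches a public-facing server.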
The damage went deeper than just the user database. The attackers accessed Internet Archive’s Zendesk support system, exposing 800,000+ support tickets dating back to 2018. These ranged from basic tickets to ones containing personal ID documents submitted with removal requests.
To prove their point, the attackers sent emails through Internet Archive’s support system.
We will follow this development. For now, let’s hope fundamental security measures are in place.