The Wayback Machine is an Internet tool that has archived over 279 billion web pages,1 for future reference. Its goal is to provide reference material to anyone looking to find old versions of websites and web pages.
The Wayback Machine also provides old versions of web pages that are no longer active. Thus, even if a page no longer exists anywhere on the internet, the Wayback Machine may still hold a cached version of it for reference.
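A lookup of this kind can also be scripted. The following is a minimal sketch, assuming the Internet Archive's public availability endpoint at `https://archive.org/wayback/available` and the JSON shape it is generally documented to return (an `archived_snapshots` object with a `closest` entry). The function names are illustrative only and are not part of any official client.

```python
import json
from urllib.parse import urlencode

# Assumption: the Internet Archive's public availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build a query URL asking for the snapshot closest to `timestamp`.

    `timestamp` is an optional digit string such as "19981111" (YYYYMMDD).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from a JSON response, or None.

    None means the response listed no available snapshot for the page.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]
```

The URL from `build_availability_url` can be fetched with any HTTP client, for example `urllib.request.urlopen`, and the response body passed to `closest_snapshot`. Keeping the network call outside these helpers makes them easy to check against a saved response.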
The tool matters all the more because the average lifespan of a web page is short, roughly a hundred days.2
The internet is currently home to an estimated 1 billion websites 3 (not web pages), and the Wayback Machine has cached over 361 million of them.
Wayback Machine also caches multimedia
The tool caches not only web pages but also a host of other media, including audio, video, plain text, images and even software. As of 02-02-17, the Wayback Machine has archived:
- 279 billion web pages
- 3,264,273 audio files
- 4,274 feature films
- 1,393,354 images
- 154,935 software tools
- 11,340,733 textual materials
All of the above cached items are cleanly and intuitively segmented with respect to language, subjects, creators and types, and can be easily and openly accessed by anyone.
Google’s homepage in 1998
Google was founded on 4th September, 1998. The Wayback Machine’s earliest caches of Google’s homepage were taken on November 11th, 1998, and then on December 2nd, 1998.4
The cached page carried text stating that the Google index then held over 25 million results, with the promise that it was ‘soon to be much bigger’. The ‘Google Search’ and ‘I’m feeling lucky’ buttons were present, along with an email subscription box. Other links pointed to the Stanford Search and Linux Search modules.
Use cases of Wayback Machine
In 2016, the United States District Court in Kansas accepted Wayback Machine screenshots as proof of copyright infringement.5 The judge stated that screenshots taken via the tool were legitimate and could be submitted as evidence. There are other reported cases in which the tool has been used in IP litigation.
Research and Data mining
The Wayback Machine’s vast and openly available archive can be used for data mining. Once the required data has been mined, further research can be carried out on it, and the possibilities are wide-ranging. A study published in 2015 by Sanjay K. Arora, Yin Li, Jan Youtie and Philip Shapira used the Wayback Machine for data mining and research purposes.6
As previously mentioned, the tool is a powerful reference engine that can be used to find cached versions of websites and other multimedia.
Wayback Machine statistics
- It was launched in October, 2001
- The Archive.org website is a highly popular destination on the internet, with a traffic rank of #388 globally and #300 in the USA.7
- The website receives roughly 72 million monthly visitors.
- It was reported in 2014 that the Wayback Machine hosts over 9 petabytes of information.8 One petabyte = 1,000 terabytes, or 1,000,000 gigabytes.
- Brewster Kahle, a 56-year-old MIT graduate, is the founder of the Wayback Machine
- Mark Graham acts as the organisation’s Director
- Andy Bezella is the senior sys-admin at Wayback Machine
- Hank Bromley is the organisation’s Principal Engineer
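To put the storage figure in the statistics above into perspective, here is a small worked conversion. It is only a sketch: it assumes decimal (SI) units, where each step is a factor of 1,000, and the function names are illustrative.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
TB_PER_PB = 1000
GB_PER_TB = 1000


def petabytes_to_terabytes(petabytes):
    """Convert petabytes to terabytes."""
    return petabytes * TB_PER_PB


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes."""
    return petabytes * TB_PER_PB * GB_PER_TB


print(petabytes_to_terabytes(9))  # 9000
print(petabytes_to_gigabytes(9))  # 9000000
```

Under these units, the 9 petabytes reported in 2014 correspond to 9,000 terabytes, or 9 million gigabytes.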
Alternatives to Wayback Machine
Screenshots.com is an online archiving portal that takes screenshots of web pages and stores them on its website for future reference. You can also search the tool by keyword or key phrase.
Archive.is, like the Wayback Machine, keeps a repository of archived web URLs. However, when we ran a query for a cached version of Microsoft’s homepage, the oldest version in its repository dated back only to 23rd May, 2016. The Wayback Machine’s oldest cached version dated back to 1996.
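The "oldest cached version" check can be approximated in code. The sketch below assumes the Wayback Machine's CDX search endpoint at `https://web.archive.org/cdx/search/cdx`, and assumes that with `output=json` it returns a header row followed by capture rows listed oldest first. Both are assumptions about the service rather than facts from this article, and the function names are hypothetical.

```python
import json
from datetime import datetime
from urllib.parse import urlencode

# Assumption: the Wayback Machine's CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_oldest_capture_url(page_url):
    """Build a CDX query that asks for a single capture row as JSON."""
    params = {"url": page_url, "output": "json", "limit": 1}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_first_capture(response_text):
    """Return the capture time of the first data row, or None if absent.

    Assumes the first JSON row is a header naming the fields and that
    timestamps are 14-digit strings (YYYYMMDDhhmmss).
    """
    if not response_text.strip():
        return None
    rows = json.loads(response_text)
    if len(rows) < 2:
        return None  # header only, or an empty list: no captures
    record = dict(zip(rows[0], rows[1]))
    return datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")
```

Fetching `build_oldest_capture_url("microsoft.com")` with an HTTP client and passing the body to `parse_first_capture` should give the date of the earliest capture, which can then be compared with the oldest date another archive reports.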