These crawls are part of an effort to archive pages as they are created, along with the pages they refer to. That way, when a referenced page is changed or removed from the web, a link to the version that was live when the referring page was written will be preserved.
The Internet Archive hopes that references to these archived pages will then replace links that would otherwise be broken, or serve as companion links that let readers see what a page's authors originally intended.
The goal is to fix all broken links on the web.
Crawls of supported "No More 404" sites.
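As a rough illustration of the link-replacement idea described above, the sketch below builds a Wayback Machine address for a referenced page, using the time the referring page was written. It assumes the public `https://web.archive.org/web/<timestamp>/<url>` address form, in which the Wayback Machine serves the capture nearest to the given 14-digit UTC timestamp. The function names `wayback_url` and `companion_link` are hypothetical and are not part of any Internet Archive tool.

```python
from datetime import datetime, timezone

# Assumed public address prefix of the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, written_at: datetime) -> str:
    """Return a Wayback Machine URL for the capture nearest to written_at.

    written_at should be timezone-aware; it is converted to UTC because
    Wayback timestamps use the form YYYYMMDDhhmmss in UTC.
    """
    stamp = written_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


def companion_link(original_url: str, written_at: datetime) -> dict:
    """Pair an original link with its archived counterpart (hypothetical helper)."""
    return {
        "original": original_url,
        "archived": wayback_url(original_url, written_at),
    }


if __name__ == "__main__":
    written = datetime(2018, 1, 9, 14, 14, 38, tzinfo=timezone.utc)
    print(companion_link("https://example.com/page", written))
```

A site could show the archived address next to the original link, or swap it in only once the original stops resolving. Which of the two to do is a policy choice that this sketch leaves open.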
A daily crawl of more than 200,000 home pages of news sites, including the pages linked from those home pages. Site list provided by The GDELT Project.