WARC (file format)

Web ARChive
Filename extension: .warc
Internet media type: application/warc[1]
Extended from: ARC[2]
Standard: ISO 28500:2017[3]
Open format?: Yes
Website: iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1-annotated/

The WARC (Web ARChive) archive format specifies a method for combining multiple digital resources into an aggregate archive file together with related information. The combined resources are saved as a WARC file, which can be replayed with appropriate software or used by archive websites such as the Wayback Machine.

The WARC format is a revision of the Internet Archive's ARC_IA File Format,[4] which has traditionally been used to store "web crawls" as sequences of content blocks harvested from the World Wide Web. The WARC format generalizes the older format to better support the harvesting, access, and exchange needs of archiving organizations. Besides the primary content currently recorded, the revision accommodates related secondary content, such as assigned metadata, abbreviated duplicate detection events (see §7.6 "revisit"), and later-date transformations.[5] The WARC format is inspired by HTTP/1.0 streams: each record has a similar header and uses CRLFs as delimiters, which makes the format well suited to crawler implementations.
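The HTTP-like record layout described above can be illustrated with a minimal sketch in Python. It assumes the general record shape of a version line, named header fields, a blank line, a content block of Content-Length bytes, and two trailing CRLFs. The helper names build_record and parse_records are hypothetical, the example URIs are placeholders, and the parser treats header names as case-sensitive and ignores compression, so it is an illustration rather than a conforming implementation.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_record(record_type: str, target_uri: str, payload: bytes) -> bytes:
    """Serialize one record: version line, named fields, blank line, block, two CRLFs.

    Hypothetical helper for illustration only.
    """
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.1" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line ends the header; two CRLFs follow the content block.
    return head + CRLF + payload + CRLF + CRLF


def parse_records(data: bytes):
    """Yield (version, headers, block) for each record in concatenated bytes.

    Simplified: header names are matched case-sensitively and the input
    is assumed to be uncompressed and well formed.
    """
    pos = 0
    while pos < len(data):
        head_end = data.index(CRLF + CRLF, pos)
        lines = data[pos:head_end].decode("utf-8").split("\r\n")
        version = lines[0]
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        block_start = head_end + 4
        # Content-Length, not a delimiter search, decides where the block ends,
        # so the block itself may contain CRLF sequences.
        length = int(headers["Content-Length"])
        block = data[block_start:block_start + length]
        pos = block_start + length + 4  # skip the two trailing CRLFs
        yield version, headers, block


if __name__ == "__main__":
    archive = build_record("resource", "http://example.com/a", b"hello")
    archive += build_record("resource", "http://example.com/b", b"second block")
    for version, headers, block in parse_records(archive):
        print(version, headers["WARC-Type"], headers["WARC-Target-URI"], len(block))
```

Because records are simply concatenated, a crawler can append each harvested resource to the end of the file as it is fetched, which is one reason the layout suits crawler implementations.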

First specified in 2008,[6] WARC is now recognised by most national library systems as the standard to follow for web archiving.[7]

  1. ^ IIPC (named reference; not defined in the source).
  2. ^ SourceForge (named reference; not defined in the source).
  3. ^ ISO (named reference; not defined in the source).
  4. ^ ARC_IA (named reference; not defined in the source).
  5. ^ DigitalPreservation (named reference; not defined in the source).
  6. ^ Arvidson (named reference; not defined in the source).
  7. ^ Allegrezza (named reference; not defined in the source).