Drexel Library

University Archives Web Archiving Program

This guide is for the Drexel Archive's Archive-It web archiving program.

Acquisition of Web Materials

Method of Capture 

Archive-It is a web crawling service that incorporates a version of the open-source crawling software Heritrix. In addition to text, the site capture includes audio, video, images, and embedded documents whenever possible.

The cost of storage may require us to limit the number or frequency of crawls and to set priorities among the many websites within our collecting scope. Websites containing private information, or information the creator does not wish to have archived, will not be crawled. See our Opt-Out Statement.


Authenticity of the crawl is tracked through Archive-It, and DUA staff do their best to check the accuracy and accessibility of crawls. Efforts are made to ensure that crawled sites resemble the original site as closely as possible, preserving the original context and layout of the content. If you come across an archived website that does not match the catalog or Archive-It record, please notify the Archives.

What is the Internet Archive?

The Internet Archive

Wayback Machine

Preservation and Access

Archived websites are freely accessible via the DUA’s Archive-It page, where DUA staff add website-level metadata to support browsing and searching. Archived websites are available in the Wayback Machine approximately 24 hours after a crawl, but full-text search indexing may take up to one week to finish processing.

All archived sites are clearly labeled as such to distinguish them from the live site. Each archived site appears with a header that lists the date and time the website was captured. The capture date and time also appear in the Wayback URL as a 14-digit timestamp. For example, a URL containing 20211214001216 was captured at 00:12:16 (12:12 am) on 12/14/2021.
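To read the capture date and time out of a Wayback URL programmatically, a minimal Python sketch follows. It assumes the timestamp is the first 14-digit path segment in the form YYYYMMDDhhmmss, optionally followed by a two-letter replay modifier such as `if_`. The function name `capture_time`, the host `wayback.example.org`, and the collection number `0000` are hypothetical placeholders, not actual DUA addresses.

```python
import re
from datetime import datetime

# Assumption: the capture timestamp is the first 14-digit path segment
# (YYYYMMDDhhmmss), optionally followed by a replay modifier like "if_".
_TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def capture_time(wayback_url: str) -> datetime:
    """Return the capture date and time encoded in a Wayback-style URL."""
    match = _TIMESTAMP.search(wayback_url)
    if match is None:
        raise ValueError(f"no 14-digit capture timestamp in {wayback_url!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


# Hypothetical URL for illustration only; host and collection are placeholders.
example = "https://wayback.example.org/0000/20211214001216/https://example.edu/"
print(capture_time(example).isoformat())  # 2021-12-14T00:12:16
```

The returned value carries no time zone. If you compare captures against local events, confirm which time zone the timestamps use before drawing conclusions.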

When you follow a link on an archived webpage, it may take you to a page from a different capture date. For example, clicking "The Drexel Difference" on the example page above leads to a page that was captured at 04:37:29 (4:37 am) on 11/19/2021. Archive-It displays the most recent capture of the linked page, which may be older than the page you started from.