Developer: Stanford University
License: BSD style
The LOCKSS ("Lots of Copies Keep Stuff Safe") project, under the auspices of Stanford University, develops and supports an open source peer-to-peer system that allows libraries to collect, preserve and provide their readers with access to material published on the Web. Its main goal is digital preservation.
The system attempts to replicate the way libraries do this for material published on paper. It was originally designed for scholarly journals,[1] but is now also used for a range of other materials. Examples include the SOLINET project to preserve theses and dissertations at eight universities,[2] US government documents,[3] and the MetaArchive Cooperative program preserving at-risk digital archival collections, including Electronic Theses and Dissertations (ETDs), newspapers, photograph collections, and audio-visual collections.[4] [5]
A similar project called CLOCKSS (Controlled LOCKSS) "is a tax-exempt, 501(c)(3), not-for-profit organization, governed by a Board of Directors made up of librarians and publishers." CLOCKSS runs on LOCKSS technology.
Traditionally, academic libraries have retained issues of scholarly journals, either individually or collaboratively, giving their readers access to the content received even after the publisher has ceased publication or the subscription has been canceled.[6] In the digital age, libraries often subscribe to journals that are available only digitally over the Internet. Although convenient for patron access, the model for digital subscriptions does not allow the libraries to retain a copy of the journal. If the publisher ceases to publish, the library cancels the subscription, or the publisher's website is simply down for the day, the content that has been paid for is no longer available.
The LOCKSS system allows a library, with permission from the publisher, to collect, preserve and disseminate to its patrons a copy of the materials to which it has subscribed as well as open access material (perhaps published under a Creative Commons license). Each library's system collects a copy using a specialized web crawler that verifies that the publisher has granted suitable permission. The system is format-agnostic, collecting whatever formats the publisher delivers via HTTP. Libraries that have collected the same material cooperate in a peer-to-peer network to ensure its preservation. Peers in the network vote on cryptographic hashes of the preserved content and a nonce; a peer that is outvoted regards its copy as damaged and repairs it from the publisher or other peers.[7] [8]
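The voting idea described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the LOCKSS protocol itself: the choice of SHA-256, the simple-majority rule, and the names `vote` and `run_poll` are all hypothetical. Each peer hashes a fresh nonce together with its own copy, so a peer cannot answer with a remembered hash and must actually hold the content at poll time.

```python
from __future__ import annotations

import hashlib
import os
from collections import Counter


def vote(content: bytes, nonce: bytes) -> str:
    """Hash the poll nonce followed by this peer's copy (SHA-256 is an assumption)."""
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(copies: dict[str, bytes], nonce: bytes) -> tuple[str, list[str]]:
    """Tally one poll; return the majority digest and the outvoted peers.

    `copies` maps a hypothetical peer name to that peer's stored bytes.
    """
    votes = {peer: vote(data, nonce) for peer, data in copies.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(votes):
        # No strict majority: this sketch treats the poll as inconclusive.
        raise ValueError("no majority; poll is inconclusive")
    outvoted = sorted(peer for peer, digest in votes.items() if digest != winner)
    return winner, outvoted


# Illustrative data: three libraries, one of which holds a damaged copy.
copies = {
    "library_a": b"Volume 12, Issue 3",
    "library_b": b"Volume 12, Issue 3",
    "library_c": b"Volume 12, Issue 3 (corrupted)",
}
_, outvoted = run_poll(copies, os.urandom(16))
for peer in outvoted:
    # An outvoted peer regards its copy as damaged and fetches a repair.
    copies[peer] = copies["library_a"]
print("repaired:", outvoted)
```

In this sketch the repair is copied from a peer that voted with the majority; the text notes that a repair may also come from the publisher.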
The LOCKSS license used by most publishers allows a library's readers access to its own copy, but does not allow similar access to other libraries or unaffiliated readers; the system does not support file sharing. On request, a library may supply another library with content to effect a repair, but only if the requesting library has proved in the past that it had a good copy by voting with the majority. If the reader's browser no longer supports the format in which the copy was collected, a format migration process can convert it to a current format.[9] These limits on the use that may be made of preserved copies of copyright material have been effective in persuading copyright owners to grant the necessary permission.[10]
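The repair restriction can be sketched in Python as follows. This is only an illustration of the rule stated above, assuming a peer keeps a record of which other peers agreed with the majority for each preserved unit; the class `RepairGate` and its methods are hypothetical names, not part of the LOCKSS software.

```python
from __future__ import annotations


class RepairGate:
    """Hypothetical record of which peers have proved they once held a good copy."""

    def __init__(self) -> None:
        # Pairs of (peer name, content unit id) seen voting with the majority.
        self._proved: set[tuple[str, str]] = set()

    def record_agreement(self, peer: str, unit_id: str) -> None:
        """Remember that `peer` voted with the majority for `unit_id`."""
        self._proved.add((peer, unit_id))

    def serve_repair(
        self, requester: str, unit_id: str, store: dict[str, bytes]
    ) -> bytes | None:
        """Return the stored content only to a peer that earlier proved possession."""
        if (requester, unit_id) not in self._proved:
            return None  # Refuse: supplying it would amount to file sharing.
        return store.get(unit_id)


# Illustrative data: one unit, one eligible requester, one ineligible requester.
store = {"journal-v12-i3": b"Volume 12, Issue 3"}
gate = RepairGate()
gate.record_agreement("library_c", "journal-v12-i3")

print(gate.serve_repair("library_c", "journal-v12-i3", store) is not None)
print(gate.serve_repair("library_x", "journal-v12-i3", store) is None)
```

Keying the record on both peer and content unit means that agreement on one title does not entitle a peer to repairs of another.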
The LOCKSS approach of selective collection with permission from the publisher, distributed storage, and restricted dissemination contrasts with, for example, the Internet Archive's approach of omnivorous collection without permission from the publisher, centralized storage, and unrestricted dissemination. The LOCKSS system is far smaller, but it can preserve subscription materials to which the Internet Archive has no access.
Since each library administers its own LOCKSS peer and maintains its own copy of preserved material, and since there are libraries doing so worldwide (see the list of participating libraries below), the system provides a much higher degree of replication than is usual in a fault-tolerant system. The voting process makes use of this high degree of replication to eliminate the need for backups to off-line media, and to provide robust defenses against attacks aimed at corrupting preserved content.[11]
In addition to preserving access, libraries have traditionally made it difficult to rewrite or suppress printed material. The existence of an indeterminate but large number of identical copies on a somewhat tamper-resistant medium under many independent administrations meant that attempts to alter or remove all copies of a published work would likely both fail and be detected. Web publishing, based on a single copy under a single administration, provides none of these safeguards against subversion, and is therefore well suited to rewriting history. By preserving many copies under diverse administration, by automatically auditing the copies at intervals against each other (and, in the future, against the publisher's copy), and by alerting libraries when changes are detected, the LOCKSS system attempts to restore many of these safeguards in the now-digital world of publication.
Before implementing a LOCKSS system, a prospective user should consider several questions carefully to make sure that the preserved content can be verified, evaluated, and audited. These include "What are your procedures?", "What are your methods?", "How is this system evaluated?", and "What is your disaster preparedness program?". The answers enable the user to evaluate the system, create a successful maintenance plan for their materials, and back the system with a carefully evaluated support structure.
The source code for the entire LOCKSS system carries BSD-style open-source licenses and is available from GitHub.[12] LOCKSS is a trademark of Stanford University.