Digital Preservation Framework

November 2019

Purpose

The Digital Preservation Framework formalizes Tufts University’s commitment to the long-term preservation of its digital resources of enduring value, thereby assuring long-term access to these resources. This document outlines the University’s commitment to a sustainable, standards-compliant, well-administered, and transparent digital preservation program.

Digital resources are subject to the same overarching criteria for curation -- selection, management, and preservation -- as other resources in the University’s permanent collections. Tufts Archival Research Center (TARC) retains primary preservation responsibility for all digital resources directly under its stewardship. Digital preservation decisions are made in reference to University records schedules and collection policies, with regard for the enduring value of specific digital resources and the feasibility of providing the necessary preservation services. Decisions about the need for and level of preservation are made at the time of acquisition, are governed by policies regarding the creation of digital resources, and in some cases are revisited after a designated period of time.

Preservation strategies involve local preservation solutions as well as partnering with trusted vendors. Preservation of digital resources may include any actions necessary to preserve enduring access to the content, ensure its authenticity, and mitigate the effects of technology or media obsolescence.

Standards Compliance

The staff at the University responsible for digital preservation will monitor standards developed by the digital preservation, library, and archives communities and strive to be compliant with all relevant standards. TARC will implement regular audits of digital preservation services and best practices in order to monitor this commitment. To this end, TARC will adhere to the principles outlined in the Open Archival Information System (OAIS) reference model for digital object lifecycle management, as depicted below.

[Figure: Visual representation of the OAIS reference model.]

[Figure: Tufts University interpretation of the OAIS reference model (the TARC OAIS instantiation).]

Additionally, TARC will adhere to principles and standards as laid out by DACS (Describing Archives: A Content Standard), and other relevant library standards including PREMIS and Dublin Core.

TARC uses and contributes to community standards and tools including Fedora, Samvera, Archive-It, and ArchivesSpace. See Appendix A for a list of community-based and standards-compliant tools used in its digital preservation efforts.

Administrative Responsibility

TARC maintains overall administrative responsibility for actions taken upon resources under its stewardship, ensuring the enduring preservation and accessibility of the University’s permanently valuable records and collections. School-based libraries also maintain some collections of enduring value. The priority and level of digital preservation actions taken will be dictated by the mandates and policies of local University units and will be supported by TARC.

Mandate

Digital preservation is central to the mission of Tufts Archival Research Center. TARC’s University Archives and Manuscripts Collections oversees the acquisition, arrangement, description, and use of the university’s collections of enduring value.

The University Records Policy defines TARC’s duty and authority, as mandated by the Board of Trustees, to preserve university records of enduring value in compliance with appropriate laws and regulations and university records schedules. University records are selected for preservation in accordance with the Records Retention Schedule, which outlines retention periods based on state and federal recordkeeping requirements, operational needs, and historical value.

The Tufts Archival Research Center (TARC) is the steward of university records of enduring value, in any form, that are entrusted to its care. As the archival repository for all university records, it has the duty and authority to collect, appraise, describe, preserve, and make available university records of enduring value in compliance with appropriate laws and regulations and university records schedules. 

TARC must preserve these records meaningfully, ensuring their authenticity and understandability into the future.

Specific University policies and mission statements set out preservation responsibilities for other digital resources.

Objectives

The primary intention of the digital preservation framework is to codify local digital preservation best practices in an effort to ensure future access to digital resources that are determined (by records schedules and collections policies) to be of enduring value to the University. Objectives include:

  • Minimizing risk of loss of core University records and information as determined by University records schedules and applicable laws.
  • Protecting University investment in digital collections.
  • Complying with preservation community standards and best practices.
  • Seeking to expand and develop digital preservation methods that are appropriate for the University community.
  • Assessing the risks for loss of content posed by technology variables such as proprietary file formats, applications, and obsolescence.
  • Evaluating the digital content to determine what type and level of format conversion (migration) or other preservation actions may be required.
  • Determining the appropriate type and level of metadata needed for content types and the relationship to the object(s).
  • Preserving both materials that originated in digital form (born digital) and those converted to digital form.

Organizational Viability

The digital preservation function is integrated into the operations and planning of the University and throughout the management stages of the digital content lifecycle.

Scope

The University, through TARC, accepts responsibility for preserving and making available digital content, associated documentation, and other metadata provided by depositors or created by TARC in accordance with University Records Schedules and collections policies. Not all digital resources acquired or created by the University will be preserved. As outlined by collection development documents, digital resources under the stewardship of TARC will take priority over those with less enduring value. Levels of preservation will depend on the feasibility of taking preservation actions on certain file format types. As technology constraints are lifted, the levels of preservation will be revisited.

Operating Principles

  • Access: Long-term access to selected digital content is the primary goal of all preservation activities.
  • Authenticity: We will strive to meet archival requirements pertaining to the provenance, chain of custody, authenticity, and integrity of institutional records and other digital resources.
  • Collaboration: Digital preservation is a complex endeavor that requires frequent collaboration. We will collaborate both within the University (including TARC, School-based Libraries, and Tufts Technology Services (TTS)) and with external partners to ensure preservation of University digital assets at appropriate levels.
  • Evolution and Iteration: We recognize that digital preservation is an ever-evolving field and will modify and adapt workflows to reflect current best practices. New tools and software will be systematically tested prior to implementation in production.
  • Intellectual Property and Copyright: Digital preservation actions will be in accordance with applicable intellectual property rights and laws.
  • Standards and Best Practices: We will strive to comply with well-established standards and community best practices when implementing digital preservation actions.
  • Sustainability: The University will ensure sufficient staffing and funding levels to maintain all digital preservation efforts.
  • Technology: Necessary servers and applications will be made available, developed, and maintained over the long term to ensure continuous support of digital preservation efforts. When local technology capacity needs to be supplemented by external technology providers, the University will partner with responsible, sustainable programs.
  • Training: The University will support training so that staff members responsible for digital preservation actions remain up to date with the latest trends, standards, and best practices in the field.
  • Transparency: All policies and procedures regarding digital preservation efforts will be made available to interested parties. Preservation actions performed on digital objects will be documented in the objects’ metadata.

Roles and Responsibilities

TARC will collaborate with other Tufts University units and libraries to help provide preservation services for all of the University’s digital assets at the proper level. TARC will be responsible for the University’s implementation of Preservica for the preservation of digital university records and archival objects. Preservica will be the principal tool for managing preservation copies, or archival information packages (AIPs), of digital objects. It will also be used to normalize files into preferred preservation formats and, when necessary, into access formats.

Through shared use of the digital repository, TARC works with other University units, such as TTS and the Tufts Libraries, to ensure proper management of digital objects that are accessed through various portals and stored in the Tufts Digital Repository.

Selection and Acquisition

The University records schedules and collection development policies maintained by TARC, as well as the Tufts Digital Repository Collection Development Policy, establish criteria for acquiring and providing access to digital content. Individual collection policies may exist to inform subject-specific or other needs. The University’s digital resources are subject to the same overarching criteria for curation, selection, management, and preservation as other resources in its collections. Collection specialists, who provide expertise on the enduring value of content, in consultation with preservation and information technology experts, make these decisions.

Access and Use

Without the preservation of digital materials, access would not be possible and essential University records and cultural heritage materials would be at risk. Access to preserved digital content is provided using the most appropriate technology available at the time of use. All preservation actions are performed with an eye towards long-term access of digital resources. When functionality of a resource must be maintained, all reasonable efforts to provide access to the original file will be made. In cases where the content of the resource is paramount, file conversion will occur to ensure greater format stability and enduring accessibility. The University commits to monitoring the evolution of technology supporting emulation and format conversion tools, as well as emerging file formats. Appropriate plans to make rendering the original version possible are devised on a case-by-case basis and revised as needed.

The University’s preservation policies comply with access restrictions as defined in all relevant laws, regulations, licenses, and deposit agreements.

Challenges

There are recognized challenges in implementing an effective and enduring digital preservation program, including:

  • Rapid growth and evolution: Technology that supports the variety of formats and dissemination mechanisms changes rapidly. Establishing a program that is responsive to rapid change is a challenge.
  • Content creator partnerships: Working with creators and providers of valued content to employ appropriate provisions prior to deposit will better facilitate future preservation but requires significant time and effort.
  • Sustainability: The need for effective and affordable cost models is widely acknowledged in the archives, library, and digital preservation communities. Complete implementation of digital preservation actions can be hindered by financial, technological, and staffing constraints. The scale of the digital preservation program is based on the level of the University’s commitment. The program should reflect reasonable expectations of requisite resources; the University should not promise more than can be delivered.

While the University is committed to overcoming the challenges to implementing a full suite of digital preservation services, it recognizes the gaps in current services. See Appendix B for identified gaps.

Financial Sustainability

The University has identified specific resources to support and enhance its digital preservation function.

Institutional Commitment

To sustain its digital preservation function, the University has allocated a portion of its permanent budget to digital preservation services; this includes the creation of TARC as a central University-wide department in 2001. In addition, TARC will continually seek opportunities and partnerships to extend its digital preservation scope and capabilities.

Cooperation and Collaboration

The University and TARC are committed to collaborating within the Tufts community and with other institutions to:

  • Advance the development of digital preservation efforts.
  • Share lessons learned with other digital preservation programs.
  • Extend the breadth of its available expertise.
  • Extend the digital content that is available within a broad information community to users through cooperative efforts.

Technological Support Levels

The University strives to meet the requirements for digital object management as laid out by the National Digital Stewardship Alliance (NDSA) levels of digital preservation, which focus on five aspects of preservation:

  • Storage and Geographic Location
  • File Fixity and Data Integrity
  • Information Security
  • Metadata
  • File Formats

See Appendix C for an explanation of the NDSA levels and the current self-assessment of the University.

System Security

From accessioning through processing to long-term stewardship, the University has checks in place to ensure the authenticity and integrity of digital resources. Fixity checks on digital objects will be conducted at regular intervals to protect against data loss.
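
As an illustration of what a fixity check involves, the following is a minimal sketch in Python, not a description of the tools TARC actually runs. It assumes a hypothetical manifest that maps relative file paths to SHA-256 digests recorded at accession; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare stored digests with freshly computed ones.

    `manifest` is a hypothetical mapping of relative path -> expected digest.
    Returns relative path -> "ok", "changed", or "missing".
    """
    results = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            results[relative_path] = "missing"
        elif sha256_of(target) == expected:
            results[relative_path] = "ok"
        else:
            results[relative_path] = "changed"
    return results
```

Running a check of this kind at regular intervals, and keeping the results, is what allows silent corruption or loss to be detected rather than discovered at the time of use.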

The Tufts Digital Repository (TDR) uses multiple pools of storage for archival and access content. The Dark Archives serves as an isolated data store with no public access layer, where restricted digital records are kept for an extra layer of security. Fedora’s internally managed content is backed up with its corresponding virtual machine (VM), and the various TDR databases are backed up nightly.

Procedural Accountability

As a proponent of good digital preservation practice, the University is committed to transparency in its policies and operations. All policies and procedures regarding digital preservation actions will be reviewed, approved, and made available to appropriate parties prior to implementation.

Audit and Transparency

The University is committed to a two-year cycle of self-assessment to evaluate, measure, and adjust the policies, procedures, preservation approaches, and practices of the digital preservation function.

Policy Framework Administration

This digital preservation framework was completed in the fall of 2018; approved/endorsed by the Office of the Provost in 2018; and updated in spring 2022. The University will review the framework every two years to ensure that it remains current and comprehensive as the digital preservation functions at the University evolve.

References

Preservation Policies:

Boston University Libraries. “BU Libraries Digital Preservation Policy.” Draft December 2011. https://www.bu.edu/dioa/openbu/boston-university-libraries-digital-preservation-policy/

Dartmouth College Library. “Dartmouth College Library Digital Preservation Policy.” Revised December 21, 2015. https://www.dartmouth.edu/~library/preservation/docs/dartmouth_digital_preservation_policy.pdf

Digital Creation and Preservation Working Group, U Mass Amherst. “UMass Amherst Libraries Digital Preservation Policy.” May 2011.  https://www.library.umass.edu/assets/Digital-Strategies-Group/Guidelines-Policies/University-of-Massachusetts-Amherst-Libraries-Digital-Preservation-Policy4-26-2013-templated.pdf

Illinois Digital Environment for Access to Learning and Scholarship (IDEALS). “IDEALS Digital Preservation Policy.” November 2009. https://www.ideals.illinois.edu/bitstream/handle/2142/2383/IDEALS_PreservationPolicy_Nov2009.pdf

Indiana University Digital Preservation. “Digital Preservation Policy Framework.” Modified March 28, 2017. https://wiki.dlib.indiana.edu/display/DIGIPRES/Digital+Preservation+Policy+Framework

Inter-university Consortium for Political and Social Research, University of Michigan. “ICPSR Digital Preservation Policy Framework.” Version 4 -- August 13, 2018. https://www.icpsr.umich.edu/icpsrweb/content/datamanagement/preservation/policies/dpp-framework.html

Michigan State University. “MSU TDR Digital Preservation Program.” Modified July 2, 2014. https://spartanarchive.files.wordpress.com/2011/04/msu_tdr_preservationprogram1.pdf   

MIT Libraries. “Digital Preservation Principles.” Updated December 2015. https://libraries.mit.edu/preserve/about/digital/digital-preservation-principles/

North Carolina Department of Cultural Resources. “North Carolina Digital Preservation Policy.” April 2014. http://digitalpreservation.ncdcr.gov/digital_preservation_policy_dcr.pdf 

Ohio State University Libraries. “Digital Preservation Policy Framework.” August 2013. https://library.osu.edu/documents/SDIWG/Digital_Preservation_Policy_Framework.pdf

Princeton University Library. “Princeton University Library Digital Preservation Framework.” Modified July 18, 2016. https://library.princeton.edu/sites/default/files/PUL-DP-Framework_v5.pdf

Purdue University Research Repository. “Purdue University Research Repository (PURR) Digital Preservation Policy.” Revised April 4, 2012. https://purr.purdue.edu/legal/digitalpreservation

Roper Center at Cornell. “Digital Preservation Policy.” Approved January 23, 2015. https://ropercenter.cornell.edu/digital-preservation-policy/

University of Minnesota Libraries. “Digital Preservation Framework.” Modified May 2, 2014. https://www.lib.umn.edu/dp/digital-preservation-framework

Yale University Library. “Yale University Library’s Digital Preservation Policy Framework.” November 2014. https://web.library.yale.edu/sites/default/files/files/YUL%20Digital%20Preservation%20Policy%20Framework%20V1%200.pdf

Tools:

Electronic Resource Preservation and Access Network. “ERPA Guidance: Digital Preservation Policy Tool.” September 2003. https://www.erpanet.org/guidance/docs/ERPANETPolicyTool.pdf

Northeast Document Conservation Center. “NEDCC Digital Preservation Policy Template.” December 19, 2008. https://www.nedcc.org/assets/media/documents/SoDAExerciseToolkit.pdf

Articles, reports, and presentations:

Beagrie, Neil, Najla Semple, Peter Williams, and Richard Wright. “Digital Preservation Policies Study.” October 2008. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.9056&rep=rep1&type=pdf

Bishoff, Liz. “Digital Preservation Plan: Ensuring Long Term Access and Authenticity of Digital Collections,” Information Standards Quarterly, 22 no. 2 (Spring 2010): 20-25, https://groups.niso.org/apps/group_public/download.php/4250/FE_Bishoff_Digital_Preservation_Plan_isqv22no2.pdf

Kenney, Anne and Nancy McGovern. “The Five Organizational Stages of Digital Preservation.” In Digital Libraries: A Vision for the 21st Century: A Festschrift in Honor of Wendy Lougee on the Occasion of her Departure from the University of Michigan, edited by Maria Bonn et al. Ann Arbor, MI: Michigan Publishing, 2003. https://quod.lib.umich.edu/cgi/t/text/text-idx?c=spobooks;idno=bbv9812.0001.001;rgn=div1;view=text;cc=spobooks;node=bbv9812.0001.001%3A11

National Digital Stewardship Alliance. “Checking Your Digital Content.” 2014. https://ndsa.org/documents/NDSA-Fixity-Guidance-Report-final100214.pdf

Noonan, Daniel. “Digital Preservation Policy Framework: A Case Study,” EDUCAUSE Review, July 28, 2014. https://er.educause.edu/articles/2014/7/digital-preservation-policy-framework-a-case-study

Appendix A: Tools used by the University to provide digital preservation services

Archive-It:

TARC uses Archive-It, a service of the Internet Archive, to crawl and preserve websites affiliated with Tufts University and manuscript collections in its holdings.

ArchivesSpace:

ArchivesSpace is an open source, web-based information management system that supports the core functions of archival administration, including accessioning, collections management, description and arrangement of both analog and digital materials, authority and rights management, and reference. As a standards-compliant metadata authoring tool, it is TARC’s system of record for descriptive metadata about archival materials, including born-digital and digitized materials.

Chronopolis:

Chronopolis is a service that provides long-term, distributed storage of digital objects. TARC uses this service to store a copy of all archival resources (objects and metadata) available in the Tufts Digital Library in each of Chronopolis’s three storage nodes. Use of Chronopolis will allow the content of the Tufts Digital Repository to be rebuilt in the case of disastrous failure of Tufts servers in Boston.

Preservica:

TARC will be responsible for the Tufts implementation of Preservica to provide the majority of digital preservation services for Tufts University records and other archives and manuscript collections. Preservica is a proprietary digital preservation system that can ingest submission information packages (SIPs) and create archival information packages (AIPs) and dissemination information packages (DIPs). Current practice is for TARC to use Preservica to normalize files for preservation, create AIPs, and securely store them on AWS servers for long-term preservation with links to ArchivesSpace, which stores descriptive metadata.

Tufts Digital Repository:

The Tufts Digital Repository is a Fedora-based system with Samvera layers for administrative and discovery interfaces. Tufts has migrated to Fedora 4, which allows for better file management and accepts more file types into the repository, including complex objects (for example, theses with supplementary materials). Fedora 4 also provides improved technical metadata about files in the repository, which facilitates digital preservation actions, including fixity checking, and helps ensure the authenticity of digital resources.

Samvera provides access layers for the Fedora repository. Tufts has several Samvera-based discovery portals to allow users specialized access to digital resources, but the primary access layer is the Tufts Digital Library, which contains archival description of collections, digital archival objects, and digital objects stewarded by Tisch Library.

Fedora and Samvera are community-developed applications and Tufts, namely TARC, Tisch, and Tufts Technology Services (TTS) staff, are active participants in those communities.

Appendix B: Gaps in digital preservation services

While TARC endeavors to provide sufficient digital preservation services to fulfill its mandate and responsibilities, deficiencies in existing technologies and systems mean that the highest possible level of digital preservation coverage is not always achievable with current resources. The following issues are acknowledged as gaps in current digital preservation services that should be remediated as possible:

  • There are no regular fixity checks currently in place on:
    • Items in the processing backlog
    • AIPs in archival storage (regular fixity checks will occur once everything is ingested into Preservica, but this is an ongoing process)
  • Redundancy and geographic distribution of files:
    • DuraCloud Chronopolis provides geographic distribution and redundancy of all TARC-stewarded objects available in the Tufts Digital Library; these are public files with no known restrictions. University records and archival content managed in Preservica are currently stored on AWS servers in Amazon’s US-East-1 region. Of most concern are the records most vital to the business needs of the University. Tape backups currently stored at Iron Mountain, although off-campus, are in the same general geographic area as the main servers. General best practice suggests that three copies of data be kept in geographically distinct locations. Neither the number nor the location of copies is currently ideal.
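
The three-copy guideline mentioned above can be expressed as a simple check. The sketch below is illustrative only: the `Copy` record, its field names, and the region labels are hypothetical and do not describe any inventory system in use at Tufts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    system: str  # hypothetical label for where the copy is held
    region: str  # hypothetical label for its geographic area


def meets_three_copy_rule(copies: list[Copy], minimum: int = 3) -> bool:
    """True when at least `minimum` copies exist in `minimum` distinct regions."""
    distinct_regions = {copy.region for copy in copies}
    return len(copies) >= minimum and len(distinct_regions) >= minimum


# Two copies held in the same general area fall short of the guideline.
example = [Copy("primary storage", "region-a"), Copy("tape backup", "region-a")]
print(meets_three_copy_rule(example))
```

The check deliberately treats both conditions separately, mirroring the observation above that the number and the location of copies are distinct concerns.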

 

Appendix C: Tufts University mappings to the NDSA Levels of Digital Preservation

 

Each category below lists the current TARC level (1-4), the NDSA definition for that level, and an explanation.

Category: Storage and Geographic Location

Current TARC level (1-4): 4

NDSA definition:

  • At least three copies in geographic locations with different disaster threats
  • Have a comprehensive plan in place that will keep files and metadata on currently accessible media or systems

Explanation:

Through the use of Chronopolis, many of TARC’s digital objects meet the redundancy and geographic distribution requirements. All metadata and digital objects in the TDL can be restored from Chronopolis in the case of disaster or institutional bankruptcy.

University records and other non-public archives and manuscript data will be stored in Preservica, creating two copies that are not geographically distributed. Tufts University datacenters, as well as those used by the Chronopolis nodes, all use current storage systems.

Category: File Fixity and Data Integrity

Current TARC level (1-4): 3

NDSA definition:

  • Check fixity of content at fixed intervals
  • Maintain logs of fixity info; supply audit on demand
  • Ability to detect corrupt data
  • Virus check all content

Explanation:

TARC creates checksums for all files upon accession and checks the fixity of files prior to processing and upon ingest to the TDR. Preservica validates bag checksums upon ingest.

Fixity information for all digital objects is always readily available for audit. Automated fixity checks in Fedora 4 run on a random selection of 2,000 objects a day, but no system exists for ensuring all objects are checked within a given time period.

TARC employs tools to detect corrupt data upon accession and processing of materials. TTS monitors corruption at the hardware level.

All files are scanned for viruses upon accession.

Category: Information Security

Current TARC level (1-4): 3

NDSA definition:

  • Maintains logs of who performed what actions on files, including deletions and preservation actions

Explanation:

Through the use of Bagger, most actions (virus check, stabilization procedures) are recorded in the bag info file, along with the name of the person who performed those actions. Deletions from the TDR are kept in log files. Preservation actions taken in Preservica are recorded in the XIP file on the object.

Category: Metadata

Current TARC level (1-4): 3

NDSA definition:

  • Store standard technical and descriptive metadata

Explanation:

All files ingested in the Fedora 4 repository are analyzed by FITS and technical metadata is recorded. Standards-compliant descriptive metadata is recorded in both ASpace and MIRA.

Category: File Formats

Current TARC level (1-4): 4

NDSA definition:

  • Perform format migrations, emulation and similar activities when needed

Explanation:

Through the use of Preservica, TARC is normalizing file formats for access and preservation, and will normalize files for access upon researcher request to the best of current capabilities. In cases where formats are too obscure to be migrated in house, users may be asked to contribute to the cost of file migration. Original files will always be available at no cost to users.

See Appendix D, below, for more details about file normalization.
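
The File Fixity and Data Integrity entry above notes that 2,000 objects are checked at random each day with no guarantee that every object is checked within a given period. The sketch below shows why that matters. It assumes each day's sample is drawn uniformly and independently of earlier days, which the framework does not state; the repository size used in the example is hypothetical, and the function names are illustrative.

```python
def probability_never_checked(total_objects: int, checked_per_day: int, days: int) -> float:
    """Chance that one given object is missed by every daily random sample."""
    if total_objects <= 0 or checked_per_day <= 0 or days < 0:
        raise ValueError("sizes must be positive and days non-negative")
    daily_miss = max(0.0, 1 - checked_per_day / total_objects)
    return daily_miss ** days


def days_for_full_rotation(total_objects: int, checked_per_day: int) -> int:
    """Days a deterministic rotation would need to touch every object once."""
    if total_objects <= 0 or checked_per_day <= 0:
        raise ValueError("sizes must be positive")
    return -(-total_objects // checked_per_day)  # ceiling division


# Hypothetical repository of 1,000,000 objects, 2,000 checks per day.
print(days_for_full_rotation(1_000_000, 2000))
print(probability_never_checked(1_000_000, 2000, 365))
```

Under these assumptions, random sampling leaves a non-zero chance that an individual object goes unchecked for any chosen period, whereas a rotation through the full inventory gives a fixed upper bound.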

 

Appendix D: File Normalization

File normalization for preservation will primarily occur during Preservica processes. When necessary, TARC will also use Preservica to convert files to access formats. TARC has enacted 247 normalization rules for access and preservation in Preservica, a number that will increase as TARC ingests new formats. TARC created some of the normalization rules manually to conform to its standards. Preservica has built-in conversion scripts for many formats, including most image formats and many video and audio formats. Working with Adobe products is also possible in Preservica, facilitating conversions to PDF and PDF/A, and from Illustrator and Photoshop.

Below is a breakdown of Preservica’s automated normalization rules, arranged by file format type.

 

Audio:

  • Access: convert to MP3 for 29 file formats; convert to WAV for 1 file format
  • Preservation: convert to WAV for 1 file format

Email:

  • Access: convert MSG files to EML with extracted attachments

Graphics:

  • Access: convert to PDF for 26 file formats
  • Preservation: convert to plain SVG for 27 file formats

Images:

  • Access: convert to JPEG for 46 file formats; convert to JPEG 2000 for 2 file formats; convert to TIF for 1 file format
  • Preservation: convert to TIF for 14 file formats; convert to PNG for 15 file formats

Text:

  • Access: convert to PDF for 30 file formats; convert to PDF/A for 1 file format
  • Preservation: convert to PDF/A for 9 file formats (all PDF versions); convert to ODT for 25 file formats

Video:

  • Access: convert to MP4 for 22 file formats
  • Preservation: convert to MKV for 3 file formats

Presentations:

  • Access: convert to PDF for 7 file formats
  • Preservation: convert to ODP for 6 file formats

Spreadsheets:

  • Access: convert to PDF for 22 file formats
  • Preservation: convert to ODS for 21 file formats
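
For readers who want to work with the breakdown above programmatically, it can be captured as a small lookup table. This is a sketch only: the dictionary layout, the lower-case format labels, and the `targets_for` helper are assumptions made for the example and are not Preservica's configuration format or API.

```python
# Target formats per category and purpose, transcribed from the list above.
NORMALIZATION_TARGETS = {
    "audio": {"access": ("mp3", "wav"), "preservation": ("wav",)},
    "email": {"access": ("eml",)},
    "graphics": {"access": ("pdf",), "preservation": ("svg",)},
    "images": {"access": ("jpeg", "jpeg2000", "tif"), "preservation": ("tif", "png")},
    "text": {"access": ("pdf", "pdfa"), "preservation": ("pdfa", "odt")},
    "video": {"access": ("mp4",), "preservation": ("mkv",)},
    "presentations": {"access": ("pdf",), "preservation": ("odp",)},
    "spreadsheets": {"access": ("pdf",), "preservation": ("ods",)},
}


def targets_for(category: str, purpose: str) -> tuple[str, ...]:
    """Return the target formats listed for a category and purpose.

    An empty tuple means the list above gives no rule for that combination.
    """
    by_purpose = NORMALIZATION_TARGETS.get(category.lower(), {})
    return by_purpose.get(purpose.lower(), ())


print(targets_for("Video", "preservation"))
print(targets_for("Email", "preservation"))
```

Which target applies to a particular source format is decided by the individual rules in Preservica, which this table does not reproduce.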