Quantum Archivist Manifesto, Part VI: It’s All About the Package

I’m really fond of creating lists of slogans that encapsulate larger ideas about the work archivists do. Lately, I’ve been thinking a lot about information packages and the OAIS model of SIPs, AIPs, and DIPs (submission, archival, and dissemination information packages). In discussions with friends and colleagues, I’ve also trotted out a lot of quantum archives theory to see how it measures up to the package approach to archives. It seems to me that digital information packages and quantum archives have a lot in common. Looking back over the blog posts of the last couple of years, and thinking about how all this might fit together, I’ve formulated a new list of slogans for the quantum universe, or what I have taken to calling the second generation digital repository. I haven’t attributed the origins of all of the ideas below, but regular readers of the Quantum Archivist should be able to pick out where they come from.

We begin with the list and follow with a bit of exposition and expansion of its items. Right now there are five principles on the list; maybe it will grow, maybe it will shrink. We’ll see…

Five Principles of the Second Generation

  1. All digital content is data
  2. All data that has value should be managed
  3. The package is the smallest unit of management
  4. All pointers refer to the “original” resource
  5. Digital curation preserves access, not objects

What is data?

Data can be defined as any information suitable for manipulation, use, or reuse in an electronic environment. This includes metadata, which is the “sum total of anything we know about an object,” as well as the digital content files themselves (electronic records, primary content objects, etc.), which by their binary nature are inherently data.

What is a digital asset?

A digital asset is a set of data elements (metadata of all types, primary content objects, associative information, system information) combined into a “package” that is internally coherent and can be managed in an electronic environment by applications and processes.

We acquire digital assets through analog to digital conversion, born digital acquisition, and metadata creation in associated systems.
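To make the idea of a package a little more concrete, here is a minimal sketch of one as a data structure. The class names, field names, and the coherence check are all hypothetical choices for illustration; they are not drawn from OAIS or from any particular repository system.

```python
from dataclasses import dataclass, field


@dataclass
class ContentFile:
    """One primary content object inside a package (hypothetical shape)."""
    name: str
    role: str      # e.g. "master" or "derivative"
    data: bytes


@dataclass
class Package:
    """A digital asset: content plus everything we know about it."""
    identifier: str
    descriptive: dict = field(default_factory=dict)  # titles, creators, dates
    technical: dict = field(default_factory=dict)    # formats, sizes, checksums
    files: list = field(default_factory=list)        # ContentFile instances

    def is_self_describing(self) -> bool:
        # Crude coherence check (an assumption, not a standard): the package
        # names itself, carries some description, and every file has a role.
        return bool(self.identifier and self.descriptive) and all(
            f.role for f in self.files
        )


letter = Package(
    identifier="example:letter-0001",  # made-up identifier
    descriptive={"title": "Letter, 1923"},
    files=[ContentFile("letter.dng", "master", b"\x00\x01")],
)
print(letter.is_self_describing())
```

The point of the sketch is only that metadata and content travel together under one identifier; a real package would carry far richer metadata.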

Managed data and packages

Managed data is data combined into a digital asset (or digital information) package that lives in an application allowing managers (and end users) to perform operations on the digital asset, including CRUD (create, read, update, delete) and preservation activities (from checksums to migration), based on rules and authorization.

Digital asset packages should be able to exist outside of a management application, but once outside they are no longer managed. Only managed data meets the standards of archival quality.
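As a rough illustration of what “managed” might mean in practice, the sketch below wraps CRUD operations in a simple authorization rule and records a checksum so that fixity can be verified later. The `ManagedStore` class, its method names, and the single “editors” rule are assumptions made up for this example; only `hashlib.sha256` is a real library call.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest used here as the fixity value."""
    return hashlib.sha256(data).hexdigest()


class ManagedStore:
    """Hypothetical management application for package content."""

    def __init__(self, editors):
        self._editors = set(editors)  # users allowed to change packages
        self._content = {}            # identifier -> bytes
        self._checksums = {}          # identifier -> recorded digest

    def _require_editor(self, user):
        if user not in self._editors:
            raise PermissionError(f"{user} may not modify packages")

    def create(self, user, identifier, data):
        self._require_editor(user)
        if identifier in self._content:
            raise KeyError(f"{identifier} already exists")
        self._content[identifier] = data
        self._checksums[identifier] = sha256_of(data)

    def read(self, identifier):
        return self._content[identifier]

    def update(self, user, identifier, data):
        self._require_editor(user)
        if identifier not in self._content:
            raise KeyError(identifier)
        self._content[identifier] = data
        self._checksums[identifier] = sha256_of(data)

    def delete(self, user, identifier):
        self._require_editor(user)
        del self._content[identifier]
        del self._checksums[identifier]

    def verify_fixity(self, identifier):
        # Preservation check: do the stored bytes still match the digest
        # recorded when the content was last written?
        return sha256_of(self._content[identifier]) == self._checksums[identifier]


store = ManagedStore(editors={"archivist"})
store.create("archivist", "example:letter-0001", b"page bytes")
print(store.verify_fixity("example:letter-0001"))
```

The same bytes sitting in a folder somewhere would still be a package, but nothing would be checking them or controlling who changes them.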

While a digital asset package is the smallest unit of management, the size and nature of a package can vary from package to package. In archival information systems, we make choices about the content and nature of the lowest level of managed data. For example, a set of page images of a book may exist as a single package with structural metadata that arranges them in order, or they could exist as a set of individual packages that have associated metadata that organizes them into their proper order. Either way, the end user sees a book with its pages in order.  This “lumper vs. splitter” decision is a choice we make at the institutional or even collection level.
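The book example can be sketched in a few lines. Both layouts below are hypothetical (the keys `structure`, `part_of`, and `sequence` are invented for this illustration), but they show how a lumped package and a set of split packages can present the same ordered pages to the end user.

```python
# Lumper: one package holds every page image, and structural metadata
# records the reading order.
lumped = {
    "id": "book-1",
    "files": ["cover.tif", "page-a.tif", "page-b.tif"],
    "structure": ["cover.tif", "page-a.tif", "page-b.tif"],
}

# Splitter: each page is its own package; associated metadata ties it to
# the book and gives its position. Stored out of order on purpose.
split = [
    {"id": "book-1-p3", "part_of": "book-1", "sequence": 3, "file": "page-b.tif"},
    {"id": "book-1-p1", "part_of": "book-1", "sequence": 1, "file": "cover.tif"},
    {"id": "book-1-p2", "part_of": "book-1", "sequence": 2, "file": "page-a.tif"},
]


def pages_from_lumped(package):
    """Read the page order straight from the structural metadata."""
    return list(package["structure"])


def pages_from_split(packages, book_id):
    """Collect the book's page packages and sort them by sequence number."""
    pages = [p for p in packages if p["part_of"] == book_id]
    return [p["file"] for p in sorted(pages, key=lambda p: p["sequence"])]


print(pages_from_lumped(lumped) == pages_from_split(split, "book-1"))
```

What differs is the unit of management: the lumper runs checksums, permissions, and migrations on one package, while the splitter runs them on each page.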

Originals and derivatives

As mentioned above, a digital asset package may contain any number of content and metadata objects. These objects can include both archival master files (what we tend to think of as the “originals”) and derivatives that support access and manipulation. In every case, derivatives are merely representations or transformations of the original object, whatever that may be, and any reference to the object refers to the original and NOT to the representation. For example, if a user interacts with a JPEG 2000 version of a DNG image file and would like to cite that image in a scholarly work, the citation should refer to a unique identifier of that resource and not the URL of the JPEG 2000 representation. The JPEG 2000 version the user interacts with today is only a convenience of access in a particular time and place, and is likely to be replaced with some other access derivative in the future. The question of what counts as the original gets a lot more complicated when analog to digital conversion is involved, and in the case of born digital text documents. Repositories must make policy decisions about what is considered the original.
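One way to picture the fourth principle is a small resolver in which the citation is built from a stable identifier while the access derivative behind it can be swapped at any time. Everything here is an assumption for illustration: the `Resource` fields, the registry, the identifier scheme, and the file names are made up.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    identifier: str         # stable identifier; this is what gets cited
    master: str             # archival master, treated here as the "original"
    access_derivative: str  # whatever representation is served today


registry = {
    "example:photo-0042": Resource(
        identifier="example:photo-0042",
        master="photo-0042.dng",
        access_derivative="photo-0042.jp2",
    )
}


def resolve(identifier: str) -> str:
    """Return the file currently served for access to this resource."""
    return registry[identifier].access_derivative


def citation(identifier: str) -> str:
    """Build a citation from the identifier, never from a file name."""
    return f"Repository resource {registry[identifier].identifier}"


before = citation("example:photo-0042")
# Years later the repository replaces the access derivative.
registry["example:photo-0042"].access_derivative = "photo-0042.png"
after = citation("example:photo-0042")
print(before == after, resolve("example:photo-0042"))
```

The citation survives the change because it never mentioned the derivative in the first place.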

Preservation, Access, and Curation

As Paul Conway said some years ago, “preservation is the action and access is the thing.” Digital preservation is the activity of ensuring access to digital content over time. This opens any number of possibilities and questions, especially in the born digital world. What is an original? What do we preserve: content or format? Or are there different answers at different times and in different situations?

However we answer policy questions like the ones above, the one answer that is always right is that good digital objects are never isolated or alone, but should always exist within the context of a package that is internally coherent and self-describing. As I like to say, if you found this digital object on the floor, you could pick it up and know all you needed to know about it just by looking inside.