I honestly don’t know who this person is, but I thoroughly enjoyed reading this, and know from bitter experience that it is true.
Enjoy!
For those of you who may not have heard, a group of large institutions led by the Berkman Center at Harvard University is seeking to build what they are calling the Digital Public Library of America, or DPLA. Funded by the Sloan and Arcadia foundations to the tune of $5 million, the project has the potential to revolutionize the world of digital resources.
Honestly, I’m rooting for it. This is potentially the most transformative thing to come along since the printing press, no hyperbole there.
I’m here at the first open plenary meeting at the National Archives listening to presentations by the leaders and commentary by the audience.
What does this mean for archivists? If the DPLA actually succeeds, then our task in the future will be to make sure that our collections can become a part of this enormous resource. The DPLA is still in its infancy, and the meeting today is moving so fast it is hard to digest it all, but I can say that this is a project to watch.
More to come on this when I have a chance to think. You can go to the DPLA Web site, mentioned above, and see the live feed of the meeting, or watch the whole thing later. It is worth the time.
I’m really fond of creating lists of slogans that encapsulate larger ideas about the work archivists do. Lately, I’ve been thinking a lot about information packages and OAIS models of SIPs, AIPs, and DIPs. In discussions with friends and colleagues, I’ve also trotted out a lot of quantum archives theory to measure up to the package approach to archives. It seems to me that digital information packages and quantum archives have a lot in common. Looking back over the blog posts of the last couple of years, and thinking about how all this might fit together, I’ve formulated a new list of slogans for the quantum universe, or what I have taken to calling the second generation digital repository. I haven’t attributed the origins of all of these ideas below, but regular readers of the Quantum Archivist should be able to pick out where they come from.
We begin with the list and follow with a bit of exposition and expansion of the items in the list. Right now there are five principles on the list; maybe the list will grow, maybe it will shrink. We’ll see…
Data can be defined as any information suitable for manipulation, use, or reuse in an electronic environment. This includes metadata, which is the “sum total of anything we know about an object,” as well as the digital content files (electronic records, primary content objects, etc.) themselves, which, by their binary nature, are inherently data.
A digital asset is a set of data elements (metadata of all types, primary content objects, associative information, system information) combined into a “package” that is internally coherent and can be managed in an electronic environment by applications and processes.
We acquire digital assets through analog to digital conversion, born digital acquisition, and metadata creation in associated systems.
Managed data is data combined into a digital asset (or digital information) package that exists in an application that allows managers (and end users) to perform operations on the digital asset including CRUD (create, read, update, delete), and preservation activities (from checksums to migration) based on rules and authorization.
While digital asset packages should be able to exist outside of a management application, they would no longer be managed. Only managed data meets the standards of archival quality.
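To make the idea of managed data a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class names `DigitalAssetPackage` and `ManagedRepository`, the identifier `pkg-001`, and the simple list of authorized users are hypothetical and do not describe any particular repository system. It only shows the shape of the idea: a package carries its metadata, content, and checksums together, and the managing application applies CRUD and fixity checks under an authorization rule.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DigitalAssetPackage:
    """Hypothetical package: metadata plus content files plus checksums."""
    identifier: str
    metadata: dict
    files: dict = field(default_factory=dict)      # file name -> content bytes
    checksums: dict = field(default_factory=dict)  # file name -> checksum at ingest

    def add_file(self, name: str, data: bytes) -> None:
        # Record the checksum at the moment the file enters the package.
        self.files[name] = data
        self.checksums[name] = sha256_of(data)

    def failed_fixity(self) -> list:
        # Names of files whose current checksum no longer matches the record.
        return [n for n, d in self.files.items()
                if sha256_of(d) != self.checksums[n]]


class ManagedRepository:
    """Hypothetical managing application: CRUD plus a preservation audit."""

    def __init__(self, authorized_users):
        self._authorized = set(authorized_users)
        self._packages = {}

    def _check(self, user: str) -> None:
        # A stand-in for "rules and authorization".
        if user not in self._authorized:
            raise PermissionError(f"{user} may not modify managed data")

    def create(self, user: str, package: DigitalAssetPackage) -> None:
        self._check(user)
        self._packages[package.identifier] = package

    def read(self, identifier: str) -> DigitalAssetPackage:
        return self._packages[identifier]

    def update_metadata(self, user: str, identifier: str, **changes) -> None:
        self._check(user)
        self._packages[identifier].metadata.update(changes)

    def delete(self, user: str, identifier: str) -> None:
        self._check(user)
        del self._packages[identifier]

    def audit(self) -> dict:
        # Preservation activity: report fixity failures for every package.
        return {i: p.failed_fixity() for i, p in self._packages.items()}
```

The package object can exist on its own, but only once it sits inside something like the repository does anyone run the audit or enforce the rules, which is the distinction between data and managed data.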
While a digital asset package is the smallest unit of management, the size and nature of a package can vary from package to package. In archival information systems, we make choices about the content and nature of the lowest level of managed data. For example, a set of page images of a book may exist as a single package with structural metadata that arranges them in order, or they could exist as a set of individual packages that have associated metadata that organizes them into their proper order. Either way, the end user sees a book with its pages in order. This “lumper vs. splitter” decision is a choice we make at the institutional or even collection level.
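The book example can be sketched in a few lines of Python. The dictionary layouts, the key names (`structural_metadata`, `page_order`, `sequence`), and the file names are all hypothetical, chosen only to illustrate the point that a lumped package and a set of split packages can present the same ordered book to the end user.

```python
def pages_from_lumped(package: dict) -> list:
    """One package: structural metadata lists the page files in reading order."""
    return list(package["structural_metadata"]["page_order"])


def pages_from_split(packages: list) -> list:
    """Many packages: each carries one page and a sequence number in its metadata."""
    ordered = sorted(packages, key=lambda p: p["metadata"]["sequence"])
    return [p["file"] for p in ordered]


# The "lumper" choice: a single package for the whole book.
lumped_book = {
    "identifier": "book-001",
    "files": ["p1.tif", "p2.tif", "p3.tif"],
    "structural_metadata": {"page_order": ["p1.tif", "p2.tif", "p3.tif"]},
}

# The "splitter" choice: one package per page, stored in no particular order.
split_book = [
    {"identifier": "book-001-p3", "file": "p3.tif", "metadata": {"sequence": 3}},
    {"identifier": "book-001-p1", "file": "p1.tif", "metadata": {"sequence": 1}},
    {"identifier": "book-001-p2", "file": "p2.tif", "metadata": {"sequence": 2}},
]
```

Calling `pages_from_lumped(lumped_book)` and `pages_from_split(split_book)` yields the same list of pages; what differs is the smallest unit the repository manages.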
As mentioned above, a digital asset package may contain any number of content and metadata objects. These objects can be both archival master files (what we tend to think of as the “originals”) as well as derivatives that support access and manipulation. In every case, derivatives are merely representations or transformations of the original object (whatever that may be), and any reference to the object refers to the original and NOT to the representation. For example, if a user interacts with a JPEG 2000 version of a DNG image file and would like to cite that image in a scholarly work, the citation should refer to a unique identifier of that resource and not the URL of the JPEG 2000 representation. The JPEG 2000 version the user interacts with today is only a convenience of access in a particular time and place, and is likely to be replaced with some other access derivative in the future. This question gets a lot more complicated when analog to digital conversion is involved, and in the case of born digital text documents. Repositories must make policy decisions about what is considered the original.
As Paul Conway said some years ago, “preservation is the action and access is the thing.” Digital preservation is the activity of ensuring access to digital content over time. This opens any number of possibilities and questions, especially in the born digital world. What is an original? What do we preserve: content or format? Or are there different answers at different times and in different situations?
However we answer policy questions like the ones above, the one answer that is always right is that good digital objects are never isolated or alone, but should always exist within the context of a package that is internally coherent and self-describing. As I like to say, if you found this digital object on the floor, you could pick it up and know all you needed to know about it just by looking inside.
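One way to picture the citation point is a small resolver that separates the identifier a scholar cites from the derivative a user happens to see today. This is only a sketch under stated assumptions: the `Resolver` class, the identifier `example-id-0001`, and the `example.org` URLs are hypothetical and stand in for whatever persistent identifier scheme a repository actually adopts.

```python
class Resolver:
    """Hypothetical map from a persistent identifier to the current access copy."""

    def __init__(self):
        self._current_derivative = {}

    def register(self, identifier: str, derivative_url: str) -> None:
        # Replacing the derivative never changes the identifier itself.
        self._current_derivative[identifier] = derivative_url

    def resolve(self, identifier: str) -> str:
        return self._current_derivative[identifier]


def citation_for(identifier: str, title: str) -> str:
    # The citation carries the identifier of the resource, never a derivative URL.
    return f"{title}, identifier {identifier}"


resolver = Resolver()
resolver.register("example-id-0001", "https://example.org/access/0001.jp2")
citation = citation_for("example-id-0001", "Example photograph")

# Years later the access derivative is swapped for some other format.
resolver.register("example-id-0001", "https://example.org/access/0001.webp")
```

The citation string written before the swap is still valid afterward, because it never mentioned the access copy in the first place.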
They say that there are two kinds of people in the digital world: those who have lost data and those who are going to lose data. Today I was unfortunate enough to join the group of those who have lost data. The Quantum Archivist Blog endured some sort of corruption which I was unable to fix, so the whole thing was blown away and I restored from a data backup that was not as current as I would have liked.
I did manage to save the text of the posts of the last few weeks, but the posts themselves were not in the backup. Add to that all kinds of theme customization, widgets, plug-ins, etc., and it looks like a lost day of restoration is in front of me.
It just goes to show that talk about persistence and the need for good preservation and disaster recovery systems isn’t sexy, but it would have saved me a lot of time today if I had been following my own prescription!
I think it is important to keep in mind that the information universe beyond our repository is the ultimate audience and community for the material we steward. We don’t manage our repositories for their own sake, but because the materials in them have social or cultural value. Our job is to make it possible for people to use these materials that have been entrusted to us. Has this equation changed in the digital era? Let’s think about it. If in the paper world, preservation of the physical object had no real value unless the object could be used, can we say that preservation in the digital world has no real value if the digital content is not linked to other content? Is it true that only information that is linked will be discovered and used, and the more links the more use? I’d like to make that statement and see if it holds up.
Some years ago, before the arrival of social networking, Paul Conway wrote that “preservation is the creation of digital products worth maintaining over time.” Conway’s measure of worth at the time was the value added by the digitization process that could make the digital product more useful and critical to the collection and the institution that created it. That worth generally was internally contained within the object itself or tied to the application in which it lived and was delivered. Today, I think the value proposition has shifted from an internal measure to an external one, and one that demands interoperability. We can say that digital products worth maintaining over time are those that are the most connected to users and scholarship and have achieved a sort of transcendence over their original use or purpose through their connections with other objects or scholarship. They have achieved what Bob Metcalfe called the network effect.
Metcalfe’s law (as explained by computer scientist Jim Hendler) was developed in the late 1980s and originally described in part the “value of a network service to a user that arises from the number of people using the service.” While a network can grow “linearly with the number of connections, the value was proportional to the square of the number of users.”
A corollary to Metcalfe’s law was actually more relevant to the web in particular. While the number of connections to the network was important, it was the linking of content in that network that was the key to the value of a resource on the web. This corollary is most famously demonstrated by Google’s page ranking algorithm.
According to Bob Metcalfe, the originator of Metcalfe’s Law, the value of digital content to a particular community will exceed the cost of maintaining that content if there are enough links and communities built around that content to exceed a “critical mass.” Since the cost of networks (and network storage), as well as the cost of connectivity, is going down, while the potential uses (through linking) of digital content are ever increasing, the critical mass of links necessary to make a digital resource “valuable” is also decreasing.
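The critical mass argument can be worked through with a little arithmetic. The sketch below rests on a deliberately simplified model that is my assumption, not Metcalfe’s own formulation: value grows as a coefficient times the square of the number of links, cost grows linearly with the number of links, and the function names and sample numbers are hypothetical.

```python
def network_value(links: int, value_coefficient: float) -> float:
    """Assumed model: value is proportional to the square of the links."""
    return value_coefficient * links * links


def maintenance_cost(links: int, cost_per_link: float) -> float:
    """Assumed model: cost grows linearly with the number of links."""
    return cost_per_link * links


def critical_mass(cost_per_link: float, value_coefficient: float) -> float:
    """Number of links at which value equals cost under the model above.

    Setting value_coefficient * n**2 equal to cost_per_link * n and
    dividing by n gives n = cost_per_link / value_coefficient.
    """
    return cost_per_link / value_coefficient
```

Under these assumptions, halving the cost per link halves the break-even point: `critical_mass(100.0, 0.5)` is 200 links, while `critical_mass(50.0, 0.5)` is 100. Falling costs therefore lower the number of links a resource needs before it is worth maintaining.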
To re-interpret Paul Conway’s aphorism, the worth of digital products is vested in how and how often they are linked to other resources and scholarship on the web. And preservation is not only the “preservation of access,” but what I would call the “preservation of connections” that are the heart of modern scholarship.
(With apologies to Horace Greeley–who has a town named after him not far from here.)
I usually keep this blog focused on things and ideas I encounter in my job, and not about the job itself. Every job has its good and bad, its ups and downs, its challenges, successes and failures, so there is really nothing to talk about on that score.
I think I’ve been pretty lucky to have had the opportunity to work in a variety of positions in a variety of organizations, from museums, to special collections, to academic libraries. In every position I’ve had the chance to experience how the different sub-cultures of content and information management view and organize their worlds. I’ve always been more impressed with the similarities rather than the differences and have always tried to be an advocate for aggregation rather than silos. On the whole, I’m happy to see that trend gaining momentum among archivists, museum curators, and librarians.
I’ve also had the good fortune to collaborate with some truly brilliant folks–who are not necessarily well-funded, or in what are called leadership positions–across the country and around the world who believe in the work they do and use their imaginations and intelligence to do that work as best they can, and are happy to share their experiences and thoughts with others in the best spirit of the term collaboration.
I tell the people who take my classes or participate in the workshops I teach that now is the best time in history (or even longer) to be involved in documenting cultural heritage, because not only can we collect more and more content, but most importantly, we can share what we collect in ways that were absolutely inconceivable when I first got into this profession. We can arrange, rearrange, and deliver our collections in ways that transcend the items themselves and make the whole much, much more than the sum of its parts.
With all that in mind, I took a long look at what I was doing and where I wanted to be doing it, and decided that it was time to return to my roots in a number of ways. First, while I love technology, I don’t love technology for the technology itself, but for what we can DO with technology in the service of collections. I’ve often said in these pages and elsewhere that “Content is King,” and so I felt that it was time I started working with and being responsible for collections again and not just the technology that serves them. Second, I decided that at heart I am an Easterner, no matter how many days of sunshine there are in Colorado (and there are a LOT of them) or how beautiful the mountains are (and they ARE beautiful). In the West, history really belongs to the archaeologists and the anthropologists and not so much to the collectors of written history, and the history of the West, while interesting, is not my passion.
So, it is with a sense of great excitement, nostalgia, and anticipation that we are packing the family up and making the reverse migration back East. Beginning in July I’ll be starting a new job as the Director of Special Collections and Archives at the University of Connecticut. It is a homecoming of sorts since I lived in Connecticut during my graduate school days at Trinity, and had been to UConn a number of times for this and that while I was at Tufts. My wife is also excited to be moving to another institution in the area, and one that I admire and respect, as the Director of Preservation Services at the NEDCC.
But never fear, the Quantum Archivist will continue to toss out thoughts that, I hope, will make you all think about the work we do and the world in which we do it for a long time to come.