Another step along the path from analog to digital thinking in archival access is to stop thinking about our collections as unique, even if they are one of a kind. What does this mean?
When all access to analog content was by way of the reading room, everything existed in an environment of scarcity: a one-of-a-kind document, like this 1815 membership certificate from the Windham County Agricultural Society, could only be experienced in one place, and at limited times. This was scarcity of opportunity. Since most manuscript collections were never published in any form, this scarcity seemed a permanent condition. In fact, some repositories, perversely it seems to us now, prided themselves on the fact that people were forced to come to their reading rooms from all over the world to view their treasures.
Digitization changed all that. Repositories now pride themselves on how much of their collections are available 24/7, and in how many places those collections are discoverable. Ubiquity has replaced scarcity as the coin of the realm, so to speak. The original documents remain as unique as before, but their ability to be ubiquitous gives them as much value as their uniqueness. How does this change the way we think about value in what we do?
At UConn Library we are involved in a project to develop a systematic data architecture, although we don’t quite use that term, since it comes from IT. According to Wikipedia, “In information technology, data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations.”
This definition does not address the preservation or sustainability aspects of data management that are central to the data curation lifecycle, but data architecture is meant to be only one aspect of what is called solution architecture.
Like many organizations that made the transformation from the analog to the digital world, libraries have over the years developed multiple, and sometimes conflicting, solutions, systems, and policies for managing the digital collections and files in their domain. These solutions were usually implemented to solve particular problems that arose at the time, with little thought of how those decisions would have large-scale impact, often because there was no large-scale impact, or no way for these decisions to affect other areas of the organization. And of course external vendors were only too happy to sell libraries “solutions” that were specific to a particular use case.
As digital content became the medium of activity and exchange, and systems improved and became more flexible, it became possible, and in fact necessary, to look at our data management systems more broadly.
If we keep in mind that, at the root, all digital content is “ones and zeros” and that any system that can manage ones and zeros is potentially useful to a library, no matter where it comes from or what it is sold or developed for, then we can build an approach, or data architecture, that will serve us well, efficiently, and effectively.
How we get to that point is easier said than done. To get beyond thinking about the system first, we need to understand the nature and characteristics of our data. That’s where records management thinking intersects with this project. RM thinking assesses the needs and limits of access and persistence (or what RM folks would call retention). Based on those criteria, records are held and managed in certain ways and in certain environments to meet the requirements of their characteristics. For example, sensitive records may be stored in a more secure facility than non-sensitive records.
How does RM thinking apply to digital libraries? The RM idea is embodied in the DCC’s Lifecycle model, and many digital archivists have internalized this idea already. Many librarians, who work more with current data, have had less of a reason to internalize the DCC model of data curation into their work, and the model has generally only been applied to content already designated as preservation worthy. What would it mean to apply RM/Lifecycle thinking to all areas of library content?
We have been mapping the relationships among different content types that the library is responsible for in terms of six different characteristics:
IP rights holder
current management platform
current access platforms
Then we will look at the characteristics the content types have in common, develop a set of policies that govern data with those characteristics, and only then look to use, alter, build, or purchase applications and systems to implement those policies.
It is always difficult to separate applications from the content they manipulate, but it is essential to do so in order to create a sustainable data architecture that puts the content first and the applications second.
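The idea of letting shared characteristics, rather than systems, drive policy can be sketched in code. This is purely illustrative: the profile fields and policy names below are hypothetical, not the ones our project uses.

```python
from dataclasses import dataclass

# Hypothetical characteristics; the real mapping tracks six,
# including IP rights holder and current management/access platforms.
@dataclass(frozen=True)
class ContentProfile:
    rights_holder: str          # e.g. "university" or "third party"
    sensitive: bool             # restricted personal or legal content?
    preservation_worthy: bool   # designated for long-term retention?

def storage_policy(profile: ContentProfile) -> str:
    """Choose a policy from the content's characteristics,
    not from whichever system happens to hold it today."""
    if profile.sensitive:
        return "restricted-preservation"
    if profile.preservation_worthy:
        return "open-preservation"
    return "access-only"

# Two content types with identical characteristics get identical
# treatment, regardless of their current platform.
theses = ContentProfile("university", sensitive=False, preservation_worthy=True)
oral_histories = ContentProfile("university", sensitive=False, preservation_worthy=True)
print(storage_policy(theses) == storage_policy(oral_histories))  # True
```

The point of the sketch is the order of operations: characteristics are assessed first, policy follows from them, and only then does any particular application enter the picture.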
Our project is in its early phases, and the map linked to above is very much a work in progress. Check back often to see the evolution of our thinking.
It used to be, when I taught how to do digital projects, that we said you should have good metadata about your content before you digitized it, because without good metadata, how could anyone find what you had digitized? Ignoring for now the question of what “good” metadata was or is, this left us digitizing only those things we knew well and could create metadata for.
Luckily the scanning process was so slow that cataloging could pretty much keep up with the scanning throughput–we generally did the cataloging while we waited for the scanner to slide across the bed of the Epson 1600.
Well, as is the theme of this blog, times change. Scanners are faster–in fact we use camera capture rather than scanners–and much of our content comes to us born digital, so we don’t have the digital capture bottleneck to worry about.
Now the bottleneck is metadata creation. The previous post alluded to one process approach to automating data entry, today I want to talk about a philosophical approach.
Rather than digitizing only things that are well described, I’m advocating digitizing and making available things that are not well described as well. One approach to research access to analog archival collections was to “get researchers to the right box,” and, if you were really lucky, to the right folder, and then let them have at it to find what they wanted.
We can apply that same idea to digital resources. Making available 2,000 images, each with the title “Commencement 2016,” serves the same purpose as giving a researcher a box of photos to sort through. But with an online access tool, I can browse dozens of photos at once and zoom in on interesting ones in ways I can’t easily do with the analog version (if one exists). I have done the technological equivalent of getting them to the box.
So from the macro level to the micro level, you never know what is going to happen. We have an artificial collection, created over some 20 years, of “alternative” news and information sources relating mostly to late-20th-century counterculture groups. The collection fills about a dozen filing cabinets, with folders that may contain two issues of a newsletter or fifty flyers from a protest group. Each folder has a typewritten title, sometimes referring to the title of the publication, sometimes to an idiosyncratic subject term. It has been daunting to think about creating an online index of these resources; the data entry alone would be an enormous task. And once we did that, there would be enormous pressure to provide online access to the contents as well.
With some seed funding from a private donor, we are beginning to digitize the collection, and create online access to the resources. We made some decisions that are consistent with the idea of “quantum archives,” and applied some technological solutions to a difficult problem.
First, we defined the smallest unit of description to be the folder. Whether the folder held 20 different documents or a homogeneous set, we would digitize at the folder level. A user would discover the folder and then browse through its pages (or use full-text searching) until they found what they wanted. Folder titles and one or two genre terms would be the initial entry points.
In order to automate data entry (remember that the folder titles are typed), we purchased a text-scanning stylus. Using a spreadsheet, we attach and scan the barcode of the folder, then scan the title of the folder and genre terms from the typewritten sheet. There are no typographical errors, and with the scanning pen we can enter data at a rate far higher than hand typing.
Once we populate the spreadsheet, we use other processes to convert it into MODS XML descriptive metadata records, pair those with the set of scanned objects from the folder, and use a batch process to ingest them into the preservation digital repository. After a bit of tinkering with settings, workflow, and process, we are far exceeding the throughput of a manual process.
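As a rough sketch of the spreadsheet-to-MODS step (the column names and the minimal record shape here are illustrative, not our production profile), the conversion amounts to mapping each row onto a handful of MODS elements:

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

def row_to_mods(row: dict) -> str:
    """Build a minimal MODS record from one spreadsheet row.
    Column names (barcode, title, genre) are illustrative; a real
    record would carry more elements and schema attributes."""
    ET.register_namespace("mods", MODS_NS)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = row["title"]
    ET.SubElement(mods, f"{{{MODS_NS}}}genre").text = row["genre"]
    ET.SubElement(mods, f"{{{MODS_NS}}}identifier",
                  type="barcode").text = row["barcode"]
    return ET.tostring(mods, encoding="unicode")

# One row of a made-up folder-level spreadsheet.
sheet = io.StringIO("barcode,title,genre\n39153000123456,Anti-war flyers,flyer\n")
records = [row_to_mods(r) for r in csv.DictReader(sheet)]
print(records[0])
```

Once every row becomes a record like this, the pairing and ingest steps can run as an unattended batch, which is where the throughput gain over manual cataloging comes from.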
We are working with a professor in the Digital Media and Design department on a project that leverages archival documents in an unusual environment. Called Courtroom 600, after the room in which the trial was held, this project creates an immersive virtual reality experience of the Nuremberg war crimes tribunal. But rather than just using the archives to do research that helps them create the environment, the VR designers are incorporating the documents themselves into the experience. When a participant encounters a defendant or some event, he or she can call up relevant documents from the archives within the VR environment, and get background information or more details in order to understand what is happening in the virtual space. It’s kind of like looking things up on your iPad while you are watching a movie.
We are still in the very early phases of the project, and are learning what kinds of demands an educational VR experience makes on our collections, but as you can see in this early screen shot, we are taking the archives to an entirely new dimension!
I’m sure the iPhone changed the world in as many ways as there are people to write about it. But I haven’t yet seen anyone write about how the iPhone revolutionized archives. So, here and now, I’m going to take a stab at a short list of suggestions about how the iPhone altered the landscape of archives. Interestingly, most of these relate to the iPhone as a camera rather than as a phone, but hey, lots of folks don’t really use it as a phone anyway.
Five Ways the iPhone Revolutionized Archives
The end of the photocopier
Geospatial and time/date precision in resource description
The end of family snapshots on film
Video becomes the snapshot of the current era
The end of the paper scrapbook, the challenge of social media
First some easy ones.
The End of the Photocopier
Smartphones enabled reading room users to make reference copies of documents without subjecting them to the stress of photocopying. As reading rooms embraced the self-service aspect of personal reproductions, and even required it, the ubiquitous photocopier, with the copyright disclaimer sometimes attached to the copy bed, disappeared from reading rooms. The loss of all those $0.05 charges was more than offset by the reduction in the work and effort needed to maintain, run, and manage the photocopier. Although photocopier statistics were once used to justify their existence, archivists soon found other, better things to do than make copies.
Geospatial and time/date precision in resource description
iPhones know where and when they are, and they attach this information to everything they handle. This makes it possible to get driving directions, and it also makes it possible to know, with very little doubt, exactly where a photo was taken.
No longer do we have to confront the words “possibly” or “unknown” in place or time metadata fields, at least for photos taken with smartphones. On the flip side, integrated and cloud photo management tools simultaneously make it easier for people to manage their photos and harder for archivists to get their hands on them later. More on that below.
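Under the hood, EXIF records each coordinate as degree/minute/second values plus a hemisphere letter. A minimal sketch of the conversion to the decimal degrees most mapping tools expect (the sample values are made up for illustration):

```python
def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) plus a
    hemisphere reference ('N', 'S', 'E', 'W') to signed decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -value if ref in ("S", "W") else value

# Illustrative values, in the shape EXIF GPS tags use.
lat = dms_to_decimal((41, 48, 27.0), "N")
lon = dms_to_decimal((72, 15, 14.0), "W")
print(round(lat, 4), round(lon, 4))  # 41.8075 -72.2539
```

A batch pass with a conversion like this is all it takes to populate a place field automatically, which is exactly why “possibly” and “unknown” are disappearing from smartphone-era photo metadata.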
The end of family snapshots on film
The family snapshot was being replaced by digital photography before the smartphone, but many cameras and printers came with a means to output digital photo files directly to print. The iPhone, and its accompanying photo management tools, pretty much ended that practice. Slideshow apps on televisions and computer screens replaced the framed photo, and photo-sharing apps obviated the need to make prints. Even grandmothers show off photos of the grandkids by pulling out their phones, not their wallets.
Video becomes the snapshot of a new generation
Seven years ago I wrote a post about video being the new snapshot. In the intervening years I have seen that trend accelerate. Not only do grandmothers pull out their phones to show off their grandkids, they are just as likely to show you a video of the young tyke as a still photo. With social media becoming more video friendly (Facebook especially), the moving image is becoming the recording medium of choice. Why do we care? In some senses we don’t: file size is not the issue it once was, and so many management and presentation systems can handle moving image files that it really isn’t a big deal in a technical sense. It is harder to describe time-based media than still images, but the challenges of description are not unique to video.
Now the hard part:
The end of the scrapbook, and the challenge of social media
While “scrapbooking” is alive and well as a craft activity, the more mundane practice of saving photos in albums with black pages that you write on with white ink is pretty much over for the general population. The modern form of casual life documentation is, wait for it….Facebook.
Although, according to Facebook, “you own all of the content and information you post,” most people would be hard-pressed to figure out how to extract any of it. And although extraction can be done relatively easily, most people would never think to do it. If you die first, it becomes almost impossible for anyone to gain access to the account or its contents except through the Facebook interface, unless you have designated someone in advance of your death (or in a will, I suppose) as a legacy contact. This legacy contact must be a Facebook friend of yours, and will then have permission to download your content. That’s not quite the same as your grandchildren going through the attic and deciding what to do with the stuff up there, because you have to think of it ahead of time.
All of this is directly related to the way that the smartphone integrates itself into your information world and directs your activities without you even noticing. This is a real and significant result of the iPhone.
These things are not, in and of themselves, bad; they just make the archivist’s job harder, and make us understand even more that while so much of our work in the digital age is just like our work in the analog age, so much of it is different. The most significant point I’ve been seeing is that we have to make archival decisions at the point of creation, because by the time the records become inactive, it may be too late.
I’ve been thinking a lot about the future lately. Much of our recent work has been an attempt to determine what forms archival management, presentation, and modern scholarship could take, and which forms will resonate with people. It isn’t easy to predict the future, but it is a lot of fun, especially when you don’t HAVE to be right, since our iterative development process allows us to change direction pretty easily.
It reminded me that futurology is a staple of science fiction writers (of whom I am a big fan–no surprise there). When I read classic future-based science fiction (Isaac Asimov, Arthur C. Clarke, Robert Heinlein, for example), I try to think about how the authors imagined the future, what parts of it they got “right,” and what they missed or didn’t see. When Isaac Asimov writes about “personal capsules” that traverse “hyperspace” to deliver information packages that can be opened only by the addressee, I think: yeah, he got email all right. It doesn’t matter that the capsules are physical objects that disgorge “cellotape”; I don’t worry about that, because he got the “fast” and “personal” parts exactly right, and if the means was physical rather than electronic, that’s not really the point. (See a previous post about Henry Ford.)
I’m reading a “lost” Heinlein novel called “For Us the Living” where a person from 1939 is mysteriously transported to the year 2086. While certainly not one of Heinlein’s best works, it does contain a “proto-internet.” One of the characters talks to various people on a screen (or simply leaves an order) and requests and gets clothing, information, and other things sent right to her home. Never mind that the whole thing was humanly mediated, it was Amazon, Wikipedia, and the DMV rolled into one.
As we know, the seeds of the future exist in the present. I recently visited a Boston Museum of Science exhibit called Popnology, about the “…fusion of science fiction and science fact in the Museum’s newest temporary exhibition celebrating and exploring the greatest works of innovation and imagination in history.” Along with one of the actual DeLoreans from Back To The Future, there were props from movies, excerpts from science fiction writers, and more about popular visions of the future from the past. In the center of the exhibit they put together a room from 1983 where everything in it (except for the ironing board) could be done with a cell phone today. (You can see a photo of this room above, or as one of the gallery photos on the home page of this blog.)
In a similar vein, there was a story a few years ago in the Huffington Post that took a Radio Shack (remember Radio Shack?) sale flyer and showed how almost everything in it (except the radar detector) could be done with a cell phone.
As a historian I understand how the past influences the future, and in many of my current activities, I now also understand that sometimes you DO have to reinvent the wheel, just in a different way.
A recent article in the Washington Post by UConn graduate student Matthew Guariglia talks about the dangers of keeping so much information that its sheer volume makes it impossible to sift through and make sense of, even with the most sophisticated tools available. He is talking specifically about the personal information on individuals that police forces began collecting in the Victorian Age, as they attempted to deal with increasing crime in crowded industrial cities, and that has escalated into the massive data collection efforts of the security organizations of all modern governments.
As the availability of potentially useful data increased, from photographs to body measurements to fingerprints and beyond, management and analysis systems struggled, and ultimately failed, to keep up with this growing torrent of information.
Guariglia’s argument, in part, is that data analysis systems will never keep up with the ever-increasing flood of data, and that massively collecting undifferentiated data actually makes us less safe, because you can’t find the significant data among all the noise. What does this mean for the archivist who is charged with collecting and preserving historical documentation? I think it brings into even sharper focus that archives are not a stream-of-consciousness recording of “what happened” (as if that were even possible), but carefully selected and curated collections that serve the institutional needs and missions of the organizations of which they are a part. This is something all archivists know as a matter of course, and it informs their appraisal and curatorial decisions.
If only the NSA and the rest of the security apparatus would think like archivists, who knows what good things would happen?
“If I had asked people what they wanted, they would have said faster horses.”
–attributed to Henry Ford
It is now pretty well established that Henry Ford never actually said this (here is a good explanation of the phrase’s origins), but like so many other aphorisms, it no longer matters whether he did or not. People have parsed this quote in forums from the Harvard Business School to design blogs and beyond. I had always thought that true innovation was disruptive: it developed what users didn’t know they needed, but what was somehow known by the innovator. I even wrote a post about this more than seven years ago.
Yesterday the Ford phrase resurfaced in a conversation about innovation and user feedback that made me think about the phrase, user feedback, and innovation in a new way. In this discussion I had made my usual comment that users can only tell you what they want from the perspective of what they know and that the faster horses quote was proof that sometimes you have to ignore user feedback to achieve true innovation.
My conversational partner replied that in fact Ford HAD listened to the users, just not in the way we usually think of listening. By understanding that the users wanted “faster” and were using “horses” as the metaphor, Ford gave them “faster” in the form of an automobile, which was ultimately what it was all about. (Many other attributes made Ford’s cars successful, but for our current purposes we will ignore them, because, after all, Ford never said this anyway.) By decoupling “faster” from “horses” and recognizing that faster was the relevant needs statement, we discover that the users did actually express a valuable bit of information necessary for successful innovation. Because, as we all know, innovation has to serve a need, even if that need is hidden from common view.
I don’t think that I will look at user feedback in quite the same way again.
A few months back, I was asked to talk about the kind of work I had done in digital libraries/digital repositories in the past and how it differed from the kind of things I was doing now. It occurred to me that I had been around for a while and had the opportunity to live through, and have a small amount of influence on, many developments in digital repositories. Rather than write a narrative, I decided to create a short slide deck of screen shots of different projects I have worked on over the years and how they had evolved along with the profession. Sadly, but not surprisingly, I was forced to go to the Wayback Machine for many of the earlier projects. But of course that just illustrated what I was talking about, that early digital projects were not concerned with preservation, only access.
Now I concern myself a lot with trying to create policies for digital content that libraries have taken in over time but haven’t really considered how to manage over the long term. For now, enjoy this trip down memory lane, and, if you have been around a while as well, you might build your own timeline of history.