What Are We Trying To Save?

As our Greenhouse Studios projects move through their iterations and phases, it is time to start talking about how to document and preserve the intellectual output. Our current thinking is that preservation talk has to start during the “build” phase, but probably not before then, since insisting that every idea be “preservable” makes the tail wag the dog. But once we start building something, it is time to figure out how to preserve it.

The question we ask now is: what is the “it” we are trying to preserve? For something to be “scholarship” it has to persist. But in what form? And what has to persist? We could follow the “FRBR” approach and say that the intellectual content of the so-called work is what matters most, and that preserving it in any form is sufficient. This is, I think, a text-centric viewpoint that won’t translate to multi-modal expression, where the intellectual content of the work is just as likely to be a set of scripts or moving images as it is text.

Nevertheless, our requirement to support persistence is not alleviated just because it is difficult. The part of Greenhouse Studios that I personally like best is the part where we figure out how to make what are now thought of as alternative formats and means of expression persist over time.

One project is producing a short documentary film in addition to hosting some events and conversations. The film is essentially the work, and we are also preserving the raw footage, still images, and other associated research documents, very much like preserving the raw data for a scientific research project.

A more difficult problem comes from our VR project. This project seeks to tell a story through virtual reality. There are lots of research artifacts, design research, still photography, a “script” for the story, and much, much more. The VR experience is being created in the Unity (https://unity3d.com) game engine. We can preserve the bits, so to speak, but that seems a bit of a disappointment to us. One idea we have is to record someone’s in-story experience so that it can be watched in any browser. That would at least give an indication of what the experience was about, similar to watching a recording of a play or a performance.


Sometimes Maybe You Should Reinvent the Wheel.

I’ve been thinking a lot about the future lately. Much of our recent work has been an attempt to determine what forms archival management and presentation and modern scholarship could take, and what forms will resonate with people. It isn’t easy to predict the future, but it is a lot of fun, especially when you don’t HAVE to be right, as our iterative development process allows us to change direction pretty easily.

The 1983 era room at the Boston Museum of Science

It reminded me that futurology is a staple of science fiction writers (of which I am a big fan–no surprise there). When I read classic future-based science fiction (Isaac Asimov, Arthur C. Clarke, and Robert Heinlein, for example), I try to think about how they imagined the future and what parts of it they got “right” and what they missed or didn’t see. When Isaac Asimov writes about “personal capsules” that traverse “hyperspace” to deliver information packages that can be opened only by the addressee, I think, yeah, he got email all right. It doesn’t matter that the capsules are physical objects that disgorge “cellotape” that automatically self-destructs. I don’t worry about that, because he got the “fast” and “personal” parts exactly right, and if the means was physical rather than electronic, that’s not really the point. (See a previous post about Henry Ford.)

I’m reading a “lost” Heinlein novel called “For Us, the Living” where a person from 1939 is mysteriously transported to the year 2086. While certainly not one of Heinlein’s best works, it does contain a “proto-internet.” One of the characters talks to various people on a screen (or simply leaves an order) and requests and gets clothing, information, and other things sent right to her home. Never mind that the whole thing was humanly mediated; it was Amazon, Wikipedia, and the DMV rolled into one.

As we know, the seeds of the future exist in the present. I recently visited a Boston Museum of Science exhibit called Popnology about the “…fusion of science fiction and science fact in the Museum’s newest temporary exhibition celebrating and exploring the greatest works of innovation and imagination in history.” Along with one of the actual DeLoreans from Back To The Future, there were props from movies, excerpts from science fiction writers, and more about popular visions of the future from the past. In the center of the exhibit they put together a room from 1983 where the job of everything in it (except for the ironing board) could be done with a cell phone today. (You can see a photo of this room above, or as one of the gallery photos on the home page of this blog.)

In a similar vein, there was a story a few years ago in the Huffington Post that took a Radio Shack (remember Radio Shack?) sale flyer and showed how the job of almost everything in it (except the radar detector) could be done with a cell phone.

As a historian I understand how the past influences the future, and in many of my current activities, I now also understand that sometimes you DO have to reinvent the wheel, just in a different way.

Why We Shouldn’t Try to Save Everything

John Cook is wanted for murder, 1923. Connecticut Historical Society

A recent article in the Washington Post by UConn graduate student Matthew Guariglia talks about the dangers of keeping so much information that the sheer volume makes it impossible to sift through and make sense of, even using the most sophisticated tools available. He is talking specifically about personal information on individuals, which police forces began to collect in the Victorian Age as they attempted to deal with increasing crime in crowded industrial cities, and which has escalated into the massive data collection efforts of the security organizations of all modern governments.

As the availability of potentially useful data increased, from photographs to body measurements to fingerprints and beyond, management and analysis systems struggled, and ultimately failed, to keep up with this growing torrent of information.

Guariglia’s argument in part is that data analysis systems will never keep up with the ever-increasing flood of data, and that massively collecting undifferentiated data actually makes us less safe because you can’t find the significant data among all the noise. What does this mean for the archivist who is charged with collecting and preserving historical documentation? I think this brings into focus even more sharply that archives are not a stream-of-consciousness recording of “what happened” (as if that were even possible), but carefully selected and curated collections that serve the institutional needs and missions of the organizations of which they are a part. This is something that all archivists know as a matter of course and which informs their appraisal and curatorial decisions.

If only the NSA and the rest of the security apparatus would think like archivists, who knows what good things would happen?

Alphabetical Order

Yesterday I wrote a post about some things you could do with a body of digital “data” that was not specifically related to the purpose of the original documents. Later in the day, during our opening demonstration of the web site, I was reminded of the very powerful nature of the printed word in telling the story of history. A relative of Thomas Dodd sat down and searched for the phrase: alphabetical order.

Surprisingly to me, but not to the person who typed it, the phrase returned three results from a presentation by Dodd to the Tribunal. In showing that the execution of prisoners was a calculated policy, Dodd reviewed death records from one concentration camp:

“These pages cover death entries made for the 19th day of March 1945 between fifteen minutes past one in the morning until two o’clock in the afternoon. In this space of twelve and three-quarter hours, on these records, 203 persons are reported as having died. They were assigned serial numbers running from 8390 to 8593. The names of the dead are listed. And interestingly enough the victims are all recorded as having died of the same ailment – heart trouble. They died at brief intervals. They died in alphabetical order. The first who died was a man named Ackermann, who died at one fifteen a.m., and the last was a man named Zynger, who died at two o’clock in the afternoon.”

It is alarming and disturbing to think even briefly about what this description says about the people and government that calmly and efficiently carried out, and very consciously documented, the horrors described here. I know that we often say that we live in a “post-literate” society, and that data visualization is the latest and greatest way to create an impact on that highly visual society. I think that these 122 words say more in their own way than any photo or visualization of data could.

What’s for Breakfast?

In about an hour, we will be doing a public demonstration of our new repository infrastructure. Of course most people won’t know that; they will be looking at the Nuremberg Trial papers of Thomas J. Dodd (archives.lib.uconn.edu). What they won’t see is the underlying Drupal/Islandora presentation and management application, the Fedora repository, the storage layer, and a host of decisions about metadata schemas (MODS with URI tags for subjects and names), OCR (uncorrected, machine generated), data and content models (atomistic pages brought together in a “book” using RDF), and Drupal themes (Do you like that button there or here?).

The papers themselves represent about 12,000 pages of material (about 50% of the total–we are continuing to digitize the rest) collected by then Executive Trial Counsel Thomas J. Dodd during the International Military Tribunal in Nuremberg just after WWII. There are trial briefs, depositions, documentation, and administrative memos relating to the construction and execution of the trial strategy of the U.S. prosecutors that have never before been available online. Because this is one of the most heavily used collections in our repository, we felt that it was an appropriate first collection for our new infrastructure. As with all digital collections, it will now be possible to access this material without having to travel to Connecticut, which will open up all sorts of research possibilities for scholars of international law, WWII, the Holocaust, etc.

While all these things are very valuable and were the primary purpose for digitizing the collection, I wanted to focus this post on some unintended consequences (or opportunities) that full-text access to a body of material like this supplies. I’m a big believer in the opportunity of unintended consequences. This has never been more true than in the era of digitization, where documents become data that can be manipulated by computers to uncover and connect things that would take years to do by hand, if they could be done at all.

In the course of building their case, the prosecutors collected a massive amount of information about the workings of the Nazi regime. A lot of that information is mundane, relating to supply chains (what we would today call “logistics”) and procurement, or economic output, or the movement of material and resources along transportation routes.  Without expressly meaning to, they created a picture of a wartime society that includes all sorts of information about mid-20th century Europe.

It may seem inappropriate to study the record of a global tragedy to find out what people ate for breakfast or to study the technology infrastructure of  transportation systems, but that is exactly what you can do. Digital resources create opportunities to ask research questions that could never have been asked before, and as we well know, it is not our job as archivists to decide what is an appropriate question to ask about any historical resource.
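
To make the idea of documents as data a little more concrete, here is a minimal sketch in Python of the kind of question full text lets you ask. It is not how our repository search actually works. The page identifiers, the sample sentences (two of them invented for illustration), and the function names are all assumptions of mine.

```python
import re
from collections import Counter

# Hypothetical sample: page identifier -> uncorrected OCR text.
# Only the first entry echoes a real document; the others are invented.
pages = {
    "page-001": "They died at brief intervals. They died in alphabetical\norder.",
    "page-002": "Rations for breakfast consisted of bread and coffee.",
    "page-003": "The transport left at dawn, before breakfast was served.",
}


def normalize(text):
    """Lowercase and collapse whitespace so OCR line breaks do not split a phrase."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def find_phrase(pages, phrase):
    """Return the sorted identifiers of pages whose text contains the phrase."""
    needle = normalize(phrase)
    return sorted(pid for pid, text in pages.items() if needle in normalize(text))


def term_counts(pages, term):
    """Count whole-word occurrences of a term per page, skipping pages with none."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    counts = Counter()
    for pid, text in pages.items():
        n = len(pattern.findall(normalize(text)))
        if n:
            counts[pid] = n
    return counts


print(find_phrase(pages, "alphabetical order"))
print(term_counts(pages, "breakfast"))
```

The same two functions serve the prosecutor’s question and the breakfast question alike, which is really the point: the text does not know or care why you are asking.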

Tom Scheinfeldt Made Me Write This Post!

Sort of…I’ve been on an “anti-social” network kick for a while as I have been busy working on the Connecticut Digital Archive project. Lots of tiny details related to infrastructure that I thought would be completely uninteresting to anyone. My mistake. The beauty of blog posts is that they are in the moment and ephemeral, so if one is boring, a reader or follower can just skip it. If the next one is interesting, you can read it. The point is to toss it out there and add to the conversation. In the long run everything necessary will get said and everything unnecessary will be forgotten.

What does this have to do with Tom Scheinfeldt? Nothing directly and that is the point. Tom is teaching a class here at UConn about Digital Culture–I’d recommend it to anyone at UConn who has an opportunity to take it. His syllabus includes a mention of Andrew Sullivan, a former editor at the Atlantic who is of course a blogger, but who wrote an article way back in 2008 called “Why I Blog.” (Full disclosure here. I didn’t find this out for myself, I was alerted to it by my colleague Jean Nelson–who found it from one of Tom’s tweets–thanks for the tip Jean!)

Sullivan describes the blog as “the spontaneous expression of instant thought … its borders are extremely porous and its truth inherently transitory.” And, unlike print journalism or book or journal authorship, “It is accountable in immediate and unavoidable ways to readers and other bloggers, and linked via hypertext to continuously multiplying references and sources.”

It is difficult for those of us who were brought up in research disciplines to “blurt” our thoughts before we have defined, refined, and attributed them to evidence. What I ultimately understood about blogging from reading this article came from some advice Sullivan attributes to Matt Drudge that “the key to understanding a blog is to realize that it’s a broadcast, not a publication. If it stops moving it dies. If it stops paddling it sinks.” Brevity and immediacy are the currency of the blogosphere. This doesn’t mean that posts should not be well-considered, just that they can contribute to the world without having been vetted and edited, because their value is in how they make connections with others thinking the same thing.

The social network relies on immediacy, shout outs, and sharing, something hard for a dinosaur like me to embrace, but I will do my best. When I have something to say, I won’t worry about who wants to hear. In some ways the internet is the ultimate “build it and they will come” environment.

Raising the Floor

Yesterday I was again fortunate to participate in an event here at UConn called “Digital Media/Innovative Collaborations,” a symposium organized by Tim Hunter of UConn’s Digital Media and Design program. The symposium brought together folks from across campus who have an interest or experience in working with digital media and was organized according to Tim’s idea of the digital media “table” being supported by four “legs” of Business, Creative Arts, STEM, and Digital Humanities/Social Science.

Two excellent keynotes by Gael McGill of Harvard Medical School, and Tom Scheinfeldt of the CHNM kicked off the day and after a networking lunch, we went to breakout sessions in each topic area with an admonition for people to try to visit an area with which they were not familiar.

I was invited to speak as part of the Digital Humanities breakout session, and I chose to speak broadly about the role of digital repositories in the context of not only the Humanities, but all digital media and design. Taking Tim Hunter’s analogy a step further, I see digital repositories as the “floor” upon which the legs of the digital media table sit.

It is repositories that supply the digital content for visualizing and are the places for created content to live and be repurposed in the future. And so without repositories the table, while it would still have legs to stand on, would not have a floor for those legs to rest on, and the structure would collapse.

The audience was filled with mostly Digital Humanities practitioners, a core group of potential users and contributors that we wanted to reach. There were some people who were hearing one of my talks for the first time and who understood my message, and a few were interested in pursuing a collaboration of some type or another. So, all in all it was a worthwhile day and was great exposure for the repository program.




For the past several months I’ve been working with some very dedicated people both at UConn and elsewhere in Connecticut on a project that we are calling the Connecticut Digital Archive or CTDA.  The CTDA is an extension of one of the original digital aggregation projects: Connecticut History Online (CHO).

For years UConn has been managing the technical infrastructure of CHO. As UConn began to look at the next logical step in its development of digital content management, it seemed only natural that we would continue to collaborate with others in Connecticut, not only to build a shared aggregator of digital content, but also to offer digital preservation services to libraries, museums, and historical societies in Connecticut.

CHO made it possible for lots of people to make their content available to a larger audience. Now the CTDA will make it possible to preserve the digital cultural heritage of Connecticut for future generations.

Follow our progress at:



Some Good Reading about Searching and Discovery

I’m always an advocate for getting out of our own bubble to see how other people doing the same thing as we do–but in different contexts–think about things.

This is not quite out of the bubble, but nevertheless worth the read. Peter Wilkerson is an archivist by training and a technologist by profession. He is the Lead Architect for Search at Devalen, LLC in Asheville NC (here is his LinkedIn profile: http://www.linkedin.com/in/peterwilkerson). I’ve known Peter for more than 15 years. We worked together on an early mapping interface to directory data when I was at Tufts. More recently we have been talking about search interfaces for our Fedora repository project here at UConn.

In his recent blog post about search interfaces Peter touches on a number of issues facing archivists trying to connect users with digital or digitized content that is not necessarily well described in the traditional sense.   I think the entire blog is an interesting perspective on how to think about connecting users to information and, since it is written by an archivist, forms a sort of bridge between two worlds that are not really different in essence.

On the GOOD Side

Lots of archivists out there are partnering with technologists–and becoming technologists themselves. Just look at a few of the poster sessions that were in the exhibit hall at yesterday’s reception at SAA and you will see that there are a lot of interesting and innovative things happening in the profession. (For those of you who are not in San Diego, I hope SAA will provide a list of URLs for the presenters.)

I wonder how we can better support and inspire those who, like so many archivists out there, don’t have access to the resources–or don’t think they do–that will enable them to engage in the interactive/social/participatory aspect of Archives. In an attempt to move the profession forward and keep it relevant, we cannot leave behind those who do not have the access to resources that more fortunate organizations have.

The digital divide will always be with us. It will always be the responsibility of the better resourced organizations to help and support the profession as a whole. And this has been a hallmark of the archives profession for as long as I have been a part of it.


BUT, and this is also a key point, it works both ways. No matter how small your organization–from lone arranger to part time volunteers–you CAN participate in the larger world by creating partnerships with those who are connected and take advantage of those partnerships to fully participate in the profession and make the most of what you have.


As Jon Voss said, “it is all about building a better time machine” and we are the only ones who can do it.