We are continually looking for more effective ways to connect people to archives and to help them understand the value of archives to modern society and culture. In that spirit, I want to pass along an idea that an archivist here at UConn implemented in conjunction with the student radio station. “D’Archive” is a weekly show featuring conversation, commentary, interaction with primary sources, and more.
Graham Stinnett, Outreach Archivist at the Archives and Special Collections, hosts the show and coordinates the content and guests, who will include archivists, researchers, and members of the general community. If you are interested in hearing the live version of D’Archive, air time is 10am on Thursdays at 91.7 FM in the northeastern Connecticut area, or you can stream it live at http://fm.whus.org/ from anywhere in the world.
I’ve been continuing to experiment with the Kumu social networking application, seeing how I can use it to visualize all sorts of data. I’ve gotten better at manipulating the display to make the maps easier to use.
My current experiment is to take a search result set from the Connecticut Digital Archive, do some minimal manipulation on it, and put it into a Google sheet that I link to the visualization app. The result is running on a test server, and I think it is quite interesting.
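As a sketch of the kind of minimal manipulation involved: each record in the result set is reduced to the handful of columns the visualization needs before it goes into the Google sheet. The field names below are invented for illustration; a real Connecticut Digital Archive export will differ.

```python
import csv
import io

# Hypothetical CSV export from a repository search (field names are invented).
raw = """title,date,institution,creator
"Worker ID photo, John Doe",1943,Mystic Seaport Museum,Post boatyard
"Town report",1925,Windham Town Hall,Town of Windham
"""

def reshape(csv_text):
    """Keep only the columns the visualization needs, one row per record."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "Label": rec["title"],
            "Date": rec["date"],
            "Institution": rec["institution"],
            "Creator": rec["creator"],
        })
    return rows

rows = reshape(raw)
```

Each dictionary in `rows` then becomes one row of the linked Google sheet, so the visualization always reflects whatever the latest search export contained.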
For this basic test, I did a simple search in the repository for “1925”, not specifying any metadata field but just looking for it somewhere in a record, expecting that most results would have 1925 in the date. But that wasn’t always the case, and the outliers proved to be more interesting than the expected results.
Using the tool, you can arrange content by date, owning institution, or creator. When I arranged by “Date” I got an interesting circle around 1943. Not understanding why that would happen, I took a closer look and discovered that all of the photos were taken in 1943 as worker identification photos for the Post boatyard in Mystic, Connecticut. In the description, each worker was identified by his name and birthdate. These 20 or so men (out of more than 200 of these images in the repository) were all born in 1925. I wonder if they knew that?
I think tools like this can make it interesting and informative to do “sloppy” or simple searches, and find hidden relationships that come out of the data.
We are hiring a new Head of Archives and Special Collections at the UConn Library. We are looking for a creative, progressive, and forward-thinking leader to build and present research collections that support scholarship and community engagement; who is committed to new directions for modern special collections; and who is highly knowledgeable about emerging information technologies.
You would lead a staff of five professional archivists, one full-time paraprofessional services staff member, and an ever-changing cast of project- and grant-funded staff, who develop, prepare, and manage archival and unique collections and create innovative programs to connect them with students, scholars, and the citizens of Connecticut and the world. This is an opportunity for an individual interested in providing leadership in a fast-moving, highly collaborative work environment.
A lot of the talk here at Digital Directions is about thinking of your digital collections as data. One definition I like is “information that has been translated into a form that is efficient for movement or processing.” This idea—that the purpose of building digital collections is no longer to create a faithful representation of a physical object, but to provide a resource that transcends the original purpose or form of the object—is becoming more common. I used a new slide in my management presentation this year to show how I believe the original purpose of digital projects has been upended.
It used to be that the primary purpose of digitization projects was to provide a digital representation or copy of the original analog object, with as much fidelity to the original as possible. At the time, that was something of a tall order. Nowadays, while we still do that, we also make it possible, through both technology and Creative Commons licensing, for people to manipulate the content in ways that were not part of the original purpose of the digital original.
We have stood the old model on its head and are getting closer to the envisioned future of the potential of digital archives.
I’ve been a part of Digital Directions for more than 10 years. Digital Directions is a workshop, conference, and training event for beginning digital repository managers and administrators, run by the Northeast Document Conservation Center since 1995. For the next few days, I’ll be talking about some of the things that I’m hearing, seeing, and talking about here in Seattle during the 2017 edition of Digital Directions.
I always thought that data visualization took more or different brain power than I possessed. I was never very good with ArcGIS or other georeferencing tools. I tried Tableau with very limited success. This is not to say that these tools are bad; on the contrary, they are very good. But somehow they didn’t click with me.

As part of a project I mentioned in an earlier post, I started using a relationship-mapping tool called Kumu. Kumu is a web-based product that was originally designed to map social networks. You can create relationship maps in a number of ways. The first way I tried was visual: clicking on the map, adding balloons, and then connecting them by hand. Once I understood how that worked, I created my content and its relationships in a spreadsheet, and then imported it into the system. Finally, I created a relationship map using a Google spreadsheet that allows live updating of content, and I’ve set the system to automatically generate connections based on the data elements I enter.
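The spreadsheet route can be sketched in a few lines. Kumu-style imports are built from a table of elements and a table of connections, matched by label; the snippet below builds a minimal connections table in that spirit. The record data here is invented for illustration.

```python
import csv
import io

# Invented records: (item label, owning institution label).
records = [
    ("Photo A", "Mystic Seaport Museum"),
    ("Photo B", "Mystic Seaport Museum"),
]

# Every label that appears anywhere becomes an element on the map.
elements = {label for rec in records for label in rec}

# Each record becomes one connection row, linking item to institution.
connections = [{"From": item, "To": inst} for item, inst in records]

# Serialize the connections as CSV, the shape a spreadsheet import expects.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["From", "To"])
writer.writeheader()
writer.writerows(connections)
```

Because the connections are generated from the data rather than drawn by hand, adding a row to the source spreadsheet is all it takes to grow the map.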
All told, this learning curve lasted about two weeks; I made and discarded about ten projects as I learned what to do and in what order to do it. The help documents are pretty good, too. But the thing that helped me the most was that I could start by creating things right in the visualization map. Anything you did on the map was translated into a spreadsheet data source within the application. Once I could see how what I did on the map affected the stored data in the spreadsheet, I was able to reverse engineer my way to starting from a spreadsheet and a more sophisticated use of the application.
I’m not saying that this will work for everyone, but it worked for me. What I will say is that web-based tools are getting much more accessible to the novice, and with a little work and experimentation, you can make things that look really good.
Another step along the path from analog to digital thinking in archival access is to stop thinking about our collections as unique, even if they are one of a kind. What does this mean?
When all access to analog content was by way of the reading room, everything existed in an environment of scarcity, since a one-of-a-kind document, like this 1815 membership certificate from the Windham County Agricultural Society, could only be experienced in one place, and at limited times. This was scarcity of opportunity. Since most manuscript collections were never published in any form, this scarcity seemed a permanent condition. In fact, some repositories, perversely it seems to us now, prided themselves on the fact that people were forced to come to their reading rooms from all over the world to view their treasures.
Digitization changed all that. Repositories now pride themselves on how much of their collections are available 24 x 7, and on the number of places they are discoverable. Ubiquity has replaced scarcity as the coin of the realm, so to speak. The original documents remain as unique as before, but their ability to be ubiquitous gives them as much value as their uniqueness. How does this change the way we think about value in what we do?
At UConn Library we are involved in a project to develop a systematic data architecture, although we don’t quite use that term, which is more of an IT term. According to Wikipedia, “In information technology, data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations.”
This definition does not address the preservation or sustainability aspect of data management that is central to the data curation lifecycle, but data architecture is meant to be only one of the aspects of what is called solution architecture.
Like many organizations that made the transformation from the analog to the digital world, libraries have over the years developed multiple and sometimes conflicting solutions, systems, and policies for managing digital collections and files in their domain. These solutions were usually implemented to solve particular problems that arose at the time, with less thought about how those decisions would have large-scale impact, often because there was no large-scale impact, or there was no way for these decisions to affect other areas of the organization. And of course external vendors were only too happy to sell libraries “solutions” that were specific to a particular use case.
As digital content has become the medium of activity and exchange, and as systems have improved and become more flexible, it is now possible, and in fact necessary, to look at our data management systems more broadly.
If we keep in mind that, at the root, all digital content is “ones and zeros” and that any system that can manage ones and zeros is potentially useful to a library, no matter where it comes from or what it is sold or developed for, then we can build an approach, or data architecture, that will serve us well, efficiently, and effectively.
How we get to that point is easier said than done. In order to get beyond thinking about the system first, we need to understand the nature and characteristics of our data. That’s where records management thinking intersects with this. RM thinking assesses the needs and limits of access and persistence (or what RM folks would call retention). Based on those criteria, records are held and managed in certain ways and in certain environments to meet the requirements of their characteristics. For example, sensitive records may be stored in a more secure facility than non-sensitive records.
How does RM thinking apply to digital libraries? The RM idea is embodied in the DCC’s Lifecycle model, and many digital archivists have internalized this idea already. Many librarians, who work more with current data, have had less of a reason to internalize the DCC model of data curation into their work, and the model has generally only been applied to content already designated as preservation worthy. What would it mean to apply RM/Lifecycle thinking to all areas of library content?
We have been mapping the relationships among different content types that the library is responsible for in terms of six different characteristics:
IP rights holder
current management platform
current access platforms
Then we are going to look at the characteristics the content types have in common, and develop a set of policies that govern the data that has these characteristics, and only then will we look to use/alter/build/purchase applications and systems to implement these policies.
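A minimal sketch of that mapping exercise: describe each content type by a few characteristics, then group types that share a characteristic, since each group is a candidate for one shared policy. All of the names and values below are invented for illustration; they are not our actual inventory.

```python
from collections import defaultdict

# Hypothetical content types, each described by a few characteristics
# (a real mapping would use all six characteristics, not three).
content_types = {
    "digitized photographs": {"ip_rights": "library", "mgmt": "Fedora", "access": "website"},
    "born-digital records":  {"ip_rights": "donor",   "mgmt": "Fedora", "access": "reading room"},
    "licensed e-journals":   {"ip_rights": "vendor",  "mgmt": "vendor platform", "access": "website"},
}

# Group content types by management platform: each group can then be
# governed by one policy, and only later matched to a system.
by_platform = defaultdict(list)
for name, traits in content_types.items():
    by_platform[traits["mgmt"]].append(name)
```

The point of the exercise is the grouping itself: once two content types land in the same bucket, the policy conversation happens once, before any application is chosen.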
It is always difficult to separate applications from the content they manipulate, but it is essential to do so in order to create a sustainable data architecture that puts the content first and the applications second.
Our project is in its early phases, and the map linked to above is very much a work in progress. Check back often to see the evolution of our thinking.
It used to be that when I taught how to do digital projects, we said you should have good metadata about your content before you digitized it, because without good metadata, how could anyone find what you had digitized? Ignoring, for now, the question of what “good” metadata was or is, this left us digitizing only those things we knew well and could create metadata for.
Luckily, the scanning process was so slow that we could pretty much keep up with the scanning throughput–we generally did cataloging while we waited for the scanner to slide across the bed of the Epson 1600.
Well, as is the theme of this blog, times change. Scanners are faster–in fact, we use camera capture rather than scanners–and much of our content comes to us born digital, so we don’t have the digital capture bottleneck to worry about.
Now the bottleneck is metadata creation. The previous post alluded to one process approach to automating data entry, today I want to talk about a philosophical approach.
Rather than digitizing only things that are well described, I’m advocating digitizing and making available things that are not well described as well. One approach to research access to analog archival collections was to “get researchers to the right box,” and if you were really lucky to the right folder, and then let them have at it to find what they wanted.
We can apply that same idea to digital resources. Making available 2,000 images, each with the title “Commencement 2016,” serves the same purpose as giving a researcher a box of photos to sort through. But with an online access tool, a researcher can browse dozens of photos at once and zoom in on interesting ones in ways that aren’t easy with the analog version (if one exists). I have done the technological equivalent of getting them to the box.
So from the macro level to the micro level, you never know what is going to happen. We have an artificial collection that was created over some 20 years of “alternative” news and information sources relating mostly to late-20th-century counterculture groups. The collection fills about a dozen filing cabinets, with folders that may contain two issues of a newsletter, or fifty flyers from a protest group. Each folder has a typewritten title, sometimes referring to the title of the publication, sometimes referring to an idiosyncratic subject term. It has been a daunting task to think about creating an online index of these resources; the data entry alone would be an enormous task. And once we did that, there would be enormous pressure to provide online access to the contents as well.
With some seed funding from a private donor, we are beginning to digitize the collection, and create online access to the resources. We made some decisions that are consistent with the idea of “quantum archives,” and applied some technological solutions to a difficult problem.
First, we defined the smallest unit of description to be the folder. Whether the folder held 20 different documents or a homogeneous set, we would manage and describe it at the folder level. A user would discover the folder, and then browse through the pages in the folder (or use full-text searching) until they found what they wanted. Folder titles and one or two genre terms would be the initial entry points.
In order to automate data entry (remember that the folder titles are typewritten), we purchased a text-scanning stylus. Using a spreadsheet, we attach and scan the folder’s barcode, the title of the folder, and genre terms from a typewritten sheet. There are no typographical errors, and with the scanning pen we can enter data at a rate far higher than hand typing.
Once we populate the spreadsheet, we use other processes to convert it into MODS XML descriptive metadata records, pair them with the set of scanned objects from the folder, and use a batch process to ingest them into the preservation digital repository. After a bit of tinkering with settings, workflow, and process, we are far exceeding the throughput of the old manual process.
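The spreadsheet-to-MODS step can be sketched as follows. This is a minimal illustration assuming a row with only a title and genre terms; our production records carry more fields, and the row names here are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

def row_to_mods(row):
    """Build a minimal MODS record (title + genres) from one spreadsheet row.

    The row keys ("title", "genres") are invented for this sketch.
    """
    ET.register_namespace("", MODS_NS)  # serialize with a default namespace
    mods = ET.Element("{%s}mods" % MODS_NS)
    title_info = ET.SubElement(mods, "{%s}titleInfo" % MODS_NS)
    ET.SubElement(title_info, "{%s}title" % MODS_NS).text = row["title"]
    for term in row["genres"]:
        ET.SubElement(mods, "{%s}genre" % MODS_NS).text = term
    return ET.tostring(mods, encoding="unicode")

xml = row_to_mods({"title": "Peace Newsletter", "genres": ["newsletters"]})
```

Running one function per spreadsheet row yields a batch of MODS records ready to be paired with the scanned page images for ingest.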