When we lived in Colorado, there was a small but active Italian-American community that lived mostly in the northern neighborhoods of Denver. We were not part of that community, but we frequented a market there for its homemade sausage and cheeses. For the entire time we lived there, the checkout counter held a petition we could sign to create an “Italian-American” vanity license plate. I always wondered why someone would want this. How important is it to advertise, within a registration system, that you belong to one group or another? (I’ve since discovered that the petitioners succeeded: you can now get an Italian-American license plate, in Colorado at least.)
In our implementation of the CTDA’s (http://ctdigitalarchive.org) underlying Fedora repository, we confronted a similar question about namespaces. Fedora assigns every object a PID (Persistent IDentifier) made up of a namespace prefix and a simple string identifier, separated by a colon. For example, mystuff:12324 could be a PID in a Fedora system. PIDs are unique within a system and identify that object and that object alone. A namespace can be any alphanumeric string, subject to the usual computer limitations on illegal characters. PID namespaces aren’t inherently useful for anything: pretty much anything related to repository management that you can do with a namespace, you can also do in other ways. Since the CTDA would hold content from multiple organizations, we thought it would be a good idea to assign each organization its own namespace. Our reasoning was that this would let us tell at a glance who owns any object in the system. Again, this really wasn’t necessary, since there were a whole bunch of other ways to do it, but we were prisoners of our own biases, so down this path we went.
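As a rough illustration of the “namespace:identifier” structure, here is a minimal Python sketch that splits a PID such as mystuff:12324 into its two parts. The function name `parse_pid` is hypothetical, and it deliberately does not enforce Fedora’s full character rules for namespaces; it only checks that both parts are present.

```python
from __future__ import annotations


def parse_pid(pid: str) -> tuple[str, str]:
    """Split a Fedora-style PID into (namespace, identifier).

    Hypothetical helper: it splits on the first colon only and does not
    validate which characters Fedora actually allows in each part.
    """
    namespace, sep, local_id = pid.partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"not a namespace:identifier PID: {pid!r}")
    return namespace, local_id


if __name__ == "__main__":
    print(parse_pid("mystuff:12324"))  # ('mystuff', '12324')
```

Because the namespace is just the text before the colon, reading it off a PID is trivial, which is exactly why it is tempting to load it with meaning such as ownership.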
Once we had decided on individual namespaces, we had to decide how to assign them. Should we use the initials or acronyms of individual organizations? As you can imagine, in a statewide repository for cultural heritage there was a distinct possibility that several organizations would share the same initial letter and end in “HS” (for “historical society”). For example, 13 towns in Connecticut start with the letter “C”; if each one had a local historical society, who would get the coveted “CHS”, not to mention the potential claim from the statewide Connecticut Historical Society?
We decided that, rather than wrestle with that issue, we would punish the entire class: no letters in namespaces at all, just numbers assigned in consecutive order. THAT left us with the dilemma of who got namespace “1.” We then decided to follow the lead of the FCC, which does not assign channel 1 to any television station, and not have a namespace 1 (or 100, or 1000, or in fact ANY 1s at all). We finally settled on a five-digit string in which the first two digits identify the organization and the last three identify any sub-divisions of that organization, always beginning with a 2 or higher.
So, the University of Connecticut Libraries was assigned the PID namespace 20002. The next organization will get 30002, and so on forever…
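To make the rules concrete, the following Python sketch checks a namespace against them (exactly five digits, no 1s anywhere) and splits it into the two-digit organization part and the three-digit sub-division part. The function names and the `NAMESPACE_OWNERS` table are hypothetical; the only mapping the post confirms is 20002 for the University of Connecticut Libraries. It also does not try to encode the “2 or higher” rule for sub-divisions or to generate the next namespace, since the exact sequence beyond 20002 and 30002 isn’t spelled out.

```python
from __future__ import annotations

# Hypothetical registry; 20002 -> UConn Libraries is the only entry from the post.
NAMESPACE_OWNERS = {
    "20002": "University of Connecticut Libraries",
}


def split_namespace(ns: str) -> tuple[str, str]:
    """Return (organization digits, sub-division digits) for a namespace.

    Enforces only two of the rules described above: exactly five ASCII
    digits, and no 1s anywhere in the string.
    """
    if len(ns) != 5 or not (ns.isascii() and ns.isdigit()):
        raise ValueError(f"namespace must be five digits: {ns!r}")
    if "1" in ns:
        raise ValueError(f"namespace may not contain a 1: {ns!r}")
    return ns[:2], ns[2:]


def owner_of(pid: str) -> str:
    """Look up the owning organization from a PID's namespace prefix."""
    namespace, sep, local_id = pid.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a namespace:identifier PID: {pid!r}")
    split_namespace(namespace)  # raises ValueError if the namespace breaks the rules
    return NAMESPACE_OWNERS.get(namespace, "unknown organization")


if __name__ == "__main__":
    print(split_namespace("20002"))  # ('20', '002')
    print(owner_of("20002:500"))     # University of Connecticut Libraries
```

This is the “tell at a glance” idea in code form: the owner comes straight from the PID prefix, although, as noted above, a separate ownership field on each object would serve just as well.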
Now, I know this is all essentially meaningless, and that there are gigantic holes in the logic of this approach, but these are the sorts of things that take up time in repository development projects. And you thought the technology was difficult!