In December 2016, Harvard Library successfully completed the migration of its finding aid data from a homegrown, locally supported system into ArchivesSpace, an open source application used worldwide by more than 300 archives. The migration marked the successful end of an almost two-year effort at the Schlesinger Library to systematically review all 1,485 of our finding aids with an eye to optimizing the textual content and structural mark-up of the guides for ingest into ArchivesSpace.
The first archival holdings collected by the Schlesinger Library date back to 1943 when Maud Wood Park donated a large collection of suffrage-related materials to her alma mater, Radcliffe College. While we don't know when our earliest collection guides, or finding aids, were created, they likely date from sometime in the 1940s. Over the years, guides were created by individuals from a variety of backgrounds, using a wide variety of templates and standards. Many early guides contained numbered lists of materials which did not distinguish between descriptions of individual documents, subsets of documents in folders, entire folders of material, or general descriptions of the contents of boxes. Some guides also lacked key elements such as biographical information on the collection creator or an overview of the type of materials (correspondence, photographs, etc.) and people, events, and topics represented in the collection.
By the 1980s, guided by standards outlined in Steven L. Hensen's Archives, Personal Papers, and Manuscripts (often referred to as APPM), Schlesinger guides became more standardized. Materials were generally described at the folder or box level and a numbering system was put in place that informed archivists, if not the researcher, of the general quantity and format of the material in question. In addition, the formatting and use of abbreviations and acronyms became consistent, and collection-level elements such as dates, extent, biographical or historical notes, and scope and content notes were consistently included in the guides.
Most recently, Schlesinger finding aid descriptive practices were revised to comply with Describing Archives: A Content Standard (DACS), which was adopted as an official standard by the Society of American Archivists in 2004. The impetus behind DACS is to make finding aids more standardized across archival repositories and more easily understood by users. For the most part, our guides were already DACS-compliant, but the standard required two significant changes: one to our finding aid titles and one to our use of abbreviations and acronyms. Under DACS rules, the title of a collection is formed using the name of the creator in addition to the traditional titles of "Papers" (for personal and family collections) or "Records" (for corporate or organizational collections). Once we adopted DACS, our titles moved from either "Papers" or "Records" to titles such as "Papers of Betty Friedan" or "Records of the Massport Jets." DACS also advises against using abbreviations or acronyms, so we ended a long-standing practice of using initials instead of spelling out the names of individuals prominently featured in guides. We began spelling out words we frequently abbreviated, such as names of states, and began recording dates more fully with unabbreviated months and four-digit expressions for years.
While the Schlesinger moved from creating finding aids using typewriters to using word processing programs in the late 1980s, we had no means of delivering them in their electronic state, so they were printed out and added to binders alongside our typescript guides. In 1998, Encoded Archival Description (EAD), an XML standard for encoding archival finding aids, was released. The Harvard University special collections community quickly adopted EAD and developed OASIS, an online catalog for delivering EAD finding aids. In 2002, the Schlesinger received the first of several retrospective conversion (recon) grants to rekey and mark up paper-based finding aids for display in OASIS. By 2008, the project had transformed almost 1,000 finding aids, more than quadrupling the number of Schlesinger guides in OASIS. While our old guides were now online and available to any user with internet access, no effort had been made to update the content of the guides to reflect current archival descriptive practices.
By 2015, OASIS was showing its age. While many finding aid discovery systems had begun to experiment with new ways to display finding aids, OASIS continued to provide a single flat document, essentially an online image of the old paper finding aids. Behind the scenes, the situation was even more dire: performance issues were becoming increasingly difficult to troubleshoot and required considerable staff time to fix. No longer wishing to support a homegrown system, Harvard formed a committee to investigate available finding aid storage systems. The committee recommended using the resource module in ArchivesSpace, a system that was already being used by several repositories at Harvard to manage data relating to accessioning.
ArchivesSpace is a database in which finding aid elements are used to populate fields in "resource records" that may then be used in ways data in traditional Harvard finding aids cannot. For instance, using the ArchivesSpace data structure, we can create reports telling us how many collections we have relating to a specific topic, what those collections are, and the extent of those collections. On a more basic level, we will be able to create a report that tallies the linear footage of both our unprocessed and processed collections, so we will know exactly how much material we have, a figure we currently estimate at 19,700 linear feet. The more structured data storage also opens up the possibility of developing more innovative public-facing displays and usages of our data.
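As a rough illustration of the kind of report that structured data makes possible, the following Python sketch tallies linear footage by processing status. The record fields (`title`, `status`, `linear_feet`), the sample values, and the function name are all hypothetical, chosen only for illustration; they are not the ArchivesSpace data model or its reporting interface.

```python
from collections import defaultdict

# Hypothetical, simplified resource records. Field names and values are
# placeholders and do not reflect the actual ArchivesSpace schema or holdings.
records = [
    {"title": "Papers of Example Person A", "status": "processed", "linear_feet": 12.5},
    {"title": "Records of Example Organization B", "status": "processed", "linear_feet": 4.0},
    {"title": "Papers of Example Person C", "status": "unprocessed", "linear_feet": 3.25},
]


def tally_linear_feet(resource_records):
    """Sum the extent (in linear feet) of the records, grouped by status."""
    totals = defaultdict(float)
    for record in resource_records:
        totals[record["status"]] += record["linear_feet"]
    return dict(totals)


totals = tally_linear_feet(records)
for status, feet in sorted(totals.items()):
    print(f"{status}: {feet} linear feet")
```

A tally like this is only possible when extent is stored as a number in its own field rather than embedded in free text, which is the distinction the paragraph above draws.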
While we could see the vast potential in using ArchivesSpace, we also saw that our inconsistent data practices were going to make it impossible to take advantage of many ArchivesSpace features. More importantly, due to the more stringent data expectations found in the system, we knew that searching in ArchivesSpace was going to be incredibly challenging unless we made some basic changes to our data. Some of our data problems were solved through work done by a Harvard committee formed to identify incompatibilities between Harvard EAD and ArchivesSpace and to apply programmatic fixes to Harvard's EAD data in order to ingest all Harvard finding aid data into ArchivesSpace. Other problems, however, required close examination of each finding aid to identify missing or incomplete data and make changes which would make data more consistent across finding aids as well as more ArchivesSpace friendly.
Working within the projected one-year timeline Harvard Library set for normalizing EAD data for ingest into ArchivesSpace, the Schlesinger prioritized our own data cleanup, assigning 1.5 staff members to the task of reviewing all finding aids and optimizing them for use in ArchivesSpace. By the end of the year we had:
- Standardized our date information into machine-readable formats and parsed inclusive and bulk date data into separate elements to ensure date records in ArchivesSpace were optimally populated for reporting.
- Formatted our collection titles so they are DACS-compliant. Unlike Harvard's discovery systems, ArchivesSpace doesn't display the name of the collection creator alongside the collection title. In order to make collection-level records discoverable and distinguishable from each other, we needed to add the creator name to titles.
- Spelled out abbreviations and some acronyms to improve searching and make information easier to read. To standardize the wording found in our finding aids, we spelled out common abbreviations such as shortened names of months and states in our older finding aids. We also spelled out acronyms, such as TLS (typed letters, signed) or ALS (autograph letters, signed). Lastly, we replaced initials with the full names of the individuals being referred to.
- Eliminated most cumulative headings to make data easier to read. Many of our older finding aids contained intermediary levels of description, locally referred to as cumulative headings, in their inventories. While ArchivesSpace was able to ingest these levels without any problem, the data within the headings was then separated from the folders that followed. Most cumulative headings were typing shortcuts in which the topic or format of the material in several folders was described once and the differentiating data was listed at the folder level. In other words, "Diaries" might be a cumulative heading with folder titles underneath containing only dates. To make all of the relevant information about a folder appear in one record, we eliminated most of the cumulative headings by adding the data in the heading to the folder titles. For the diary example above, that means each folder now reads "Diary, 1910," "Diary, 1911," etc.
- Normalized our name and subject headings across collections using data from HOLLIS and the Library of Congress name authority file (NAF). ArchivesSpace harvests name and subject terms from the sections of our guides called "Creator" and "Additional Catalog Entries" and uses them to populate a pick-list available to all ArchivesSpace users. When a name or subject has been added to a resource record, the record becomes linked to the term and the term then serves as a link between different records addressing similar topics or involving the same people. Since computers are much more exacting than people, only names and subjects that are exactly the same can be matched up during the ingest process. Our name and subject headings are created using the Library of Congress subject headings (LCSH) and the NAF. Both through human error and due to changes made to LCSH and the NAF, our guides contained many variations of the same names and subjects. To combat the discrepancies, we deleted all name and subject headings from our guides and replaced them with headings taken from HOLLIS (Harvard's library catalog) records representing the same collections. For any names that were incomplete (containing birth but not death dates), we searched the NAF to get the most current form of the name. Many of the names and subjects still require further authority work, but the work we were able to complete in the time we had means the data we ingested into ArchivesSpace is far more standardized and compliant than what was previously found in the guides.
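Two of the cleanup steps listed above, separating inclusive and bulk dates and folding cumulative headings into folder titles, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the process we actually ran: the date pattern assumes expressions shaped like "1910-1985 (bulk 1950-1970)", the inventory is modeled as simple heading and folder-title pairs rather than EAD, and the function names are hypothetical.

```python
import re

# Assumed input shape: "YYYY", "YYYY-YYYY", optionally followed by
# "(bulk YYYY)" or "(bulk YYYY-YYYY)". Real finding aid dates vary far more.
DATE_PATTERN = re.compile(
    r"^\s*(?P<begin>\d{4})(?:\s*-\s*(?P<end>\d{4}))?"
    r"(?:\s*\(bulk\s+(?P<bulk_begin>\d{4})(?:\s*-\s*(?P<bulk_end>\d{4}))?\))?\s*$"
)


def split_dates(expression):
    """Split a date expression into separate inclusive and bulk ranges.

    Returns None when the expression does not fit the assumed pattern,
    so that it can be set aside for human review.
    """
    match = DATE_PATTERN.match(expression)
    if match is None:
        return None
    result = {"inclusive": (match["begin"], match["end"] or match["begin"])}
    if match["bulk_begin"]:
        result["bulk"] = (
            match["bulk_begin"],
            match["bulk_end"] or match["bulk_begin"],
        )
    return result


def fold_cumulative_headings(inventory):
    """Prefix each folder title with the heading it was listed under.

    `inventory` is a list of (heading, folder_titles) pairs; an empty
    heading leaves the folder titles unchanged.
    """
    folded = []
    for heading, folder_titles in inventory:
        for title in folder_titles:
            folded.append(f"{heading}, {title}" if heading else title)
    return folded


print(split_dates("1910-1985 (bulk 1950-1970)"))
print(fold_cumulative_headings([("Diary", ["1910", "1911"])]))
```

The sketch uses the heading text exactly as given, so editorial choices such as changing "Diaries" to the singular "Diary" in each folder title still fall to the archivist. Likewise, any date expression that does not match the assumed pattern is returned as `None` rather than guessed at.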
Records in the NAF are created and maintained through the Program for Cooperative Cataloging's Name Authority Cooperative Program (NACO), a network of librarians who undergo training to become NACO-approved contributors. We are fortunate to have one of these contributors on our staff, allowing us to establish name authority records for individuals related to materials in our print and manuscript collections. Usually, this work is done at the point of cataloging, but early on in our project, we realized that our data cleanup could be used as the basis for a larger data cleanup project involving NAF records that would benefit librarians everywhere. When we encountered NAF records with open life dates, we added the step of searching for the names on the internet and in genealogical databases such as FamilySearch and Ancestry. If we were able to document an individual's death, we passed the documentation on to our NACO contributor, who will use the information to update over 200 NAF records.
Now that Harvard's finding aid data is stored in a more stable environment, the focus turns toward implementing a more innovative public display of our finding aids. As with our finding aid storage, Harvard has chosen to implement ArchivesSpace for the public delivery of our finding aid data. Work is currently underway on enhancements to the public interface, and the goal is to have the new system in place by the end of the year. In anticipation of the launch of a new archival discovery system, we will continue to review and update our data in ArchivesSpace, with an eye toward both ArchivesSpace functionality and the human beings who need to interpret the data.