Many people reading this blog entry may fondly remember the days when the most popular word processors were WordStar, WordPerfect, or MultiMate: a time when Microsoft Office was in its infancy and graphical interfaces either did not exist or were mostly bitonal.
While those applications and interfaces are a thing of the past for most people, they are very much alive in many of the collections held at the Schlesinger Library. It is common for collections to include digital materials such as documents, e-mails, and photographs in formats that are antediluvian in computing terms, alongside more recent and still popular formats.
Consider a common activity: what would happen if we received an e-mail with an attached file that is not a ubiquitous PDF document but a WordPerfect 3 document (circa 1982)? At the Schlesinger Library, we asked that same question in the context of how to make the files included in many of our collections available to library users. It would be extremely easy to make files available in their native format and leave users to find the appropriate application, conversion tool, or filter needed to access their contents. Instead, we decided to take a different approach and look for a digital analogy to the philosopher’s stone. Rather than turning the documents into gold, the tool would allow us to make the contents of these documents easily available to library patrons through familiar formats.
Using the Python programming language as the unifying conduit for a series of existing open source libraries and tools, the answer to our problem came in the form of Rebis. Through Rebis, a series of intellectually arranged files is converted to archival PDF (PDF/A), presenting users with the contents of the original files (in most cases preserving their original formatting) in a file format commonly used today. Rebis was designed to tackle digital obsolescence, so that in 50 years or more users should still be able to access the contents of a file without having to convert it to whatever standard is current at that time. Additionally, Rebis provides a metadata table that includes the original file metadata (when available), through which users can find information such as when the file was last saved or the name of the application in which it was saved. Rebis also includes data about the original repository, collection, and file contained within the larger document. That way, users can easily reference the source material or find relevant associated content.
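To make the idea of a metadata table more concrete, here is a minimal Python sketch of how provenance and file details might be gathered for one source file. It is not Rebis code: the names `SourceRecord`, `describe_file`, `metadata_rows`, and `format_table` are hypothetical, and the filesystem modification time is used only as a stand-in for the "last saved" value that a real tool would read from the document's embedded metadata when available.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class SourceRecord:
    """Provenance and basic file details for one original file (hypothetical)."""
    repository: str
    collection: str
    item_id: str
    original_name: str
    size_bytes: int
    last_modified: str


def describe_file(path, repository, collection, item_id):
    """Build a SourceRecord from a file on disk.

    Assumption: the filesystem timestamp stands in for the embedded
    "last saved" metadata, which this sketch does not attempt to parse.
    """
    p = Path(path)
    info = p.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return SourceRecord(
        repository=repository,
        collection=collection,
        item_id=item_id,
        original_name=p.name,
        size_bytes=info.st_size,
        last_modified=modified.strftime("%Y-%m-%d %H:%M:%S UTC"),
    )


def metadata_rows(record):
    """Return (label, value) pairs in the order they would be displayed."""
    return [
        ("Repository", record.repository),
        ("Collection", record.collection),
        ("Item", record.item_id),
        ("Original file name", record.original_name),
        ("Size (bytes)", str(record.size_bytes)),
        ("Last modified", record.last_modified),
    ]


def format_table(rows):
    """Render the rows as a plain-text table with aligned labels."""
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label.ljust(width)}  {value}" for label, value in rows)
```

Calling `format_table(metadata_rows(describe_file(path, repository, collection, item_id)))` would yield a small text table that could accompany the converted file, so that a reader of the PDF can trace it back to its source.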
At this moment, Rebis is still in its infancy, but it is already capable of dealing with over 50 file formats (including documents, spreadsheets, and presentations) and with a large number of digital images and graphics.
Rebis-generated files are slowly starting to appear across collections that include digital files, with most of them corresponding to an item identified with the letter E (for electronic records). Rebis will be made available as an open source tool later this year so that other repositories can use it and people can contribute to the project with code, suggestions, and bug reports.