Wednesday, April 22, 2009

AT Issues: Large Collections

Two issues that we have encountered during testing are long load times and slow performance. Given MSSA's 100,000+ instances, 2,600+ resources, 7,000+ accessions, and multiple locations, we've seen lengthy start-up times and slow performance on even a single client, let alone the dozen or so clients that will eventually be using the AT at any given time. Because we have not yet fully deployed the AT, we cannot accurately test how it will perform across several clients. Two things we will be doing to speed up performance are moving the AT to a dedicated server and simplifying our location assignments. Concerning the latter, in a previous iteration in which we wrote location information directly to the tables, we created an enormous locations index, including many duplicate values, which significantly slowed client performance.

A third performance issue concerns report generation. Again, given the amount of data we have in the AT, running reports across our resources and accessions can take a considerable amount of time. Rather than running reports in the AT, which also aren't customizable, we use the database administration tool Navicat for MySQL to query the AT tables directly. Besides being much faster, this approach is customizable (provided you know a little MySQL) and allows for batch editing.
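To give a sense of the kind of query involved, here is a minimal sketch of a yearly accessions report. It is written in Python and uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs on its own; the `accessions` table and its columns are hypothetical placeholders, not the AT's actual schema. In Navicat you would run a comparable SELECT statement against the AT database itself.

```python
import sqlite3

def extent_by_year(conn: sqlite3.Connection):
    """Count accessions and total linear feet per year (report run outside the AT)."""
    # Hypothetical table and column names; the real AT schema differs.
    sql = """
        SELECT substr(accession_date, 1, 4) AS year,
               COUNT(*)                     AS accessions,
               SUM(extent_linear_ft)        AS linear_ft
        FROM accessions
        GROUP BY year
        ORDER BY year
    """
    return conn.execute(sql).fetchall()

# Small in-memory demo table standing in for the AT's accession data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accessions "
             "(accession_number TEXT, accession_date TEXT, extent_linear_ft REAL)")
conn.executemany("INSERT INTO accessions VALUES (?, ?, ?)", [
    ("2008-M-001", "2008-03-14", 2.5),
    ("2008-M-002", "2008-11-02", 1.0),
    ("2009-M-001", "2009-01-20", 4.0),
])
for row in extent_by_year(conn):
    print(row)
```

Because the report is plain SQL, changing what it counts or how it groups is a one-line edit, which is the customizability the built-in reports lack.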

Monday, April 20, 2009

Deploying the AT Across Multiple Yale Repositories: Implementation

In setting up the AT for use across multiple Yale repositories we encountered a number of practical issues that needed to be resolved. The two most important were the need for standardization of practice and administrative set-up of the AT.
  1. Standardization of practice
    Each of the four participating repositories accessioned and managed special collections in a different way. To maximize the Toolkit's effectiveness we therefore needed to create standard procedures for accessioning, including defining a minimum-level accession record and consistent accession numbering. In addition, we created documentation in the form of instructions, guidelines, and tutorials to instruct both initial and future participants.

  2. AT set-up
    Again, given the variety exhibited in participating repositories' practices and collections information, we had to carefully consider whether to customize the Toolkit (e.g. use of user-defined fields, unique field labels, default values, etc.) to meet specific repository needs. The major challenge, however, was that the Toolkit only allows customization across the AT instance as a whole, not for a single repository within that instance.

    We had set up one database instance for the AT at Yale and created repositories for each of the special collections within it. An alternative strategy would have been to create separate instances for each repository, allowing for repository-specific customization. The goal of the project, however, was to create one means for managing and querying special collections information across Yale's special collections. In addition, given the lack of distributed technical expertise and support, we decided to centrally manage the AT in a single instance.

    We initially chose not to customize the Toolkit, maintaining a vanilla installation for the Music Library, Arts Library, and Divinity Library collections. Given the sheer volume of collections information in Manuscripts and Archives (MSSA), however, we decided to create a separate MSSA instance, mostly for testing legacy data import, but also to incorporate MSSA-specific data elements in user-defined fields. We are also currently contracting out customization of the Toolkit to handle collections management processes, including Integrated Library System (ILS) export.

Sunday, April 19, 2009

Populating Resources: MARCXML vs. Finding Aids

Choosing how to populate resources in the AT was an important consideration for Manuscripts and Archives, one that ultimately had us reversing course and scrapping our initial plans. Our initial dilemma was that we did not have finding aids for all of our collections, and many of our University Archives finding aids lacked inventories for some accessions (some inventories existed in paper only, and some accessions were taken in with no inventory at all). As a result, we initially chose not to import our finding aids, importing MARCXML instead, which we transformed via a stylesheet into EAD for batch import. To make up for the data that importing finding aids would normally have provided, we developed a script to tie our container data (stored in Paradox) to the resources. Several things didn't quite work, so we had to reconsider our options.
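For readers unfamiliar with this route, the sketch below shows the general idea of mapping a MARCXML record to an EAD collection-level description. It is not the stylesheet described above (that was XSLT); it is a simplified Python illustration using the standard-library ElementTree, and the field mapping (245 $a to `<unittitle>`, 245 $f to `<unitdate>`, 090 $a to `<unitid>`) is an assumption chosen for illustration. The output is a bare skeleton, not a valid EAD 2002 document.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Assumed mapping for illustration only: (MARC tag, subfield code, EAD element).
FIELD_MAP = [("245", "a", "unittitle"), ("245", "f", "unitdate"), ("090", "a", "unitid")]

def subfield(record: ET.Element, tag: str, code: str):
    """Return the first matching subfield's text without trailing ISBD punctuation, or None."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    el = record.find(path, MARC_NS)
    if el is None or not el.text:
        return None
    return el.text.strip().rstrip(" ,/:;.")

def marc_to_ead_skeleton(marcxml: str) -> ET.Element:
    """Build <ead><archdesc><did>...</did></archdesc></ead> from a single MARCXML <record>."""
    record = ET.fromstring(marcxml)
    ead = ET.Element("ead")
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    for tag, code, target in FIELD_MAP:
        value = subfield(record, tag, code)
        if value:
            ET.SubElement(did, target).text = value
    return ead

sample = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="090" ind1=" " ind2=" "><subfield code="a">MS 123</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Smith family papers,</subfield>
    <subfield code="f">1850-1920.</subfield>
  </datafield>
</record>"""
print(ET.tostring(marc_to_ead_skeleton(sample), encoding="unicode"))
```

Note how little a MARC record carries compared with a full finding aid: there is no container list at all, which is why container data had to be attached separately.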

Having attended the AT User Group meeting at SAA last fall, we realized that our approach caused too many complications and problems of sustainability. As a result, we specced out importing our finding aids. Again, given that we lacked finding aids for all of our collections, we worked out a plan to hire a consultant to help standardize our finding aids for import. We successfully imported 2,000 of our 2,500 finding aids and turned our attention to how to use the AT as a collection management system. We then realized we faced the problem of editing or significantly revising finding aids, especially on the University Archives side, where offices regularly provide accession inventories ready to copy and paste into EAD. Without a simple means of importing partial EAD for accessions, and not wanting to re-enter these inventories manually in the AT, we reconsidered our plan again. Since our current EAD creation and editing process is sufficient, we decided that the AT is not currently capable of meeting our needs and that having a consultant customize it to do so would be beyond our means.

As it stands now, we're back to square one; MARCXML is it (again). Unfortunately, given the considerable amount of work spent cleaning up and standardizing our finding aids, we will have to come up with a means for importing data from a separate AT EAD import (e.g. Resource Titles, Finding Aid Titles, and Citations) into a fresh AT instance populated via MARCXML. In addition, to meet our needs and function as a full collection management system, we will work with a consultant to modify the Toolkit to marry our container information with our resources and allow for easy export to our ILS (Voyager). Hopefully the need to easily import partial EAD will be worked out and we can use the AT as intended, populating resources via finding aids.

Saturday, April 18, 2009

Customizing the Toolkit: Instance Creation & Voyager Export

For some time now we have been looking to replace our collections management and accessioning systems, which are obsolete in both platform and functionality. When the Toolkit was being developed we decided that, since it would not fully meet our needs, we would not adopt it. A few years later, however, committed to consolidating our systems into an open-source alternative, we have reversed course and decided to move to the AT. Since the latest version (1.5) still does not fully capture our existing systems' functionality, we have contracted out the modification of the Toolkit to meet two specific needs: 1) improved container (i.e. Instance) creation; and 2) Voyager export functionality.

Container/Instance creation

Although version 1.5 of the AT comes with customizable Rapid Data Entry (RDE) screens and “sticky” values to speed data entry (e.g. of Instances), it is still time-consuming to enter information for large collections. Here is a screenshot of an RDE screen I created for Instance creation:



AT 1.5 allows for the creation of one item at a time with no option for batch creation and no automatic assignment of successive box numbers. Here is a screenshot of our current system:



This system can populate successive items/containers, either the next line alone or n lines at once. It also allows for assignment of location info, which the AT handles in a separate module.
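The batch behavior described above (filling in the next container, or n containers at once, with successive box numbers) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Instance` record and the `create_successive` and `next_box_number` helpers are hypothetical and reflect neither the AT's data model nor the modification under contract.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Instance:
    """Hypothetical container record; not the AT's actual Instance model."""
    resource_id: str
    container_type: str
    box_number: int
    location: Optional[str] = None

def next_box_number(existing: List[Instance]) -> int:
    """Next unused box number: one past the highest already assigned (1 if none)."""
    return max((inst.box_number for inst in existing), default=0) + 1

def create_successive(resource_id: str, start_box: int, count: int,
                      container_type: str = "Box",
                      location: Optional[str] = None) -> List[Instance]:
    """Create `count` instances with consecutive box numbers beginning at `start_box`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [Instance(resource_id, container_type, start_box + i, location)
            for i in range(count)]

# Add the first three boxes, then five more that continue the sequence,
# assigning a (hypothetical) location to the second batch as it is created.
boxes = create_successive("MS 123", next_box_number([]), 3)
boxes += create_successive("MS 123", next_box_number(boxes), 5, location="Stacks A-12")
print([b.box_number for b in boxes])
```

Calling `create_successive` with `count=1` corresponds to adding just the next line; a larger `count` corresponds to the n-multiple option.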

What we are building might look something like the following:



Voyager export functionality

To easily export collection information from the AT to our ILS (Voyager) we need to add the following fields to each instance: 1) VoyagerBib; 2) VoyagerHolding; 3) Date (last changed); and 4) Restriction. This will require a redesign of the Resources and Instances modules to include these new fields, as well as a script (triggered by an Export to Voyager button) that processes newly entered data into the proper format for export into our ILS. We already use a similar script to do this work with our current system, so it shouldn’t be much work to modify it for use with the AT.
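A minimal sketch of what such an export step might look like, assuming instances carry the four fields listed above. The `InstanceRecord` type, the `export_changed` function, and the tab-delimited output layout are all hypothetical; the actual load format expected by Voyager, and by our existing script, is not described here. Selecting records by their last-changed date is one simple way to process only newly entered data.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

@dataclass
class InstanceRecord:
    """Hypothetical instance carrying the four Voyager-related fields."""
    voyager_bib: str
    voyager_holding: str
    box: int
    last_changed: date
    restriction: str = ""

def export_changed(instances, since: date) -> str:
    """Return tab-delimited lines (with a header) for instances changed on or after `since`."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["VoyagerBib", "VoyagerHolding", "Box", "LastChanged", "Restriction"])
    changed = [inst for inst in instances if inst.last_changed >= since]
    for inst in sorted(changed, key=lambda inst: (inst.voyager_bib, inst.box)):
        writer.writerow([inst.voyager_bib, inst.voyager_holding, inst.box,
                         inst.last_changed.isoformat(), inst.restriction])
    return out.getvalue()

records = [
    InstanceRecord("123456", "234567", 1, date(2009, 4, 10)),
    InstanceRecord("123456", "234567", 2, date(2009, 4, 17), "Closed until 2030"),
    InstanceRecord("345678", "456789", 1, date(2009, 3, 1)),
]
print(export_changed(records, since=date(2009, 4, 15)), end="")
```

Using "on or after" rather than "strictly after" the previous export date errs toward re-exporting a record changed on the day of the last run rather than silently skipping it.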

Although it is unlikely we will complete all of our work before the next release of the AT, we hope to finish in time for incorporation into a subsequent release.

Making the Jump: Standardization of Practice

In addition to the important and lengthy work spent standardizing our collections management data, Manuscripts & Archives also had to analyze its practices and procedures to determine how the AT could handle them.

1. Accessioning

For quite some time now we've handled accessioning differently on the University Archives side than on the manuscript side. This has resulted in two widely different processes, and two sets of disparate accession info. To remedy this situation we developed an accession checklist and procedures common to both University Archives and manuscripts, as well as instructions and tutorials for accessioning using the AT.

2. Description

As with accessioning, description has also differed between University Archives and manuscripts. With the welcome addition of Bill Landis as head of arrangement and description, and considerable time spent on the University Archives side revising its processes, this disparity has been addressed. Also helpful has been the creation of descriptive standards for handling a/v materials and electronic records, as well as the creation of finding aid templates and EAD best practices guidelines.

3. Collections management

This was less of a problem, as collections management was handled well by one person for quite a while. The only problems we encountered were from incomplete data and improper use of the system by clerical staff (e.g. entering temporary holdings, capturing accession or descriptive info, or entering several different data elements in a single note field). To address these problems we had to map out export of the problematic data elements into appropriate AT fields (where possible) and plan for post-import clean-up in the AT.

Reporting

In addition to the AT's built-in reports, which may or may not be sufficient for repository statistics, there are a few options for generating customized reports. Common reporting tools that can be tied in include JasperReports, iReport (a visual designer for JasperReports), and Crystal Reports. Those wishing to create or customize their own reports with these applications will also need to make use of the Toolkit’s application programming interface (API), which is available on the Archivists’ Toolkit website.

Another option for those with knowledge of MySQL is to use a free database administration tool such as Navicat for MySQL. The beauty of this application is that with a little MySQL you can query and batch edit data in the AT MySQL tables. A similar tool is DaDaBIK, a free PHP application that lets you easily create a highly customizable front end for a database in order to search, insert, update, and delete records. Although these tools make it easy to batch edit data in the AT, be forewarned that editing the tables directly is not officially sanctioned and may cause problems when upgrading.

We ran into problems during the upgrade to 1.5 after we had written data (including primary key values) directly to the tables; we think this is likely due to our having created new values, especially primary keys, in the tables directly. Subsequent work using these tools on data already imported into the AT via EAD and accession (XML) import has not encountered problems.
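One way to stay on the safer side of the problems described above is to restrict batch edits to UPDATE statements on rows the AT itself created, never inserting rows or assigning primary keys by hand, and to wrap each edit in a transaction. The sketch below illustrates this in Python with sqlite3 standing in for MySQL; the `instances` table, its columns, and the container-type values are hypothetical, and following this pattern is no guarantee against upgrade problems.

```python
import sqlite3

def normalize_container_types(conn: sqlite3.Connection) -> int:
    """Batch-update existing rows in place; no rows are inserted and no keys are assigned."""
    # Hypothetical table and column names; not the real AT schema.
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE instances SET container_type = 'Box' "
            "WHERE lower(trim(container_type)) IN ('box', 'bx', 'bx.')"
        )
    return cur.rowcount  # number of rows matched by the UPDATE

# Demo data: keys are assigned by the database, standing in for rows the application created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (instance_id INTEGER PRIMARY KEY, container_type TEXT)")
with conn:
    conn.executemany("INSERT INTO instances (container_type) VALUES (?)",
                     [("box",), ("Bx.",), (" BOX ",), ("Folder",)])
print(normalize_container_types(conn))
```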

Making the Jump: Standardization of Legacy Data

One of the most important factors to consider in planning a migration to the Archivists' Toolkit is the condition of your legacy data (finding aids, accessions, etc.). In Manuscripts and Archives (MSSA), we were faced with daunting numbers: 2,500 collections, 7,000+ accessions, and 100,000+ boxes. Because this data had been captured in a variety of systems by a variety of individuals over several decades, we quickly saw the need to begin a lengthy process of data standardization prior to our switch to the Toolkit.

We started with finding aids. Since ours were still in EAD version 1.0 as of 2008, we spent considerable staff time converting them to EAD version 2002. Using the Library of Congress' stylesheet, modified for our purposes, we batch converted our finding aids in an iterative process lasting several weeks. Given the variety of encoding practices over the years, however, we were still left with 400-500 invalid files that we had to fix. This took several months. As with the conversion itself, we worked iteratively, fixing common errors en masse where possible and then addressing the unique ones. Find and Replace was a godsend here. Dreamweaver was particularly helpful, allowing multiple-line Find and Replace across individual files, selected files, and entire directories.
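The same kind of multiple-line find and replace can also be scripted. The sketch below is a hypothetical Python equivalent (not a tool described in this post) that applies one regular expression to every matching file in a directory tree and reports how many substitutions were made per file; the example fix, removing empty `<unitdate>` elements that span several lines, is invented for illustration. Since it rewrites files in place, run it on a copy.

```python
import re
import tempfile
from pathlib import Path

def batch_replace(directory, pattern, replacement, glob="*.xml"):
    """Apply a regex substitution to every matching file under `directory`.

    Returns {relative path: number of substitutions} for files that changed.
    """
    root = Path(directory)
    regex = re.compile(pattern, re.DOTALL)  # DOTALL lets "." in patterns span lines
    changed = {}
    for path in sorted(root.rglob(glob)):
        text = path.read_text(encoding="utf-8")
        new_text, n = regex.subn(replacement, text)
        if n:
            path.write_text(new_text, encoding="utf-8")
            changed[path.relative_to(root).as_posix()] = n
    return changed

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.xml").write_text("<did>\n  <unitdate>\n  </unitdate>\n</did>\n", encoding="utf-8")
    Path(tmp, "b.xml").write_text("<did><unitdate>1900</unitdate></did>\n", encoding="utf-8")
    # Hypothetical fix: drop empty <unitdate> elements, even when split across lines.
    result = batch_replace(tmp, r"<unitdate>\s*</unitdate>\s*", "")
    fixed = Path(tmp, "a.xml").read_text(encoding="utf-8")
    print(result)
```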

We moved on to MARC records. Two specific problems we encountered when trying to import MARCXML (converted into EAD) were inconsistent assignment of local call numbers (sometimes in a 90b, sometimes in a 90a, and at other times with the same call number assigned to multiple collections) and inconsistent collection dates. These again stemmed from variations in practice over many years and had to be fixed by hand.
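Problems like these can at least be flagged automatically before the hand correction begins. The sketch below assumes, as our own reading of "90a" and "90b", that the local call number lives in MARC field 090 and is expected in subfield $a; the `audit_call_numbers` function and its input format (a record ID plus a dict of 090 subfields) are hypothetical.

```python
from collections import defaultdict

def audit_call_numbers(records):
    """records: iterable of (record_id, field_090), where field_090 maps subfield code -> value.

    Returns (IDs of records with no 090 $a, {call number: IDs sharing it}).
    """
    missing_a = []
    by_call_number = defaultdict(list)
    for record_id, field_090 in records:
        call_number = field_090.get("a")
        if not call_number:
            missing_a.append(record_id)  # e.g. call number entered in $b instead
            continue
        # Normalize case and whitespace so trivial variants still collide.
        by_call_number[call_number.strip().upper()].append(record_id)
    duplicates = {cn: ids for cn, ids in by_call_number.items() if len(ids) > 1}
    return missing_a, duplicates

sample_records = [
    ("bib1", {"a": "MS 123"}),
    ("bib2", {"b": "MS 124"}),
    ("bib3", {"a": "ms 123 "}),
    ("bib4", {"a": "RU 45"}),
]
missing, dups = audit_call_numbers(sample_records)
print(missing, dups)
```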

We next addressed location info. Due to varied practice and limited authority control, our collection location information (stored in a series of Paradox tables) exhibited a good deal of inconsistency, if not outright oddities. Many items lacked basic control info (e.g. Record or Manuscript Group Number, Accession Number, or Series), and as a result we had to do a fair amount of detective work to assign this information where possible or, in some cases, remove these items from our database.

Finally, we tackled accessions. Common problems in our outdated collections management system (MySQL back end with an Access front end) were inconsistent data formatting (dates, accession numbers, contact info) and input practices (throwing restriction info, contact info, and other odds and ends into a single note field), as well as MSSA-specific (and University Archives-specific) accession data and practices (see my other post on Standardization of Practice). These latter issues required a hack or two to shoehorn odd MSSA data elements into appropriate AT data elements (and/or User-Defined Fields).
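Inconsistent date formats are one of the easier problems to normalize programmatically. Below is a minimal Python sketch that tries a list of legacy formats in turn and returns an ISO 8601 date, or `None` so the value can be set aside for manual review; the format list is a hypothetical example, not an inventory of MSSA's actual data.

```python
from datetime import datetime
from typing import Optional

# Hypothetical legacy formats, tried in order; four-digit years are tried before two-digit ones.
LEGACY_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%B %d, %Y", "%b %d, %Y"]

def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in LEGACY_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    return None  # leave for manual review

for raw in ["3/14/1998", "03/14/98", "March 14, 1998", "Spring 1998"]:
    print(repr(raw), "->", normalize_date(raw))
```

Returning `None` instead of guessing keeps ambiguous values such as "Spring 1998" visible for the post-migration clean-up rather than silently converting them.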

We anticipate a great deal of post-migration clean-up in the AT as we continue to work towards a consistent data set. The great thing, though, about doing this work in the AT is having one system and one means for doing it. Indeed, migrating to the Toolkit, regardless of its future sustainability (an issue I will post on separately), has been a great opportunity for consistency and standardization within our department. We're just sorry it has taken so long!

Deploying the AT Across Multiple Yale Repositories: Background

In 2008, four Yale special collections repositories (Arts Library, Divinity Library, Manuscripts and Archives, and the Music Library) participated in a project to 1) install and test the AT as an open source collections management system, and 2) examine the feasibility of establishing a Yale way of managing and tracking collections across the Yale Library system. This project was one of a number of projects that were undertaken as part of the Mellon Foundation Collections Collaborative at Yale University.

Focus

The specific focus of the project was use of the Toolkit for accessioning. Although the Toolkit can do much more, we limited this project to accessioning primarily because, with the exception of MSSA, the other participants have only rudimentary systems in place for recording and managing accession information. In the past, this has been done primarily on paper. The Toolkit, however, facilitates easy capture, management, and searching of collection information that is vital to the day-to-day operations of repositories. It also allows participants to use the same system and terminology, enhancing understanding of one another's collections and providing faster and more consistent access to collection information.

Work Summary

The principal investigator first met with the participants to examine existing collections management tools and record-keeping practices and to discuss needs and expectations. These sessions provided the opportunity to specify collections management practices the Toolkit does not accommodate, distinguish between software issues and points where staff could or should be persuaded to do things differently, and explore the feasibility of developing a “Yale” way of managing special collections. In addition, since there was little in the way of legacy systems and practices for accessioning materials (Manuscripts and Archives excluded), it was determined that the Toolkit would be easy to implement and much welcomed.

Following installation, project staff instructed participants in use of the Toolkit and discussed issues concerning implementation and conversion of legacy data. Project staff then gave participants several weeks to use the Toolkit before following up with a focus group to examine participants’ experiences, issues, questions, and needs. Important outcomes of the focus group included the expressed need for common practice and use (e.g. best practices guidelines), improved documentation (especially concerning required fields, terminology, and reports), and identification of concerns regarding future administration and tie-in to other systems (i.e. the Finding Aid Creation Tool).

Products

To support and further Yale’s use of the Toolkit, a variety of products were (and continue to be) created. These include:

a. Website <http://www.library.yale.edu/mssa/at/>

b. Wiki <https://yaleat.pbwiki.com/>

c. Guided instructions and tutorials for 14 separate Toolkit features and processes.

d. Expanded data dictionary (.xls) [in progress]

e. Best practice guidelines for accessioning

Staff also reported participants’ experiences to the Toolkit developers and provided recommendations for potential incorporation in future AT releases.

Conclusions & Recommendations

With little in place to accession and track collection material, participants have enthusiastically adopted the Toolkit. Given variations in local practices and needs, however, and the amount of information the Toolkit allows you to capture, it is recommended that best practices for accessioning be developed to standardize its use. Additionally, ongoing central support and administration will need to be formalized, including bringing in additional special collections repositories. Particularly important here, especially for larger repositories with established record-keeping systems, will be helping repositories map and migrate their legacy data. As future releases expand the Toolkit's functionality, efforts will likely need to be made to integrate more and more legacy systems across Yale libraries and special collections into the AT.

Introduction

The purpose of this blog is to document Yale's development and use of the Archivists' Toolkit (AT).

Daniel Hartwig and other members of the Manuscripts and Archives staff will post regularly to document our experience and evaluation of the AT, but we encourage other archivists and institutions to contribute as well. To join in you can post comments or send an email to daniel.hartwig@yale.edu to sign up as a regular contributor or forward your thoughts to post on your behalf.

As the AT Roundtable takes root, we see this blog as a valuable means for sharing experiences and analysis and for offering suggestions for future AT development.

This blog is not affiliated with the AT development group, but we welcome their input and participation. Users needing answers to AT-related issues should still post to the Archivists’ Toolkit User Group list, ATUG-l (subscribe at http://mailman.ucsd.edu/mailman/listinfo/atug-l), and/or write to info@archiviststoolkit.org, or check out the AT Website or Wiki.