Friday, May 1, 2009

AT Issues: Large Finding Aids

Manuscripts and Archives encountered several performance issues when loading our finding aids into the AT. First, given the large number of finding aids in question (2500+), we ran into load time problems. Our first attempt at a batch import lasted the better part of a weekend before crashing with the dreaded Java heap space error. Adding insult to injury, no log was generated to indicate the cause of the crash or the status of the import (e.g., which files had been imported successfully). Our initial diagnosis pointed to our setup: the database was installed on one of our aging local servers, and we ran the import via a client on a remote workstation. To address load time, we moved the database to our fastest machine, a Mac with 8 GB of memory and multiple processors, installed the AT client on that same machine, and saved copies of our finding aids to it for easy import.

Our second attempt, which involved approximately 1800 finding aids, was much faster but still crashed. The likely culprits this time were large finding aids and a memory leak. We found that large finding aids (3 MB+) crashed the import process when included in a batch import. We also found a so-called memory leak (i.e., a progressive loss of available memory with each imported finding aid), which greatly slowed the process over time and contributed to the crash. As a result, we separated out the large finding aids and imported them individually, and split the rest into smaller batches (smaller in both number of files and total size) to import in stages. To give some idea of the time required for larger finding aids, we found that, using a remote client, importing a file of up to 2 MB averaged 20-30 minutes; 2-3 MB took 30-60 minutes; and 5-6 MB took 90-120 minutes.
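The sorting step described above can be sketched in a few lines of Python. This is only an illustration, not something we ran and not part of the AT: the function names are made up, the 3 MB cutoff comes from our experience above, and the per-batch caps (50 files, 25 MB) are assumed values you would need to tune. The script only plans the groupings; the import itself is still done through the AT client.

```python
from pathlib import Path

# Thresholds: the 3 MB cutoff reflects the experience described in this post;
# the two batch caps are assumptions to adjust for your own setup.
LARGE_FILE_BYTES = 3 * 1024 * 1024
MAX_BATCH_FILES = 50
MAX_BATCH_BYTES = 25 * 1024 * 1024


def plan_batches(files, large_bytes=LARGE_FILE_BYTES,
                 max_files=MAX_BATCH_FILES, max_bytes=MAX_BATCH_BYTES):
    """Split (name, size_in_bytes) pairs into small batches plus a list
    of large files to import one at a time.

    Returns (batches, large), where batches is a list of lists of names.
    """
    large = [name for name, size in files if size >= large_bytes]
    small = [(name, size) for name, size in files if size < large_bytes]

    batches = []
    current, current_bytes = [], 0
    for name, size in small:
        # Start a new batch when adding this file would exceed either cap.
        too_many = len(current) >= max_files
        too_big = current_bytes + size > max_bytes
        if current and (too_many or too_big):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, large


def scan_directory(directory):
    """List (file name, size in bytes) for every .xml file in a directory."""
    return [(p.name, p.stat().st_size)
            for p in sorted(Path(directory).glob("*.xml"))]
```

To use it, you might call `plan_batches(scan_directory("finding_aids"))` (the directory name is a placeholder), copy each batch into its own folder for import, and set the files in the second list aside to import individually.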

These strategies proved effective, allowing us to import all but the largest of our finding aids (8-12 MB each), which we are still working on. Because these also present problems for our finding aid delivery system, one option is to split each of them into multiple files of around 2 MB. The drawback of this option is having to manage and maintain multiple files.

These strategies may be of help to other institutions with similar numbers and sizes of finding aids.

1 comment:

  1. AT used to crash when we were trying to load our largest finding aids (approx. 3.5 MB). After increasing the heap space from 256 to 512 MB, we did not have any more problems.