Thursday, May 7, 2009

Batch Editing & Data Clean-Up

One of the key weaknesses of the AT is the inability to batch edit data. The need for batch edit functionality was ranked as highly important in the AT user group survey and will hopefully be added in a future release. What, then, is a repository to do in the meantime? I suggest two possible options: 1) batch edit data prior to import; 2) manipulate the MySQL tables directly, or use a database administration tool such as Navicat for MySQL to connect to the AT's MySQL database and run queries and updates against the tables.

As I have described before, our collection management data has been created over time according to a number of different ad hoc and de facto standards. We in MSSA have tried as much as possible to batch edit and standardize our data prior to import into the AT. This was straightforward for our accessions and location information, which was already stored in a database and thus easy to identify and manipulate. The one problem with this data was a tendency among MSSA staff to combine data that EAD or the AT assigns to several different fields into a single catch-all free-text note field (a place to find everything). Options for handling this data included exporting our legacy data to another file and modifying it there, or importing it into the AT and then editing. We chose the former, performing batch operations to format the files according to the AT import maps. Although this was largely successful, we still encountered edits that needed to be made once the data was in the AT. The options at that point were either to edit records in the AT one at a time or to perform another round of edits outside the AT, delete the data in the AT, and then reimport. We chose instead to perform batch edits in the AT using Navicat, saving a considerable amount of time and effort.
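To give a rough idea of what a batch edit at the database level looks like, here is a minimal sketch in Python. It is not the AT's schema or our actual Navicat workflow: the table `accessions_demo`, the column `note`, and the helper `batch_replace` are hypothetical names, and an in-memory SQLite database stands in for the AT's MySQL database so the example can run anywhere. The same `UPDATE ... SET col = REPLACE(col, old, new)` pattern can be typed into a query window against MySQL, ideally on a backup copy first.

```python
import sqlite3


def batch_replace(conn, table, column, old, new):
    """Replace every occurrence of `old` with `new` in one column.

    Returns the number of rows changed. `table` and `column` are
    hypothetical names supplied by the caller, not real AT identifiers.
    """
    # Table and column names cannot be bound as parameters,
    # so accept only simple identifier-like names.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    with conn:  # commit on success, roll back if the update fails
        cur = conn.execute(
            f"UPDATE {table} SET {column} = REPLACE({column}, ?, ?) "
            f"WHERE INSTR({column}, ?) > 0",
            (old, new, old),
        )
    return cur.rowcount


# Stand-in data: an in-memory SQLite table, NOT the AT's MySQL tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accessions_demo (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany(
    "INSERT INTO accessions_demo (note) VALUES (?)",
    [("Recd. from donor",), ("Gift of donor",), ("Recd. by mail",)],
)

changed = batch_replace(conn, "accessions_demo", "note", "Recd.", "Received")
notes = [row[0] for row in conn.execute("SELECT note FROM accessions_demo ORDER BY id")]
print(changed, notes)
```

Restricting the update with a `WHERE` clause means the returned row count reflects only the records that actually contained the old text, which is a useful sanity check before and after a batch change.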

The biggest challenge we faced in standardizing our data prior to import, though, came with our finding aids. Because they're not in a database and therefore not easily comparable, it's hard to see what changes need to be made until they're actually in the AT. No matter how many iterative batch edits we ran on our finding aids, we still came across edits that needed to be made. Importing them into the AT, however, would still likely require a further round of edits. Running edits outside of the AT, deleting all the data, and then reimporting would be a huge burden, particularly with over 2500 finding aids. We chose instead to run batch operations in the AT using Navicat and then run find/replace separately on the finding aids.
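For the find/replace pass on the finding aid files themselves, something along these lines could work. This is only a sketch under assumptions: it supposes the finding aids are UTF-8 encoded `.xml` files in a single folder, it does plain text replacement with no awareness of the XML structure, and `replace_in_files` is a hypothetical helper rather than a tool we actually used. The dry-run mode reports what would change without touching any file.

```python
import tempfile
from pathlib import Path


def replace_in_files(folder, old, new, pattern="*.xml", dry_run=True):
    """Count occurrences of `old` per file; rewrite files unless dry_run.

    Returns a dict mapping file name to the number of occurrences found.
    Assumes UTF-8 text files; this is plain string replacement, not XML-aware.
    """
    hits = {}
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        count = text.count(old)
        if count:
            hits[path.name] = count
            if not dry_run:
                path.write_text(text.replace(old, new), encoding="utf-8")
    return hits


# Demonstration on throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "aid1.xml").write_text("<unitdate>circa 1900</unitdate>", encoding="utf-8")
    Path(folder, "aid2.xml").write_text("<unitdate>1950</unitdate>", encoding="utf-8")

    preview = replace_in_files(folder, "circa", "ca.")                 # report only
    applied = replace_in_files(folder, "circa", "ca.", dry_run=False)  # rewrite files
    result = Path(folder, "aid1.xml").read_text(encoding="utf-8")
    print(preview, result)
```

Running the preview first across the whole set makes it easier to spot a replacement that matches more, or less, than intended before any of the files are rewritten.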

1 comment:

  1. I heard there is a way to export your file to Excel, where you can then run search-and-replace commands. Then you would need to re-import the data. Not ideal, but someone told me they were doing it that way. I'm gearing up to start with the AT, so I haven't tried this. But I thought it might be helpful, so I'm passing it along.
