Friday, June 11, 2010

Connecting Excel to AT/MySQL

Given the problems associated with creating and running custom Jasper reports within the AT, we decided to simply use Excel as a front-end application for data analysis. The beauty of this approach is that data can be pulled from the AT tables, or from MySQL queries, and then further analyzed, graphed, etc., in Excel. Pivot tables can even be used to create more robust reporting options. What's more, with a little know-how, you can even use Excel for batch updates.

Here's how to connect Excel to the AT:
  1. Download and install the ODBC driver for MySQL here: http://dev.mysql.com/downloads/connector/odbc/3.51.html.
  2. Configure the ODBC Data Source (Control Panel > Administrative Tools > Data Sources (ODBC) > Add > MySQL ODBC 3.51 Driver)
  3. Enter your AT database connection settings
  4. Open Excel
  5. Click on Data > From Other Sources > From Data Connection Wizard > ODBC DSN > [the database/connection you just set up in step 3]
  6. Select what you want to import

Voila! It's as easy as that.
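If you're not sure which tables are worth importing in step 6, a quick look at the database in Navicat (or the mysql client) helps. A minimal sketch, run against the AT database itself:

SHOW TABLES;
SELECT COUNT(*) FROM Resources;

The first lists every AT table available to the wizard; the second is just a sanity check that the Resources table (referenced elsewhere on this blog) has the rows you expect.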

Right now we're working on constructing more complicated MySQL queries typically used for end-of-year analysis and reporting, which we will then similarly connect to Excel to allow our staff to manipulate the data as needed. Here is where a little MySQL knowledge goes a long way!
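To give a flavor of what we mean, here is a minimal sketch of the kind of year-end tally we have in mind. It assumes the Resources table carries findingAidStatus and lastUpdated columns (those names show up as export parameters elsewhere on this blog, but verify them against your own schema before relying on this):

SELECT findingAidStatus, COUNT(*) FROM Resources GROUP BY findingAidStatus;
SELECT COUNT(*) FROM Resources WHERE lastUpdated >= "2009-07-01";

Once a query like this has been vetted in Navicat, the results can be copied into Excel for graphing or pivot-table work.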

In the end, we thought this was a much more efficient and effective means for providing our staff with customizable reporting functionality from the AT. For one, our staff are more comfortable with, and regular users of, Excel. Second, there are far more documentation and training options available for Excel and its many advanced features.

UPDATE (6/23/2010)

We ran into a bit of a dead end trying to connect custom MySQL queries to Excel. It was easy enough to connect the tables, but connecting to custom queries proved troublesome. Instead, we are now using SharePoint, which does allow you to create custom SQL queries and stored procedures. If interested, more information on how to set this up can be found here: http://office.microsoft.com/en-ca/sharepoint-designer-help/add-a-database-as-a-data-source-HA010100908.aspx#BM6.

Wednesday, June 2, 2010

PDF & HTML Stylesheets for Resource Reports

As with perhaps most other users, we wanted to devise a means for customizing the AT's EAD output to comply with our local EAD implementation guidelines. Mike Rush at the Beinecke created and maintains a stylesheet that converts our AT EAD output to conform with Yale's EAD best practices. This now works very well for us. We export EAD, apply the transformation, validate it, and then save for upload to our finding aids database. That is all well and good. Our staff, though, wanted to be able to perform a similar operation, really a print preview, when editing a resource. This was possible in our previous EAD editor (XMetaL) thanks to built-in customizations. Although the AT's PDF & HTML resource reports do allow for such a print preview, many of our staff wanted a finding aid that looked more like our own. Thankfully, the AT allows you to swap stylesheets (see the AT's FAQ > Import & Export Questions for instructions) to address such needs. We found a few problems, however, that you may need to take into consideration when swapping stylesheets.

First, make sure to check the path for any included or subordinate stylesheets you utilize. If you're saving your stylesheets in the AT's report folder (e.g. C:\Program Files\Archivists Toolkit 2.0\reports\Resources\eadToPdf), make sure to use the path \reports\Resources\eadToPdf\[filename]. Otherwise, if you're pointing to a URL, make sure to use the full URL. This was all that was needed to make our PDF stylesheets run in the AT.

Second, especially for HTML stylesheets, make sure that any parameters specified include default values. This was what was causing errors for us.

With these two simple tweaks we are now able to apply our PDF and HTML stylesheets when staff generate resource reports. Ideally, we would like to apply our AT-to-Yale BPG stylesheet prior to running the resource PDF & HTML reports, perhaps via a plug-in. I'm sure others would like to modify this process as well. For the time being though, we're satisfied with the current output, which allows our staff to easily preview their work in a format similar to our own finding aids.

Monday, May 24, 2010

Command Line Interface (CLI)

We have been testing the developer preview of AT Version 2.0 Update 6, which now features support for a command line interface. Based on work from Cyrus Farajpour at USC, the new version and the corresponding ATCLI plug-in developed by Cyrus and Nathan Stevens allow for handy command line operations such as batch export of resources--something we, and perhaps others with lots of finding aids, are excited about. After initial testing, the plug-in has been modified to address memory consumption issues (a new session needs to be opened for each record; otherwise the records get cached and use up memory) and now allows you to specify an output directory.

Our first batch export test of our 3000+ resources was performed on my desktop (2.99 GHz processor, 2 GB RAM) under normal workload. It took approximately 96 hours for the process to complete! Given these results, we've tweaked our approach quite a bit, running the export from a better computer and adjusting the amount of memory assigned in the atcli.lax file (see the AT's FAQs for instructions). This seems to help a bit. The main problem for us, though, is the sheer number and size of our finding aids. No matter what we do, it's going to take some time. In addition, we have several finding aids that don't need to be exported. Ideally, we'd like to be able to select which finding aids to export (i.e. those not marked internal only or those updated since a given date). Perhaps further modifications to the plug-in will allow such customization.
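In the meantime, a query along these lines at least tells us which finding aids would be in scope for a selective export. This is only a sketch: internalOnly and lastUpdated mirror the parameter names listed in the update below and may be named differently in your AT schema.

SELECT eadFaUniqueIdentifier FROM Resources WHERE internalOnly = false AND lastUpdated >= "2010-01-01" ORDER BY eadFaUniqueIdentifier;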

For the time being, we'd like to thank Cyrus and Nathan for developing this key tool, and we look forward to working with others to improve its design and functionality.

UPDATE (6/23/2010)
Good news. Cyrus has modified the command line plug-in to incorporate parameters, allowing you to specify which subset of resources/EAD you want to export from the AT. The parameters include: findingAidStatus, author, eadFaUniqueIdentifier, resourceIdentifier1, resourceIdentifier2, resourceIdentifier3, resourceIdentifier4, internalOnly, and lastUpdated. Plus, you can use any combination of these to fine-tune the export process as necessary. For more information on setting up and using the plug-in, check out his GitHub site: http://github.com/smoil/AT-CLI-Export-Plugin

Friday, March 12, 2010

Locations

Creating locations in the AT is fairly straightforward, especially for simple numeric ranges; the AT's batch location creation tool is very handy here. Creating more complex locations, however, especially those with alphanumeric components, cannot be done in the AT. This was unfortunate for us because, although the vast majority of our material is stored offsite, the number of distinct onsite locations we have is still close to 3000 given the unique combinations of rooms, ranges, sections, and shelves. Rather than try to generate these manually in the AT, I chose to create them in Excel and then paste them directly into the AT tables. Here's how I did it.

First (after having surveyed all of our locations to come up with room numbers, ranges, sections, and shelves), I opened the LocationsTable in Navicat and copied the last record into Excel. [note: we had entered in some locations already.] I then used this record to structure and format subsequent location entries and assign sequential locationIds. Excel's fill-in feature is extremely helpful here, especially for alphanumeric data. Simply typing in 1A1, for example, and then dragging the fill-in cross-hairs to the appropriate point, allows you to quickly generate values 1A1, 1A2,...1AX. Once finished adding in all your data, your spreadsheet should look like this: sample (.xls).
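Rather than scrolling to the last record to see where your sequential locationIds should start, you can also ask the database directly. A quick sketch, using the same table and column referenced above:

SELECT MAX(locationId) FROM LocationsTable;

Your new ids then simply continue from that number.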

The final step is to paste the records into the LocationsTable via Navicat. [note: as I explain in my post on creators, you need to make sure you are the only one using the AT when pasting in data or looking up ids. Again, this is to prevent overlap or duplication of work.] Once finished, check your work in the Locations module (Tools > Locations), making sure you can create additional (i.e. test) locations.

Thursday, March 11, 2010

AT Issues: Creators

Another pesky problem we've encountered with the AT EAD import process is that finding aids not encoded with a <corpname/> or <persname/> or <famname/> in <origination label="creator"/> do not get names created with the assigned function of creator in resource records. Unfortunately for us, this amounted to the vast majority of our finding aids, some 1600+ records. Rather than fix these one at a time in the AT, we came up with the following workaround.

1. Generate a list of filenames and creators from our EAD files that lack <corpname/>, <persname/>, or <famname/> in <origination/>. To do this, Mark Matienzo, our new digital archivist, wrote a script for us that parsed our EAD files and dumped this data into an Excel file.

2. Format filenames as needed in separate spreadsheet (e.g. remove .xml extension, add quotes) to create list as follows:

"mssa.ms.0001",
"mssa.ms.0002",
...

3. Use Navicat to run query generating list of resourceIds based on formatted filenames (eadFaUniqueIdentifier in AT Resources table):

SELECT eadFaUniqueIdentifier, resourceId FROM Resources WHERE eadFaUniqueIdentifier IN ("mssa.ms.0001", "mssa.ms.0002", "mssa.ru.9019") ORDER BY eadFaUniqueIdentifier;

4. Copy resourceIds back into Excel file

5. Format names in separate spreadsheet (add quotes) to create list as follows:

"Yale University. Office of the Secretary.",
"Yale University. Office of the President.",
...

6. Create query in Navicat to retrieve nameids for any names that did make it into the AT:

SELECT nameId, sortName FROM Names WHERE sortName IN ("Yale University. Office of the Secretary.", "Yale University. Office of the President.", "Yale University. Office of the Provost") ORDER BY sortName;

7. Paste nameIds into master file with resourceIds and corpnames, famnames and persnames.

For those records with nameIds, proceed to step 13. Otherwise, for those names not present in the Names table, proceed to step 8.

8. Open Names table in Navicat and proceed to last record

9. Copy last record in Names table into Excel to serve as model, noting column names. It is important to note that corpnames and persnames use different columns/fields, so make sure to examine records in the Names table when formatting both persnames and corpnames.

10. Copy the contents of your master file with persnames, famnames, and corpnames into the new spreadsheet according to the Names table structure. Use Excel's fill-in feature to fill in data as needed and assign sequential nameIds from the last record in the table. Make sure to format the lastUpdated and created fields as text in Excel so as to mirror date encoding in the AT. Here is what your spreadsheet should look like: sample (.xls).

11. Paste records into Names table using Navicat

12. Copy nameIds for newly created names back into your master file of resourceIds and corpnames and persnames

13. Open ArchDescriptionNames table in Navicat

14. Go to last record and copy into new Excel file to serve as model for formatting data

15. Copy contents of master file with resourceIds, corpnames, famnames, persnames, and nameIds into Excel file mirroring the structure of the sample record. Use Excel's fill-in feature to format data and assign sequential ids from the last record in the table. Here is what your spreadsheet should look like: sample (.xls).

16. Paste contents from Excel file into ArchDescriptionNames table using Navicat

A couple of caveats. First, it is important that these steps be taken by someone comfortable with the aforementioned programs, as well as MySQL. Second, and most important, id creation and pasting of data directly into the AT tables should be done when no one else is using the AT, to prevent accidental overlap or duplication of work. Finally, make sure to test the results, including creating new names and resource creator links in the AT client, just to make sure everything is OK. If something does go haywire, you can always delete the records you just pasted into the AT tables. A couple of query sketches for these checks follow below.
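For the id lookups in steps 8-10 and the final check after step 16, something like the following may help. Column names beyond nameId, sortName, resourceId, and eadFaUniqueIdentifier are assumptions based on the workflow above, so verify them against your own tables first.

SELECT MAX(nameId) FROM Names;

SELECT r.eadFaUniqueIdentifier, n.sortName FROM ArchDescriptionNames adn, Resources r, Names n WHERE adn.resourceId = r.resourceId AND adn.nameId = n.nameId AND r.eadFaUniqueIdentifier = "mssa.ms.0001";

The first gives you the starting point for new sequential nameIds; the second pulls back the name links for a single finding aid so you can confirm that the paste into ArchDescriptionNames worked as intended.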

Monday, February 22, 2010

AT Transition Update

I apologize for the long delay in posting to the blog. I have spent the better part of the last several months addressing several (thousand) issues raised by our legacy data migration and plug-in development. Today, though, I am happy to announce that the day has finally come: we, Manuscripts and Archives, have made the jump. We are now fully in and committed to the AT! Unfortunately, much still remains to be worked out.

First off, we have to fix thousands of errors reported by our programmatic efforts to match legacy instance information to resources in the AT. These were mostly the result of errors in our finding aids and, less often, errors in our locations database. Most have been addressed, leaving only the truly disparate errors that require a good deal of research into how collections were processed and, oftentimes, examination and verification of box contents.

Another major category of clean-up work stems from our QC of data sent back from our consultant, where we found that, with some overlap with our other error logs, several thousand barcodes from our locations database did not come into the AT. Thankfully, many of these can be easily diagnosed with our Box Lookup plug-in and fixed with our Assign Container Information plug-in. Others require more in-depth problem solving. For large collections where things just went haywire, or for small collections with only a few boxes, we've decided to delete the resource, re-import it, and then re-assign instance information using the Assign Container Information plug-in. Other collections will require deleting one or more components, importing those components into a dummy resource, transferring them back into the original resource, and then re-assigning container information.

A third major challenge is cleaning up the restriction flags we assigned to instances based on notes in our locations database. That database had a variety of notes both at the collection/series/accession level and at the item level. Since these notes were wildly inconsistent and could not be easily parsed, we created blanket restrictions for instances based on them. As a result, we have to review the assigned restrictions, verifying that material that needs to be restricted is, and opening up anything flagged in error. Thankfully, these errors can easily be fixed with our Box Lookup and Assign Container Information plug-ins.

Aside from these data errors, which are our first priority, we also have to finalize workflows, procedures, and documentation for accessioning, arrangement and description, and collections management. Although equally critical to our day-to-day operations, these were put off until we were in the AT so that we could fully model what needed to be done.

So, although we've made great progress up to this point, much remains to be done and much needs to be resolved. This is more or less the lasting impression of the project. For other large institutions planning similar migration projects, I can't say enough just how much work is involved and how important it is to get staff, especially technical staff, on board. For institutions without technical support and dedicated staff, it is probably best to hire a consultant, especially when it comes to legacy data (e.g. instance) migration and customizations to the AT.