Thursday, March 11, 2010

AT Issues: Creators

Another pesky problem we've encountered with the AT EAD import process is that finding aids not encoded with a <corpname/> or <persname/> or <famname/> in <origination label="creator"/> do not get names created with the assigned function of creator in resource records. Unfortunately for us, this amounted to the vast majority of our finding aids, some 1600+ records. Rather than fix these one at a time in the AT, we came up with the following workaround.

1. Generate list of filenames and creators from our EAD files that lack <corpname/> or <persname/> or <famname/> in <origination/>. To do this Mark Matienzo, our new digital archivist, wrote a script for us that parsed our EAD files and dumped this data into an Excel file.

2. Format filenames as needed in separate spreadsheet (e.g. remove .xml extension, add quotes) to create list as follows:


3. Use Navicat to run query generating list of resourceIds based on formatted filenames (eadFaUniqueIdentifier in AT Resources table):

SELECT eadFaUniqueIdentifier, resourceId FROM Resources WHERE eadFaUniqueIdentifier IN ("",", "") ORDER BY eadFaUniqueIdentifier;

4. Copy resourceIds back into Excel file

5. Format names in separate spreadsheet (add quotes) to create list as follows:

"Yale University. Office of the Secretary.",
"Yale University. Office of the President.",

6. Create query in Navicat to retrieve nameids for any names that did make it into the AT:

SELECT nameId, sortName FROM Names WHERE sortName IN ("Yale University. Office of the Secretary.", "Yale University. Office of the President.", "Yale University. Office of the Provost") ORDER BY sortName;

7. Paste nameIds into master file with resourceIds and corpnames, famnames and persnames.

For those records with nameIds, proceed to step 13. Otherwise, for those names not present in the name table, proceed to step 8.

8. Open Names table in Navicat and proceed to last record

9. Copy last record in Names table into Excel to serve as model, noting column names. It is important to note that corpnames and persnames use different columns/fields so make sure to examine records in Names table for formatting both persnames and corpnames.

10. Copy contents of your master original file with persnames, famnames and corpnames into the new spreadsheet according to the Name table structure. Use Excel's fill-in feature to fill in data as needed and assign sequential nameIds from last record in table. Make sure to format the lastUpdated and created fields as text in Excel so as to mirror date encoding in the AT. Here is what your spreadsheet should look like: sample (.xls).

11. Paste records into Names table using Navicat

12. Copy nameIds for newly created names back into your master file of resourceIds and corpnames and persnames

13. Open ArchDescriptionNames table in Navicat

14. Go to last record and copy into new Excel file to serve as model for formatting data

15. Copy contents of master file with resourceIds, corpnames, famnames, and persnames, and nameids into Excel file mirrororing structre of sample record. Use Excel's fill-in feature to format data and assign sequential ids from last record in table. Here is a what your spreadsheet should look like: sample (.xls).

16. Paste contents from Excel file into ArchDescriptionNames table using Navicat

A couple of caveats. First, it is important that these steps be taken by someone comfortable with the aforementioned programs, as well as MySQL. Second, and most important, id creation and pasting of data directly into the AT tables should be done when no one else is using to AT to prevent accidental overlap or duplication of work. Finally, make sure to test the results, including creating new names and resource creator links in the AT client just to make sure everything is ok. If something does go haywire, you can always delete the records from the AT tables you just pasted in.

No comments:

Post a Comment