Sunday, October 25, 2009
AT Tips & Tricks: Transfer Components
One of the key weaknesses, in our opinion, of the early AT releases was the inability to import and attach EAD for an accession or addition to an existing resource. The Yale University Archives uses an inventory template, which offices fill out and email to us with each new accession. The template, an Excel spreadsheet, incorporates EAD tags and allows us to simply copy and paste encoded description directly into a finding aid, requiring only minimal clean-up. Lacking a means of importing this partial EAD into early versions of the AT, we either had to re-enter the information into the AT or delete the resource and re-import it with the addition. The problem with the latter is that deleting the resource also deletes all the location and instance information tied to the resource, which, in our case, is sizable. With the addition of Transfer Components functionality in v.1.5, however, the AT allowed us to import partial EAD without losing information assigned in the AT. Here's how it works.
First, create a dummy EAD finding aid with the addition/accession EAD as a component. I generally add the new accession EAD to the existing finding aid for a resource and delete all other components. Second, import said finding aid into the AT. Third, open the resource in the AT into which you want to transfer components. Fourth, click on the Transfer button and select the dummy resource to import components from. Fifth, update the resource as necessary (e.g. extent, dates, etc.). Lastly, delete the dummy resource from the AT. That's it. It's that simple.
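For those who would rather script the first step than edit by hand, here is a minimal sketch in Python. It assumes un-namespaced EAD 2002, components named c or c01 through c12 sitting directly under a single dsc, and the new accession supplied as an XML fragment. The function name make_dummy_finding_aid and the sample data are made up for illustration; this is not part of the AT.

```python
import xml.etree.ElementTree as ET


def is_component(tag):
    """True for EAD component tags: <c> or numbered <c01>..<c12>."""
    return tag == "c" or (tag.startswith("c") and tag[1:].isdigit())


def make_dummy_finding_aid(existing_ead_xml, new_components_xml):
    """Return a copy of the finding aid whose <dsc> holds only the new components.

    Everything outside the component list (header, collection-level
    description, the <dsc> head) is kept so the file still imports
    as a complete finding aid.
    """
    root = ET.fromstring(existing_ead_xml)
    dsc = root.find(".//dsc")
    if dsc is None:
        raise ValueError("no <dsc> element found in the finding aid")

    # Delete all existing components from the dummy copy.
    for child in list(dsc):
        if is_component(child.tag):
            dsc.remove(child)

    # The accession EAD may contain several sibling components,
    # so parse it inside a temporary wrapper element.
    wrapper = ET.fromstring("<wrapper>" + new_components_xml + "</wrapper>")
    for component in list(wrapper):
        dsc.append(component)

    return ET.tostring(root, encoding="unicode")
```

The result is what you would import in the second step. The original finding aid is left alone, since the function works on a parsed copy.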
We originally thought about creating a plug-in to allow us and others to import and append EAD components into a resource, but ultimately decided/realized that the Transfer function was sufficient and free. Sometimes you just have to be a little creative in your approach to maximize the AT's functionality.
Labels:
Functionality,
Tips and Tricks
Tuesday, September 15, 2009
AT Issues: Box Ranges
The most recent issue we've encountered, and one I'm sure others out there already have, concerns box ranges. When EAD is imported into the AT with a range in the <container type="Box"> tag (e.g. <container type="Box">1-2</container> or <container type="Box">3, 256</container>), the AT creates a single instance for that component (e.g. Box 1-2 or Box 3, 256), rather than separate instances for each. The problem is that each box is likely to have a separate barcode, box type and perhaps even location. When you click on Manage Locations, for example, you are presented with a single instance to which multiple, separate values need to be assigned. There are a few options at this point to address the issue.
First, and perhaps least desirable, is to fix your EAD to eliminate box ranges. Aside from the considerable labor/programming involved, your only option is to create separate components (i.e. clones) for each instance rather than one component with multiple instances. This is because although AT (1.5.9) allows you to create and export components with multiple instances (i.e. <c0x> with multiple <container type="Box"> tags), it does not allow you to import such in the same fashion. Instead, each (and only up to 3) is imported as a separate container type into a single instance. All subsequent instances tied to a component are lost. Fortunately, I am told that version 2.0 will support import of multiple instances if parent/id attributes are used for each container tag.
Second, you can fix (i.e. break apart) the ranges in the AT. You can do this in two different ways depending on how you want to characterize each instance. One, you can create separate components (i.e. clones) for each instance. Two, you can create multiple instances within a single component. The problem with the first option is that your resource/finding aid loses scannability, includes somewhat redundant info, and may grow quite large if you have several sizable ranges. The problem with the second approach is that you will need a style sheet to customize display of instances, perhaps turning them back into a range if you so choose.
We've decided to address the box range issue with a combination approach, fixing some instances in EAD and addressing most programmatically in the AT with code we're developing to clone components that are part of ranges. Hopefully this code can be added to one of our existing plug-ins that assigns other instance info, allowing others the option of creating clones for components that are part of a range. More to come on that soon.
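To make the cloning idea concrete, here is a rough sketch of the EAD-side fix in Python. It is not the plug-in code mentioned above. It assumes un-namespaced EAD, looks only at the direct children of the element you pass in, and treats anything that is not a plain numeric range (e.g. "3a") as a single box label. The function names are made up for illustration.

```python
import copy
import re
import xml.etree.ElementTree as ET

BOX_PATH = "did/container[@type='Box']"


def expand_box_range(text):
    """Turn '1-2' or '3, 256' into a list of individual box labels."""
    boxes = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(\d+)\s*-\s*(\d+)", part)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            if end < start:
                raise ValueError("descending box range: " + part)
            boxes.extend(str(n) for n in range(start, end + 1))
        else:
            boxes.append(part)  # single box, left as written
    return boxes


def split_box_ranges(parent):
    """Replace each child component that spans several boxes with one clone per box."""
    new_children = []
    for child in list(parent):
        box = child.find(BOX_PATH)
        boxes = expand_box_range(box.text or "") if box is not None else []
        if len(boxes) <= 1:
            new_children.append(child)  # nothing to split
            continue
        for number in boxes:
            clone = copy.deepcopy(child)
            clone.find(BOX_PATH).text = number
            new_children.append(clone)
    # Rebuild the child list in the original order.
    for child in list(parent):
        parent.remove(child)
    parent.extend(new_children)
```

Each clone carries the full description of the original component, which is exactly the redundancy noted above, so this is probably best reserved for short ranges.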
Labels:
Functionality,
Implementation
Monday, August 3, 2009
Yale AT plug-in development: status report
This post provides an update on MSSA's plug-in development efforts.
- Revised Analog Instance module/view
To increase the AT's collections management functionality and usability, MSSA has asked for user-defined fields (two strings, two booleans) to be added to the Analog Instance table/module in version 1.7. We will use these fields to facilitate interaction with our ILS (Voyager), using them to capture Voyager Bib and Holdings numbers, box type, and flags for item-level restrictions and tracking of exported items.
[Screenshot: the revised Analog Instance view our plug-in presents when an individual instance is clicked in the main Resource module.]
Status: design finished; user-defined fields will be present in version 1.7 release.
- Assign Container Information plug-in
A second functional improvement MSSA is contributing is enhanced batch container information assignment via an Assign Container Information plug-in. Much more efficient than data entry via RDE, which requires assignment one instance at a time, the Assign Container Information plug-in allows batch box type, restriction flag, and ILS export assignment to selected instances all at once.
Additionally, we've designed a Rapid Barcode Entry module to allow use of barcode scanners to quickly wand in barcodes for containers.
The Rapid Barcode Entry interface allows you to select either a specific container or an arbitrary number of containers. If you select one container, it will be the starting point, allowing you to progress numerically through the boxes until finished. If several containers are selected, that will be the group the interface assigns barcodes to. Setting your barcode scanner to include a return/break will allow you to input barcodes as fast as you can scan them.
Status: design finished; should be ready for use with release of version 1.7.
- EAD export plug-in
Yale EAD instances validate against a Yale-specific RNG schema informed by Yale's EAD best practices guidelines. Because this schema differs from the EAD 2002 schema, we need to develop an EAD export plug-in to modify data in the AT to validate against the Yale schema for easy ingest into our finding aids database. Such a plug-in will allow others to modify their data to meet their various output needs.
Status: not yet started.
- Partial EAD import plug-in
Because MSSA accessions several hundred additions to collections each year, many coming in with inventories and EAD ready to paste into a finding aid, deleting and re-importing finding aids (and corresponding instance info), currently the only way to input EAD aside from direct entry into the AT, is impractical for us. Hence, we will be developing a partial EAD importer, allowing import of new addition/accession EAD (e.g.... ) to append to an existing AT resource.
Status: not yet started.
- Lookup/read-only plug-in
To facilitate collections management and public services activities in MSSA, we need to develop a quick look-up or browse plug-in that will allow us to select specific data fields (e.g. accession number) to retrieve certain information (e.g. barcode) without the need to build the entire resource, accession, or digital object record.
Status: not yet started.
- Export to ILS plug-in
To facilitate easy export from the AT to an ILS, we will be developing a text file export plug-in. The plug-in will allow you to format data entered into the AT for easy export/ingest into an ILS. Because we are one of, I believe, two institutions with a Voyager import API, in our case the plug-in will export the text file to an intermediary application that will in turn ingest the AT data into Voyager.
Status: not yet started.
- Revised Digital Object Instance module/view
To facilitate batch digital object creation, we plan to modify the Digital Object Instance module in a similar fashion as we have done with analog instances, allowing bulk creation and metadata assignment. Specifically, we want the ability to create and assign metadata for an arbitrary number of items via a '+ n' button, which allows you to enter the number of items you want to create with assigned values.
Status: not yet started.
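The Rapid Barcode Entry selection rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the plug-in itself (which lives inside the AT); the Container class and the assign_barcodes function are hypothetical names, and box numbers are assumed to be plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Container:
    box: int
    barcode: Optional[str] = None


def assign_barcodes(containers: List[Container],
                    selected_boxes: List[int],
                    scanner_input: str) -> List[Container]:
    """Assign scanned barcodes to containers in numeric box order.

    One selected box acts as the starting point and assignment continues
    through the following boxes; several selected boxes form the whole
    group. scanner_input is the raw text from a scanner configured to
    send a return after each barcode.
    """
    ordered = sorted(containers, key=lambda c: c.box)
    if len(selected_boxes) == 1:
        targets = [c for c in ordered if c.box >= selected_boxes[0]]
    else:
        chosen = set(selected_boxes)
        targets = [c for c in ordered if c.box in chosen]

    # Ignore blank lines, e.g. a trailing return after the last scan.
    scans = [line.strip() for line in scanner_input.splitlines() if line.strip()]

    assigned = []
    # zip stops at the shorter list, so extra scans or extra boxes are left alone.
    for container, barcode in zip(targets, scans):
        container.barcode = barcode
        assigned.append(container)
    return assigned
```

Scans beyond the end of the target group are silently dropped here; a real interface would probably want to warn about them.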
Labels:
Development,
Functionality
Monday, July 20, 2009
Collaborative AT Instances: Pros & Cons
This post examines the pros and cons of consortial or collaborative AT instances. My comments are based on my experience administering the Yale University Library Collections Collaborative AT project and MSSA's AT development.
Pros
- The central benefit of a consortial/collaborative AT instance is the consolidation of systems, procedures, practices, and resources. Having one system, one set of procedures, one course of training across multiple repositories not only conserves resources, but also greatly facilitates consistency and efficiency across the institution. Such a configuration inherently allows for enhanced understanding of each other’s collections, and provides faster and more consistent access to collection information, as well as the possibility of—from one location—getting an overview of the special collections holdings across diverse repositories. In Manuscripts and Archives alone, implementation of the AT will result in the consolidation of numerous databases, centralizing collection information and reducing ongoing systems maintenance for these un-integrated databases.
- Centralizing collection and other archival information leads to increased security, as potentially sensitive information will no longer be scattered across databases, electronic office files, and often, paper logs.
Cons
- The primary challenge facing collaborative/consortial instances is that the AT does not scale well for large, complex data sets and hence causes noticeable performance issues. Based on an acknowledged design flaw, the AT will impede performance/functionality at a certain point (see my previous post on Resource loading). As a result, especially for large institutions with multiple repositories or, more importantly, with extensive legacy data, consolidating multiple systems in a single AT instance will result in a slow performing system. Unfortunately, there is currently no plan to alter the design of the AT to address this issue in the 2.0 release, but the potential exists for a third round of grant support (to merge the AT with Archon) that would allow for such. The only alternatives at this point are for one of our institutions to pay a consultant to do the redesign work, which might be costly, or to develop lookup tools/plug-ins that query the database without having to, say, build an entire Resource.
- Sustainability is another major issue. With two major releases left, development of the AT is coming to a close. Unfortunately this is happening just as many of us are finally getting around to evaluating and adopting the AT. Thankfully though with version 1.5.9 the AT team introduced plug-in functionality as a viable means for the community to customize and develop the AT. In addition, it's exciting to hear that the opportunity exists for a third round of grant support to merge the AT with Archon. Beyond greatly expanding the functionality and usability of AT/Archon, this would allow for finishing any existing development commitments still on the table when the AT development ends.
- A third issue for consortial/collaborative instances is that any modifications/customization done with the AT by the superuser, e.g. modifying default values, lookup lists, user-defined fields, etc., apply to the instance as a whole and cannot be applied to just a single repository. Extensive modification of lookup lists, user-defined fields, and default values may thus result in a cluttered, hard to use interface or may even impede efficiency and performance. Plug-ins may be a solution to this problem though as they can be used to alter appearance and workflow.
- A fourth issue is that the only way to restore an instance (e.g. after a crash or failure) is to import an entire MySQL dump. Hence if one repository or individual screws something up, you'll have to go back to the last backup of the entire database, which may cause lots of data to be re-entered.
- A fifth challenge is the difficulty of migrating legacy data, especially instance or box information (e.g. location, box type, barcode, etc.) into the AT. Migration can also be difficult for those institutions that do not have EAD 2002 or who lack the expertise to export, format and import their legacy data. For those institutions like us with a significant amount of complex legacy data the only real option is to hire a consultant and develop a custom import process/plug-in.
- A sixth shortcoming of the AT is its inability to support the full EAD tag set, meaning that additional tools (e.g. stylesheets) or systems may be necessary to fully manage an institution's finding aids. On the bright side, especially for smaller institutions or those who lack a finding aid display/delivery system, the proposed AT-Archon merger might address the issue of full EAD support.
- A seventh issue is the inability to batch edit data in the AT. For those with a little know-how, though, a MySQL database administration tool such as Navicat can be used to query and update data in the AT's MySQL tables. This is definitely beyond the capability of the typical user, so you may want to address this inability via a plug-in.
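As a rough illustration of what a batch edit at the database level looks like, here is a small Python sketch. It uses an in-memory SQLite database as a stand-in so it can run anywhere; the table resource_demo and its extent_type column are invented for the example and are not the AT's actual schema. Against a real MySQL database you would use a MySQL driver instead (most of which use %s rather than ? placeholders), and only after taking a backup.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway database with a hypothetical table (NOT the AT schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource_demo (id INTEGER PRIMARY KEY, title TEXT, extent_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO resource_demo (title, extent_type) VALUES (?, ?)",
        [
            ("Example Papers", "linear ft."),
            ("Sample Records", "Linear feet"),
            ("Test Collection", "linear ft."),
        ],
    )
    conn.commit()
    return conn


def batch_replace_extent_type(conn, old_value, new_value):
    """Rewrite one extent_type value everywhere in a single transaction.

    Returns the number of rows changed. 'with conn' commits on success
    and rolls back if the statement raises.
    """
    with conn:
        cursor = conn.execute(
            "UPDATE resource_demo SET extent_type = ? WHERE extent_type = ?",
            (new_value, old_value),
        )
        return cursor.rowcount
```

Running a SELECT with the same WHERE clause first, to see how many rows would be touched, is a cheap safeguard before any update of this kind.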
All told, it might seem that AT has more than its fair share of cons. This is not what I want to get across. True, there are some issues to take into consideration especially when creating a collaborative instance, but as MSSA is easily the largest user of the AT, none of what we've encountered thus far is a real deal-breaker. True, we'd like it to perform better, but there are options at this point to at least sidestep this issue until the AT/Archon redesign. And with plug-in functionality now available and soon to be expanded, the means for addressing the AT's shortcomings is now in the hands of the community. We just need to step up.
Labels:
Development,
Functionality,
Implementation,
Performance
Monday, July 6, 2009
AT Issues: Resource Handling
We have run into a significant performance issue at the half-way juncture in our legacy data migration efforts. We have found that Resources are taking much too long to load--small ones are taking at least 15 seconds; medium resources are taking up to 3 minutes; our largest ones are taking upwards of 5 minutes. Again, since we're only half-finished migrating our legacy data, the problem is likely to get much worse, especially as we begin to create/describe the thousands of digital objects we've been waiting to use the AT for.
We think the reason for the problem is how the AT loads Resources, i.e. recursively issuing SQL commands for each component, rather than issuing a single command for all resource components at once. Hence, given a large, complex finding aid (e.g. one with several levels of nested components and associated instance and location information), the AT grinds to a stand-still as it tries to systematically open all of the hierarchy. The main issue here appears to be the number and depth of nested components. Resources with only one or two levels of nested components do seem to open right up, the exception being two-level resources that have larger numbers of sub-components. However, resources that have three or more levels of nested hierarchy always open slowly, even if they don't have many actual associated instances.
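The difference between the two loading patterns can be shown with a toy example. This is a sketch of the suspected behavior only, not the AT's code: the component table is hypothetical, and SQLite stands in for MySQL so the example is self-contained. The first function issues one query per component (plus one for the top level); the second fetches every component of the resource in one query and assembles the hierarchy in memory.

```python
import sqlite3


def build_db():
    """Create a tiny hypothetical component table (NOT the AT schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE component (id INTEGER PRIMARY KEY, resource_id INTEGER, "
        "parent_id INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO component VALUES (?, ?, ?, ?)",
        [
            (1, 1, None, "Series I"),
            (2, 1, 1, "Subseries A"),
            (3, 1, 2, "Folder 1"),
            (4, 1, 2, "Folder 2"),
            (5, 1, None, "Series II"),
        ],
    )
    return conn


def load_recursive(conn, resource_id, parent_id, counter):
    """Load children of parent_id, then recurse: one query per component."""
    counter[0] += 1  # count every SQL statement issued
    if parent_id is None:
        rows = conn.execute(
            "SELECT id, title FROM component "
            "WHERE resource_id = ? AND parent_id IS NULL ORDER BY id",
            (resource_id,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, title FROM component "
            "WHERE resource_id = ? AND parent_id = ? ORDER BY id",
            (resource_id, parent_id),
        ).fetchall()
    return [
        {"id": cid, "title": title,
         "children": load_recursive(conn, resource_id, cid, counter)}
        for cid, title in rows
    ]


def load_single_query(conn, resource_id):
    """Fetch all components at once and build the tree in memory."""
    rows = conn.execute(
        "SELECT id, parent_id, title FROM component "
        "WHERE resource_id = ? ORDER BY id",
        (resource_id,),
    ).fetchall()
    nodes = {cid: {"id": cid, "title": title, "children": []}
             for cid, _, title in rows}
    roots = []
    for cid, parent_id, _ in rows:
        target = roots if parent_id is None else nodes[parent_id]["children"]
        target.append(nodes[cid])
    return roots
```

With five components the recursive version issues six statements and the single-query version one. The gap grows linearly with the number of components, and each statement also pays a network round trip when the client and database sit on different machines.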
We have tried to address the problem by maximizing the memory assigned to the client (see AT Wiki), boosting the database server's use of system memory, increasing the key buffer size, and engaging Memory Lock to discourage swapping--all to no avail. These steps did, however, boost performance of the MySQL database administration tool Navicat, which we use to access the MySQL database. This points to the problem lying in the AT and its Java code. This hypothesis was further supported when we set up a slow query log (i.e. anything over 2 seconds) on the database and tested resource loading, finding that no such queries were logged. As a result, all of our efforts seem to suggest that while we can indeed speed up operation of the MySQL database, we cannot speed up the performance of the Java that hits or interacts with the database. Since this lies in the core of the AT code, it would have to be addressed either by the AT development team and put into the next/final release, or developed externally and then somehow offered back to the AT community. We are not sure yet how to proceed. An alternative that we and perhaps others might be happy with would be for only the top-level components (i.e. c01s) to be built and displayed within the main Resources window, with further sub-components built when clicked. After all, you don't need to see/retrieve the whole Resource to make an edit to a specific part.
One of the questions this problem has raised for us is why hasn't anyone noticed/reported this performance issue? Was this borne out in scale testing? Surely there are other repositories who have loaded large finding aids, or who are working with complex resources or digital objects. Are they not using/editing these Resources and therefore not noticing the lag in performance? Perhaps they haven't ingested the amount of instance and location information that we have? Or, is there an upper data limit to the AT's functionality? That seems short-sighted. Regardless, others are bound to run into the problem sooner or later as they begin to ingest more and more data into the AT. As a result, we hope that progress can be made to address this critical problem.
Labels:
Functionality,
Performance
Wednesday, June 17, 2009
Finding Aid Clean-Up: Box Numbers
As we proceed with our AT development we are spending considerable time cleaning up and standardizing our finding aids. Aside from the work I've mentioned previously to create consistent dates, extent statements and subjects, the main focus of our latest efforts is the standardization of container information (i.e. box numbers). The reason for all this work is to allow us to programmatically hook our location information (e.g. box type, barcode, vault, shelf, etc.) to our finding aids in the AT, the key to which it turns out is the box number.
Like many repositories, our arrangement and descriptive practices have waxed and waned over the years. Although our collection numbers and accession numbers have more or less been consistently applied, our box numbers have not, particularly when used in connection with our practice of housing small quantities of odd-sized materials in common containers. Formerly, we housed such items, especially folio or slides, in what we called common folio or common slide boxes (i.e. containers housing materials from multiple collections in a single or communal box), assigning a box number for the common folio/box and a folder number for the individual folder. Aside from the clear practical issues involved in administering such common containers, we've run into problems as we try to tie the box numbers we've assigned these items in our locator database to our finding aid data in the AT. More specifically, as our descriptive practice has varied over the years, the assignment of box numbers and box number extensions (e.g. an alphanumeric character used to indicate a use copy or duplicating master of a particular item) for these items has been inconsistent, unfortunately differing a great deal from box/container info in our EAD. For example, what appears in the finding aid as "MS Common Folio 10" is entered in our locator database with Box '1' and BoxNumberExtension 'CF1F10'. As a result, we've had to manually edit data both in our location database and in our finding aids/AT for all these items.
This is a short-term solution. These items really need to be rehoused and all such common containers need to be done away with, not only due to the issues at hand, but also to facilitate, say, the creation of future use copies of these materials.
Labels:
Finding Aids,
Implementation
Sunday, May 31, 2009
Finding Aid Handling
This post will examine issues we've encountered with the AT concerning finding aids.
- EAD Import
We've run into 5 issues importing EAD into the AT. First, as I mentioned in a previous post, we've had problems importing both large EAD files (6+ MB) and large numbers of files. Even with increasing the memory assigned to the AT (see Memory Allocation) and installing the client on the same machine as the database, importing these files has crashed import, causing us to import in small batches or one file at a time as needed. Second, we've encountered issues with extent handling for those finding aids with a parallel extent or container summary. Here we've found that although the EAD is encoded correctly, the AT is inconsistent, sometimes assigning a blank extentNumber, other times combining extentNumber and containerSummary in extentType, or, most often, assigning a blank extentNumber and extentType and throwing everything into a general physical description note. This calls for a lot of manual clean-up depending upon how you encode your extent statements. Third, although the EAD is encoded correctly, we and other Yale repositories have found inconsistent languageCode assignment, most often resulting in blank values. Fourth, as perhaps many of you have experienced, we’ve had problems with Names and Subjects, both in the creation of duplicate values and in names getting imported as subjects (and vice versa). This is likely due to how the AT treats names as subjects, a complicated concept which may or may not be able to be revised for 2.0. Fifth, we’ve found inconsistent handling of internal references/pointers, sometimes getting assigned new target values, sometimes not. Whether or not new target values are created following merging of components is another issue on the table for future investigation.
- DAO handling
Unfortunately, one of the EAD elements currently not handled by the AT is the one that our Yale EAD Best Practices Guidelines and common EAD authoring tool (XMetal) have set up to handle all Digital Archival Objects. As a result, all our DAOs are unable to be imported into the AT. This is a major issue that needs to be addressed in the 2.0 DAO module revision.
- EAD Export/Stylesheet
Although it is possible to modify or replace the stylesheet the AT uses to export EAD or PDF (see Loading Stylesheet), it’s not currently possible to select or utilize multiple stylesheets, which may be important for multiple repository installations and/or developing flexible export possibilities.
- Batch editing
I’ve mentioned previously in another post that one of the key weaknesses of the AT is the inability to batch edit. Given the lack of batch editing functionality, a repository will either have to commit to greater planning and development prior to integration, or use a database administration tool such as Navicat for MySQL to clean up data once in the AT.
Labels:
Finding Aids,
Functionality