Start date: 1 March 2007
End date: 30 September 2008
Funding programme: Digital Preservation and Records Management Programme
Project website: http://www.ahds.ac.uk/about/projects/soapi/index.htm
JISC theme(s): Information environment
Committees: JISC Integrated Information Environment committee
Overview
The Arts and Humanities Data Service (AHDS) is responsible for preserving a variety of digital resources arising from arts and humanities research. Broadly speaking, the AHDS’ approach to digital preservation involves the normalisation of data to suitable preservation formats and capture of preservation metadata on ingest, supported by ongoing monitoring for format obsolescence and performance of remedial action as required. These activities are currently performed manually by appropriate staff. The purpose of the SOAPI project is to identify new ways to automate many of the tasks associated with ingest and long-term preservation, through the creation of software tools and corresponding workflows.
Aims and objectives
The primary aim of the project is to produce a software toolkit that allows repository managers to perform activities associated with ingest and preservation in a scalable manner. The toolkit must be capable of supporting workflows composed of automated and manual stages; easily configurable to the needs of each digital repository; and suitably flexible to allow the integration of third-party tools that are currently in use or will be produced in the future.
Project methodology
The approach of the project will be:
- Produce use cases representing the functionality that the toolkit should support. These will be derived largely from the AHDS’ documented ingest and preservation procedures, although they will also cover configuring and extending the toolkit.
- Investigate the technologies in detail, develop prototypes for internal evaluation, and produce an architecture document. The approach will be to develop modular services that can be combined to implement workflows meeting the requirements of particular repositories
- Develop and test the toolkit software
- Evaluate the toolkit in a number of environments
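The idea of modular services combined into a workflow of automated and manual stages can be illustrated with a minimal Java sketch. It is not the project's implementation (which names jBPM as its workflow technology); the `Stage` interface, the `IdentifyFormat` and `RequireField` classes, and the `depositor` field are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WorkflowSketch {

    /** A digital object moving through ingest, with metadata accumulated per stage. */
    static final class IngestItem {
        final String fileName;
        final Map<String, String> metadata = new LinkedHashMap<>();
        IngestItem(String fileName) { this.fileName = fileName; }
    }

    /** One modular stage; run() returns false if the stage cannot finish without human input. */
    interface Stage {
        String name();
        boolean run(IngestItem item);
    }

    /** Automated stage (hypothetical): records the lower-cased file extension as a format label. */
    static final class IdentifyFormat implements Stage {
        public String name() { return "identify-format"; }
        public boolean run(IngestItem item) {
            int dot = item.fileName.lastIndexOf('.');
            String ext = dot >= 0 ? item.fileName.substring(dot + 1).toLowerCase() : "unknown";
            item.metadata.put("format", ext);
            return true;
        }
    }

    /** Manual stage (hypothetical): completes only once a person has supplied the named field. */
    static final class RequireField implements Stage {
        private final String field;
        RequireField(String field) { this.field = field; }
        public String name() { return "manual-entry:" + field; }
        public boolean run(IngestItem item) { return item.metadata.containsKey(field); }
    }

    /** Runs stages in order; returns the name of the stage awaiting input, or null when all complete. */
    static String runWorkflow(List<Stage> stages, IngestItem item) {
        for (Stage stage : stages) {
            if (!stage.run(item)) {
                return stage.name();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Stage> stages = new ArrayList<>();
        stages.add(new IdentifyFormat());
        stages.add(new RequireField("depositor"));

        IngestItem item = new IngestItem("survey.TIFF");
        System.out.println("Waiting at: " + runWorkflow(stages, item));

        // Simulate a value arriving from a manual entry form, then resume.
        item.metadata.put("depositor", "entered manually");
        System.out.println("Waiting at: " + runWorkflow(stages, item));
    }
}
```

Because each stage sits behind the same small interface, a repository could reorder stages or swap in a third-party tool without changing the code that runs the workflow.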
Anticipated outputs and outcomes
The main deliverable will be a toolkit composed of: web services tailored to perform automated ingest and preservation functions; web-based forms to allow for manual entry of information that cannot be automatically produced; and a workflow tool that combines the functional activities into a single automated or semi-automated workflow.
The toolkit will be repository-independent and configurable to allow the definition and implementation of workflows appropriate to the unique requirements of each repository. In the current project, the toolkit has been integrated with Fedora; however, it may subsequently be tailored by third parties to other repository software.
Technology / Standards used
| Standard or specification | Version | Notes |
| METS | 1.5 | Use as packaging format for digital objects and their metadata. |
| MPEG-DIDL | 2 | Will be considered as an alternative to METS, although it may not be supported completely within the project timescale. |
| SOAP | 1.2 | Web service standard. |
| WSDL | 1.1 | Web service standard. |
| UDDI | 3 | Web service discovery standard. |
| PREMIS | 1 | Data dictionary and XML schemas for preservation metadata. |
| RDF Specifications | Latest | W3C Recommendations |
| OWL | 1 | W3C Recommendation |
Technologies: web services, Java, jBPM, JBoss, Axis, Castor, Fedora, Shibboleth.
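To give a sense of what using METS as a packaging format involves, the following Java sketch builds a very small METS document that lists one file and references it from a structural map. It uses only the standard JDK XML APIs rather than the project's own tooling; the class and method names, the identifiers, and the file URL are hypothetical, and a real package would also carry descriptive and preservation metadata sections.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MetsSketch {
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Builds a minimal METS wrapper for one file; all argument values are illustrative. */
    static String buildMets(String objectId, String fileId, String fileUrl, String mimeType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // Root element, with the xlink prefix declared for use in FLocat below.
            Element mets = doc.createElementNS(METS_NS, "mets:mets");
            mets.setAttributeNS(XMLNS_NS, "xmlns:xlink", XLINK_NS);
            mets.setAttribute("OBJID", objectId);
            doc.appendChild(mets);

            // File section: one file group holding one file and its location.
            Element fileSec = doc.createElementNS(METS_NS, "mets:fileSec");
            Element fileGrp = doc.createElementNS(METS_NS, "mets:fileGrp");
            Element file = doc.createElementNS(METS_NS, "mets:file");
            file.setAttribute("ID", fileId);
            file.setAttribute("MIMETYPE", mimeType);
            Element flocat = doc.createElementNS(METS_NS, "mets:FLocat");
            flocat.setAttribute("LOCTYPE", "URL");
            flocat.setAttributeNS(XLINK_NS, "xlink:href", fileUrl);
            file.appendChild(flocat);
            fileGrp.appendChild(file);
            fileSec.appendChild(fileGrp);
            mets.appendChild(fileSec);

            // Structural map: a single division pointing back at the file by ID.
            Element structMap = doc.createElementNS(METS_NS, "mets:structMap");
            Element div = doc.createElementNS(METS_NS, "mets:div");
            Element fptr = doc.createElementNS(METS_NS, "mets:fptr");
            fptr.setAttribute("FILEID", fileId);
            div.appendChild(fptr);
            structMap.appendChild(div);
            mets.appendChild(structMap);

            // Serialise the DOM tree to an indented XML string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build METS document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildMets("obj-1", "file-1", "file:///data/survey.tif", "image/tiff"));
    }
}
```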
Project staff
Project Manager
Mark Hedges, King’s College London, Centre for e-Research / Arts and Humanities Data Service, Telephone: 020-7848-1970, Fax: 020-7848-1989, Email: firstname.lastname@kcl.ac.uk
Project Team
Andreas Mavrides – Technical Officer (software development)
Malcolm Polfreman – Information Officer (metadata issues)
Gareth Knight – Preservation Officer (digital preservation issues)