Thursday, November 02, 2006

Open Source Tool: Web Curator Tool: 1.1 GA Released


Version 1.1 GA was released on 21 September 2006, together with release notes, known issues, and a changelog.

The Web Curator Tool (WCT) is a tool for managing the selective web harvesting process. It is designed for use in libraries and other collecting organisations, and supports collection by non-technical users while still allowing complete control of the web harvesting process.


The Web Curator Tool is available as an open-source project.

New features

  • ReferenceNumber field for Target, Group and TargetInstance objects
  • FileReference field for Permission objects
  • Fields for recording Selection information in Target objects
  • HarvestType field for Target objects
  • Add a textbox labelled "Name" to the Target Instance Search Form
  • ProfileNote field for Target objects
  • Basic descriptive (Dublin Core) metadata
  • Type field for Group objects
  • Sticky search forms: reset button needed [SF 1531646]

The tool's workflow encompasses the following tasks:


* Harvest Authorisation: seeking and recording permission to harvest web material, and to make it accessible to the general public.

* Selection and scoping: determining what material should be harvested, be it a
web site, a web page, a partial web site, a group (or collection) of web sites,
or any combination of these.

* Scheduling: determining when a harvest should occur, and when it should be
repeated.

* Description: describing harvests with basic Dublin Core metadata and other
specialized fields (or by providing a reference to an external catalogue).

* Harvesting: the Web Curator Tool will download the selected web material at
the appointed time using the Internet Archive's Heritrix web crawler -- each
installation can have multiple harvesters on different machines, each of which can
perform several harvests simultaneously.

* Quality Review: tools are provided for making sure the harvest worked as
expected, and correcting simple harvest errors.

* Endorsing and submitting: if the harvest was a success, it is endorsed and
then submitted to an external digital archive.
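
To make the order of the tasks above concrete, here is a minimal Python sketch that models them as stages. It is only an illustration, not the Web Curator Tool's actual data model or API: the `Stage` enum, the `next_stage` function, and the `harvest_ok` flag are hypothetical names, and treating the tasks as a strict linear sequence is a simplifying assumption. The source only says that a successful harvest is endorsed and then submitted, so the sketch simply leaves an unsuccessful harvest in quality review.

```python
from enum import Enum


class Stage(Enum):
    """Stages named after the task list above (hypothetical, for illustration)."""
    AUTHORISATION = 1
    SELECTION = 2
    SCHEDULING = 3
    DESCRIPTION = 4
    HARVESTING = 5
    QUALITY_REVIEW = 6
    ENDORSED = 7
    SUBMITTED = 8


def next_stage(stage: Stage, harvest_ok: bool = True) -> Stage:
    """Return the stage that follows `stage` in this simplified linear model."""
    # Assumption: a harvest that did not succeed is not endorsed and
    # stays in quality review until it is corrected.
    if stage is Stage.QUALITY_REVIEW and not harvest_ok:
        return stage
    # SUBMITTED is the final stage; there is nothing after it.
    if stage is Stage.SUBMITTED:
        return stage
    return Stage(stage.value + 1)


if __name__ == "__main__":
    # Walk a successful harvest through every stage in order.
    stage = Stage.AUTHORISATION
    while stage is not Stage.SUBMITTED:
        print(stage.name)
        stage = next_stage(stage)
    print(stage.name)
```

In a real installation several of these tasks can overlap or repeat (for example, a scheduled harvest recurs), so the linear model is best read as a summary of the list, not as a description of how the tool is implemented.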
