Friday, October 13, 2006

RDF - Connecting Software and People

An short 30 minute in depth introduction to the Semantic Web for people with some Software engineering background. Presenting the problem ... all » of the Mythical Man Month, this shows how one can use an OWL Ontology, to describe the relation between people and software bugs. Makes the case that the best way to get the Semantic Web going is by opening up a valuable database to a SPARQL end point. presentation and more available at:http://blogs.sun.com/roller/page/bblfish/20060323

Wednesday, October 11, 2006

Technorati

Technorati Profile

Lucene: How to design your own Web search application ?

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Apache Lucene is an open source project available for free download. Please use the links on the left to access Lucene.


How to design your own Web search application ?:

"Beef up Web search applications with Lucene"

by Deng Peng Zhou

mplement advanced search with Lucene

"Lucene supports several kinds of advanced searches, which I'll discuss in this section. I'll then demonstrate how to implement these searches with Lucene's Application Programming Interfaces (APIs).

Boolean operators

Most search engines provide Boolean operators so users can compose queries. Typical Boolean operators are AND, OR, and NOT. Lucene provides five Boolean operators: AND, OR, NOT, plus (+), and minus (-). I'll describe each of these operators.

  • OR: If you want to search for documents that contain the words "A" or "B," use the OR operator. Keep in mind that if you don't put any Boolean operator between two search words, the OR operator will be added between them automatically. For example, "Java OR Lucene" and "Java Lucene" both search for the terms "Java" or "Lucene."
  • AND: If you want to search for documents that contain more than one word, use the AND operator. For example, "Java AND Lucene" returns all documents that contain both "Java" and "Lucene."
  • NOT: Documents that contain the search word immediately after the NOT operator won't be retrieved. For example, if you want to search for documents that contain "Java" but not "Lucene," you may use the query "Java NOT Lucene." You cannot use this operator with only one term. For example, the query "NOT Java" returns no results.
  • +: The function of this operator is similar to the AND operator, but it only applies to the word immediately following it. For example, if you want to search documents that must contain "Java" and may contain "Lucene," you can use the query "+Java Lucene."
  • -: The function of this operator is the same as the NOT operator. The query "Java -Lucene" returns all of the documents that contain "Java" but not "Lucene."

Now look at how to implement a query with Boolean operators using Lucene's API. Listing 1 shows the process of doing searches with Boolean operators.

Full Article : here

by Deng Peng Zhou is a graduate student from Shanghai Jiaotong University. He works as an intern software engineer in IBM Shanghai Globalization Lab and is interested in Java technology and modern information retrieval. You can contact him at zhoudengpeng@yahoo.com.cn.


MySQL 5.0.26 new version


MySQL The world's most popular open source database MySQL 5.0.26 new version has been released...

MySQL software is published under an open source license and is available in two ways:

  1. MySQL Community Edition is the freely downloadable version of the world's most popular open source database. It is supported by a huge and active community of open source developers and enthusiasts. MySQL Community Edition uses the GPL License, is released early and often, and includes all features, including the latest features under development.
  2. MySQL Network is available for users who want access to our world-class support services, Knowledge Base and certified software. This subscription service is designed to save developers and DBAs time and effort.

Library 2.0 links

" I have done a lot of reading while preparing for my presentation on Blogs and Wikis at the LRSN forum next month. Nearly everything leads me to find out more about the catchphrase “Library 2.0”.

It was only recently that I first heard about this phenomenon so I brought it up at our monthly Librarians’ Forum at Perth this morning – only to find out that, although we had been discussing several ideas that are important to Library 2.0, my colleagues were not aware of the term.

I’ve cobbled together some links to further reading that I have found worthwhile. If we want to be seen as being on top of new ideas then constant awareness of what’s being posted on the web is important and we should all monitor our RSS feeds regularly."

Published in mchabib.blogspot.com by Maeve Everest

Library 2.0 Links

http://mchabib.blogspot.com/2006/08/academic-library-20-concept-models.html



Greenstone 2.71


GLI:
-----
- new Format panel, containing all options relating to collection formatting.
These options don't require a rebuild to take effect. Preview Collection
button now also on this Format panel.
- default indexer for new collections is MGPP
- searchtypes now a format statement instead of a design option.
- new Macros section on Format panel - can edit collection's extra.dm directly.
- redesign of Search Indexes section of Design panel.
- Search indexes now displays options for stem, casefolding and accentfolding
(MG and MGPP only). If selected, the appropriate index will be built.
Search preferences for these options will depend on the appropriate index
being built.
- new Search section in Format panel: can edit display text for search form
drop down lists (indexes, subcollections, levels etc)
- metadata set management now on Enrich panel (Manage Metadata Sets button).
- New collections default to using the Dublin Core metadata set, and no set
prompt is given. This can be changed from the Enrich panel.
- updated help text
- now restarts itself if a new language is selected.
- plugins.dat and classifiers.dat no longer used. Plugin and Classifier
information is dynamically loaded when needed.

GEMS:
-----------
- reimplementation to make a simpler interface
- only one set is now open at a time
- A predefined set of attributes for set/element is provided
- Is launched from GLI to create a new metadata set or edit an existing one

Collection Exporting:
------------------------
- File->Export in GLI. Now supports exporting as Greenstone Archive (GA),
DSpace batch import, Greenstone METS, and MARCXML formats.
- All export types support the use of XSLT to transform resulting XML files.
For example, could export to GA format, then transform to a custom format
using an XSLT file

Export to CDROM:
------------------
- now has a -noinstall option, so that the resulting CDROM doesn't install
anything onto the host computer

MGPP:
------
- maxnumeric support added
- query term truncation (e.g. comput*) with casefolding fixed for non-ascii
query terms
- accent folding support added (thanks to Juan Grigera): Generates a
new index which folds accents, in the same way that case folding
works. Accent folding means that é will match e, and vice versa.
This is turned on by the user via the preferences page.
- Mongolian unicode support
More ....................here

Tuesday, October 10, 2006

Importing MARC data into DSpace

posted by Steve Thomas in http://stevethomas.blogspot.com/

Monday, August 07, 2006

"For the interest of anyone else working on Institutional Repositories, I’ve written a technical report on how we used MARC record data to build records for import into DSpace.

In brief, we had a couple of collections with documents held in local web servers and a record for each document in our catalogue. So we exported the catalogue records to a MARC file for each collection, then developed Perl scripts to convert the MARC records into the required data structure required to import into DSpace."

You can find the report in our Digital Library, at

http://digital.library.adelaide.edu.au/dspace/handle/2440/14784

"Integrating RSS Feeds of New Books into the Campus Course Management System"

A good Article by Edward M. Corrado and Heather L. Moulaison "Integrating RSS Feeds of New Books into the Campus Course Management System"

The Idea to Integrate

My idea was to find a way to integrate RSS feeds of new subject-specific books into the course management system (CMS) on campus. We all know that RSS is a great way to manage constant­ly updating information. There are two typical ways to view RSS: 1) Use an RSS aggregator (see sidebar on page 64), and 2) Have the feeds show up automatically on a Web page. The latter is the basis for my idea. The course instructor can set up an automatic RSS feed that appears when a student opens the course page in the CMS. Feeds can still be down­­loaded to an aggregator (at right), but this requires a user to open the aggregator, and our students already have too much to do (Byrne 2005). Embedding the information in the user’s work­ flow has been the fo­cus of some previous articles, and this seemed like a good way to have a captive audience of sorts (Dempsey 2005).

The goal of this particular idea was to get relevant lists of recent and available monographic acquisitions to display prominently within the CMS page for any given class on any given subject. The headlines would be tailored to that course, and would anticipate user need. TCNJ has its own homegrown course management system called the Simple Online Courseware System (SOCS). Other CMSs such as Blackboard already allow professors to display RSS feeds inside the CMS page. The IT folks working on SOCS were receptive to the idea of improving our system by adding a similar RSS feature. It was up to the library to create the RSS feed...........


Our Methodology for Feeding the Data

I create the RSS feeds of new library acquisitions in a three-stage process:

1. ‑Get the appropriate data out of the library catalog.

2. ‑Convert the data into an RSS feed.

3. ‑Display the RSS feed in the CMS. Since Voyager is built on a relational database (Oracle) and I am familiar with SQL*Plus, I was confident that getting the required data out of Voyager would be a minor detail. What was not a minor detail, however, was determining what data should be used. First, I had to decide what should be advertised in the feeds as a “new” acquisition. After consulting with technical services, I found that it can take as long as 3 days after being cataloged for an item to get processed and be shelved. I only want books to appear in the RSS feed if they are recently acquired and fully processed. Based on our library’s work flows, I decided that a book with an item record created between 3 and 60 days prior should be considered new for this project." for full article ....here


Tool: Zotero:Organize Your Research Resources


Zotero [zoh-TAIR-oh] is a free, easy-to-use Firefox extension to help you collect, manage, and cite your research sources. It lives right where you do your work — in the web browser itself.

Zotero is a production of the Center for History and New Media at George Mason University. It is generously funded by the United States Institute of Museum and Library Services, the Andrew W. Mellon Foundation, and the Alfred P. Sloan Foundation.


" Zotero is a free, easy-to-use research tool that helps you gather and organize resources (whether bibliography or the full text of articles), and then lets you to annotate, organize, and share the results of your research. It includes the best parts of older reference manager software (like EndNote)—the ability to store full reference information in author, title, and publication fields and to export that as formatted references—and the best parts of modern software such as del.icio.us or iTunes, like the ability to sort, tag, and search in advanced ways. Using its unique ability to sense when you are viewing a book, article, or other resource on the web, Zotero will—on many major research sites—find and automatically save the full reference information for you in the correct fields.

The 1.0 beta release of Zotero already provides advanced functionality for gathering, organizing, and scanning your research, as well as basic import/export capability and bibliographic formatting tools. Automatic updates to the software in the fall and winter of 2006-2007 will provide many more citation styles, the ability for Zotero to recognize even more online resources, even better support for importing and exporting entire collections, and integration with Microsoft Word and other word processors. And coming soon, Zotero users will be able to share their collections with other users, collaborate on research projects using Zotero, send their collections to other free web services (such as mapping or translation sites), and receive recommendations and feeds of new resources that might be of interest. In short, over the next year Zotero will expand from an already helpful browser extension into a full-fledged tool for digital research and communications. But there’s no need to wait: you can get started with your own Zotero library right now by downloading the public beta."

Features:

1.Automatic capture of citation information from web pages

2.Storage of PDFs, files, images, links, and whole web pages

3.Flexible notetaking with autosave

4.Fast, as-you-type search through your materials

5.Playlist-like library organization, including saved searches (smart collections) and tags

6.Platform for new forms of digital research that can be extended with other web tools and services

7.Runs right in your web browser

8.Formatted citation export (style list to grow rapidly)

9. Free and open source


Monday, October 09, 2006

How to creat RSS feed for your library web Site !



RSS : RSS stands for Really Simple Syndication. There is no single indisputable expanison to this acronym. Other expanisons are , Rich site summary, XML file format, Resource Description Framework Site Summary and Really stupid Syndication.

How to start use of Rss feeds in libraries:

Creating an RSS feed is not a big issue, you can careate RSS feeds without knowing of html or xml knowlede., by cuting and pasting another RSS feed, and then replacing the heperlinks and some of matter appers between specific tags.

In RSS at least seven developing standards are avialable . I will stick to RSS 2.0, which is slowly becoming the default standard. RSS an XML file format. If yu are familiar with XML or at least HTML, then you will have no tourble in creating your own RSS feed.RSS file consists of defined formats, like other XML schema. Most important one is "items". An item again consists of the following basic information. Title,Description, and Link. The start tag is usually start with and end tage denotated by , 1. Title - Library OPAC 2. Description - This is your library OPAC 3. Link- http://www.yourlibrary name.com


For code see image


for tag declarations. You will use the following tags




Close this file tag
save file as 'feeds.xml' and save it on your library web site.

Validating the RSS file

There are several web sites that provide you free validation e.g rss.scripting.com

Uses of RSS feeds in libraries:

"The use of RSS feeds in libraries seems unlimited – announcements, internet resource guides, major subject Web guides, updates of new acquisitions (books, CDs, etc), recently received journal issues, promoting instructional and reference services, updates on saved search results in both databases and OPACs" (McKiernan).