Document clustering: applications in a collaborative digital library
16 January 2006
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, Hassan Alam
Proceedings Volume 6067, Document Recognition and Retrieval XIII; 60670K (2006) https://doi.org/10.1117/12.650161
Event: Electronic Imaging 2006, San Jose, California, United States
Abstract
This paper introduces a document clustering method within a commercial document repository, FileShare®. FileShare® is a commercial collaborative digital library offering facilities for sharing and accessing documents over a simple Internet browser (e.g., Microsoft® Internet Explorer®, Netscape® or Opera®) within groups of people working on common projects. As the number of documents increases within a digital library, displaying these documents in this environment poses a huge challenge. This paper proposes a document clustering method that uses a modified version of the traditional K-Means algorithm to categorize documents by their themes using lexical chaining within the FileShare® repository. The proposed algorithm is unsupervised, and has shown very high accuracy in a typical experimental setup.
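For orientation, the following is a minimal sketch of the traditional K-Means loop that the abstract takes as its starting point. It is not the modified algorithm proposed in the paper, whose details the abstract does not give. Plain bag-of-words term counts stand in for lexical-chain features, cosine similarity is an assumed choice of measure, and the names `tf_vector`, `cosine`, `mean_vector`, `kmeans` and the `seeds` parameter are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts (an assumed stand-in for lexical-chain features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def kmeans(docs, seeds, max_iter=20):
    """Standard K-Means: assign each document to its most similar centroid,
    recompute centroids, and stop when the assignment no longer changes.
    `seeds` lists the indices of the documents used as initial centroids."""
    vectors = [tf_vector(d) for d in docs]
    centroids = [vectors[i] for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels


# Toy documents invented for this illustration only.
docs = [
    "image scanner ocr page layout",
    "ocr page image recognition",
    "budget invoice payment schedule",
    "invoice payment budget report",
]
labels = kmeans(docs, seeds=[0, 2])
```

In this toy run the two imaging documents share one label and the two finance documents share the other. Fixed seed indices are used here only to keep the sketch deterministic; the result of K-Means generally depends on how the centroids are initialized.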
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, and Hassan Alam "Document clustering: applications in a collaborative digital library", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670K (16 January 2006); https://doi.org/10.1117/12.650161
KEYWORDS
Digital libraries
Databases
Distance measurement
Internet
Genetic algorithms
Human-machine interfaces
Visualization