SEMI-AUTOMATED CLASS NUMBER PREDICTION OF BIBLIOGRAPHICAL RESOURCES: A FRAMEWORK DEPLOYING ANNIF

Authors

Meghna Biswas
Global Library, O.P. Jindal Global University, Sonipat Narela Road, Near Jagdishpur Village, Sonipat, Haryana 131001, India

Synopsis

This study investigates an AI/ML-based semi-automated indexing system that helps libraries process large document collections efficiently. Using supervised learning with Annif, an open-source, Python-based toolkit for automated subject indexing, we trained models on manually classified MARC bibliographic records that had been assigned Dewey Decimal Classification (DDC) numbers. The implementation involved collecting and processing records containing titles, summaries, DDC numbers and subject descriptors, and then dividing them into training and test datasets. We evaluated four algorithms (TF-IDF, Omikuji, FastText and NN Ensemble) using standard retrieval metrics (F1@5 and NDCG) and found that Omikuji and NN Ensemble significantly outperformed the others in indexing accuracy. The complete open-source framework demonstrates that machine learning is viable for library classification tasks, offering an efficient alternative to manual indexing without sacrificing accuracy. These results point to promising applications of AI in knowledge organization systems; the approach could also be extended to other classification schemes and to larger datasets, which may further improve performance.
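The synopsis names F1@5 and NDCG as the evaluation metrics. The following is a minimal illustrative sketch of one common way to compute them per test record with binary relevance (a suggested DDC number either matches a gold number or it does not). It is not the chapter's code and is not necessarily identical to how Annif's own evaluation computes these scores; the function names and the DDC numbers in the example are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def f1_at_k(suggested: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """F1 of the top-k suggestions for one record against its gold labels."""
    top_k = list(suggested[:k])
    if not top_k or not gold:
        return 0.0
    hits = sum(1 for label in top_k if label in gold)
    if hits == 0:
        return 0.0
    # Precision is taken over the suggestions actually returned (at most k).
    precision = hits / len(top_k)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def ndcg_at_k(suggested: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """NDCG@k with binary relevance: 1 if a suggestion is a gold label, else 0."""
    # rank is 0-based, so the first position is discounted by log2(2) = 1.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, label in enumerate(suggested[:k])
        if label in gold
    )
    # Ideal ranking places every gold label (up to k of them) at the top.
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_scores(
    runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5
) -> Tuple[float, float]:
    """Average F1@k and NDCG@k over (suggested, gold) pairs, one per test record."""
    if not runs:
        return 0.0, 0.0
    f1 = sum(f1_at_k(s, g, k) for s, g in runs) / len(runs)
    ndcg = sum(ndcg_at_k(s, g, k) for s, g in runs) / len(runs)
    return f1, ndcg


if __name__ == "__main__":
    # Hypothetical ranked suggestions for one record and its gold DDC numbers.
    suggested = ["025.4", "006.3", "020", "005.1", "370"]
    gold = {"025.4", "006.3"}
    print(f1_at_k(suggested, gold), ndcg_at_k(suggested, gold))
```

In this example both gold numbers appear at the top of the ranking, so recall is perfect but precision is only 2/5 because three of the five suggestions are wrong; NDCG rewards the correct ordering, while F1@5 penalises the extra suggestions. Averaging per-record scores over the test set, as `mean_scores` does, yields a single figure per algorithm for comparison.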

Keywords: Supervised Machine Learning, Semi-Automated Classification, Automated Subject Indexing, DDC, Annif, Ensemble approach.


Forthcoming

April 28, 2025


License


This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Biswas, M. (2025). Semi-automated class number prediction of bibliographical resources: A framework deploying Annif. In Advancing Library and Information Science: Innovations, Practices, and Future Directions (pp. 83-104). Vyom Hans Press. https://doi.org/10.34256/vadlibs.25.9.83