SEMI-AUTOMATED CLASS NUMBER PREDICTION OF BIBLIOGRAPHICAL RESOURCES: A FRAMEWORK DEPLOYING ANNIF

Meghna Biswas

Global Library, O.P. Jindal Global University, Sonipat Narela Road, Near Jagdishpur Village, Sonipat, Haryana 131001, India

DOI: https://doi.org/10.34256/vadlibs.25.9.83

Synopsis

This study investigates an AI/ML-based semi-automated indexing system that helps libraries process large document collections efficiently. Using supervised learning within Annif, an open-source Python-based framework for automated subject indexing, we trained models on manually classified MARC bibliographic records organized according to the Dewey Decimal Classification (DDC). The implementation involved collecting and processing records containing titles, summaries, DDC numbers and subject descriptors, then dividing them into training and test datasets. We evaluated four algorithms (TF-IDF, Omikuji, FastText and NN Ensemble) using standard retrieval metrics (F1@5 and NDCG) and found that Omikuji and NN Ensemble significantly outperformed the other two in indexing accuracy. The complete open-source framework demonstrates the viability of machine learning for library classification tasks, offering an efficient alternative to fully manual indexing while maintaining accuracy. These results suggest promising applications for AI in knowledge organization systems, with scope for extending the approach to other classification schemes and to larger datasets to further improve performance.

Keywords: Supervised Machine Learning, Semi-Automated Classification, Automated Subject Indexing, DDC, Annif, Ensemble approach.
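
The evaluation workflow summarized in the synopsis (split the classified records, then score ranked class-number suggestions with F1@5 and NDCG) can be illustrated with a minimal sketch. It is not the chapter's implementation and does not call Annif itself. The toy records, the function names and the 80/20 split ratio are hypothetical, and relevance is assumed to be binary (a suggested DDC number either matches a manually assigned one or it does not).

```python
import math
import random

# Hypothetical toy records: (title/summary text, manually assigned DDC numbers).
RECORDS = [
    ("Introduction to library cataloguing", {"025.3"}),
    ("Machine learning for text classification", {"006.31"}),
    ("History of modern India", {"954"}),
    ("Principles of organic chemistry", {"547"}),
    ("Information retrieval systems", {"025.04"}),
]


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split off a test portion.

    The 80/20 ratio is an assumption; the source does not state the split used.
    """
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def f1_at_k(ranked, gold, k=5):
    """F1 score between the top-k ranked suggestions and the gold label set."""
    top = ranked[:k]
    hits = sum(1 for label in top if label in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def ndcg(ranked, gold, k=None):
    """Normalized discounted cumulative gain with binary relevance."""
    top = ranked if k is None else ranked[:k]
    dcg = sum(1 / math.log2(i + 2) for i, label in enumerate(top) if label in gold)
    ideal_hits = min(len(gold), len(top))
    idcg = sum(1 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


train, test = train_test_split(RECORDS)

# Hypothetical ranked output from some trained model for one test record.
suggestions = ["020", "025.3", "027", "028", "029"]
gold_labels = {"025.3"}
scores = (f1_at_k(suggestions, gold_labels), ndcg(suggestions, gold_labels))
```

In a full evaluation the per-record scores would be averaged over the whole test set. F1@5 penalizes every one of the five suggestions that misses, whereas NDCG rewards placing the correct class number near the top of the ranking, so the two metrics can rank the same algorithms differently.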

Volume

Advancing Library and Information Science: Innovations, Practices, and Future Directions

Pages

83-104

Forthcoming

April 28, 2025

Series

Digital Transformations in Society

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Biswas, M. (2025). Semi-automated class number prediction of bibliographical resources: A framework deploying Annif. In Advancing Library and Information Science: Innovations, Practices, and Future Directions (pp. 83-104). Vyom Hans Press. https://doi.org/10.34256/vadlibs.25.9.83
