Reverse Engineering the Image Library: a case study on the feasibility of using deep learning to identify significance in a 35mm slide collection
Abstract
The Columbia University Department of Art History and Archaeology holds approximately 400,000 35mm slides. As at other institutions without a master catalog, the collection is tremendously time-consuming to sort, leaving resources to languish in storage. Over the last year, the Media Center for Art History at Columbia University used deep learning and optical character recognition software to detect original photographic images in the 35mm slide collection. Both technologies served to classify each image as either copywork or an original photograph. The project aimed to apply transferable techniques that will enable other collections to partially automate the cataloging and identification of significant images, with the goal of creating an open-source, scalable framework for archival discovery across humanities fields. This paper describes the methods used, the challenges encountered, and the processes investigated. This project was generously supported by a Sparks! Ignition Grant from the Institute of Museum and Library Services.
This article has undergone a double-blind peer review process.
The VRAB does not require copyright transfer, only permission to publish and archive the article. Copyright holders retain copyright ownership and grant a nonexclusive license to the journal and OJS to publish the article, meaning that the author may also publish it elsewhere. Before submitting an article to the journal, please be sure that all necessary permissions have been cleared for any third-party material.
This is an open access journal; users are allowed to read, download, copy, distribute, print, search, or link to the full texts of the articles, or use them for any other lawful purpose, without asking prior permission from the publisher or the author. All issues of the journal are licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).