Reverse Engineering the Image Library: a case study on the feasibility of using deep learning to identify significance in a 35mm slide collection

Stefaan Van Liefferinge; Gabriel Rodriguez; Lisa Peck; Tim Trombley; Kate Burch; Karen Lin; Lauren Arnett

Stefaan Van Liefferinge Columbia University
Gabriel Rodriguez Columbia University
Lisa Peck Columbia University
Tim Trombley Columbia University
Kate Burch Greater Portland Landmarks
Karen Lin Columbia University
Lauren Arnett Columbia University

Keywords: 35mm slides, deep learning, neural net, computer vision, halftone, artificial intelligence (AI), automation, optical character recognition (OCR)

Abstract

The Columbia University Department of Art History and Archaeology holds approximately 400,000 35mm slides. As at other institutions without a master catalog, the collection is tremendously time-consuming to sort, and resources are left to languish in storage. Over the last year, the Media Center for Art History at Columbia University used deep learning and optical character recognition (OCR) software to detect original photographic images in the 35mm slide collection. Both technologies served to classify each slide as either copywork (a photographic reproduction of an image from a book or other publication) or an original photograph. The project aimed to apply transferable techniques that would enable other collections to partially automate the cataloging and identification of significant images, and thereby to create an open-source, scalable framework for archival discovery across humanities fields. This paper describes the methods and challenges involved and documents the processes investigated. This project was generously supported by a Sparks! Ignition Grant from the Institute of Museum and Library Services.

This article has undergone a double-blind peer review process.

Author Biography

Stefaan Van Liefferinge, Columbia University

The Media Center for Art History develops and supports fieldwork and research projects that document, present, and interpret works of art, architecture, and cultural heritage sites. The Center’s goals are to advance the digital humanities, explore digital technologies, and preserve and develop its visual collection. As part of the Department of Art History and Archaeology, the Media Center’s specialized personnel and facilities serve the closely related fields of archaeology, art history, and architectural history. The staff comprises Director Stefaan Van Liefferinge, Digital Curator Gabriel Rodriguez, Assistant Curator Lisa Peck, and Educational Technologist Tim Trombley. The Media Center staff has expertise in art history, archaeology, computer science, and architectural history.

Acknowledgements

Lauren Arnett (Columbia College '19) and Karen Lin (Columbia College '21) were student programmers. Manual cataloging and classification were carried out by Dominique Groffman (Columbia College '19) and Sophie Fox (Barnard College '19).

This project was generously supported by a Sparks! Ignition Grant from the Institute of Museum and Library Services.