Comparative Study and Expansion of Metadata Standards for Historic Fashion Collections

Dina Smith-Glaviana; Wen Nie Ng; Caleb McIrvin; Chreston Miller; Julia Spencer

Dina Smith-Glaviana Virginia Tech
Wen Nie Ng Virginia Tech https://orcid.org/0000-0003-4143-254X
Caleb McIrvin Virginia Tech https://orcid.org/0009-0009-6718-459X
Chreston Miller Virginia Tech https://orcid.org/0000-0003-4276-0537
Julia Spencer Virginia Tech https://orcid.org/0000-0001-5577-5573

Keywords: Costume Core, controlled vocabularies, standardization, fashion collections

Abstract

This research seeks to contribute to efforts to standardize metadata across the costume and fashion domain by adding new metadata elements and controlled vocabularies to Costume Core. Expanding the metadata schema could increase the searchability and discoverability of fashion collections. To expand Costume Core, we used vocabulary from pre-trained Natural Language Processing (NLP) models to identify potential new descriptors from a conceptual latent space provided by a technique known as word embeddings. We also pulled from controlled vocabularies shared by other fashion collection personnel across the United States via online surveys.

The NLP techniques involved using a language model pre-trained on the Google News dataset to pinpoint terms similar to those in Costume Core. MOCHA, a Model Output Confirmative Helper Application, was developed to facilitate the review of potential descriptors. The results of the NLP analysis showed a difference between the generated descriptors predicted to be accurate and the descriptors that a fashion domain expert reviewed and confirmed as accurate. Even so, using machine learning models for metadata expansion is justifiable given the accuracy of the generated descriptors and the time-saving potential, as NLP analysis allowed for selection from a wider array of descriptors.
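The similarity search described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the three-dimensional vectors are invented toy values standing in for the pre-trained Google News embeddings, the function names are hypothetical, and cosine similarity is assumed as the similarity measure because it is a common choice for word embeddings.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real run would load vectors from a pre-trained embedding model instead.
TOY_EMBEDDINGS = {
    "bodice": [0.9, 0.1, 0.0],
    "corset": [0.8, 0.2, 0.1],
    "stays": [0.7, 0.3, 0.1],
    "sleeve": [0.2, 0.9, 0.1],
    "cuff": [0.1, 0.8, 0.2],
    "newspaper": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def suggest_descriptors(seed_term, embeddings, known_terms, top_n=3):
    """Rank terms not already in the vocabulary by similarity to a seed term."""
    seed_vec = embeddings[seed_term]
    scored = [
        (term, cosine_similarity(seed_vec, vec))
        for term, vec in embeddings.items()
        if term not in known_terms  # skip terms the schema already contains
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical existing vocabulary; the seed term comes from this set.
known = {"bodice", "sleeve"}
candidates = suggest_descriptors("bodice", TOY_EMBEDDINGS, known, top_n=2)
for term, score in candidates:
    print(f"{term}: {score:.3f}")
```

Candidates ranked this way are only suggestions. As the abstract notes, a domain expert still reviews each generated descriptor before it is accepted.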

The revision process also identified 528 new potential descriptors. The survey data indicated high variability in: collection cataloging systems; the resources used to determine accurate vocabulary for cataloging artifacts; the controlled vocabularies used; and how vocabularies were categorized, reflecting a lack of standardization in the field. However, by crowdsourcing controlled vocabularies, we discovered 48 new vocabularies that may be used to expand the metadata schema.

In addition, the study provided insight into adding metadata elements in the form of fields or columns, for example those relating to medium, such as fiber, fabric structure, and color (including hue, value, and intensity). Adding such metadata elements could enrich the schema and promote greater standardization of metadata across fashion collections.

Author Biographies

Dina Smith-Glaviana, Virginia Tech

Dina Smith-Glaviana is an Assistant Professor of Fashion Merchandising and Design in the Department of Apparel, Housing, and Resource Management at Virginia Polytechnic Institute and State University (VA Tech). Her research interests include dress and popular culture, subcultural dress, and re-enactment and historic dress.

Wen Nie Ng, Virginia Tech

Wen Nie Ng holds the position of Assistant Professor and Digital Collections Librarian within the Digital Libraries and Preservation Unit at the Virginia Tech University Libraries. Wen is responsible for managing online access, collection maintenance, and metadata creation for the unit while ensuring the seamless operation of the library's intranet. Possessing an MIS degree from Indiana University, Wen is also a certified web accessibility specialist and user experience designer. Their research interests span metadata, digital asset management systems, 3D modeling, and project management.

Caleb McIrvin, Virginia Tech

Caleb McIrvin is an accelerated master’s student in the computer science program at Virginia Tech. His research interests include the extension of machine learning techniques to quantum computation, with an emphasis on reinforcement learning. Previous papers include work on interpreting the effects of simulated noisiness in NISQ-era quantum circuits.

Chreston Miller, Virginia Tech

Chreston Miller is an Assistant Professor and the Data and Informatics Consultant for Engineering within Data Services at the Virginia Tech University Libraries. His consulting work led him to specialize in Natural Language Processing (NLP). He has also co-authored several articles on NLP dealing with unique language identification, freeform text classification within a small dataset, and experiences with open resources available for deep learning tools and algorithms. His research interests include Applied Machine Learning, NLP, and Human-Centered Computing.

Julia Spencer, Virginia Tech

Julia Spencer is the Digital Imaging Librarian at Virginia Tech’s Newman Library. She received her MLIS from Wayne State University as well as a graduate certification in Archival Administration. She has previously digitized materials at the Benson Ford Research Center at The Henry Ford and worked on the Detroit 67 Oral History Project with the Detroit Historical Society.