Content as Data: Archival Approaches to Computational Analysis

There is growing interest in viewing “content as data.” Broadly, work in this space explores what becomes possible when archives and cultural heritage organizations think about, prepare, describe, and provide access to content data in ways that make it amenable to computational use. Data-driven research demands that the authenticity and context of the data source be established through the archival principles of documentation, transparency, and provenance. Providing access to content data can be as simple as making finding aids available in EAD and RDF so that the information can be harvested for digital humanities projects. More complex access to content data implements AI applications, which allow systems and tools to query the data deeply, supporting automated workflows, new distribution platforms, and emerging technologies.

Proposals for panel sessions and 10-minute lightning talks in this stream are sought from individuals and groups. Topics should highlight computational analysis of moving image content and may include:

  • Developing an iterative framework geared towards collaborations and integrations between new and established technologies, digital end-to-end workflows, and service models
  • Machine learning algorithms and statistical models that enhance discovery, use, and re-use
  • Expanding user engagement with crowdsourcing and social media to amplify interest in content
  • Support for diverse users in designing “collections as data” programs
  • Preparing and managing data with standards that support use and re-use across the digital content lifecycle
  • Approaches to digital transformation and innovation that have reshaped the birth-to-preservation lifecycle

 

This is planned as a five-session program stream on Friday, November 13, 2020.
