
Mining large cultural heritage corpora using deep learning methods

Frédéric Kaplan

Recorded 17 November 2016 in Lausanne, Vaud, Switzerland

Event: KTN - Know Thy Neighbor

Abstract

I will report on our ongoing investigations into three large-scale cultural heritage datasets: 4 million Swiss newspaper articles covering a two-hundred-year period, 1 million photographs of artworks currently being digitised at the Cini Foundation, and the continuously expanding corpora of the Venice Time Machine, covering documents from a 1000-year period. The Swiss newspaper archive is sufficiently large to test word embedding methods like Word2Vec and to study how they perform in diachronic contexts, in which words progressively change meaning as language itself evolves. On the artwork databases, we are using convolutional neural networks to find similarities between paintings, engravings, drawings and sculptures, and designing architectures for efficiently spotting matching details. Finally, we combine these two approaches to try to crack one of the hardest problems of the Venice Time Machine: the direct projection of graphical forms into semantic spaces without passing through the currently impractical full textual transcription of the digitised documents.

