Online Program

Friday, May 31
Data Science Technologies
Backend Data Science
Fri, May 31, 3:40 PM - 5:15 PM
Grand Ballroom E

Working with Images and Text in R Through Embeddings (306259)

Jorge Silva, Basilica 
*Michael Lucy, Basilica 

Keywords: Embeddings, transfer learning, feature extraction, high-dimensional data, deep learning, machine learning, image processing, NLP

An embedding maps a discrete object (such as a word, sentence, or image) into a vector space so that semantically similar objects end up close together in that space. This technique, popularized by Word2vec, works as a form of feature extraction: the resulting vectors can be used as features in familiar techniques such as regression and k-means clustering.
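To make the idea concrete, here is a minimal sketch in Python (not R, and not Basilica's API). It assumes a handful of hypothetical, hand-made 3-dimensional vectors standing in for real embeddings, which typically have hundreds of dimensions. Cosine similarity should rate "cat" closer to "dog" than to "car", and plain k-means run on the vectors as ordinary features should group the animals and the vehicles separately.

```python
import math

# Hypothetical 3-dimensional embeddings, made up for illustration only.
# Real embeddings (e.g. from Word2vec or Basilica) have far more dimensions.
words = ["cat", "dog", "car", "truck"]
vectors = [
    [0.90, 0.80, 0.10],  # cat
    [0.85, 0.75, 0.20],  # dog
    [0.10, 0.20, 0.90],  # car
    [0.15, 0.10, 0.95],  # truck
]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=10):
    """Plain k-means; returns one cluster label per point."""
    # Deterministic initialization with the first k points, for the sketch.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c, p=p: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


print(round(cosine_similarity(vectors[0], vectors[1]), 3))  # cat vs dog
print(round(cosine_similarity(vectors[0], vectors[2]), 3))  # cat vs car
print(dict(zip(words, kmeans(vectors, k=2))))
```

The clustering step never looks at the words themselves, only at their vectors, which is what lets embeddings act as a feature-extraction layer in front of standard methods.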

In this talk, we’ll introduce Basilica, a tool that embeds images and text, and demonstrate it by training a logistic regression model on an image dataset using Basilica’s R client. We will cover how to turn images into useful features, how to use embeddings with different models, and best practices for working with embeddings. We’ll also dive into how different types of neural networks and upstream tasks are used to train modern embeddings.
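The workflow of fitting a classifier on image embeddings can be sketched as follows. This is a hypothetical Python illustration, not the talk's R code or Basilica's client API: it assumes the images have already been converted to embedding vectors (here invented 2-dimensional ones, with labels such as 1 = "cat", 0 = "dog") and fits a logistic regression by batch gradient descent on the log loss.

```python
import math

# Hypothetical precomputed image embeddings (2-D for readability) and labels.
# In practice these vectors would come from an embedding model; these are invented.
X = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


w, b = train_logistic_regression(X, y)
print(predict(w, b, [0.85, 0.15]))  # classify a new image's (hypothetical) embedding
```

Because the embedding does the heavy lifting of turning pixels into meaningful features, a simple linear model is often enough on top of it; in R the same fit would typically be done with `glm(..., family = binomial)`.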