Keywords: Embeddings, transfer learning, feature extraction, high-dimensional data, deep learning, machine learning, image processing, NLP
An embedding maps a discrete object (such as a word, sentence, or image) into a vector space, with the property that semantically similar objects end up close together in the space. This technique, popularized by Word2vec, works as a form of feature extraction. The resulting features can be handled using familiar techniques such as regression and k-means clustering.
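As a hedged illustration of that last point (this is not Basilica's API; the embedding vectors below are synthetic stand-ins), here is how embedding features can be fed straight into k-means, just like any other numeric feature matrix:

```python
# Illustrative sketch: cluster embedding vectors with k-means.
# The embeddings here are synthetic; in practice they would come
# from an embedding service or model.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two groups of "semantically similar" objects, embedded near two
# different centers in a 16-dimensional space.
a, b = rng.normal(size=16), rng.normal(size=16)
X = np.vstack([a + 0.05 * rng.normal(size=(30, 16)),
               b + 0.05 * rng.normal(size=(30, 16))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Objects embedded near the same center land in the same cluster.
print(km.labels_)
```

Because semantic similarity is encoded as geometric closeness, the clusters recovered by k-means correspond to the semantic groups.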
In this talk we’ll introduce Basilica, a tool that embeds images and text. We’ll demonstrate it by training a logistic regression model on an image dataset using Basilica’s R client. We’ll cover how to turn images into useful features, how to use embeddings with different models, and best practices for working with embeddings. We’ll also dive into how different types of neural networks and upstream tasks are used to train modern embeddings.
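The core workflow can be sketched in a few lines. This is a minimal, hedged sketch in Python rather than Basilica's R client, and the embedding vectors are synthetic placeholders for what an embedding service would return:

```python
# Illustrative sketch (not Basilica's actual API): treat precomputed
# image embeddings as features for a downstream logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend each image was embedded into a 64-dimensional vector.
# Images of class 0 and class 1 cluster around different centers,
# mimicking "semantically similar objects end up close together".
center0, center1 = rng.normal(size=64), rng.normal(size=64)
X = np.vstack([center0 + 0.1 * rng.normal(size=(50, 64)),
               center1 + 0.1 * rng.normal(size=(50, 64))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))  # well-separated clusters -> high accuracy
```

The point is that once images are mapped to vectors, the heavy lifting is done: a simple linear model on top of the embedding is often enough.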