Abstract:
|
To sell products, it is critical to understand what you're selling and how each product relates to the others. There are many ways of building taxonomies, but they are often manual, difficult, and time-consuming. We can use clustering methods to achieve this goal instead, but clustering tends to be easier with dense features than with the sparse features often found in e-commerce data sets. At Penguin Random House, we might be interested in automatically identifying which of the books we publish belong to a series (e.g. George R. R. Martin's "A Song of Ice and Fire" novels). How might we identify those automatically if the best data we have are records of which consumers purchased which books, and when? Can the order in which readers made purchases tell us something more about our own books? We will discuss applying algorithms typically used for word representation (e.g. GloVe) to create product embeddings from the Penguin Random House library, based on a "language" of product IDs instead of words. We will visualize these representations and discuss ways to use them to understand how consumers purchase and read the books that we publish.
|
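To make the "language of product IDs" idea concrete, here is a minimal sketch of the co-occurrence counting step that a GloVe-style model would start from. It treats each reader's chronological purchase history as a "sentence" and each product ID as a "word". The function name, the window size, the product IDs, and the choice to skip repeat purchases of the same ID are all assumptions made for illustration, not details from the talk; the 1/distance weighting follows the convention described for GloVe.

```python
from collections import Counter


def cooccurrence_counts(purchase_histories, window=2):
    """Count weighted co-occurrences of product IDs within a sliding window.

    Each history is one reader's purchases in chronological order.
    Pairs that are `d` purchases apart contribute 1/d to the count.
    """
    counts = Counter()
    for history in purchase_histories:
        for i, center in enumerate(history):
            # Look forward only, then record both directions so the
            # resulting counts are symmetric.
            for offset in range(1, window + 1):
                j = i + offset
                if j >= len(history):
                    break
                context = history[j]
                if context == center:
                    # Assumption: ignore repeat purchases of the same ID.
                    continue
                weight = 1.0 / offset
                counts[(center, context)] += weight
                counts[(context, center)] += weight
    return counts


# Hypothetical product IDs, invented for this example.
histories = [
    ["got_1", "got_2", "got_3"],
    ["got_1", "got_2", "cookbook_a"],
    ["cookbook_a", "cookbook_b"],
]
counts = cooccurrence_counts(histories, window=2)
```

In this toy data, the first two books of the series are bought back to back by two readers, so their pair accumulates the largest count. An embedding model fit to such counts would be expected to place frequently co-purchased IDs near each other, which is what makes clustering on the dense vectors plausible.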