utilizing vector embeddings across your platform

original slides

--------------------------------

overview of vector representations

humans have learned and continue to learn ways of representing things we observe in the world, as sets of numerical values

cuda sm

--------------------------------

foundation text embedding models

2024:

Out:one hot encoding

In:Large embedding models

--------------------------------

embedding your content

--------------------------------

content based recommendations

recommendation systems are often an undervalued application of ml but vital to some of the biggest companies you know netflix, spotify, google, amazon

a rather simple yet effective recoomendation strategy is to use the embedding of users past vists, and show them similar content based on the cosine similarity of the embeddings

collaborative filtering recommendations

similarities between users and items simultaneously to provide recommendations

collaborative filtering models can recommend an item to user A based on the interests of a similar user B

the hard part is finding a way to represent user A and B in ways that allow for measuring the similarities between users, often not being explicit

--------------------------------

retrieval augmented generation

given your content now represented as vectors, this enables us to do search over content during conversational settings to incorporate additional context

notice a trend yet? the same vectors can be used for content recommendations, site search, retrieval augmented generation, etc

--------------------------------

optimizations

your data

just like all ml systems, downstream performance is still dependent on the quality of your data

in this case, some thought should be given to what you are embedding on both ends of the comparison

either preprocessing or post processing can be used to handle imperfect data

your data in this case being both the stored content vectors as well as the vectorized user query

pre processing

preprocess your data by cleaning or formatting

also developing a good chunking strategy to incorporate as much context in chunks while keeping some token limit

formatting is your friend ex. markdown, html

ex 1.

ex 2.

if i gave you some 'context' but it was just the header of some seciton of content, how useful would that be? if i gave you some random paragraph without knowing the section it came from, the might be useful

but if i gave you a paragraph and told you what section it came from, much more helpful in terms of context

post processing

you cant expect users to provide perfect search queries, but you can absolutely alter their query to something that will return better results

"You are a rewriting assistant tasked with taking some user query and their current conversation, and returning an optimized search query that will be used to retrieve useful information relevant to their query"

reranking

embedding models are trained on large datasets encouraging generalization of similarity no matter the domain

generalization also leads to non-optimal ranks while still being close

it is very likely the 2 best documents for your search are within the top 10-25 but may not actually be the 1 and 2 ranked items from initial search

to further improve, incorporate a reranking model that considers something like 25 candidate documents from initial search, and reranks those to determine top 2

use some fast generalized model to search 10000s of documents and narrow down to 25 candidate documents, then use a more performant but slower model on those 25 to get final 2-5 items

cuda sm

--------------------------------

directory

--------------------------------