Embeddings
Numerical representations of text that capture its meaning. An embedding model converts words, sentences, or entire documents into lists of numbers — called vectors — arranged so that similar meanings end up with similar values. The word "president" and the word "governor," for example, would sit close together in this number space, while "president" and "sandwich" would be far apart.
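To make "close together in this number space" concrete, the sketch below compares vectors using cosine similarity. Cosine similarity is one common way to measure how alike two embeddings are, though not the only one. The three-number vectors are invented for illustration and do not come from any real model. Real embedding models typically output hundreds or thousands of numbers per item.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: near 1.0 means very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors for illustration only (NOT output from a real model).
toy_vectors = {
    "president": [0.90, 0.80, 0.10],
    "governor":  [0.85, 0.75, 0.20],
    "sandwich":  [0.10, 0.20, 0.95],
}

pres_gov = cosine_similarity(toy_vectors["president"], toy_vectors["governor"])
pres_sand = cosine_similarity(toy_vectors["president"], toy_vectors["sandwich"])
print(f"president vs governor: {pres_gov:.3f}")
print(f"president vs sandwich: {pres_sand:.3f}")
```

With these toy numbers, "president" and "governor" score close to 1, and "president" and "sandwich" score much lower. That difference is the property semantic search relies on.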
For data reporters, embeddings unlock a powerful trick: searching by meaning instead of keywords. Say you have thousands of public comments submitted to a federal agency and you want to find every one that discusses water contamination — even if they use phrases like "polluted wells," "unsafe drinking supply," or "toxic runoff." A traditional keyword search would miss many of them, but embeddings let you find documents that are semantically similar to a query, regardless of the exact wording. The same idea powers document classification, clustering similar records, and detecting duplicate filings across large datasets.
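A minimal sketch of the public-comment search described above follows. It assumes that each comment and the query have already been turned into vectors by the same embedding model. The vectors, the comments, and the `semantic_search` helper are all hypothetical. The word "contamination" never appears in the comments, but the comments about water still rank above the procedural one because their vectors point in a similar direction to the query's.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_search(query_vec, docs, top_k=3):
    """Hypothetical helper: rank (text, vector) pairs by similarity to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(reverse=True)
    return scored[:top_k]

# Invented vectors: in a real project each would come from an embedding model.
comments = [
    ("Our polluted wells made the whole street sick.", [0.82, 0.10, 0.15]),
    ("Please extend the comment period by 30 days.", [0.05, 0.90, 0.10]),
    ("Toxic runoff from the plant reaches the creek.", [0.75, 0.05, 0.30]),
    ("The unsafe drinking supply worries parents.", [0.88, 0.12, 0.05]),
]
query = [0.90, 0.05, 0.10]  # stand-in for an embedding of "water contamination"

results = semantic_search(query, comments, top_k=3)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

For thousands of documents, this brute-force loop is usually fast enough. Much larger collections are often stored in a vector database that does the same ranking more efficiently.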
Embeddings are a building block behind many AI features journalists encounter. They drive the retrieval step in retrieval-augmented generation (RAG) systems, which let chatbots look up relevant source material before answering and can reduce hallucinations. They're also how search engines and recommendation algorithms recognize that two differently worded queries mean the same thing. Under the hood, models break text into tokens before computing embeddings, so embeddings and tokens are closely related concepts.
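The retrieval step of a RAG system can be sketched as follows: find the stored passages whose embeddings are closest to the question's, then paste them into the prompt. This is a simplified illustration under stated assumptions. The passages, vectors, and the `build_rag_prompt` helper are hypothetical, and the sketch stops before any language model is actually called.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_rag_prompt(question, question_vec, passages, top_k=2):
    """Hypothetical helper: keep the top_k passages closest to the question
    and place them in a prompt so a model can answer from those sources."""
    ranked = sorted(passages, key=lambda p: cosine_similarity(question_vec, p[1]), reverse=True)
    context = "\n".join(f"- {text}" for text, _ in ranked[:top_k])
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

# Invented passages and vectors, for illustration only.
passages = [
    ("The city council approved the budget on March 3.", [0.90, 0.10, 0.05]),
    ("The mayor's office declined to comment.", [0.30, 0.80, 0.10]),
    ("The approved budget totals $12 million.", [0.85, 0.15, 0.20]),
]
prompt = build_rag_prompt("How large is the budget?", [0.88, 0.12, 0.15], passages)
print(prompt)
```

Grounding the prompt in retrieved text is what lets a RAG system cite sources. The answer can only be as good as the passages the retrieval step finds.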
Exa crawls the web and encodes the contents of web pages into a format known as embeddings, which can be processed by large language models. Embeddings turn words into numbers in such a way that words with similar meanings become numbers with similar values. In effect, this lets Exa capture the meaning of text on web pages, not just the keywords. — MIT Technology Review
Embedding models translate text inputs like words and phrases into numerical representations, known as embeddings, that capture the semantic meaning of the text. Embeddings are used in a range of applications, such as document retrieval and classification. — TechCrunch