Understanding Embeddings: From Word2Vec to Modern LLMs

Learn how embeddings work in Python, from Word2Vec to modern transformer-based embedding models. Covers vector arithmetic, cosine similarity, and visualizing embeddings with t-SNE.

What an Embedding Actually Is

An embedding is a list of numbers that represents the meaning of something — a word, sentence, or document — as a point in high-dimensional space. Similar meanings land near each other. Everything downstream, from vector database search to RAG retrieval, depends on embeddings actually capturing meaning well.
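To make "similar meanings land near each other" concrete, here is a minimal sketch using only the standard library. The three-dimensional vectors are hypothetical, hand-picked values chosen for illustration; a real model learns hundreds of dimensions from data.

```python
import math

# Hypothetical 3-dimensional "embeddings", hand-picked for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def nearest(word, embeddings):
    """Return the other word whose vector is closest (Euclidean distance) to `word`."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return min(others, key=lambda w: math.dist(target, embeddings[w]))

print(nearest("cat", toy_embeddings))  # dog
```

The nearest neighbour of "cat" is "dog" because their vectors differ by only 0.1 on each axis, while "car" sits far away on all three.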

pip install gensim sentence-transformers scikit-learn matplotlib numpy

Word2Vec: The Original Idea

Word2Vec (2013) learns word embeddings from co-occurrence patterns — words that appear in similar contexts get similar vectors.

from gensim.models import Word2Vec

sentences = [
    ["king", "queen", "royal", "palace"],
    ["man", "woman", "person", "human"],
    ["python", "code", "programming", "software"],
    ["king", "man", "royal", "throne"],
    ["queen", "woman", "royal", "crown"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=4)
print(model.wv["king"].shape)
Output:
(50,)

The famous demonstration of Word2Vec’s structure is vector arithmetic:

result = model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3)
print(result)
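Output (illustrative; a five-sentence corpus is far too small to train stable vectors, so your scores and even the ranking will vary from run to run):
[('queen', 0.891), ('royal', 0.743), ('crown', 0.612)]

king - man + woman ≈ queen. With vectors trained on a large corpus, this works because Word2Vec encodes relationships such as gender or royalty as roughly consistent offsets between vectors, so the space captures relational structure and not just raw similarity.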
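The arithmetic itself can be sketched by hand. The two-dimensional vectors below are hypothetical values (axis 0 loosely meaning "royalty", axis 1 loosely meaning "gender"), and the `analogy` helper is an illustrative name, not part of gensim. Like `most_similar`, it skips the input words when ranking candidates, but it works on raw vectors for simplicity, so gensim's own implementation differs in details such as normalization.

```python
import math

# Hypothetical 2-D vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
toy = {
    "king":   [0.9, 0.9],
    "queen":  [0.9, 0.1],
    "man":    [0.1, 0.9],
    "woman":  [0.1, 0.1],
    "palace": [0.8, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c, vectors):
    """Find the word closest to vectors[a] - vectors[b] + vectors[c],
    excluding the three input words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman", toy))  # queen
```

Subtracting "man" removes the gender component of "king" and adding "woman" puts the other gender back, which lands on the vector for "queen".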

The Limitation: No Context

Word2Vec gives every word exactly one vector — “bank” gets the same embedding whether it means a financial institution or a riverbank. That’s the gap modern transformer embeddings close.

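from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Note: this rebinds `model`, replacing the Word2Vec model from earlier
model = SentenceTransformer("all-MiniLM-L6-v2")

# The same word, "bank", used in two different senses
emb1 = model.encode("I deposited money at the bank")
emb2 = model.encode("We sat by the river bank")

# cosine_similarity expects 2D inputs, so wrap each vector in a list
sim = cosine_similarity([emb1], [emb2])[0][0]
print(f"Similarity: {sim:.3f}")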
Output:
Similarity: 0.312

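Despite sharing the word “bank”, the two sentences get clearly different embeddings. Inside the transformer, each token’s representation depends on the words around it, so “bank” is encoded differently in each sentence before the model pools the tokens into a single sentence vector. Word2Vec couldn’t do this at all.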
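The `cosine_similarity` call above measures the angle between two vectors and ignores their lengths. A minimal from-scratch sketch follows; the helper name `cosine_similarity_1d` is an illustrative choice, and the sketch assumes neither vector is all zeros.

```python
import math

def cosine_similarity_1d(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v = [1.0, 2.0, 3.0]
print(cosine_similarity_1d(v, [2.0, 4.0, 6.0]))        # same direction: approximately 1.0
print(cosine_similarity_1d([1.0, 0.0], [0.0, 1.0]))    # orthogonal: 0.0
print(cosine_similarity_1d([1.0, 0.0], [-1.0, 0.0]))   # opposite: -1.0
```

Doubling a vector leaves the score unchanged, which is why cosine similarity compares direction (meaning) and not magnitude.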

Comparing Sentence Similarity

sentences = [
    "Python is a popular programming language",
    "Python is widely used for software development",
    "The chef cooked a delicious meal",
]

embeddings = model.encode(sentences)
sim_matrix = cosine_similarity(embeddings)

for i, row in enumerate(sim_matrix):
    # float() keeps NumPy scalar reprs such as np.float32(1.0) out of the printed list
    print(f"{sentences[i][:30]}... -> {[round(float(x), 2) for x in row]}")
Output:
Python is a popular programmin... -> [1.0, 0.84, 0.09]
Python is widely used for soft... -> [0.84, 1.0, 0.06]
The chef cooked a delicious me... -> [0.09, 0.06, 1.0]

The two programming-related sentences score 0.84 similarity; the unrelated cooking sentence scores near zero against both — exactly the structure that makes embeddings useful for semantic search.
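Semantic search builds directly on this: embed the query, score it against every document, and return the best matches. The sketch below is an assumption-laden illustration. The three-dimensional vectors are hypothetical stand-ins for real sentence embeddings, and `normalize` and `top_k` are illustrative helper names. It normalizes each vector to unit length so that a plain dot product equals cosine similarity.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for i, d in enumerate(doc_vecs):
        # Dot product of unit vectors == cosine similarity
        score = sum(x * y for x, y in zip(q, normalize(d)))
        scored.append((i, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical 3-D stand-ins for the three sentence embeddings above
docs = [
    [0.9, 0.1, 0.0],  # programming sentence 1
    [0.8, 0.3, 0.1],  # programming sentence 2
    [0.0, 0.2, 0.9],  # cooking sentence
]
query = [1.0, 0.2, 0.0]  # stand-in for a programming-related query

for idx, score in top_k(query, docs):
    print(idx, round(score, 3))
```

With real embeddings, the same effect can likely be had by passing `normalize_embeddings=True` to `model.encode`, after which a matrix product gives all scores at once.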

Visualizing Embeddings with t-SNE

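High-dimensional vectors can’t be plotted directly. t-SNE projects them down to 2D while trying to keep points that are neighbours in the original space close together in the plot. Distances between far-apart clusters in a t-SNE plot are not reliable, so read it for grouping, not for exact geometry: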

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np

words = ["king", "queen", "man", "woman", "python", "code", "programming", "software"]
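# `model` is the SentenceTransformer loaded above (it no longer has a `.wv` attribute),
# so encode all words in one batch; the result is a 2D NumPy array
word_vectors = model.encode(words)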

tsne = TSNE(n_components=2, random_state=42, perplexity=5)
coords = tsne.fit_transform(np.array(word_vectors))

plt.figure(figsize=(8, 6))
for i, word in enumerate(words):
    plt.scatter(coords[i, 0], coords[i, 1])
    plt.annotate(word, (coords[i, 0], coords[i, 1]))
plt.title("Word Embeddings in 2D (t-SNE)")
plt.savefig("embeddings_tsne.png", dpi=150)

Related words (“king”, “queen”, “man”, “woman”) should group together visually, while the programming terms (“python”, “code”, “programming”, “software”) form a separate group. This is a quick sanity check that your embeddings capture meaning.
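One detail in the t-SNE call deserves a note: `perplexity=5` is set explicitly because recent scikit-learn releases reject a perplexity that is not smaller than the number of samples, and the default of 30 would fail with only eight words. A small helper, with the hypothetical name `safe_perplexity`, sketches one way to pick a valid value for any dataset size.

```python
def safe_perplexity(n_samples, preferred=30.0):
    """Pick a t-SNE perplexity strictly below the number of samples."""
    if n_samples < 2:
        raise ValueError("t-SNE needs at least two points")
    # Cap at n_samples - 1 so the value stays below the sample count
    return min(preferred, n_samples - 1)

print(safe_perplexity(8))     # 7
print(safe_perplexity(1000))  # 30.0
```

It could be used as `TSNE(n_components=2, perplexity=safe_perplexity(len(words)))`. Lower values emphasise very local structure, so for tiny datasets it is worth trying a few settings before trusting any one picture.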

Choosing an Embedding Model Today

| Model type | Example | Best for |
|---|---|---|
| Word2Vec/GloVe | gensim | Educational, small custom vocabularies |
| Sentence transformers | all-MiniLM-L6-v2 | Fast, free, local semantic search |
| OpenAI embeddings | text-embedding-3-small | Hosted, strong general-purpose quality |
| Domain-specific | BioBERT, CodeBERT | Specialized text (medical, code) |

Key Takeaways

  • Embeddings represent meaning as vectors — similar meanings produce vectors that are close together
  • Word2Vec captures relational structure (king - man + woman ≈ queen) but gives every word only one fixed vector
  • Modern transformer embeddings are contextual — the same word gets different vectors depending on surrounding text
  • Cosine similarity is the standard way to compare embeddings for semantic relevance
  • t-SNE lets you visually sanity-check whether your embeddings cluster meaningfully
  • Embeddings are the foundation under every vector database and RAG system — understanding them helps you debug retrieval quality issues
Khushal Jethava

Machine Learning Engineer at Codiste, specializing in Generative AI, NLP, and Computer Vision. Building production AI systems with Python.

This post is licensed under CC BY 4.0 by the author.