SPLADE Visualizer (how and why SPLADE works)

Sparse lexical expansion in the browser via transformers.js


Why SPLADE works

SPLADE comes up constantly in search engineering, usually framed around vocabulary mismatch and query expansion—people say things like "SPLADE is a better BM25" or "SPLADE solves the vocabulary mismatch problem." That's true, but it doesn't explain how. The core insight is simple: SPLADE repurposes the masked language modeling objective that BERT was already trained on. By running a forward pass over the query and reading the token-prediction distributions at every position, SPLADE gets contextually grounded query expansion for free—no fine-tuning required to get the basic behavior.

The rest of this page builds that intuition from the ground up, starting with the problem and ending with the math.

The vocabulary mismatch problem

Imagine a developer documentation search system. A programmer searches for:

"python memory management"

But the relevant documents use more specific terminology:

"garbage collection in CPython"
"reference counting implementation"
"PyObject allocation and deallocation"

BM25 partially works here—it matches on "python" and "memory"—but misses any document that describes the concept without using those exact words. For example, a page about "garbage collection" might be highly relevant to the user's information need, but BM25 won't rank it well because it doesn't contain the token "memory." This is the vocabulary mismatch problem: the words in the query don't exactly match the words in relevant documents, even though they're semantically related. To address this problem, we need to introduce query expansion: augment each query token with semantically related terms so the search can bridge the lexical gap.
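To make the gap concrete, here's a toy sketch of exact-term matching. This is not real BM25 (no IDF, no length normalization), just the bag-of-words overlap at its core—enough to show that a relevant document sharing no vocabulary with the query scores zero:

```python
# Toy illustration of the lexical gap: a bag-of-words overlap score
# (a stand-in for BM25's exact-term matching) gives zero credit to a
# relevant document that never uses the query's exact words.
def overlap_score(query, doc):
    query_terms = set(query.lower().split())
    return sum(1 for token in doc.lower().split() if token in query_terms)

query = "python memory management"
doc_a = "cpython memory allocator internals"  # shares the token "memory"
doc_b = "garbage collection in cpython"       # relevant, but lexically disjoint

print(overlap_score(query, doc_a))  # 1: credit for the shared "memory"
print(overlap_score(query, doc_b))  # 0: no shared terms, no score
```

Any scoring function built purely on exact term overlap, however sophisticated its weighting, inherits this failure mode.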

QUERY TOKEN → NEAREST NEIGHBORS (GloVe cosine similarity):

python → ruby, java, perl
memory → storage, cache, recall
management → control, oversight, admin

A naive approach: static word embeddings

One way to expand a query is to look up each token in a static word embedding space (like GloVe) and retrieve its nearest neighbors. The problem is that static embeddings have no notion of context—each word gets a single fixed vector regardless of how it's used. The embedding for "python" is a weighted average of every context it appears in across the training corpus: the programming language, and the snake. Depending on the corpus, that can pull in genuinely unhelpful neighbors. Use the word-similarity finder below to see for yourself: click any token in the query and see its nearest neighbors in GloVe space. Notice how the neighbors for "python" largely include other programming languages, along with a "downloadable" neighbor that happens to co-occur in the same contexts as "python" but isn't a programming language at all.

Static word embeddings (GloVe)

A naive approach to query expansion uses static word embeddings. Click Load embeddings, then click any token in your query to see its nearest neighbors in GloVe embedding space. Notice how context-free these neighbors are — heart pulls in pain and blood, not myocardial.
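The lookup itself is just cosine similarity over fixed vectors. A minimal sketch, using made-up 3-dimensional vectors in place of real GloVe embeddings (the mechanics are the point, not the numbers):

```python
import math

# Hypothetical tiny embedding table standing in for real GloVe vectors.
EMB = {
    "python":  [0.9, 0.1, 0.3],
    "ruby":    [0.8, 0.2, 0.3],
    "snake":   [0.7, 0.0, 0.6],
    "memory":  [0.1, 0.9, 0.2],
    "storage": [0.2, 0.8, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def neighbors(word, k=2):
    # One fixed vector per word: the same neighbors come back no matter
    # what sentence "python" appears in -- that's the core limitation.
    scored = [(other, cosine(EMB[word], vec))
              for other, vec in EMB.items() if other != word]
    return [w for w, _ in sorted(scored, key=lambda p: -p[1])[:k]]

print(neighbors("python"))  # → ['ruby', 'snake']
```

Note that "snake" surfaces next to "ruby" regardless of the surrounding query—there is no mechanism for the words "memory management" to steer the expansion.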

Contextual predictions with BERT

BERT was trained to predict masked tokens from surrounding context—the masked language modeling (MLM) objective. This means it doesn't look at "python" in isolation; it sees "python" in the context of "memory management" and predicts accordingly. When the word "python" is masked in "documentation for memory management for the [MASK] programming language," BERT should strongly predict python—and its runner-up predictions will be other programming languages and related terms, not snakes.

Click any token below to mask it and see BERT's top predictions in real time.


From BERT predictions to a sparse vector

The demo above runs one forward pass and reads the prediction distribution at a single masked position. SPLADE does something slightly different: it runs one forward pass over the full, unmasked query and reads the logit distribution at every token position simultaneously. For a three-word query, that means three parallel distributions—one for "python," one for "memory," one for "management."

Each raw logit h_t(v) is then passed through the SPLADE activation function:

score(v, t) = log(1 + ReLU(h_t(v)))

where v is a vocabulary token, t is an input token position, and h_t(v) is the MLM head's logit for token v at position t. ReLU clips negative logits to zero—most vocabulary entries score zero, giving the vector its sparsity. The log(1 + ·) compresses the remaining values so no single high-confidence prediction overwhelms the rest.
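In pure Python the activation is one line (the example logits below are made up, not taken from a real model):

```python
import math

def splade_activation(logit):
    # ReLU zeroes out negative logits (most of the vocabulary),
    # then log(1 + x) compresses whatever survives.
    return math.log1p(max(logit, 0.0))

print(splade_activation(-2.3))  # 0.0 -- negative logits vanish entirely
print(splade_activation(60.0))  # ~4.11 -- a large logit is heavily damped
```

The compression matters: without the log, a single confident prediction like the 60.0 above would dominate the dot product at retrieval time.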

After activating all positions, SPLADE takes the maximum across positions for each vocabulary dimension:

SPLADE(v) = max_t [ score(v, t) ]

If "allocation" scores highly under "memory" and modestly under "management," the max-pool preserves the higher value. The result is a single sparse vector that represents the union of all per-position expansions, weighted by BERT's contextual confidence.
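A sketch of the whole pooling step, with hypothetical per-position logits standing in for BERT's real MLM outputs over a tiny three-term vocabulary:

```python
import math

def splade_activation(logit):
    return math.log1p(max(logit, 0.0))

# Hypothetical logits per query-token position; in the real model these
# come from BERT's MLM head over the full ~30k-token vocabulary.
logits = {
    "python":     {"programming": 24.0, "allocation": -1.0, "control": 0.5},
    "memory":     {"programming": 0.2,  "allocation": 60.0, "control": -3.0},
    "management": {"programming": -0.5, "allocation": 2.0,  "control": 30.0},
}

def splade_vector(per_position_logits):
    vec = {}
    for position_scores in per_position_logits.values():
        for term, logit in position_scores.items():
            score = splade_activation(logit)
            if score > 0.0:
                # Max-pool: keep the highest activation any position produced.
                vec[term] = max(vec.get(term, 0.0), score)
    return vec

print(splade_vector(logits))
```

Note how "control" appears weakly under "python" (logit 0.5) and strongly under "management" (logit 30.0); the max-pool keeps only the stronger activation, exactly as described above.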

BERT ACTIVATIONS, log(1 + ReLU(x)), PER TOKEN POSITION → SPLADE SPARSE VECTOR (max-pool across positions):

python → programming 3.21, language 2.84, java 2.43
memory → allocation 4.12, heap 2.91, cache 2.67
management → control 3.45, handling 2.12, system 1.98

max( · ) over positions → sparse vector

This sparse vector is the SPLADE representation. Its nonzero dimensions are vocabulary tokens that BERT considered relevant at some position in the query, and their weights reflect how strongly. Because the vector is sparse (most entries are zero), it can be stored and queried using a standard inverted index—the same infrastructure as BM25—while carrying the contextual signal of a transformer model. That's the full picture: a single forward pass, a nonlinear activation, and a max-pool, built on top of a model that was already trained to understand context.
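Scoring against such vectors is then a sparse dot product—the same arithmetic an inverted index performs for BM25. A sketch with illustrative weights (not real SPLADE outputs):

```python
# Query and document as sparse {term: weight} maps; only dimensions
# present in both contribute to the score, so an inverted index can
# skip everything else.
def sparse_dot(q, d):
    if len(d) < len(q):
        q, d = d, q  # iterate over the smaller vector
    return sum(weight * d[term] for term, weight in q.items() if term in d)

# The expansion term "garbage" (never typed by the user) lets this
# query match a document that shares none of the original words.
query_vec = {"python": 2.1, "memory": 1.8, "allocation": 1.2, "garbage": 0.9}
doc_vec   = {"garbage": 1.5, "collection": 1.4, "cpython": 1.1}

print(sparse_dot(query_vec, doc_vec))  # → 1.35, via "garbage" alone
```

Under plain exact-term matching this pair would score zero; the expansion dimension is what bridges the lexical gap from the opening example.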