Semantic Score
Cosine similarity between the query embedding and the document embedding.
sem = cos(E(query), E(doc))
TF-IDF Score
Cosine similarity on TF-IDF vectors. Works well with exact token matches such as EPD IDs, SKUs, manufacturer names.
tfidf = cos(TF-IDF(query), TF-IDF(doc))
Hybrid Score
Weighted combination of semantic and TF-IDF. The weight α is auto-detected from query type:
hybrid = α · sem + (1 - α) · tfidf
- α = 0.0 if EPD/SKU queries (pure keyword)
- α = 0.3 if numeric specs (e.g. "12mm", "40kg")
- α = 0.6 if natural language (balanced)
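The hybrid formula and the α rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the text does not say how query types are detected, so the two regular expressions and the function names (`cosine`, `detect_alpha`, `hybrid_score`) are assumptions made for the example.

```python
import math
import re

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical patterns: identifiers shaped like "CEM-001", and numbers
# followed by a unit such as "12mm" or "40kg". The real detection rules
# are not given in the text.
ID_PATTERN = re.compile(r"\b[A-Z]{2,4}-\d+\b")
NUMERIC_SPEC_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)?\s?(?:mm|cm|m|kg|g|MPa)\b", re.IGNORECASE
)

def detect_alpha(query):
    """Pick the semantic weight from the query type, using the values listed above."""
    if ID_PATTERN.search(query):
        return 0.0  # EPD/SKU query: pure keyword
    if NUMERIC_SPEC_PATTERN.search(query):
        return 0.3  # numeric spec
    return 0.6      # natural language

def hybrid_score(sem, tfidf, alpha):
    """hybrid = alpha * sem + (1 - alpha) * tfidf"""
    return alpha * sem + (1 - alpha) * tfidf

# Example with made-up similarity values.
alpha = detect_alpha("steel cable 12mm")  # numeric spec, so alpha is 0.3
score = hybrid_score(sem=0.8, tfidf=0.2, alpha=alpha)
```

With α = 0.0 the semantic term vanishes and the score equals the TF-IDF cosine, which is why identifier queries behave as pure keyword search.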
Exact Match
When the query contains a structured identifier (EPD ID or SKU) that matches a product verbatim, that product is returned immediately with α = 1.0.
Rerank Score
Cross-encoder relevance score. Unlike a bi-encoder, which embeds the query and the document separately, the cross-encoder attends jointly over the (query, document) pair, so it sees both texts at once. This enables fine-grained comparisons (e.g. 1960 vs 1770 MPa).
rerank = CrossEncoder(query, doc)
Raw logit from ms-marco-MiniLM-L-6-v2; higher is more relevant. The reranker is applied to the top 20 hybrid candidates produced by the embedding models and returns the top 5.
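The two-stage flow (keep the 20 best hybrid candidates, rescore them, return the best 5) might look like the sketch below. The scorer is passed in as a function so the example runs without any model download; `overlap_scorer` is a toy stand-in invented for illustration, and `rerank` is a hypothetical name. With the sentence-transformers package installed, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` should be usable as `score_fn`, since it accepts a list of (query, document) pairs, but that wiring is an assumption about the setup rather than something the text confirms.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]

def rerank(query: str,
           candidates: Sequence[Tuple[str, float]],
           score_fn: Callable[[List[Pair]], Sequence[float]],
           pool_size: int = 20,
           top_k: int = 5) -> List[Tuple[str, float]]:
    """Rescore the best `pool_size` hybrid candidates and return the best `top_k`.

    `candidates` holds (document_text, hybrid_score) tuples.
    """
    # Stage 1: keep only the highest-scoring hybrid candidates.
    pool = sorted(candidates, key=lambda c: c[1], reverse=True)[:pool_size]
    docs = [doc for doc, _ in pool]
    # Stage 2: score each (query, document) pair jointly.
    scores = score_fn([(query, doc) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]

def overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Toy stand-in for a cross-encoder: counts shared lowercase tokens."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]

candidates = [
    ("steel cable 1770 MPa", 0.9),
    ("steel cable 1960 MPa", 0.8),
    ("timber beam", 0.7),
]
best = rerank("steel cable 1960 MPa", candidates, overlap_scorer, top_k=2)
```

In this toy run the 1960 MPa product moves ahead of the 1770 MPa one even though its hybrid score was lower, which is the kind of correction the reranking stage is meant to provide.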
Multilingual
Select the multilingual-MiniLM model to search in 50+ languages. It maps queries like "acier" (FR) or "Stahl" (DE) into the same embedding space as English documents.
When active, α is set to 0.85 so that semantic similarity dominates, because TF-IDF cannot match tokens across languages.
Example queries:
"acier 12mm" → steel cables
"beton C30" → cement products
"isolierung 80mm" → insulation
"bois PEFC" → timber
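A short worked example shows why the higher α matters for cross-language queries. The similarity values are made up for illustration: a foreign-language query that shares no tokens with an English document gets a TF-IDF score of 0, so the hybrid score reduces to α · sem.

```python
def hybrid_score(sem, tfidf, alpha):
    """hybrid = alpha * sem + (1 - alpha) * tfidf"""
    return alpha * sem + (1 - alpha) * tfidf

MULTILINGUAL_ALPHA = 0.85  # value stated in the text for multilingual mode

# Hypothetical scores for a non-English query against an English document:
# a good semantic match, but no shared tokens.
sem, tfidf = 0.7, 0.0

default_score = hybrid_score(sem, tfidf, alpha=0.6)                      # 0.6 * 0.7
multilingual_score = hybrid_score(sem, tfidf, alpha=MULTILINGUAL_ALPHA)  # 0.85 * 0.7
```

Under the natural-language default of α = 0.6, 40% of the weight goes to a term that is always zero here; raising α to 0.85 cuts that to 15%.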
Enriched Indexing
Products are indexed with enriched text: tail descriptions stripped, English category synonyms injected from SKU prefix (e.g. CEM → cement, concrete). This means a query for "cement" matches all 20 cement products, not just the ~5 that literally contain the word.
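The synonym injection step could be sketched as follows. Only the CEM → cement, concrete mapping comes from the text; the SKU shape ("CEM-001"), the function name `enrich`, and the choice to append synonyms at the end of the text are assumptions. Stripping of tail descriptions is left out because the text does not say how it is done.

```python
# Hypothetical prefix-to-synonym table; only the CEM entry is from the text.
CATEGORY_SYNONYMS = {
    "CEM": ["cement", "concrete"],
}

def enrich(sku, text):
    """Append English category synonyms derived from the SKU prefix.

    Assumes the prefix is the part of the SKU before the first hyphen.
    Synonyms already present as words in the text are not repeated.
    """
    prefix = sku.split("-", 1)[0].upper()
    synonyms = CATEGORY_SYNONYMS.get(prefix, [])
    present = set(text.lower().split())
    missing = [s for s in synonyms if s not in present]
    return " ".join([text] + missing)

indexed = enrich("CEM-001", "Portland CEM I 42.5 R")
# indexed == "Portland CEM I 42.5 R cement concrete"
```

Because the word "cement" is now part of the indexed text, a TF-IDF query for "cement" can match this product even though its original description never used the word.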