4-Stage Search Pipeline
Every search_tools call flows through a four-stage pipeline that combines text search, vector similarity, graph relationships, and a final selection step that decides how many tools to recommend, so the best tool for the job comes out on top.
Pipeline Overview
┌──────────────────────────────────────────────────────────────┐
│  User Query                                                  │
│  "best ORM for TypeScript"                                   │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│  Stage 1: Hybrid Retrieval                                   │
│  ┌─────────────────┐    ┌──────────────────────┐             │
│  │ BM25 Text       │    │ Qdrant Vector        │             │
│  │ (in-memory)     │    │ (Nomic 768d)         │             │
│  └────────┬────────┘    └──────────┬───────────┘             │
│           └───────────┬────────────┘                         │
│                       ▼                                      │
│         Reciprocal Rank Fusion (RRF)                         │
└───────────────────────┬──────────────────────────────────────┘
                        │  Top N candidates
                        ▼
┌──────────────────────────────────────────────────────────────┐
│  Stage 2: Filter & Narrow                                    │
│  Apply context filters: language, category, license, deploy  │
│  Graceful degradation → progressively relax if empty         │
└───────────────────────┬──────────────────────────────────────┘
                        │  Filtered candidates
                        ▼
┌──────────────────────────────────────────────────────────────┐
│  Stage 3: Graph Reranking                                    │
│  Cypher traversal of tool relationships in Memgraph          │
│  finalScore = 0.6 × graphScore + 0.4 × stage2Score           │
└───────────────────────┬──────────────────────────────────────┘
                        │  Reranked results
                        ▼
┌──────────────────────────────────────────────────────────────┐
│  Stage 4: Selection                                          │
│  < 20% gap + health split → Two recommendations              │
│  Otherwise                → Single recommendation            │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
                  ┌───────────┐
                  │  Result   │
                  └───────────┘
Stage 1: Hybrid Retrieval
The first stage casts a wide net using two complementary retrieval methods running in parallel:
BM25 Text Search
A classical ranking function from the TF-IDF family (BM25 adds term-frequency saturation and document-length normalization), applied to tool names, descriptions, and categories. It runs against an in-memory index for sub-millisecond latency and excels at exact name matches and keyword queries.
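To make the scoring concrete, here is a minimal sketch of Okapi BM25 over a small in-memory corpus. It is illustrative rather than ToolPilot's implementation: the ToolDoc shape, the tokenizer, the common default parameters k1 = 1.2 and b = 0.75, and the non-negative IDF variant are all assumptions.

```typescript
// Hypothetical document shape: the indexed text of one tool
// (e.g. name, description, and categories joined together).
interface ToolDoc {
  id: string;
  text: string;
}

// Lowercase and split on any run of non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Okapi BM25 over an in-memory corpus. Returns matching tools sorted by
// descending score; tools that share no term with the query are dropped.
function bm25Search(
  docs: ToolDoc[],
  query: string,
  k1 = 1.2, // term-frequency saturation (common default, assumed)
  b = 0.75, // document-length normalization (common default, assumed)
): { id: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const n = docs.length;
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(n, 1);

  // Document frequency: number of documents containing each term.
  const df = new Map<string, number>();
  for (const terms of tokenized) {
    for (const term of new Set(terms)) df.set(term, (df.get(term) ?? 0) + 1);
  }

  const queryTerms = tokenize(query);
  const results = tokenized.map((terms, i) => {
    // Raw term counts within this document.
    const tf = new Map<string, number>();
    for (const term of terms) tf.set(term, (tf.get(term) ?? 0) + 1);

    let score = 0;
    for (const q of queryTerms) {
      const f = tf.get(q) ?? 0;
      if (f === 0) continue;
      const nq = df.get(q) ?? 0;
      const idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5)); // non-negative IDF variant
      const lengthNorm = 1 - b + (b * terms.length) / avgLen;
      score += (idf * f * (k1 + 1)) / (f + k1 * lengthNorm);
    }
    return { id: docs[i].id, score };
  });

  return results.filter((r) => r.score > 0).sort((x, y) => y.score - x.score);
}
```

Shorter documents that repeat a query term (such as a tool whose name and description both say "ORM") rank higher, which is why BM25 is strong on exact-name and keyword queries.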
Vector Search (Optional)
Semantic similarity search using Nomic Embed Code (768-dimensional embeddings) stored in Qdrant. Captures conceptual meaning — so a query for “database migration tool” matches tools that don’t literally contain those words. Optional because BM25 alone handles many queries well.
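The sketch below stands in for the Qdrant call with a brute-force cosine-similarity scan over in-memory vectors, just to show what "semantic similarity" computes. Qdrant itself uses an approximate nearest-neighbor index (HNSW) instead of scanning every point, and the 3-dimensional vectors in the check are toy placeholders for 768-dimensional Nomic embeddings. The VectorPoint shape and the function names are hypothetical.

```typescript
// Hypothetical in-memory stand-in for one point in a Qdrant collection.
interface VectorPoint {
  id: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Brute-force top-k: score every point against the query, sort descending, keep k.
function vectorSearch(points: VectorPoint[], query: number[], k: number) {
  return points
    .map((p) => ({ id: p.id, score: cosine(p.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In practice the query text (for example "database migration tool") would be embedded with the same model as the tools, so tools whose descriptions are close in meaning score high even without shared keywords.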
Results from both rankers are merged using Reciprocal Rank Fusion (RRF), which combines ranked lists without requiring score normalization:
RRF score for document d:

$$\mathrm{score}(d) = \sum_{i} \frac{1}{k + \mathrm{rank}_i(d)}$$

Where:
k = 60 (smoothing constant)
i = each ranker (BM25, vector)
rank_i(d) = position of d in ranker i's result list

If both BM25 and vector search return empty results, the pipeline falls back to ranking by maintenance_score, ensuring a query always returns something useful.
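A minimal sketch of the fusion step and its fallback. It assumes each ranker returns tool ids in rank order (rank 1 first) and, as in standard RRF, that a ranker which did not return a tool contributes nothing to that tool's score. The maintenanceScore map and the hybridRetrieve name are hypothetical stand-ins for however ToolPilot stores and applies maintenance_score.

```typescript
// Fuse any number of ranked id lists with Reciprocal Rank Fusion.
// Ranks are 1-based: the top result of a list contributes 1 / (k + 1).
function reciprocalRankFusion(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Stage 1 merge: RRF over BM25 and vector results or, if both are empty,
// fall back to ordering tools by their maintenance score.
function hybridRetrieve(
  bm25Ids: string[],
  vectorIds: string[],
  maintenanceScore: Map<string, number>, // hypothetical: tool id -> maintenance_score
  topN: number,
): string[] {
  const scored =
    bm25Ids.length === 0 && vectorIds.length === 0
      ? maintenanceScore
      : reciprocalRankFusion([bm25Ids, vectorIds]);
  return [...scored.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([id]) => id);
}
```

For example, a tool ranked 2nd by BM25 and 1st by vector search scores 1/62 + 1/61 ≈ 0.0325, beating a tool ranked 1st by BM25 alone (1/61 ≈ 0.0164). Agreement between rankers counts for more than a single top placement, and no score normalization is needed because only ranks enter the formula.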
Stage 2: Filter & Narrow
Stage 2 applies context from the user’s clarification answers (or upfront context filters) to narrow the candidate set. Filters are applied as Qdrant payload filters or in-memory predicates depending on the retrieval path.
// Context filters, applied as Qdrant payload filters or in-memory predicates
{
  "language": "typescript",
  "category": "orm",
  "license": "MIT",
  "deployment": "self-hosted"
}

// Progressive relaxation order:
// 1. All filters → results? Done.
// 2. Drop deployment → results? Done.
// 3. Drop license → results? Done.
// 4. Drop category → results? Done.
// 5. Language only → guaranteed results.
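One way the filter object above could map onto Qdrant's filter syntax is a must clause with one exact-match condition per filter. Only the { must: [{ key, match: { value } }] } structure is Qdrant's; the assumption that payload keys share the filter names, and the ContextFilters and toQdrantFilter names, are hypothetical.

```typescript
// Hypothetical context-filter shape mirroring the example object above.
interface ContextFilters {
  language?: string;
  category?: string;
  license?: string;
  deployment?: string;
}

// One Qdrant field condition: exact match on a payload key.
interface FieldCondition {
  key: string;
  match: { value: string };
}

// Translate context filters into Qdrant's filter JSON. Every condition under
// `must` has to hold for a point to match; unset filters are skipped.
function toQdrantFilter(filters: ContextFilters): { must: FieldCondition[] } {
  const must: FieldCondition[] = [];
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined) must.push({ key, match: { value } });
  }
  return { must };
}
```

For example, toQdrantFilter({ language: "typescript", license: "MIT" }) yields { must: [{ key: "language", match: { value: "typescript" } }, { key: "license", match: { value: "MIT" } }] }, which can be passed as the filter of a Qdrant search request.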
The key design principle is graceful degradation: if the full filter set returns zero results, constraints are progressively relaxed (least important first) until results are found. This prevents dead-end searches while respecting user preferences.
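The relaxation loop can be sketched as follows, assuming a hypothetical search callback that runs the filtered query (against Qdrant or in memory) and the drop order listed in the comments above. This sketch stops at the language-only search and returns whatever that search yields.

```typescript
// Hypothetical filter shape mirroring the example object above.
interface ContextFilters {
  language?: string;
  category?: string;
  license?: string;
  deployment?: string;
}

// Filters are dropped in this order, least important first; `language` is never dropped.
const RELAXATION_ORDER: (keyof ContextFilters)[] = ["deployment", "license", "category"];

// Run the search with every filter, then relax one filter at a time until it
// returns something. The filters actually used are returned alongside the results.
function searchWithRelaxation<T>(
  filters: ContextFilters,
  search: (f: ContextFilters) => T[], // hypothetical filtered search function
): { results: T[]; applied: ContextFilters } {
  let applied: ContextFilters = { ...filters };
  let results = search(applied);
  for (const key of RELAXATION_ORDER) {
    if (results.length > 0) break;
    if (applied[key] === undefined) continue; // nothing to relax for this key
    applied = { ...applied };
    delete applied[key];
    results = search(applied);
  }
  return { results, applied };
}
```

Returning the applied filters lets the caller tell the agent which preferences were relaxed, so the degradation stays visible rather than silent.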
Stage 3: Graph Reranking
This is where ToolPilot’s graph database shines. A Cypher query traverses relationships in Memgraph to compute a graph score for each candidate based on:
- Number and strength of RELATED_TO edges to other highly ranked candidates
- Ecosystem density: tools in rich ecosystems score higher
- Health tier bonus: actively maintained tools get a slight uplift
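ToolPilot computes these factors with a Cypher traversal in Memgraph, and the exact query and weights are not described here. The sketch below performs an analogous computation in memory to show how the three factors could combine. The Candidate and RelatedEdge shapes, the ecosystemSize metric, and the weights (0.5 for RELATED_TO strength, 0.4 for ecosystem density, 0.1 health bonus) are all hypothetical. RELATED_TO edges are treated as undirected, and each factor is normalized so graphScore lands in [0, 1].

```typescript
// Hypothetical in-memory view of what the graph traversal would read.
interface Candidate {
  id: string;
  ecosystemSize: number; // assumed proxy for ecosystem density
  activelyMaintained: boolean;
}
interface RelatedEdge {
  from: string;
  to: string;
  strength: number; // RELATED_TO edge weight, assumed to be in [0, 1]
}

// Illustrative weights, not ToolPilot's.
const W_RELATED = 0.5;
const W_ECOSYSTEM = 0.4;
const HEALTH_BONUS = 0.1;

function graphScores(candidates: Candidate[], edges: RelatedEdge[]): Map<string, number> {
  const ids = new Set(candidates.map((c) => c.id));

  // Sum RELATED_TO strength only over edges linking two *different* candidates.
  const related = new Map<string, number>();
  for (const e of edges) {
    if (e.from === e.to || !ids.has(e.from) || !ids.has(e.to)) continue;
    related.set(e.from, (related.get(e.from) ?? 0) + e.strength);
    related.set(e.to, (related.get(e.to) ?? 0) + e.strength);
  }

  // Normalize each factor by its maximum (guarding against division by zero).
  const maxRelated = Math.max(1e-9, ...candidates.map((c) => related.get(c.id) ?? 0));
  const maxEco = Math.max(1e-9, ...candidates.map((c) => c.ecosystemSize));

  const scores = new Map<string, number>();
  for (const c of candidates) {
    const score =
      W_RELATED * ((related.get(c.id) ?? 0) / maxRelated) +
      W_ECOSYSTEM * (c.ecosystemSize / maxEco) +
      (c.activelyMaintained ? HEALTH_BONUS : 0);
    scores.set(c.id, score);
  }
  return scores;
}
```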
finalScore = 0.6 × graphScore + 0.4 × stage2Score
The 60/40 weighting gives graph relationships the majority influence, since ToolPilot’s core value proposition is ecosystem-aware recommendations. The stage2Score acts as a relevance anchor to prevent graph popularity from dominating.
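A sketch of the Stage 3 combination. If stage2Score is a raw RRF score, it is small (at most 2/61 ≈ 0.033 with two rankers), so this sketch min-max normalizes both signals over the candidate set before weighting. Whether ToolPilot normalizes this way is an assumption, as are the Scored shape and the rerank name.

```typescript
// Hypothetical per-candidate inputs to the reranker.
interface Scored {
  id: string;
  graphScore: number; // from the graph traversal
  stage2Score: number; // relevance score carried over from Stages 1-2
}

// Min-max normalize to [0, 1] so both signals share a scale.
// If all values are equal, every candidate gets 1.
function normalize(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => (max === min ? 1 : (v - min) / (max - min)));
}

// finalScore = 0.6 × graphScore + 0.4 × stage2Score, sorted best first.
function rerank(items: Scored[], graphWeight = 0.6): { id: string; finalScore: number }[] {
  const g = normalize(items.map((i) => i.graphScore));
  const s = normalize(items.map((i) => i.stage2Score));
  return items
    .map((item, idx) => ({
      id: item.id,
      finalScore: graphWeight * g[idx] + (1 - graphWeight) * s[idx],
    }))
    .sort((a, b) => b.finalScore - a.finalScore);
}
```

When the two signals disagree (one tool leads on graph score, another on relevance), the 0.6 weight lets the graph leader take first place, while the 0.4 relevance term keeps an off-topic but popular tool from winning by a wide margin.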
Stage 4: Selection
The final stage decides how many tools to recommend:
Two-Option Recommendation
Triggered when the top two candidates are within a 20% score gap AND have a stable/emerging health split (e.g., one established tool and one rising alternative). This gives agents a nuanced choice rather than a false single answer.
Single Recommendation
When one tool is a clear winner (large score gap or similar health profiles), the pipeline returns a single confident recommendation with full context.
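The selection rule can be sketched like this. It assumes the input is already sorted by finalScore, that the 20% gap is measured relative to the top score, and that health tiers carry the hypothetical labels "stable" and "emerging" for the established/rising split.

```typescript
type HealthTier = "stable" | "emerging" | "other"; // hypothetical tier labels

interface RankedTool {
  id: string;
  finalScore: number;
  health: HealthTier;
}

// Return two tools when the top two are close (gap under 20% of the top score)
// and split across stable/emerging health; otherwise return the single winner.
function selectRecommendations(ranked: RankedTool[], maxGap = 0.2): RankedTool[] {
  const [first, second] = ranked;
  if (first === undefined) return [];
  if (second === undefined || first.finalScore <= 0) return [first];

  const gap = (first.finalScore - second.finalScore) / first.finalScore;
  const healthSplit =
    (first.health === "stable" && second.health === "emerging") ||
    (first.health === "emerging" && second.health === "stable");

  return gap < maxGap && healthSplit ? [first, second] : [first];
}
```

With scores 0.80 (stable) and 0.70 (emerging), the gap is 12.5% and both tools are returned; the same scores with two stable tools, or 0.80 versus 0.50, yield a single recommendation.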
Skip Clarification
If you already know your constraints, pass them as upfront context filters in your search_tools call. This sends your query straight through all four stages with pre-applied filters.