Data Flow
ToolPilot has three primary data paths: a read path for query processing, a write path for data ingestion, and a feedback path that improves results over time.
Read Path (Query Processing)
Every tool discovery request follows this pipeline:
Agent ──▶ MCP Server ──▶ Search Package
                                │
            ┌───────────────────┼───────────────────┐
            ▼                   ▼                   ▼
    Stage 1: Retrieve    Stage 2: Filter     Stage 3: Rerank
     (Qdrant BM25 +      (Qdrant payload    (Memgraph Cypher
      dense vectors)      queries)           graph traversal)
            │                   │                   │
            └───────────────────┼───────────────────┘
                                ▼
                         Stage 4: Select
                                │
                                ▼
                          MCP Response ──▶ Agent
Agent sends MCP tool call
The AI agent invokes search_tools (or another MCP tool) over the MCP protocol with a natural-language query and optional filters.
MCP Server validates input
Zod schemas validate every incoming parameter. Malformed requests are rejected before they reach the search layer.
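As a rough illustration of rejecting malformed input before it reaches the search layer, here is a dependency-free stand-in for what a Zod schema's `safeParse` would do. The field names (`query`, `language`, `minHealthScore`, `limit`), the 0 to 100 health-score range, and the default limit of 10 are assumptions for this sketch; the actual search_tools schema is not shown in this document.

```typescript
// Hypothetical request shape; the real search_tools parameters may differ.
interface SearchToolsInput {
  query: string;
  language?: string;
  minHealthScore?: number;
  limit: number;
}

type ValidationResult =
  | { ok: true; value: SearchToolsInput }
  | { ok: false; errors: string[] };

// Collects every problem instead of throwing, similar in spirit to safeParse.
function validateSearchToolsInput(raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, errors: ["input must be an object"] };
  }
  const input = raw as Record<string, unknown>;
  const errors: string[] = [];

  const query = input.query;
  if (typeof query !== "string" || query.trim() === "") {
    errors.push("query must be a non-empty string");
  }

  const language = input.language;
  if (language !== undefined && typeof language !== "string") {
    errors.push("language must be a string");
  }

  const minHealthScore = input.minHealthScore;
  if (
    minHealthScore !== undefined &&
    (typeof minHealthScore !== "number" ||
      !Number.isFinite(minHealthScore) ||
      minHealthScore < 0 ||
      minHealthScore > 100)
  ) {
    errors.push("minHealthScore must be a number between 0 and 100");
  }

  // Assumed default of 10 results when no limit is given.
  const limit = input.limit ?? 10;
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1) {
    errors.push("limit must be a positive integer");
  }

  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return {
    ok: true,
    value: {
      query: (query as string).trim(),
      language: language as string | undefined,
      minHealthScore: minHealthScore as number | undefined,
      limit: limit as number,
    },
  };
}
```

Only a result with `ok: true` would be handed to the search package; anything else is returned to the agent as a validation error.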
Search package loads tool corpus
The search package receives the validated request and prepares the 4-stage pipeline against the Qdrant collection.
Stage 1 — BM25 + Vector search
Qdrant performs hybrid retrieval: sparse BM25 keyword matching combined with dense 768-dimensional vector similarity (Nomic Embed Code).
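This document does not say how the sparse and dense result lists are merged. One common approach is reciprocal rank fusion (RRF), sketched below purely as an illustration; the function name and the constant `k = 60` are assumptions for the sketch, not ToolPilot's confirmed behavior.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// with rank starting at 1. Items that rank well in several lists rise to the top.
function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    // Highest fused score first; ties broken by id for a stable order.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Example inputs: tool ids ordered by BM25 score and by vector similarity.
const bm25Ranking = ["eslint", "prettier", "biome"];
const denseRanking = ["biome", "eslint", "rome"];
const fusedRanking = reciprocalRankFusion([bm25Ranking, denseRanking]);
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores and cosine similarities onto a shared scale.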
Stage 2 — Payload filtering
Qdrant payload queries narrow results by language, category, license, minimum health score, and other metadata constraints.
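The metadata constraints above map naturally onto Qdrant's filter format, where `must` holds conditions that all have to match. The sketch below builds such a filter object; the payload keys (`language`, `category`, `license`, `health_score`) are assumed names, since the collection's payload schema is not shown here.

```typescript
// Hypothetical filter options mirroring the constraints listed above.
interface ToolFilters {
  language?: string;
  category?: string;
  license?: string;
  minHealthScore?: number;
}

type Condition =
  | { key: string; match: { value: string } }
  | { key: string; range: { gte: number } };

// Builds a Qdrant-style filter: every condition in `must` has to hold.
function buildPayloadFilter(filters: ToolFilters): { must: Condition[] } {
  const must: Condition[] = [];
  if (filters.language !== undefined) {
    must.push({ key: "language", match: { value: filters.language } });
  }
  if (filters.category !== undefined) {
    must.push({ key: "category", match: { value: filters.category } });
  }
  if (filters.license !== undefined) {
    must.push({ key: "license", match: { value: filters.license } });
  }
  if (filters.minHealthScore !== undefined) {
    // Range condition: keep tools whose health score is at least the minimum.
    must.push({ key: "health_score", range: { gte: filters.minHealthScore } });
  }
  return { must };
}
```

Omitted options add no condition, so an empty `ToolFilters` object yields an empty `must` array and leaves the Stage 1 results unfiltered.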
Stage 3 — Graph reranking
Memgraph Cypher queries traverse RELATED_TO, DEPENDS_ON, and ALTERNATIVE_TO edges to boost contextually relevant tools and demote isolated ones.
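To make the boost-and-demote idea concrete, the following in-memory sketch applies it to edges that have already been fetched from the graph. The scoring formula, the `boost` and `isolationPenalty` parameters, and treating all three edge types equally are assumptions for illustration; the actual Cypher queries and weighting are not described in this document.

```typescript
type EdgeType = "RELATED_TO" | "DEPENDS_ON" | "ALTERNATIVE_TO";

interface Edge {
  from: string;
  to: string;
  type: EdgeType;
  weight: number;
}

interface Candidate {
  id: string;
  score: number;
}

// Boosts candidates connected to other candidates and demotes isolated ones.
function rerankByGraph(
  candidates: Candidate[],
  edges: Edge[],
  boost = 0.2,
  isolationPenalty = 0.1,
): Candidate[] {
  const ids = new Set(candidates.map((c) => c.id));
  const connectivity = new Map<string, number>();
  for (const edge of edges) {
    // Only edges between two candidates count toward connectivity.
    if (!ids.has(edge.from) || !ids.has(edge.to)) continue;
    connectivity.set(edge.from, (connectivity.get(edge.from) ?? 0) + edge.weight);
    connectivity.set(edge.to, (connectivity.get(edge.to) ?? 0) + edge.weight);
  }
  return candidates
    .map((c) => {
      const linked = connectivity.get(c.id) ?? 0;
      const factor = linked > 0 ? 1 + boost * linked : 1 - isolationPenalty;
      return { id: c.id, score: c.score * factor };
    })
    .sort((a, b) => b.score - a.score);
}
```

With these defaults, a candidate that starts slightly behind can overtake an isolated leader once it has an edge to another candidate in the result set.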
Stage 4 — Selection logic
Final scoring, deduplication, and diversity enforcement produce the ranked result set.
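A minimal sketch of what deduplication and diversity enforcement could look like, assuming duplicates share an `id` and diversity means a per-category cap. Both assumptions, along with the `maxPerCategory` parameter, are illustrative; the document does not define the actual selection rules.

```typescript
interface ScoredTool {
  id: string;
  category: string;
  score: number;
}

// Greedy selection: walk tools from best to worst, skipping duplicates and
// any tool whose category already has `maxPerCategory` entries.
function selectResults(
  tools: ScoredTool[],
  limit: number,
  maxPerCategory = 2,
): ScoredTool[] {
  const sorted = [...tools].sort((a, b) => b.score - a.score);
  const seenIds = new Set<string>();
  const perCategory = new Map<string, number>();
  const selected: ScoredTool[] = [];

  for (const tool of sorted) {
    if (selected.length >= limit) break;
    if (seenIds.has(tool.id)) continue; // duplicate of a better-scored entry
    const used = perCategory.get(tool.category) ?? 0;
    if (used >= maxPerCategory) continue; // category already well represented
    seenIds.add(tool.id);
    perCategory.set(tool.category, used + 1);
    selected.push(tool);
  }
  return selected;
}
```

Because the input is sorted first, the highest-scored copy of a duplicate is always the one that survives.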
Results returned via MCP protocol
The MCP Server formats the results (with health scores, descriptions, and related tools) and streams them back to the agent.
Write Path (Data Ingestion)
The indexer continuously updates the tool corpus:
GitHub API ──▶ Indexer ──▶ Redis Streams ──▶ Workers
                                                │
                    ┌─────────────┬─────────────┤
                    ▼             ▼             ▼
                Memgraph       Qdrant      PostgreSQL
              (graph nodes     (vector     (session &
                & edges)     embeddings)   analytics)
Indexer scans GitHub repositories
On a configurable schedule, the indexer queries the GitHub API for repositories matching tool-related topics and criteria.
Extracts tool metadata
Stars, forks, open issues, last commit date, README content, license, topics, and language data are extracted for each repository.
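As an illustration of the extraction step, the sketch below maps a repository object from the GitHub REST API onto a flat metadata record. The `ToolMetadata` shape and function name are hypothetical. `pushed_at` is used as an approximation of the last commit date, and README content and the full language breakdown are left out because they come from separate API calls.

```typescript
// Subset of the fields GitHub returns for a repository.
interface GitHubRepo {
  full_name: string;
  stargazers_count: number;
  forks_count: number;
  open_issues_count: number;
  pushed_at: string;
  license: { spdx_id: string | null } | null;
  topics?: string[];
  language: string | null;
}

// Hypothetical internal record; the real schema is not shown in this document.
interface ToolMetadata {
  id: string;
  stars: number;
  forks: number;
  openIssues: number;
  lastPushedAt: Date;
  license: string | null;
  topics: string[];
  primaryLanguage: string | null;
}

function extractToolMetadata(repo: GitHubRepo): ToolMetadata {
  return {
    id: repo.full_name,
    stars: repo.stargazers_count,
    forks: repo.forks_count,
    openIssues: repo.open_issues_count,
    lastPushedAt: new Date(repo.pushed_at),
    // Repositories without a detected license have `license: null`.
    license: repo.license?.spdx_id ?? null,
    topics: repo.topics ?? [],
    primaryLanguage: repo.language,
  };
}
```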
Queues update jobs via Redis Streams
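Each discovered tool is published as a job to a Redis Stream. A consumer group delivers each job to a single worker replica. Delivery is at-least-once (a job that is not acknowledged can be redelivered), so workers should process jobs idempotently.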
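A job read through a consumer group (`XREADGROUP`) stays pending until the worker acknowledges it with `XACK`, so a worker that crashes mid-job can cause the same job to be delivered again. A common safeguard is to make the handler idempotent. The sketch below simulates that in memory; the `UpdateJob` shape and the in-process `Set` are assumptions, and a real deployment would need a store shared by all replicas.

```typescript
// Hypothetical job shape; `id` stands in for the Redis Stream entry id.
interface UpdateJob {
  id: string;
  toolId: string;
}

// Wraps a handler so that a redelivered job is acknowledged without redoing the work.
function createIdempotentHandler(
  apply: (job: UpdateJob) => void,
): (job: UpdateJob) => boolean {
  const processed = new Set<string>(); // stand-in for a shared, persistent store
  return (job) => {
    if (processed.has(job.id)) {
      return false; // already applied: safe to acknowledge and move on
    }
    apply(job);
    // Mark as done only after `apply` succeeds, so a failed job can be retried.
    processed.add(job.id);
    return true;
  };
}
```

Since the Memgraph and Qdrant writes are described as create-or-update and upsert operations, replaying a job should already converge to the same state; the guard mainly avoids redundant work.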
Workers update Memgraph and Qdrant
Workers create or update Tool, Category, Language, and License nodes in Memgraph, and upsert vector embeddings in Qdrant.
PostgreSQL stores session data
Indexing run metadata, health score history, and analytics events are persisted to PostgreSQL via Prisma for auditing and trend analysis.
Feedback Path
Agent feedback creates a reinforcement loop that improves future recommendations:
Agent ──▶ report_outcome ──▶ MCP Server
                                  │
                                  ▼
                          Graph Edge Update
                       (reinforce or attenuate
                         RELATED_TO weights)
                                  │
                                  ▼
                       Future searches return
                      improved recommendations
Agent calls report_outcome
After trying a recommended tool, the agent reports whether it was helpful, unhelpful, or partially useful via the report_outcome MCP tool.
MCP Server processes outcome
The outcome is validated and matched to the original search session so the system knows which recommendations to adjust.
Graph edge weights updated
Positive outcomes reinforce RELATED_TO edge weights (making the connection stronger). Negative outcomes attenuate weights, reducing future co-recommendations.
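One simple way to reinforce or attenuate a weight is to move it a fraction of the way toward a target value. The sketch below assumes weights live in the range 0 to 1, a learning rate of 0.1, and that a partially useful outcome pulls the weight toward 0.5. None of these values comes from this document, which does not specify the update rule.

```typescript
type Outcome = "helpful" | "partial" | "unhelpful";

// Assumed target weight for each outcome.
const OUTCOME_TARGET: Record<Outcome, number> = {
  helpful: 1,
  partial: 0.5,
  unhelpful: 0,
};

// Moves the weight toward the outcome's target by a fraction `rate`.
// For a weight in [0, 1] and a rate in [0, 1], the result stays in [0, 1].
function updateEdgeWeight(weight: number, outcome: Outcome, rate = 0.1): number {
  const target = OUTCOME_TARGET[outcome];
  return weight + rate * (target - weight);
}
```

Each report shifts the weight only part of the way, so a single unhelpful outcome dampens a well-established connection without erasing it.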
Future searches use updated weights
Stage 3 (graph reranking) uses the updated edge weights, so the next agent to search gets improved results automatically.
Continuous improvement