Data Flow

ToolPilot has three primary data paths: a read path for query processing, a write path for data ingestion, and a feedback path that improves results over time.

Read Path (Query Processing)

Every tool discovery request follows this pipeline:

Agent ──▶ MCP Server ──▶ Search Package
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   Stage 1: Retrieve   Stage 2: Filter   Stage 3: Rerank
   (Qdrant BM25 +      (Qdrant payload   (Memgraph Cypher
    dense vectors)       queries)          graph traversal)
          │                   │                   │
          └───────────────────┼───────────────────┘
                              ▼
                       Stage 4: Select
                              │
                              ▼
                      MCP Response ──▶ Agent

Agent sends MCP tool call

The AI agent invokes search_tools (or another MCP tool) over the MCP protocol with a natural-language query and optional filters.

MCP Server validates input

Zod schemas validate every incoming parameter. Malformed requests are rejected before they reach the search layer.
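As a rough illustration of what this validation step enforces, here is a dependency-free sketch written as a hand-rolled check; in the real server a Zod schema would play this role. The field names (query, language, minHealthScore, limit), the default limit, and the bounds are assumptions for illustration, not the documented search_tools contract.

```typescript
// Hypothetical shape of a search_tools request; field names are illustrative.
interface SearchToolsInput {
  query: string;
  language?: string;
  minHealthScore?: number; // assumed to lie in the range 0..1
  limit: number;
}

type ValidationResult =
  | { ok: true; value: SearchToolsInput }
  | { ok: false; errors: string[] };

// Validates an untrusted payload and applies an assumed default limit of 10.
function validateSearchInput(raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["input must be an object"] };
  }
  const record = raw as Record<string, unknown>;
  const query = record["query"];
  const language = record["language"];
  const minHealthScore = record["minHealthScore"];
  const limit = record["limit"] === undefined ? 10 : record["limit"];
  const errors: string[] = [];

  if (typeof query !== "string" || query.trim().length === 0) {
    errors.push("query must be a non-empty string");
  }
  if (language !== undefined && typeof language !== "string") {
    errors.push("language must be a string");
  }
  if (
    minHealthScore !== undefined &&
    !(typeof minHealthScore === "number" && minHealthScore >= 0 && minHealthScore <= 1)
  ) {
    errors.push("minHealthScore must be a number between 0 and 1");
  }
  if (!(typeof limit === "number" && Number.isInteger(limit) && limit >= 1 && limit <= 50)) {
    errors.push("limit must be an integer between 1 and 50");
  }

  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return {
    ok: true,
    value: {
      query: (query as string).trim(),
      language: language as string | undefined,
      minHealthScore: minHealthScore as number | undefined,
      limit: limit as number,
    },
  };
}
```

Collecting every error before rejecting, rather than failing on the first one, gives the calling agent enough detail to repair its request in a single retry.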

Search package loads tool corpus

The search package receives the validated request and runs the four-stage pipeline described below, starting from the Qdrant collection.

Stage 1 — BM25 + Vector search

Qdrant performs hybrid retrieval: sparse BM25 keyword matching combined with dense 768-dimensional vector similarity (Nomic Embed Code).
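The source does not say how the sparse and dense result lists are merged. One widely used option is reciprocal rank fusion (RRF), which Qdrant also offers as a fusion mode; the sketch below shows the idea in plain TypeScript. The function name, the constant k = 60, and the example tool ids are assumptions for illustration.

```typescript
// Reciprocal rank fusion: every list contributes 1 / (k + rank) for each id it
// contains, so ids ranked well by both retrievers rise to the top.
function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Example ids only; they stand in for whatever the two retrievers return.
const sparseHits = ["eslint", "prettier", "biome"]; // BM25 order
const denseHits = ["biome", "eslint", "oxlint"]; // vector-similarity order
const fused = reciprocalRankFusion([sparseHits, denseHits]);
console.log(fused.map((hit) => hit.id));
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize BM25 scores and cosine similarities onto a common scale.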

Stage 2 — Payload filtering

Qdrant payload queries narrow results by language, category, license, minimum health score, and other metadata constraints.
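A sketch of how optional request filters might be translated into a Qdrant filter object follows. The must / match / range structure is Qdrant's filter format; the payload keys (language, category, license, health_score) and the helper name are assumptions, since the actual payload layout is not given here.

```typescript
// Subset of Qdrant filter conditions used in this sketch.
type Condition =
  | { key: string; match: { value: string } }
  | { key: string; range: { gte: number } };

// Hypothetical filter options accepted by the search request.
interface ToolFilters {
  language?: string;
  category?: string;
  license?: string;
  minHealthScore?: number;
}

// Builds a filter in which every supplied constraint must hold.
function buildPayloadFilter(filters: ToolFilters): { must: Condition[] } {
  const must: Condition[] = [];
  if (filters.language !== undefined) {
    must.push({ key: "language", match: { value: filters.language } });
  }
  if (filters.category !== undefined) {
    must.push({ key: "category", match: { value: filters.category } });
  }
  if (filters.license !== undefined) {
    must.push({ key: "license", match: { value: filters.license } });
  }
  if (filters.minHealthScore !== undefined) {
    must.push({ key: "health_score", range: { gte: filters.minHealthScore } });
  }
  return { must };
}

const payloadFilter = buildPayloadFilter({ language: "TypeScript", minHealthScore: 0.7 });
console.log(JSON.stringify(payloadFilter));
```

Omitted options add no condition, so a request without filters yields an empty must list that matches every point.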

Stage 3 — Graph reranking

Memgraph Cypher queries traverse RELATED_TO, DEPENDS_ON, and ALTERNATIVE_TO edges to boost contextually relevant tools and demote isolated ones.
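The exact reranking formula is not specified here. As one possible reading, the in-memory sketch below boosts each candidate in proportion to the total weight of edges linking it to other candidates, which leaves isolated tools relatively lower. The types, the boostFactor value, and the multiplicative formula are assumptions; the real system would obtain the edges from Memgraph via Cypher.

```typescript
interface Candidate {
  id: string;
  score: number; // score carried over from retrieval and filtering
}

interface Edge {
  from: string;
  to: string;
  type: "RELATED_TO" | "DEPENDS_ON" | "ALTERNATIVE_TO";
  weight: number;
}

// Boosts candidates that are connected to other candidates in the graph.
function rerankByGraph(
  candidates: Candidate[],
  edges: Edge[],
  boostFactor = 0.1, // assumed tuning constant
): Candidate[] {
  const candidateIds = new Set(candidates.map((c) => c.id));
  const connectivity = new Map<string, number>();
  for (const edge of edges) {
    // Only edges between two candidates count toward the boost.
    if (!candidateIds.has(edge.from) || !candidateIds.has(edge.to)) continue;
    connectivity.set(edge.from, (connectivity.get(edge.from) ?? 0) + edge.weight);
    connectivity.set(edge.to, (connectivity.get(edge.to) ?? 0) + edge.weight);
  }
  return candidates
    .map((c) => ({
      id: c.id,
      score: c.score * (1 + boostFactor * (connectivity.get(c.id) ?? 0)),
    }))
    .sort((a, b) => b.score - a.score);
}

const reranked = rerankByGraph(
  [
    { id: "a", score: 1.0 },
    { id: "b", score: 0.95 },
    { id: "c", score: 0.95 },
  ],
  [{ from: "b", to: "c", type: "RELATED_TO", weight: 1.0 }],
);
console.log(reranked.map((c) => c.id));
```

In this example the isolated candidate "a" starts with the highest score but ends up last, because "b" and "c" reinforce each other through their RELATED_TO edge.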

Stage 4 — Selection logic

Final scoring, deduplication, and diversity enforcement produce the ranked result set.
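A minimal sketch of deduplication and diversity enforcement, assuming diversity means a cap on results per category; the cap, the type, and the function name are illustrative rather than taken from the ToolPilot code.

```typescript
interface ScoredTool {
  id: string;
  category: string;
  score: number;
}

// Keeps the best-scoring entry per tool id, then fills the result set in score
// order while allowing at most maxPerCategory tools from any one category.
function selectResults(
  tools: ScoredTool[],
  limit: number,
  maxPerCategory: number,
): ScoredTool[] {
  const bestById = new Map<string, ScoredTool>();
  for (const tool of tools) {
    const seen = bestById.get(tool.id);
    if (seen === undefined || tool.score > seen.score) {
      bestById.set(tool.id, tool);
    }
  }
  const ranked = Array.from(bestById.values()).sort((a, b) => b.score - a.score);

  const perCategory = new Map<string, number>();
  const selected: ScoredTool[] = [];
  for (const tool of ranked) {
    if (selected.length >= limit) break;
    const used = perCategory.get(tool.category) ?? 0;
    if (used >= maxPerCategory) continue; // category already saturated
    perCategory.set(tool.category, used + 1);
    selected.push(tool);
  }
  return selected;
}

const selection = selectResults(
  [
    { id: "a", category: "lint", score: 0.9 },
    { id: "a", category: "lint", score: 0.8 }, // duplicate hit for the same tool
    { id: "b", category: "lint", score: 0.85 },
    { id: "c", category: "lint", score: 0.7 },
    { id: "d", category: "test", score: 0.6 },
  ],
  3,
  2,
);
console.log(selection.map((tool) => tool.id));
```

Here "c" is skipped even though it outscores "d", because two lint tools are already selected; that trade of raw score for variety is the point of a diversity constraint.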

Results returned via MCP protocol

The MCP Server formats the results (with health scores, descriptions, and related tools) and streams them back to the agent.

Write Path (Data Ingestion)

The indexer continuously updates the tool corpus:

GitHub API ──▶ Indexer ──▶ Redis Streams ──▶ Workers
                                                         │
                              ┌───────────────────────────┤
                              ▼               ▼           ▼
                          Memgraph        Qdrant     PostgreSQL
                         (graph nodes    (vector     (session &
                          & edges)       embeddings)  analytics)

Indexer scans GitHub repositories

On a configurable schedule, the indexer queries the GitHub API for repositories matching tool-related topics and criteria.

Extracts tool metadata

Stars, forks, open issues, last commit date, README content, license, topics, and language data are extracted for each repository.

Queues update jobs via Redis Streams

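Each discovered tool is published as a job to a Redis Stream. A consumer group delivers each entry to a single worker replica at a time, and entries that are never acknowledged can be claimed and retried, so delivery is at-least-once rather than strictly exactly-once; workers rely on idempotent create-or-update and upsert operations to make reprocessing safe.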
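When an entry is delivered a second time (for example, after a worker crashes before acknowledging it), an idempotent handler keeps the effect of each job to a single application. The sketch below simulates this in memory; the class, the job shape, and the in-process Set are assumptions, and a real deployment would need a store shared across replicas or naturally idempotent upserts.

```typescript
// Hypothetical job shape; entryId mirrors the id Redis assigns to a stream entry.
interface UpdateJob {
  entryId: string;
  repo: string;
}

class IdempotentWorker {
  // In-memory stand-in for a shared record of completed entries.
  private readonly processed = new Set<string>();
  // Records the side effects that were actually performed.
  readonly applied: string[] = [];

  // Returns "applied" for first-time jobs and "skipped" for redeliveries.
  // Either result means the entry is safe to acknowledge.
  handle(job: UpdateJob): "applied" | "skipped" {
    if (this.processed.has(job.entryId)) {
      return "skipped";
    }
    this.applied.push(job.repo); // placeholder for the graph and vector updates
    this.processed.add(job.entryId);
    return "applied";
  }
}

const worker = new IdempotentWorker();
const job: UpdateJob = { entryId: "1700000000000-0", repo: "example/tool" };
const firstDelivery = worker.handle(job);
const secondDelivery = worker.handle(job); // simulated redelivery
console.log(firstDelivery, secondDelivery, worker.applied.length);
```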

Workers update Memgraph and Qdrant

Workers create or update Tool, Category, Language, and License nodes in Memgraph, and upsert vector embeddings in Qdrant.

PostgreSQL stores session data

Indexing run metadata, health score history, and analytics events are persisted to PostgreSQL via Prisma for auditing and trend analysis.

Feedback Path

Agent feedback creates a reinforcement loop that improves future recommendations:

Agent ──▶ report_outcome ──▶ MCP Server
                                        │
                                        ▼
                              Graph Edge Update
                            (reinforce or attenuate
                              RELATED_TO weights)
                                        │
                                        ▼
                              Future searches return
                              improved recommendations

Agent calls report_outcome

After trying a recommended tool, the agent reports whether it was helpful, unhelpful, or partially useful via the report_outcome MCP tool.

MCP Server processes outcome

The outcome is validated and matched to the original search session so the system knows which recommendations to adjust.

Graph edge weights updated

Positive outcomes reinforce RELATED_TO edge weights (making the connection stronger). Negative outcomes attenuate weights, reducing future co-recommendations.
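The update rule itself is not given here. One simple possibility is an exponential moving average that nudges the weight toward a target determined by the outcome, sketched below. The outcome labels, the targets (1, 0.5, 0), the learning rate, and the [0, 1] weight range are all assumptions for illustration.

```typescript
type Outcome = "helpful" | "partial" | "unhelpful";

// Assumed target weight for each outcome.
const OUTCOME_TARGET: Record<Outcome, number> = {
  helpful: 1,
  partial: 0.5,
  unhelpful: 0,
};

// Moves the current RELATED_TO weight a fraction of the way toward the target,
// then clamps the result to the assumed [0, 1] range.
function updateEdgeWeight(current: number, outcome: Outcome, learningRate = 0.2): number {
  const next = current + learningRate * (OUTCOME_TARGET[outcome] - current);
  return Math.min(1, Math.max(0, next));
}

const reinforced = updateEdgeWeight(0.5, "helpful");
const attenuated = updateEdgeWeight(0.5, "unhelpful");
console.log(reinforced, attenuated);
```

A small learning rate keeps any single report from dominating, so an edge only becomes strong or weak after several consistent outcomes.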

Future searches use updated weights

Stage 3 (graph reranking) uses the updated edge weights, so the next agent to search gets improved results automatically.

Continuous improvement

Because every reported outcome adjusts graph edge weights, recommendations are refined through normal use: positive outcomes reinforce edges and negative outcomes attenuate them, with no manual curation required.
