Feedback Loop
ToolPilot gets smarter with every interaction. When agents report whether a recommended tool worked, that signal flows back into the graph — reinforcing good recommendations and weakening bad ones.
The Cycle
The feedback loop is a continuous cycle that connects tool usage to graph improvement:
┌──────────────────────┐
│ Agent uses tool │
│ (via search_tools) │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Agent evaluates │
│ tool effectiveness │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ report_outcome() │
│ success | failure │
│ partial │
└──────────┬───────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────────┐ ┌───────────┐ ┌──────────────┐
│ "success" │ │ "partial" │ │ "failure" │
│ +weight │ │ neutral │ │ −weight │
│ reset Δt │ │ log only │ │ attenuate │
└──────┬──────┘ └─────┬─────┘ └──────┬───────┘
│ │ │
└──────────────┼──────────────┘
│
▼
┌─────────────────────┐
│ Graph edges updated │
│ (Memgraph) │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ Future searches │
│ improved for ALL │
│ users │
└─────────────────────┘
│
╰──────▶ (cycle repeats)
Outcome Types
The report_outcome MCP tool accepts three outcome types, each with different effects on the graph:
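For reference, the payload shape implied by the examples on this page can be written as a TypeScript type. This is a sketch inferred from the sample calls, not an official schema; all field names come from those samples.

```typescript
// Outcome and OutcomeReport are inferred from the report_outcome examples
// on this page — treat them as illustrative, not a published schema.
type Outcome = "success" | "failure" | "partial";

interface OutcomeReport {
  session_id: string; // opaque session identifier
  outcome: Outcome;   // one of the three accepted outcome types
  tool: string;       // the tool being reported on
  context: string;    // free-text description of how the tool was used
}

// Example payload mirroring the "success" sample below:
const example: OutcomeReport = {
  session_id: "sess_abc123",
  outcome: "success",
  tool: "chromadb",
  context: "Used as embedded vector store in RAG pipeline",
};
```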
✅ Success
The tool worked as expected. This is the strongest positive signal — it reinforces the graph edge between the tool and its related context, resets the decay timer, and adds a small weight boost.
// report_outcome with "success"
// → Reinforces the relationship between the recommended tool
//   and the user's context/stack

report_outcome({
  session_id: "sess_abc123",
  outcome: "success",
  tool: "chromadb",
  context: "Used as embedded vector store in RAG pipeline"
})

// Effect on graph:
// 1. RELATED_TO edge weight += 0.05 (capped at 1.0)
// 2. last_reinforced = now() (resets decay timer)
// 3. reinforcement_count += 1
// 4. Outcome logged to analytics
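The success update above can be sketched as a pure function. The `Edge` shape and `applySuccess` name are illustrative assumptions; only the numeric rules (+0.05 capped at 1.0, timer reset, counter increment) come from the effect list above.

```typescript
// Hypothetical in-memory view of a RELATED_TO edge (names assumed).
interface Edge {
  weight: number;              // edge weight in [0, 1]
  last_reinforced: number;     // epoch ms of the last positive signal
  reinforcement_count: number;
}

// Apply the documented "success" effects: +0.05 capped at 1.0,
// decay timer reset to now, reinforcement_count incremented.
function applySuccess(edge: Edge, now: number): Edge {
  return {
    weight: Math.min(1.0, edge.weight + 0.05),
    last_reinforced: now,
    reinforcement_count: edge.reinforcement_count + 1,
  };
}
```

Note that the cap means a heavily reinforced edge saturates at 1.0 rather than growing without bound.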
❌ Failure
The tool didn’t work for the intended use case. This attenuates the edge weight by 15%, reducing the likelihood it will be recommended in similar contexts. Repeated failures may trigger a manual review flag.
// report_outcome with "failure"
// → Attenuates the relationship, reducing confidence

report_outcome({
  session_id: "sess_def456",
  outcome: "failure",
  tool: "abandoned-db",
  context: "Incompatible with Node.js 22, no ESM support"
})

// Effect on graph:
// 1. RELATED_TO edge weight *= 0.85 (15% reduction)
// 2. failure_count += 1
// 3. If failure_count > threshold → flag for review
// 4. Outcome logged with failure reason
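The failure path can be sketched the same way. `REVIEW_THRESHOLD` is a placeholder; this page only says "failure_count > threshold" without specifying the actual value.

```typescript
// Hypothetical in-memory view of a RELATED_TO edge (names assumed).
interface Edge {
  weight: number;
  failure_count: number;
  flagged_for_review: boolean;
}

const REVIEW_THRESHOLD = 3; // assumption — the real threshold is not documented

// Apply the documented "failure" effects: ×0.85 attenuation,
// failure_count incremented, review flag once the threshold is exceeded.
function applyFailure(edge: Edge): Edge {
  const failure_count = edge.failure_count + 1;
  return {
    weight: edge.weight * 0.85,
    failure_count,
    flagged_for_review: edge.flagged_for_review || failure_count > REVIEW_THRESHOLD,
  };
}
```

Because attenuation is multiplicative, repeated failures shrink the weight geometrically rather than driving it negative.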
🔶 Partial
The tool partially worked or worked with caveats. This is a neutral signal — the outcome is logged for analytics but causes minimal weight change. Over time, patterns in partial reports can inform health score adjustments.
// report_outcome with "partial"
// → Neutral signal — logged but minimal graph impact

report_outcome({
  session_id: "sess_ghi789",
  outcome: "partial",
  tool: "some-orm",
  context: "Works but missing TypeScript types for v5 API"
})

// Effect on graph:
// 1. No weight change
// 2. Outcome logged for analysis
// 3. May inform future health score recalculation
Impact Summary
| Outcome | Weight Change | Decay Timer | Analytics |
|---|---|---|---|
| ✅ Success | +0.05 (capped at 1.0) | Reset to now | Logged |
| ❌ Failure | ×0.85 (15% reduction) | Unchanged | Logged + review flag if repeated |
| 🔶 Partial | No change | Unchanged | Logged |
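The whole table can be condensed into one dispatch function. This is a sketch under the same assumptions as above: only the numeric effects (+0.05 capped, ×0.85, no change, timer reset on success only) are taken from the table; the names are illustrative.

```typescript
type Outcome = "success" | "failure" | "partial";

// Minimal edge state needed to express the impact table (names assumed).
interface EdgeState {
  weight: number;
  last_reinforced: number; // epoch ms; only "success" resets it
}

function applyOutcome(edge: EdgeState, outcome: Outcome, now: number): EdgeState {
  switch (outcome) {
    case "success": // +0.05, capped at 1.0, decay timer reset
      return { weight: Math.min(1.0, edge.weight + 0.05), last_reinforced: now };
    case "failure": // ×0.85, decay timer unchanged
      return { weight: edge.weight * 0.85, last_reinforced: edge.last_reinforced };
    case "partial": // logged for analytics only; no graph change
      return edge;
  }
}
```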
A Self-Improving System
The feedback loop creates a network effect: the more agents use ToolPilot, the better it gets for everyone. Each outcome report is a data point that refines the graph’s understanding of tool relationships. Over time, this produces a recommendation engine that reflects real-world usage patterns rather than static editorial opinions.
Combined with temporal edge decay, the feedback loop ensures that the graph stays both accurate (reflecting current reality) and self-correcting (reducing bad recommendations automatically).
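To make the interaction with decay concrete: this page does not specify the decay formula, so the sketch below assumes exponential decay with a configurable half-life, keyed off the `last_reinforced` timestamp that a "success" report resets. Both the formula and the 30-day constant are assumptions for illustration.

```typescript
// Assumption: exponential decay with a 30-day half-life. The actual
// decay model used by ToolPilot is not documented on this page.
const HALF_LIFE_MS = 30 * 24 * 60 * 60 * 1000;

// Effective weight of an edge at read time: the longer since the last
// reinforcement, the more the stored weight is discounted.
function decayedWeight(weight: number, last_reinforced: number, now: number): number {
  const elapsed = Math.max(0, now - last_reinforced);
  return weight * Math.pow(0.5, elapsed / HALF_LIFE_MS);
}
```

Under this model a success report is doubly valuable: it both boosts the stored weight and resets the clock that the discount is computed from.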
Every Report Matters