How AI Agents Collaborate: A Framework Architecture

February 9, 2026

Note: This article is part of an ongoing AI-assisted development series (/ai). In keeping with the subject matter, all the code for this system was written by Claude Opus 4.6 while I provided the architectural direction and workflow design.

OpenClaw exploded onto the scene last week, and people immediately started building personal AI assistants that could clean 20K Gmail messages, run autonomous $10K trading systems, and post across platforms without human input. Within days, thousands of single agents were running on machines around the world, each doing autonomous work.

The natural next question for me became whether agents could work as a team instead of alone. Could you chat with specialized teammates who each bring different expertise, collaborate on tasks, remember project context, and get real work done? This framework explores multi-agent collaboration through that lens.

The goal was to build a framework where you can chat with different agents who bring different perspectives and expertise and can do things like write code, deploy applications, create tests, design interfaces, and generate images, all while understanding your project context and remembering past conversations.

The framework addresses six core challenges:

  • Intelligent routing that selects the right agent based on expertise, conversation context, and team dynamics
  • Shared team awareness where agents understand current topics, recent decisions, open questions, and who’s actively working on what
  • Persistent memory that survives sessions through three layers: working (current conversation), episodic (session summaries), and semantic (extracted facts with embeddings)
  • Context assembly from several sources including conversation history, semantic search over facts, knowledge base retrieval, and real-time team state
  • Tool integration via MCP servers that agents discover at runtime rather than hardcoded capabilities
  • Agent handoffs that enable delegation mid-turn with full context transfer and loop prevention

Core Principles

The framework operates on seven design decisions that emerged from trying different approaches and keeping what worked:

Per-agent channel identities: Each agent runs as its own bot with unique tokens and avatars on communication platforms. Users see different people responding in group chats rather than a single bot switching hats, creating natural team dynamics through distinct personalities.

Three-layer memory: Working memory handles current conversation with fast access to recent messages. Episodic memory compacts finished sessions into LLM-generated summaries. Semantic memory extracts facts during conversations and stores them with embeddings for hybrid search combining vector similarity with keyword matching.

Knowledge base with dual retrieval: Small files like style guides inject directly into system prompts. Large knowledge bases (documentation, code, external sources) chunk at roughly 500 tokens, embed with text-embedding-3-small, store in pgvector, and retrieve by cosine similarity to user messages.

Team state as first-class data: Current topic, recent decisions, open questions, and key insights get tracked in real-time team context that agents reference before responding, enabling coordination without constant LLM analysis of full conversation history.

Smart routing with caching: Pattern matching handles explicit cases (DMs, mentions, continuity) before falling back to LLM analysis with Claude Haiku. Routing decisions are cached for sixty seconds, preventing redundant calls when discussing the same topic.

Hot-reload configuration: Agent definitions, MCP servers, and channels live in YAML files that load into the database. Changes propagate via NOTIFY triggers within a second across all running instances, enabling personality updates, tool additions, and configuration changes without restarts.
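
The hot-reload path reduces to a small invalidation pattern. Here is a minimal sketch, assuming a cache whose `handle_notify` method is wired to a PostgreSQL LISTEN channel; the `ConfigCache` class shape and the loader callable are hypothetical stand-ins for the framework's internals:

```python
class ConfigCache:
    """In-memory config cache invalidated by PostgreSQL NOTIFY events.

    The real system would attach handle_notify() to a LISTEN channel;
    here the loader is any callable returning the current config for a key
    (e.g. agent definitions, MCP servers, channels loaded from the DB).
    """

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._loader(key)   # lazy load on cache miss
        return self._cache[key]

    def handle_notify(self, payload):
        # NOTIFY payload names the config key that changed, e.g. "agents";
        # dropping it means the next get() reloads fresh data from the DB
        self._cache.pop(payload, None)
```

Because invalidation only drops the key, a changed YAML-backed row propagates on the next read rather than forcing an eager reload on every instance.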

MCP for tool integration: External capabilities come through Model Context Protocol servers discovered at runtime. GitHub operations, web search, Notion access, and Linear issue tracking all connect via MCP rather than hardcoded API clients, making the system extensible without framework changes.


System Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Communication Channels                     β”‚
β”‚              (Telegram β€’ Discord β€’ Slack β€’ API)               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              GroupCoordinator + TeamRouter                    β”‚
β”‚                                                               β”‚
β”‚  β€’ Pattern matching (mentions, continuity, DMs)               β”‚
β”‚  β€’ LLM routing (expertise, keywords, capabilities)            β”‚
β”‚  β€’ Multi-responder support (primary + secondaries)            β”‚
β”‚  β€’ Decision caching (60s TTL)                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Selected Agent(s)                         β”‚
β”‚                                                              β”‚
β”‚  BaseAgent.process_with_tools() β€” agentic loop               β”‚
β”‚  β€’ Build system prompt (multiple context sources)            β”‚
β”‚  β€’ Call LLM with tools                                       β”‚
β”‚  β€’ Execute tools via MCP                                     β”‚
β”‚  β€’ Delegate via handoff_to_agent                             β”‚
β”‚  β€’ Return response (max 10 tool iterations)                  β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚        β”‚        β”‚        β”‚        β”‚
       β–Ό        β–Ό        β–Ό        β–Ό        β–Ό
   β”Œβ”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”
   β”‚ LLM β”‚  β”‚ MCP β”‚  β”‚Memoryβ”‚ β”‚Know- β”‚ β”‚ Team β”‚
   β”‚Multiβ”‚  β”‚Toolsβ”‚  β”‚3-    β”‚ β”‚ledge β”‚ β”‚State β”‚
   β”‚Prov.β”‚  β”‚     β”‚  β”‚Layer β”‚ β”‚Base  β”‚ β”‚      β”‚
   β””β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”˜
       β”‚        β”‚        β”‚        β”‚        β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           PostgreSQL + pgvector + Redis                       β”‚
β”‚           ConfigCache: hot-reload via NOTIFY                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The system connects users through communication channels to specialized agents where each agent maintains its own identity on the platform. When messages arrive, routing decides who responds based on patterns or LLM analysis. That agent assembles context from multiple sources, processes through an agentic loop with tool access, and returns responses while updating shared team state for coordination.

Data is stored in PostgreSQL; pgvector enables semantic search over facts and knowledge chunks. Redis handles distributed caching and cross-instance invalidation. Configuration lives in YAML but loads into the database for hot-reload capability through PostgreSQL NOTIFY triggers.


Message Flow: From User to Response

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. Message Arrives                                            β”‚
β”‚    All agent bots receive (same channel)                      β”‚
β”‚    First bot triggers routing                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. Routing Decision                                            β”‚
β”‚                                                                β”‚
β”‚    Pattern Match (fast)          LLM Analysis (expensive)      β”‚
β”‚    β€’ Direct messages        β†’    β€’ Load last 10 messages       β”‚
β”‚    β€’ @mentions              β†’    β€’ Get agent profiles          β”‚
β”‚    β€’ Continuity (120s)      β†’    β€’ Call Claude Haiku           β”‚
β”‚                                  β€’ Return primary + secondary  β”‚
β”‚    Cache decision (60s TTL)                                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. Context Assembly (9 sources)                                β”‚
β”‚                                                                β”‚
β”‚    Static (system prompt):       Dynamic (per-turn):           β”‚
β”‚    β€’ Communication style         β€’ Conversation (20 msgs)      β”‚
β”‚    β€’ Personality prompt          β€’ Memory (hybrid search)      β”‚
β”‚    β€’ Context files               β€’ Knowledge (embeddings)      β”‚
β”‚    β€’ Skills                      β€’ Team state (live)           β”‚
β”‚    β€’ Team descriptions                                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 4. Agent Processing Loop (max 10 iterations)                   β”‚
β”‚                                                                β”‚
β”‚    LLM Call β†’ Tool Use? ──Yes──→ Execute via MCP ───┐          β”‚
β”‚         β”‚                                           β”‚          β”‚
β”‚         No                                          β”‚          β”‚
β”‚         β”‚                                           β”‚          β”‚
β”‚         ↓                                           β”‚          β”‚
β”‚    Text Response β†β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚
β”‚                                                                β”‚
β”‚    Special: handoff_to_agent β†’ Build context β†’ Delegate        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 5. Post-Response Updates (async)                               β”‚
β”‚                                                                β”‚
β”‚    β€’ Save to conversation history                              β”‚
β”‚    β€’ Extract facts (every 5 messages)                          β”‚
β”‚    β€’ Update team state (speaker, energy)                       β”‚
β”‚    β€’ Extract insights (decisions, questions)                   β”‚ 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

All agent bots receive messages because they’re in the same channel, with the first to arrive triggering routing analysis. GroupCoordinator runs through patterns (DMs, mentions, continuity) before falling back to LLM analysis, caching routing decisions for sixty seconds to avoid redundant calls on the same topic.

The selected agent builds context from several sources split into static (system prompt) and dynamic (per-turn) categories. It processes through a tool-use loop that can execute MCP tools, delegate via handoffs, and maintain multi-turn reasoning. After responding, background tasks update memory and team state asynchronously to keep response times fast.


Context Sources: Static vs Dynamic

Static Context (baked into system prompt)      Dynamic Context (assembled per-turn)
Communication style guide                      Conversation history (last 20 messages)
Personality prompt                             Memory facts (hybrid search: 70% vector, 30% keyword)
Context files (always-inject)                  Knowledge chunks (pgvector similarity)
Skills (trigger patterns, instructions)        Team state (topic, decisions, questions, insights)
Team descriptions (relationships)

Static context is built once at initialization. Dynamic context is assembled on each turn, which keeps agents current without rebuilding the whole prompt every time.
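
In code, the split might look like the sketch below: a static base built once, with dynamic sections appended each turn. The function signature and section labels are illustrative, not the framework's actual API:

```python
def build_prompt(static_prompt, conversation, memory_facts=(),
                 knowledge_chunks=(), team_state=""):
    """Assemble the per-turn prompt: static base plus dynamic sections."""
    sections = [static_prompt]                     # built once at init
    if team_state:
        sections.append("## Team state\n" + team_state)
    if memory_facts:
        sections.append("## Relevant memory\n"
                        + "\n".join(f"- {f}" for f in memory_facts))
    if knowledge_chunks:
        sections.append("## Knowledge\n" + "\n".join(knowledge_chunks))
    # only the last 20 messages of conversation history are included
    sections.append("## Conversation\n" + "\n".join(conversation[-20:]))
    return "\n\n".join(sections)
```

Empty dynamic sections are simply omitted, so a turn with no relevant facts or chunks costs nothing extra in the prompt.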


Routing: Pattern Match to LLM Analysis

Message arrives
     β”‚
     β”œβ”€β†’ DM? ──────────────────────→ Always respond
     β”‚
     β”œβ”€β†’ @mention? ────────────────→ Named agent responds
     β”‚
     β”œβ”€β†’ Same agent spoke <120s? ──→ Continue conversation
     β”‚
     └─→ No pattern match
          β”‚
          β–Ό
     TeamRouter.analyze_message()
          β”‚
          β”œβ”€β†’ Load last 10 messages
          β”œβ”€β†’ Get agent profiles (role, expertise, keywords, capabilities)
          β”œβ”€β†’ Prompt Claude Haiku
          β”‚
          β–Ό
     Returns: primary (confidence, reason)
              + secondary[] (optional, if enabled & confidence >0.5)
          β”‚
          β–Ό
     Cache decision (60s)
     Share with all bots

Routing starts with cheap pattern matching, and most interactions hit these fast paths. When patterns miss, TeamRouter calls Claude Haiku with full context: message, conversation history, and detailed agent profiles including expertise areas, trigger keywords, capabilities, and personality hints.

The LLM returns structured output with primary responder, confidence score, reasoning, and optionally one or two secondary responders. Primary agents respond immediately while secondary agents wait 2 to 6 seconds, validate relevance, then respond if the context still makes sense.
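
The fast-path-then-fallback logic above can be sketched as follows. The `Router` class, the injected `llm_route` callable, and `record_response` are illustrative stand-ins for the GroupCoordinator/TeamRouter internals, not the real interfaces:

```python
import time

CONTINUITY_WINDOW = 120   # seconds: same agent keeps the thread
CACHE_TTL = 60            # seconds: routing-decision cache

class Router:
    """Pattern matching first; cached LLM routing as the fallback."""

    def __init__(self, llm_route, clock=time.time):
        self._llm_route = llm_route   # expensive fallback (Claude Haiku)
        self._clock = clock           # injectable for testing
        self._cache = {}              # chat_id -> (agent, expiry)
        self._last = {}               # chat_id -> (agent, timestamp)

    def record_response(self, chat_id, agent):
        self._last[chat_id] = (agent, self._clock())

    def route(self, chat_id, text, is_dm=False, dm_agent=None, mention=None):
        if is_dm:
            return dm_agent                    # DMs always respond
        if mention:
            return mention                     # explicit @mention wins
        now = self._clock()
        last = self._last.get(chat_id)
        if last and now - last[1] < CONTINUITY_WINDOW:
            return last[0]                     # continuity: same agent
        cached = self._cache.get(chat_id)
        if cached and now < cached[1]:
            return cached[0]                   # cached LLM decision
        agent = self._llm_route(chat_id, text) # fall back to the LLM
        self._cache[chat_id] = (agent, now + CACHE_TTL)
        return agent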


Team Awareness: Shared State

The team_context table maintains real-time state per chat:

Field               Purpose
current_topic       What the team is currently discussing
working_on          Active work items
recent_decisions    Choices made (with attribution)
open_questions      Unresolved items
key_insights        Important observations
last_speaker        Who spoke last
consecutive_turns   Same-agent turn count
energy_level        Conversation intensity (low/normal/high/heated)

After each response, the system extracts insights:

  β€’ Decisions: detected via markers such as β€œlet’s”, β€œwe decided”, β€œI recommend”
  β€’ Questions: substantive ones (>10 chars, filtering trivial β€œok?”)
  β€’ Insights: the first meaningful sentence of each response

These get stored in agent_insights and team_context tables, making them visible to all agents in subsequent turns.
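
The marker-based extraction can be sketched in a few lines; the marker list and thresholds mirror the description above, though the real implementation may differ:

```python
import re

DECISION_MARKERS = ("let's", "we decided", "i recommend")

def extract_insights(text):
    """Pull decisions and substantive questions out of an agent response."""
    decisions, questions = [], []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        if any(marker in lowered for marker in DECISION_MARKERS):
            decisions.append(sentence)
        if sentence.endswith("?") and len(sentence) > 10:
            questions.append(sentence)   # length filter drops trivial "ok?"
    return decisions, questions
```

Marker matching is crude but cheap, which is the point: it runs after every response without an extra LLM call.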


Memory: Three Layers

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ WORKING MEMORY                                              β”‚
β”‚ β€’ Current conversation (last 20 messages)                   β”‚
β”‚ β€’ Chat-scoped (groups) or user-scoped (DMs)                 β”‚
β”‚ β€’ Table: conversations, conversation_messages               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                    ↓ Session timeout (30min)                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ EPISODIC MEMORY                                             β”‚
β”‚ β€’ LLM-generated session summaries                           β”‚
β”‚ β€’ Key decisions, unresolved items, major topics             β”‚
β”‚ β€’ Table: memory_sessions                                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                    ↓ Facts extracted (every 5 msgs)         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ SEMANTIC MEMORY                                             β”‚
β”‚ β€’ Extracted facts with embeddings                           β”‚
β”‚ β€’ Categories: preference, decision, knowledge, task         β”‚
β”‚ β€’ Hybrid search: 70% vector + 30% keyword                   β”‚
β”‚ β€’ Table: memory_facts (pgvector)                            β”‚
β”‚ β€’ Limit: 500 facts/user (prune old, low-importance)         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Working memory handles the current session with fast access to recent messages. When sessions time out, working memory compacts into episodic summaries via gpt-4o-mini, capturing key decisions and unresolved items.

Semantic memory extracts facts during conversations, running every five messages and requesting JSON output with fact text, category, and confidence. Facts get embedded and stored with pgvector while retrieval uses hybrid search combining vector similarity with keyword matching.

The system enforces limits by pruning older, lower-importance facts when hitting max capacity per user (500 by default), keeping memory focused on recent, important, frequently accessed information.
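
The 70/30 blend amounts to a weighted sum of two scores. This pure-Python sketch stands in for the pgvector query plus keyword match; the function names and the naive word-overlap metric are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (what pgvector computes)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_overlap(query, text):
    """Fraction of query words that appear in the fact text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(query, query_vec, fact_text, fact_vec,
                 vector_weight=0.7, keyword_weight=0.3):
    """70% vector similarity + 30% keyword match, as described above."""
    return (vector_weight * cosine(query_vec, fact_vec)
            + keyword_weight * keyword_overlap(query, fact_text))
```

The keyword term keeps exact names and identifiers retrievable even when their embeddings sit far from the query.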


Knowledge Base: Injection vs Retrieval

Small files (style guides, project rules) are injected directly into system prompts. Large knowledge bases (documentation, code) are chunked at roughly 500 tokens, embedded with text-embedding-3-small, stored in pgvector, and retrieved by cosine similarity to user messages.

External Sources (GitHub, Notion, GDrive)
     β”‚
     β”œβ”€β†’ MCP Server connects
     β”œβ”€β†’ Enumerate items (files, pages, docs)
     β”œβ”€β†’ Fetch content
     β”œβ”€β†’ Incremental sync (compare SHA/timestamps)
     β”‚
     β–Ό
Content chunking (~500 tokens)
     β”‚
     β”œβ”€β†’ Paragraph boundaries first
     β”œβ”€β†’ Sentence boundaries if needed
     β”œβ”€β†’ ~50 token overlap between chunks
     β”‚
     β–Ό
Embedding generation (text-embedding-3-small, 1536d)
     β”‚
     β”œβ”€β†’ Redis cache (1hr TTL)
     β”œβ”€β†’ Batch processing
     β”‚
     β–Ό
Store in context_files
     β”‚
     β”œβ”€β†’ pgvector column
     β”œβ”€β†’ IVFFlat index
     β”œβ”€β†’ External metadata (source, ID, URL, sync timestamp)
     β”‚
     β–Ό
Query-time retrieval
     β”‚
     β”œβ”€β†’ Generate query embedding
     β”œβ”€β†’ Cosine similarity search (1 - embedding <=> query)
     β”œβ”€β†’ Filter: active, agent match, min_similarity=0.2
     └─→ Return top 5 chunks with relevance scores

External knowledge syncing connects to MCP servers for GitHub, Notion, and Google Drive. The system enumerates items, fetches content, and runs incremental sync by comparing commit SHAs or timestamps, with each sync run tracked for auditing.
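
The chunking step can be sketched as a greedy packer. Whitespace words approximate tokens here (the real system would use the embedding model's tokenizer), and the sentence-boundary fallback for oversized paragraphs is omitted for brevity:

```python
def chunk_text(text, max_tokens=500, overlap=50):
    """Pack paragraphs up to the token budget, carrying ~50 tokens of overlap."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        words = para.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:]   # overlap preserves cross-chunk context
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Preferring paragraph boundaries keeps chunks semantically coherent, which matters more for retrieval quality than hitting the budget exactly.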


Agent System: Processing Loop

Agent.process_with_tools(context)
     β”‚
     β”œβ”€β†’ Build system prompt
     β”‚   └─→ _build_mcp_system_prompt()
     β”‚       β”œβ”€β†’ Communication style
     β”‚       β”œβ”€β†’ Personality
     β”‚       β”œβ”€β†’ Context files
     β”‚       β”œβ”€β†’ Skills
     β”‚       └─→ Team descriptions
     β”‚
     β”œβ”€β†’ Inject dynamic context
     β”‚   β”œβ”€β†’ Team state
     β”‚   β”œβ”€β†’ Knowledge chunks
     β”‚   └─→ Memory facts
     β”‚
     β”œβ”€β†’ Load tools
     β”‚   β”œβ”€β†’ MCP tools (filtered by agent)
     β”‚   └─→ Virtual: handoff_to_agent
     β”‚
     β”œβ”€β†’ Call LLM
     β”‚   β”‚
     β”‚   β”œβ”€β†’ Text? ──────────→ Return final response
     β”‚   β”‚
     β”‚   └─→ Tool use?
     β”‚       β”‚
     β”‚       β”œβ”€β†’ Approval needed? β†’ Request user approval
     β”‚       β”œβ”€β†’ Handoff? β†’ Build context β†’ Delegate
     β”‚       └─→ Execute via MCP β†’ Add result β†’ Loop
     β”‚
     └─→ Max 10 iterations

The agent builds a system prompt from multiple sources, injects dynamic context layers, and loads available tools including the virtual handoff_to_agent. It enters a loop that calls the LLM, checks for tool use, executes tools via MCP, and repeats up to ten times.

Handoffs use the same tool mechanism but get intercepted before MCP: the system validates the target agent, builds handoff context with the reason and team state, and hands control to GroupCoordinator. Loop prevention tracks handoff chains with a maximum depth of three.
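
The loop plus handoff interception can be sketched as below. `call_llm` and `execute_tool` are injected callables standing in for the provider client and MCPManager; the tuple-based message shapes are illustrative, not the real wire format:

```python
MAX_ITERATIONS = 10
MAX_HANDOFF_DEPTH = 3

def process_with_tools(call_llm, execute_tool, messages, handoff_chain=()):
    """Agentic loop: call the LLM, run requested tools, feed results back.

    call_llm(messages) returns ("text", response) or ("tool", name, args);
    execute_tool(name, args) dispatches to MCP.
    """
    for _ in range(MAX_ITERATIONS):
        kind, *payload = call_llm(messages)
        if kind == "text":
            return payload[0]                          # final response
        name, args = payload
        if name == "handoff_to_agent":                 # intercepted before MCP
            if len(handoff_chain) >= MAX_HANDOFF_DEPTH:
                messages.append(("tool_result", "handoff refused: max depth"))
                continue                               # loop prevention
            return ("handoff", args["target"], handoff_chain + (args["target"],))
        result = execute_tool(name, args)              # real MCP execution
        messages.append(("tool_result", result))       # loop with the result
    return "Stopped after 10 tool iterations."
```

Returning a handoff sentinel rather than recursing keeps delegation under GroupCoordinator's control, where the chain depth can be enforced centrally.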


MCP Integration: Tool Discovery and Execution

MCPManager startup
     β”‚
     β”œβ”€β†’ Read active servers from ConfigCache
     β”œβ”€β†’ Resolve env vars (${VAR} pattern)
     β”œβ”€β†’ Create stdio transport (subprocess)
     β”œβ”€β†’ Initialize MCPClientSession
     β”œβ”€β†’ Discover tools (list_tools)
     └─→ Mark connection status in DB
Tool execution flow
     β”‚
     β”œβ”€β†’ Agent calls tool: "github__create_issue"
     β”œβ”€β†’ Parse prefix: server="github", tool="create_issue"
     β”œβ”€β†’ Lookup session
     β”œβ”€β†’ Execute via MCP
     β”œβ”€β†’ Return JSON result
     └─→ Format for LLM conversation
Self-trigger guard
     β”‚
     β”œβ”€β†’ Connection writes to DB
     β”œβ”€β†’ Triggers PostgreSQL NOTIFY
     β”œβ”€β†’ ConfigCache receives event
     β”œβ”€β†’ Could trigger reconnect β†’ LOOP
     β”‚
     └─→ _connecting set prevents this
         β”œβ”€β†’ Check if server in set
         β”œβ”€β†’ Add name β†’ Connect β†’ Remove
         └─→ Skip if already connecting

MCPManager connects to servers via stdio transport, discovers tools, and caches them with server prefixes; tool names like β€œgithub__create_issue” prevent collisions across servers. The available_to_agents field filters which agents see which tools.

The self-trigger guard prevents infinite reconnection loops: connecting writes status to the database, which fires NOTIFY events that ConfigCache receives and could otherwise re-trigger a reconnect. The _connecting set tracks in-progress connections to break this cycle.
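
The guard itself is a few lines. A minimal sketch, assuming a synchronous `connector` callable that opens the transport and writes status to the database (the real manager is presumably async; class and method names are illustrative):

```python
class MCPManager:
    """Guard against NOTIFY-driven reconnect loops with an in-progress set."""

    def __init__(self, connector):
        self._connector = connector    # opens transport, writes status to DB
        self._connecting = set()

    def connect(self, server_name):
        if server_name in self._connecting:
            return False               # NOTIFY fired by our own write: skip
        self._connecting.add(server_name)
        try:
            self._connector(server_name)
            return True
        finally:
            self._connecting.discard(server_name)
```

The try/finally ensures the name is removed even when connection fails, so a crashed server can still be retried on the next config change.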


Trade-offs and Practical Considerations

Three-layer memory increases storage and retrieval costs: every five messages triggers fact extraction via LLM, and every query generates an embedding and runs vector similarity. The payoff is agents that remember preferences without needing full conversation history.

Per-agent bots create a better user experience at the cost of managing multiple tokens: each agent needs its own registration with the platform, and all bots receive all messages even when only one responds, which uses bandwidth but simplifies coordination.

Multi-responder support enables richer interactions but complicates timing: secondary responses wait several seconds to validate relevance, and rapid messaging cancels many of them.

Knowledge base embeddings scale to millions of chunks via pgvector indexes but introduce latency: every query generates an embedding via an API call (cached in Redis when available), and large knowledge bases may need tuning of similarity thresholds and result counts.


Early Stage

This framework is constantly evolving and is not yet ready for public release. Building this architecture has been a continuous learning process, exploring different approaches to memory, coordination, and knowledge management. More updates will come as the framework develops and new patterns emerge from actual use.


Screenshots

Onboarding Wizard

The onboarding wizard guides users through initial setup and agent configuration

Screenshot 1

Agents know about each other

Screenshot 2

Awareness of code and GitHub MCP in action

Screenshot 3

Deep dive into codebase and research using Tavily MCP

Screenshot 4

Ability to create tickets and continue discussion there

Screenshot 6

MCP tool integration and execution


For your LLM ;)

2026-02-09 17:34:30,450 - src.bot.manager - INFO - [jange] πŸ“© Message from Prashish | Game Machine Labs in supergroup: can you tell me what we have discussed in the past...
2026-02-09 17:34:30,450 - src.bot.handlers - INFO - [jange] Received message from user 485126821 (group=True): can you tell me what we have discussed in the past?
2026-02-09 17:34:30,450 - src.bot.handlers - INFO - [jange] πŸ” Checking smart routing...
2026-02-09 17:34:30,454 - src.bot.manager - INFO - [anita] πŸ“© Message from Prashish | Game Machine Labs in supergroup: can you tell me what we have discussed in the past...
2026-02-09 17:34:30,454 - src.bot.handlers - INFO - [anita] Received message from user 485126821 (group=True): can you tell me what we have discussed in the past?
2026-02-09 17:34:30,454 - src.bot.handlers - INFO - [anita] πŸ” Checking smart routing...
2026-02-09 17:34:30,459 - src.bot.handlers - INFO - [anita] πŸ€– Calling AI router...
2026-02-09 17:34:30,459 - src.bot.manager - INFO - [anita] πŸ” should_respond_smart called for: 'can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,459 - src.bot.manager - INFO - [anita] πŸ€– Getting team router...
2026-02-09 17:34:30,459 - src.bot.manager - INFO - [anita] πŸ“¨ Calling router.analyze_message...
2026-02-09 17:34:30,459 - src.bot.team_router - INFO - πŸ“¨ analyze_message called: chat=-1003716043040, msg='can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,459 - src.bot.team_router - INFO - πŸ€– Calling AI for routing analysis...
2026-02-09 17:34:30,459 - src.bot.team_router - INFO - 🧠 Starting AI routing analysis for: can you tell me what we have discussed in the past...
2026-02-09 17:34:30,459 - src.bot.team_router - INFO - πŸ”„ Calling Anthropic API with model=claude-3-5-haiku-20241022, timeout=30s
2026-02-09 17:34:30,462 - src.bot.handlers - INFO - [jange] πŸ€– Calling AI router...
2026-02-09 17:34:30,462 - src.bot.manager - INFO - [jange] πŸ” should_respond_smart called for: 'can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,462 - src.bot.manager - INFO - [jange] πŸ€– Getting team router...
2026-02-09 17:34:30,462 - src.bot.manager - INFO - [jange] πŸ“¨ Calling router.analyze_message...
2026-02-09 17:34:30,462 - src.bot.team_router - INFO - πŸ“¨ analyze_message called: chat=-1003716043040, msg='can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,462 - src.bot.team_router - INFO - ⏳ Another bot is already analyzing this message, waiting...
2026-02-09 17:34:30,465 - src.bot.manager - INFO - [pakka] πŸ“© Message from Prashish | Game Machine Labs in supergroup: can you tell me what we have discussed in the past...
2026-02-09 17:34:30,465 - src.bot.handlers - INFO - [pakka] Received message from user 485126821 (group=True): can you tell me what we have discussed in the past?
2026-02-09 17:34:30,465 - src.bot.handlers - INFO - [pakka] πŸ” Checking smart routing...
2026-02-09 17:34:30,469 - src.bot.manager - INFO - [buddhi] πŸ“© Message from Prashish | Game Machine Labs in supergroup: can you tell me what we have discussed in the past...
2026-02-09 17:34:30,469 - src.bot.handlers - INFO - [buddhi] Received message from user 485126821 (group=True): can you tell me what we have discussed in the past?
2026-02-09 17:34:30,469 - src.bot.handlers - INFO - [buddhi] πŸ” Checking smart routing...
2026-02-09 17:34:30,470 - src.bot.handlers - INFO - [pakka] πŸ€– Calling AI router...
2026-02-09 17:34:30,470 - src.bot.manager - INFO - [pakka] πŸ” should_respond_smart called for: 'can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,470 - src.bot.manager - INFO - [pakka] πŸ€– Getting team router...
2026-02-09 17:34:30,470 - src.bot.manager - INFO - [pakka] πŸ“¨ Calling router.analyze_message...
2026-02-09 17:34:30,470 - src.bot.team_router - INFO - πŸ“¨ analyze_message called: chat=-1003716043040, msg='can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,470 - src.bot.team_router - INFO - ⏳ Another bot is already analyzing this message, waiting...
2026-02-09 17:34:30,473 - src.bot.handlers - INFO - [buddhi] πŸ€– Calling AI router...
2026-02-09 17:34:30,473 - src.bot.manager - INFO - [buddhi] πŸ” should_respond_smart called for: 'can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,473 - src.bot.manager - INFO - [buddhi] πŸ€– Getting team router...
2026-02-09 17:34:30,473 - src.bot.manager - INFO - [buddhi] πŸ“¨ Calling router.analyze_message...
2026-02-09 17:34:30,473 - src.bot.team_router - INFO - πŸ“¨ analyze_message called: chat=-1003716043040, msg='can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,473 - src.bot.team_router - INFO - ⏳ Another bot is already analyzing this message, waiting...
2026-02-09 17:34:30,485 - src.bot.manager - INFO - [kalpana] πŸ“© Message from Prashish | Game Machine Labs in supergroup: can you tell me what we have discussed in the past...
2026-02-09 17:34:30,485 - src.bot.handlers - INFO - [kalpana] Received message from user 485126821 (group=True): can you tell me what we have discussed in the past?
2026-02-09 17:34:30,486 - src.bot.handlers - INFO - [kalpana] πŸ” Checking smart routing...
2026-02-09 17:34:30,489 - src.bot.handlers - INFO - [kalpana] πŸ€– Calling AI router...
2026-02-09 17:34:30,489 - src.bot.manager - INFO - [kalpana] πŸ” should_respond_smart called for: 'can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,489 - src.bot.manager - INFO - [kalpana] πŸ€– Getting team router...
2026-02-09 17:34:30,489 - src.bot.manager - INFO - [kalpana] πŸ“¨ Calling router.analyze_message...
2026-02-09 17:34:30,489 - src.bot.team_router - INFO - πŸ“¨ analyze_message called: chat=-1003716043040, msg='can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,489 - src.bot.team_router - INFO - ⏳ Another bot is already analyzing this message, waiting...
2026-02-09 17:34:30,585 - src.bot.manager - INFO - [vivek] πŸ“© Message from Prashish | Game Machine Labs in supergroup: can you tell me what we have discussed in the past...
2026-02-09 17:34:30,586 - src.bot.handlers - INFO - [vivek] Received message from user 485126821 (group=True): can you tell me what we have discussed in the past?
2026-02-09 17:34:30,586 - src.bot.handlers - INFO - [vivek] πŸ” Checking smart routing...
2026-02-09 17:34:30,588 - src.bot.handlers - INFO - [vivek] πŸ€– Calling AI router...
2026-02-09 17:34:30,588 - src.bot.manager - INFO - [vivek] πŸ” should_respond_smart called for: 'can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,588 - src.bot.manager - INFO - [vivek] πŸ€– Getting team router...
2026-02-09 17:34:30,588 - src.bot.manager - INFO - [vivek] πŸ“¨ Calling router.analyze_message...
2026-02-09 17:34:30,588 - src.bot.team_router - INFO - πŸ“¨ analyze_message called: chat=-1003716043040, msg='can you tell me what we have discussed in the past...'
2026-02-09 17:34:30,588 - src.bot.team_router - INFO - ⏳ Another bot is already analyzing this message, waiting...
2026-02-09 17:34:33,770 - src.memory.service - INFO - Retrieved 0 facts and 0 sessions for user 485126821
2026-02-09 17:34:35,164 - src.indexing.search - INFO - [REDACTED] search found 10 results for: can you tell me what we have discussed in the past...
2026-02-09 17:34:35,164 - src.indexing.search - INFO - Built [REDACTED] context: 4157 chars, 4 chunks
2026-02-09 17:34:35,604 - src.bot.team_router - INFO - πŸ“₯ Got routing response: Let's analyze this message:
SHOULD_RESPOND: yes
PRIMARY: buddhi | CONFIDENCE: 0.9 | REASON: Direct request about past discussion requires curiosity a...
2026-02-09 17:34:35,604 - src.bot.team_router - INFO - πŸ“Š Routing decision: should_respond=True, primary=buddhi, reason=Direct request about past discussion requires curiosity and exploration
2026-02-09 17:34:36,045 - src.agents.base - INFO - kopila: Using model gpt-4o-mini via openrouter
2026-02-09 17:34:36,485 - src.bot.team_router - WARNING - Brainstorm coordinator error: 'str' object has no attribute 'get'
2026-02-09 17:34:36,486 - src.bot.manager - INFO - [anita] πŸ“Š Got decision: should_respond=True, primary=buddhi
2026-02-09 17:34:36,486 - src.bot.manager - INFO - [anita] ❌ Low-confidence secondary (0.30), not responding
2026-02-09 17:34:36,486 - src.bot.handlers - INFO - [anita] ❌ Not responding: Low confidence secondary: 0.30
2026-02-09 17:34:36,487 - src.bot.team_router - INFO - πŸ’Ύ Got cached result after waiting: primary=buddhi
2026-02-09 17:34:36,487 - src.bot.manager - INFO - [jange] πŸ“Š Got decision: should_respond=True, primary=buddhi
2026-02-09 17:34:36,487 - src.bot.manager - INFO - [jange] ❌ Not primary, assigned to buddhi
2026-02-09 17:34:36,487 - src.bot.handlers - INFO - [jange] ❌ Not responding: Assigned to buddhi
2026-02-09 17:34:36,487 - src.bot.team_router - INFO - πŸ’Ύ Got cached result after waiting: primary=buddhi
2026-02-09 17:34:36,487 - src.bot.manager - INFO - [pakka] πŸ“Š Got decision: should_respond=True, primary=buddhi
2026-02-09 17:34:36,487 - src.bot.manager - INFO - [pakka] ❌ Not primary, assigned to buddhi
2026-02-09 17:34:36,488 - src.bot.handlers - INFO - [pakka] ❌ Not responding: Assigned to buddhi
2026-02-09 17:34:36,488 - src.bot.team_router - INFO - πŸ’Ύ Got cached result after waiting: primary=buddhi
2026-02-09 17:34:36,488 - src.bot.manager - INFO - [buddhi] πŸ“Š Got decision: should_respond=True, primary=buddhi
2026-02-09 17:34:36,488 - src.bot.manager - INFO - [buddhi] βœ… I am the primary responder!
2026-02-09 17:34:36,488 - src.bot.handlers - INFO - [buddhi] βœ… Smart routing says respond: Primary responder: Direct request about past discussion requires curiosity and exploration
2026-02-09 17:34:36,489 - src.bot.team_router - INFO - πŸ’Ύ Got cached result after waiting: primary=buddhi
2026-02-09 17:34:36,489 - src.bot.manager - INFO - [kalpana] πŸ“Š Got decision: should_respond=True, primary=buddhi
2026-02-09 17:34:36,489 - src.bot.manager - INFO - [kalpana] ❌ Not primary, assigned to buddhi
2026-02-09 17:34:36,489 - src.bot.handlers - INFO - [kalpana] ❌ Not responding: Assigned to buddhi
2026-02-09 17:34:36,490 - src.bot.team_router - INFO - πŸ’Ύ Got cached result after waiting: primary=buddhi
2026-02-09 17:34:36,490 - src.bot.manager - INFO - [vivek] πŸ“Š Got decision: should_respond=True, primary=buddhi
2026-02-09 17:34:36,490 - src.bot.manager - INFO - [vivek] ❌ Not primary, assigned to buddhi
2026-02-09 17:34:36,490 - src.bot.handlers - INFO - [vivek] ❌ Not responding: Assigned to buddhi
2026-02-09 17:34:40,069 - src.memory.service - INFO - Retrieved 0 facts and 0 sessions for user 485126821
2026-02-09 17:34:41,917 - src.indexing.search - INFO - [REDACTED] search found 10 results for: can you tell me what we have discussed in the past...
2026-02-09 17:34:41,917 - src.indexing.search - INFO - Built [REDACTED] context: 4157 chars, 4 chunks
2026-02-09 17:35:47,726 - src.scheduler - INFO - Starting scheduled [REDACTED] indexing for: prashishh/...
2026-02-09 17:35:47,727 - src.indexing.sources.github_source - INFO - Fetching file tree from prashishh/...
2026-02-09 17:35:48,380 - src.indexing.sources.github_source - INFO - Found 819 files in repository
2026-02-09 17:35:48,402 - src.indexing.sources.github_source - INFO - Filtered to 62 indexable files
2026-02-09 17:35:48,402 - src.indexing.indexer - INFO - Starting indexing of 62 GitHub files (force=False)