Agent Architecture

Available Agents

Agent           Path                                  Description
deeptutor       ./src/deeptutor/__init__.py:graph     Socratic dialogue assistant with full middleware stack
seminar_agent   ./src/agent/__init__.py:graph         Legacy seminar agent
simple_agent    ./src/simple_agent/__init__.py:graph  Minimal agent for testing

Deeptutor (Primary Agent)

The deeptutor agent is the primary implementation with:

  • Middleware-based architecture for modularity
  • Two file systems: client-side (user's files) and server-side (agent memory)
  • Clarification tools for handling ambiguous user intent
  • Task tracking with TodoListMiddleware
  • Streaming payments via Cashu micropayments

See Deeptutor Architecture for full details.

Middleware Stack

  1. CashuPaymentMiddleware - Payment validation and per-iteration deduction
  2. TodoListMiddleware - Task tracking for complex operations
  3. ClarifyWithHumanMiddleware - Ask user for intent clarification
  4. FilesystemMiddleware - Server-side ephemeral storage (StateBackend)
  5. ClientToolsMiddleware - Client-side file operations via interrupts
  6. HumanInTheLoopMiddleware - Approval for funding requests

Deepagents Reference

The deepagents/ directory contains a reference implementation of the deepagents library, which provides:

  • FilesystemMiddleware - File tools with backend abstraction
  • TodoListMiddleware - Task tracking (also available from langchain)
  • SubAgentMiddleware - Spawn subagents for complex tasks
  • StateBackend / StoreBackend - Storage backends

Note: This is included for reference only. The actual deepagents package should be installed separately via pip:

pip install -e ./deepagents/libs/deepagents

Deeptutor Agent Architecture

The Deeptutor agent is a Socratic dialogue assistant that helps users develop and refine arguments. It uses a middleware-based architecture for modularity and extensibility.

Overview

┌─────────────────────────────────────────────────────────────────┐
│                        Deeptutor Agent                          │
├─────────────────────────────────────────────────────────────────┤
│  Middleware Stack (processed in order)                          │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 1. CashuPaymentMiddleware  - Payment validation          │   │
│  │ 2. TodoListMiddleware      - Task tracking               │   │
│  │ 3. ClarifyWithHumanMiddleware - User clarification       │   │
│  │ 4. FilesystemMiddleware    - Server-side memory          │   │
│  │ 5. ClientToolsMiddleware   - Client-side file ops        │   │
│  │ 6. HumanInTheLoopMiddleware - Approval workflows         │   │
│  └──────────────────────────────────────────────────────────┘   │
├─────────────────────────────────────────────────────────────────┤
│                      Two File Systems                           │
│  ┌────────────────────┐    ┌────────────────────────────────┐   │
│  │ Server-side        │    │ Client-side (Browser)          │   │
│  │ (StateBackend)     │    │ (IndexedDB)                    │   │
│  │                    │    │                                │   │
│  │ /scratch/          │    │ User's project files:          │   │
│  │ /summaries/        │    │ - Scraped articles             │   │
│  │ /analysis/         │    │ - Drafts and notes             │   │
│  │                    │    │ - Seminar documents            │   │
│  │ Agent writes freely│    │ Writes require approval        │   │
│  └────────────────────┘    └────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Middleware Stack

1. CashuPaymentMiddleware

Handles streaming micropayments with per-iteration deduction.

  • Validates Cashu tokens without immediate redemption
  • Deducts configurable satoshis per LLM iteration
  • Interrupts for additional funding when exhausted
  • Generates refund tokens for unused balance
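
The per-iteration accounting described above can be sketched in plain Python. This is a minimal illustration, not the middleware's actual code: the PaymentLedger class, its method names, and the 5-sat cost are hypothetical, and real Cashu token validation and refund-token minting are left out.

```python
from dataclasses import dataclass


class FundsExhausted(Exception):
    """Hypothetical signal that the balance cannot cover the next iteration."""


@dataclass
class PaymentLedger:
    """Hypothetical stand-in for the payment fields kept in agent state."""

    balance_sats: int
    cost_per_iteration_sats: int
    spent_sats: int = 0

    def charge_iteration(self) -> None:
        # Deduct before each LLM iteration; the caller would interrupt
        # for additional funding when this raises.
        if self.balance_sats < self.cost_per_iteration_sats:
            raise FundsExhausted(
                f"need {self.cost_per_iteration_sats} sats, have {self.balance_sats}"
            )
        self.balance_sats -= self.cost_per_iteration_sats
        self.spent_sats += self.cost_per_iteration_sats

    def refundable_sats(self) -> int:
        # Whatever was never spent is what a refund token would cover.
        return self.balance_sats


ledger = PaymentLedger(balance_sats=12, cost_per_iteration_sats=5)
ledger.charge_iteration()
ledger.charge_iteration()
try:
    ledger.charge_iteration()
    exhausted = False
except FundsExhausted:
    exhausted = True
```

With a 12-sat balance and a 5-sat cost, two iterations succeed and the third triggers the funding request, leaving 2 sats refundable.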

2. TodoListMiddleware

Provides task tracking for complex multi-step operations.

Tool: write_todos(todos: List[Todo])

Use cases:

  • Multi-step research tasks
  • Argument development workflows
  • Complex document creation
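
A call to write_todos might look like the following. The Todo shape used here (a content string plus a status of pending, in_progress, or completed) is an assumption for illustration; this page only gives the signature write_todos(todos: List[Todo]). The function body is a stand-in that overwrites the list in a plain dict, not the middleware's implementation.

```python
from typing import List, Literal, TypedDict


class Todo(TypedDict):
    # Assumed fields; check the installed TodoListMiddleware for the real shape.
    content: str
    status: Literal["pending", "in_progress", "completed"]


def write_todos(state: dict, todos: List[Todo]) -> str:
    """Stand-in tool: replace the todo list held in agent state."""
    state["todos"] = list(todos)
    done = sum(1 for t in todos if t["status"] == "completed")
    return f"{done}/{len(todos)} todos completed"


state: dict = {}
summary = write_todos(
    state,
    [
        {"content": "Collect sources on inflation", "status": "completed"},
        {"content": "Outline the argument", "status": "in_progress"},
        {"content": "Draft the introduction", "status": "pending"},
    ],
)
```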

3. ClarifyWithHumanMiddleware

Allows the agent to ask clarifying questions when user intent is unclear.

Tools:

  • ask_user(question) - Free-form natural language question
  • ask_choices(question, options, allow_multiple?, allow_freeform?) - Structured choices

When to use:

  • User's goal or intent is ambiguous
  • Multiple valid interpretations exist
  • User preferences would significantly change the approach

When NOT to use:

  • Asking how to use its own tools
  • Confirming obvious next steps
  • Asking questions that add delay without adding value

4. FilesystemMiddleware (StateBackend)

Provides server-side ephemeral storage for agent working memory.

Tools: ls, read_file, write_file, patch_file, glob, grep

Use for:

  • Intermediate analysis and notes
  • Drafts before presenting to user
  • Research findings within a session

Files persist within a thread but not across threads.
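
The thread scoping can be pictured with a small in-memory model. This is only a sketch: the ThreadScopedFiles class and its methods are hypothetical and do not reflect the StateBackend API; they only show that each thread sees its own files.

```python
from collections import defaultdict
from typing import Dict, List


class ThreadScopedFiles:
    """Hypothetical model: one independent file map per thread ID."""

    def __init__(self) -> None:
        self._files: Dict[str, Dict[str, str]] = defaultdict(dict)

    def write_file(self, thread_id: str, path: str, content: str) -> None:
        self._files[thread_id][path] = content

    def read_file(self, thread_id: str, path: str) -> str:
        try:
            return self._files[thread_id][path]
        except KeyError:
            # A file written in another thread is simply not visible here.
            raise FileNotFoundError(path) from None

    def ls(self, thread_id: str) -> List[str]:
        return sorted(self._files[thread_id])


store = ThreadScopedFiles()
store.write_file("thread-a", "/scratch/notes.md", "Claim 1 needs a source.")
files_in_a = store.ls("thread-a")
files_in_b = store.ls("thread-b")
```

Here thread-a lists its note while thread-b lists nothing, which mirrors the "within a thread but not across threads" rule.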

5. ClientToolsMiddleware

Provides access to user's project files stored in the browser.

Read & Write tools (auto-approved):

  • list_files(file_type?) - List files with optional filters
  • read_file(file_id) - Read file content
  • search_files(query, top_k?) - Semantic search
  • grep_files(pattern, glob_pattern?, case_sensitive?) - Pattern search
  • glob_files(pattern) - Find files by name pattern
  • write_file(title, content, file_type) - Create new file
  • patch_file(file_id, search, replace, description) - Edit file
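
The search/replace signature of patch_file suggests an edit along the following lines. This is a sketch, not the code in tool-executor.ts: the apply_patch helper is hypothetical, it is written in Python rather than the frontend's TypeScript, and the rule that the search text must match exactly once is an assumption.

```python
def apply_patch(content: str, search: str, replace: str) -> str:
    """Replace a single occurrence of `search`, refusing ambiguous patches."""
    count = content.count(search)
    if count == 0:
        raise ValueError("search text not found")
    if count > 1:
        # Assumed policy: an ambiguous match is rejected rather than guessed at.
        raise ValueError(f"search text matches {count} times; make it more specific")
    return content.replace(search, replace, 1)


patched = apply_patch("Sound money is scarce.", "scarce", "hard to debase")
```

Rejecting zero or multiple matches keeps an edit from silently landing in the wrong place, at the cost of asking the agent for a more specific search string.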

6. HumanInTheLoopMiddleware

Handles approval workflows for payment funding requests.

Interrupt Flow

Agent calls tool
         │
         ▼
┌──────────────────┐
│ Is it a client   │──No──► Execute normally
│ or clarify tool? │
└────────┬─────────┘
         │Yes
         ▼
┌──────────────────┐
│ interrupt()      │
│ Pause execution  │
└────────┬─────────┘
         ▼
┌──────────────────┐
│ Frontend handles │
│ - Renders UI     │
│ - Gets user input│
│ - Executes tool  │
└────────┬─────────┘
         ▼
┌──────────────────┐
│ Resume with      │
│ tool result      │
└────────┬─────────┘
         ▼
Agent continues
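
The routing decision in this flow can be sketched as a single function. The tool-name sets, the route_tool_call helper, and the shape of the returned dictionaries are assumptions for illustration; the real middleware pauses the graph with interrupt() rather than returning a value.

```python
from typing import Any, Callable, Dict

# Tool names come from the lists on this page. Names that also exist on the
# server-side file system (read_file, write_file, patch_file) are left out
# so the sketch stays unambiguous.
CLIENT_TOOLS = {"list_files", "search_files", "grep_files", "glob_files"}
CLARIFY_TOOLS = {"ask_user", "ask_choices"}


def route_tool_call(
    name: str,
    args: Dict[str, Any],
    server_tools: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Return an interrupt payload for frontend-handled tools, else run the tool."""
    if name in CLARIFY_TOOLS:
        return {"interrupt": {"type": "clarification_request", "tool": name, **args}}
    if name in CLIENT_TOOLS:
        return {"interrupt": {"type": "client_tool_execution", "tool": name, "args": args}}
    # Everything else executes normally on the server.
    return {"result": server_tools[name](**args)}


server_tools = {"ls": lambda path="/": ["/analysis/", "/scratch/", "/summaries/"]}
paused = route_tool_call("ask_user", {"question": "What topic?"}, server_tools)
ran = route_tool_call("ls", {}, server_tools)
```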

Design Considerations

Why Two File Systems?

  1. User autonomy: User's files stay in their browser, under their control
  2. Privacy: Scraped articles and drafts never leave the client unless explicitly shared
  3. Agent flexibility: Agent can freely write to its working memory without interrupting the user
  4. Session context: Agent can maintain analysis notes throughout a conversation

Why Clarification Tools?

Instead of making assumptions, the agent can:

  • Ask structured questions with predefined options
  • Request free-form clarification when needed
  • Avoid wasted effort from misunderstanding intent

The tools are designed NOT to be overused:

  • The system prompt discourages asking about tool usage
  • It encourages asking only when intent is genuinely ambiguous

Why Client-side Tool Execution?

  1. Latency: File operations happen locally, no round-trip to server
  2. Offline capability: Files work even if connection drops
  3. Data sovereignty: User's documents stay on their device
  4. Approval UX: Frontend can show rich diffs and approval dialogs

State Schema

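# Imports assumed from the standard LangChain / LangGraph packages.
from typing import Annotated, Sequence, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

# PaymentStatus is defined elsewhere in the project; its definition is not shown here.


class DeeptutorState(TypedDict, total=False):
    # Messages (required); add_messages merges new messages into the existing list
    messages: Annotated[Sequence[BaseMessage], add_messages]

    # Payment state
    payment_token: str | None
    payment_balance_sats: int
    payment_spent_sats: int
    payment_refund_token: str | None
    payment_status: PaymentStatus
    payment_refund_claimed: bool

    # Project context
    current_project_id: str | None

    # Middleware-added state
    # files: dict[str, FileData]  # Added by FilesystemMiddleware
    # todos: list[Todo]           # Added by TodoListMiddleware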

Example Prompts

Trigger Clarification Tools

"Help me with my argument"
→ Agent should ask: "What topic is your argument about?" or offer choices.

"I want to write about economics"  
→ Agent might ask choices: "Which aspect interests you? a) Monetary policy b) Market structures c) International trade d) Something else"

"Improve this"
→ Agent should ask: "What would you like me to improve? Style, clarity, argumentation, or something else?"

Normal Usage (No Clarification Needed)

"Create a new document titled 'Bitcoin Thesis' with an introduction about sound money"
→ Clear intent, agent proceeds with write_file.

"Search my files for mentions of inflation"
→ Clear intent, agent uses grep_files.

File Structure

agents/src/deeptutor/
├── __init__.py
├── graph.py              # Agent factory and configuration
├── state.py              # State type definitions
└── middleware/
    ├── __init__.py
    ├── payment.py        # CashuPaymentMiddleware
    ├── client_tools.py   # ClientToolsMiddleware
    └── clarify.py        # ClarifyWithHumanMiddleware

Frontend Integration

The frontend handles interrupts by:

  1. Detecting the interrupt type via the type field:
       • client_tool_execution → Execute tool locally
       • clarification_request → Show question UI
       • payment_exhausted → Show funding dialog
  2. Rendering appropriate UI:
       • Text input for ask_user
       • Choice buttons for ask_choices
       • Diff view for patch_file approval
  3. Resuming the graph with the response in the expected format

See frontend/src/lib/services/tool-executor.ts for tool execution implementation.

Clarification Flow

When the agent calls ask_user or ask_choices:

Agent calls ask_user("What topic?")
         ▼
ClarifyWithHumanMiddleware
         ▼
interrupt({type: "clarification_request", ...})
         ▼
┌────────────────────────────────────────┐
│ Frontend: langgraph.ts                 │
│ - Detects isClarificationInterrupt()   │
│ - Calls onClarificationInterrupt()     │
└────────────────────────────────────────┘
         ▼
┌────────────────────────────────────────┐
│ Frontend: agent.svelte.ts              │
│ - Stores clarificationInterrupt        │
│ - Sets awaitingHumanResponse = true    │
└────────────────────────────────────────┘
         ▼
┌────────────────────────────────────────┐
│ Frontend: ClarificationPanel.svelte    │
│ - Shows question text                  │
│ - For ask_user: textarea input         │
│ - For ask_choices: button options      │
│ - Optional freeform with choices       │
└────────────────────────────────────────┘
         ▼ (user responds)
┌────────────────────────────────────────┐
│ resumeWithClarificationResponse()      │
│ - Formats response as tool result      │
│ - Resumes graph with answer            │
└────────────────────────────────────────┘
         ▼
Agent receives user's answer as ToolMessage
         ▼
Agent continues with clarified intent
         ▼ (may trigger another interrupt)
┌────────────────────────────────────────┐
│ Chained Interrupt Handling:            │
│ - Another clarification (ask_user)     │
│ - Client tool (list_files, etc.)       │
│ - HITL approval (write operations)     │
└────────────────────────────────────────┘

Chained Interrupt Handling

After resuming from any interrupt type, the agent may immediately trigger another interrupt. All resume functions include callbacks for all interrupt types:

  • onClarificationInterrupt - Another clarification question
  • onClientToolInterrupt - Client-side tool execution needed
  • onHITLInterrupt - Human approval required

This allows seamless handling of multi-step interactions where the agent asks clarifying questions, then uses tools, then requests approvals, etc.
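
The chaining can be modelled as a loop that keeps resuming until the run finishes, dispatching on the interrupt type each time. The sketch below is written in Python for consistency with the backend code on this page; the actual frontend callbacks are TypeScript. The fake_agent generator, the drive helper, and the handler names are all hypothetical.

```python
from typing import Any, Callable, Dict, Generator, List

Interrupt = Dict[str, Any]


def fake_agent() -> Generator[Interrupt, Any, str]:
    """Hypothetical run: each yield is an interrupt, each send() is a resume."""
    topic = yield {"type": "clarification_request", "tool": "ask_user", "question": "What topic?"}
    files = yield {"type": "client_tool_execution", "tool": "search_files", "args": {"query": topic}}
    return f"Found {len(files)} file(s) about {topic}"


def drive(
    agent: Generator[Interrupt, Any, str],
    handlers: Dict[str, Callable[[Interrupt], Any]],
) -> str:
    """Resume repeatedly, whatever interrupt type comes next, until the run ends."""
    try:
        interrupt = next(agent)
        while True:
            response = handlers[interrupt["type"]](interrupt)
            interrupt = agent.send(response)
    except StopIteration as finished:
        return finished.value


seen: List[str] = []


def on_clarification(interrupt: Interrupt) -> str:
    seen.append(interrupt["type"])
    return "inflation"  # stands in for what the user typed


def on_client_tool(interrupt: Interrupt) -> List[str]:
    seen.append(interrupt["type"])
    return ["notes.md", "draft.md"]  # stands in for local search results


result = drive(
    fake_agent(),
    {
        "clarification_request": on_clarification,
        "client_tool_execution": on_client_tool,
    },
)
```

Because every resume goes back through the same dispatch table, a clarification followed by a client tool call needs no special casing.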

Interrupt Data Format

// ask_user interrupt
{
  type: 'clarification_request',
  tool: 'ask_user',
  tool_call_id: '12345',
  question: 'What topic would you like to write about?'
}

// ask_choices interrupt
{
  type: 'clarification_request',
  tool: 'ask_choices',
  tool_call_id: '12345',
  question: 'What type of document?',
  options: [
    { id: 'seminar', label: 'Socratic Seminar' },
    { id: 'essay', label: 'Essay' },
    { id: 'notes', label: 'Research Notes' }
  ],
  allow_multiple: false,
  allow_freeform: true
}

Response Format

// For ask_user
{ response: "I want to write about Bitcoin's monetary policy" }

// For ask_choices (single select)
{ selected: ['seminar'] }

// For ask_choices (with freeform)
{ selected: ['essay'], freeform: 'Specifically about inflation' }
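
Before a response is turned into a tool result, it can be checked against the interrupt that prompted it. The sketch below assumes the interrupt and response shapes shown above; the format_choice_response helper, its validation rules, and the rendered text format are hypothetical, not the behaviour of resumeWithClarificationResponse().

```python
from typing import Any, Dict, List


def format_choice_response(interrupt: Dict[str, Any], response: Dict[str, Any]) -> str:
    """Validate an ask_choices response and render it as tool-result text."""
    labels = {opt["id"]: opt["label"] for opt in interrupt["options"]}
    selected: List[str] = response.get("selected", [])
    freeform = response.get("freeform")

    unknown = [s for s in selected if s not in labels]
    if unknown:
        raise ValueError(f"unknown option id(s): {unknown}")
    if len(selected) > 1 and not interrupt.get("allow_multiple", False):
        raise ValueError("multiple selections are not allowed")
    if freeform and not interrupt.get("allow_freeform", False):
        raise ValueError("freeform text is not allowed")

    parts = [labels[s] for s in selected]
    if freeform:
        parts.append(freeform)
    return "; ".join(parts)


interrupt = {
    "type": "clarification_request",
    "tool": "ask_choices",
    "tool_call_id": "12345",
    "question": "What type of document?",
    "options": [
        {"id": "seminar", "label": "Socratic Seminar"},
        {"id": "essay", "label": "Essay"},
        {"id": "notes", "label": "Research Notes"},
    ],
    "allow_multiple": False,
    "allow_freeform": True,
}
text = format_choice_response(
    interrupt, {"selected": ["essay"], "freeform": "Specifically about inflation"}
)
```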