
Agent-Driven RAG Architecture

This document describes the client-side RAG (Retrieval-Augmented Generation) architecture, in which the LangGraph agent drives book retrieval through tool calls executed in the browser.

Overview

┌─────────────────────┐                    ┌─────────────────────┐
│   Svelte Frontend   │◄──────────────────►│  LangGraph Agent    │
│                     │   Direct via SDK    │  (Port 2024)        │
│  - Chat UI          │                    │                     │
│  - Tool Executor    │                    │  - AI Reasoning     │
│  - Vector Service   │                    │  - Tool Calls       │
│  - EPUB Service     │                    │  - Payment Verify   │
└─────────────────────┘                    └─────────────────────┘
┌─────────────────────┐
│  FastAPI Backend    │  (Wallet only)
│  (Port 8000)        │
│  - Ecash Receive    │
│  - Balance Check    │
└─────────────────────┘

How It Works

1. Agent Tools with Interrupt

The LangGraph agent has tools that require client-side data:

Tool                       Purpose                Client Execution
get_table_of_contents()    Book structure         epubService.getTableOfContentsForAgent()
get_book_metadata()        Title, author, pages   epubService.getMetadata()
get_chapter(chapter_id)    Full chapter text      epubService.getChapterText()
search_book(query)         Semantic search        vectorService.search()

The graph is compiled with interrupt_before=["tools"], causing execution to pause before any tool runs.
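
Because of this, a run that decides to call a tool stops with the tools node still pending. The app detects the pause from the message stream (shown later), but as a minimal illustration a client could also confirm it through the SDK's thread-state API; the helper name below is ours, not part of the codebase:

// Sketch: confirm the graph is paused in front of the tools node
import { Client } from '@langchain/langgraph-sdk';

export async function isPausedBeforeTools(client: Client, threadId: string): Promise<boolean> {
  const state = await client.threads.getState(threadId);
  // With interrupt_before=["tools"], "tools" is reported as the next node to run
  return Array.isArray(state.next) && state.next.includes('tools');
}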

2. Interrupt Flow

  1. User sends message
  2. Agent decides to call search_book("redemption theme")
  3. Graph INTERRUPTS (returns state with pending tool call)
  4. Frontend receives interrupt via stream
  5. Frontend executes search locally using vectorService
  6. Frontend calls client.threads.updateState() with tool result (message shape sketched after this list)
  7. Frontend resumes stream with null input
  8. Agent continues with search results in context
  9. Agent may call more tools or generate final response
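
For the example above, the pending tool call surfaced by the interrupt and the tool result written back in step 6 look roughly like this. The id and content values are illustrative, and the result uses LangChain's serialized tool-message shape, which the LangGraph server is assumed to accept when updating state:

// Illustrative shapes only - ids and content are made up
const pendingCall = {
  id: 'call_abc123',                       // generated by the model, echoed back below
  name: 'search_book',
  args: { query: 'redemption theme', top_k: 5 }
};

const toolResult = {
  type: 'tool',
  tool_call_id: pendingCall.id,            // must match so the agent pairs call and result
  content: '...top passages about redemption...'
};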

3. Local Vector Search

Books are indexed locally using:

  • Embeddings: @xenova/transformers (MiniLM-L6-v2, ~23MB)
  • Search: Custom cosine similarity over in-memory embeddings

Indexing happens on-demand when the agent first calls search_book().
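
A condensed sketch of what the vector service does, assuming the Xenova/all-MiniLM-L6-v2 checkpoint and mean-pooled, normalized embeddings (so cosine similarity reduces to a dot product); the function and variable names are illustrative, not the service's real API:

// Sketch: on-demand embedding + in-memory cosine-similarity search
import { pipeline } from '@xenova/transformers';

let embedder: any = null;
const index: { text: string; vector: number[] }[] = [];

// The ~23MB model is only downloaded the first time an embedding is needed
async function embed(text: string): Promise<number[]> {
  embedder ??= await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  const out = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(out.data as Float32Array);
}

export async function indexChunks(chunks: string[]): Promise<void> {
  for (const text of chunks) index.push({ text, vector: await embed(text) });
}

// Vectors are normalized, so the dot product equals cosine similarity
export async function search(query: string, topK = 5): Promise<string[]> {
  const q = await embed(query);
  const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);
  return index
    .map(c => ({ text: c.text, score: dot(q, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(c => c.text);
}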

Key Files

File                                             Purpose
agent/src/agent/graph.py                         Agent with tools and interrupt_before config
frontend/src/lib/services/langgraph.ts           SDK wrapper with interrupt handling loop
frontend/src/lib/services/agentToolsService.ts   Client-side tool dispatcher
frontend/src/lib/services/vectorService.ts       Embedding and vector search
frontend/src/lib/services/epubService.ts         EPUB text extraction
frontend/src/lib/stores/chat.svelte.ts           Chat state with tool status

Agent Tool Definitions

# agent/src/agent/graph.py

from langchain_core.tools import tool

@tool
def get_table_of_contents() -> str:
    """Get the table of contents for the current book.
    Returns chapter titles, IDs, and hierarchy."""
    return ""  # Never called - client executes via interrupt

@tool
def get_chapter(chapter_id: str) -> str:
    """Get the full text content of a specific chapter."""
    return ""

@tool
def search_book(query: str, top_k: int = 5) -> str:
    """Semantic search across the entire book."""
    return ""

# Graph compiled with interrupt
graph = builder.compile(interrupt_before=["tools"])

Frontend Interrupt Handler

// frontend/src/lib/services/langgraph.ts

// input starts as the user's message; maxIterations (10) bounds the tool-call loop
while (iteration < maxIterations) {
  iteration++;

  let interrupted = false;
  let pendingToolCalls: any[] = [];
  let lastMessage: any;

  const stream = client.runs.stream(threadId, assistantId, { input });

  for await (const event of stream) {
    // Track the latest message from the streamed state (assumes 'values' stream mode)
    if (event.event === 'values') {
      lastMessage = event.data.messages?.at(-1);
    }
  }

  if (lastMessage?.tool_calls?.length > 0) {
    // Interrupt detected - the graph paused before the tools node
    pendingToolCalls = lastMessage.tool_calls;
    interrupted = true;
  }

  if (!interrupted) break;

  // Execute the pending tools locally in the browser
  const results = await Promise.all(
    pendingToolCalls.map(tc => executeToolCall(tc, bookId))
  );

  // Write the results back into the thread as tool messages
  const toolResultMessages = results.map(r => ({
    type: 'tool',
    tool_call_id: r.id,
    content: typeof r.result === 'string' ? r.result : JSON.stringify(r.result)
  }));

  await client.threads.updateState(threadId, {
    values: { messages: toolResultMessages }
  });

  input = null;  // Resume from the interrupt on the next pass
}

Tool Execution Service

// frontend/src/lib/services/agentToolsService.ts

export async function executeToolCall(toolCall, bookId) {
  switch (toolCall.name) {
    case 'get_table_of_contents':
      return { id: toolCall.id, result: await epubService.getTableOfContentsForAgent() };
    case 'get_chapter':
      return { id: toolCall.id, result: await epubService.getChapterText(toolCall.args.chapter_id) };
    case 'search_book':
      return { id: toolCall.id, result: await vectorService.search(toolCall.args.query, bookId) };
    // ...
  }
}

Chat UI Status

The chat interface shows tool execution status:

  • "Searching book..." when search_book() is running
  • "Reading chapter..." when get_chapter() is running
  • "Reading book structure..." when get_table_of_contents() is running

This provides transparency about what the agent is doing.
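
A minimal sketch of how such labels can be derived from the pending tool call's name; the map and fallback text here are illustrative, not the chat store's actual API:

// Sketch: map a pending tool call to the status line shown while it runs
const TOOL_STATUS: Record<string, string> = {
  search_book: 'Searching book...',
  get_chapter: 'Reading chapter...',
  get_table_of_contents: 'Reading book structure...'
};

export function toolStatus(toolName: string): string {
  return TOOL_STATUS[toolName] ?? 'Working...';
}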

Benefits

  1. Agent Control: Agent decides what to retrieve based on the question
  2. Multi-Step Reasoning: Can search, analyze, search again
  3. Privacy: EPUB content stays in browser, only excerpts sent to LLM
  4. No Backend Chat: Chat goes directly to LangGraph, which keeps the architecture simpler
  5. Rich Context: Agent has full access to book via tools

Limitations

  1. Latency: Each tool call adds a round-trip
  2. Embedding Model Download: ~23MB on first use
  3. Memory Usage: Large books with many chunks use browser memory
  4. Max Iterations: Capped at 10 tool calls per message to prevent loops

Configuration

Frontend Environment

# frontend/.env
VITE_LANGGRAPH_API_URL=http://localhost:2024
VITE_LANGGRAPH_ASSISTANT_ID=agent
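
For reference, a minimal sketch of how these variables could feed the SDK client (the real wrapper lives in langgraph.ts and may differ; the example question is made up):

// Sketch: build the LangGraph client from the Vite env vars above
import { Client } from '@langchain/langgraph-sdk';

const client = new Client({ apiUrl: import.meta.env.VITE_LANGGRAPH_API_URL });
const assistantId = import.meta.env.VITE_LANGGRAPH_ASSISTANT_ID;

// One thread per conversation; runs are streamed against the configured assistant
const thread = await client.threads.create();
const stream = client.runs.stream(thread.thread_id, assistantId, {
  input: { messages: [{ type: 'human', content: 'What is this chapter about?' }] }
});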

Agent Environment

# agent/.env
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama
LLM_MODEL=llama3.2

See Also