Session Reconnect

WebSocket connections drop. Agents keep running. Here's how reconnection works.

Key insight: The agent thread and its IO queues survive the WebSocket. When a client reconnects, the same queues are reattached to the new connection. The agent never knows the difference.

Architecture

Two layers handle session survival:

Two-layer session storage

┌─────────────────────────────────────┐
│ In-Memory (ActiveSessionRegistry)   │
│ Running agents, IO queues, threads  │
│ Cleaned after 10min idle            │
└──────────────┬──────────────────────┘
               │ save on completion
┌──────────────▼──────────────────────┐
│ Disk (.co/session_results.jsonl)    │
│ Final results for polling recovery  │
│ Expires after 24h                   │
└─────────────────────────────────────┘

In-Memory

Keeps the agent thread and IO queues alive so a reconnecting client resumes mid-execution.

Disk (JSONL)

Stores final results so a client that never reconnects can poll later.

Session Lifecycle

State transitions

register()
    │
    ▼
 RUNNING ──────────────────────► COMPLETED
    │            agent finishes       │
    │                                 │
    ▼ client disconnects              │ 10min idle
 SUSPENDED                           ▼
    │                              REMOVED
    │ client reconnects
    ▼
 RUNNING (same IO queues)

Transition     Trigger                               What happens
──────────     ───────                               ────────────
→ RUNNING      register()                            Agent thread spawned, IO queues created
→ SUSPENDED    Client WebSocket drops                Agent keeps running, queues buffer events
→ RUNNING      Client reconnects (same session_id)   Same IO queues reattached to new WebSocket
→ COMPLETED    Agent finishes                        Result saved to JSONL, session stays in memory
→ REMOVED      10min idle (no client ping)           Freed from memory
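
The transitions above can be sketched as a small in-memory registry. This is a minimal illustration, not the actual ActiveSessionRegistry: the `Session` fields, the `SessionRegistry` class name, and the `resume()` helper are assumptions made for the example, and thread spawning is left out.

```python
import queue
import time
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical record; the real registry's fields are not shown in the source."""
    session_id: str
    status: str = "running"
    outgoing: queue.Queue = field(default_factory=queue.Queue)
    incoming: queue.Queue = field(default_factory=queue.Queue)
    last_ping: float = field(default_factory=time.time)


class SessionRegistry:
    """Illustrative stand-in for ActiveSessionRegistry."""

    def __init__(self):
        self._sessions = {}

    def register(self, session_id):
        # -> RUNNING. The real register() is also where the agent thread is spawned.
        session = Session(session_id)
        self._sessions[session_id] = session
        return session

    def get(self, session_id):
        return self._sessions.get(session_id)

    def mark_suspended(self, session_id):
        # -> SUSPENDED. Only a running session is suspended; a completed one stays completed.
        session = self._sessions[session_id]
        if session.status == "running":
            session.status = "suspended"

    def resume(self, session_id):
        # -> RUNNING on reconnect (hypothetical helper): same Session object, same queues.
        session = self._sessions.get(session_id)
        if session is not None:
            if session.status == "suspended":
                session.status = "running"
            session.last_ping = time.time()
        return session

    def mark_completed(self, session_id):
        # -> COMPLETED. The session stays in memory until the idle sweep removes it.
        self._sessions[session_id].status = "completed"
```

The point of the sketch is that `resume()` returns the object created by `register()`, so whatever the agent buffered while the client was away is still in `outgoing`.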

Reconnection Flow

Timeline: connect → disconnect → reconnect → finish

Time   Client              WebSocket Handler    Agent Thread
────   ──────              ─────────────────    ────────────
T+0    INPUT ─────────────► accept
                            register()
                            spawn thread ───────► agent.input() starts

T+5                        ◄─────────────────── io.send(thinking)
       ◄── thinking ────────

T+15                       ◄─────────────────── io.send(approval_needed)
       ◄── approval_needed─                     io.receive() BLOCKS
                                                 waiting for response...

T+20   ✕ DISCONNECT         mark_suspended()
                            (queues stay alive)   (still blocked)

T+25   RECONNECT ──────────► registry.get() → FOUND
                             drain queued events
       ◄── queued events ───
                             update_ping()
                             pump same IO queues
       approve ────────────► io._incoming.put() ► io.receive() unblocks
                                                   agent continues...

T+35                        ◄─────────────────── agent finishes
                             mark_completed()
                             save to JSONL
       ◄── OUTPUT ──────────

What happened: The agent asked for approval at T+15 and blocked waiting for a reply. The client disconnected at T+20; the agent stayed blocked while its events buffered in the queue. The client reconnected at T+25, received the buffered events, and sent its approval. The agent then unblocked and finished normally.
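
The timeline can be reproduced with two plain `queue.Queue` objects and one thread. This is a simulation under stated assumptions, not the handler's real code: `agent()`, `drain_queued()`, `simulate()`, and the `blocked` event are hypothetical names, and the event exists only to make the demo deterministic.

```python
import queue
import threading


def agent(outgoing, incoming, blocked, result):
    """Stand-in for the agent thread: emit events, then wait for approval."""
    outgoing.put({"type": "thinking"})
    outgoing.put({"type": "approval_needed"})
    blocked.set()                 # demo-only signal: the agent is about to block
    reply = incoming.get()        # blocks here, like io.receive() at T+15
    result["output"] = f"done ({reply})"


def drain_queued(outgoing):
    """Collect everything the agent buffered while no client was attached."""
    events = []
    while True:
        try:
            events.append(outgoing.get_nowait())
        except queue.Empty:
            return events


def simulate():
    outgoing, incoming = queue.Queue(), queue.Queue()
    blocked, result = threading.Event(), {}
    worker = threading.Thread(target=agent, args=(outgoing, incoming, blocked, result))
    worker.start()
    # T+20: the client is gone. Nobody reads `outgoing`, so events pile up.
    blocked.wait(timeout=5)
    # T+25: the client reconnects and the handler replays the backlog.
    buffered = drain_queued(outgoing)
    # The client answers the pending approval; the agent unblocks and finishes.
    incoming.put("approve")
    worker.join(timeout=5)
    return buffered, result
```

Calling `simulate()` returns the two buffered events in order plus the agent's final output, which mirrors the T+25 and T+35 rows above.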

IO Queue Bridge

The agent runs in a sync thread. The WebSocket handler is async. Two thread-safe queues bridge them:

WebSocketIO — async/sync bridge

┌────────────────────┐          ┌────────────────────┐
│  Agent Thread      │          │  WebSocket Handler │
│  (sync Python)     │          │  (async ASGI)      │
│                    │          │                    │
│  io.send(event) ──►│─outgoing─│►── ws.send(event)  │
│                    │  queue   │                    │
│  io.receive()   ◄──│─incoming─│◄── ws.receive()    │
│  (blocks)          │  queue   │                    │
└────────────────────┘          └────────────────────┘

On disconnect

io.close() puts a sentinel in the incoming queue, unblocking any waiting receive().

On reconnect

The same io object is reused. A new WebSocket handler pumps the same queues.
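
A minimal sketch of such a bridge, assuming standard-library `queue.Queue` objects on both sides. `QueueIO` and `pump_outgoing()` are illustrative names, not the real WebSocketIO API, and raising `ConnectionError` when the sentinel arrives is an assumption; the source only says the sentinel unblocks `receive()`.

```python
import asyncio
import queue

_CLOSED = object()  # sentinel placed on the incoming queue by close()


class QueueIO:
    """Illustrative sync/async bridge built on two thread-safe queues."""

    def __init__(self):
        self._outgoing = queue.Queue()   # agent -> client
        self._incoming = queue.Queue()   # client -> agent

    def send(self, event):
        # Called from the agent thread; never blocks on the network.
        self._outgoing.put(event)

    def receive(self, timeout=None):
        # Called from the agent thread; blocks until a message or the sentinel arrives.
        item = self._incoming.get(timeout=timeout)
        if item is _CLOSED:
            raise ConnectionError("io closed")  # assumed behaviour, see lead-in
        return item

    def close(self):
        self._incoming.put(_CLOSED)


async def pump_outgoing(io, ws_send, count):
    """Async side: forward `count` queued events to the socket's send coroutine."""
    for _ in range(count):
        # queue.Queue.get blocks, so run it in a worker thread to keep the loop free.
        event = await asyncio.to_thread(io._outgoing.get)
        await ws_send(event)
```

Because the queues live on the `io` object rather than on the connection, a new handler can call `pump_outgoing()` on the same `io` after a reconnect and pick up where the old one stopped.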

Keep-Alive

Server sends PING every 30s. Client responds with PONG. Each message updates last_ping in the registry.

PING/PONG heartbeat

Client                    Server
  │                         │
  │◄──── PING ──────────────│  every 30s
  │───── PONG ─────────────►│  update last_ping
  │                         │
  │◄──── PING ──────────────│
  │───── PONG ─────────────►│  update last_ping
  │                         │
  │  ✕ disconnect            │
  │                         │  last_ping freezes
  │                         │  idle timer starts
  │                         │  ...
  │                         │  10min idle → cleanup
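
The heartbeat can be sketched as an asyncio loop. This is an illustration under assumptions: `heartbeat()`, its `send`/`receive` callables, the dict-shaped `session`, and the `max_pings` parameter (added so the loop can be stopped in a demo) are all hypothetical, and any inbound message is treated as refreshing `last_ping`.

```python
import asyncio
import time

PING_INTERVAL = 30.0  # seconds, per the text above


async def heartbeat(send, receive, session, interval=PING_INTERVAL, max_pings=None):
    """Send PING on a fixed cadence and refresh session['last_ping'] on each reply."""
    sent = 0
    while max_pings is None or sent < max_pings:
        await send({"type": "PING"})
        sent += 1
        try:
            await asyncio.wait_for(receive(), timeout=interval)
        except asyncio.TimeoutError:
            continue  # no reply within the interval: last_ping stays frozen
        session["last_ping"] = time.time()
        await asyncio.sleep(interval)
```

When the client stops answering, `last_ping` simply stops advancing, and that frozen timestamp is what the idle check measures against.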

Session Cleanup

One rule for all non-running sessions:

Cleanup rule

             status != 'running'
             AND idle > 10min
                   │
                   ▼
          ┌────────────────┐
          │ REMOVE from    │
          │ registry       │
          │ (memory freed) │
          └────────────────┘

No special cases. Completed and suspended sessions follow the same rule.

Results already on disk. JSONL storage has the final result.

Client can still poll. GET /sessions/{id} works for 24h.

Background job. Runs every 60s to sweep expired sessions.
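
The cleanup rule fits in one function. A minimal sketch, assuming sessions are held in a dict of plain dicts with `status` and `last_ping` keys; that shape and the `now` and `idle_limit` parameters are assumptions for illustration, and only the name `cleanup_expired` comes from this page.

```python
import time

IDLE_LIMIT = 10 * 60  # seconds, the 10min idle threshold above


def cleanup_expired(sessions, now=None, idle_limit=IDLE_LIMIT):
    """Remove every non-running session idle longer than idle_limit; return removed ids."""
    now = time.time() if now is None else now
    expired = [
        session_id
        for session_id, session in sessions.items()
        if session["status"] != "running" and now - session["last_ping"] > idle_limit
    ]
    # Delete after the scan so the dict is not mutated while iterating.
    for session_id in expired:
        del sessions[session_id]
    return expired
```

A background task would call this every 60s. Passing `now` explicitly keeps the rule easy to test without waiting ten minutes.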

Recovery Without Reconnect

If the client never comes back:

Polling recovery after disconnect

Client gone                 Server
                              │
                              │  agent finishes
                              │  save result to .co/session_results.jsonl
                              │  mark_completed()
                              │
                              │  ... 10min idle ...
                              │
                              │  cleanup_expired() → removed from memory
                              │
                              │  (result still on disk for 24h)
                              │
Client returns (hours later)  │
  │                           │
  │── GET /sessions/{id} ────►│  read from JSONL
  │◄── result ────────────────│

No data loss. The JSONL file is the durable record.
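
A sketch of the durable layer, assuming one JSON object per line with a save timestamp. The record fields (`session_id`, `result`, `saved_at`) and the function names `save_result()` and `load_result()` are hypothetical; the actual SessionStorage format is not shown here.

```python
import json
import time
from pathlib import Path

RESULT_TTL = 24 * 60 * 60  # seconds, the 24h expiry above


def save_result(path, session_id, result, now=None):
    """Append one result record to the JSONL file, creating it if needed."""
    record = {
        "session_id": session_id,
        "result": result,
        "saved_at": time.time() if now is None else now,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_result(path, session_id, now=None):
    """Return the latest unexpired result for session_id, or None."""
    now = time.time() if now is None else now
    path = Path(path)
    if not path.exists():
        return None
    found = None
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            if record["session_id"] == session_id:
                found = record  # later lines win
    if found is None or now - found["saved_at"] > RESULT_TTL:
        return None
    return found["result"]
```

A polling endpoint such as GET /sessions/{id} could then be a thin wrapper around `load_result(".co/session_results.jsonl", session_id)`.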

Session Merge

When a client reconnects and both sides have session state, merge_sessions() resolves the conflict using iteration count (incremented on each LLM call):

Iteration-based conflict resolution

Client (stale)              Server (continued)
iteration: 5                iteration: 10
    │                           │
    └───────────┬───────────────┘
                │ merge_sessions()
                ▼
          server wins (higher iteration)
          → use server session state

Scenario                                Resolution
────────                                ──────────
Server continued (iteration 10 vs 5)    Server wins
Client newer (iteration 8 vs 3)         Client wins
Tie (same iteration)                    Higher timestamp wins
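
The resolution rules can be written as a short comparison. A sketch only: the two-argument signature, the dict-shaped session state, and the `iteration` and `timestamp` keys are assumptions, and falling back to the server when both values tie is a choice made for the example.

```python
def merge_sessions(client, server):
    """Pick the session state with the higher iteration; break ties by timestamp."""
    if client["iteration"] != server["iteration"]:
        return client if client["iteration"] > server["iteration"] else server
    # Same iteration: the newer timestamp wins (server on an exact tie, by assumption).
    return client if client["timestamp"] > server["timestamp"] else server
```

For the diagram's case, a client at iteration 5 and a server at iteration 10, the function returns the server's state unchanged.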

Key Files

File                              Role
────                              ────
network/host/session/active.py    ActiveSessionRegistry: in-memory session tracking
network/io/websocket.py           WebSocketIO: queue bridge between async and sync
network/host/session/storage.py   SessionStorage: JSONL persistence
network/host/session/merge.py     Session merge conflict resolution
network/asgi/websocket.py         WebSocket handler: orchestrates reconnection
