Engineering · December 15, 2025 · 12 min read

Building a Real-Time Collaboration Engine with WebSockets

A deep dive into how we built Tapioca's real-time collaboration features using WebSockets, including presence indicators, live updates, and conflict resolution.

Alex Chen

Lead Engineer

Real-time collaboration has become an essential feature of modern productivity tools. In this post, we'll walk through how we built Tapioca's real-time engine using WebSockets in Go, including the challenges we faced and the solutions we developed.

The Challenge

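When multiple users work on the same project simultaneously, they need to see each other's changes instantly. Traditional HTTP polling falls short on both counts: an update has to wait for the next poll before anyone sees it, and most polls return nothing while still costing a full request. WebSockets provide a persistent, bidirectional connection, so the server can push a change the moment it happens.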

Our requirements were:

  • Sub-100ms latency for updates
  • Presence indicators showing who's online and what they're viewing
  • Graceful handling of network interruptions
  • Scalability across multiple server instances
  • Conflict resolution for simultaneous edits

Architecture Overview

Our real-time system consists of three main components:

1. WebSocket Hub

The Hub manages all active connections and routes messages between them. Each connection is associated with a user and subscribes to specific "rooms" (projects, tasks, etc.).

type Hub struct {
    // Registered connections by room
    rooms map[string]map[*Connection]bool

    // Register requests from connections
    register chan *Connection

    // Unregister requests
    unregister chan *Connection

    // Inbound messages to broadcast
    broadcast chan *Message

    // Redis pub/sub for multi-instance support
    redis *redis.Client
}

func (h *Hub) Run() {
    for {
        select {
        case conn := <-h.register:
            h.addToRoom(conn)
        case conn := <-h.unregister:
            h.removeFromRoom(conn)
        case msg := <-h.broadcast:
            h.broadcastToRoom(msg)
        }
    }
}
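
The Run loop above calls addToRoom, removeFromRoom, and broadcastToRoom without showing them. As a rough illustration of room-based routing, here is a minimal sketch. The Client and Rooms types and their method names are hypothetical stand-ins rather than Tapioca's actual code, and the buffered Send channel plays the role of Connection.send.

```go
package main

import "fmt"

// Client is a hypothetical stand-in for the article's Connection; only the
// fields needed for routing are included.
type Client struct {
    UserID string
    Send   chan []byte
}

// Rooms maps a room ID to the set of clients subscribed to it.
type Rooms map[string]map[*Client]bool

// Add subscribes c to room, creating the room on first use.
func (r Rooms) Add(room string, c *Client) {
    if r[room] == nil {
        r[room] = make(map[*Client]bool)
    }
    r[room][c] = true
}

// Remove unsubscribes c and deletes the room once it is empty, so idle
// rooms do not accumulate in the map.
func (r Rooms) Remove(room string, c *Client) {
    delete(r[room], c)
    if len(r[room]) == 0 {
        delete(r, room)
    }
}

// Broadcast queues payload for every client in room and returns how many
// clients accepted it. A client whose buffer is full is skipped so that one
// slow consumer cannot block the whole hub.
func (r Rooms) Broadcast(room string, payload []byte) int {
    delivered := 0
    for c := range r[room] {
        select {
        case c.Send <- payload:
            delivered++
        default:
            // Buffer full: a production hub would probably drop this client.
        }
    }
    return delivered
}

func main() {
    rooms := Rooms{}
    alice := &Client{UserID: "alice", Send: make(chan []byte, 1)}
    bob := &Client{UserID: "bob", Send: make(chan []byte, 1)}
    rooms.Add("project:1", alice)
    rooms.Add("project:2", bob)

    n := rooms.Broadcast("project:1", []byte(`{"type":"task:update"}`))
    fmt.Println("delivered to", n, "client(s)")
}
```

Because only the Run goroutine would touch the map in this design, the sketch needs no mutex; that holds only as long as all room changes go through the hub's channels.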

2. Connection Manager

Each WebSocket connection is wrapped in a Connection struct that handles message encoding, heartbeats, and reconnection logic.

type Connection struct {
    hub      *Hub
    conn     *websocket.Conn
    send     chan []byte
    userID   string
    rooms    []string
    lastPing time.Time
}

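// writePump forwards queued messages to the socket and sends a ping every
// pingPeriod. It exits, closing the socket, as soon as a write fails.
func (c *Connection) writePump() {
    ticker := time.NewTicker(pingPeriod)
    defer func() {
        ticker.Stop()
        c.conn.Close()
    }()

    for {
        select {
        case message, ok := <-c.send:
            if !ok {
                // The send channel was closed: tell the peer we are done.
                c.conn.WriteMessage(websocket.CloseMessage, []byte{})
                return
            }
            if err := c.conn.WriteMessage(websocket.TextMessage, message); err != nil {
                return // the connection is no longer usable
            }
        case <-ticker.C:
            if err := c.conn.WriteMessage(websocket.PingMessage, nil); err != nil {
                return
            }
        }
    }
}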
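
Sending pings is only half of a heartbeat; the other half is deciding when a silent peer should be treated as dead. The sketch below shows one common way to relate the two timers. The pongWait value and the isStale helper are assumptions made for illustration, and the 9/10 ratio is a widely used convention rather than something taken from our codebase.

```go
package main

import (
    "fmt"
    "time"
)

const (
    // pongWait is an assumed value: the longest silence we tolerate
    // before treating the peer as gone.
    pongWait = 60 * time.Second

    // pingPeriod must be shorter than pongWait so that at least one ping
    // goes out before the deadline expires.
    pingPeriod = (pongWait * 9) / 10
)

// isStale reports whether a peer that has been silent for the given
// duration has exceeded the pong deadline.
func isStale(silence time.Duration) bool {
    return silence > pongWait
}

func main() {
    // Pretend the last pong arrived 90 seconds ago.
    lastPong := time.Now().Add(-90 * time.Second)

    fmt.Println("ping every:", pingPeriod)
    fmt.Println("stale:", isStale(time.Since(lastPong)))
}
```

A check like this could run against the lastPing field on Connection, updated whenever the peer answers.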

3. Redis Pub/Sub Bridge

To support horizontal scaling, we use Redis pub/sub to broadcast messages across server instances. When a message arrives on one server, it's published to Redis and all other servers forward it to their connected clients.
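
To make the fan-out concrete, here is a minimal sketch of the bridge logic with an in-memory Bus standing in for Redis, so it runs without a server. Every name here (Envelope, Bus, Instance, HandleLocal, onRemote) is hypothetical. The one detail worth noticing is the Origin field: a publisher usually receives its own messages back from pub/sub, so each instance has to skip its own echo to avoid delivering twice.

```go
package main

import "fmt"

// Envelope wraps a room message with the ID of the instance that published it.
type Envelope struct {
    Origin  string
    Room    string
    Payload string
}

// Bus is a hypothetical in-memory stand-in for a Redis pub/sub channel.
// Publish calls every subscriber synchronously, including the publisher's own.
type Bus struct {
    subscribers []func(Envelope)
}

func (b *Bus) Subscribe(fn func(Envelope)) {
    b.subscribers = append(b.subscribers, fn)
}

func (b *Bus) Publish(e Envelope) {
    for _, fn := range b.subscribers {
        fn(e)
    }
}

// Instance models one server process and records what it hands to its
// locally connected clients.
type Instance struct {
    ID        string
    bus       *Bus
    Delivered []string
}

func NewInstance(id string, bus *Bus) *Instance {
    inst := &Instance{ID: id, bus: bus}
    bus.Subscribe(inst.onRemote)
    return inst
}

// HandleLocal runs when a message arrives from a client connected to this
// instance: deliver it locally, then publish it for the other instances.
func (i *Instance) HandleLocal(room, payload string) {
    i.Delivered = append(i.Delivered, payload)
    i.bus.Publish(Envelope{Origin: i.ID, Room: room, Payload: payload})
}

// onRemote forwards messages published by other instances and ignores the
// echo of messages this instance published itself.
func (i *Instance) onRemote(e Envelope) {
    if e.Origin == i.ID {
        return
    }
    i.Delivered = append(i.Delivered, e.Payload)
}

func main() {
    bus := &Bus{}
    a := NewInstance("server-a", bus)
    b := NewInstance("server-b", bus)

    a.HandleLocal("project:1", "task updated")
    fmt.Println("server-a delivered:", len(a.Delivered))
    fmt.Println("server-b delivered:", len(b.Delivered))
}
```

With a real Redis client, Publish and Subscribe would become network calls and onRemote would run in a subscriber goroutine, but the origin check would stay the same.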

Presence System

Users want to know who else is viewing the same task or project. Our presence system tracks this in real time.

type PresenceUpdate struct {
    UserID    string    `json:"userId"`
    Name      string    `json:"name"`
    Avatar    string    `json:"avatar"`
    Room      string    `json:"room"`
    Status    string    `json:"status"` // viewing, editing, idle
    Cursor    *Cursor   `json:"cursor,omitempty"`
    UpdatedAt time.Time `json:"updatedAt"`
}

When a user opens a task, their client sends a presence:join message. The server broadcasts this to all other users in that room. When the user navigates away or closes the tab, a presence:leave message is sent.
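
The join and leave flow can be sketched as a small in-memory tracker. This is an illustration under assumptions, not our production code: the Presence type and its Join, Leave, and Who methods are hypothetical names, and the tracker stores only a status string per user instead of a full PresenceUpdate. The boolean results tell the caller whether anything changed and therefore whether a broadcast is needed.

```go
package main

import (
    "fmt"
    "sort"
)

// Presence tracks which users are in which room: room -> userID -> status.
type Presence struct {
    rooms map[string]map[string]string
}

func NewPresence() *Presence {
    return &Presence{rooms: make(map[string]map[string]string)}
}

// Join records or updates a user's status and reports whether the user is
// new to the room.
func (p *Presence) Join(room, userID, status string) bool {
    if p.rooms[room] == nil {
        p.rooms[room] = make(map[string]string)
    }
    _, existed := p.rooms[room][userID]
    p.rooms[room][userID] = status
    return !existed
}

// Leave removes a user from a room and reports whether they were present.
func (p *Presence) Leave(room, userID string) bool {
    if _, ok := p.rooms[room][userID]; !ok {
        return false
    }
    delete(p.rooms[room], userID)
    if len(p.rooms[room]) == 0 {
        delete(p.rooms, room)
    }
    return true
}

// Who returns the IDs of the users currently in room, sorted for stable output.
func (p *Presence) Who(room string) []string {
    ids := make([]string, 0, len(p.rooms[room]))
    for id := range p.rooms[room] {
        ids = append(ids, id)
    }
    sort.Strings(ids)
    return ids
}

func main() {
    p := NewPresence()
    p.Join("task:42", "bob", "viewing")
    p.Join("task:42", "alice", "editing")
    fmt.Println(p.Who("task:42"))

    p.Leave("task:42", "bob")
    fmt.Println(p.Who("task:42"))
}
```

A closed tab or dropped network may never deliver presence:leave, so a server built this way would probably also call Leave for each of a connection's rooms when the connection is unregistered.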

Conflict Resolution

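What happens when two users edit the same field simultaneously? We use a simple last-write-wins strategy with conflict detection: every write carries the version it was based on, so a write made against stale data is rejected instead of silently overwriting a newer one.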

  1. Each entity has a version field that increments on every update
  2. When a client sends an update, it includes the version it's based on
  3. If the server version is newer, the update is rejected with a conflict error
  4. The client receives the current state and can merge or retry
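
The four steps above can be sketched as a single version check. The Task type, the ApplyUpdate function, and ErrConflict are hypothetical names used for illustration. The sketch rejects any version mismatch, which is slightly stricter than "server version is newer" because it also refuses a base version the server has never issued.

```go
package main

import (
    "errors"
    "fmt"
)

// ErrConflict signals that the update was based on an outdated version.
var ErrConflict = errors.New("version conflict")

// Task is a hypothetical versioned entity.
type Task struct {
    Title   string
    Version int
}

// ApplyUpdate sets the title only if baseVersion matches the stored version.
// It always returns the current server state, so on a conflict the client
// has what it needs to merge or retry.
func ApplyUpdate(t *Task, baseVersion int, title string) (Task, error) {
    if baseVersion != t.Version {
        return *t, ErrConflict
    }
    t.Title = title
    t.Version++ // every accepted update increments the version
    return *t, nil
}

func main() {
    task := &Task{Title: "Draft spec", Version: 1}

    // Two clients both loaded version 1 and submit edits.
    _, err := ApplyUpdate(task, 1, "Draft spec v2")
    fmt.Println("first write:", err)

    current, err := ApplyUpdate(task, 1, "Draft spec (edited)")
    fmt.Println("second write:", err)
    fmt.Println("server state:", current.Title, "at version", current.Version)
}
```

The check and the write must happen atomically. In this sketch that would mean guarding ApplyUpdate with a mutex; against a database it would typically be a conditional update on the version column.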

For most project management use cases, last-write-wins with conflict notification provides the right balance of simplicity and user experience. More complex scenarios might require operational transformation (OT) or CRDTs.

Client-Side Implementation

On the frontend (Svelte), we use a WebSocket store that manages the connection lifecycle and exposes reactive state:

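// Simplified WebSocket store.
// WS_URL, authenticate, handleMessage, and getBackoffDelay are defined
// elsewhere and omitted here for brevity.
import { writable } from 'svelte/store';

function createWebSocket() {
    let ws: WebSocket | null = null;
    const presence = writable<Map<string, PresenceUpdate>>(new Map());
    const status = writable<'connecting' | 'connected' | 'disconnected'>('disconnected');

    function connect() {
        ws = new WebSocket(WS_URL);
        status.set('connecting');

        ws.onopen = () => {
            status.set('connected');
            authenticate();
        };

        ws.onmessage = (event) => {
            const msg = JSON.parse(event.data);
            handleMessage(msg);
        };

        ws.onclose = () => {
            status.set('disconnected');
            // Exponential backoff reconnect
            setTimeout(connect, getBackoffDelay());
        };
    }

    // Minimal sketch of send: serialize to JSON and write only while the
    // socket is open; messages sent while disconnected are dropped.
    function send(msg: unknown) {
        if (ws?.readyState === WebSocket.OPEN) {
            ws.send(JSON.stringify(msg));
        }
    }

    return { connect, presence, status, send };
}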

Performance Results

After deploying the real-time system, we measured:

  • Average latency: 45ms for message delivery
  • Concurrent connections: 10,000+ per server instance
  • Memory usage: ~100KB per connection
  • Redis pub/sub overhead: <5ms additional latency

Lessons Learned

Building a production-ready real-time system taught us several lessons:

  1. Heartbeats are essential - WebSocket connections can silently die. Regular pings detect this quickly.
  2. Plan for reconnection - Networks are unreliable. Clients should automatically reconnect with exponential backoff.
  3. Room-based routing - Don't broadcast to everyone. Route messages only to interested subscribers.
  4. Monitor everything - Track connection counts, message rates, and latency. Issues surface quickly with good metrics.
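
Since exponential backoff comes up in lesson 2 and in the client's getBackoffDelay call, here is a minimal sketch of the delay calculation. It is written in Go to match the server code, but the arithmetic carries over directly to the TypeScript client. The baseDelay and maxDelay values and the jitter range are assumptions, not the values we ship.

```go
package main

import (
    "fmt"
    "math/rand"
    "time"
)

const (
    baseDelay = 500 * time.Millisecond // assumed delay before the first retry
    maxDelay  = 30 * time.Second       // assumed upper bound on any delay
)

// backoffDelay returns the wait before reconnect attempt n (0-based):
// baseDelay doubled n times, capped at maxDelay.
func backoffDelay(attempt int) time.Duration {
    d := baseDelay
    for i := 0; i < attempt; i++ {
        d *= 2
        if d >= maxDelay {
            return maxDelay
        }
    }
    return d
}

// withJitter picks a random delay between d/2 and d, so that clients
// disconnected at the same moment do not all reconnect at the same moment.
func withJitter(d time.Duration) time.Duration {
    half := d / 2
    return half + time.Duration(rand.Int63n(int64(half)+1))
}

func main() {
    for attempt := 0; attempt < 8; attempt++ {
        d := backoffDelay(attempt)
        fmt.Printf("attempt %d: up to %v (this time %v)\n", attempt, d, withJitter(d))
    }
}
```

Without the cap, a client that has been offline for a while would wait minutes before its next attempt; without the jitter, a server restart would bring every client back in synchronized waves. The attempt counter should reset once a connection succeeds.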

What's Next

We're exploring adding collaborative cursors for the Gantt chart view and operational transformation for rich text comments. Stay tuned for future engineering posts diving into these topics.

The full implementation is available in our open-source repository. Contributions and feedback are always welcome!

Alex Chen

Lead Engineer

Full-stack developer passionate about developer experience and open-source software.