Building a Real-Time Collaboration Engine with WebSockets
A deep dive into how we built Tapioca's real-time collaboration features using WebSockets, including presence indicators, live updates, and conflict resolution.
Alex Chen
Lead Engineer
Real-time collaboration has become an essential feature for modern productivity tools. In this post, we'll walk through how we built Tapioca's real-time engine using WebSockets in Go, including the challenges we faced and solutions we developed.
The Challenge
When multiple users work on the same project simultaneously, they need to see each other's changes instantly. Traditional HTTP polling creates too much latency and server load. WebSockets provide a persistent, bidirectional connection that's perfect for this use case.
Our requirements were:
- Sub-100ms latency for updates
- Presence indicators showing who's online and what they're viewing
- Graceful handling of network interruptions
- Scalability across multiple server instances
- Conflict resolution for simultaneous edits
Architecture Overview
Our real-time system consists of three main components:
1. WebSocket Hub
The Hub manages all active connections and routes messages between them. Each connection is associated with a user and subscribes to specific "rooms" (projects, tasks, etc.).
type Hub struct {
	// Registered connections by room
	rooms map[string]map[*Connection]bool
	// Register requests from connections
	register chan *Connection
	// Unregister requests
	unregister chan *Connection
	// Inbound messages to broadcast
	broadcast chan *Message
	// Redis pub/sub for multi-instance support
	redis *redis.Client
}

func (h *Hub) Run() {
	for {
		select {
		case conn := <-h.register:
			h.addToRoom(conn)
		case conn := <-h.unregister:
			h.removeFromRoom(conn)
		case msg := <-h.broadcast:
			h.broadcastToRoom(msg)
		}
	}
}
2. Connection Manager
Each WebSocket connection is wrapped in a Connection struct that handles message encoding, heartbeats, and reconnection logic.
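type Connection struct {
	hub      *Hub
	conn     *websocket.Conn
	send     chan []byte
	userID   string
	rooms    []string
	lastPing time.Time
}

// writePump forwards queued messages to the socket and sends periodic pings.
// The websocket calls match github.com/gorilla/websocket (assumed here).
func (c *Connection) writePump() {
	ticker := time.NewTicker(pingPeriod)
	defer func() {
		ticker.Stop()
		c.conn.Close() // release the socket once the pump exits
	}()
	for {
		select {
		case message, ok := <-c.send:
			if !ok {
				// The send channel was closed; tell the peer we are done.
				c.conn.WriteMessage(websocket.CloseMessage, []byte{})
				return
			}
			if err := c.conn.WriteMessage(websocket.TextMessage, message); err != nil {
				return // write failed, so stop instead of looping on a dead connection
			}
		case <-ticker.C:
			if err := c.conn.WriteMessage(websocket.PingMessage, nil); err != nil {
				return
			}
		}
	}
}
3. Redis Pub/Sub Bridge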
To support horizontal scaling, we use Redis pub/sub to broadcast messages across server instances. When a message arrives on one server, it's published to Redis and all other servers forward it to their connected clients.
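To make the fan-out concrete, here is a minimal sketch of the bridge pattern, not Tapioca's actual implementation. It uses only the standard library: the `Bus` type is an in-memory stand-in for Redis pub/sub, and `Envelope`, `Instance`, and the `Origin` field are hypothetical names introduced for illustration. The point it shows is that each instance delivers to its own local clients, publishes to the shared channel, and skips its own messages when they come back.

```go
package main

import (
	"fmt"
	"sync"
)

// Envelope is what an instance publishes to the shared channel.
// Origin (hypothetical) lets the publisher ignore its own message on the way back.
type Envelope struct {
	Origin  string
	Room    string
	Payload string
}

// Bus stands in for Redis pub/sub: every subscriber sees every envelope.
type Bus struct {
	mu   sync.Mutex
	subs []func(Envelope)
}

func (b *Bus) Subscribe(fn func(Envelope)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs = append(b.subs, fn)
}

func (b *Bus) Publish(e Envelope) {
	b.mu.Lock()
	subs := append([]func(Envelope){}, b.subs...) // copy so callbacks run without the lock
	b.mu.Unlock()
	for _, fn := range subs {
		fn(e)
	}
}

// Instance models one server: local room membership plus a link to the bus.
type Instance struct {
	ID        string
	bus       *Bus
	rooms     map[string][]string // room -> locally connected clients
	Delivered []string            // "client:payload" records, kept for the demo
}

func NewInstance(id string, bus *Bus) *Instance {
	in := &Instance{ID: id, bus: bus, rooms: map[string][]string{}}
	bus.Subscribe(in.onRemote)
	return in
}

func (in *Instance) Join(room, client string) {
	in.rooms[room] = append(in.rooms[room], client)
}

// deliverLocal fans a payload out to clients connected to this instance only.
func (in *Instance) deliverLocal(room, payload string) {
	for _, c := range in.rooms[room] {
		in.Delivered = append(in.Delivered, c+":"+payload)
	}
}

// HandleClientMessage runs when a message arrives from a local client.
func (in *Instance) HandleClientMessage(room, payload string) {
	in.deliverLocal(room, payload)
	in.bus.Publish(Envelope{Origin: in.ID, Room: room, Payload: payload})
}

// onRemote forwards envelopes from other instances and ignores our own.
func (in *Instance) onRemote(e Envelope) {
	if e.Origin == in.ID {
		return
	}
	in.deliverLocal(e.Room, e.Payload)
}

func main() {
	bus := &Bus{}
	a := NewInstance("a", bus)
	b := NewInstance("b", bus)
	a.Join("project-1", "alice")
	b.Join("project-1", "bob")
	b.Join("project-2", "carol")

	a.HandleClientMessage("project-1", "task updated")
	fmt.Println(a.Delivered) // [alice:task updated]
	fmt.Println(b.Delivered) // [bob:task updated]
}
```

Without the origin check, the publishing instance would deliver the same message to its local clients twice. With a real Redis client, `Publish` and `Subscribe` would map onto the corresponding pub/sub commands and the envelope would be serialized, for example as JSON.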
Presence System
Users want to know who else is viewing the same task or project. Our presence system tracks this in real time.
type PresenceUpdate struct {
	UserID    string    `json:"userId"`
	Name      string    `json:"name"`
	Avatar    string    `json:"avatar"`
	Room      string    `json:"room"`
	Status    string    `json:"status"` // viewing, editing, idle
	Cursor    *Cursor   `json:"cursor,omitempty"`
	UpdatedAt time.Time `json:"updatedAt"`
}
When a user opens a task, their client sends a presence:join message. The server broadcasts this to all other users in that room. When the user navigates away or closes the tab, a presence:leave message is sent.
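The join/leave flow can be sketched as a small in-memory registry. This is an illustrative sketch under stated assumptions rather than the production code: the `Event` envelope and the `Registry` type are hypothetical, the `PresenceUpdate` here is a trimmed copy of the struct above, and only the presence:join and presence:leave message names come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// PresenceUpdate is a trimmed copy of the struct from the post.
type PresenceUpdate struct {
	UserID    string    `json:"userId"`
	Name      string    `json:"name"`
	Room      string    `json:"room"`
	Status    string    `json:"status"`
	UpdatedAt time.Time `json:"updatedAt"`
}

// Event is a hypothetical wire envelope for presence messages.
type Event struct {
	Type     string         `json:"type"`
	Presence PresenceUpdate `json:"presence"`
}

// Registry tracks who is present: room -> userID -> latest update.
type Registry struct {
	rooms map[string]map[string]PresenceUpdate
}

func NewRegistry() *Registry {
	return &Registry{rooms: make(map[string]map[string]PresenceUpdate)}
}

// Apply records a join or leave and returns the JSON to broadcast to the room.
func (r *Registry) Apply(ev Event) ([]byte, error) {
	p := ev.Presence
	switch ev.Type {
	case "presence:join":
		if r.rooms[p.Room] == nil {
			r.rooms[p.Room] = make(map[string]PresenceUpdate)
		}
		r.rooms[p.Room][p.UserID] = p
	case "presence:leave":
		delete(r.rooms[p.Room], p.UserID) // no-op if the room or user is unknown
		if len(r.rooms[p.Room]) == 0 {
			delete(r.rooms, p.Room) // drop empty rooms so the map does not grow forever
		}
	default:
		return nil, fmt.Errorf("unknown presence event %q", ev.Type)
	}
	return json.Marshal(ev)
}

// Users returns the sorted user IDs currently present in a room.
func (r *Registry) Users(room string) []string {
	ids := make([]string, 0, len(r.rooms[room]))
	for id := range r.rooms[room] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	reg := NewRegistry()
	now := time.Now().UTC()
	join := func(id string) Event {
		return Event{Type: "presence:join", Presence: PresenceUpdate{
			UserID: id, Room: "task-42", Status: "viewing", UpdatedAt: now}}
	}
	reg.Apply(join("u1"))
	reg.Apply(join("u2"))
	fmt.Println(reg.Users("task-42")) // [u1 u2]

	leave := join("u1")
	leave.Type = "presence:leave"
	reg.Apply(leave)
	fmt.Println(reg.Users("task-42")) // [u2]
}
```

Because a closed tab or dropped network may never deliver an explicit leave message, a server would probably also apply a leave when the connection is unregistered. The sketch is not safe for concurrent use; it assumes all calls come from a single goroutine, such as a hub's event loop.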
Conflict Resolution
What happens when two users edit the same field simultaneously? We use a simple last-write-wins strategy with conflict detection.
- Each entity has a version field that increments on every update
- When a client sends an update, it includes the version it's based on
- If the server version is newer, the update is rejected with a conflict error
- The client receives the current state and can merge or retry
For most project management use cases, last-write-wins with conflict notification provides the right balance of simplicity and user experience. More complex scenarios might require operational transformation (OT) or CRDTs.
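The version check from the list above can be sketched as follows. This is a minimal, in-memory illustration: `Task`, `Store`, and `ErrConflict` are hypothetical names, and a real implementation would more likely enforce the check inside a database transaction. Note that under this check a write based on a stale version is rejected rather than silently applied, so the client must retry against the state it gets back.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict signals that the update was based on a stale version.
var ErrConflict = errors.New("conflict: entity was modified since it was read")

// Task is a hypothetical versioned entity.
type Task struct {
	ID      string
	Title   string
	Version int
}

type Store struct {
	mu    sync.Mutex
	tasks map[string]Task
}

// Update applies the new title only if baseVersion matches the stored version.
// On conflict it returns the current server state so the client can merge or retry.
func (s *Store) Update(id, title string, baseVersion int) (Task, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.tasks[id]
	if !ok {
		return Task{}, fmt.Errorf("task %q not found", id)
	}
	if cur.Version != baseVersion {
		return cur, ErrConflict
	}
	cur.Title = title
	cur.Version++
	s.tasks[id] = cur
	return cur, nil
}

func main() {
	s := &Store{tasks: map[string]Task{"t1": {ID: "t1", Title: "Draft", Version: 1}}}

	// Two clients both read version 1, then write.
	first, err := s.Update("t1", "Draft v2 (Ana)", 1)
	fmt.Println(first.Version, err) // 2 <nil>

	current, err := s.Update("t1", "Draft v2 (Ben)", 1)
	fmt.Println(errors.Is(err, ErrConflict), current.Title) // true Draft v2 (Ana)

	// Ben retries on top of the state he was sent back.
	retried, err := s.Update("t1", "Draft v3 (Ben)", current.Version)
	fmt.Println(retried.Version, err) // 3 <nil>
}
```

The sketch compares with `!=` rather than "server version is newer", so it also rejects a base version the server has never issued. That is a design choice of this example, not something the description above specifies.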
Client-Side Implementation
On the frontend (Svelte), we use a WebSocket store that manages the connection lifecycle and exposes reactive state:
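// Simplified WebSocket store
// WS_URL, authenticate, handleMessage, and getBackoffDelay are defined elsewhere and omitted here.
import { writable } from 'svelte/store';

function createWebSocket() {
  let ws: WebSocket | null = null;
  const presence = writable<Map<string, PresenceUpdate>>(new Map());
  const status = writable<'connecting' | 'connected' | 'disconnected'>('disconnected');

  function connect() {
    ws = new WebSocket(WS_URL);
    status.set('connecting');
    ws.onopen = () => {
      status.set('connected');
      authenticate();
    };
    ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      handleMessage(msg);
    };
    ws.onclose = () => {
      status.set('disconnected');
      // Exponential backoff reconnect
      setTimeout(connect, getBackoffDelay());
    };
  }

  // Minimal send (an assumption for this simplified store):
  // drops the message if the socket is not open.
  function send(msg: unknown) {
    if (ws && ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify(msg));
    }
  }

  return { connect, presence, status, send };
}
Performance Results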
After deploying the real-time system, we measured:
- Average latency: 45ms for message delivery
- Concurrent connections: 10,000+ per server instance
- Memory usage: ~100KB per connection
- Redis pub/sub overhead: <5ms additional latency
Lessons Learned
Building a production-ready real-time system taught us several lessons:
- Heartbeats are essential - WebSocket connections can silently die. Regular pings detect this quickly.
- Plan for reconnection - Networks are unreliable. Clients should automatically reconnect with exponential backoff.
- Room-based routing - Don't broadcast to everyone. Route messages only to interested subscribers.
- Monitor everything - Track connection counts, message rates, and latency. Issues surface quickly with good metrics.
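To illustrate the reconnection lesson, here is one way the backoff delay could be computed. It is written in Go to match the server examples, though the same arithmetic ports directly to a browser client. The function names, the 500ms base, the 30s cap, and the use of full jitter are all assumptions made for this sketch, not values taken from Tapioca's client.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffCeiling is the un-jittered upper bound: min(limit, base * 2^attempt).
func backoffCeiling(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		if d > limit/2 { // doubling would exceed the cap (and could eventually overflow)
			return limit
		}
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

// backoffDelay returns the wait before reconnect attempt n (0-based).
// It applies "full jitter": a random value in [0, ceiling], so that many
// clients dropped at the same moment do not all reconnect at once.
// pick must return a value in [0, n), like math/rand's Int63n.
func backoffDelay(attempt int, base, limit time.Duration, pick func(n int64) int64) time.Duration {
	ceiling := backoffCeiling(attempt, base, limit)
	return time.Duration(pick(int64(ceiling) + 1))
}

func main() {
	base, limit := 500*time.Millisecond, 30*time.Second
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Printf("attempt %d: up to %v, chose %v\n",
			attempt,
			backoffCeiling(attempt, base, limit),
			backoffDelay(attempt, base, limit, rand.Int63n))
	}
}
```

With these example values the ceiling doubles from 500ms through 16s and then stays at the 30s cap. The attempt counter should be reset to zero after a successful connection so that a later disconnect starts from the short delay again.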
What's Next
We're exploring adding collaborative cursors for the Gantt chart view and operational transformation for rich text comments. Stay tuned for future engineering posts diving into these topics.
The full implementation is available in our open-source repository. Contributions and feedback are always welcome!