bug: nack() is never called — transient handler failures silently drop events #3

New Issue

GKaszewski · 2026-06-10T09:12:59Z

GKaszewski commented

2026-06-10 09:12:59 +00:00

Problem

WorkerService::dispatch() always calls ack() after running all handlers, even when a handler returns an error. The nack() method is fully implemented in all queue adapters (sqlite, postgres, nats) with exponential backoff and dead-letter support, but it is never called anywhere.

Result: if a handler fails transiently (TMDB down, search index unavailable, etc.), the event is consumed and lost — no retry.

Root cause

crates/application/src/worker.rs dispatch():

async fn dispatch(handlers: Arc<Vec<Arc<dyn EventHandler>>>, envelope: EventEnvelope) {
    for handler in handlers.iter() {
        if let Err(e) = handler.handle(&envelope.event).await {
            tracing::warn!("event handler error (non-fatal): {e}");
        }
    }
    if let Err(e) = envelope.ack().await {   // always acks, never nacks
        tracing::error!("ack failed: {e}");
    }
}

Complication

The worker uses a fan-out model — every event is dispatched to all handlers. Naively nacking on any handler error would re-run all handlers, causing duplicate processing on the ones that already succeeded. Handlers would need to be idempotent for this to be safe.

Options

Make handlers idempotent + nack on any error — simplest fix, requires auditing each handler for idempotency
Route each event type to exactly one handler — removes the fan-out problem, nack becomes safe but changes dispatch architecture
Per-handler ack tracking — most correct, most complex

Affected files

crates/application/src/worker.rs
crates/adapters/sqlite-event-queue/src/lib.rs (nack implemented but unused)
crates/adapters/postgres-event-queue/src/lib.rs (same)
crates/adapters/nats/src/consumer.rs (same)

## Problem `WorkerService::dispatch()` always calls `ack()` after running all handlers, even when a handler returns an error. The `nack()` method is fully implemented in all queue adapters (sqlite, postgres, nats) with exponential backoff and dead-letter support, but it is never called anywhere. Result: if a handler fails transiently (TMDB down, search index unavailable, etc.), the event is consumed and lost — no retry. ## Root cause `crates/application/src/worker.rs` `dispatch()`: ```rust async fn dispatch(handlers: Arc<Vec<Arc<dyn EventHandler>>>, envelope: EventEnvelope) { for handler in handlers.iter() { if let Err(e) = handler.handle(&envelope.event).await { tracing::warn!("event handler error (non-fatal): {e}"); } } if let Err(e) = envelope.ack().await { // always acks, never nacks tracing::error!("ack failed: {e}"); } } ``` ## Complication The worker uses a fan-out model — every event is dispatched to all handlers. Naively nacking on any handler error would re-run all handlers, causing duplicate processing on the ones that already succeeded. Handlers would need to be idempotent for this to be safe. ## Options 1. **Make handlers idempotent + nack on any error** — simplest fix, requires auditing each handler for idempotency 2. **Route each event type to exactly one handler** — removes the fan-out problem, nack becomes safe but changes dispatch architecture 3. **Per-handler ack tracking** — most correct, most complex ## Affected files - `crates/application/src/worker.rs` - `crates/adapters/sqlite-event-queue/src/lib.rs` (nack implemented but unused) - `crates/adapters/postgres-event-queue/src/lib.rs` (same) - `crates/adapters/nats/src/consumer.rs` (same)

GKaszewski closed this issue

2026-06-11 22:32:56 +00:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: GKaszewski/movies-diary#3