Architecture · Event Bus · Observability · Infrastructure · Cost Optimization

From Polling to Pushing: How Flipturn Built a Cost-Effective Event Bus

Suvro Banerjee
February 5, 2026
16 min read

1. Introduction: The "Silent Bill" of Polling

When we first built Flipturn—an autonomous AI agent that handles Level 3 support tickets for DevTools companies—we followed the standard playbook. We chose Arq, a battle-tested Python task queue, deployed it to Render, and connected it to Upstash Redis. It worked beautifully. Until we got our first invoice.

The Problem: Paying for Silence

Our initial architecture used the classic worker polling pattern: every worker would wake up, check the Redis queue for new jobs, find nothing, go back to sleep, and repeat. With Arq's default configuration, that meant polling every 0.5 seconds.

Let's do the math:

Polls per minute: 120
Commands per hour: 7,200
Commands per day: 172,800
Commands per month: 5.2 million

On Upstash's serverless pricing model—where you pay per command, not per hour—we were burning $0.35 per day per worker just to check for empty queues. For a startup processing maybe 10-20 tickets per day, 99% of those 172,800 daily commands were asking "Any new jobs?" and hearing back "Nope."

This is what we call the "Silent Bill": infrastructure costs incurred when nothing is happening.
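The arithmetic above fits in a few lines of Python. The per-command rate here is an illustrative assumption (roughly what we were paying), not a quoted price; check your provider's current pricing:

```python
# Idle-cost model for a fixed polling interval. The price per 1,000
# commands is an assumed illustrative rate, not a quoted price.
POLL_INTERVAL_S = 0.5
PRICE_PER_1K_COMMANDS = 0.002  # USD, assumed

polls_per_minute = 60 / POLL_INTERVAL_S              # 120
commands_per_day = int(polls_per_minute * 60 * 24)   # 172,800
commands_per_month = commands_per_day * 30           # ~5.2 million
cost_per_day = commands_per_day * PRICE_PER_1K_COMMANDS / 1000

print(f"{commands_per_day:,} commands/day -> ${cost_per_day:.2f}/day per idle worker")
```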

The Realization

The irony hit us during a code review. We were building an AI agent that autonomously responds to customer incidents—designed to be instant when needed but lazy when not. Yet our infrastructure was the exact opposite: constantly awake, constantly polling, constantly burning money while idle.

We needed a system that could:

  • Wake instantly when a webhook arrived (low latency, no polling delays)
  • Sleep completely when idle (zero cost during silence)
  • Scale to zero without sacrificing reliability

The goal wasn't just cost savings—it was architectural alignment. Our agent should only "think" when there's something to think about.


2. The Decision: Why Redis Streams? (The "Honda Civic" Defense)

When evaluating event bus options, we looked at the usual suspects:

| Option | Pros | Cons | Verdict |
| --- | --- | --- | --- |
| Kafka | Industry standard, proven at scale, rich ecosystem | Requires JVM, ZooKeeper/KRaft, expensive managed services (from ~$50/mo) | ❌ "Like buying a freight train to deliver a pizza" |
| RabbitMQ | Great AMQP support, native message acknowledgment | Another piece of infrastructure to maintain, additional deploy complexity | ❌ Too heavy for our needs |
| Cloud Pub/Sub | Fully managed, auto-scaling | Vendor lock-in, $0.40/million messages adds up, cold start issues | ❌ Still per-message pricing |
| Redis Streams | Built into Redis (which we already have), native consumer groups, XREADGROUP with BLOCK = "fake push" | Less mature ecosystem than Kafka | ✅ 80% of Kafka's power, 20% of the complexity |

The Verdict: Redis Streams

We chose Redis Streams because it offered the perfect trade-off for a seed-stage startup:

  1. Zero new infrastructure: We already had Upstash Redis for caching. No new services to deploy, monitor, or maintain.
  2. True "scale to zero": With blocking reads, we could go from 172,800 commands/day down to 1,440 (one blocked read per minute).
  3. Production-grade primitives: Consumer groups, pending entry lists (PEL), ACK/NACK semantics—all the reliability features we needed.
  4. Simple mental model: Just a log. Producers append (XADD), consumers read (XREADGROUP), workers acknowledge (XACK).

As one engineer put it: "It's a Honda Civic—not flashy, but it'll get you to product-market fit before you run out of runway."
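Before the deep dive, the "just a log" mental model from point 4 can be made concrete with a toy in-memory sketch of the three primitives. This is an illustration only; real Redis adds persistence, blocking reads, and delivery across multiple consumers:

```python
# Toy model of the Streams mental model: an append-only log (XADD),
# a consumer-group cursor (XREADGROUP), and a pending list (XACK).
import itertools

class ToyStream:
    def __init__(self):
        self._ids = itertools.count(1)
        self.entries = []   # the append-only log: (entry_id, fields)
        self.cursor = 0     # the group's last-delivered position
        self.pel = {}       # delivered-but-unacknowledged entries

    def xadd(self, fields):
        entry_id = f"{next(self._ids)}-0"
        self.entries.append((entry_id, fields))
        return entry_id

    def xreadgroup(self, count=10):
        batch = self.entries[self.cursor:self.cursor + count]
        self.cursor += len(batch)
        for entry_id, fields in batch:
            self.pel[entry_id] = fields   # in-flight until XACK
        return batch

    def xack(self, entry_id):
        self.pel.pop(entry_id, None)

stream = ToyStream()
eid = stream.xadd({"signal": "ticket-42"})
stream.xreadgroup()
assert eid in stream.pel     # crash here and the job is not lost
stream.xack(eid)
assert not stream.pel        # acknowledged: removed from the PEL
```

A worker crash between xreadgroup and xack leaves the entry in pel, which is exactly the recovery hook the later sections lean on.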


3. The Architecture: "Fake Push" (Long Polling)

The secret sauce behind our 95% cost reduction is a technique called blocking reads. It's not a true push system (which would require open ports, webhooks, or WebSockets), but it behaves like one.

How Blocking Reads Work

Traditional polling:

while True:
    jobs = redis.get_jobs()  # one Redis command per iteration
    if jobs:
        process(jobs)
    sleep(0.5)               # then the loop polls again 0.5s later

With Redis Streams blocking reads:

while True:
    jobs = redis.xreadgroup(BLOCK=60000)  # Wait UP TO 60s
    # This is ONE command that either:
    # - Returns immediately if data exists
    # - Waits 60s, then returns empty

The key insight: BLOCK makes Redis hold the connection open until either:

  1. New data arrives → Returns instantly (true push-like behavior)
  2. Timeout expires → Returns empty after 60 seconds

This means we went from 120 commands/minute to 1 command/minute during idle periods.
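You can simulate these semantics without Redis at all: an asyncio queue plus a timeout behaves like a blocked XREADGROUP. A toy model, not the real command:

```python
# Toy stand-in for XREADGROUP(BLOCK=...): one call that returns the
# instant data arrives, or gives up empty-handed after the timeout.
import asyncio
import time

async def blocking_read(queue: asyncio.Queue, block_ms: int) -> list:
    try:
        job = await asyncio.wait_for(queue.get(), timeout=block_ms / 1000)
        return [job]
    except asyncio.TimeoutError:
        return []

async def main():
    queue = asyncio.Queue()

    # Idle: one "command", empty result after ~100ms (scaled-down 60s).
    t0 = time.monotonic()
    assert await blocking_read(queue, block_ms=100) == []
    assert time.monotonic() - t0 >= 0.09

    # Busy: a job arrives mid-block and wakes the reader immediately.
    async def producer():
        await asyncio.sleep(0.01)
        await queue.put("job-1")

    asyncio.ensure_future(producer())
    t0 = time.monotonic()
    assert await blocking_read(queue, block_ms=60_000) == ["job-1"]
    assert time.monotonic() - t0 < 1.0  # woke well before the 60s timeout

asyncio.run(main())
```

The shape is identical to the real thing: idle periods cost one call per BLOCK window, and arrivals are picked up at network latency rather than at the next poll tick.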

Sequence Diagram

┌─────────┐         ┌────────┐         ┌────────┐
│ Webhook │         │ Redis  │         │ Worker │
└────┬────┘         └───┬────┘         └───┬────┘
     │                  │                   │
     │   POST /webhook  │                   │
     │─────────────────>│                   │
     │                  │                   │
     │   XADD (push)    │                   │
     │─────────────────>│                   │
     │                  │                   │
     │   200 OK         │                   │
     │<─────────────────│                   │
     │                  │                   │
     │                  │ XREADGROUP        │
     │                  │ (BLOCK=60000)     │
     │                  │<──────────────────│
     │                  │                   │
     │                  │ [Job Data]        │
     │                  │ (instant wake)    │
     │                  │──────────────────>│
     │                  │                   │
     │                  │       XACK        │
     │                  │<──────────────────│
     │                  │                   │

What makes this elegant:

  • The webhook handler pushes to Redis and returns 200 OK instantly (no waiting for job completion)
  • The worker stays "asleep" until work arrives (no busy polling)
  • When work arrives, the worker wakes immediately (no 0.5s average polling delay)

4. The Implementation

Here's how we implemented this in production.

Configuration (app/core/config.py)

# Redis Streams Settings (V3 Event-Driven Architecture)
STREAM_ENABLED: bool = True  # V3 Redis Streams is now default (95% cost savings)
STREAM_NAME: str = "flipturn:incidents"
STREAM_CONSUMER_GROUP: str = "flipturn-workers"
STREAM_CONSUMER_NAME: str | None = None  # Auto-generated if None
STREAM_BLOCK_MS: int = 60000  # 1 minute blocking read (reduces idle polling)
STREAM_MAX_LEN: int = 10000  # Prevent storage bloat
STREAM_BATCH_SIZE: int = 10  # Max jobs to fetch per XREADGROUP call

Key parameters:

  • STREAM_BLOCK_MS=60000: Wait up to 60 seconds for new jobs (vs polling 120x/minute)
  • STREAM_MAX_LEN=10000: Cap the stream size using approximate trimming (garbage collection)
  • STREAM_BATCH_SIZE=10: Process up to 10 jobs per batch (throughput optimization)
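A quick sanity check that these numbers hang together, assuming a fully idle worker that re-issues one blocked read as soon as the previous one times out:

```python
# Assumption: fully idle worker, one blocked read in flight at a time.
STREAM_BLOCK_MS = 60_000
MS_PER_DAY = 24 * 60 * 60 * 1000

idle_commands_per_day = MS_PER_DAY // STREAM_BLOCK_MS
print(idle_commands_per_day)  # 1440 (vs 172,800 when polling at 0.5s)
```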

Producer: Enqueuing Jobs (app/services/stream_service.py)

async def enqueue_to_stream(
    self,
    signal: UniversalIncidentSignal
) -> str:
    """
    Enqueue incident signal to Redis Stream

    Returns:
        str: Stream entry ID (e.g., "1234567890123-0")
    """
    # Idempotency lock: Prevent duplicate processing
    lock_key = f"lock:{signal.deduplication_key}"
    lock_acquired = await self.redis.set(
        lock_key,
        "1",
        nx=True,  # Only set if not exists
        ex=int(self.settings.ARQ_JOB_TIMEOUT * 2)
    )

    if not lock_acquired:
        logger.warning(
            f"⚠️ Duplicate job detected: {signal.deduplication_key}"
        )
        return "duplicate"

    # Serialize signal to JSON
    payload = {
        "signal": signal.model_dump_json(),
        "deduplication_key": signal.deduplication_key,
    }

    # XADD with approximate trimming: ~ keeps roughly max_len entries, trimming lazily
    entry_id = await self.redis.xadd(
        name=self.stream_name,
        fields=payload,
        maxlen=self.max_len,
        approximate=True  # More efficient trimming
    )

    logger.info(
        f"✅ Enqueued to stream: {signal.incident_id} "
        f"(entry: {entry_id}, dedup: {signal.deduplication_key})"
    )

    return entry_id

Why idempotency locks matter: Zendesk (like many webhook providers) can deliver the same event multiple times. Using Redis SET NX (set if not exists) as a distributed lock ensures we never process the same ticket twice—even if we scale to multiple workers.
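The lock pattern is easy to exercise against an in-memory stand-in for Redis. The stand-in below is a toy (it ignores the ex= TTL; in production that expiry is what clears a crashed producer's lock):

```python
# Toy in-memory stand-in for Redis SET NX; it ignores the ex= TTL, which
# in production is what lets a crashed producer's lock expire.
class FakeRedis:
    def __init__(self):
        self.store = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None          # redis-py returns None when NX loses
        self.store[key] = value
        return True

def enqueue_once(redis, dedup_key: str) -> str:
    if not redis.set(f"lock:{dedup_key}", "1", nx=True, ex=600):
        return "duplicate"
    return "enqueued"            # lock held: safe to XADD here

r = FakeRedis()
assert enqueue_once(r, "zendesk:ticket:42") == "enqueued"
assert enqueue_once(r, "zendesk:ticket:42") == "duplicate"  # redelivery
```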

Consumer: Processing Jobs (app/services/stream_service.py)

async def consume_from_stream(
    self,
    consumer_name: str,
    start_id: str = ">"
) -> AsyncIterator[tuple[str, UniversalIncidentSignal]]:
    """
    Consume jobs from Redis Stream using consumer group

    This is a blocking operation that waits up to STREAM_BLOCK_MS for new jobs.
    Uses XREADGROUP which automatically handles:
    - Job distribution across multiple consumers
    - Pending entry list (PEL) for crash recovery
    """
    logger.info(
        f"🔄 Starting stream consumer '{consumer_name}' "
        f"(group: {self.consumer_group}, stream: {self.stream_name})"
    )

    while True:
        try:
            # XREADGROUP with BLOCK - this is the magic that eliminates polling
            # Will wait up to STREAM_BLOCK_MS (60 seconds) for new data
            response = await self.redis.xreadgroup(
                groupname=self.consumer_group,
                consumername=consumer_name,
                streams={self.stream_name: start_id},
                count=self.batch_size,
                block=self.block_ms
            )

            # No data - this is normal (timeout after BLOCK period)
            if not response:
                logger.debug("No new jobs (timeout)")
                continue

            # Parse response: [(stream_name, [(entry_id, fields)])]
            for stream_name, entries in response:
                for entry_id, fields in entries:
                    # Decode fields
                    signal_json = fields.get(b"signal") or fields.get("signal")

                    if isinstance(signal_json, bytes):
                        signal_json = signal_json.decode("utf-8")

                    # Deserialize signal
                    signal = UniversalIncidentSignal.model_validate_json(signal_json)

                    logger.info(f"📥 Received job: {signal.incident_id}")

                    yield (entry_id, signal) # Yield to the worker loop for processing (maintaining separation of concerns)

        except asyncio.CancelledError:
            logger.info("🛑 Stream consumer cancelled")
            break
        except Exception as e:
            logger.error(f"❌ Error consuming from stream: {e}", exc_info=True)
            await asyncio.sleep(5)  # Back off on errors

What makes this production-grade:

  1. Consumer groups: Multiple workers can subscribe to the same stream, and Redis delivers each entry to exactly one consumer in the group (automatic load balancing).
  2. Pending Entry List (PEL): If a worker crashes mid-processing, the job stays in the PEL and can be claimed by another worker.
  3. Graceful shutdown: asyncio.CancelledError handling ensures we don't drop in-flight jobs during deploys.

Worker Loop (app/core/stream_worker.py)

async def run(self) -> None:
    """Main worker loop - consume and process jobs"""
    self.running = True

    try:
        async for entry_id, signal in self.stream_service.consume_from_stream(
            consumer_name=self.consumer_name
        ):
            if not self.running:
                break

            try:
                # Process the job (PII scrubbing + AI analysis)
                result = await self.process_incident(signal)

                # ACK on success
                if result["status"] in ("success", "skipped"):
                    await self.stream_service.ack_job(entry_id)
                    await self.stream_service.release_lock(signal.deduplication_key)
                else:
                    # NACK on failure (leaves in PEL for retry)
                    await self.stream_service.nack_job(entry_id, self.consumer_name)

            except Exception as e:
                logger.error(f"❌ Fatal error processing {entry_id}: {e}")
                await self.stream_service.nack_job(entry_id, self.consumer_name)

    except asyncio.CancelledError:
        logger.info("🛑 Worker loop cancelled")

ACK/NACK semantics:

  • XACK: Remove job from PEL (successful processing)
  • NACK: Leave job in PEL (failed processing, can be retried)

This is how we ensure at-least-once delivery: if a worker crashes mid-job, another worker can claim the unacknowledged entry from the PEL.
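The claiming step isn't shown in the worker above, but the decision logic is simple and worth sketching. This is our own policy, not a Redis built-in: the max_deliveries cutoff and the retry/dead-letter split are assumptions, and the input tuples mirror the (id, consumer, idle_ms, delivery_count) shape XPENDING reports in its detailed form. The actual claim would then go through XCLAIM (or XAUTOCLAIM on Redis 6.2+):

```python
# Sketch: decide which pending (PEL) entries to reclaim. Tuples mirror
# XPENDING's detailed reply: (entry_id, consumer, idle_ms, delivery_count).
def select_stale(pending, min_idle_ms=300_000, max_deliveries=3):
    """Split stale PEL entries into retry candidates and dead letters."""
    retry, dead = [], []
    for entry_id, consumer, idle_ms, deliveries in pending:
        if idle_ms < min_idle_ms:
            continue                  # plausibly still in flight
        if deliveries >= max_deliveries:
            dead.append(entry_id)     # poison message: park it for a human
        else:
            retry.append(entry_id)    # safe to XCLAIM and re-process
    return retry, dead

pending = [
    ("1-0", "worker-a", 600_000, 1),  # stale, first attempt
    ("2-0", "worker-a", 10_000, 1),   # fresh, leave alone
    ("3-0", "worker-b", 900_000, 5),  # stale, retried too many times
]
print(select_stale(pending))  # (['1-0'], ['3-0'])
```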


5. Deployment & Observability: "The Invisible Open-Core"

Render Deployment with Procfile

We run Flipturn on Render using a simple Procfile that splits the API and worker into separate processes:

web: uvicorn app.main:app --host 0.0.0.0 --port 8000
worker: python -m app.core.stream_worker

Why this works:

  • web process: FastAPI app that receives webhooks and pushes to Redis Streams
  • worker process: Long-lived consumer that blocks on XREADGROUP
  • Both share the same codebase and configuration (injected via environment variables)

The "Ghost" in the Machine: Consumer Names

One challenge we hit early: when we deployed multiple workers, we couldn't tell which worker was processing which job. The solution was STREAM_CONSUMER_NAME:

def _generate_consumer_name(self) -> str:
    """Generate a unique consumer name for this worker"""
    if self.settings.STREAM_CONSUMER_NAME:
        return self.settings.STREAM_CONSUMER_NAME
    # hostname + PID: unique per process, stable for its lifetime
    return f"{socket.gethostname()}-{os.getpid()}"

This shows up in Redis as:

XINFO CONSUMERS flipturn:incidents flipturn-workers
1) name: "render-worker-1-abc123"
   pending: 2
   idle: 1500

This helps us debug "zombie consumers"—workers that crashed but still have pending jobs in the PEL.

Observability: Queue Health Endpoint

We built a /webhooks/queue-health endpoint that exposes real-time metrics (app/api/v1/endpoints/webhooks.py):

@router.get("/queue-health")
async def queue_health(
    settings: Annotated[Settings, Depends(get_settings)],
    stream_service: Annotated[StreamService | None, Depends(get_stream_service)]
):
    """Report queue health metrics"""
    stream_stats = await stream_service.get_stream_info()
    pending_stats = await stream_service.get_pending_info()

    return {
        "status": "healthy",
        "mode": "streams",
        "stream_name": settings.STREAM_NAME,
        "consumer_group": settings.STREAM_CONSUMER_GROUP,
        "stream_length": stream_stats.get("length", 0),
        "pending_messages": pending_stats.get("pending", 0),
        "consumers": pending_stats.get("consumers", 0),
        "last_delivered_id": stream_stats.get("last-generated-id", "0-0"),
        "block_ms": settings.STREAM_BLOCK_MS
    }

What we monitor:

  • stream_length: Total messages in stream (should be near zero after processing)
  • pending_messages: Jobs in Pending Entry List (unacknowledged work)
  • consumers: Active worker count
  • last_delivered_id: Most recent job ID (helps detect stalls)

This single endpoint gives us confidence that the "invisible" event bus is running smoothly.


6. The Impact: Startup Survival

Cost Savings: The Numbers

Before (Arq Polling):

  • 172,800 commands/day per worker
  • At Upstash pricing (~$0.002 per 1,000 commands): $0.35/day = $127/year per worker

After (Redis Streams with BLOCK=60000):

  • 1,440 commands/day per worker (one blocked read per minute)
  • At Upstash pricing: $0.003/day = $1.10/year per worker

Savings: 95% reduction in idle Redis costs.

For a two-worker deployment, that's $250/year saved. That might not sound like much, but for a seed-stage startup:

  • It's 20 hours of founder time (at $125/hr effective rate)
  • It's 2-3 months of Render hosting
  • It's peace of mind that we're not burning cash on "silence"

Performance: Latency Improvements

Before (Polling every 0.5s):

  • Average latency from webhook → worker pickup: ~250ms (half the poll interval)
  • Worst case: 500ms

After (Blocking reads):

  • Average latency: <50ms (network latency only)
  • Worst case: Still <50ms

The agent now feels "instant" because there's no artificial delay waiting for the next poll.
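The ~250ms figure comes from arrivals landing uniformly inside a 0.5-second poll window, so the expected wait is half the interval. A quick Monte Carlo check of that idealized model (it ignores processing time):

```python
# Idealized model: jobs arrive uniformly within a 0.5s poll window, so a
# polling worker waits on average half the interval before pickup.
import random

random.seed(0)
INTERVAL = 0.5
waits = [INTERVAL - random.uniform(0, INTERVAL) for _ in range(100_000)]
avg = sum(waits) / len(waits)
print(f"average pickup delay: {avg * 1000:.0f} ms")  # ~250 ms
```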

Simplicity: No New Infrastructure

The best part? We didn't add a single new service. No Kafka cluster to maintain. No RabbitMQ to monitor. No separate message broker to secure and patch.

We just used Redis differently.

This is the kind of leverage that matters at the seed stage: get more value from infrastructure you already have, rather than adding complexity.


7. What We Learned (The Optimizations)

1. Don't Set BLOCK Too High (The "Silent Drop" Risk)

We initially considered setting STREAM_BLOCK_MS to 5 minutes (300,000ms) to push costs even lower. Bad idea.

Problem: Render's load balancer (like AWS ALB) has a 60-second idle timeout. If a connection blocks for 5 minutes, the load balancer will silently kill it, and the worker won't know the connection is dead until it tries to read—leading to phantom workers that think they're alive but aren't receiving jobs.

Solution: We settled on 60 seconds, right at the edge of that window (a slightly shorter value, say 55 seconds, buys extra margin). Workers wake up about once per minute, confirm the connection is alive, and block again if no data exists.

2. Idempotency is Non-Negotiable

Webhook providers (Zendesk, Stripe, GitHub) all have the same caveat in their docs: "We may deliver the same webhook multiple times."

Using Redis locks (SET NX) as a distributed deduplication mechanism ensures that even if Zendesk sends the same ticket twice, we only process it once:

lock_key = f"lock:{signal.deduplication_key}"
lock_acquired = await self.redis.set(lock_key, "1", nx=True, ex=600)

if not lock_acquired:
    return "duplicate"

Lesson learned: Idempotency should be baked into the producer (enqueue), not the consumer (worker). By the time the worker picks up a job, it's already committed to processing it—detecting duplicates at that stage is too late.

3. Monitoring the PEL (Pending Entry List) is Critical

The Pending Entry List is Redis Streams' equivalent of a "dead letter queue." Jobs that fail or time out end up in the PEL, where they sit until:

  1. Another worker claims them via XCLAIM
  2. An admin manually acknowledges them via XACK
  3. They expire (which requires custom cleanup logic)

What we monitor:

redis-cli XPENDING flipturn:incidents flipturn-workers
# Output:
# 1) (integer) 3              ← 3 pending jobs
# 2) "1706245123456-0"        ← Oldest pending ID
# 3) "1706245678901-0"        ← Newest pending ID
# 4) (consumer, count) pairs

If pending_messages grows unbounded, it means workers are crashing mid-job. We alert on pending_messages > 10.
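That alert rule can be written as a pure check over the /queue-health payload from Section 5. The thresholds are our operational choices, not Redis defaults:

```python
# Thresholds are our operational choices; adjust per deployment.
def assess_queue_health(health: dict, max_pending=10, min_consumers=1):
    alerts = []
    if health.get("pending_messages", 0) > max_pending:
        alerts.append("PEL growing: workers may be crashing mid-job")
    if health.get("consumers", 0) < min_consumers:
        alerts.append("no active consumers: jobs will sit in the stream")
    return alerts

assert assess_queue_health({"pending_messages": 2, "consumers": 2}) == []
assert len(assess_queue_health({"pending_messages": 25, "consumers": 0})) == 2
```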

4. The MAXLEN Parameter Prevents Storage Bloat

Redis Streams append-only logs can grow unbounded if you're not careful. We use maxlen=10000 with approximate trimming:

await self.redis.xadd(
    name=self.stream_name,
    fields=payload,
    maxlen=10000,
    approximate=True  # ~10,000, not exactly 10,000
)

Why approximate=True? It tells Redis: "Keep the stream around 10,000 entries, but only trim when it's cheap." Exact trimming forces Redis to evict entries one at a time on every XADD, while approximate trimming only discards whole internal nodes of the stream once they fall entirely out of range, which costs far less per write.

For our use case (low-latency event processing), jobs are ACK'd and removed within seconds, so the stream rarely exceeds 50 entries in practice. But the maxlen cap ensures that if something goes wrong (workers crash, PEL fills up), we don't run out of Redis memory.


8. Future-Proofing: The "Universal Event Bus"

The beautiful thing about Redis Streams? It's not just a job queue—it's a universal event bus.

Now that we have this foundation, we can ingest events from multiple sources without changing the worker:

Proactive Ingestion Pipelines (Roadmap)

[Old World]                    [New World]
Zendesk Webhook Only  →  Universal Event Bus

                           ┌──> Zendesk Webhooks
                           │
Zendesk → Worker      →    ├──> Datadog Log Alerts
                           │
                           ├──> Slack Events
                           │
                           └──> Cron Jobs (Proactive Scans)
                                      ↓
                                Redis Streams
                                      ↓
                                  Workers
                                      ↓
                              [Same Brain Logic]

Example: Datadog Webhook Integration

Instead of waiting for a human to file a Zendesk ticket, we can ingest Datadog monitor alerts directly:

@router.post("/webhooks/datadog")
async def datadog_webhook(payload: DatadogAlertPayload):
    signal = UniversalIncidentSignal(
        incident_id=f"dd_alert_{payload.alert_id}",
        source=IncidentSource.DATADOG,
        severity=IncidentSeverity.HIGH,
        description=payload.alert_message
    )
    await stream_service.enqueue_to_stream(signal)
    return {"status": "enqueued"}

The worker doesn't need to change—it already knows how to process UniversalIncidentSignal objects. We just added a new ingestion point.

Why This Matters

This architecture lets us transition from reactive support (waiting for tickets) to proactive support (AI agents that wake up when logs show errors, before customers notice).

And we didn't need Kafka. We didn't need RabbitMQ. We just needed Redis—which we already had.


Conclusion: The Honda Civic Principle

When you're a seed-stage startup, the best architecture isn't the one with the most stars on GitHub or the flashiest tech talk at a conference. It's the one that:

  1. Solves the problem (event-driven job processing)
  2. Minimizes cost (95% reduction in idle infrastructure spend)
  3. Reduces complexity (no new services to deploy or maintain)
  4. Ships fast (implemented in 3 days, including tests)

Redis Streams gave us all four.

We didn't build a "perfect" event bus. We built a good-enough event bus that can carry us to product-market fit before we run out of runway. And when we outgrow it—when we're processing millions of tickets per day and need Kafka's horizontal scaling—we'll have the revenue to justify that complexity.

Until then, we're driving the Honda Civic. And it's getting us exactly where we need to go.


Appendix: Key Takeaways for Engineers

If you're building a similar system, here are the TL;DR lessons:

  1. Polling is expensive on serverless: If you pay per command (Upstash, AWS Lambda), blocking reads > polling.
  2. Redis Streams ≠ Kafka-lite: It's simpler, but lacks partitioning and compaction. Good for <100k msgs/sec.
  3. BLOCK parameter is magic: XREADGROUP with BLOCK=60000 turns Redis into a push system.
  4. Idempotency at enqueue-time: Use SET NX locks to deduplicate before adding to the stream.
  5. Monitor the PEL: Pending Entry List growth = workers crashing mid-job. Alert on it.
  6. MAXLEN with approximate=True: Prevents storage bloat without performance hit.
  7. Consumer names matter: Use unique IDs (hostname + PID) to debug which worker owns which job.

An Invitation

With AI-driven “vibe coding,” teams are shipping faster than ever, but maintenance and SRE aren’t keeping pace. We’re already seeing this in alert fatigue, messy incident triage, and slower RCA.

I’m Suvro, founder of Flipturn. I’m rethinking SRE for this new reality and would love to learn from your experience. If you’re open, I’d appreciate 30 minutes to understand the challenges you’re facing and how you wish they were solved. I’m committed to partnering closely with you to build this right.

Want to eliminate incident firefighting?

Join teams using Flipturn for autonomous root cause analysis.
