From Two Backends to Any Backend: What the Architecture Enables Next
A Different Kind of Question
The first four posts in this series were about building — designing the abstraction, proving the migration, implementing the LGTM stack. By the end of Milestone 2, two entirely different observability stacks produce indistinguishable evidence for the reasoning layer. The architecture works.
That shifts the question.
Before the provider abstraction existed, every conversation about expanding Flipturn's reach started with the same obstacle: "We'd have to rewrite the integration layer." That blocked everything. Multi-tenancy. New backends. Context providers. Remediation. Each depended on first solving the backend coupling problem.
That problem is solved.
So now: what becomes possible?
1. The Stub That Will Become the Core
Inside app/providers/config.py, there is a method that currently does almost nothing:
@classmethod
def for_tenant(cls, tenant_id: str) -> "BackendConfig":
    """Future: per-customer config from config store."""
    return cls.from_env()  # Phase 1 fallback
BackendConfig.for_tenant() is a stub. In its current form, it ignores tenant_id entirely and falls back to reading the same environment variables for every request. That is the right behavior for a single-tenant deployment — one Flipturn instance, one observability stack.
For a multi-tenant SaaS product, it is the most important method in the codebase.
When for_tenant() reads from a config store instead of environment variables, every part of the stack that depends on BackendConfig becomes tenant-aware without modification. bootstrap_registry() takes a BackendConfig. build_tools_from_registry() takes the resulting ProviderRegistry. The LangGraph tools execute against whichever providers that registry contains. None of those layers need to know about tenancy — they just consume what the registry gives them.
The entire multi-tenant model flows from implementing this one method correctly.
@classmethod
def for_tenant(cls, tenant_id: str) -> "BackendConfig":
    # Milestone 3: read from config store
    config = config_store.get(tenant_id)  # Postgres, Redis, or secrets manager
    return cls(backends=config.backends)
The stream worker that processes each incident already carries a tenant identifier from the incident signal. Threading that identifier into BackendConfig.for_tenant() is the mechanical step. The hard part — the architecture that makes it safe — is already done.
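A rough sketch of that wiring, assuming the worker exposes something like an IncidentSignal carrying a tenant_id and a run_rca_graph entry point (both names are illustrative, not the real API):

# Hypothetical stream-worker wiring; IncidentSignal and run_rca_graph are
# illustrative names, not the actual entry points.
def handle_incident(signal: IncidentSignal) -> None:
    config = BackendConfig.for_tenant(signal.tenant_id)   # per-tenant backends
    registry = bootstrap_registry(config)                 # same factory as single-tenant
    tools = build_tools_from_registry(registry)           # tools scoped to this tenant's providers
    run_rca_graph(signal, tools)                          # reasoning layer unchanged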
2. Adding a New Backend Takes One New Package
With the registry and bootstrap factory in place, adding a new observability backend is a contained operation: one new package of three substantive files, plus two small registration edits.
Consider Jaeger — widely used for distributed tracing in organizations that chose the CNCF stack before Tempo was mature. Jaeger exposes a query API at /api/traces and /api/services. A JaegerProvider implementing TraceProvider would follow the exact same pattern as TempoProvider:
app/providers/jaeger/
├── __init__.py
├── provider.py # JaegerProvider — TraceProvider protocol
├── translator.py # CanonicalTraceQuery → Jaeger query params
└── normalizer.py # Jaeger span response → NormalizedSpan
The translator maps CanonicalTraceQuery to Jaeger's HTTP query parameters: service, operation, tags, minDuration, maxDuration. The normalizer converts Jaeger's span model — which uses traceID, spanID, operationName, duration in microseconds — into NormalizedSpan with duration_ms. The provider implements fetch_traces() and fetch_trace_by_id().
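A sketch of what those two pieces could look like; the CanonicalTraceQuery and NormalizedSpan field names used here (service, operation, min_duration_ms, and so on) are assumptions based on the conventions from earlier posts, not copied from the codebase:

# Illustrative only: field names on CanonicalTraceQuery / NormalizedSpan are assumptions.
def translate(query: CanonicalTraceQuery) -> dict[str, str]:
    params = {"service": query.service}
    if query.operation:
        params["operation"] = query.operation
    if query.min_duration_ms:
        params["minDuration"] = f"{query.min_duration_ms}ms"  # Jaeger accepts duration strings
    return params

def normalize(raw_span: dict) -> NormalizedSpan:
    return NormalizedSpan(
        trace_id=raw_span["traceID"],
        span_id=raw_span["spanID"],
        operation=raw_span["operationName"],
        duration_ms=raw_span["duration"] / 1000,  # Jaeger reports duration in microseconds
        source_provider="jaeger",
    )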
Then, in default_provider_factories():
def default_provider_factories() -> dict[str, ProviderFactory]:
    return {
        "datadog": lambda backend: DatadogProvider(),
        "loki": lambda backend: LokiProvider(backend),
        "tempo": lambda backend: TempoProvider(backend),
        "jaeger": lambda backend: JaegerProvider(backend),  # new
        "prometheus": lambda backend: PrometheusProvider(backend),
        "sentry": lambda backend: SentryProvider(),
    }
And in BackendConfig.from_env():
jaeger_url = os.getenv("JAEGER_URL")
if jaeger_url:
    backends.append(BackendCredentials(
        type="jaeger",
        credentials={"url": jaeger_url},
    ))
Nothing else changes. The registry picks up JaegerProvider via _register_by_capability(), which calls isinstance(provider, TraceProvider). The tool factory generates a fetch_traces tool backed by Jaeger. The formatter produces the same text format the LLM has always read. The evidence planner sends the same CanonicalTraceQuery it always has.
A customer running Jaeger for traces and Loki for logs gets a complete RCA with traces from Jaeger and logs from Loki. No code above app/providers/ knows or cares.
The same pattern works for Elastic, CloudWatch, New Relic, Honeycomb — each as a new package, each behind the same four protocols.
3. Context Providers: Evidence Beyond Observability
The protocol pattern solves the observability backend problem. But root cause analysis does not live in observability data alone.
An error spike in the search service is interesting. An error spike in the search service at 14:23 UTC — two minutes after a deployment of search-service v2.4.1 landed via ArgoCD — is explained. The deployment is the root cause. Flipturn could tell you that today if it had access to the deployment timeline.
The architecture to support this already exists. It requires extending the protocol family beyond the four observability signals.
The natural extension is a new set of provider protocols for context signals:
@runtime_checkable
class DeploymentProvider(Protocol):
    provider_name: str

    def fetch_recent_deployments(
        self, service: str | None, time_range: TimeRange
    ) -> list[NormalizedDeploymentEvent]: ...


@runtime_checkable
class TicketProvider(Protocol):
    provider_name: str

    def fetch_related_issues(
        self, query: CanonicalIssueQuery, time_range: TimeRange
    ) -> list[NormalizedTicket]: ...


@runtime_checkable
class AlertProvider(Protocol):
    provider_name: str

    def fetch_firing_alerts(
        self, service: str | None, time_range: TimeRange
    ) -> list[NormalizedAlert]: ...
These protocols follow the same conventions as LogProvider and TraceProvider. NormalizedDeploymentEvent carries timestamp, service, version, status, author, commit_sha, source_provider. NormalizedTicket carries id, title, status, priority, url, source_provider. All with source_provider so the formatter can label them correctly.
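As a sketch, assuming the new models are plain dataclasses like the existing normalized evidence models (field types here are assumptions):

from dataclasses import dataclass
from datetime import datetime

# Field names follow the prose above; types are assumptions.
@dataclass
class NormalizedDeploymentEvent:
    timestamp: datetime
    service: str
    version: str
    status: str                 # e.g. "succeeded", "failed"
    author: str | None
    commit_sha: str | None
    source_provider: str        # "argocd", "github", "kubernetes", ...

@dataclass
class NormalizedTicket:
    id: str
    title: str
    status: str
    priority: str | None
    url: str
    source_provider: str        # "jira", "github", ...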
The ProviderRegistry gets three new lists. _register_by_capability() gets three new isinstance checks — again, not elif, so a single GitHubProvider implementing both DeploymentProvider and TicketProvider registers in both lists in one call. The tool factory generates fetch_deployments, fetch_alerts, fetch_related_tickets tools by the same logic that generates fetch_logs and fetch_traces today.
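Sketched, with the list attribute names as assumptions:

# Inside ProviderRegistry: independent if-checks, not elif, so one provider
# can land in several capability lists. Attribute names are illustrative.
def _register_by_capability(self, provider: object) -> None:
    if isinstance(provider, LogProvider):
        self.log_providers.append(provider)
    if isinstance(provider, TraceProvider):
        self.trace_providers.append(provider)
    # ... metric and error checks as today ...
    if isinstance(provider, DeploymentProvider):
        self.deployment_providers.append(provider)
    if isinstance(provider, TicketProvider):
        self.ticket_providers.append(provider)
    if isinstance(provider, AlertProvider):
        self.alert_providers.append(provider)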
BackendConfig gains GITHUB_TOKEN, ARGOCD_URL, JIRA_URL. The bootstrap factory gains entries for "github", "argocd", "jira", "pagerduty", "kubernetes".
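Mirroring the Jaeger registration above, the configuration side might look like this (GitHubProvider, ArgoCDProvider, and their constructor shapes are assumptions, since none of these providers exist yet):

# In BackendConfig.from_env()
github_token = os.getenv("GITHUB_TOKEN")
if github_token:
    backends.append(BackendCredentials(
        type="github",
        credentials={"token": github_token},
    ))

# In default_provider_factories()
"github": lambda backend: GitHubProvider(backend),
"argocd": lambda backend: ArgoCDProvider(backend),
"jira": lambda backend: JiraProvider(backend),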
The evidence planner already computes a QueryPlan list per incident. Adding context plans alongside observability plans is a matter of including them in the planning pass:
# In evidence_planner.py — alongside existing log/trace/metric plans
plans.append(QueryPlan(
    source="github_deployments",
    signal_type="deployments",
    canonical_query=CanonicalDeploymentQuery(service=service),
    priority=3,
))
The LLM receives deployment events, recent alerts, and related Jira tickets formatted as evidence — the same LOG_1, TRACE_3 evidence reference pattern it already uses, extended with DEPLOY_1, ALERT_2, TICKET_1. The reasoning prompt does not change structurally. The agent correlates deployment timestamps with error spikes the same way it correlates trace IDs with log entries.
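Illustratively, the extended evidence block the formatter hands to the LLM might read like this (timestamps and messages are invented for the example):

LOG_1    [14:23:11 UTC] ERROR search-service: upstream timeout after 5000ms
DEPLOY_1 [14:21:08 UTC] search-service v2.4.1 deployed (commit: a3f92b1, author: alice@example.com)
ALERT_1  [14:23:40 UTC] HighErrorRate firing on search-service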
An engineer whose mental model of RCA is "look at logs and traces and check if there was a deploy" now has a system that does exactly that — and the system architecture reflects that mental model directly.
4. Kubernetes as a First-Class Context Provider
Kubernetes deserves specific attention because it sits at a different layer than the others.
Jira and PagerDuty are external SaaS services — HTTP APIs behind credentials, integrated the same way Datadog and Sentry are. Kubernetes is local infrastructure. The Kubernetes API server runs inside the cluster. Pod logs, events, resource state, and node conditions are all accessible via the in-cluster service account without external credentials.
A KubernetesProvider implementing DeploymentProvider (and possibly a new InfrastructureProvider protocol for pod status and events) can use the kubernetes Python client with in-cluster config:
import kubernetes

class KubernetesProvider:
    provider_name = "kubernetes"

    def __init__(self) -> None:
        kubernetes.config.load_incluster_config()
        self._apps = kubernetes.client.AppsV1Api()
        self._core = kubernetes.client.CoreV1Api()

    def fetch_recent_deployments(
        self, service: str | None, time_range: TimeRange
    ) -> list[NormalizedDeploymentEvent]:
        # List ReplicaSets or Deployments with matching label selector
        # Convert rollout events to NormalizedDeploymentEvent
        ...
From the evidence planner's perspective, KubernetesProvider is just another entry in the registry — no different from LokiProvider or PrometheusProvider. From the LLM's perspective, pod restart events and deployment rollouts appear as formatted evidence alongside log entries and spans.
The distinction matters: Kubernetes can tell the agent that the search service had 3 pod restarts in the 10 minutes before the error spike. That is a causal signal that no observability tool surfaces directly. It lives in the infrastructure layer, not the telemetry layer, and pulling it into the evidence plane is what makes the provider abstraction genuinely powerful rather than just a clean architecture exercise.
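A hedged sketch of how that restart signal could be read, using the real CoreV1Api call surface but with a method name and label convention that are assumptions:

def fetch_pod_restarts(self, service: str, namespace: str = "default") -> int:
    # Count container restarts across the service's pods. The "app" label key is
    # an assumption about how services are labeled in this cluster.
    pods = self._core.list_namespaced_pod(
        namespace, label_selector=f"app={service}"
    )
    # Note: restart_count is cumulative per container; correlating restarts to a
    # specific time window would also need pod events.
    return sum(
        status.restart_count
        for pod in pods.items
        for status in (pod.status.container_statuses or [])
    )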
5. The Remediation Question
Every serious conversation about autonomous SRE eventually reaches the same fork in the road.
The agent can diagnose. Should it act?
The honest answer is: sometimes, with boundaries. Flipturn's position on this is not "full autonomy" — it is bounded remediation behind approval gates.
The reasoning layer can recommend actions based on evidence: roll back search-service to v2.4.0, scale the cache deployment from 3 to 6 replicas, restart the worker pod. It can make these recommendations with confidence scores derived from the evidence it has collected. What it cannot do is execute those actions without a human in the loop — at least not until the system has demonstrated high enough reliability in its reasoning that the approval gate becomes a formality rather than a safeguard.
The architecture already has the scaffolding for this. The action layer is a provider family too:
@runtime_checkable
class RemediationProvider(Protocol):
    provider_name: str

    def execute_rollback(
        self, service: str, target_version: str, approval_token: str
    ) -> RemediationResult: ...

    def execute_scale(
        self, service: str, replicas: int, approval_token: str
    ) -> RemediationResult: ...
The approval_token is the gate. The agent produces a RecommendedAction with the parameters. A human approves via Slack or Zendesk. The system generates a time-bounded token. The provider executes with that token. The action is logged as evidence in the incident record.
That sequence does not require new architecture — it requires wiring the existing pieces together with a controlled execution surface. The registry, the tool factory, the evidence plane, and the formatter all participate. The only new layer is the approval gateway and the execution audit trail.
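A minimal sketch of that wiring, assuming an approval_gateway, an incident_log, and a RecommendedAction shape that are all illustrative:

# Illustrative only: approval_gateway, incident_log, and RecommendedAction fields are assumptions.
def execute_with_approval(action: RecommendedAction, registry: ProviderRegistry) -> None:
    token = approval_gateway.request(action)       # human approves via Slack/Zendesk, or declines
    if token is None or token.is_expired():
        incident_log.record("remediation_declined", action)
        return
    provider = registry.remediation_providers[0]   # e.g. a Kubernetes-backed RemediationProvider
    result = provider.execute_rollback(
        service=action.service,
        target_version=action.target_version,
        approval_token=token.value,
    )
    incident_log.record("remediation_executed", action, result)  # audit trail becomes evidence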
6. What the Architecture Now Commits To
Three things have to remain true as the system grows.
The evidence model is the contract. When adding a deployment provider, the question is not "what does the ArgoCD API return?" It is "how does an ArgoCD rollout event map to NormalizedDeploymentEvent?" The API is an implementation detail inside the provider package. The normalized model is what everything else depends on. If the model is right, every layer above it is insulated from API changes.
The registry is the extension point. New backends, new signal types, new context providers — all flow through the same bootstrap_registry() → _register_by_capability() → tool factory path. Adding a capability to the system means implementing a provider and a protocol. It does not mean touching the agent, the evidence planner, the formatter, or the LangGraph graph definition.
The LLM prompt contract is stable. The formatter's job — converting normalized evidence to formatted text — does not change when backends change. The agent already understands how to read LOG_1 [14:23:11 UTC] ERROR search-service: .... Extending to DEPLOY_1 [14:21:08 UTC] search-service v2.4.1 deployed (commit: a3f92b1, author: alice@example.com) follows the same convention. The evidence format is the API between the evidence plane and the reasoning layer. Keeping it stable means the LLM's reasoning quality does not degrade as the system expands.
7. The Shape of the Platform
Compressing everything this series has described into one diagram:
Every box in that diagram either exists today or has a clear, bounded implementation path. The boxes that exist today — the registry, the formatter, the agent, the evidence plane — were the ones that required the most careful design. The boxes that come next inherit that design rather than fighting against it.
That is the point of an architecture worth writing about.
Where the Series Ends and the Work Continues
This series started with a coupling problem: an AI agent whose reasoning quality was hostage to a single observability vendor. It ends with a provider architecture that treats backend choice as configuration — not code.
The progression:
- Named the coupling — vendor DSL embedded in the reasoning layer
- Designed the contract — canonical queries and normalized models as the stable interface
- Proved the migration — adapter pattern, zero regression, feature flag rollout
- Implemented the LGTM stack — three new query languages, one canonical interface
- Mapped what comes next — multi-tenancy, context providers, bounded remediation
The architecture is not complete. for_tenant() is a stub. Jaeger and Elastic providers do not exist yet. The remediation gateway is a design sketch. There is real implementation work ahead.
But the foundation decisions are made. The protocols, the registry, the canonical query model, the normalized evidence models — these are the primitives that the platform builds on. Getting those right, and proving them against two real observability stacks, was the work that had to happen first.
Everything else is additive.