Building the Delivery Control Plane: Lessons from Apollo
I didn't set out to build a product. I set out to solve a problem that was driving me insane.
It was late 2024, and Moonxi — our consulting studio — was running six concurrent engagements across three time zones. We had engineers in São Paulo, clients in New York and London, infrastructure spread across AWS accounts, and a patchwork of tools that were supposed to make it all work: Jira for tasks, GitHub for code, Terraform for infrastructure, Datadog for monitoring, spreadsheets for cost tracking, and Slack for everything else.
None of them talked to each other. Not really.
I could tell you what tasks were in progress. I could tell you what was deployed. I could, with enough digging, estimate what a particular project was costing us in cloud resources. But I could not — at any given moment — answer the question that mattered most: what is the true cost of delivering this outcome, and are we making money doing it?
That question haunted me because I knew the answer was probably "no." Or at least "barely." And I wasn't alone.
The Industry Is Flying Blind
SPI Research publishes an annual benchmark report that should terrify every consulting executive. In their most recent data, consultancy utilization — the percentage of available hours that are actually billed to clients — hit 68.9%. That's the lowest in a decade. For every ten hours a consultant is available, more than three are unbilled. Gone. Pure cost.
But utilization is just the symptom. The disease is deeper.
Average EBITDA margins across the consulting industry crashed to 9.8%. Think about that number. For every million dollars in revenue, less than a hundred thousand survives as profit. In an industry that sells expertise — that has no cost of goods sold in the traditional sense, no inventory, no manufacturing — a single-digit margin is a structural failure.
Where does the money go? It evaporates in the space between tools. In the gap between a Jira ticket being marked "done" and the actual cost of the engineering hours, infrastructure, and coordination that went into delivering it. In the invisible overhead of context switching, status meetings, and manual reconciliation of data that should be automated.
The consulting industry has spent two decades digitizing everything except the thing that matters most: the economic feedback loop between delivery and margin.
The Operational Graph
Apollo started as a weekend project. A Python script that pulled data from our GitHub repos, correlated it with Jira tickets, and cross-referenced both against AWS Cost Explorer. The first version was ugly. It was also revelatory.
For the first time, I could see — in a single view — that a particular feature had taken 47 engineering hours across three sprints, consumed $340 in cloud resources during development and testing, and was now running in production at a steady-state cost of $12/month. The ticket said it was "2 story points." The reality was $4,200 in fully-loaded delivery cost.
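The gap between "2 story points" and $4,200 is simple arithmetic once the inputs are visible. A back-of-envelope sketch — the fully-loaded hourly rate here is my assumed illustration, since the article doesn't state Moonxi's actual rate:

```python
# Back-of-envelope fully-loaded delivery cost for one feature.
# The hourly rate is an assumed illustration, not Moonxi's actual rate.
engineering_hours = 47
fully_loaded_rate = 82.0   # assumed $/hour: salary + overhead + benefits
dev_cloud_cost = 340.0     # cloud spend during development and testing

delivery_cost = engineering_hours * fully_loaded_rate + dev_cloud_cost
print(f"${delivery_cost:,.0f}")  # roughly $4,200
```

The point isn't the precision of the rate; it's that none of the individual tools held all three inputs at once.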
That was the moment Apollo stopped being a script and started becoming a system.
Today, Apollo is a full delivery control plane. It has 203 governed agent tools — discrete capabilities that AI agents can invoke under policy control. It has 54 API routes that connect to every system in our delivery stack. And at its core, it maintains what we call the operational graph: a living, queryable model of how tasks connect to code, how code connects to deployments, how deployments connect to infrastructure costs, and how all of it rolls up into project-level and portfolio-level margin.
The operational graph is not a dashboard. Dashboards are snapshots. The graph is a data structure — a directed acyclic graph where every node is an artifact (a task, a commit, a deployment, a cost event) and every edge is a causal relationship. When a developer commits code that references a task, Apollo creates an edge. When that code is deployed, Apollo creates another edge from the commit to the deployment event. When that deployment triggers a cost change in AWS, Apollo captures it and links it back.
The result is that at any moment, I can trace a line from a business outcome all the way down to the infrastructure cost that supports it. And I can do it in real time.
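That traversal can be sketched as a plain adjacency map with a depth-first walk. The node names and edge structure below are illustrative, not Apollo's actual schema:

```python
from collections import defaultdict

# Minimal operational graph: nodes are delivery artifacts, directed edges
# are causal links (task -> commit -> deployment -> cost event).
edges = defaultdict(list)

def link(src, dst):
    edges[src].append(dst)

link("task:PROJ-142", "commit:a1b2c3")      # commit references the task
link("commit:a1b2c3", "deploy:2024-11-03")  # commit ships in this deployment
link("deploy:2024-11-03", "cost:aws-eks")   # deployment changes a cost line

def trace(node, path=()):
    """Yield every root-to-leaf path starting at `node` (DFS on a DAG)."""
    path = path + (node,)
    if not edges[node]:
        yield path
        return
    for nxt in edges[node]:
        yield from trace(nxt, path)

for p in trace("task:PROJ-142"):
    print(" -> ".join(p))
```

Because the graph is acyclic, the walk terminates; in a real system each edge would carry metadata (timestamp, actor, cost delta) so the path itself answers the margin question.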
203 Governed Agent Tools
The number 203 isn't arbitrary. It's the result of two years of iterating on what AI agents actually need to operate within a consulting organization — and, more importantly, what they should be prevented from doing.
When we first started integrating AI agents into our delivery workflow, we gave them broad access. Create tasks, modify code, trigger deployments. The results were predictably chaotic. An agent would close a task before the code was reviewed. Another would spin up infrastructure in the wrong AWS account. A third would update a client-facing document with language that hadn't been approved.
The lesson was clear: agent capability without agent governance is liability.
So we built a governance layer. Every tool in Apollo has a policy envelope — a set of rules that define who can invoke it, under what conditions, with what approvals, and with what audit trail. An agent can create a draft pull request, but it can't merge to main without human approval. It can estimate cloud costs for a proposed architecture, but it can't provision resources above a cost threshold. It can draft a client status update, but it can't send it.
This is what we mean by "governed." Not restricted. Not hobbled. Governed. The agent has full capability within a policy boundary, and every action is recorded in the operational graph.
The MCP Standardization Wave
When Anthropic introduced the Model Context Protocol, something clicked. We had been building our own tool-calling interface — a bespoke system that worked but was brittle and tied to our specific stack. MCP offered something better: a standard.
Standards matter at inflection points. When the web was young, every server had its own protocol. HTTP standardized the interface and the web exploded. When cloud computing emerged, every provider had proprietary APIs. Terraform's provider model standardized infrastructure-as-code and DevOps became a discipline.
MCP is doing the same thing for agent-tool interaction. And for organizations like ours that are building agent governance systems, standardization is not a nice-to-have — it's existential.
Here's why: if every agent framework defines its own tool-calling convention, then every governance policy must be reimplemented for every framework. Your audit trail fragments. Your cost controls become patchwork. Your security model has gaps wherever you forgot to implement a check in one of the seventeen different calling conventions.
With MCP, we write governance policies once. The policy envelope wraps the MCP tool definition. Any MCP-compliant agent — regardless of the underlying model or framework — hits the same governance layer. Audit is unified. Cost controls are consistent. And when a new agent framework appears (and they appear monthly), we don't have to rebuild our governance from scratch.
Apollo adopted MCP early. Today, all 203 tools are MCP-compliant, with governance metadata embedded in the tool descriptions. The agent knows not just what it can do, but what it's allowed to do, under what constraints, and what will happen if it tries to exceed them.
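An MCP tool definition carries a name, a description, and a JSON-Schema `inputSchema`; where Apollo embeds its governance metadata isn't shown here, so the `"x-governance"` block below is an invented convention for illustration, not part of the MCP spec:

```python
# Sketch of an MCP-style tool definition. `name`, `description`, and
# `inputSchema` follow the MCP tool shape; the "x-governance" block is an
# assumed convention for embedding policy metadata, not part of the spec.
estimate_costs_tool = {
    "name": "estimate_cloud_costs",
    "description": "Estimate monthly cost of a proposed architecture.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "architecture": {"type": "string"},
            "region": {"type": "string"},
        },
        "required": ["architecture"],
    },
    "x-governance": {
        "requires_human_approval": False,
        "max_cost_usd": 500,   # deny provisioning estimates above this
        "audit": True,
    },
}
```

The payoff of keeping policy next to the definition is that any MCP-compliant client sees the same constraints, so the governance layer has one source of truth per tool.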
Dogfooding as Ultimate Validation
There is a particular kind of conviction that comes from using your own product every day. Not in a demo environment. Not in a sandbox. In production, with real clients, real money, and real consequences.
Apollo runs Moonxi's operations. Every morning, it generates a briefing that shows me: which projects are on track, which are drifting, where margin is compressing, and what actions agents have taken overnight. When an agent optimizes a deployment pipeline and saves $200/month in infrastructure costs, I see it in the graph before I finish my coffee.
When something breaks — and things break — I feel it immediately. A bug in the cost attribution model showed a project as 12% more profitable than it actually was. We caught it because the graph's margin calculation didn't match the invoice we were about to send. That's the kind of bug you only catch when the tool is wired into your actual business, not when it's running in a test environment with synthetic data.
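That class of bug is catchable with a blunt invariant: the margin the graph computes must agree with the invoice about to go out. A sketch of that reconciliation check, with the tolerance and field shapes assumed:

```python
def reconcile(graph_margin: float, invoice_margin: float, tol: float = 0.01) -> bool:
    """Flag when the operational graph and the invoice disagree on margin.

    Margins are fractions (0.18 == 18%); `tol` is an assumed tolerance.
    """
    drift = abs(graph_margin - invoice_margin)
    if drift > tol:
        raise ValueError(
            f"margin drift {drift:.1%} exceeds tolerance {tol:.1%}: "
            "check cost attribution before sending the invoice"
        )
    return True

# A graph that overstates profitability (0.30 reported vs 0.18 invoiced)
# trips the check before the invoice goes out.
```

The check is trivial; what makes it valuable is that both numbers come from independent paths through real data, so they disagree exactly when something upstream is wrong.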
Dogfooding also creates a relentless prioritization function. When you're both the builder and the user, you know exactly which features matter and which are vanity. We've killed dozens of features that seemed important in theory but were never used in practice. The ones that survived are the ones that saved time, caught errors, or made money visible.
The Lesson for Founders
Every founder has a moment where they realize they've been building something that doesn't exist in the market — not because nobody thought of it, but because nobody was in enough pain to build it properly.
Apollo exists because I was tired of not knowing whether my consulting business was making money on a per-project, per-sprint, per-deployment basis. The tools available were either too high-level (financial dashboards that told you quarterly results after it was too late to act) or too low-level (DevOps monitoring that could tell you a container's CPU usage but not its business cost).
The gap in the middle — the delivery control plane — was empty. And it was empty because building it requires domain expertise in both consulting operations and platform engineering. You need to understand what a utilization rate means and how a Kubernetes deployment works. That intersection is small.
If you're a founder, pay attention to the tools you build for yourself. The scripts you write on weekends. The dashboards you hack together because the commercial options don't do what you need. The internal systems your team relies on but that no one outside your company knows about.
If those tools are good enough that other people in your industry would pay for them, you might have a product. Not a pivot. Not a "why not try selling this." A product, born from genuine pain and validated by daily use.
Apollo wasn't planned as a product. It was planned as survival. The fact that it became something other consultancies want to use is a consequence of building something that actually works — because it had to, because our business depended on it.
What's Next
Apollo is still early. The operational graph covers our stack, but the ambition is broader: a universal delivery control plane that any consulting organization can adopt, customize with their own governance policies, and connect to their own tool ecosystem.
The MCP standard makes this possible in a way it wasn't two years ago. And the market is ready — or rather, the market is desperate. With utilization at 68.9% and margins at 9.8%, the consulting industry cannot afford to keep flying blind.
We built the instrument panel. Now we're learning to fly.