MarkBase AI: Engineering an Autonomous Marketing Intelligence Platform
How I designed and built the AI architecture behind MarkBase, a natural-language marketing intelligence platform that unifies Google Ads, Meta, LinkedIn, and Google Analytics into a single command layer for mid-market teams.
Role
Co-Founder, CTO & AI Engineer
Timeframe
2025 – Present
Status
Live in early access, with 1,600+ waitlist signups and 10 paying beta partners
Tech Stack
An autonomous marketing intelligence platform that replaces fragmented dashboards with a single conversational command layer.
Mid-market marketing teams spend $10K to $100K+ per month across advertising, analytics, and outreach tools, yet most of that budget goes to humans coordinating between systems rather than to execution. MarkBase is the platform my co-founder and I built to collapse that coordination layer into software.
As Co-Founder, CTO, and AI Engineer, I own the full AI and backend architecture: the system that turns a natural-language question like "Why did conversions drop last week?" into a structured analysis across Google Ads, Meta, LinkedIn, and Google Analytics, and, when asked, turns an instruction like "Send an invite to 50 fintech CEOs for our launch event" into an executed outreach campaign.
This case study walks through three engineering decisions I'm proudest of, because each one reflects a non-obvious tradeoff rather than a default choice.
The Problem
Marketing platforms don't talk to each other. A campaign that looks healthy on Google Ads might be cannibalizing Meta performance. A drop in GA conversions might trace back to a CPC spike on a single LinkedIn campaign from two weeks ago. Answering "how is my marketing actually performing?" today requires a human, usually an agency, to pull numbers from four dashboards, normalize them, and write a narrative.
MarkBase replaces that human coordination with a conversational interface that:
- Queries any connected marketing platform using natural language
- Scores performance against industry benchmarks deterministically
- Routes cross-channel questions intelligently when the same metric lives in multiple places
- Executes outreach and campaign actions through authenticated integrations
The engineering challenge is that almost every interesting query in marketing is cross-channel, multi-metric, and context-dependent, which is exactly the kind of problem where naive LLM orchestration falls apart under real user load.
Decision 1: Serverless-First Architecture for an MVP That Has to Scale Both Directions
The harder scaling problem for an early-stage AI product isn't handling traffic spikes. It's avoiding burning runway on idle capacity while demand is still unpredictable, and doing that without locking yourself out of enterprise-grade scale later.
At MVP stage with 10 paying beta partners and 1,600+ waitlisted users, I had to design for three realities at once:
- Unpredictable load. Waitlist cohorts onboard in bursts, not linearly.
- Credential-sensitive workloads. We hold OAuth tokens for customer ad accounts and domain auth for outbound email, so the security model has to be production-grade on day one.
- A clear enterprise path. The pitch deck commits to a $5K/month enterprise tier, so nothing in the MVP can become a ceiling later.
I chose a serverless-first architecture on AWS: Lambda for compute, DynamoDB for state, Cognito for auth, SES for outbound email. The architecture is designed to scale to zero during quiet periods and absorb bursts without pre-provisioning.
Why this stack over the obvious alternatives:
- Lambda over ECS/Kubernetes. At MVP load, container orchestration means paying in infrastructure complexity for capacity we don't need yet. Lambda's cold-start penalty is real but acceptable for a chat interface where the user already expects a few seconds of LLM latency, and at low, bursty volume, paying per invocation costs far less than keeping containers running.
- DynamoDB over Postgres/RDS. Our access patterns are key-based (user to connected channels, user to query history, contact to metadata). The relational joins marketing analytics eventually needs happen inside the marketing platforms' own APIs, not in our database. Paying for an idle RDS instance to support query patterns we don't have would have been the wrong tradeoff.
- Cognito over rolling our own auth. The correct default for any product touching customer credentials. The time saved went directly into the AI layer, which is where the product's actual differentiation lives.
- SES over SendGrid/Postmark. We already live in AWS, deliverability is strong, domain authentication is clean, and the cost curve is an order of magnitude better at the volumes MarMail, our outbound email layer, will eventually send.
The architecture scales horizontally without re-architecture. When we move to SageMaker-hosted fine-tuned models for the enterprise tier, the orchestration layer doesn't change. Only the inference endpoint does.
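To make the "only the inference endpoint changes" claim concrete, here is a minimal sketch in Python. It is illustrative only, not MarkBase's actual code: the `InferenceBackend` interface, both backend classes, the tier names, and the endpoint and model names are all hypothetical, and the backends return placeholder strings instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Protocol


class InferenceBackend(Protocol):
    """Anything that can turn a prompt into text (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class HostedApiBackend:
    """Stand-in for a general-purpose hosted LLM API (assumed for illustration)."""
    model_name: str

    def complete(self, prompt: str) -> str:
        # A real implementation would call the provider's SDK here.
        return f"[{self.model_name}] {prompt}"


@dataclass
class FineTunedEndpointBackend:
    """Stand-in for a fine-tuned model served behind a dedicated endpoint."""
    endpoint_name: str

    def complete(self, prompt: str) -> str:
        # A real implementation would invoke the hosted endpoint here.
        return f"[{self.endpoint_name}] {prompt}"


class Orchestrator:
    """Orchestration logic depends only on the InferenceBackend interface."""

    def __init__(self, backend: InferenceBackend) -> None:
        self.backend = backend

    def explain(self, facts: dict[str, float]) -> str:
        # The prompt is built deterministically, so only the backend varies.
        lines = [f"{name}: {value}" for name, value in sorted(facts.items())]
        return self.backend.complete("Explain these metrics:\n" + "\n".join(lines))


def backend_for_tier(tier: str) -> InferenceBackend:
    """Pick a backend per pricing tier (tier names are illustrative)."""
    if tier == "enterprise":
        return FineTunedEndpointBackend(endpoint_name="enterprise-endpoint")
    return HostedApiBackend(model_name="default-model")
```

Under these assumptions, moving a customer to the enterprise tier changes the object returned by `backend_for_tier` and nothing inside `Orchestrator`.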
Decision 2: Deterministic Scoring Over LLM Judgment for Performance Evaluation
The most common failure mode in AI marketing tools is letting the LLM judge performance. Ask GPT-4 "is a 2.3% CTR good?" and you'll get a confident answer that's wrong for half of all industries and right for the other half, with no way to audit which half you're in.
Performance evaluation in MarkBase is deterministic code, not LLM reasoning.
When a user asks "How is my Google Ads performing?", the system:
- Pulls the raw metrics from the connected channel (CTR, CPC, conversion rate, ROAS, cost per conversion, etc.)
- Looks up industry benchmarks from our benchmarks database, filtered by the user's industry and channel
- Computes weighted scores per metric using a formula that accounts for the metric's relative importance to the channel (CTR matters differently on Google Search vs. Meta display) and the user's stated goals
- Flags deviations against benchmark thresholds with clear magnitudes, like "23% below fintech median" rather than "a bit low"
- Passes the structured result to the LLM only for natural-language explanation
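The steps above can be sketched as a few pure functions. This is a minimal illustration under stated assumptions, not the production formula: the `MetricSpec` fields, the 0 to 100 scale where 50 means "at benchmark", the weights, and the benchmark values are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    benchmark: float        # industry median for this metric (illustrative value)
    weight: float           # relative importance of the metric on this channel
    higher_is_better: bool  # False for cost metrics such as CPC


def deviation_pct(value: float, spec: MetricSpec) -> float:
    """Signed percent difference between the observed value and the benchmark."""
    return (value - spec.benchmark) / spec.benchmark * 100.0


def describe(name: str, value: float, spec: MetricSpec, industry: str) -> str:
    """State the deviation with a magnitude instead of a vague adjective."""
    pct = deviation_pct(value, spec)
    direction = "above" if pct >= 0 else "below"
    return f"{name}: {abs(pct):.0f}% {direction} {industry} median"


def metric_score(value: float, spec: MetricSpec) -> float:
    """Map a metric to 0..100, where 50 means 'at benchmark' (assumed scale)."""
    pct = deviation_pct(value, spec)
    signed = pct if spec.higher_is_better else -pct
    return max(0.0, min(100.0, 50.0 + signed / 2.0))


def channel_score(metrics: dict[str, float], specs: dict[str, MetricSpec]) -> float:
    """Weighted average of per-metric scores."""
    total_weight = sum(specs[name].weight for name in metrics)
    weighted = sum(
        metric_score(value, specs[name]) * specs[name].weight
        for name, value in metrics.items()
    )
    return weighted / total_weight


def build_llm_payload(
    metrics: dict[str, float], specs: dict[str, MetricSpec], industry: str
) -> dict:
    """Structured result handed to the LLM, which only has to explain it."""
    return {
        "score": round(channel_score(metrics, specs), 1),
        "findings": [
            describe(name, value, specs[name], industry)
            for name, value in sorted(metrics.items())
        ],
    }
```

With a hypothetical fintech CTR benchmark of 0.03 and an observed CTR of 0.0231, `describe` produces "ctr: 23% below fintech median". Because every function is pure, the same inputs always yield the same payload, and each number in it can be traced to a metric, a benchmark, or a formula.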
This architecture has three properties that matter:
Auditability. Every number in a MarkBase response traces back to either a raw API metric, a benchmark value, or a deterministic calculation. There's no hallucinated math. When a customer asks "how did you get that?", we can show them.
Testability. The scoring logic has unit tests. LLM judgment doesn't.
Consistency. The same question asked twice returns the same analysis. For a product that replaces agency reporting, that's not a nice-to-have. It's the baseline trust requirement.
The LLM's job is to explain, prioritize, and converse. The judgment itself is in code.
Decision 3: Channel Routing as a Deterministic Control Plane, Not an Agent Supervisor
The original architecture used a supervisor-agent pattern, where an LLM interpreted the user's query and delegated to specialized channel agents (Google Ads, Meta, LinkedIn, GA). It was elegant on paper and in demos. In beta testing, it introduced three problems:
- Latency. Every query paid for two LLM round-trips instead of one.
- Non-determinism. The supervisor occasionally routed to the wrong agent, or to none, and the failure mode was hard to debug.
- Cost. Supervisor reasoning burned tokens that weren't producing user-visible value.
I moved the orchestration layer from an LLM supervisor into deterministic routing code built around LLM function calling. The intelligence didn't go away. It moved somewhere I can test and reason about.
The routing layer handles three cases explicitly:
Single-channel queries. "What's my Google Ads CTR this week?" The function schema makes the channel explicit, the LLM calls one function, done.
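As a rough sketch of what "the function schema makes the channel explicit" can look like, the tool definition below uses the JSON-Schema style common to LLM function-calling APIs, followed by plain-code validation and dispatch. The tool name, field names, channel identifiers, and date-range values are hypothetical, not MarkBase's real schema.

```python
import json
from typing import Callable

CHANNELS = ("google_ads", "meta_ads", "linkedin_ads", "google_analytics")

# Hypothetical tool definition: the model must name exactly one channel.
GET_METRIC_TOOL = {
    "name": "get_channel_metric",
    "description": "Fetch one metric from one connected marketing channel.",
    "parameters": {
        "type": "object",
        "properties": {
            "channel": {"type": "string", "enum": list(CHANNELS)},
            "metric": {"type": "string"},
            "date_range": {
                "type": "string",
                "enum": ["this_week", "last_week", "last_30_days"],
            },
        },
        "required": ["channel", "metric", "date_range"],
    },
}

Handler = Callable[[str, str], dict]


def validate_call(arguments_json: str) -> dict:
    """Check model-produced arguments in plain code before any API call."""
    args = json.loads(arguments_json)
    params = GET_METRIC_TOOL["parameters"]
    for field in params["required"]:
        if field not in args:
            raise ValueError(f"missing field: {field}")
    for field in ("channel", "date_range"):
        if args[field] not in params["properties"][field]["enum"]:
            raise ValueError(f"unsupported {field}: {args[field]}")
    return args


def dispatch(arguments_json: str, handlers: dict[str, Handler]) -> dict:
    """Route a validated call to the handler for the named channel."""
    args = validate_call(arguments_json)
    return handlers[args["channel"]](args["metric"], args["date_range"])
```

The model still chooses the function arguments, but a bad choice fails loudly in `validate_call` instead of silently reaching the wrong channel.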
Cross-channel queries where a metric exists in multiple places. This is the interesting case. "Conversions" exists on Google Ads, Meta Ads, LinkedIn Ads, and Google Analytics, each with different definitions and attribution windows. Instead of picking one or silently aggregating (which would be wrong), the router:
- Checks which channels the user has actually authenticated
- Queries every authenticated channel that reports the metric
- Returns a per-channel breakdown in the response (same metric, each source labeled)
- Offers the user two explicit next steps: drill into one channel, or ask for an explanation of how the numbers relate and what the cross-channel picture means
That last step is where real marketing intelligence happens. A 40% gap between GA conversions and Meta's reported conversions isn't a bug. It's an attribution story. Surfacing it as a user-initiated next step, rather than auto-synthesizing a potentially misleading "total," is the choice that keeps MarkBase honest.
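The cross-channel case can be sketched as a single deterministic function. This is an illustration under assumptions, not the actual router: the `METRIC_SOURCES` table, the channel identifiers, the fetcher signature, and the next-step labels are hypothetical.

```python
from typing import Callable

# Which channels report which metric (illustrative subset, not a full catalog).
METRIC_SOURCES = {
    "conversions": ["google_ads", "meta_ads", "linkedin_ads", "google_analytics"],
    "ctr": ["google_ads", "meta_ads", "linkedin_ads"],
}

# A fetcher takes a metric name and returns that channel's reported value.
Fetcher = Callable[[str], float]


def route_metric(
    metric: str, authenticated: set[str], fetchers: dict[str, Fetcher]
) -> dict:
    """Query every authenticated channel that reports the metric.

    Values are returned per channel and never summed, because each platform
    defines and attributes the metric differently.
    """
    if metric not in METRIC_SOURCES:
        raise KeyError(f"unknown metric: {metric}")

    # Keep only channels the user has connected, in a stable order.
    sources = [ch for ch in METRIC_SOURCES[metric] if ch in authenticated]
    breakdown = [{"channel": ch, "value": fetchers[ch](metric)} for ch in sources]

    # Follow-ups are only offered when there is something to compare.
    next_steps = (
        ["drill_into_channel", "explain_cross_channel"] if len(sources) > 1 else []
    )
    return {"metric": metric, "breakdown": breakdown, "next_steps": next_steps}
```

The response deliberately has no "total" field. Any cross-channel narrative has to be requested through the `explain_cross_channel` step, at which point the model writes over these labeled per-channel numbers.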
Action queries. "Send an event invite to 50 fintech CEOs." These route through the contacts database and MarMail (SES-backed), with explicit confirmation before anything hits a customer's authenticated domain.
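One way to enforce "explicit confirmation before anything hits a customer's authenticated domain" is a two-step gate: propose, then confirm. The sketch below is hypothetical throughout (class names, fields, and the injected `send_fn` are assumptions), and the real send path through SES is replaced by a function passed in by the caller.

```python
import uuid
from dataclasses import dataclass
from typing import Callable

# send_fn(recipients, sender_domain) -> number of messages sent
SendFn = Callable[[list[str], str], int]


@dataclass(frozen=True)
class PendingAction:
    """Summary shown to the user before anything is sent."""
    action_id: str
    kind: str
    recipient_count: int
    sender_domain: str


class ActionGate:
    """Holds proposed actions until the user explicitly confirms them."""

    def __init__(self, send_fn: SendFn) -> None:
        self._send_fn = send_fn
        self._pending: dict[str, tuple[PendingAction, list[str]]] = {}

    def propose(
        self, kind: str, recipients: list[str], sender_domain: str
    ) -> PendingAction:
        """Record the action and return a summary. Nothing is sent here."""
        action = PendingAction(uuid.uuid4().hex, kind, len(recipients), sender_domain)
        self._pending[action.action_id] = (action, list(recipients))
        return action

    def confirm_and_execute(self, action_id: str) -> int:
        """Send only for a known, not-yet-executed action id."""
        # pop() raises KeyError for unknown ids and prevents double sends.
        action, recipients = self._pending.pop(action_id)
        return self._send_fn(recipients, action.sender_domain)
```

In this shape the LLM can only ever reach `propose`. The send itself requires an action id that comes back from the user's confirmation, so a misread instruction produces a summary to reject rather than an email.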
The architectural win: the reasoning is still there, but it's in code I can unit-test, log, and reason about, not in a non-deterministic supervisor loop.
What's Live Today
- MVP in production with 10 paying beta partners on the Intelligence tier
- 1,600+ users on the early-access waitlist (opened February 2026)
- Integrations live: Google Ads, Meta Ads, LinkedIn Ads, Google Analytics
- MarMail outbound via authenticated domains on SES
- Contacts intelligence for executive-level targeting across industries
- Industry benchmarking across all integrated channels
What's Next
- Shopify, HubSpot, and Salesforce integrations to connect marketing performance to revenue
- Multi-channel agent squads for autonomous playbook execution
- Fine-tuned models on SageMaker for the enterprise tier
Reflections
MarkBase is the project where I stopped thinking about AI engineering as "orchestrating LLMs" and started thinking about it as deciding what belongs in code and what belongs in the model. Almost every interesting decision in the system is a version of that question:
- Performance judgment? Code.
- Channel routing? Code.
- Explanation and conversation? Model.
- Cross-channel synthesis? User-initiated, with the model generating the narrative over code-computed facts.
That split is the part I'd argue most strongly for in any serious AI product. LLMs are excellent at language and bad at being a system of record. Building around that constraint, instead of against it, is what turns an AI demo into a product people pay for.