Work With Me
I take a small number of advisory engagements for Series A-C voice and agentic AI startups whose AWS spend is climbing faster than revenue. The work is hands-on, scoped tightly, and focused on the parts of your architecture that move the bill or the latency budget.
Is this you?
- You're running production AI workloads on AWS — Bedrock, SageMaker, or self-hosted on ECS/EKS — and the inference bill is now a board-level line item.
- You're shipping voice or agentic experiences and your p95 latency is embarrassing you in demos. You can feel the lag but can't tell which hop is the culprit.
- You're multi-tenant (or about to be) and you can't cleanly answer "what does customer X cost us per month?" because the instrumentation to answer it doesn't exist.
What I look at
Every engagement starts with the AWS account. I'm reading Cost Explorer, CloudWatch, and your Bedrock invocation logs before I'm reading your code. The levers I rank, in roughly the order they usually pay off:
- Bedrock cost — model routing, prompt caching, batch mode, provisioned throughput vs on-demand math, output token discipline.
- Voice latency budget — STT/LLM/TTS hop accounting, on-device vs cloud STT, streaming TTS, region placement, the 300-500ms perceived-natural threshold.
- Multi-tenant architecture — tenant isolation patterns, per-tenant cost attribution, cross-account boundaries, noisy-neighbor mitigation.
- Production hardening — prompt injection defense, RAG leakage, rate limiting, circuit breakers, the boring ops work that keeps you off the front page of HN.
- Inference path architecture — what's running where, why, and what should move. Lambda vs ECS vs Bedrock-direct, cold start economics, async vs streaming.
- Data and eval — what you're logging, what you can replay, and how you'd know if a model swap regressed quality.
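The provisioned-throughput-vs-on-demand question above is ultimately a break-even calculation. A minimal sketch with placeholder prices (these are NOT real Bedrock rates; actual numbers vary by model and region, so plug in the current pricing page figures):

```python
def monthly_on_demand_cost(requests_per_month, in_tokens, out_tokens,
                           price_in_per_1k, price_out_per_1k):
    """On-demand: pay per token, so cost scales linearly with traffic."""
    per_request = (in_tokens / 1000) * price_in_per_1k \
                + (out_tokens / 1000) * price_out_per_1k
    return requests_per_month * per_request

def breakeven_requests(pt_monthly_cost, in_tokens, out_tokens,
                       price_in_per_1k, price_out_per_1k):
    """Traffic level at which a flat monthly provisioned commitment
    (placeholder number) starts beating on-demand."""
    per_request = (in_tokens / 1000) * price_in_per_1k \
                + (out_tokens / 1000) * price_out_per_1k
    return pt_monthly_cost / per_request

# Placeholder $ per 1k tokens -- illustrative only.
PRICE_IN, PRICE_OUT = 0.003, 0.015

# 2M requests/month at ~1.5k in / 300 out tokens each:
od_cost = monthly_on_demand_cost(2_000_000, 1_500, 300, PRICE_IN, PRICE_OUT)
# Compare against, say, a hypothetical $20k/month provisioned commitment:
be = breakeven_requests(20_000, 1_500, 300, PRICE_IN, PRICE_OUT)
```

Under these made-up rates, 2M requests/month costs $18k on-demand, so a $20k commitment only pays off above ~2.2M requests — which is exactly the kind of margin that shifts once output token discipline shrinks the per-request cost.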
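The latency hop accounting is simple addition, but making each hop explicit is what surfaces the culprit. A sketch with invented p95 numbers (your real hops and timings will differ):

```python
# Hypothetical p95 timings in ms for each hop of one voice turn.
hops = {
    "vad_endpointing": 120,   # deciding the user has stopped talking
    "stt_final": 180,         # speech-to-text finalization
    "llm_first_token": 350,   # time to first LLM token
    "tts_first_audio": 150,   # time to first synthesized audio chunk
    "network_rtt": 60,        # client <-> region round trips
}

total = sum(hops.values())
worst = max(hops, key=hops.get)
# The 300-500ms perceived-natural threshold applies to the whole turn,
# so rank hops by share of the budget, not in isolation.
shares = {name: ms / total for name, ms in hops.items()}
```

In this invented breakdown the turn totals 860ms — well past the threshold — and the LLM's time-to-first-token is the single biggest lever, which is why streaming TTS and region placement show up next on the list.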
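Per-tenant cost attribution starts with tagging every inference call with a tenant id and rolling token counts up against a rate card. A minimal sketch — the record fields and rates here are assumptions for illustration, not your schema or real pricing:

```python
from collections import defaultdict

# Hypothetical invocation-log records, one per model call.
logs = [
    {"tenant": "acme",   "in_tokens": 1200, "out_tokens": 300},
    {"tenant": "acme",   "in_tokens": 800,  "out_tokens": 250},
    {"tenant": "globex", "in_tokens": 2000, "out_tokens": 900},
]

# Placeholder $ per 1k tokens -- illustrative only.
RATE_IN, RATE_OUT = 0.003, 0.015

def cost_by_tenant(records):
    """Roll per-call token counts up into a monthly $ figure per tenant."""
    totals = defaultdict(float)
    for r in records:
        totals[r["tenant"]] += (r["in_tokens"] / 1000) * RATE_IN \
                             + (r["out_tokens"] / 1000) * RATE_OUT
    return dict(totals)
```

Once this rollup exists, "what does customer X cost us per month" becomes a query instead of a guess, and noisy-neighbor tenants show up in the data before they show up in the bill.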
How an engagement works
Engagements run 8 weeks, scope-locked, 100% upfront.
- Weeks 1-2: Diagnostic. I read your AWS account, CloudWatch, and the relevant slice of your codebase. Stakeholder interviews. By end of week 2 we have a written report ranking 5-7 specific levers across cost, architecture, latency, multi-tenant, and hardening.
- Weeks 3-6: Implementation. Hands-on with your team to ship the top 2-3 levers from the report.
- Weeks 7-8: Handoff. Documentation, runbooks, knowledge transfer, measurable before/after numbers.
The 8-week duration is a target, not a contract limit. If the work needs 9 or 10 weeks to deliver the SOW, I keep working at no additional cost. Time-overrun risk is on me, not you. Out-of-scope work surfaced mid-engagement is a separate engagement, scoped and priced separately.
Larger sprints and fractional engagements are scoped privately for clients who clear the initial 8-week engagement. Everything is remote. I'm in Mountain Time and responsive during the working day.
Starting at $48,000
Need a quick consultation? Book a 1-hour session.