Work With Me
I take a small number of advisory engagements for Series A-C voice and agentic AI startups whose AWS spend is climbing faster than revenue. The work is hands-on, scoped tightly, and focused on the parts of your architecture that move the bill or the latency budget.
Is this you?
- You're running production AI workloads on AWS — Bedrock, SageMaker, or self-hosted on ECS/EKS — and the inference bill is now a board-level line item.
- You're shipping voice or agentic experiences and your p95 latency is embarrassing you in demos. You can feel the lag but can't tell which hop is the culprit.
- You're multi-tenant (or about to be) and you can't cleanly answer "what does customer X cost us per month?" because the instrumentation to answer it doesn't exist.
What I look at
Every engagement starts with the AWS account. I'm reading Cost Explorer, CloudWatch, and your Bedrock invocation logs before I'm reading your code. The levers I rank, in roughly the order they usually pay off:
- Bedrock cost — model routing, prompt caching, batch mode, provisioned throughput vs on-demand math, output token discipline.
- Voice latency budget — STT/LLM/TTS hop accounting, on-device vs cloud STT, streaming TTS, region placement, the 300-500ms perceived-natural threshold.
- Multi-tenant architecture — tenant isolation patterns, per-tenant cost attribution, cross-account boundaries, noisy-neighbor mitigation.
- Production hardening — prompt injection defense, RAG leakage, rate limiting, circuit breakers, the boring ops work that keeps you off the front page of HN.
- Inference path architecture — what's running where, why, and what should move. Lambda vs ECS vs Bedrock-direct, cold start economics, async vs streaming.
- Data and eval — what you're logging, what you can replay, and how you'd know if a model swap regressed quality.
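The provisioned-throughput-vs-on-demand question above is ultimately a break-even calculation. A minimal sketch with placeholder prices (these are NOT real Bedrock rates; actual numbers vary by model and region, so plug in the current pricing page figures):

```python
def monthly_on_demand_cost(requests_per_month, in_tokens, out_tokens,
                           price_in_per_1k, price_out_per_1k):
    """On-demand: pay per token, so cost scales linearly with traffic."""
    per_request = (in_tokens / 1000) * price_in_per_1k \
                + (out_tokens / 1000) * price_out_per_1k
    return requests_per_month * per_request

def breakeven_requests(pt_monthly_cost, in_tokens, out_tokens,
                       price_in_per_1k, price_out_per_1k):
    """Traffic level at which a flat monthly provisioned commitment
    (placeholder number) starts beating on-demand."""
    per_request = (in_tokens / 1000) * price_in_per_1k \
                + (out_tokens / 1000) * price_out_per_1k
    return pt_monthly_cost / per_request

# Placeholder $ per 1k tokens -- illustrative only.
PRICE_IN, PRICE_OUT = 0.003, 0.015

# 2M requests/month at ~1.5k in / 300 out tokens each:
od_cost = monthly_on_demand_cost(2_000_000, 1_500, 300, PRICE_IN, PRICE_OUT)
# Compare against, say, a hypothetical $20k/month provisioned commitment:
be = breakeven_requests(20_000, 1_500, 300, PRICE_IN, PRICE_OUT)
```

Under these made-up rates, 2M requests/month costs $18k on-demand, so a $20k commitment only pays off above ~2.2M requests — which is exactly the kind of margin that shifts once output token discipline shrinks the per-request cost.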
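The latency hop accounting is simple addition, but making each hop explicit is what surfaces the culprit. A sketch with invented p95 numbers (your real hops and timings will differ):

```python
# Hypothetical p95 timings in ms for each hop of one voice turn.
hops = {
    "vad_endpointing": 120,   # deciding the user has stopped talking
    "stt_final": 180,         # speech-to-text finalization
    "llm_first_token": 350,   # time to first LLM token
    "tts_first_audio": 150,   # time to first synthesized audio chunk
    "network_rtt": 60,        # client <-> region round trips
}

total = sum(hops.values())
worst = max(hops, key=hops.get)
# The 300-500ms perceived-natural threshold applies to the whole turn,
# so rank hops by share of the budget, not in isolation.
shares = {name: ms / total for name, ms in hops.items()}
```

In this invented breakdown the turn totals 860ms — well past the threshold — and the LLM's time-to-first-token is the single biggest lever, which is why streaming TTS and region placement show up next on the list.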
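Per-tenant cost attribution starts with tagging every inference call with a tenant id and rolling token counts up against a rate card. A minimal sketch — the record fields and rates here are assumptions for illustration, not your schema or real pricing:

```python
from collections import defaultdict

# Hypothetical invocation-log records, one per model call.
logs = [
    {"tenant": "acme",   "in_tokens": 1200, "out_tokens": 300},
    {"tenant": "acme",   "in_tokens": 800,  "out_tokens": 250},
    {"tenant": "globex", "in_tokens": 2000, "out_tokens": 900},
]

# Placeholder $ per 1k tokens -- illustrative only.
RATE_IN, RATE_OUT = 0.003, 0.015

def cost_by_tenant(records):
    """Roll per-call token counts up into a monthly $ figure per tenant."""
    totals = defaultdict(float)
    for r in records:
        totals[r["tenant"]] += (r["in_tokens"] / 1000) * RATE_IN \
                             + (r["out_tokens"] / 1000) * RATE_OUT
    return dict(totals)
```

Once this rollup exists, "what does customer X cost us per month" becomes a query instead of a guess, and noisy-neighbor tenants show up in the data before they show up in the bill.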
How an engagement works
Engagements run 8 weeks, scope-locked, 100% upfront.
- Weeks 1-2: Diagnostic. I read your AWS account, CloudWatch, and the relevant slice of your codebase. Stakeholder interviews. By end of week 2 we have a written report ranking 5-7 specific levers across cost, architecture, latency, multi-tenant, and hardening.
- Weeks 3-6: Implementation. Hands-on with your team to ship the top 2-3 levers from the report.
- Weeks 7-8: Handoff. Documentation, runbooks, knowledge transfer, measurable before/after numbers.
The 8-week duration is a target, not a contract limit. If the work needs 9 or 10 weeks to deliver the SOW, I keep working at no additional cost. Time-overrun risk is on me, not you. Out-of-scope work surfaced mid-engagement is a separate engagement, scoped and priced separately.
Larger sprints and fractional engagements are scoped privately for clients who clear the initial 8-week engagement. Everything is remote. I'm in Mountain Time and responsive during the working day.
Starting at $48,000
Need a quick consultation? Book a 1-hour session.