Work With Me

I take a small number of advisory engagements for Series A-C voice and agentic AI startups whose AWS spend is climbing faster than revenue. The work is hands-on, scoped tightly, and focused on the parts of your architecture that move the bill or the latency budget.

Is this you?

  • You're running production AI workloads on AWS — Bedrock, SageMaker, or self-hosted on ECS/EKS — and the inference bill is now a board-level line item.
  • You're shipping voice or agentic experiences and your p95 latency is embarrassing you in demos. You can feel the lag but can't tell which hop is the culprit.
  • You're multi-tenant (or about to be) and you can't cleanly answer "what does customer X cost us per month" because the data to answer it doesn't exist.

What I look at

Every engagement starts with the AWS account. I'm reading Cost Explorer, CloudWatch, and your Bedrock invocation logs before I'm reading your code. The levers I rank, in roughly the order they usually pay off:

  • Bedrock cost — model routing, prompt caching, batch mode, provisioned throughput vs on-demand math, output token discipline.
  • Voice latency budget — STT/LLM/TTS hop accounting, on-device vs cloud STT, streaming TTS, region placement, the 300-500ms perceived-natural threshold.
  • Multi-tenant architecture — tenant isolation patterns, per-tenant cost attribution, cross-account boundaries, noisy-neighbor mitigation.
  • Production hardening — prompt injection defense, RAG leakage, rate limiting, circuit breakers, the boring ops stuff that keeps you off the front page of HN.
  • Inference path architecture — what's running where, why, and what should move. Lambda vs ECS vs Bedrock-direct, cold start economics, async vs streaming.
  • Data and eval — what you're logging, what you can replay, and how you'd know if a model swap regressed quality.

How an engagement works

Engagements run 8 weeks, scope-locked, 100% upfront.

  • Weeks 1-2: Diagnostic. I read your AWS account, CloudWatch, and the relevant slice of your codebase. Stakeholder interviews. By the end of week 2 we have a written report ranking 5-7 specific levers across cost, architecture, latency, multi-tenant, and hardening.
  • Weeks 3-6: Implementation. Hands-on with your team to ship the top 2-3 levers from the report.
  • Weeks 7-8: Handoff. Documentation, runbooks, knowledge transfer, measurable before/after numbers.

The 8-week duration is a target, not a contract limit. If the work needs 9 or 10 weeks to deliver the SOW, I keep working at no additional cost. Time-overrun risk is on me, not you. Out-of-scope work surfaced mid-engagement is a separate engagement, scoped and priced separately.

Larger sprints and fractional engagements are scoped privately for clients who clear the initial 8-week engagement. Everything is remote. I'm in Mountain Time and responsive during the working day.

Apply for Advisory

Starting at $48,000

Need a quick consultation? Book a 1-hour session.