Problem
LLM apps assume you have a smartphone, a browser, and reliable bandwidth. In much of the world, none of those can be taken for granted.
SMSAI started from a simple question: what if the only thing a user has is a basic phone and SMS? Could you still give them useful AI — search, translation, summarization, agriculture/health Q&A — through the most universal channel that exists?
Constraints
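- Channel limits. A single SMS segment holds 160 GSM-7 characters (153 per segment once a message is split, and only 70 or 67 for non-Latin text sent as UCS-2). The product had to feel useful within 1–3 segments.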
- Latency. Carriers will retry. The whole round-trip — user → carrier → API → LLM → reply — had to feel snappy.
- Cost. Every message has a real per-unit cost. The architecture had to make cost predictable, not "depends on the prompt."
- Reliability over magic. Better a slightly simpler answer that always arrives than a brilliant one that fails 5% of the time.
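The channel limit in the first constraint can be checked mechanically before a reply goes out. Below is a simplified sketch of a segment counter, assuming the standard GSM-7 and UCS-2 limits; `count_segments` and `fits_budget` are hypothetical names, not SMSAI's actual code, and edge cases such as an extension character straddling a segment boundary are ignored.

```python
import math

# GSM 03.38 basic alphabet: each character costs one septet.
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character costs two septets (escape + code).
GSM7_EXT = frozenset("^{}\\[~]|€\f")


def count_segments(text: str) -> int:
    """Estimate how many SMS segments `text` would occupy."""
    if not text:
        return 0
    if all(ch in GSM7_BASIC or ch in GSM7_EXT for ch in text):
        units = sum(2 if ch in GSM7_EXT else 1 for ch in text)
        single, multi = 160, 153
    else:
        # Any non-GSM character forces UCS-2; count UTF-16 code units.
        units = len(text.encode("utf-16-be")) // 2
        single, multi = 70, 67
    return 1 if units <= single else math.ceil(units / multi)


def fits_budget(text: str, max_segments: int = 2) -> bool:
    """True if the reply stays within the allowed number of segments."""
    return count_segments(text) <= max_segments
```

A reply that fails `fits_budget` could be trimmed or regenerated before sending, which keeps the per-message cost bounded by the segment budget rather than by whatever the model happened to produce.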
My role
End-to-end engineer. Designed the API and data model, built the FastAPI backend and React admin, integrated the LLM provider, wired up Twilio + AWS, set up CI/CD.
Architecture
┌────────┐    SMS    ┌─────────┐  HTTPS   ┌──────────────┐
│ Phone  │──────────▶│ Twilio  │─────────▶│ API Gateway  │
└────────┘           └─────────┘          │ + Lambda     │
                                          │ (FastAPI)    │
                                          └──────┬───────┘
                                                 │
                          ┌──────────────────────┼──────────────────────┐
                          ▼                      ▼                      ▼
                   ┌──────────────┐       ┌────────────┐         ┌──────────────┐
                   │ DynamoDB     │       │ LLM API    │         │ S3 (logs /   │
                   │ (sessions,   │       │ + tools    │         │ audits)      │
                   │ rate limits) │       └────────────┘         └──────────────┘
                   └──────────────┘
Two big simplifications that made everything else easier:
- DynamoDB single-table design. One table, three access patterns: by phone, by session, by date. No relational joins, no schema migrations.
- FastAPI behind API Gateway + Lambda. Pay per request, autoscale to zero overnight, no servers to babysit.
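To make the single-table idea concrete, here is a minimal sketch of how one message item could carry keys for all three access patterns. The attribute names (`PK`, `SK`, `GSI1PK`, `GSI2PK`), the key prefixes, and the 30-day TTL are assumptions for illustration; the source only states that there is one table queried by phone, by session, and by date.

```python
from datetime import datetime, timedelta, timezone


def message_item(phone: str, session_id: str, body: str,
                 sent_at: datetime, ttl_days: int = 30) -> dict:
    """Build one item that can be queried by phone, by session, or by date.

    `sent_at` is expected to be a timezone-aware datetime.
    """
    ts = sent_at.astimezone(timezone.utc)
    iso = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        # Base table: all messages for a phone number, ordered by time.
        "PK": f"PHONE#{phone}",
        "SK": f"MSG#{iso}",
        # Hypothetical GSI 1: all messages in a session.
        "GSI1PK": f"SESSION#{session_id}",
        "GSI1SK": f"MSG#{iso}",
        # Hypothetical GSI 2: all messages on a calendar day (UTC).
        "GSI2PK": f"DATE#{ts:%Y-%m-%d}",
        "GSI2SK": f"{iso}#{phone}",
        "body": body,
        # DynamoDB TTL reads an epoch-seconds number attribute.
        "ttl": int((ts + timedelta(days=ttl_days)).timestamp()),
    }
```

Because every lookup is a key or index query on a prefixed string, adding a new item type (for example a rate-limit counter) means adding a new prefix rather than a schema migration.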
A small React app handles ops: live message queue, error logs, prompt/tool versioning, per-user spend caps.
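A per-user spend cap reduces to simple arithmetic once each exchange has a cost. The sketch below is one way it might look; the unit prices are placeholders rather than real Twilio or LLM rates, and `Prices`, `message_cost`, and `allow` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Placeholder unit prices in micro-USD; not real provider rates."""
    sms_segment: int = 8_000      # per SMS segment, inbound or outbound
    llm_input_token: int = 1      # per prompt token
    llm_output_token: int = 3     # per completion token


def message_cost(segments: int, tokens_in: int, tokens_out: int,
                 prices: Prices = Prices()) -> int:
    """Total cost of one exchange in micro-USD (integers avoid float drift)."""
    return (segments * prices.sms_segment
            + tokens_in * prices.llm_input_token
            + tokens_out * prices.llm_output_token)


def allow(spent_so_far: int, next_cost: int, cap: int) -> bool:
    """True if handling the next message keeps the user at or under the cap."""
    return spent_so_far + next_cost <= cap
```

With these placeholder prices, an exchange of 3 segments, 400 prompt tokens, and 60 completion tokens costs 24,580 micro-USD, so a cap expressed in the same unit can be checked before the LLM is ever called.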
Key decisions
1. Treat the LLM as untrusted
The LLM is a remote API that can be slow, flaky, or expensive. Everything around it is built to compensate:
- Hard timeouts on every call
- Cached canned responses for the top intents (greeting, help, language switch)
- A "router" before the LLM so simple intents skip the model entirely
- Per-user rate limits stored in DynamoDB with TTL
The router alone cut LLM calls by roughly 35% in early testing.
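The router, canned responses, and hard timeout fit together roughly as follows. This is a minimal sketch, not the production code: the keyword table, reply wording, 8-second default, and the `call_llm` callable are all assumptions, and only two of the top intents are shown.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Canned replies for common intents; the wording is illustrative.
CANNED = {
    "greeting": "Hi! Text any question, or HELP for options.",
    "help": "Ask a question, or text LANG <code> to switch language.",
}
KEYWORDS = {"hi": "greeting", "hello": "greeting", "help": "help"}
TIMEOUT_REPLY = "Sorry, that took too long. Please try again."


def route(text: str) -> Optional[str]:
    """Return a canned intent if the whole message is a known keyword."""
    return KEYWORDS.get(text.strip().lower())


async def answer(text: str,
                 call_llm: Callable[[str], Awaitable[str]],
                 timeout_s: float = 8.0) -> str:
    """Answer from the cache when possible, else call the LLM with a deadline."""
    intent = route(text)
    if intent is not None:
        return CANNED[intent]        # simple intents never reach the model
    try:
        return await asyncio.wait_for(call_llm(text), timeout=timeout_s)
    except asyncio.TimeoutError:
        return TIMEOUT_REPLY         # always send something back
```

The design choice is that every path returns a string: a slow or failed model degrades to a short apology instead of silence, which matches the "reliability over magic" constraint.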
2. SMS-aware prompting
SYSTEM = """You answer over SMS. Reply in <=2 messages of 160 chars.
No markdown. No emoji unless asked. If you must trim,
trim explanations first, never the answer."""
Prompts were treated as code: versioned in Git, A/B tested, and tied to specific tool sets. A "summarize" prompt is not allowed to call the "send_payment" tool; that boundary lives in the registry, not in the model's head.
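A registry that ties each prompt version to an explicit tool allowlist could look like the sketch below. Only the "summarize" prompt and the "send_payment" tool come from the text; the `PromptSpec` shape, the version strings, the templates, and the `fetch_url` tool are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    name: str
    version: str
    template: str
    allowed_tools: frozenset


# Hypothetical registry entries; in practice these would live in Git.
REGISTRY = {
    "summarize": PromptSpec("summarize", "v3", "Summarize for SMS: {text}",
                            frozenset({"fetch_url"})),
    "payments": PromptSpec("payments", "v1", "Handle payment request: {text}",
                           frozenset({"send_payment"})),
}


def authorize_tool_call(prompt_name: str, tool_name: str) -> None:
    """Raise unless the registry lets this prompt use this tool."""
    spec = REGISTRY[prompt_name]
    if tool_name not in spec.allowed_tools:
        raise PermissionError(
            f"{spec.name}@{spec.version} may not call {tool_name}")
```

Calling `authorize_tool_call` before executing any tool the model requests means a prompt-injected "send_payment" request from a summarize session fails in ordinary code, regardless of what the model was persuaded to output.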
3. Observability before scale
Every message hop is logged with a correlation ID. Dashboards show:
- Median + p95 latency per intent
- Cost per message (LLM + Twilio)
- Failure mode breakdown (timeout vs LLM error vs tool error)
When something regresses, the dashboard tells me what and where in under a minute.
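As an illustration of the logging and latency metrics described above, the sketch below emits one JSON line per hop and derives median and p95 per intent. The field names, hop labels, and the nearest-rank percentile method are assumptions; the actual dashboards may compute these differently.

```python
import json
import statistics
from collections import defaultdict


def hop_record(correlation_id: str, hop: str, intent: str,
               latency_ms: int, outcome: str = "ok") -> str:
    """Serialize one hop as a JSON log line keyed by correlation ID."""
    return json.dumps({
        "correlation_id": correlation_id,
        "hop": hop,              # e.g. "twilio_in", "llm", "twilio_out"
        "intent": intent,
        "latency_ms": latency_ms,
        "outcome": outcome,      # e.g. "ok", "timeout", "llm_error", "tool_error"
    }, sort_keys=True)


def p95(values: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # integer ceil(0.95 * n)
    return ordered[rank - 1]


def latency_by_intent(log_lines: list) -> dict:
    """Group latencies by intent and report median and p95 for each."""
    grouped = defaultdict(list)
    for line in log_lines:
        rec = json.loads(line)
        grouped[rec["intent"]].append(rec["latency_ms"])
    return {intent: {"median": statistics.median(v), "p95": p95(v)}
            for intent, v in grouped.items()}
```

Filtering the same log lines by `correlation_id` reconstructs the path of a single message, and grouping by `outcome` instead of `intent` gives the failure-mode breakdown.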
Why DynamoDB and not Postgres
Most "SMS in, reply out" workloads are key-value lookups with TTL. DynamoDB on-demand pricing matches the load shape, and there's no idle RDS instance burning money at 3am.
Results
- SMS-first AI assistant running end-to-end on AWS
- Sub-2s p95 latency including LLM call + carrier round-trip
- Predictable per-message cost through the router + cap system
- Zero ops servers — full serverless on Lambda + DynamoDB + S3
What I would do differently
- Start with a structured eval harness (offline test suite + scoring) on day one — I added it after the fact and wished it had been there for every prompt change
- Push more logic into Step Functions for retry / fanout instead of inside FastAPI handlers
- Consider on-device fallbacks for languages where the LLM was weak; sometimes a 10 MB local model beats a 200B-parameter remote one for SMS-length answers
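For the first point, an offline eval harness can start very small. The sketch below scores replies against a length limit and required phrases; the `Case` fields, the scoring rule, and the 320-character default are assumptions about what such a harness might check, not a description of the one that was eventually added.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Case:
    prompt: str
    must_contain: list = field(default_factory=list)
    max_chars: int = 320            # two 160-char messages


def score_case(case: Case, reply: str) -> float:
    """Fraction of checks passed: the length limit plus each required phrase."""
    checks = [len(reply) <= case.max_chars]
    checks += [phrase.lower() in reply.lower() for phrase in case.must_contain]
    return sum(checks) / len(checks)


def run_suite(cases: list, model: Callable[[str], str]) -> float:
    """Mean score across all cases for one prompt/model version."""
    return sum(score_case(c, model(c.prompt)) for c in cases) / len(cases)
```

Running `run_suite` against the same cases before and after a prompt change gives a single number to compare, which is the minimum needed to catch regressions in CI.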
Stack at a glance
- Frontend (ops): React (Next.js compatible), TypeScript
- Backend: FastAPI (Python), Lambda, API Gateway
- Storage: DynamoDB, S3 (audit logs)
- AI: LangChain orchestration, hosted LLM API + tool routing
- Comms: Twilio Programmable SMS
- Infra: AWS, GitHub Actions