SMSAI — AI-powered SMS services on AWS

Problem

Most LLM apps assume you have a smartphone, a browser, and reliable bandwidth. In much of the world, none of that is true.

SMSAI started from a simple question: what if the only thing a user has is a basic phone and SMS? Could you still give them useful AI — search, translation, summarization, agriculture/health Q&A — through the most universal channel that exists?

Constraints

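Channel limits. A single SMS segment holds 160 GSM-7 characters. Longer messages are split into 153-character segments, and any character outside the GSM-7 alphabet (most emoji, many non-Latin scripts) drops the limit to 70 characters (67 per segment when split). The product had to feel useful within 1–3 segments.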
Latency. Carriers will retry. The whole round-trip — user → carrier → API → LLM → reply — had to feel snappy.
Cost. Every message has a real per-unit cost. The architecture had to make cost predictable, not "depends on the prompt."
Reliability over magic. Better a slightly simpler answer that always arrives than a brilliant one that fails 5% of the time.
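Because the segment budget shapes every reply, it helps to count segments the way the network does. Below is a minimal sketch, assuming the standard GSM 03.38 alphabet (extension characters such as `{` or `€` cost two septets) and UCS-2 for everything else. It ignores carrier-specific details such as not splitting an escape pair across segments, so treat the result as an estimate. `segment_count` is a hypothetical helper, not part of Twilio's API.

```python
import math

# GSM 03.38 basic character set: each character costs one 7-bit septet.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# GSM 03.38 extension table: each costs two septets (escape + character).
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def segment_count(text: str) -> int:
    """Estimate how many SMS segments `text` will occupy."""
    if all(c in GSM7_BASIC or c in GSM7_EXTENDED for c in text):
        units = sum(2 if c in GSM7_EXTENDED else 1 for c in text)
        # 160 septets fit in one segment; concatenated parts lose 7 to the header.
        single, per_part = 160, 153
    else:
        # UCS-2: count UTF-16 code units, so most emoji count as two.
        units = len(text.encode("utf-16-le")) // 2
        single, per_part = 70, 67
    if units <= single:
        return 1
    return math.ceil(units / per_part)
```

For example, a 161-character ASCII reply already costs two segments, and a single emoji drops the per-segment budget to 70 characters, which supports the no-emoji rule in the system prompt.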

My role

End-to-end engineer. Designed the API and data model, built the FastAPI backend and React admin, integrated the LLM provider, wired up Twilio + AWS, set up CI/CD.

Architecture

   ┌────────┐    SMS     ┌─────────┐  HTTPS   ┌──────────────┐
   │ Phone  │───────────▶│ Twilio  │─────────▶│  API Gateway │
   └────────┘            └─────────┘          │   + Lambda   │
                                              │  (FastAPI)   │
                                              └──────┬───────┘
                                                     │
                              ┌──────────────────────┼──────────────────────┐
                              ▼                      ▼                      ▼
                       ┌──────────────┐      ┌────────────┐         ┌──────────────┐
                       │  DynamoDB    │      │  LLM API   │         │ S3 (logs /   │
                       │ (sessions,   │      │  + tools   │         │   audits)    │
                       │  rate limits)│      └────────────┘         └──────────────┘
                       └──────────────┘
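The first hop in the diagram is a Twilio webhook: Twilio POSTs the inbound message as form-encoded fields (`From` and `Body`, among others) and accepts a TwiML document as the synchronous reply. The following is a minimal, framework-free sketch of that contract. In the real service this logic sits inside a FastAPI route; `handle_inbound` and the `answer` callback are hypothetical names. Request-signature validation (the `X-Twilio-Signature` header) is omitted here for brevity but belongs in production.

```python
from typing import Callable
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def handle_inbound(form_body: str, answer: Callable[[str, str], str]) -> str:
    """Turn a Twilio inbound-SMS webhook body into a TwiML reply.

    `answer(phone, text)` is a hypothetical stand-in for the router + LLM
    pipeline; it returns the reply text.
    """
    fields = parse_qs(form_body)  # decodes %2B to "+" and "+" to a space
    phone = fields.get("From", [""])[0]
    text = fields.get("Body", [""])[0].strip()
    reply = answer(phone, text)
    # TwiML is XML, so escape &, <, > (all common in LLM output).
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<Response><Message>{escape(reply)}</Message></Response>"
    )
```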

Two big simplifications that made everything else easier:

DynamoDB single-table design. One table, three access patterns: by phone, by session, by date. No relational joins, no schema migrations.
FastAPI behind API Gateway + Lambda. Pay per request, autoscale to zero overnight, no servers to babysit.
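One way to serve all three access patterns from a single table is to encode the entity type in the keys and add one global secondary index for the date view. The attribute names (`PK`, `SK`, `GSI1PK`, `GSI1SK`, `expires_at`), key prefixes, and the 24-hour lifetime below are illustrative assumptions, not the project's actual schema. The returned dict has the shape boto3's `Table.put_item(Item=...)` accepts.

```python
from datetime import datetime, timezone

SESSION_TTL_SECONDS = 24 * 3600  # hypothetical session lifetime


def message_item(phone: str, session_id: str, msg_id: str, body: str,
                 sent_at: datetime) -> dict:
    """Build one message item for a single-table layout (illustrative keys)."""
    ts = sent_at.astimezone(timezone.utc)
    return {
        # Base table: "by phone" is a Query on PK; "by session" is a Query on
        # PK with begins_with(SK, "SESSION#<id>#"), since the phone number is
        # always known when an inbound SMS arrives.
        "PK": f"PHONE#{phone}",
        "SK": f"SESSION#{session_id}#MSG#{ts.isoformat()}#{msg_id}",
        # GSI1: "by date" is a Query on GSI1PK for one UTC day.
        "GSI1PK": f"DATE#{ts.date().isoformat()}",
        "GSI1SK": f"{ts.isoformat()}#{phone}",
        "body": body,
        # DynamoDB TTL expects a Number attribute holding epoch seconds.
        "expires_at": int(ts.timestamp()) + SESSION_TTL_SECONDS,
    }
```

At much higher volume, a single per-day GSI partition can become hot; sharding the date key (for example `DATE#<day>#<n>`) is the usual remedy.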

A small React app handles ops: live message queue, error logs, prompt/tool versioning, per-user spend caps.

Key decisions

1. Treat the LLM as untrusted

The LLM is a remote API that can be slow, flaky, or expensive. Everything around it is built to compensate:

Hard timeouts on every call
Cached canned responses for the top intents (greeting, help, language switch)
A "router" before the LLM so simple intents skip the model entirely
Per-user rate limits stored in DynamoDB with TTL
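The first three guards compose into a small wrapper: match cheap intents first, then give the model a hard deadline with a canned fallback. The sketch below assumes a keyword router and a thread-pool timeout; `CANNED`, `route`, `reply`, `call_llm`, and the 5-second deadline are all hypothetical. Note that `future.result(timeout=...)` stops waiting but does not cancel the underlying call, so the LLM client should carry its own network timeout as well.

```python
import concurrent.futures as cf
from typing import Callable, Optional

# Hypothetical canned replies for the top intents: no LLM call needed.
CANNED = {
    "greeting": "Hi! Ask me anything by SMS. Reply HELP for options.",
    "help": "Send a question, e.g. 'translate hello to Swahili'. Reply LANG to switch.",
    "language_switch": "Language updated. Send your question.",
}
FALLBACK = "Sorry, that took too long. Please try again in a minute."

_pool = cf.ThreadPoolExecutor(max_workers=8)


def route(text: str) -> Optional[str]:
    """Return a canned-intent name, or None if the LLM is needed."""
    t = text.strip().lower()
    if t in {"hi", "hello", "hey"}:
        return "greeting"
    if t in {"help", "?"}:
        return "help"
    if t.startswith("lang"):
        return "language_switch"
    return None


def reply(text: str, call_llm: Callable[[str], str], deadline_s: float = 5.0) -> str:
    intent = route(text)
    if intent is not None:
        return CANNED[intent]  # router hit: skip the model entirely
    future = _pool.submit(call_llm, text)
    try:
        return future.result(timeout=deadline_s)
    except cf.TimeoutError:
        return FALLBACK  # hard timeout: degrade instead of failing
    except Exception:
        return FALLBACK  # provider error: same degraded path, logged separately
```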

The router alone took the LLM call rate down by ~35% in early testing.
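The per-user rate limit maps onto a single conditional write: one counter item per phone per time window, incremented atomically and rejected once it reaches the cap. The sketch below assumes a fixed window and the hypothetical key names, attribute names, and limits shown. It only builds the keyword arguments for boto3's `Table.update_item`, so it runs without AWS; a `ConditionalCheckFailedException` from that call would mean the user is over the limit.

```python
WINDOW_SECONDS = 3600  # hypothetical: one-hour fixed window
LIMIT = 20             # hypothetical: max messages per phone per window


def rate_limit_update(phone: str, now_epoch: int,
                      limit: int = LIMIT, window: int = WINDOW_SECONDS) -> dict:
    """Build update_item kwargs that count one message or fail the condition."""
    window_start = now_epoch - now_epoch % window
    return {
        # One item per (phone, window): a new window starts a fresh counter,
        # so correctness never depends on when TTL actually deletes old items.
        "Key": {"PK": f"RATE#{phone}", "SK": f"WINDOW#{window_start}"},
        "UpdateExpression": "ADD msg_count :one SET expires_at = :ttl",
        "ConditionExpression": "attribute_not_exists(msg_count) OR msg_count < :limit",
        "ExpressionAttributeValues": {
            ":one": 1,
            ":limit": limit,
            # Expire a full window after this one ends, purely for cleanup.
            ":ttl": window_start + 2 * window,
        },
    }
```

Calling `table.update_item(**rate_limit_update(phone, int(time.time())))` would then do the check and the increment in one round-trip, with no read-modify-write race between concurrent Lambda invocations.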

2. SMS-aware prompting

SYSTEM = """You answer over SMS. Reply in <=2 messages of 160 chars.
No markdown. No emoji unless asked. If you must trim,
trim explanations first, never the answer."""
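A prompt instruction is a request, not a guarantee, so a server-side guard can enforce the same rules after the model responds. The sketch below assumes a two-segment GSM-7 budget of 306 characters and assumes the model puts the answer first, so keeping the leading sentences approximates "trim explanations first." `fit_to_sms` is a hypothetical helper. It appends `...` rather than `…` because the single-character ellipsis is not in the GSM-7 alphabet and would push the whole reply into 70-character UCS-2 segments.

```python
import re

_MARKDOWN = re.compile(r"[*_#`]")  # markdown markers the prompt forbids
TWO_SEGMENTS = 2 * 153             # two concatenated GSM-7 segments


def fit_to_sms(text: str, max_chars: int = TWO_SEGMENTS) -> str:
    """Strip markdown, collapse whitespace, and trim to the SMS budget."""
    text = " ".join(_MARKDOWN.sub("", text).split())
    if len(text) <= max_chars:
        return text
    # Prefer ending on a whole sentence; look one char past the budget so a
    # period sitting exactly at the limit still counts.
    window = text[: max_chars + 1]
    cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
    if cut > 0:
        return window[: cut + 1]
    # Otherwise cut at a word boundary and mark the truncation in GSM-7.
    head = text[: max_chars - 3]
    space = head.rfind(" ")
    if space > 0:
        head = head[:space]
    return head + "..."
```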

Prompts were treated as code: versioned in Git, A/B tested, and tied to specific tool sets. A "summarize" prompt is not allowed to call the "send_payment" tool — that boundary lives in the registry, not in the model's head.
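Keeping the tool boundary in the registry means the check runs in ordinary code, whatever the model asks for. The sketch below assumes an in-memory registry and hash-based A/B assignment. The `PromptVersion` type, the version strings, the prompt texts, and the `fetch_url` tool are hypothetical; only the `summarize` prompt and `send_payment` tool names come from the text above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    system: str
    allowed_tools: frozenset  # enforced in code, not by the model


# Hypothetical registry: two A/B variants of the "summarize" prompt.
REGISTRY = {
    "summarize": [
        PromptVersion("summarize", "v3", "Summarize in <=160 chars.", frozenset()),
        PromptVersion("summarize", "v4", "One-sentence summary.", frozenset({"fetch_url"})),
    ],
}


def pick_variant(prompt_name: str, phone: str) -> PromptVersion:
    """Stable A/B assignment: the same phone always gets the same variant.

    sha256 is used instead of hash(), which is randomized per process.
    """
    variants = REGISTRY[prompt_name]
    digest = hashlib.sha256(f"{prompt_name}:{phone}".encode()).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def authorize_tool_call(prompt: PromptVersion, tool_name: str) -> None:
    """Reject any tool the prompt version was not registered with."""
    if tool_name not in prompt.allowed_tools:
        raise PermissionError(
            f"{prompt.name}@{prompt.version} may not call {tool_name!r}"
        )
```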

3. Observability before scale

Every message hop is logged with a correlation ID. Dashboards show:

Median + p95 latency per intent
Cost per message (LLM + Twilio)
Failure mode breakdown (timeout vs LLM error vs tool error)
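Per-hop logging and the dashboard numbers can share one record shape. The sketch below assumes JSON log lines, which log query tools such as CloudWatch Logs Insights can parse, and nearest-rank percentiles. The field names, hop labels, and the `log_hop`/`latency_summary` helpers are hypothetical.

```python
import json
import statistics
import time
import uuid
from collections import defaultdict
from typing import Optional


def new_correlation_id() -> str:
    return uuid.uuid4().hex


def log_hop(correlation_id: str, hop: str, intent: str, latency_ms: float,
            cost_usd: float = 0.0, failure: Optional[str] = None) -> str:
    """Emit one JSON line per hop; on Lambda, stdout goes to CloudWatch Logs."""
    record = {
        "ts": time.time(),
        "correlation_id": correlation_id,
        "hop": hop,          # e.g. "twilio_in", "router", "llm", "twilio_out"
        "intent": intent,
        "latency_ms": latency_ms,
        "cost_usd": cost_usd,
        "failure": failure,  # None, "timeout", "llm_error", or "tool_error"
    }
    line = json.dumps(record)
    print(line)
    return line


def latency_summary(records: list) -> dict:
    """Median and nearest-rank p95 latency per intent."""
    by_intent = defaultdict(list)
    for r in records:
        by_intent[r["intent"]].append(r["latency_ms"])
    out = {}
    for intent, values in by_intent.items():
        values.sort()
        rank = (95 * len(values) + 99) // 100  # integer ceil(0.95 * n)
        out[intent] = {"median": statistics.median(values), "p95": values[rank - 1]}
    return out
```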

When something regresses, the dashboard tells me what and where in under a minute.

Why DynamoDB and not Postgres

Most "SMS in, reply out" workloads are key-value lookups with TTL. DynamoDB on-demand pricing matches the load shape, and there's no idle RDS instance burning money at 3am.
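One TTL detail matters for correctness: DynamoDB deletes expired items in the background, typically within a few days of expiry, so a read can still return an item whose TTL has passed. The sketch below shows a read-side guard, assuming a hypothetical `expires_at` epoch-seconds attribute. The same condition can also be pushed into a query as a `FilterExpression`.

```python
import time
from typing import Iterable, Optional


def is_live(item: dict, now: Optional[float] = None) -> bool:
    """True unless the item's TTL (epoch seconds) has already passed.

    boto3 returns DynamoDB Numbers as Decimal, which compares fine with float.
    """
    now = time.time() if now is None else now
    expires_at = item.get("expires_at")
    # Items without a TTL attribute never expire.
    return expires_at is None or expires_at > now


def live_items(items: Iterable[dict], now: Optional[float] = None) -> list:
    """Drop items that are expired but not yet deleted by DynamoDB."""
    return [i for i in items if is_live(i, now)]
```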

Results

SMS-first AI assistant running end-to-end on AWS
Sub-2s p95 latency including LLM call + carrier round-trip
Predictable per-message cost through the router + cap system
Zero ops servers — full serverless on Lambda + DynamoDB + S3

What I would do differently

Start with a structured eval harness (offline test suite + scoring) on day one — I added it after the fact and wished it had been there for every prompt change
Push more logic into Step Functions for retry / fanout instead of inside FastAPI handlers
Consider on-device fallbacks for languages where the LLM was weak; for SMS-length answers, a 10MB local model sometimes beats a 200B-parameter remote one
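The offline eval harness from the first point can start very small: a fixed set of prompts, a few cheap checks per answer, and one score to compare prompt versions. The sketch below assumes keyword-based scoring and a 306-character (two-segment) budget. `EvalCase`, `score`, `run_evals`, and the sample cases are hypothetical, and real suites usually add model- or human-graded checks on top.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_include: list = field(default_factory=list)  # e.g. key facts
    must_exclude: list = field(default_factory=list)  # e.g. markdown, emoji
    max_chars: int = 306                               # two GSM-7 segments


def score(case: EvalCase, answer: str) -> bool:
    """Pass if the answer fits the budget and has/lacks the listed strings."""
    a = answer.lower()
    return (
        len(answer) <= case.max_chars
        and all(k.lower() in a for k in case.must_include)
        and not any(k.lower() in a for k in case.must_exclude)
    )


def run_evals(cases: list, model: Callable[[str], str]) -> float:
    """Return the pass rate of `model` over `cases` (0.0 to 1.0)."""
    if not cases:
        return 0.0
    passed = sum(score(c, model(c.prompt)) for c in cases)
    return passed / len(cases)
```

Running it in the existing GitHub Actions pipeline on every prompt change would turn "the new prompt feels better" into a number that can block a regression.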

Stack at a glance

Frontend (ops): React (Next.js compatible), TypeScript
Backend: FastAPI (Python), Lambda, API Gateway
Storage: DynamoDB, S3 (audit logs)
AI: LangChain orchestration, hosted LLM API + tool routing
Comms: Twilio Programmable SMS
Infra: AWS, GitHub Actions