Sprint 5: Backend Architecture & Local LLM Integration
Overview
Sprint 5 established the core backend infrastructure required to operate a fully local, privacy-first AI application without relying on external cloud APIs.
Technical Choices
- Backend Framework: FastAPI (Python)
- Reasoning: Chosen for its native asynchronous capabilities (async/await), which are critical for handling streaming responses from the LLM without blocking other user requests (a streaming endpoint is sketched after this list). It also generates OpenAPI documentation automatically.
- Database: SQLite
- Reasoning: Selected over heavier databases like PostgreSQL because this is a local application running on a constrained VM. SQLite is fast for read-heavy workloads, requires zero setup, and naturally aligns with the "local file" privacy philosophy.
- Optimization: Enabled WAL (Write-Ahead Logging) mode and set check_same_thread=False so FastAPI's asynchronous workers can read and write concurrently without immediate locking issues (connection setup sketched after this list).
- AI Engine: Qwen 3.5:9B via Ollama (Upgraded from Llama 3.1)
- Reasoning: Selected because it provides exceptional natural language understanding while fitting within the RAM constraints of the local VM. Ollama handles the model serving locally on port 11434.
- Authentication: Custom Token-Based Auth
- Reasoning: Rather than relying on external OAuth providers (Google/Auth0), we built a local token system (secrets.token_urlsafe) stored in the SQLite database to ensure zero data leaves the server (token helpers sketched after this list).
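Below are brief sketches of how these choices fit together. First, an async FastAPI route that streams tokens from Ollama on port 11434; the route path, request model, model tag, and use of the httpx client are illustrative assumptions, not the project's actual code:

```python
import json

import httpx
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

class ChatRequest(BaseModel):
    prompt: str

@app.post("/chat")
async def chat(req: ChatRequest) -> StreamingResponse:
    async def token_stream():
        # Proxy Ollama's newline-delimited JSON stream, yielding text chunks as they
        # arrive so the event loop stays free to serve other requests.
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream(
                "POST",
                OLLAMA_URL,
                json={"model": "qwen3.5:9b", "prompt": req.prompt, "stream": True},  # model tag assumed
            ) as response:
                async for line in response.aiter_lines():
                    if line:
                        yield json.loads(line).get("response", "")

    return StreamingResponse(token_stream(), media_type="text/plain")
```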
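Next, a minimal sketch of the SQLite setup described above; the filename and helper name are placeholders:

```python
import sqlite3

DB_PATH = "app.db"  # placeholder filename

def get_connection() -> sqlite3.Connection:
    """Open a connection that FastAPI's threaded workers can share."""
    conn = sqlite3.connect(DB_PATH, check_same_thread=False)
    conn.row_factory = sqlite3.Row
    # WAL lets readers keep reading while a single writer appends to the log,
    # which avoids most "database is locked" errors under concurrent requests.
    conn.execute("PRAGMA journal_mode=WAL;")
    return conn
```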
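Finally, the custom token scheme in rough form; the api_tokens table and its columns are assumed for illustration:

```python
import secrets
import sqlite3

def issue_token(conn: sqlite3.Connection, user_id: int) -> str:
    """Mint an opaque token with secrets.token_urlsafe and persist it locally."""
    token = secrets.token_urlsafe(32)
    conn.execute(
        "INSERT INTO api_tokens (token, user_id) VALUES (?, ?)",
        (token, user_id),
    )
    conn.commit()
    return token

def resolve_token(conn: sqlite3.Connection, token: str) -> int | None:
    """Look up a token; return the owning user id, or None if it is unknown."""
    row = conn.execute(
        "SELECT user_id FROM api_tokens WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else None
```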
RAG (Retrieval-Augmented Generation) Design
To prevent the LLM from hallucinating nutritional facts:
- Keyword Extraction: The user's query is stripped of stopwords (e.g., "protein in salmon" -> ["protein", "salmon"]); see the sketch after this list.
- Fuzzy Search: The SQLite database (foods table) is queried using LIKE pattern matching.
- Context Injection: The resulting verified nutritional data is silently prepended to the user's prompt as a "System Message", forcing the AI to ground its answer in local, hard data (also sketched below).
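A sketch of the first two steps, assuming a small hand-rolled stopword set and a foods table with a name column (both the stopword list and the column names are illustrative):

```python
import sqlite3

STOPWORDS = {"a", "an", "the", "in", "of", "is", "how", "much", "what"}  # illustrative subset

def extract_keywords(query: str) -> list[str]:
    """Drop stopwords so only the meaningful terms reach the database search."""
    return [word for word in query.lower().split() if word not in STOPWORDS]

def search_foods(conn: sqlite3.Connection, keywords: list[str]) -> list[sqlite3.Row]:
    """Fuzzy-match each keyword against food names with LIKE pattern matching."""
    rows: list[sqlite3.Row] = []
    for word in keywords:
        rows.extend(
            conn.execute(
                "SELECT * FROM foods WHERE name LIKE ?", (f"%{word}%",)
            ).fetchall()
        )
    return rows
```

With these helpers, extract_keywords("protein in salmon") yields ["protein", "salmon"], matching the example above.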
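And a sketch of the final step, where the retrieved rows are flattened into a system message placed ahead of the user's question; the column names and the exact instruction wording are assumptions, and the resulting message list follows the role/content shape accepted by Ollama's chat API:

```python
def build_messages(question: str, rows: list[dict]) -> list[dict]:
    """Prepend verified nutrition facts as a system message so the model grounds its answer in local data."""
    facts = "\n".join(
        f"- {row['name']}: {row['protein_g']} g protein per 100 g"  # illustrative columns
        for row in rows
    )
    system = (
        "Answer using ONLY the verified nutrition data below. "
        "If the data does not cover the question, say you do not know.\n" + facts
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```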