Sprint5_Architecture_Decisions.md 2.0 KB

Sprint 5: Backend Architecture & Local LLM Integration

Overview

Sprint 5 established the core backend infrastructure required to operate a fully local, privacy-first AI application without relying on external cloud APIs.

Technical Choices

  • Backend Framework: FastAPI (Python)
    • Reasoning: Chosen for its native asynchronous capabilities (async/await), which are critical for handling streaming responses from the LLM without blocking other user requests. It also automatically generates OpenAPI documentation.
  • Database: SQLite
    • Reasoning: Selected over heavy databases like PostgreSQL because this is a local application running on a constrained VM. SQLite is incredibly fast for read-heavy operations, requires zero setup, and naturally aligns with the "local file" privacy philosophy.
    • Optimization: Enabled WAL (Write-Ahead Logging) mode and check_same_thread=False to allow FastAPI's asynchronous workers to read/write concurrently without immediate locking issues.
  • AI Engine: Qwen 3.5:9B via Ollama (Upgraded from Llama 3.1)
    • Reasoning: Selected because it provides exceptional natural language understanding while fitting within the RAM constraints of the local VM. Ollama handles the model serving locally on port 11434.
  • Authentication: Custom Token-Based Auth
    • Reasoning: Rather than relying on external OAuth providers (Google/Auth0), we built a local token system (secrets.token_urlsafe) stored in the SQLite database to ensure zero data leaves the server.

RAG (Retrieval-Augmented Generation) Design

To prevent the LLM from hallucinating nutritional facts:

  1. Keyword Extraction: The user's query is filtered for stopwords (e.g., "protein in salmon" -> ["protein", "salmon"]).
  2. Fuzzy Search: The SQLite database (foods table) is queried using LIKE pattern matching.
  3. Context Injection: The resulting verified nutritional data is secretly prepended to the user's prompt as a "System Message", forcing the AI to base its answer on local, hard data.