
TG-97: Replace llama3.1:8b with qwen3.5:4b

FerRo988 1 week ago
parent
commit
9a4985b476

+ 1 - 1
DOCUMENTATION_CHECKLIST.md

@@ -5,7 +5,7 @@ This file tracks information that Antigravity (the AI) must provide to the Tech
 ## Technical Document Requirements
 - [ ] **Installation & Configuration:** Step-by-step guide for a clean Ubuntu 24.04 VM.
 - [ ] **Tech Stack Rationale:** Why FastAPI, SQLite/PostgreSQL, etc.
-- [ ] **LLM Selection:** Explain which local model is used (e.g., Llama 3.1 8B), why it was chosen, and its quantization level.
+- [ ] **LLM Selection:** Explain which local model is used (e.g., Qwen 3.5:4B), why it was chosen, and its quantization level.
 - [ ] **Agent Permissions:** Explain how and why Antigravity model permissions were configured.
 - [ ] **Infrastructure Diagram:** Description/code for a diagram showing how app components communicate locally.
 - [ ] **Privacy Proof:** Explanation of how we verified that no user data leaves the server.

+ 1 - 1
documentation/Sprint5_Architecture_Decisions.md

@@ -9,7 +9,7 @@ Sprint 5 established the core backend infrastructure required to operate a fully
 *   **Database**: **SQLite**
     *   *Reasoning*: Selected over heavy databases like PostgreSQL because this is a local application running on a constrained VM. SQLite is incredibly fast for read-heavy operations, requires zero setup, and naturally aligns with the "local file" privacy philosophy.
     *   *Optimization*: Enabled `WAL` (Write-Ahead Logging) mode and `check_same_thread=False` to allow FastAPI's asynchronous workers to read/write concurrently without immediate locking issues.
-*   **AI Engine**: **Llama 3.1 8B via Ollama**
+*   **AI Engine**: **Qwen 3.5:4B via Ollama** (Upgraded from Llama 3.1)
     *   *Reasoning*: Selected because it provides exceptional natural language understanding while fitting within the RAM constraints of the local VM. Ollama handles the model serving locally on port `11434`.
 *   **Authentication**: **Custom Token-Based Auth**
     *   *Reasoning*: Rather than relying on external OAuth providers (Google/Auth0), we built a local token system (`secrets.token_urlsafe`) stored in the SQLite database to ensure zero data leaves the server.

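The local token scheme mentioned in the same file can be illustrated with the standard library alone (a sketch — the `tokens` table layout and the `issue_token` helper are hypothetical):

```python
import secrets
import sqlite3

def issue_token(conn: sqlite3.Connection, user_id: int) -> str:
    # secrets.token_urlsafe(32) yields a 43-char cryptographically
    # random, URL-safe string; stored locally so no credential ever
    # leaves the server.
    token = secrets.token_urlsafe(32)
    conn.execute("INSERT INTO tokens (user_id, token) VALUES (?, ?)",
                 (user_id, token))
    return token

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (user_id INTEGER, token TEXT)")
token = issue_token(conn, 1)
```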
+ 4 - 2
main.py

@@ -51,7 +51,7 @@ async def get_current_user(authorization: Optional[str] = Header(None)):
     return user
 
 OLLAMA_URL = "http://localhost:11434/api/chat"
-MODEL_NAME = "llama3.1:8b"
+MODEL_NAME = "qwen3.5:4b"
 
 # Common stopwords to strip before searching the food database
 _STOPWORDS = {
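The stopword stripping hinted at by the `_STOPWORDS` comment might look like this (the word set shown is illustrative, not the file's actual contents, and `clean_query` is a hypothetical helper):

```python
# Illustrative stopword set; the real _STOPWORDS in main.py differs.
_STOPWORDS = {"the", "a", "of", "in", "per", "how", "many", "much"}

def clean_query(text: str) -> str:
    # Drop filler words so a query like "How many calories in rice"
    # searches the food database for the meaningful tokens only.
    words = [w for w in text.lower().split() if w not in _STOPWORDS]
    return " ".join(words)

print(clean_query("How many calories in rice"))  # calories rice
```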
@@ -104,6 +104,7 @@ def extract_food_context(messages: list) -> str | None:
     lines = [
         "[SYSTEM: NUTRITIONAL ANALYST MODE]",
         "You are the LocalFoodAI Analyst. Use ONLY verified local data for values.",
+        "CRITICAL: Provide direct, concise answers. Skip all internal monologues, <thought> tags, or reasoning steps.",
         "For each food discussed, you MUST follow this structure:",
         "1. Header: ### 🥗 [Name] (per 100g)",
         "2. Macros: A markdown table for Cal, P, F, C, Fib, Sug, Chol.",
@@ -231,7 +232,8 @@ async def chat_endpoint(request: ChatRequest, current_user: dict = Depends(get_c
     payload = {
         "model": MODEL_NAME,
         "messages": messages,
-        "stream": True
+        "stream": True,
+        "think": False  # Disable reasoning/thinking mode for faster responses
     }
     
     async def generate_response():
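For reference, the amended payload matches Ollama's `/api/chat` request body shape; `build_payload` is a hypothetical helper for illustration, and the top-level `think` flag is only honored by Ollama versions that support thinking models:

```python
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL_NAME = "qwen3.5:4b"

def build_payload(messages: list) -> dict:
    # Mirrors the payload built in chat_endpoint above.
    return {
        "model": MODEL_NAME,
        "messages": messages,
        "stream": True,   # stream tokens back as they are generated
        "think": False,   # skip the model's reasoning trace entirely
    }

payload = build_payload([{"role": "user", "content": "Calories in 100 g of rice?"}])
```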

+ 1 - 1
static/index.html

@@ -124,7 +124,7 @@
                     <svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><line x1="22" y1="2" x2="11" y2="13"></line><polygon points="22 2 15 22 11 13 2 9 22 2"></polygon></svg>
                 </button>
             </form>
-            <div class="footer-note">Powered by Llama 3.1 8B running locally on Ubuntu 24.04 via Ollama</div>
+            <div class="footer-note">Powered by Qwen 3.5:4B running locally on Ubuntu 24.04 via Ollama</div>
         </footer>
     </div>
     

+ 1 - 1
user_stories.md

@@ -34,7 +34,7 @@ _Note: Sprints 1–3 covered initial VM setup, Ollama framework installation, Go
   - **[Back] (2 pts):** Create a fast fuzzy-matching API endpoint `GET /api/food/search`.
   - **[Front] (2 pts):** Implement the search bar component with real-time fetch autocomplete.
 - **[US-06]** As a developer, I want to connect the AI logic to the local database securely.
-  - **[Back] (2 pts):** Equip the Llama 3.1 8B agent with local SQL lookup tools so it can query the DB.
+  - **[Back] (2 pts):** Equip the Qwen 3.5:4B agent with local SQL lookup tools so it can query the DB.
 
 ## 🏃 Sprint 6: Comprehensive Nutritional Information
 **Total Points: 10** | **Goal:** Expose deep nutritional data (macros, minerals, vitamins, amino acids) to the user.