
Sprint 6: Complete documentation and code cleanup

lanfr144 2 weeks ago
parent commit 94194f9b99

+ 32 - 69
PROJECT_CONTEXT.md

@@ -1,76 +1,39 @@
 # Project Context: Local Food AI
 
 ## 🎯 Vision Statement
-A local food AI that provides full nutritional value information on any food and can generate complete menu proposals based on the user's specification. The system is designed with a strict privacy-first focus, ensuring no user data leaves the server, and fits within specific hardware limits.
+A strictly local, privacy-first Food AI that acts as a clinical dietitian. It provides complete nutritional analysis, recipe formulation, and menu planning based on dynamic user health profiles (e.g., pregnancy, kidney disease, specific diets). No user data leaves the server.
 
 ## 🏗️ Architecture & Tech Stack
-
-### Remote Environment
-- **Server**: Ubuntu 24.04 VM at `192.168.130.170` (8 vCPUs, 30 GB RAM, no dedicated GPU). Accessed via SSH as `francois` or `root`.
-- **Containerization**: Docker (for backend/frontend) or native deployment.
-- **LLM Engine**: Ollama (for running lightweight, quantized local language models like `mistral` or `llama3-8b`).
-- **Database Server**: MySQL (for user data, saved lists, and nutritional database).
-
-### Frontend Web Interface
-- **Framework**: Streamlit (Python)
-- **Purpose**: To provide an interactive chat interface for the AI, search functionality for food nutrition, user account management, and food combination calculators.
-
-### Local Environment
-- **Workspace**: `c:\Users\lanfr144\Documents\DOPRO1\Antigravity\Food`
-- **OS**: Windows
-
-### Python Environment
-Python will be used for scripting, data manipulation, and interacting with the LLM and the Database. Required libraries:
-- `streamlit`: To build the web application.
-- `ollama`: For querying local models.
-- `pandas`: For data processing (e.g., ingesting nutrition CSVs).
-- `mysql-connector-python` or `SQLAlchemy`: For database access.
-- **Web Search Tool**: (e.g., DuckDuckGo API wrapper) for the AI to dynamically gather external information anonymously.
-
-## 🔐 Core Requirements & Privacy
-- **User Accounts**: Secure login and registration system.
-- **Data Privacy**: No user data leaves the server.
-- **Repository**: Public Git repository at `https://git.btshub.lu` named `LocalFoodAI_<your IAM>`. Contains a strict `.gitignore`. Teacher (`evegi144`) added as collaborator.
-- **Ease of Use**: Anyone should be able to clone the repo and run it easily (via Docker/scripts).
-
-## 🚀 Key Features (User Stories)
-1. **Nutritional Information**: View complete macros, minerals, vitamins, etc., for any food.
-2. **Food Combinations**: Enter quantities for multiple foods to get a combined nutritional overview. Store and edit these in named lists.
-3. **Nutrient Search**: Search for specific nutrients and sort foods containing them.
-4. **AI Menu Proposals**: Get AI-generated menu proposals based on nutritional goals and constraints (e.g., allergies).
-5. **AI Nutrition Chat**: Freely chat with the AI about nutrition.
-6. **Anonymous Web Search**: The AI can perform local background web searches for missing information.
-
-## 🚀 Installation Prerequisites & Deployment
-### Server Prerequisites (Ubuntu 24.04 Native)
-- `gcc` and `build-essential`.
-- `python3-venv`, `python3-dev`, and `python3-pip`.
-- `mysql-server` and `curl`.
-
-### Automated Deployment (`deploy.sh`)
-Executing this file on a naked server will automatically:
-1. Fetch and install all apt-level system prerequisites.
-2. Install Ollama natively.
-3. Push custom configurations (`my.cnf`) to MySQL server and configure the local virtual environment.
-4. Pip-install the project dependencies.
-
-## 💾 Database Configuration & Data Loading
-### 1. Initial MySQL Setup
-- `init.sql` script loads into MySQL to create the database, users, and tables for User Profiles, Food Combos, and the Nutrition Data.
-
-### 2. Data Import (CSV)
-- A nutritional database `.csv` ingestion script (using `pandas`) populates the MySQL tables.
-
-### 3. Search Capabilities
-- The MySQL database must be optimized for text/context queries to support the AI's Retrieval-Augmented Generation (RAG).
-
-## 📝 Roadmap & Next Steps (Sprints)
-- [ ] **Sprint 1 (Foundation)**: Initialize Git repository (`LocalFoodAI_<IAM>`), setup `.gitignore`, finalize `deploy.sh`, initialize MySQL (`init.sql`), and build Streamlit user login.
-- [ ] **Sprint 2 (Data Core)**: Import food nutritional CSV via Pandas into MySQL. Build Streamlit pages for food search and details.
-- [ ] **Sprint 3 (Combinations)**: Implement Streamlit logic to combine foods by gram amounts and save lists to MySQL.
-- [ ] **Sprint 4 (Local AI)**: Deploy lightweight Ollama models and build the Streamlit chat interface.
-- [ ] **Sprint 5 (Advanced AI)**: Implement RAG for menu proposals and integrate anonymous web search tool.
-- [ ] **Sprint 6 (Polish)**: Thorough testing and perfect the `README.md`.
+- **Server Environment**: Ubuntu 24.04 VM (`192.168.130.170`). 
+- **Containerization**: Docker & Kubernetes scripts available in `docker/` and `k8s/`.
+- **LLM Engine**: Ollama running locally (`mistral:latest`).
+- **Database**: MySQL 8.0 with `mysql_config_editor` login paths (`app_reader`, `app_auth`).
+- **Frontend Web Interface**: Streamlit (`app.py`).
+
+## 💾 Database Design: Grouped Vertical Partitioning
+To work around InnoDB row-size limits and speed up ingestion of the ~24 GB OpenFoodFacts dump, the database is vertically partitioned into five tables:
+1. `products_core` (Base data, FULLTEXT indexing)
+2. `products_allergens`
+3. `products_macros` (Strict `DOUBLE` datatypes)
+4. `products_vitamins`
+5. `products_minerals`
+
+**CRITICAL NOTE**: The frontend and AI RAG tools interact with a unified `VIEW` named `products` that `LEFT JOIN`s these partitions back into a single queryable relation.
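As a rough illustration of the unified view, the snippet below generates the kind of `CREATE VIEW` statement described above. Only the five table names come from this document; the join key `code` and the example columns are assumptions.

```python
# Sketch only: build the CREATE VIEW statement that unifies the
# partition tables. Column names after SELECT are illustrative.
PARTITIONS = [
    "products_allergens",
    "products_macros",
    "products_vitamins",
    "products_minerals",
]

def build_products_view(join_key: str = "code") -> str:
    """Return SQL joining every partition onto products_core."""
    joins = "\n  ".join(
        f"LEFT JOIN {t} USING ({join_key})" for t in PARTITIONS
    )
    return (
        "CREATE OR REPLACE VIEW products AS\n"
        "SELECT products_core.*, energy_kcal, proteins, fat  -- example columns\n"
        "FROM products_core\n  " + joins + ";"
    )

sql = build_products_view()
print(sql)
```

Using `LEFT JOIN` (rather than inner joins) keeps a product visible in the view even when one of its partition rows is missing.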
+
+## 🧠 AI Capabilities & RAG Tools
+The Ollama `mistral` model is fully integrated with Streamlit using **Tool Calling**:
+- **Tool**: `search_nutrition_db`. The AI can autonomously execute SQL queries against the local database to pull exact nutritional macros.
+- **Tool**: `local_web_search`. The AI can anonymously search the web if the DB lacks recipe ideas.
+- **Dynamic Profiling**: The Streamlit app extracts the user's EAV health profile and securely injects it into the AI's `sys_prompt`. The AI dynamically acts as a specialized dietitian for that precise condition (e.g., automatically flagging raw meats as forbidden for pregnancy).
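The tool-calling loop above can be sketched as a name-to-function dispatch table. The tool names match the two tools listed here; the function bodies and the tool-call payload shape are stubbed assumptions, not the app's actual implementation.

```python
# Sketch of routing a model-emitted tool call to a Python function.
# search_nutrition_db / local_web_search are stubs; real code would
# run read-only SQL and an anonymous web search respectively.
import json

def search_nutrition_db(query: str) -> str:
    # Stub: real code queries the local `products` view.
    return json.dumps({"food": query, "protein_g": 21.0})

def local_web_search(query: str) -> str:
    return json.dumps({"results": []})  # stub

TOOLS = {
    "search_nutrition_db": search_nutrition_db,
    "local_web_search": local_web_search,
}

def dispatch(tool_call: dict) -> str:
    """Execute one tool call of the form {'function': {'name', 'arguments'}}."""
    fn = TOOLS[tool_call["function"]["name"]]
    return fn(**tool_call["function"]["arguments"])

call = {"function": {"name": "search_nutrition_db",
                     "arguments": {"query": "lentils"}}}
print(dispatch(call))
```

In the real loop, each tool result is appended to the chat history and the model is re-invoked until it answers without requesting another tool.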
+
+## 🚀 Key Features
+1. **Dynamic Tabular Analytics**: In the Clinical Search tab, users can click "Ask AI to Evaluate This Table" to grade database rows against their specific illnesses/diets.
+2. **Plate Builder & Unit Converter**: `unit_converter.py` parses natural language strings (e.g., "1.5 cups") and converts them to metric grams based on product density.
+3. **AI Meal Planner**: A multi-turn RAG loop in which the AI queries the database for verified foods before emitting a strict Markdown menu table.
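The unit-conversion idea behind the Plate Builder can be sketched as below. This is not `unit_converter.py` itself; the regex, the unit table, and the density handling are all assumptions.

```python
# Minimal sketch: parse "1.5 cups" and convert to grams via a
# volume table and a per-product density. Values are illustrative.
import re

ML_PER_UNIT = {"cup": 240.0, "cups": 240.0, "tbsp": 15.0, "tsp": 5.0}

def to_grams(text: str, density_g_per_ml: float = 1.0) -> float:
    """Convert a culinary measurement string to grams."""
    m = re.match(r"\s*([\d.]+)\s*([a-zA-Z]+)", text)
    if not m:
        raise ValueError(f"cannot parse: {text!r}")
    qty, unit = float(m.group(1)), m.group(2).lower()
    return qty * ML_PER_UNIT[unit] * density_g_per_ml

print(to_grams("1.5 cups", density_g_per_ml=0.59))  # flour-like density
```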
+
+## 📝 Roadmap History
+- **Sprint 1-6 [COMPLETED]**: The project has evolved from a basic foundation into an optimized, vertically partitioned, RAG-integrated medical platform. All code is audited and the documentation is finalized in the `docs/` folder.
+- **Future Work**: The system is in a stable state. Any future AI agents modifying this project should strictly adhere to the vertical partitioning structure and use `search_nutrition_db` for data fetching.
 
 ---
-*Generated by Antigravity. Update this file as technical requirements and data schemas evolve.*
+*Generated by Antigravity.*

+ 22 - 0
README.md

@@ -0,0 +1,22 @@
+# Local Food AI 🍔
+
+A strictly local, privacy-first AI Medical Dietitian and Food Explorer. This project leverages the OpenFoodFacts dataset and local LLMs (Ollama) to provide medically sound dietary advice, recipe parsing, and menu planning without sending any user data to the cloud.
+
+## Features
+- **Dynamic Medical Profiling**: Configure your health profile (e.g., Kidney issues, pregnancy, vegan). The AI dynamically adjusts all responses, recommendations, and warnings based on these exact medical needs.
+- **RAG Architecture**: The AI is connected to a massively partitioned local MySQL database. When you ask a question or request a meal plan, the AI executes SQL queries autonomously to fetch precise nutritional data.
+- **Plate Builder & Unit Conversion**: Input culinary recipes (e.g., "1.5 cups of flour") and the system converts them to metric standard weights based on the product's density.
+- **High-Performance Database**: Implements Grouped Vertical Partitioning to work around InnoDB row-size limits, with `FULLTEXT` indexing for fast search across millions of foods.
+
+## Documentation
+Please refer to the `docs/` folder for detailed guides:
+- [Installation Guide](docs/Installation_Guide.md)
+- [User Guide](docs/User_Guide.md)
+- [Data Ingestion Guide](docs/Data_Ingestion.md)
+
+## Tech Stack
+- **Frontend**: Streamlit
+- **Database**: MySQL 8.0
+- **AI Engine**: Ollama (Mistral / Llama3)
+- **Deployment**: Native Ubuntu, Docker, Kubernetes
+- **Project Management**: Taiga (Synced dynamically via Python)

+ 28 - 0
docs/Data_Ingestion.md

@@ -0,0 +1,28 @@
+# Data Ingestion Guide
+
+The Local Food AI relies on the OpenFoodFacts dataset. Because this dataset is massive (~24 GB), a specialized ingestion pipeline was built to work around MySQL InnoDB row-size limits.
+
+## The Architecture
+The database is structured using **Grouped Vertical Partitioning**. Instead of a single monolithic table with 200+ columns, data is sliced into 5 distinct tables:
+1. `products_core` (Names, text, ingredients)
+2. `products_allergens` (Allergy data)
+3. `products_macros` (Fats, proteins, carbs, etc. as `DOUBLE`)
+4. `products_vitamins` (Vitamin traces)
+5. `products_minerals` (Mineral traces)
+
+A MySQL `VIEW` named `products` joins these back together so the frontend can query them as a single table.
+
+## How to Ingest
+1. Download the CSV using `download_csv.sh`. It will fetch `en.openfoodfacts.org.products.csv`.
+2. Do **not** run the ingestion script directly in the terminal, as SSH disconnects will kill the process.
+3. Use the `nohup` wrapper:
+   ```bash
+   nohup bash ./start_batch_ingest.sh > remote_ingest.log 2>&1 &
+   ```
+4. You can monitor the ingestion progress by tailing the logs:
+   ```bash
+   tail -f ingestion_process.log
+   ```
+
+## Script Internals
+The `ingest_csv.py` script streams the CSV with `pandas` chunking (`chunksize=10000`). For every chunk, it slices the DataFrame into the five partition frames and executes `INSERT IGNORE` statements against MySQL. Because duplicate rows are silently skipped, the script can be safely interrupted and restarted.
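The chunk-slicing step can be sketched as follows. The column groups shown are illustrative (the real CSV has 200+ columns), and only two of the five partitions are shown for brevity.

```python
# Sketch: split one pandas CSV chunk into per-partition frames,
# keeping only the columns that actually exist in the chunk.
import pandas as pd

COLUMN_GROUPS = {
    "products_core":   ["code", "product_name", "ingredients_text"],
    "products_macros": ["code", "proteins_100g", "fat_100g"],
}

def slice_chunk(chunk: pd.DataFrame) -> dict:
    """Map table name -> DataFrame slice for one ingestion chunk."""
    return {
        table: chunk[[c for c in cols if c in chunk.columns]]
        for table, cols in COLUMN_GROUPS.items()
    }

chunk = pd.DataFrame({"code": ["001"], "product_name": ["Oats"],
                      "proteins_100g": [13.0]})
parts = slice_chunk(chunk)
print({t: list(df.columns) for t, df in parts.items()})
```

Each resulting frame would then be written to its own table (e.g. via executemany with `INSERT IGNORE`), one partition at a time.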

+ 42 - 0
docs/Installation_Guide.md

@@ -0,0 +1,42 @@
+# Installation & Deployment Guide
+
+This guide details how to deploy the Local Food AI stack on an Ubuntu 24.04 server.
+
+## 1. Prerequisites
+- Ubuntu 24.04 (e.g., VM at `192.168.130.170`).
+- Git, curl.
+
+## 2. Setting Up MySQL
+1. Install MySQL Server: `sudo apt install mysql-server`
+2. Run `setup_db.py` to construct the schemas.
+3. Configure `mysql_config_editor` to store encrypted login paths for `app_auth` and `app_reader`.
+   ```bash
+   mysql_config_editor set --login-path=app_reader --host=127.0.0.1 --user=db_reader --password
+   mysql_config_editor set --login-path=app_auth --host=127.0.0.1 --user=db_auth --password
+   ```
+
+## 3. Setting Up Ollama (Local LLM)
+The application requires `ollama` to run the Mistral model locally for strict privacy.
+1. Install Ollama: `curl -fsSL https://ollama.com/install.sh | sh`
+2. Pull the model: `ollama pull mistral`
+
+## 4. Python Environment
+1. Clone the repository: `git clone https://git.btshub.lu/...`
+2. Setup venv:
+   ```bash
+   python3 -m venv venv
+   source venv/bin/activate
+   pip install -r requirements.txt  # streamlit, pandas, pymysql, bcrypt, ollama
+   ```
+
+## 5. Running the Application
+To run the Streamlit frontend:
+```bash
+streamlit run app.py --server.address 0.0.0.0
+```
+
+*Note: If you run this locally on Windows without `mysql_config_editor` paths, you will receive a connection warning.*
+
+## 6. Docker & Kubernetes (Optional)
+This repository also contains a full containerized CI/CD suite. 
+Navigate to `k8s/` and run `kubectl apply -f .` to spin up the MySQL, Taiga Sync, Ingestion Jobs, and Streamlit App in a resilient cluster.

+ 30 - 0
docs/User_Guide.md

@@ -0,0 +1,30 @@
+# User Guide: Local Food AI
+
+Welcome to the **Local Food AI** medical explorer application. This guide will explain how to utilize the various modules within the Streamlit interface.
+
+## 🔐 1. User Authentication & Profiling
+When you launch the application, you will be greeted by the Login portal.
+- **Register**: Create a secure account (passwords are Bcrypt hashed).
+- **Dynamic Health Profile**: Once logged in, navigate to the sidebar to define your "EAV" (Entity-Attribute-Value) health profile.
+  - You can add specific `Illnesses` (e.g., Diabetes), `Conditions` (e.g., Pregnant), or `Diets` (e.g., Vegan).
+  - *This profile acts as the foundation for the AI.* The AI reads this profile dynamically to deduce which foods you can or cannot eat.
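The profile-to-prompt step described above can be sketched like this. The `(attribute, value)` row shape and the prompt wording are assumptions about the app's internals, not its actual code.

```python
# Sketch: turn EAV health-profile rows into the AI's system prompt.
def build_sys_prompt(profile_rows: list) -> str:
    """profile_rows: (attribute, value) pairs, e.g. ('Illness', 'Diabetes')."""
    if not profile_rows:
        return ("You are a clinical dietitian AI. "
                "The user has no declared conditions.")
    facts = "; ".join(f"{attr}: {val}" for attr, val in profile_rows)
    return ("You are a clinical dietitian AI. Tailor every answer to this "
            f"health profile and flag unsafe foods. Profile -> {facts}")

print(build_sys_prompt([("Condition", "Pregnant"), ("Diet", "Vegan")]))
```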
+
+## 💬 2. AI Chat
+The **AI Chat** tab allows you to speak conversationally with a clinical dietitian AI.
+- **RAG Powered**: If you ask "Which foods are high in protein?", the AI will actively run SQL queries against your local OpenFoodFacts database to find verifiable answers.
+- **Profile Aware**: The AI knows your health profile. If you have "Hypertension" registered, it will automatically warn you against high-sodium suggestions.
+
+## 🔬 3. Clinical Search
+The **Clinical Search** tab allows you to manually explore the massive 24GB dataset.
+- Type any product name (e.g., "apple") and set your macro limits (Max Sugar, Min Protein).
+- **AI Evaluation**: After loading a dataframe, click **"🤖 Ask AI to Evaluate This Table"**. The AI will analyze the visible rows against your active health profile and flag them as recommended or strictly forbidden!
+
+## 🍽️ 4. My Plate Builder
+Build recipes or daily plates.
+- Search for a food by name.
+- Input natural culinary measurements (e.g., "1.5 cups of flour", "2 tbsp of butter"). Our custom unit conversion engine will automatically translate this to metric grams based on the specific product's density and save it to your plate.
+
+## 🤖 5. AI Meal Planner
+Request full daily menus.
+- Input your target calories (e.g., 2000 kcal) and diet preference.
+- The AI will hit the database, find real products matching your needs, and construct a precise Markdown table (`| Meal | Food | Calories | Salt | Fat | Iron |`).
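The strict Markdown table format can be sketched as a small formatter. The column set matches the example header above; the row data is made up.

```python
# Sketch: render meal-plan rows as the strict Markdown table the
# AI Meal Planner is expected to output.
def menu_table(rows: list) -> str:
    cols = ["Meal", "Food", "Calories", "Salt", "Fat", "Iron"]
    lines = ["| " + " | ".join(cols) + " |",
             "|" + "---|" * len(cols)]
    for r in rows:
        lines.append("| " + " | ".join(str(r.get(c, "")) for c in cols) + " |")
    return "\n".join(lines)

print(menu_table([{"Meal": "Breakfast", "Food": "Oatmeal",
                   "Calories": 350, "Salt": "0.1 g",
                   "Fat": "6 g", "Iron": "4 mg"}]))
```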

+ 0 - 0
check_projects.py → legacy_scripts/check_projects.py


+ 0 - 0
convert_datatypes.py → legacy_scripts/convert_datatypes.py


+ 0 - 0
fetch_tasks.py → legacy_scripts/fetch_tasks.py


+ 0 - 0
gen_presentation.py → legacy_scripts/gen_presentation.py


+ 0 - 0
reset_pwd.py → legacy_scripts/reset_pwd.py


+ 0 - 0
taiga_checker.py → legacy_scripts/taiga_checker.py


+ 0 - 0
taiga_closeout.py → legacy_scripts/taiga_closeout.py


+ 0 - 0
taiga_feed.py → legacy_scripts/taiga_feed.py


+ 0 - 0
taiga_sprint4.py → legacy_scripts/taiga_sprint4.py


+ 0 - 0
taiga_sprint4_deploy.py → legacy_scripts/taiga_sprint4_deploy.py


+ 0 - 0
test_mail.py → legacy_scripts/test_mail.py


+ 0 - 0
test_taiga.py → legacy_scripts/test_taiga.py