The Local Food AI relies on the OpenFoodFacts dataset. Because this dataset is massive (~24 GB), a specialized ingestion pipeline was built to work around MySQL InnoDB row-size limits.
The database is structured using Grouped Vertical Partitioning. Instead of a single monolithic table with 200+ columns, data is sliced into 5 distinct tables:

- products_core (Names, text, ingredients)
- products_allergens (Allergy data)
- products_macros (Fats, proteins, carbs, etc. as DOUBLE)
- products_vitamins (Vitamin traces)
- products_minerals (Mineral traces)

A MySQL VIEW named products joins these back together so the frontend can query them seamlessly.
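For reference, the view could be created along the lines of the sketch below. This is only a minimal illustration, not the project's actual DDL: it assumes each partition table shares a code column (the product barcode) as its primary key, that the remaining column names do not collide, and it uses pymysql and placeholder credentials purely as an example.

```python
# Illustrative only: recreate the `products` view that stitches the five
# partition tables back together. Assumes a shared `code` (barcode) key.
import pymysql  # any MySQL client works; pymysql is just an example

VIEW_DDL = """
CREATE OR REPLACE VIEW products AS
SELECT *
FROM products_core
LEFT JOIN products_allergens USING (code)
LEFT JOIN products_macros    USING (code)
LEFT JOIN products_vitamins  USING (code)
LEFT JOIN products_minerals  USING (code)
"""

# Placeholder connection details -- adjust to your environment.
conn = pymysql.connect(host="localhost", user="food_ai",
                       password="change_me", database="openfoodfacts")
with conn.cursor() as cur:
    # SELECT * with USING() returns the shared `code` column only once,
    # so the view exposes one wide row per product.
    cur.execute(VIEW_DDL)
conn.commit()
conn.close()
```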
Run download_csv.sh. It will fetch en.openfoodfacts.org.products.csv. Then launch the batch ingestion with the nohup wrapper:
```bash
nohup bash ./start_batch_ingest.sh > remote_ingest.log 2>&1 &
```
You can monitor the ingestion progress by tailing the logs:
```bash
tail -f ingestion_process.log
```
The ingest_csv.py script uses pandas chunking (chunksize=10000). For every chunk, it slices the DataFrame into the 5 partitions and executes an INSERT IGNORE into each MySQL table. Because INSERT IGNORE skips rows whose primary key already exists, the script can be safely interrupted and restarted without creating duplicates.
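The core of that loop might look roughly like the following. This is a minimal sketch, not the real ingest_csv.py: the PARTITIONS mapping, the specific column names, the code barcode key, and the pymysql connection details are all illustrative assumptions.

```python
# Illustrative sketch of chunked ingestion; not the actual ingest_csv.py.
# PARTITIONS, column names, and connection details are assumptions.
import pandas as pd
import pymysql

CSV_PATH = "en.openfoodfacts.org.products.csv"

# Hypothetical mapping: partition table -> columns pulled from the CSV.
PARTITIONS = {
    "products_core":      ["code", "product_name", "ingredients_text"],
    "products_allergens": ["code", "allergens"],
    "products_macros":    ["code", "fat_100g", "proteins_100g", "carbohydrates_100g"],
    "products_vitamins":  ["code", "vitamin-c_100g"],
    "products_minerals":  ["code", "calcium_100g"],
}

# Placeholder connection details -- adjust to your environment.
conn = pymysql.connect(host="localhost", user="food_ai",
                       password="change_me", database="openfoodfacts")

# The OpenFoodFacts export is tab-separated; chunksize keeps memory flat.
for chunk in pd.read_csv(CSV_PATH, sep="\t", chunksize=10000):
    with conn.cursor() as cur:
        for table, cols in PARTITIONS.items():
            # Slice the chunk down to this partition's columns and map NaN -> NULL.
            part = chunk[cols].astype(object).where(chunk[cols].notna(), None)
            col_list = ", ".join(f"`{c}`" for c in cols)
            placeholders = ", ".join(["%s"] * len(cols))
            # INSERT IGNORE skips rows whose primary key already exists,
            # which is what makes interrupting and restarting safe.
            sql = f"INSERT IGNORE INTO `{table}` ({col_list}) VALUES ({placeholders})"
            cur.executemany(sql, part.values.tolist())
    conn.commit()

conn.close()
```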