add datasets and other useful files

2024-06-25 13:24:22 +02:00
parent 8bd048085a
commit fc6a95334d
18 changed files with 245957 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,31 @@
+
+# Directory structure of the project
+
+## Virtual environment
+
+The following folders and files are part of the Python venv directory structure: `bin/`, `include/`, `lib/`, `share/`, and `pyvenv.cfg`.
+
+The `requirements.txt` file lists the Python packages required for the project.
+They should already be installed, but if you reset the venv, you can reinstall them with `python3 -m pip install -r requirements.txt`.
+
+## Source code
+
+All Python source code is inside the `src/` directory.
+
+## Datasets
+
+Each dataset is stored in its own directory, named after the dataset.
+
+Let's say you have a dataset named `XLII`.
+
+- All files related to the dataset must be inside the `XLII_dataset/` folder.
+- The `.csv` files containing the original data must be placed inside the `XLII_dataset/csv/` folder.
+- The SQL code that creates the tables with the correct schema must be in the `XLII_dataset/create_tables.sql` file.
+
+You can replace `XLII` with any dataset name you want (I used `flight_delay` and `SSB`).
+
+Then, if you run `make reset`, an SQLite database file named `XLII_dataset/XLII.db` will be created, or overwritten if it already exists. It will be initialized with the schema given in `XLII_dataset/create_tables.sql`, and populated with the data available in the `XLII_dataset/csv/*.csv` files.
+
+
+
+
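
The `make reset` behaviour described in the README above can be sketched in Python. This is only a minimal sketch, not the project's actual Makefile logic (which this diff does not show): the function name `reset_dataset` is hypothetical, and it assumes that each CSV file has a header row whose names match the table's columns and that each file is named after the table it populates.

```python
import csv
import sqlite3
from pathlib import Path


def reset_dataset(name: str, root: Path = Path(".")) -> Path:
    """Recreate <name>_dataset/<name>.db from create_tables.sql and csv/*.csv."""
    dataset_dir = root / f"{name}_dataset"
    db_path = dataset_dir / f"{name}.db"
    db_path.unlink(missing_ok=True)  # overwrite any existing database file

    conn = sqlite3.connect(db_path)
    try:
        # Initialize the schema from the dataset's SQL file.
        conn.executescript((dataset_dir / "create_tables.sql").read_text())

        for csv_path in sorted((dataset_dir / "csv").glob("*.csv")):
            table = csv_path.stem  # assumption: file name == table name
            with csv_path.open(newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)  # assumption: first row is a header
                if header is None:
                    continue  # empty file, nothing to load
                columns = ", ".join(f'"{c}"' for c in header)
                placeholders = ", ".join("?" for _ in header)
                conn.executemany(
                    f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
                    reader,
                )
        conn.commit()
    finally:
        conn.close()
    return db_path
```

Under these assumptions, `reset_dataset("flight_delay")` would rebuild `flight_delay_dataset/flight_delay.db` from the files in `flight_delay_dataset/`. Values are inserted as text and converted by SQLite according to the column types declared in `create_tables.sql`.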