add datasets and other useful files

Oscar Plaisant
2024-06-25 13:24:22 +02:00
parent 8bd048085a
commit fc6a95334d
18 changed files with 245957 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,31 @@
# Directory structure of the project
## Virtual environment
The following folders and files are part of the Python venv directory structure: `bin/`, `include/`, `lib/`, `share/`, and `pyvenv.cfg`.
The `requirements.txt` file lists the Python packages required by the project.
They should already be installed, but if you reset the venv, you can reinstall them with `python3 -m pip install -r requirements.txt`.
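
To check that a script is running with the venv's interpreter rather than the system Python, you can compare `sys.prefix` with `sys.base_prefix`, which differ only inside a virtual environment. The helper name `in_virtualenv` is hypothetical, and the activation hint assumes a POSIX shell.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the venv directory, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using the virtual environment at {sys.prefix}")
    else:
        # assumes a POSIX shell and the venv living at the project root
        print("Not in a venv; activate it with `source bin/activate`")
```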
## Source code
All python source code is inside the `src/` directory.
## Datasets
Each dataset is stored in its own directory.
Let's say you have a dataset named `XLII`.
- All files related to the dataset must be inside the `XLII_dataset/` folder
- The `.csv` files containing the original data must be placed inside the `XLII_dataset/csv/` folder
- The SQL code that creates the tables with the correct schema must be in the `XLII_dataset/create_tables.sql` file

You can replace `XLII` with any dataset name you want (I used `flight_delay` and `SSB`).
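
As an illustration of the layout above, the following sketch computes the expected paths for a dataset name and reports which required items are missing. `DatasetPaths`, `dataset_paths` and `missing_files` are hypothetical helpers, not part of `src/`. For example, `missing_files("SSB")` returns an empty list when `SSB_dataset/csv/` contains at least one `.csv` file and `SSB_dataset/create_tables.sql` exists.

```python
from pathlib import Path
from typing import NamedTuple


class DatasetPaths(NamedTuple):
    """Locations of the files belonging to one dataset (hypothetical helper)."""
    root: Path      # <name>_dataset/
    csv_dir: Path   # <name>_dataset/csv/
    schema: Path    # <name>_dataset/create_tables.sql
    database: Path  # <name>_dataset/<name>.db, created by `make reset`


def dataset_paths(name: str, base: Path = Path(".")) -> DatasetPaths:
    """Build the paths described in the README for dataset `name`."""
    root = base / f"{name}_dataset"
    return DatasetPaths(
        root=root,
        csv_dir=root / "csv",
        schema=root / "create_tables.sql",
        database=root / f"{name}.db",
    )


def missing_files(name: str, base: Path = Path(".")) -> list[str]:
    """List the required folders/files that are absent for dataset `name`."""
    paths = dataset_paths(name, base)
    problems = []
    if not paths.csv_dir.is_dir():
        problems.append(str(paths.csv_dir))
    elif not any(paths.csv_dir.glob("*.csv")):
        problems.append(str(paths.csv_dir / "*.csv"))
    if not paths.schema.is_file():
        problems.append(str(paths.schema))
    return problems


if __name__ == "__main__":
    for dataset in ("flight_delay", "SSB"):
        print(dataset, missing_files(dataset) or "OK")
```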

Running `make reset` creates (or overwrites) an SQLite database file named `XLII_dataset/XLII.db`. It is initialized with the schema given in `XLII_dataset/create_tables.sql` and populated with the data from the `XLII_dataset/csv/*.csv` files.
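
The following is a minimal Python sketch of what `make reset` is described as doing; it is not the project's actual Makefile recipe. It assumes that each CSV file is loaded into the table named after the file (a hypothetical `csv/flights.csv` would go into table `flights`) and that the first row of each file is a header whose names match the table's columns. `reset_database` is a hypothetical name.

```python
import csv
import sqlite3
from pathlib import Path


def reset_database(name: str, base: Path = Path(".")) -> Path:
    """Rebuild <name>_dataset/<name>.db from the schema and the CSV files.

    Assumptions: csv/<table>.csv is loaded into table <table>, and the
    first CSV row is a header matching that table's column names.
    """
    root = base / f"{name}_dataset"
    db_path = root / f"{name}.db"
    db_path.unlink(missing_ok=True)  # overwrite any previous database

    conn = sqlite3.connect(db_path)
    try:
        # create the tables using the schema from create_tables.sql
        conn.executescript((root / "create_tables.sql").read_text())
        for csv_file in sorted((root / "csv").glob("*.csv")):
            with csv_file.open(newline="") as f:
                reader = csv.reader(f)
                header = next(reader)  # column names from the first row
                columns = ", ".join(f'"{c}"' for c in header)
                placeholders = ", ".join("?" for _ in header)
                # remaining rows are inserted as strings
                conn.executemany(
                    f'INSERT INTO "{csv_file.stem}" ({columns}) VALUES ({placeholders})',
                    reader,
                )
        conn.commit()
    finally:
        conn.close()
    return db_path
```

Calling `reset_database("flight_delay")` would then rebuild `flight_delay_dataset/flight_delay.db`. Values are inserted as text and converted by SQLite's column type affinity, so numeric columns should be declared with a numeric type in `create_tables.sql`.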