Directory structure of the project
Virtual environment
The following folders and files are part of the Python venv directory structure: bin/, include/, lib/, share/, and pyvenv.cfg.
The requirements.txt file lists the Python packages required for the project.
They should already be installed, but if you reset the venv, you can reinstall them with: python3 -m pip install -r requirements.txt
Source code
All Python source code is inside the src/ directory.
Datasets
Each dataset is stored in its own directory, with a fixed layout.
Let's say you have a dataset named XLII.
- All files related to the dataset must be inside the XLII_dataset/ folder
- The .csv files containing the original data must be placed inside the XLII_dataset/csv/ folder
- The file containing the SQL code to create the tables with the correct schema must be in the XLII_dataset/create_tables.sql file
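The layout rules above can be checked with a few lines of Python. This is only a minimal sketch: the function name check_dataset_layout and its root parameter are hypothetical and are not part of the project's src/ code.

```python
from pathlib import Path


def check_dataset_layout(name: str, root: Path = Path(".")) -> list[str]:
    """Return a list of layout problems for <name>_dataset/ (empty if none).

    Hypothetical helper, not part of the project: it only mirrors the
    three layout rules listed in this README.
    """
    dataset_dir = root / f"{name}_dataset"
    if not dataset_dir.is_dir():
        return [f"missing folder: {dataset_dir}"]

    problems = []

    # The original data must live in <name>_dataset/csv/*.csv
    csv_dir = dataset_dir / "csv"
    if not csv_dir.is_dir():
        problems.append(f"missing folder: {csv_dir}")
    elif not any(csv_dir.glob("*.csv")):
        problems.append(f"no .csv files in: {csv_dir}")

    # The schema must be in <name>_dataset/create_tables.sql
    schema_file = dataset_dir / "create_tables.sql"
    if not schema_file.is_file():
        problems.append(f"missing file: {schema_file}")

    return problems
```

Calling check_dataset_layout("XLII") from the project root returns an empty list when the XLII_dataset/ folder follows the layout.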
Obviously, you can replace XLII with any dataset name you want (I used flight_delay and SSB).
Then, if you run make reset, an SQLite database file named XLII_dataset/XLII.db will be created, or overwritten if it already exists. It will be initialized with the schema given in XLII_dataset/create_tables.sql, and populated with the data from the XLII_dataset/csv/*.csv files.
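To give an idea of what such a reset involves, here is a rough Python sketch using only the standard library. It is not the project's actual make reset recipe, which is not shown here. The function name reset_database is hypothetical, and the sketch assumes that each CSV file is named after its table (for example csv/t.csv fills table t) and that its first row contains the column names.

```python
import csv
import sqlite3
from pathlib import Path


def reset_database(name: str, root: Path = Path(".")) -> Path:
    """Recreate <name>_dataset/<name>.db from create_tables.sql and csv/*.csv.

    Hypothetical sketch, not the project's real reset code.
    """
    dataset_dir = root / f"{name}_dataset"
    db_path = dataset_dir / f"{name}.db"

    # Start from scratch: drop any existing database file (Python 3.8+).
    db_path.unlink(missing_ok=True)

    conn = sqlite3.connect(db_path)
    try:
        # Create the tables with the schema given in create_tables.sql.
        conn.executescript((dataset_dir / "create_tables.sql").read_text())

        for csv_file in sorted((dataset_dir / "csv").glob("*.csv")):
            table = csv_file.stem  # assumption: file name == table name
            with csv_file.open(newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)  # assumption: first row = column names
                if header is None:
                    continue  # empty file, nothing to load
                columns = ", ".join(f'"{c}"' for c in header)
                placeholders = ", ".join("?" for _ in header)
                conn.executemany(
                    f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
                    reader,
                )
        conn.commit()
    finally:
        conn.close()
    return db_path
```

Values are inserted as text and SQLite converts them according to each column's declared type affinity, so the schema in create_tables.sql still decides how the data is stored.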