# Description of the project

## General information

The execution is managed via the Makefile.

The Python environment is managed via a standard virtual environment.
If you need to install a new Python package, add it to the `requirements.txt` file (using pip syntax). It should be installed automatically the next time you execute the project. You can also install it explicitly by running `make requirements.txt`.

The installation of new databases (from CSV files) is also managed by the Makefile.


# Configuration

The configuration is stored in the `src/config.yaml` file.

## Database-specific configuration

`database_name` should contain the name of the database to use. The database has to be stored in the proper directory structure (see [Directory structure > Datasets](README.md#datasets)). This parameter is case-sensitive.

Each database can have its own independent configuration, stored under a key named after the database.
For example, the database named `SSB` has its configuration under the `SSB:` key (and this configuration is used only when `database_name` is `SSB`).
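
As a rough illustration of this lookup, the sketch below selects the database-specific section from an already-parsed configuration. The dictionary stands in for the content of `src/config.yaml`; its values and the `database_config` helper are hypothetical and are not taken from the project's code.

```python
# Hypothetical in-memory view of src/config.yaml after parsing.
# The values are placeholders, not the project's real settings.
config = {
    "database_name": "SSB",
    "SSB": {"orders_length": 3},
    "flight_delay": {"orders_length": 2},
}


def database_config(config: dict) -> dict:
    """Return the section that applies to the selected database."""
    name = config["database_name"]
    # The lookup is case-sensitive: "ssb" would not match the "SSB" key.
    if name not in config:
        raise KeyError(f"no configuration section named {name!r}")
    return config[name]


print(database_config(config))
```

With `database_name` set to `SSB`, only the `SSB` section is returned; the `flight_delay` section is ignored.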

The following table explains every parameter used in the database-specific configuration.

| key | type | usage |
| --- | ---- | ----- |
| `orders_length` | integer | The length of the orderings considered. |
| `hypothesis_ordering` | list[str] | The ordering whose correctness is tested. |
| `parameter` | str | The "parameter" attribute in the query (an attribute in the database). |
| `authorized_parameter_values` | list[str] | The restriction on the possible values in the query's orderings (`WHERE parameter IN authorized_parameter_values`). |
| `summed_attribute` | str | The database attribute that is summed in the aggregation and used to order the values. |
| `criterion` | list[str] | The list of possible values for the criterion in the query. When a random query is generated, one of these values is chosen at random. |

The `query_generator` key contains the name of the query-generator object used to build the query. Do not modify it unless you modify the code accordingly.
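
To make the role of these parameters more concrete, here is a minimal sketch of how they could map onto an SQL query. It is not the project's query generator: the `facts` table name, the attribute values, and the `sketch_query` helper are invented for illustration. How the chosen criterion enters the real query is not described here, so the sketch only returns it alongside the SQL text.

```python
import random

# Placeholder database-specific configuration (hypothetical values).
db_config = {
    "parameter": "region",
    "authorized_parameter_values": ["ASIA", "EUROPE", "AMERICA"],
    "summed_attribute": "revenue",
    "criterion": ["year", "category"],
}


def sketch_query(cfg: dict, table: str = "facts", rng=random):
    """Return (criterion, sql, values) for one randomly drawn query."""
    # One criterion value is drawn at random, as described in the table.
    criterion = rng.choice(cfg["criterion"])
    values = list(cfg["authorized_parameter_values"])
    placeholders = ", ".join("?" for _ in values)
    sql = (
        f"SELECT {cfg['parameter']}, SUM({cfg['summed_attribute']}) AS total "
        f"FROM {table} "
        f"WHERE {cfg['parameter']} IN ({placeholders}) "
        f"GROUP BY {cfg['parameter']} "
        f"ORDER BY total DESC"
    )
    return criterion, sql, values


criterion, sql, values = sketch_query(db_config)
print(criterion)
print(sql)
```

The authorized values are passed as bound parameters (`?`) rather than being pasted into the SQL string, which avoids quoting problems.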


# Directory structure of the project

## Virtual environment

The following folders and files are part of the Python venv directory structure: `bin/`, `include/`, `lib/`, `share/`, and `pyvenv.cfg`.

The `requirements.txt` file lists the Python packages required for the project.
They should already be installed, but if you reset the venv, you can reinstall them with `python3 -m pip install -r requirements.txt`.

## Source code

All Python source code is inside the `src/` directory.

## Datasets

Each dataset is stored in its own directory, which follows a fixed layout.

Let's say you have a dataset named `XLII`.

- All files related to the dataset must be inside the `XLII_dataset/` folder.
- The `.csv` files containing the original data must be placed inside the `XLII_dataset/csv/` folder.
- The SQL code that creates the tables with the correct schema must be in the `XLII_dataset/create_tables.sql` file.

You can replace `XLII` with any dataset name you want (I used `flight_delay` and `SSB`).

Then, if you run `make reset`, an SQLite database file named `XLII_dataset/XLII.db` is created (or overwritten if it already exists). It is initialized with the schema given in `XLII_dataset/create_tables.sql` and populated with the data available in the `XLII_dataset/csv/*.csv` files.
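
For readers who want to see what such a reset amounts to, the sketch below reproduces the same steps with the Python standard library. It is not the Makefile's actual recipe. It assumes that each CSV file is named after the table it fills and that its first row holds the column names; the `reset_database` helper is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def reset_database(name: str, root: Path = Path(".")) -> Path:
    """Rebuild <name>_dataset/<name>.db from its schema and CSV files."""
    base = root / f"{name}_dataset"
    db_path = base / f"{name}.db"
    db_path.unlink(missing_ok=True)  # drop any previous database (Python 3.8+)

    conn = sqlite3.connect(db_path)
    try:
        # Create the tables from the schema file.
        conn.executescript((base / "create_tables.sql").read_text())
        for csv_file in sorted((base / "csv").glob("*.csv")):
            table = csv_file.stem  # assumption: file name == table name
            with csv_file.open(newline="") as handle:
                reader = csv.reader(handle)
                header = next(reader)  # assumption: first row = column names
                columns = ", ".join(f'"{c}"' for c in header)
                marks = ", ".join("?" for _ in header)
                conn.executemany(
                    f'INSERT INTO "{table}" ({columns}) VALUES ({marks})',
                    reader,
                )
        conn.commit()
    finally:
        conn.close()
    return db_path
```

Calling `reset_database("XLII")` from the project root would then rebuild `XLII_dataset/XLII.db`, provided the folder layout described above is in place.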