# Description of the project

## General information

The execution is managed via the Makefile.

The Python environment is managed via a virtual environment. Its configuration is standard. If you need to install a new Python package, add it to the `requirements.txt` file (using pip syntax). It should be installed automatically when you execute the project. In any case, you can run `make requirements.txt`.

The installation of new databases (from csv) is managed in the Makefile.

# Configuration

The configuration is stored in the `src/config.yaml` file.

## Database-specific configuration

`database_name` should contain the name of the database to use. The database has to be stored in the proper directory structure (see [Directory structure > Datasets](README.md#datasets)). This parameter is case sensitive.

Each database can have a separate and independent configuration, stored under the key named after the database. For example, the database named `SSB` has its configuration under the `SSB:` key (and this configuration is used only when `database_name` is `SSB`).

The following table explains every parameter that is used in the database-specific configuration.

| key | type | usage |
| --- | ---- | ----- |
| `orders_length` | integer | The length of the considered orderings. |
| `hypothesis_ordering` | list[str] | The ordering whose correctness is tested. |
| `parameter` | str | The "parameter" attribute in the query (an attribute in the database). |
| `authorized_parameter_values` | list[str] | The restriction over the possible values in the query's orderings (`WHERE parameter IN authorized_parameter_values`). |
| `summed_attribute` | str | The database attribute that is summed in the aggregation and used to order the values. |
| `criterion` | list[str] | The list of possible values for the criteria in the query. When getting a random query, one of these values is chosen randomly for the criteria. |

The `query_generator` key is a parameter containing the name of the query-generator object that is used when building the query. You should not modify it unless you modify the code accordingly.

# Directory structure of the project

## Virtual environment

The following folders and files are part of the Python venv directory structure: `bin/`, `include/`, `lib/`, `share/`, and `pyvenv.cfg`.

The `requirements.txt` file lists the Python packages required for the project. They should already be installed, but in case you reset the venv, you can reinstall them with `python3 -m pip install -r requirements.txt`.

## Source code

All Python source code is inside the `src/` directory.

## Datasets

Datasets are stored inside specific directories. Let's say you have a dataset named `XLII`.

- All files related to the dataset must be inside the `XLII_dataset/` folder
- The `.csv` files containing the original data must be placed inside the `XLII_dataset/csv/` folder
- The SQL code that creates the tables with the correct schema must be in the `XLII_dataset/create_tables.sql` file

Obviously, you can replace `XLII` with any dataset name you want (I used `flight_delay` and `SSB`).

Then, if you run `make reset`, an SQLite database file named `XLII_dataset/XLII.db` will be created / overwritten. It will be initialized with the schema given in `XLII_dataset/create_tables.sql`, and populated with the data available in the `csv/*.csv` files.
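
For readers who want to see what such a reset amounts to, here is a minimal Python sketch of the same idea using only the standard library. It is not the Makefile's actual recipe: the function name `reset_database` is hypothetical, and the sketch assumes that each CSV file has a header row and is named after the table it fills (for example `csv/items.csv` for a table `items`).

```python
import csv
import sqlite3
from pathlib import Path


def reset_database(name: str, root: Path = Path(".")) -> Path:
    """Recreate <name>_dataset/<name>.db from create_tables.sql and csv/*.csv.

    Hypothetical helper: an illustration of the layout described above,
    not the project's real reset logic.
    """
    dataset_dir = root / f"{name}_dataset"
    db_path = dataset_dir / f"{name}.db"
    db_path.unlink(missing_ok=True)  # overwrite any previous database file

    conn = sqlite3.connect(db_path)
    try:
        # Create the tables with the schema shipped with the dataset.
        conn.executescript((dataset_dir / "create_tables.sql").read_text())

        # Assumption: one CSV per table, file name == table name, header row present.
        for csv_file in sorted((dataset_dir / "csv").glob("*.csv")):
            with csv_file.open(newline="") as handle:
                reader = csv.reader(handle)
                header = next(reader)
                columns = ", ".join(f'"{col}"' for col in header)
                placeholders = ", ".join("?" for _ in header)
                conn.executemany(
                    f'INSERT INTO "{csv_file.stem}" ({columns}) VALUES ({placeholders})',
                    reader,
                )
        conn.commit()
    finally:
        conn.close()
    return db_path
```

Calling `reset_database("XLII")` from the project root would then produce `XLII_dataset/XLII.db`, under the assumptions stated above.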