Description of the project
General information
The execution is managed via the Makefile.
The python environment is managed via a virtual environment. Its configuration is standard.
If you need to install a new python package, add it to the requirements.txt file (using pip syntax). It should be installed automatically when you execute the project. If needed, you can also run make requirements.txt yourself.
The installation of new databases (from csv) is managed in the Makefile.
Configuration
The configuration is stored in the src/config.yaml file.
Database-specific configuration
database_name should contain the name of the database to use. The database has to be stored in the proper directory structure (see Directory structure of the project > Datasets). This parameter is case-sensitive.
Each database can have its own separate, independent configuration, stored under a key named after the database.
For example, the database named SSB has its configuration under the SSB: key (and this configuration is used only when database_name is SSB).
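The lookup described above can be sketched as follows. This is a minimal illustration, assuming the YAML file has already been parsed into a Python dictionary (for instance with PyYAML's yaml.safe_load). The function name get_database_config and all example values are hypothetical and are not taken from the project.

```python
# Hypothetical stand-in for the parsed content of src/config.yaml.
# The key names come from this README; the values are invented for illustration.
config = {
    "database_name": "SSB",
    "SSB": {"orders_length": 3, "parameter": "p_color"},
    "flight_delay": {"orders_length": 4, "parameter": "airline"},
}


def get_database_config(config: dict) -> dict:
    """Return the section whose key matches database_name (case-sensitive)."""
    name = config["database_name"]
    # The section must exist and be a mapping, not a scalar setting.
    if name not in config or not isinstance(config[name], dict):
        raise KeyError(f"No configuration section named {name!r}")
    return config[name]


db_config = get_database_config(config)
print(db_config["orders_length"])
```

Because the match is case-sensitive, setting database_name to ssb would not find the SSB section in this sketch.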
The following table explains every parameter used in the database-specific configuration.
key | type | usage |
---|---|---|
orders_length | integer | The length of the orderings considered |
hypothesis_ordering | list[str] | The ordering whose correctness is tested |
parameter | str | The "parameter" attribute in the query (an attribute in the database) |
authorized_parameter_values | list[str] | The restriction on the possible values in the query's orderings (WHERE parameter IN authorized_parameter_values) |
summed_attribute | str | The database attribute that is summed in the aggregation and used to order the values |
criterion | list[str] | The list of possible values for the criterion in the query. When a random query is generated, one of these values is chosen at random. |
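To make the roles of these keys more concrete, the sketch below assembles one plausible SQL query from such a configuration and runs it on a tiny in-memory SQLite table. It is an assumption about the general shape of the query, not the project's actual query generator: the function build_query, the table name facts, the criterion_attribute argument, the descending sort order, and all example values are hypothetical.

```python
import random
import sqlite3


def build_query(cfg: dict, table: str, criterion_attribute: str, rng: random.Random):
    """Build (sql, params) for one random query; the query shape is an assumption."""
    values = cfg["authorized_parameter_values"]
    placeholders = ", ".join("?" for _ in values)
    criterion = rng.choice(cfg["criterion"])  # one criterion value picked at random
    # Identifiers come from trusted configuration; values are bound as parameters.
    sql = (
        f"SELECT {cfg['parameter']}, SUM({cfg['summed_attribute']}) AS total "
        f"FROM {table} "
        f"WHERE {criterion_attribute} = ? AND {cfg['parameter']} IN ({placeholders}) "
        f"GROUP BY {cfg['parameter']} ORDER BY total DESC"
    )
    return sql, [criterion, *values]


# Invented configuration and data, for illustration only.
cfg = {
    "parameter": "airline",
    "authorized_parameter_values": ["AA", "DL"],
    "summed_attribute": "delay",
    "criterion": ["JFK"],
}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (airline TEXT, airport TEXT, delay INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("AA", "JFK", 10), ("DL", "JFK", 25), ("UA", "JFK", 99), ("AA", "JFK", 5)],
)
sql, params = build_query(cfg, "facts", "airport", random.Random(0))
ordering = [row[0] for row in conn.execute(sql, params)]
conn.close()
print(ordering)
```

The resulting list of parameter values, sorted by their summed attribute, is the kind of ordering that hypothesis_ordering would be compared against under these assumptions.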
The query_generator key is a parameter containing the name of the query-generator object that is used when building the query. You should not modify this unless you modify the code accordingly.
Directory structure of the project
Virtual environment
The following folders and files are part of the python venv directory structure: bin/, include/, lib/, share/, and pyvenv.cfg.
The requirements.txt file lists the python packages required for the project.
They should already be installed, but if you reset the venv, you can reinstall them with python3 -m pip install -r requirements.txt
Source code
All python source code is inside the src/ directory.
Datasets
Datasets are stored inside specific directories.
Let's say you have a dataset named XLII.
- All files relating to the dataset must be inside the XLII_dataset/ folder
- The .csv files containing the original data must be placed inside the XLII_dataset/csv/ folder
- The SQL code that creates the tables with the correct schema must be in the XLII_dataset/create_tables.sql file
Obviously, you can replace XLII with any dataset name you want (I used flight_delay and SSB).
Then, if you run make reset, an SQLite database file named XLII_dataset/XLII.db will be created or overwritten. It will be initialized with the schema given in XLII_dataset/create_tables.sql and populated with the data available in the XLII_dataset/csv/*.csv files.
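For readers who want to see what such a reset involves, here is a rough Python equivalent. It is a sketch under stated assumptions, not what the Makefile actually runs: the function reset_database is hypothetical, each CSV file is assumed to be named after its target table, and its first row is assumed to hold the column names.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def reset_database(root: Path, name: str) -> Path:
    """Recreate <name>_dataset/<name>.db from create_tables.sql and csv/*.csv."""
    dataset_dir = root / f"{name}_dataset"
    db_path = dataset_dir / f"{name}.db"
    db_path.unlink(missing_ok=True)  # overwrite any previous database
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript((dataset_dir / "create_tables.sql").read_text())
        for csv_file in sorted((dataset_dir / "csv").glob("*.csv")):
            with csv_file.open(newline="") as handle:
                reader = csv.reader(handle)
                header = next(reader)
                # Assumption: table name = file stem, header row = column names.
                columns = ", ".join(header)
                marks = ", ".join("?" for _ in header)
                conn.executemany(
                    f"INSERT INTO {csv_file.stem} ({columns}) VALUES ({marks})",
                    reader,
                )
        conn.commit()
    finally:
        conn.close()
    return db_path


# Demonstration on a throwaway dataset with invented content.
with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = Path(tmp) / "XLII_dataset"
    (dataset_dir / "csv").mkdir(parents=True)
    (dataset_dir / "create_tables.sql").write_text(
        "CREATE TABLE items (id INTEGER, label TEXT);"
    )
    (dataset_dir / "csv" / "items.csv").write_text("id,label\n1,first\n2,second\n")
    db_path = reset_database(Path(tmp), "XLII")
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT id, label FROM items ORDER BY id").fetchall()
    conn.close()
print(rows)
```

Deleting the old file before reconnecting mirrors the "created or overwritten" behaviour described above, so running the function twice yields the same database.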