traveltimes_prediction.data_processing package

Submodules

traveltimes_prediction.data_processing.data_processor module

class traveltimes_prediction.data_processing.data_processor.DataProcessor(section)[source]

Bases: object

Class for processing of the data - aggregation, feature engineering.

static align_training_dataset(X, Y, Y_bck=None)[source]

Method for preparing the data for training and visualization: aligns the inputs and keeps only the records whose timestamp is present in both X and Y (and in Y_bck, if given).

Parameters:
  • X (pd.DataFrame) – dataframe of features
  • Y (pd.DataFrame) – dataframe of the true values of travel times
  • Y_bck (pd.DataFrame) – dataframe of the backward predicted travel times (optional)
Returns:

tuple - pd.DataFrame of features, pd.DataFrame of true values of travel times, optionally pd.DataFrame of the backward predicted travel times, plus a dataframe of time
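The alignment described above can be illustrated with a minimal sketch. It assumes each frame is indexed by timestamp, which the documentation does not state. The helper name align_on_timestamps is hypothetical and is not the package's implementation. It also omits the extra dataframe of time that the documented method returns.

```python
import pandas as pd


def align_on_timestamps(X, Y, Y_bck=None):
    """Keep only rows whose timestamp index appears in every supplied frame."""
    # Assumption: all frames use a timestamp index.
    common = X.index.intersection(Y.index)
    if Y_bck is not None:
        common = common.intersection(Y_bck.index)
    common = common.sort_values()

    aligned = [X.loc[common], Y.loc[common]]
    if Y_bck is not None:
        aligned.append(Y_bck.loc[common])
    return tuple(aligned)


# Illustrative data: Y lacks the first timestamp present in X.
idx = pd.date_range("2017-01-01 08:00", periods=4, freq="5min")
X = pd.DataFrame({"speed": [90, 85, 60, 70]}, index=idx)
Y = pd.DataFrame({"travel_time": [310, 420, 390]}, index=idx[1:])

X_al, Y_al = align_on_timestamps(X, Y)
```

After the call, both frames share the same index, so they can be passed row for row to a training routine.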

get_features(time_interval_list)[source]

Method for retrieving data from DB, aggregating and feature engineering.

Parameters: time_interval_list (list) – list of dicts, one per interval: {‘from’: datetime, ‘to’: datetime}
Returns: tuple - pd.DataFrame features, int - confidence (0-100)
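As an illustration of the expected argument shape, the sketch below builds a time_interval_list of {'from', 'to'} dicts. The helper build_intervals and the chosen dates are hypothetical. Only the dict keys come from the documentation above.

```python
from datetime import datetime, timedelta


def build_intervals(start, end, step):
    """Split [start, end) into consecutive {'from', 'to'} dicts of length `step`."""
    intervals = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # clip the last interval to `end`
        intervals.append({"from": cursor, "to": upper})
        cursor = upper
    return intervals


# Three one-hour intervals covering a morning peak (illustrative values).
time_interval_list = build_intervals(
    datetime(2017, 1, 1, 6, 0),
    datetime(2017, 1, 1, 9, 0),
    timedelta(hours=1),
)
```

A list built this way has the shape that get_features and get_referential_data are documented to accept.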
get_referential_data(time_interval_list)[source]

Method for retrieval of the referential data.

Parameters: time_interval_list (list) – list of dicts.
Returns: pd.DataFrame

traveltimes_prediction.data_processing.db_interface module

class traveltimes_prediction.data_processing.db_interface.DBInterface[source]

Bases: object

Class implementing database connections to data sources and storages.

check_latest_referential_traveltime(section)[source]

Method for retrieval of the most recent data timestamp of the section - checking the latest calculated referential traveltime.

Parameters: section (string) – e.g. ‘KOCE-LNCE’
Returns: datetime.datetime - latest calculated referential traveltime for given section
get_last_timestamp(section)[source]

Method for retrieval of the most recent data timestamp of the section.

Parameters: section (string) – e.g. ‘KOCE-LNCE’
Returns: datetime
get_model_params(section, model_type)[source]

Method used to retrieve the params of trained model.

Parameters:
  • section (string) – name of the section, e.g. ‘KOCE-LNCE’
  • model_type (string) – name of the model, e.g. ‘TimeDomainModel’
Returns:

tuple (model's name, dictionary representation of the model params used for training)

load_model(section, model_type)[source]

Method for loading of the saved trained model from DB.

Parameters:
  • section (string) – name of the section, e.g. ‘KOCE-LNCE’
  • model_type (string) – name of the model, e.g. ‘TimeDomainModel’
Returns:

dict - dictionary representation of the model's attributes

model_timestamp(section, model_type)[source]

Method used to find out the timestamp of model creation.

Parameters:
  • section (string) – name of the section, e.g. ‘KOCE-LNCE’
  • model_type (string) – name of the model, e.g. ‘TimeDomainModel’
Returns:

boolean

save_model(section, model, time_from, time_to, model_params)[source]

Method for saving of the trained model to database.

Parameters:
  • section (string) – name of the section for which the model was created - e.g. ‘KOCE-LNCE’
  • model (dict) – dictionary representation of the model
  • time_from (datetime) – timestamp of the earliest data used for creation of this model
  • time_to (datetime) – timestamp of the latest data used for creation of this model
  • model_params (dict) – dictionary with the parameters of the model that were used for the training
save_prediction_result(rows)[source]

Method used for archiving the prediction results.

Parameters: rows (list) – list of tuples; each tuple contains the set of parameters for one query.
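The "list of parameter tuples" shape matches the usual DB-API executemany pattern. The sketch below uses an in-memory SQLite database purely as a stand-in. The actual database backend, table name, and column layout used by DBInterface are not documented here, so all of them are assumptions.

```python
import sqlite3
from datetime import datetime

# Stand-in storage; the real backend and schema are unknown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prediction_results ("
    "section TEXT, predicted_at TEXT, travel_time_s REAL)"
)

# Each tuple holds the parameters of one INSERT query.
rows = [
    ("KOCE-LNCE", datetime(2017, 1, 1, 8, 0).isoformat(), 312.5),
    ("KOCE-LNCE", datetime(2017, 1, 1, 8, 5).isoformat(), 330.0),
]

# executemany binds every tuple in `rows` to the placeholders in turn.
conn.executemany("INSERT INTO prediction_results VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT COUNT(*) FROM prediction_results").fetchone()[0]
```

Passing parameters as tuples rather than formatting them into the SQL string keeps the query text constant and leaves value escaping to the driver.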

traveltimes_prediction.data_processing.feature_engineering module

Describes generation of the features.

The variable features_to_extract defines the features to be created, the sources of their data, and the methods used to process them. Features are created individually for every type of data entity (feature entity). Each feature, or group of features, is described by a dict with the following keys:

  • c_name_feat - list of strings - names of the generated features (column names).
  • c_name - list of strings - names of the data columns that are used to create the features.
  • f - method - identifier of the method (the method's name) that is used to create the feature(s) in c_name_feat
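To make the key layout above concrete, here is a minimal sketch of such a definition and of a loop that applies it. Only the three keys c_name_feat, c_name, and f come from the documentation. The entity name "detector", the column names, the helper functions, and the choice to store callables (rather than method names) in f are all assumptions made for illustration.

```python
from statistics import mean


def column_mean(records, columns):
    """Return the mean of each source column, in the order given."""
    return [mean(r[c] for r in records) for c in columns]


def column_max(records, columns):
    """Return the maximum of each source column, in the order given."""
    return [max(r[c] for r in records) for c in columns]


# Hypothetical definition: one list of feature specs per feature entity.
features_to_extract = {
    "detector": [
        {
            "c_name_feat": ["speed_mean", "intensity_mean"],
            "c_name": ["speed", "intensity"],
            "f": column_mean,
        },
        {"c_name_feat": ["speed_max"], "c_name": ["speed"], "f": column_max},
    ]
}


def extract(records, specs):
    """Apply every spec's method and name the results after c_name_feat."""
    features = {}
    for spec in specs:
        values = spec["f"](records, spec["c_name"])
        features.update(zip(spec["c_name_feat"], values))
    return features


records = [{"speed": 80, "intensity": 12}, {"speed": 100, "intensity": 18}]
features = extract(records, features_to_extract["detector"])
```

Keeping the source columns, output names, and processing method together in one dict means a new feature can be added by appending a spec, without touching the extraction loop.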

Module contents