test_suite.test_framework package

Submodules

test_suite.test_framework.test_framework module

class test_suite.test_framework.test_framework.TestFramework(X_path=None, Y_path=None)[source]

Bases: object

Class for testing the system, models, and features.

cluster_dataset(algorithm, params, cv_ratio, portion)[source]

Method for clustering of the dataset.

Parameters:
  • algorithm (class) – algorithm to use
  • params (dict) – hyperparameters for the algorithm
  • cv_ratio (float) – cross-validation ratio; should be 1.0
  • portion (float) – portion of the data to take; the most recent data are taken first.
Returns:

tuple of three np.arrays - estimated labels, X, Y
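
As a rough illustration of the call shape, the sketch below assumes that algorithm is a scikit-learn style clustering class that accepts params as keyword arguments and exposes fit_predict. The helper cluster_dataset_sketch and the toy data are hypothetical; the real method presumably loads X and Y itself and applies cv_ratio and portion, which this sketch leaves out.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_dataset_sketch(algorithm, params, X, Y):
    """Hypothetical stand-in: fit a clustering class and return (labels, X, Y)."""
    model = algorithm(**params)      # instantiate the class with its hyperparameters
    labels = model.fit_predict(X)    # one cluster label per row of X
    return labels, X, Y


# Toy data (assumption): two well-separated blobs of 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(10.0, 0.1, size=(10, 2)),
])
Y = np.arange(20, dtype=float)

labels, X_out, Y_out = cluster_dataset_sketch(
    KMeans, {"n_clusters": 2, "n_init": 10, "random_state": 0}, X, Y
)
```

The three returned arrays mirror the documented return value: estimated labels, X, Y.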

static convert_func(XY_data_path, section)[source]

est_better_than_bck(bck, est, ref)[source]

Function for comparing the quality of predictions: the currently used model (bck) versus the value estimated using ML (est).

Parameters:
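  • bck (pd.Series) – predictions of the currently used model
  • est (pd.Series) – values estimated using ML
  • ref (pd.Series) – reference values against which both are compared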
Returns:

boolean - True if the error of the estimated travel times is lower.
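
The comparison can be pictured with the minimal sketch below. The error metric actually used by this method is not documented here; the sketch assumes mean absolute percentage error (the default loss_function of regressor_evaluation_parallel). The helper names mape and est_better_than_bck_sketch and the sample values are hypothetical.

```python
import pandas as pd


def mape(pred: pd.Series, ref: pd.Series) -> float:
    """Mean absolute percentage error of pred against ref (assumed metric)."""
    return float(((pred - ref).abs() / ref.abs()).mean() * 100.0)


def est_better_than_bck_sketch(bck: pd.Series, est: pd.Series, ref: pd.Series) -> bool:
    """Hypothetical stand-in: True if est has a lower error than bck."""
    return mape(est, ref) < mape(bck, ref)


# Illustrative travel times in seconds (made-up values).
ref = pd.Series([100.0, 120.0, 90.0])   # reference travel times
bck = pd.Series([110.0, 100.0, 95.0])   # currently used model
est = pd.Series([102.0, 118.0, 91.0])   # ML estimate

result = est_better_than_bck_sketch(bck, est, ref)
```

With these sample values the ML estimate is closer to the reference on every row, so result is True.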

features_evaluation(X, Y)[source]

Function for measuring the importance of the features. Uses RandomizedLasso (as a stability measure), ridge regression, random forests, and recursive feature elimination with SVR (linear kernel).

Parameters:
  • X (pandas.DataFrame) – DataFrame with features
  • Y (pandas.DataFrame) – DataFrame with results (travel times)
Returns:

pandas.DataFrame
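
A minimal sketch of how such a per-feature table could be assembled with scikit-learn is given below. It is an assumption-laden illustration, not the method's actual code: RandomizedLasso is omitted because it is not available in recent scikit-learn releases, and the column names, hyperparameters, feature names, and the helper features_evaluation_sketch are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge
from sklearn.svm import SVR


def features_evaluation_sketch(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Hypothetical stand-in: one row per feature, one column per method."""
    ridge = Ridge(alpha=1.0).fit(X, y)
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    # RFE drops the weakest feature each round; rank 1 is the last one kept.
    rfe = RFE(SVR(kernel="linear"), n_features_to_select=1).fit(X, y)
    return pd.DataFrame(
        {
            "ridge_abs_coef": np.abs(ridge.coef_),
            "forest_importance": forest.feature_importances_,
            "rfe_rank": rfe.ranking_,
        },
        index=X.columns,
    )


# Toy data (assumption): the target depends mostly on the "signal" column.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(80, 3)), columns=["signal", "weak", "noise"])
y = 5.0 * X["signal"] + 0.5 * X["weak"] + rng.normal(scale=0.1, size=80)

scores = features_evaluation_sketch(X, y)
```

Reading the columns side by side shows where the methods agree on a feature and where they do not, which is the reason for combining several of them.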

get_subset(portion, cv_ratio, bck, valid_ratio=0.5)[source]

Method for retrieving the subset of data.

Parameters:
  • portion (float) – Portion of data to be retrieved (the most recent data are taken first, the oldest last).
  • cv_ratio (float) – Cross-validation ratio, how much data is selected for training.
  • bck (boolean) – whether to include the bck part in the output
  • valid_ratio (float) – Split of the data between testing and validation
Returns:

tuple
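
One way to read how portion, cv_ratio and valid_ratio interact is sketched below. It assumes the rows are ordered from oldest to newest, that the training part comes first, and that validation is carved out of the remaining test part. The order of the pieces, the rounding, and the helper get_subset_sketch are assumptions; the bck flag is left out.

```python
def get_subset_sketch(rows, portion, cv_ratio, valid_ratio=0.5):
    """Hypothetical stand-in: split the most recent rows into train/valid/test."""
    # Keep only the newest `portion` of the rows (rows are oldest-first).
    n_take = round(len(rows) * portion)
    recent = rows[len(rows) - n_take:]

    # cv_ratio is the share of the kept rows used for training.
    n_train = round(len(recent) * cv_ratio)
    train, held_out = recent[:n_train], recent[n_train:]

    # valid_ratio is the share of the held-out rows used for validation.
    n_valid = round(len(held_out) * valid_ratio)
    valid, test = held_out[:n_valid], held_out[n_valid:]
    return train, valid, test


rows = list(range(20))  # 0 is the oldest row, 19 the newest
train, valid, test = get_subset_sketch(rows, portion=0.5, cv_ratio=0.6)
```

Here the newest ten rows are kept, six of them go to training, and the remaining four are split evenly between validation and testing.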

regressor_evaluation_parallel(algorithm, param_grid, loss_function=<function mean_absolute_percentage_error>, parallel=True)[source]

Proxy for _regressor_evaluation(), runs in parallel. Saves the results to csv.

regressor_evaluation_sklearn(algorithm, param_grid, loss_function, X, Y)[source]

Method using GridSearch for hyperparameters implemented in sklearn.

Parameters:
  • algorithm (class) – prediction algorithm, model...
  • param_grid (list) – list of dicts - params to try
  • loss_function (function) – loss function to use
  • X (numpy.ndarray) – array of features
  • Y (numpy.ndarray) – array of ground truth
Returns:

dict - the best classifier's hyperparameters
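
The sketch below shows one plausible way to wire these arguments into scikit-learn's GridSearchCV. It assumes that algorithm is an estimator class with a no-argument constructor and that loss_function is a loss (lower is better), so it is wrapped with make_scorer(..., greater_is_better=False). The helper grid_search_sketch, the 3-fold setting, the locally defined loss, and the toy data are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV


def mean_absolute_percentage_error(y_true, y_pred):
    """Local MAPE in percent; assumes y_true contains no zeros."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def grid_search_sketch(algorithm, param_grid, loss_function, X, Y):
    """Hypothetical stand-in: return the best hyperparameters as a dict."""
    # A loss must be minimised, so the scorer negates it internally.
    scorer = make_scorer(loss_function, greater_is_better=False)
    search = GridSearchCV(algorithm(), param_grid, scoring=scorer, cv=3)
    search.fit(X, Y)
    return search.best_params_


# Toy regression data (assumption): targets stay well away from zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
Y = 50.0 + X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=60)

best = grid_search_sketch(
    Ridge, [{"alpha": [0.01, 1.0, 100.0]}], mean_absolute_percentage_error, X, Y
)
```

The param_grid argument is a list of dicts, as documented above, so several independent grids can be tried in one call.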

select_features(list_of_features)[source]

Method for selection of the features.

Parameters:
  • list_of_features (list) –
Returns:

list

train_predict_classifier(algorithm, params, cv_ratio, portion, valid_ratio=0.5)[source]

Method for training and predicting using classifier.

Parameters:
  • algorithm (class) – classifier
  • params (dict) – hyperparameters of classifier
  • cv_ratio (float) – validation setting
  • portion (float) – portion of data to be taken; the most recent data are always included.
  • valid_ratio (float) – portion of data to be used for validation, taken from the test portion
Returns:

tuple - bck travel times (tt), prediction of tt using ‘algorithm’ with ‘params’, true value of tt, time

Module contents