test_suite.test_framework package
Submodules
test_suite.test_framework.test_framework module
-
class test_suite.test_framework.test_framework.TestFramework(X_path=None, Y_path=None)
Bases: object
Class for testing the system, models, and features.
-
cluster_dataset(algorithm, params, cv_ratio, portion)
Method for clustering the dataset.
Parameters: - algorithm (class) – algorithm to use
- params (dict) – hyperparameters for the algorithm
- cv_ratio (float) – should be 1.0
- portion (float) – portion of the data to take; the most recent data are taken first.
Returns: tuple of three np.arrays: estimated labels, X, Y
-
est_better_than_bck(bck, est, ref)
Function for comparing the quality of predictions: the currently used model (bck) against the value estimated using ML (est).
Parameters: - bck (pd.Series) – predictions of the currently used model
- est (pd.Series) – values estimated using ML
- ref (pd.Series) – reference values against which both errors are measured
Returns: boolean - True if the error of the estimated travel times is lower.
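The comparison can be illustrated with a minimal sketch. The documentation does not say which error metric est_better_than_bck uses; the sketch assumes mean absolute percentage error, since that is the default loss function elsewhere in this class. The function name est_better_than_bck_sketch and the sample values are hypothetical.

```python
import pandas as pd


def mean_absolute_percentage_error(ref: pd.Series, pred: pd.Series) -> float:
    """MAPE in percent; assumes `ref` contains no zeros."""
    return float(((ref - pred).abs() / ref.abs()).mean() * 100.0)


def est_better_than_bck_sketch(bck: pd.Series, est: pd.Series, ref: pd.Series) -> bool:
    """Hypothetical stand-in: True if `est` has a lower MAPE than `bck`.

    The metric is an assumption; the real method may use a different error.
    """
    return (mean_absolute_percentage_error(ref, est)
            < mean_absolute_percentage_error(ref, bck))


# Illustrative travel times only, not data from the project.
ref = pd.Series([100.0, 200.0, 400.0])  # reference travel times
bck = pd.Series([120.0, 180.0, 440.0])  # currently used model
est = pd.Series([105.0, 190.0, 410.0])  # ML estimate

result = est_better_than_bck_sketch(bck, est, ref)
```

In this example the ML estimate is closer to the reference at every point, so result is True.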
-
features_evaluation(X, Y)
Function for measuring the importance of the features. Uses RandomizedLasso as a stability measure, RidgeRegression, RandomForests, and Recursive Feature Elimination with SVR (linear kernel).
Parameters: - X (pandas.DataFrame) – DataFrame with features
- Y (pandas.DataFrame) – DataFrame with results (travel times)
Returns: pandas.DataFrame
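A rough sketch of how several importance measures can be collected into one DataFrame is given below. It is not the project's implementation. It covers only ridge regression, random forests, and recursive feature elimination with a linear-kernel SVR, and leaves out RandomizedLasso, which is not available in recent scikit-learn releases. The function name, column names, hyperparameters, and synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge
from sklearn.svm import SVR


def features_evaluation_sketch(X: pd.DataFrame, Y: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: one row per feature, one column per measure."""
    y = np.ravel(Y)  # assumes Y holds a single target column
    ridge = Ridge(alpha=1.0).fit(X, y)
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    # RFE drops the weakest feature each round; rank 1 is the last one kept.
    rfe = RFE(SVR(kernel="linear"), n_features_to_select=1).fit(X, y)
    return pd.DataFrame(
        {
            "ridge_abs_coef": np.abs(ridge.coef_),
            "forest_importance": forest.feature_importances_,
            "rfe_rank": rfe.ranking_,
        },
        index=X.columns,
    )


# Synthetic data: feature "a" dominates, "c" is pure noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(80, 3)), columns=["a", "b", "c"])
Y = pd.DataFrame({"tt": 5.0 * X["a"] + 0.5 * X["b"] + rng.normal(scale=0.1, size=80)})

scores = features_evaluation_sketch(X, Y)
```

The measures live on different scales (coefficient magnitude, impurity-based importance, elimination rank), so they are better compared by the ordering they give than by raw value.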
-
get_subset(portion, cv_ratio, bck, valid_ratio=0.5)
Method for retrieving a subset of the data.
Parameters: - portion (float) – portion of the data to be retrieved (the most recent data are taken first, the oldest last).
- cv_ratio (float) – cross-validation ratio: how much of the data is selected for training.
- bck (boolean) – whether to output the bck part
- valid_ratio (float) – split of the remaining data between testing and validation
Returns: tuple
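How the three ratios interact can be sketched as follows. This is one reading of the parameter descriptions, not the actual implementation. It assumes the rows are ordered oldest to newest, that the training part comes first, and that the validation part precedes the test part. The bck output is omitted, and get_subset_sketch is a hypothetical name.

```python
import numpy as np


def get_subset_sketch(X, Y, portion, cv_ratio, valid_ratio=0.5):
    """Hypothetical stand-in: returns (train, valid, test) pairs of (X, Y)."""
    n_total = len(X)
    n_take = int(round(n_total * portion))
    # Keep only the most recent `portion` of the rows (assumed to be last).
    X_recent, Y_recent = X[n_total - n_take:], Y[n_total - n_take:]

    n_train = int(round(n_take * cv_ratio))
    # The held-out rows are split again; `valid_ratio` goes to validation.
    n_valid = int(round((n_take - n_train) * valid_ratio))

    train = (X_recent[:n_train], Y_recent[:n_train])
    valid = (X_recent[n_train:n_train + n_valid],
             Y_recent[n_train:n_train + n_valid])
    test = (X_recent[n_train + n_valid:], Y_recent[n_train + n_valid:])
    return train, valid, test


X = np.arange(20).reshape(10, 2)  # 10 rows, oldest first
Y = np.arange(10)

train, valid, test = get_subset_sketch(X, Y, portion=0.8, cv_ratio=0.5)
```

With 10 rows, portion=0.8 keeps the 8 newest rows, cv_ratio=0.5 assigns 4 of them to training, and the default valid_ratio=0.5 splits the remaining 4 into 2 validation and 2 test rows.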
-
regressor_evaluation_parallel(algorithm, param_grid, loss_function=<function mean_absolute_percentage_error>, parallel=True)
Proxy for _regressor_evaluation(); runs in parallel. Saves the results to CSV.
-
regressor_evaluation_sklearn(algorithm, param_grid, loss_function, X, Y)
Method that searches for hyperparameters using the grid search implemented in sklearn.
Parameters: - algorithm (class) – prediction algorithm (model)
- param_grid (list) – list of dicts with the parameters to try
- loss_function (function) – loss function to use
- X (numpy.ndarray) – array of features
- Y (numpy.ndarray) – array of ground truth
Returns: dict - hyperparameters of the best classifier
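A grid search with a custom loss function can be sketched with scikit-learn's GridSearchCV and make_scorer. This is an illustration under assumptions, not the method's actual body: the function name grid_search_sketch, the 3-fold cross-validation, the Ridge model, and the synthetic data are hypothetical, and mean_absolute_percentage_error is redefined locally so the example is self-contained.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV


def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent; assumes `y_true` contains no zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def grid_search_sketch(algorithm, param_grid, loss_function, X, Y):
    """Hypothetical stand-in: returns the best hyperparameters as a dict."""
    # A loss is "lower is better", so the scorer has to negate it.
    scorer = make_scorer(loss_function, greater_is_better=False)
    search = GridSearchCV(algorithm(), param_grid, scoring=scorer, cv=3)
    search.fit(X, Y)
    return search.best_params_


# Synthetic regression data with strictly positive targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
Y = 50.0 + X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=60)

best_params = grid_search_sketch(
    Ridge, [{"alpha": [0.01, 1.0, 100.0]}], mean_absolute_percentage_error, X, Y
)
```

Passing the model class rather than an instance mirrors the documented algorithm (class) parameter; the class is instantiated with its defaults and the grid supplies the values to try.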
-
select_features(list_of_features)
Method for selecting the features.
Parameters: list_of_features (list) –
Returns: list
-
train_predict_classifier(algorithm, params, cv_ratio, portion, valid_ratio=0.5)
Method for training a classifier and predicting with it.
Parameters: - algorithm (class) – classifier
- params (dict) – hyperparameters of the classifier
- cv_ratio (float) – validation setting
- portion (float) – portion of the data to be taken; the most recent data are always included.
- valid_ratio (float) – portion of the data to be used for validation, taken from the test portion
Returns: tuple - bck tt, prediction of tt using 'algorithm' with 'params', true value of tt, time
-