Trainers

Hyperopt trainers

class modelgym.trainers.hyperopt_trainer.HyperoptTrainer(model_spaces, algo=None, tracker=None)

Bases: modelgym.trainers.trainer.Trainer

HyperoptTrainer is a class for hyperparameter optimization of models, based on the hyperopt library.

Parameters:
  • model_spaces (list of modelgym.models.Model or modelgym.utils.ModelSpaces) – list of model spaces (model classes and the parameter spaces to search). If a list item is a Model, it is converted to a ModelSpace with the default space and a name equal to the model class __name__
  • algo (function, e.g. hyperopt.rand.suggest or hyperopt.tpe.suggest) – algorithm to use for optimization
  • tracker (modelgym.trackers.Tracker, optional) – tracker to save (and load, if there was any) optimization progress.
Raises:

ValueError if several model_spaces share the same name
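
The name check described above could look roughly like the following sketch. It is not modelgym's implementation: the helpers find_duplicate_names and check_unique_names and the sample names are hypothetical, and it assumes that "similar" names means identical names.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def check_unique_names(names):
    """Raise ValueError if any name is used by more than one model space."""
    duplicates = find_duplicate_names(names)
    if duplicates:
        raise ValueError("duplicate model space names: " + ", ".join(duplicates))


# Hypothetical names; by default a name is the model class __name__.
check_unique_names(["XGBClassifier", "LGBMClassifier"])  # passes silently
```

If the same model class is passed twice, giving each entry its own ModelSpace with a distinct name would avoid the clash.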

crossval_optimize_params(opt_metric, dataset, cv=3, opt_evals=50, metrics=None, verbose=False, batch_size=10, client=None, **kwargs)

Find optimal hyperparameters for all models

Parameters:
  • opt_metric (modelgym.metrics.Metric) – metric to optimize
  • dataset (modelgym.utils.XYCDataset or None) – dataset
  • cv (int or list of tuples of (XYCDataset, XYCDataset)) – if int, the number of cross-validation folds; otherwise, the cross-validation folds themselves.
  • opt_evals (int) – number of cross-validation evaluations
  • metrics (list of modelgym.metrics.Metric, optional) – additional metrics to evaluate
  • verbose (bool) – Enable verbose output.
  • batch_size (int) – periodicity of saving results to tracker
  • client
  • **kwargs – ignored

Note

If cv is an int, then the dataset is split into cv parts for cross-validation. Otherwise, the folds given in cv are used.
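
The two forms of cv can be illustrated with a small sketch. It uses plain lists in place of XYCDataset, and the helper resolve_folds is hypothetical. The contiguous, unshuffled split is an assumption made for illustration; modelgym's actual splitting strategy may differ.

```python
def resolve_folds(samples, cv):
    """Return a list of (train, test) pairs.

    If cv is an int, split samples into cv contiguous parts and use each
    part once as the test set. Otherwise treat cv as ready-made folds.
    """
    if not isinstance(cv, int):
        return list(cv)
    if cv < 2 or cv > len(samples):
        raise ValueError("cv must be between 2 and len(samples)")
    base, extra = divmod(len(samples), cv)
    folds, start = [], 0
    for i in range(cv):
        # The first `extra` parts get one additional sample each.
        stop = start + base + (1 if i < extra else 0)
        folds.append((samples[:start] + samples[stop:], samples[start:stop]))
        start = stop
    return folds


data = list(range(7))
folds = resolve_folds(data, 3)
for train, test in folds:
    print(train, test)
```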

get_best_results()

When training is complete, return best parameters (and additional information) for each model space

Returns: dict of shape:
{
    name (str): {
        "result": {
            "loss": float,
            "loss_variance": float,
            "status": "ok",
            "metric_cv_results": list,
            "params": dict
        },
        "model_space": modelgym.utils.ModelSpace
    }
}

name is the name of the corresponding model_space,

metric_cv_results contains dicts mapping metric names to calculated metric values, one dict per cross-validation fold,

params are the optimal parameters of the corresponding model,

model_space is the corresponding model_space.
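
As an illustration of working with this structure, the sketch below picks the model space with the lowest loss. The dict is a hand-written stand-in with made-up values, not real output; the names ModelA, ModelB, the metric name, and the helper pick_best are hypothetical. It assumes that a lower loss is better.

```python
# Hand-written stand-in for the dict returned by get_best_results().
# All values are made up; "model_space" would be a ModelSpace object.
best_results = {
    "ModelA": {
        "result": {
            "loss": 0.21,
            "loss_variance": 0.004,
            "status": "ok",
            "metric_cv_results": [{"accuracy": 0.80}, {"accuracy": 0.78}],
            "params": {"max_depth": 4},
        },
        "model_space": "<ModelSpace ModelA>",
    },
    "ModelB": {
        "result": {
            "loss": 0.18,
            "loss_variance": 0.002,
            "status": "ok",
            "metric_cv_results": [{"accuracy": 0.83}, {"accuracy": 0.81}],
            "params": {"learning_rate": 0.1},
        },
        "model_space": "<ModelSpace ModelB>",
    },
}


def pick_best(results):
    """Return (name, params) of the model space with the lowest loss."""
    name = min(results, key=lambda n: results[n]["result"]["loss"])
    return name, results[name]["result"]["params"]


best_name, best_params = pick_best(best_results)
print(best_name, best_params)
```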

class modelgym.trainers.hyperopt_trainer.RandomTrainer(model_spaces, tracker=None)

Bases: modelgym.trainers.hyperopt_trainer.HyperoptTrainer

RandomTrainer is a HyperoptTrainer using random search
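
For readers unfamiliar with random search, here is a minimal, library-free sketch of the idea. It does not use hyperopt and is not how RandomTrainer is implemented; the function random_search, the uniform (low, high) space format, and the toy objective are all assumptions made for illustration.

```python
import random


def random_search(objective, space, n_evals, seed=0):
    """Sample n_evals points uniformly from space and keep the best one.

    space maps a parameter name to a (low, high) range of floats.
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_evals):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Toy objective with its minimum at x = 2.
best_params, best_loss = random_search(
    lambda p: (p["x"] - 2.0) ** 2, {"x": (-5.0, 5.0)}, n_evals=50
)
print(best_params, best_loss)
```

Each evaluation is independent of the previous ones, which is the main difference from model-based methods such as TPE.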

class modelgym.trainers.hyperopt_trainer.TpeTrainer(model_spaces, tracker=None)

Bases: modelgym.trainers.hyperopt_trainer.HyperoptTrainer

TpeTrainer is a HyperoptTrainer using the Tree-structured Parzen Estimator (TPE) algorithm

Skopt trainers

class modelgym.trainers.skopt_trainer.GPTrainer(model_spaces, tracker=None)

Bases: modelgym.trainers.skopt_trainer.SkoptTrainer

GPTrainer is a SkoptTrainer that uses Bayesian optimization with Gaussian processes.

class modelgym.trainers.skopt_trainer.RFTrainer(model_spaces, tracker=None)

Bases: modelgym.trainers.skopt_trainer.SkoptTrainer

RFTrainer is a SkoptTrainer that uses sequential optimization with decision trees.

class modelgym.trainers.skopt_trainer.SkoptTrainer(model_spaces, optimizer, tracker=None)

Bases: modelgym.trainers.trainer.Trainer

SkoptTrainer is a class for hyperparameter optimization of models, based on the skopt library.

Parameters:
  • model_spaces (list of modelgym.models.Model or modelgym.utils.ModelSpaces) – list of model spaces (model classes and the parameter spaces to search). If a list item is a Model, it is converted to a ModelSpace with the default space and a name equal to the model class __name__
  • optimizer (function, e.g. forest_minimize or gp_minimize) – skopt minimization function to use for optimization
  • tracker (modelgym.trackers.Tracker, optional) – ignored
Raises:

ValueError if several model_spaces share the same name

crossval_optimize_params(opt_metric, dataset, cv=3, opt_evals=50, metrics=None, verbose=False, **kwargs)

Find optimal hyperparameters for all models

Parameters:
  • opt_metric (modelgym.metrics.Metric) – metric to optimize
  • dataset (modelgym.utils.XYCDataset or None) – dataset
  • cv (int or list of tuples of (XYCDataset, XYCDataset)) – if int, the number of cross-validation folds; otherwise, the cross-validation folds themselves.
  • opt_evals (int) – number of cross-validation evaluations
  • metrics (list of modelgym.metrics.Metric, optional) – additional metrics to evaluate
  • verbose (bool) – Enable verbose output.
  • **kwargs – ignored

Note

If cv is an int, then the dataset is split into cv parts for cross-validation. Otherwise, the folds given in cv are used.

get_best_results()

When training is complete, return best parameters (and additional information) for each model space

Returns: dict of shape:
{
    name (str): {
        "result": {
            "loss": float,
            "metric_cv_results": list,
            "params": dict
        },
        "model_space": modelgym.utils.ModelSpace
    }
}

name is the name of the corresponding model_space,

metric_cv_results contains dicts mapping metric names to calculated metric values, one dict per cross-validation fold,

params are the optimal parameters of the corresponding model,

model_space is the corresponding model_space.
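
Because metric_cv_results holds one dict per fold, a per-metric average is often the first thing to compute from it. The sketch below shows one way to do that; the helper mean_metrics, the metric names, and the values are made up for illustration, and it assumes every fold reports the same set of metrics.

```python
from statistics import mean


def mean_metrics(metric_cv_results):
    """Average each metric over the folds."""
    names = metric_cv_results[0].keys()
    return {name: mean(fold[name] for fold in metric_cv_results) for name in names}


# Made-up per-fold values in the documented list-of-dicts layout.
cv_results = [
    {"accuracy": 0.5, "roc_auc": 0.75},
    {"accuracy": 1.0, "roc_auc": 0.25},
]
averaged = mean_metrics(cv_results)
print(averaged)
```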