Models

To use our Trainer, wrap your model in a class that implements the Model interface described below.

We implement wrappers for several models:
  • XGBoost
  • LightGBM
  • RandomForestClassifier
  • CatBoost

We also implement an Ensemble Model.

Model interface

class modelgym.models.model.Model(params=None)

Model is the base class for wrappers around specific ML algorithm implementations. It defines an algorithm-specific hyperparameter space and generic methods for model training and inference.

Parameters: params (dict or None) – parameters for the model.
fit(dataset, weights=None)
Parameters:
  • dataset (modelgym.utils.XYCDataset) – the training data; its features have shape (n_samples, n_features) and its targets dataset.y have shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
Returns: self

static get_default_parameter_space()
Returns: default parameter space
Return type: dict mapping parameter names to hyperopt distributions
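To make the return type concrete: a tuner draws one value per parameter name from the distribution stored under that name. The stdlib sketch below imitates two hyperopt expressions (`hp.loguniform`, which returns exp(uniform(low, high)), and `hp.quniform`, which returns round(uniform(low, high) / q) * q) with plain callables so that it runs without hyperopt. The parameter names, the ranges, and the `sample_params` helper are illustrative assumptions, not the defaults of any wrapper.

```python
import math
import random


def loguniform(low, high):
    # Mimics hyperopt's hp.loguniform: exp(uniform(low, high)).
    return lambda rng: math.exp(rng.uniform(low, high))


def quniform(low, high, q):
    # Mimics hyperopt's hp.quniform: round(uniform(low, high) / q) * q.
    return lambda rng: round(rng.uniform(low, high) / q) * q


# Hypothetical parameter space: parameter name -> sampler.
space = {
    "learning_rate": loguniform(math.log(1e-3), math.log(0.3)),
    "max_depth": quniform(2, 10, 1),
}


def sample_params(space, rng):
    """Draw one configuration, as a random-search tuner would."""
    return {name: draw(rng) for name, draw in space.items()}
```

Calling `sample_params(space, random.Random(0))` yields one candidate configuration. A tuner repeats this draw, fits a model with each candidate, and keeps the configuration with the best score.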
static get_learning_task()
Returns: task
Return type: modelgym.models.LearningTask
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename)

Loads the model from the serializable internal model state snapshot saved in filename.

predict(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data; dataset.y may be None
Returns: predictions
Return type: np.array, shape (n_samples, )
predict_proba(X)
Parameters: X (np.array, shape (n_samples, n_features)) – the input data
Returns: predicted probabilities
Return type: np.array, shape (n_samples, n_classes)
save_snapshot(filename)
Returns: serializable internal model state snapshot.
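The following self-contained sketch illustrates what a wrapper has to provide. It mirrors the method names of the interface above but does not import modelgym. `SimpleDataset` is a hypothetical stand-in for `modelgym.utils.XYCDataset`, `MajorityClassModel` is a toy classifier that always predicts the heaviest training label, and plain lists replace np.array. The pickle-based snapshot format is an assumption, not modelgym's own.

```python
import pickle
from collections import Counter
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SimpleDataset:
    """Hypothetical stand-in for modelgym.utils.XYCDataset: features X, targets y."""
    X: List[List[float]]
    y: Optional[List[Any]] = None  # may be None at prediction time


class MajorityClassModel:
    """Toy classifier that mirrors the Model interface's method names."""

    def __init__(self, params=None):
        # This toy has no real hyperparameters; None falls back to an empty dict.
        self.params = params if params is not None else {}
        self.classes: List[Any] = []
        self.counts: dict = {}

    @staticmethod
    def get_default_parameter_space():
        # A real wrapper would map parameter names to hyperopt distributions.
        return {}

    def fit(self, dataset, weights=None):
        # Sum the sample weights per label (weight 1.0 per sample when weights is None).
        if weights is None:
            weights = [1.0] * len(dataset.y)
        counts = Counter()
        for label, w in zip(dataset.y, weights):
            counts[label] += w
        self.classes = sorted(counts)
        self.counts = dict(counts)
        return self

    def is_possible_predict_proba(self):
        return True

    def predict_proba(self, dataset):
        # The same class distribution for every row: shape (n_samples, n_classes).
        total = sum(self.counts.values())
        row = [self.counts[c] / total for c in self.classes]
        return [list(row) for _ in dataset.X]

    def predict(self, dataset):
        # The heaviest class for every row: shape (n_samples, ).
        best = max(self.classes, key=self.counts.__getitem__)
        return [best for _ in dataset.X]

    def save_snapshot(self, filename):
        # Assumed snapshot format: a pickled dict of the fitted state.
        state = {"params": self.params, "classes": self.classes, "counts": self.counts}
        with open(filename, "wb") as f:
            pickle.dump(state, f)
        return state

    @staticmethod
    def load_from_snapshot(filename):
        with open(filename, "rb") as f:
            state = pickle.load(f)
        model = MajorityClassModel(state["params"])
        model.classes = state["classes"]
        model.counts = state["counts"]
        return model
```

Two design points are worth copying into a real wrapper. First, fit returns self, so construction and training can be chained. Second, load_from_snapshot is a static method that builds a fresh instance, so no untrained model is needed to restore a saved one.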

XGBoost

class modelgym.models.xgboost_model.XGBClassifier(params=None)

Bases: modelgym.models.model.Model

Parameters: params (dict or None) – parameters for the model. If None, the default parameters are used.
fit(dataset, weights=None)
Parameters:
  • dataset (modelgym.utils.XYCDataset) – the training data; its features have shape (n_samples, n_features) and its targets dataset.y have shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
Returns: self

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename)

Loads the model from the serializable internal model state snapshot saved in filename.

predict(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data; dataset.y may be None
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)
predict_proba(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data
Returns: np.array, shape (n_samples, n_classes)
save_snapshot(filename)
Returns: serializable internal model state snapshot.
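For a classifier, predict and predict_proba are normally linked: the predicted label is the class whose column holds the highest probability. It is an assumption that these wrappers compute predict exactly this way. The sketch only shows how an (n_samples, n_classes) probability array maps to (n_samples, ) labels, with plain lists standing in for np.array. `labels_from_proba` is a hypothetical helper.

```python
from typing import List, Sequence


def labels_from_proba(proba: Sequence[Sequence[float]], classes: Sequence) -> List:
    """Map each row of an (n_samples, n_classes) probability matrix to its most likely class."""
    labels = []
    for row in proba:
        if len(row) != len(classes):
            raise ValueError("each row must have one probability per class")
        # Index of the largest probability; max() keeps the first one on ties.
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best])
    return labels
```

With NumPy arrays, the same mapping is `np.asarray(classes)[np.argmax(proba, axis=1)]`. np.argmax also resolves ties in favor of the first maximum.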
class modelgym.models.xgboost_model.XGBRegressor(params=None)

Bases: modelgym.models.model.Model

Parameters:
  • params (dict or None) – parameters for the model. If None, the default parameters are used.
  • learning_task (str) – the type of task (classification, regression, …)
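The params entry says that passing None falls back to default parameters. The sketch below shows one way a wrapper could implement that: an explicit dict is used as given, and None yields a private copy of the defaults. `DEFAULT_PARAMS` and `resolve_params` are hypothetical names, and the values are chosen only for illustration. A real wrapper might instead merge user parameters into the defaults.

```python
import copy
from typing import Optional

# Hypothetical defaults, for illustration only.
DEFAULT_PARAMS = {"max_depth": 6, "eta": 0.3}


def resolve_params(params: Optional[dict]) -> dict:
    """Return the parameters a wrapper would use: the given dict, or a copy of the defaults."""
    if params is None:
        # Deep copy so that callers mutating the result cannot alter the shared defaults.
        return copy.deepcopy(DEFAULT_PARAMS)
    return dict(params)
```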
fit(dataset, weights=None)
Parameters:
  • dataset (modelgym.utils.XYCDataset) – the training data; its features have shape (n_samples, n_features) and its targets dataset.y have shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
Returns: self

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename)

Loads the model from the serializable internal model state snapshot saved in filename.

predict(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data; dataset.y may be None
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)
predict_proba(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data
Returns: np.array, shape (n_samples, n_classes)
save_snapshot(filename)
Returns: serializable internal model state snapshot.

LightGBM

class modelgym.models.lightgbm_model.LGBMClassifier(params=None)

Bases: modelgym.models.model.Model

Parameters:
  • params (dict or None) – parameters for the model. If None, the default parameters are used.
  • learning_task (str) – the type of task (classification, regression, …)
fit(dataset, weights=None)
Parameters:
  • dataset (modelgym.utils.XYCDataset) – the training data; its features have shape (n_samples, n_features) and its targets dataset.y have shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
Returns: self
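The fit contract ties weights to the targets. There is either one weight per sample, with shape (n_samples, ), or one weight per sample and output, with shape (n_samples, n_outputs). The hypothetical pre-flight check below catches mismatches before they reach the underlying library. It uses plain lists rather than np.array and is not part of modelgym.

```python
from typing import Optional, Sequence


def check_weights(y: Sequence, weights: Optional[Sequence]) -> None:
    """Raise ValueError unless weights is None or matches the sample layout of y."""
    if weights is None:
        return
    if len(weights) != len(y):
        raise ValueError(f"expected {len(y)} weights, got {len(weights)}")
    for i, (w, t) in enumerate(zip(weights, y)):
        w_is_row = isinstance(w, (list, tuple))
        t_is_row = isinstance(t, (list, tuple))
        # Per-output weights need a multi-output target row of the same length;
        # a scalar per-sample weight is accepted for either target layout.
        if w_is_row and (not t_is_row or len(w) != len(t)):
            raise ValueError(f"row {i}: weight shape does not match target shape")
```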

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename)

Loads the model from the serializable internal model state snapshot saved in filename.

predict(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data; dataset.y may be None
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)
predict_proba(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data
Returns: np.array, shape (n_samples, n_classes)
save_snapshot(filename)
Returns: serializable internal model state snapshot.
class modelgym.models.lightgbm_model.LGBMRegressor(params=None)

Bases: modelgym.models.model.Model

Parameters:
  • params (dict or None) – parameters for the model. If None, the default parameters are used.
  • learning_task (str) – the type of task (classification, regression, …)
fit(dataset, weights=None)
Parameters:
  • dataset (modelgym.utils.XYCDataset) – the training data; its features have shape (n_samples, n_features) and its targets dataset.y have shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
Returns: self

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename)

Loads the model from the serializable internal model state snapshot saved in filename.

predict(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data; dataset.y may be None
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)
predict_proba(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data
Returns: np.array, shape (n_samples, n_classes)
save_snapshot(filename)
Returns: serializable internal model state snapshot.

RandomForestClassifier

class modelgym.models.rf_model.RFClassifier(params=None)

Bases: modelgym.models.model.Model

Parameters:
  • params (dict or None) – parameters for the model. If None, the default parameters are used.
  • learning_task (str) – the type of task (classification, regression, …)
fit(dataset, weights=None)
Parameters:
  • dataset (modelgym.utils.XYCDataset) – the training data; its features have shape (n_samples, n_features) and its targets dataset.y have shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
Returns: self

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename)

Loads the model from the serializable internal model state snapshot saved in filename.

predict(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data; dataset.y may be None
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)
predict_proba(dataset)
Parameters: dataset (modelgym.utils.XYCDataset) – the input data
Returns: np.array, shape (n_samples, n_classes)
save_snapshot(filename)
Returns: serializable internal model state snapshot.

CatBoost

class modelgym.models.catboost_model.CtBClassifier(params=None)

Bases: modelgym.models.model.Model

Wrapper for CatBoostClassifier

Parameters: params (dict or None) – parameters for the model. If None, the default parameters are used.
fit(dataset, weights=None, eval_dataset=None, **kwargs)
Parameters:
  • dataset (XYCDataset) – the training data; dataset.y holds the targets, shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
  • eval_dataset (XYCDataset or None) – evaluation data, in the same format as dataset
  • kwargs – CatBoost.Pool kwargs if eval_dataset is None, or {'train': train_kwargs, 'eval': eval_kwargs} otherwise
Returns: self
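The kwargs convention for fit changes shape depending on eval_dataset. Without an evaluation set, kwargs is a flat set of CatBoost.Pool keyword arguments. With one, it is a {'train': ..., 'eval': ...} pair. The hypothetical helper below sketches how such kwargs could be split. `split_pool_kwargs` is not a modelgym function. `cat_features` in the checks is simply one example of a Pool keyword argument.

```python
from typing import Any, Dict, Optional, Tuple

Kwargs = Dict[str, Any]


def split_pool_kwargs(kwargs: Kwargs, has_eval: bool) -> Tuple[Kwargs, Optional[Kwargs]]:
    """Return (train_kwargs, eval_kwargs) following the documented fit convention."""
    if not has_eval:
        # No eval_dataset: every kwarg belongs to the training pool.
        return dict(kwargs), None
    unexpected = set(kwargs) - {"train", "eval"}
    if unexpected:
        raise ValueError(
            f"with eval_dataset, kwargs must be {{'train': ..., 'eval': ...}}; got {sorted(unexpected)}"
        )
    # Missing halves default to no extra arguments.
    return dict(kwargs.get("train", {})), dict(kwargs.get("eval", {}))
```

Each resulting dict would then be unpacked into its own pool, for example something like `Pool(data, label=labels, **train_kwargs)`.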

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename)

Loads the model from the serializable internal model state snapshot saved in filename.

predict(dataset, **kwargs)
Parameters:
  • dataset (XYCDataset) – the input data
  • kwargs – CatBoost.Pool kwargs
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset, **kwargs)
Parameters:
  • dataset (XYCDataset) – the input data
  • kwargs – CatBoost.Pool kwargs
Returns: np.array, shape (n_samples, n_classes)

save_snapshot(filename)
Returns: serializable internal model state snapshot.
class modelgym.models.catboost_model.CtBRegressor(params=None)

Bases: modelgym.models.model.Model

Wrapper for CatBoostRegressor

Parameters:
  • params (dict or None) – parameters for the model. If None, the default parameters are used.
  • learning_task (str) – the type of task (classification, regression, …)
fit(dataset, weights=None, eval_dataset=None, **kwargs)
Parameters:
  • dataset (XYCDataset) – the training data; dataset.y holds the targets, shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
  • eval_dataset (XYCDataset or None) – evaluation data, in the same format as dataset
  • kwargs – CatBoost.Pool kwargs if eval_dataset is None, or {'train': train_kwargs, 'eval': eval_kwargs} otherwise
Returns: self

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename)

Loads the model from the serializable internal model state snapshot saved in filename.

predict(dataset, **kwargs)
Parameters:
  • dataset (XYCDataset) – the input data
  • kwargs – CatBoost.Pool kwargs
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset, **kwargs)
Parameters:
  • dataset (XYCDataset) – the input data
  • kwargs – CatBoost.Pool kwargs
Returns: np.array, shape (n_samples, n_classes)

save_snapshot(filename)
Returns: serializable internal model state snapshot.

Ensemble Model

class modelgym.models.ensemble_model.EnsembleClassifier(params=None)

Bases: modelgym.models.model.Model

Parameters: params (dict) – parameters for the model.
fit(dataset, weights=None, **kwargs)
Parameters:
  • dataset (XYCDataset) – the training data; dataset.y holds the targets, shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
  • eval_dataset – evaluation data, in the same format as dataset
  • kwargs – CatBoost.Pool kwargs if eval_dataset is None, or {'train': train_kwargs, 'eval': eval_kwargs} otherwise
Returns: self

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
static get_one_hot(targets, nb_classes)
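get_one_hot(targets, nb_classes) is undocumented. Its name and signature suggest that it converts integer class labels into one-hot rows of length nb_classes. The stdlib sketch below shows that conventional encoding. Whether modelgym's method behaves identically, for example in how it treats out-of-range labels, is an assumption. `one_hot` is a hypothetical stand-in, not the library method. With NumPy, the usual idiom is `np.eye(nb_classes)[targets]`.

```python
from typing import List, Sequence


def one_hot(targets: Sequence[int], nb_classes: int) -> List[List[float]]:
    """One row per target: 1.0 in the target's column, 0.0 elsewhere."""
    rows = []
    for t in targets:
        if not 0 <= t < nb_classes:
            raise ValueError(f"label {t} is outside [0, {nb_classes})")
        row = [0.0] * nb_classes
        row[t] = 1.0
        rows.append(row)
    return rows
```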
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename, models)
Parameters: filename – prefix for the models’ files
Returns: EnsembleClassifier
predict(dataset, **kwargs)
Parameters:
  • dataset (XYCDataset) – the input data
  • kwargs – CatBoost.Pool kwargs
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset, **kwargs)
Parameters:
  • dataset (XYCDataset) – the input data
  • kwargs – CatBoost.Pool kwargs
Returns: np.array, shape (n_samples, n_classes)

save_snapshot(filename)
Parameters: filename – prefix for the models’ files
Returns: serializable internal model state snapshot.
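For the ensemble, filename is a prefix rather than a single file: each member model presumably gets its own snapshot file derived from it. The sketch below illustrates that idea. The `<prefix>_<index>` naming scheme, the JSON state format, and the helper names are all hypothetical. Passing one loader per member mirrors how load_from_snapshot also takes a models argument, though what modelgym expects in models is not specified here.

```python
import json
from typing import Any, Callable, List, Sequence


def member_paths(prefix: str, n_members: int) -> List[str]:
    """Hypothetical naming scheme: one snapshot file per member, derived from the prefix."""
    return [f"{prefix}_{i}" for i in range(n_members)]


def save_members(prefix: str, states: Sequence[dict]) -> List[str]:
    """Write each member's state to its own JSON file and return the paths used."""
    paths = member_paths(prefix, len(states))
    for path, state in zip(paths, states):
        with open(path, "w") as f:
            json.dump(state, f)
    return paths


def load_members(prefix: str, loaders: Sequence[Callable[[str], Any]]) -> List[Any]:
    """Rebuild the members in order, using one loader per member."""
    paths = member_paths(prefix, len(loaders))
    return [load(path) for load, path in zip(loaders, paths)]
```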
class modelgym.models.ensemble_model.EnsembleRegressor(params=None)

Bases: modelgym.models.model.Model

Parameters: params (dict) – parameters for the model.
fit(dataset, weights=None, **kwargs)
Parameters:
  • dataset (XYCDataset) – the training data; dataset.y holds the targets, shape (n_samples, ) or (n_samples, n_outputs)
  • weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
  • eval_dataset – evaluation data, in the same format as dataset
  • kwargs – CatBoost.Pool kwargs if eval_dataset is None, or {'train': train_kwargs, 'eval': eval_kwargs} otherwise
Returns: self

static get_default_parameter_space()
Returns: dict of DistributionWrappers
static get_learning_task()
is_possible_predict_proba()
Returns: bool, whether the model can predict class probabilities
static load_from_snapshot(filename, models)
Parameters: filename – prefix for the models’ files
Returns: EnsembleRegressor
predict(dataset, **kwargs)
Parameters:
  • dataset (XYCDataset) – the input data
  • kwargs – CatBoost.Pool kwargs
Returns: np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset, **kwargs)

Not supported: a regressor cannot predict class probabilities.
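Because the ensemble regressor cannot predict probabilities, code that handles any wrapper generically should check is_possible_predict_proba() before calling predict_proba. The sketch below uses two hypothetical toy models in place of real wrappers. The NotImplementedError raised by `ToyRegressor` and the None fallback in `proba_or_none` are choices made for this example, not documented modelgym behavior.

```python
from typing import List, Optional


class ToyClassifier:
    """Hypothetical stand-in for a classifier wrapper."""

    def is_possible_predict_proba(self) -> bool:
        return True

    def predict_proba(self, X) -> List[List[float]]:
        return [[0.5, 0.5] for _ in X]


class ToyRegressor:
    """Hypothetical stand-in for a regressor wrapper."""

    def is_possible_predict_proba(self) -> bool:
        return False

    def predict_proba(self, X):
        raise NotImplementedError("a regressor cannot predict class probabilities")


def proba_or_none(model, X) -> Optional[List[List[float]]]:
    """Call predict_proba only when the model reports that it is supported."""
    if not model.is_possible_predict_proba():
        return None
    return model.predict_proba(X)
```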

save_snapshot(filename)
Parameters: filename – prefix for the models’ files
Returns: serializable internal model state snapshot.