Models¶

To use a model with our Trainer, the model must be wrapped in the Model interface described below.

We implement wrappers for several models:

XGBoost
LightGBM
RandomForestClassifier
CatBoost

We also implement an Ensemble Model.

Model interface¶

class modelgym.models.model.Model(params=None)¶

Model is a base class for a factory of a specific ML algorithm implementation, i.e. it defines the algorithm-specific hyperparameter space and generic methods for model training and inference.

Parameters:	params (dict or None) – parameters for model.

fit(dataset, weights=None)¶

Parameters:

dataset (modelgym.utils.XYCDataset) – the training data: input data of shape (n_samples, n_features) and target data of shape (n_samples, ) or (n_samples, n_outputs)
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data

Returns:	self

static get_default_parameter_space()¶

Returns:	default parameter space
Return type:	dict from parameter name to hyperopt distribution

static get_learning_task()¶

Returns:	task
Return type:	modelgym.models.LearningTask

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename)¶

Loads a model from a serializable internal model state snapshot.

predict(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data, dataset.y may be None
Returns:	predictions
Return type:	np.array, shape (n_samples, )

predict_proba(X)¶

Parameters:	X (np.array, shape (n_samples, n_features)) – the input data
Returns:	predicted probabilities
Return type:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Returns:	serializable internal model state snapshot.
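The interface above can be sketched with a toy stand-in. The example below is only an illustration of the method contract, not modelgym code: `ToyDataset` and `MeanRegressor` are hypothetical names, the class does not subclass `modelgym.models.model.Model` (so it runs without modelgym installed), and using `pickle` for the snapshot is an assumption, since this page does not say how the real wrappers serialize their state.

```python
import os
import pickle
import tempfile
from collections import namedtuple

# Hypothetical stand-in for modelgym.utils.XYCDataset: only X and y are modelled.
ToyDataset = namedtuple("ToyDataset", ["X", "y"])


class MeanRegressor:
    """Toy model that mirrors the method names of the Model interface."""

    def __init__(self, params=None):
        self.params = params or {}
        self.mean_ = None

    def fit(self, dataset, weights=None):
        # Weighted mean of the targets; uniform weights when none are given.
        if weights is None:
            weights = [1.0] * len(dataset.y)
        total = sum(w * y for w, y in zip(weights, dataset.y))
        self.mean_ = total / sum(weights)
        return self  # fit returns self, as documented above

    def predict(self, dataset):
        # dataset.y may be None at prediction time; only X is used.
        return [self.mean_] * len(dataset.X)

    def is_possible_predict_proba(self):
        return False  # this toy regressor has no class probabilities

    def save_snapshot(self, filename):
        # Assumption: the snapshot is a picklable dict written to `filename`.
        state = {"params": self.params, "mean": self.mean_}
        with open(filename, "wb") as f:
            pickle.dump(state, f)
        return state

    @staticmethod
    def load_from_snapshot(filename):
        with open(filename, "rb") as f:
            state = pickle.load(f)
        model = MeanRegressor(state["params"])
        model.mean_ = state["mean"]
        return model


train = ToyDataset(X=[[0.0], [1.0], [2.0]], y=[1.0, 2.0, 6.0])
model = MeanRegressor().fit(train)

# Round-trip the model through a snapshot file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    model.save_snapshot(path)
    restored = MeanRegressor.load_from_snapshot(path)

predictions = restored.predict(ToyDataset(X=[[5.0], [7.0]], y=None))
```

Returning `self` from `fit` lets construction and training be chained on one line, as in `MeanRegressor().fit(train)`.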

XGBoost¶

class modelgym.models.xgboost_model.XGBClassifier(params=None)¶

Bases: modelgym.models.model.Model

Parameters:	params (dict) – parameters for model.

fit(dataset, weights=None)¶

Parameters:

dataset (modelgym.utils.XYCDataset) – the training data: input data of shape (n_samples, n_features) and target data of shape (n_samples, ) or (n_samples, n_outputs)
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data

Returns:	self

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename)¶

Loads a model from a serializable internal model state snapshot.

predict(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Returns:	serializable internal model state snapshot.

class modelgym.models.xgboost_model.XGBRegressor(params=None)¶

Bases: modelgym.models.model.Model

Parameters:

params (dict or None) – parameters for the model. If None, default params are fetched.
learning_task (str) – type of task (classification, regression, …)

fit(dataset, weights=None)¶

Parameters:

dataset (modelgym.utils.XYCDataset) – the training data: input data of shape (n_samples, n_features) and target data of shape (n_samples, ) or (n_samples, n_outputs)
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data

Returns:	self

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename)¶

Loads a model from a serializable internal model state snapshot.

predict(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Returns:	serializable internal model state snapshot.

LightGBM¶

class modelgym.models.lightgbm_model.LGBMClassifier(params=None)¶

Bases: modelgym.models.model.Model

Parameters:

params (dict or None) – parameters for the model. If None, default params are fetched.
learning_task (str) – type of task (classification, regression, …)

fit(dataset, weights=None)¶

Parameters:

dataset (modelgym.utils.XYCDataset) – the training data: input data of shape (n_samples, n_features) and target data of shape (n_samples, ) or (n_samples, n_outputs)
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data

Returns:	self

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename)¶

Loads a model from a serializable internal model state snapshot.

predict(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Returns:	serializable internal model state snapshot.

class modelgym.models.lightgbm_model.LGBMRegressor(params=None)¶

Bases: modelgym.models.model.Model

Parameters:

params (dict or None) – parameters for the model. If None, default params are fetched.
learning_task (str) – type of task (classification, regression, …)

fit(dataset, weights=None)¶

Parameters:

dataset (modelgym.utils.XYCDataset) – the training data: input data of shape (n_samples, n_features) and target data of shape (n_samples, ) or (n_samples, n_outputs)
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data

Returns:	self

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename)¶

Loads a model from a serializable internal model state snapshot.

predict(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Returns:	serializable internal model state snapshot.

RandomForestClassifier¶

class modelgym.models.rf_model.RFClassifier(params=None)¶

Bases: modelgym.models.model.Model

Parameters:

params (dict or None) – parameters for the model. If None, default params are fetched.
learning_task (str) – type of task (classification, regression, …)

fit(dataset, weights=None)¶

Parameters:

dataset (modelgym.utils.XYCDataset) – the training data: input data of shape (n_samples, n_features) and target data of shape (n_samples, ) or (n_samples, n_outputs)
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data

Returns:	self

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename)¶

Loads a model from a serializable internal model state snapshot.

predict(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset)¶

Parameters:	dataset (modelgym.utils.XYCDataset) – the input data
Returns:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Returns:	serializable internal model state snapshot.

CatBoost¶

class modelgym.models.catboost_model.CtBClassifier(params=None)¶

Bases: modelgym.models.model.Model

Wrapper for CatBoostClassifier

Parameters:	params (dict) – parameters for model.

fit(dataset, weights=None, eval_dataset=None, **kwargs)¶

Parameters:

dataset (XYCDataset) – train
y (np.array, shape (n_samples, ) or (n_samples, n_outputs)) – the target data
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
eval_dataset – same as dataset
kwargs – CatBoost.Pool kwargs if eval_dataset is None or {'train': train_kwargs, 'eval': eval_kwargs} otherwise

Returns:

self
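The two shapes that kwargs can take, described above, can be illustrated with a small routing function. This is a sketch of the convention only: `split_pool_kwargs` is a hypothetical helper, not part of modelgym, and the string passed as `eval_dataset` merely stands in for an XYCDataset. `cat_features` and `weight` are used as example CatBoost.Pool keyword arguments.

```python
def split_pool_kwargs(eval_dataset=None, **kwargs):
    """Hypothetical helper: route keyword arguments to the train and eval pools."""
    if eval_dataset is None:
        # No evaluation set: every keyword argument goes to the training pool.
        return dict(kwargs), {}
    # With an evaluation set, kwargs is expected to hold two nested dicts.
    return dict(kwargs.get("train", {})), dict(kwargs.get("eval", {}))


# Case 1: no eval_dataset, so kwargs are passed flat.
train_kwargs, eval_kwargs = split_pool_kwargs(cat_features=[0, 2])

# Case 2: eval_dataset given, so kwargs are nested under 'train' and 'eval'.
train_kwargs_2, eval_kwargs_2 = split_pool_kwargs(
    eval_dataset="placeholder",  # stands in for an XYCDataset
    train={"cat_features": [0, 2]},
    eval={"cat_features": [0, 2], "weight": [1.0, 0.5]},
)
```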

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename)¶

Loads a model from a serializable internal model state snapshot.

predict(dataset, **kwargs)¶

Parameters:

dataset (XYCDataset) – the input data
kwargs – CatBoost.Pool kwargs

Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset, **kwargs)¶

Parameters:

dataset (XYCDataset) – the input data
kwargs – CatBoost.Pool kwargs

Returns:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Returns:	serializable internal model state snapshot.

class modelgym.models.catboost_model.CtBRegressor(params=None)¶

Bases: modelgym.models.model.Model

Wrapper for CatBoostRegressor

Parameters:

params (dict or None) – parameters for the model. If None, default params are fetched.
learning_task (str) – type of task (classification, regression, …)

fit(dataset, weights=None, eval_dataset=None, **kwargs)¶

Parameters:

dataset (XYCDataset) – train
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
eval_dataset – same as dataset
kwargs – CatBoost.Pool kwargs if eval_dataset is None or {'train': train_kwargs, 'eval': eval_kwargs} otherwise

Returns:	self

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename)¶

Loads a model from a serializable internal model state snapshot.

predict(dataset, **kwargs)¶

Parameters:

dataset (XYCDataset) – the input data
kwargs – CatBoost.Pool kwargs

Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset, **kwargs)¶

Parameters:

dataset (XYCDataset) – the input data
kwargs – CatBoost.Pool kwargs

Returns:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Returns:	serializable internal model state snapshot.

Ensemble Model¶

class modelgym.models.ensemble_model.EnsembleClassifier(params=None)¶

Bases: modelgym.models.model.Model

Parameters:	params (dict) – parameters for model.

fit(dataset, weights=None, **kwargs)¶

Parameters:

dataset (XYCDataset) – train
y (np.array, shape (n_samples, ) or (n_samples, n_outputs)) – the target data
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
eval_dataset – same as dataset
kwargs – CatBoost.Pool kwargs if eval_dataset == None or {'train': train_kwargs, 'eval': eval_kwargs} otherwise

Returns:

self

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

static get_one_hot(targets, nb_classes)¶
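This method is not documented here. Its name and arguments suggest standard one-hot encoding of class labels, which might look like the sketch below. The function `one_hot` is a hypothetical stand-in written with plain lists; the real method may accept and return NumPy arrays and may handle invalid labels differently.

```python
def one_hot(targets, nb_classes):
    """Hypothetical stand-in: encode integer class labels as one-hot rows."""
    rows = []
    for label in targets:
        if not 0 <= label < nb_classes:
            raise ValueError(f"label {label} is outside [0, {nb_classes})")
        row = [0] * nb_classes
        row[label] = 1  # mark the position of the class
        rows.append(row)
    return rows


encoded = one_hot([0, 2, 1], nb_classes=3)
```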

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename, models)¶

Parameters:	filename – prefix for models’ files
Returns:	EnsembleClassifier

predict(dataset, **kwargs)¶

Parameters:

dataset (XYCDataset) – the input data
kwargs – CatBoost.Pool kwargs

Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset, **kwargs)¶

Parameters:

dataset (XYCDataset) – the input data
kwargs – CatBoost.Pool kwargs

Returns:	np.array, shape (n_samples, n_classes)

save_snapshot(filename)¶

Parameters:	filename – prefix for models’ files
Returns:	serializable internal model state snapshot.

class modelgym.models.ensemble_model.EnsembleRegressor(params=None)¶

Bases: modelgym.models.model.Model

Parameters:	params (dict) – parameters for model

fit(dataset, weights=None, **kwargs)¶

Parameters:

dataset (XYCDataset) – train
y (np.array, shape (n_samples, ) or (n_samples, n_outputs)) – the target data
weights (np.array, shape (n_samples, ) or (n_samples, n_outputs) or None) – weights of the data
eval_dataset – same as dataset
kwargs – CatBoost.Pool kwargs if eval_dataset == None or {'train': train_kwargs, 'eval': eval_kwargs} otherwise

Returns:

self

static get_default_parameter_space()¶

Returns:	dict of DistributionWrappers

static get_learning_task()¶

is_possible_predict_proba()¶

Returns:	bool, whether model can predict proba

static load_from_snapshot(filename, models)¶

Parameters:	filename – prefix for models’ files
Returns:	EnsembleRegressor

predict(dataset, **kwargs)¶

Parameters:

dataset (XYCDataset) – the input data
kwargs – CatBoost.Pool kwargs

Returns:	np.array, shape (n_samples, ) or (n_samples, n_outputs)

predict_proba(dataset, **kwargs)¶

A regressor cannot predict probabilities.
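Since regressors do not provide probabilities, calling code can consult is_possible_predict_proba() before choosing between predict_proba and predict. The sketch below uses two hypothetical stub classes rather than the real wrappers. Raising `NotImplementedError` from the regressor stub is an assumption; this page does not say how the real regressors signal the unsupported call.

```python
class StubClassifier:
    """Hypothetical classifier stub with two classes."""

    def is_possible_predict_proba(self):
        return True

    def predict(self, dataset):
        return [1 for _ in dataset]

    def predict_proba(self, dataset):
        return [[0.2, 0.8] for _ in dataset]


class StubRegressor:
    """Hypothetical regressor stub."""

    def is_possible_predict_proba(self):
        return False

    def predict(self, dataset):
        return [0.5 for _ in dataset]

    def predict_proba(self, dataset):
        # Assumption: the real wrappers may signal this differently.
        raise NotImplementedError("Regressor can't predict proba")


def best_available_output(model, dataset):
    """Return probabilities when the model supports them, else predictions."""
    if model.is_possible_predict_proba():
        return model.predict_proba(dataset)
    return model.predict(dataset)


rows = [[0.1, 0.2], [0.3, 0.4]]  # stand-in for a dataset with two samples
classifier_output = best_available_output(StubClassifier(), rows)
regressor_output = best_available_output(StubRegressor(), rows)
```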

save_snapshot(filename)¶

Parameters:	filename – prefix for models’ files
Returns:	serializable internal model state snapshot.