Basic Tutorial¶
Welcome to Modelgym Basic Tutorial.
As an example, we will show you how to use Modelgym for a binary classification problem.
In this tutorial we will go through the following steps:
- Choosing the models.
- Searching for the best hyperparameters on default spaces using the TPE algorithm locally.
- Visualizing the results.
Define models we want to use¶
In this tutorial, we will use the following models:
- LightGBMClassifier
- XGBoostClassifier
- RandomForestClassifier
- CatBoostClassifier
from modelgym.models import LGBMClassifier, XGBClassifier, RFClassifier, CtBClassifier
models = [LGBMClassifier, XGBClassifier, RFClassifier, CtBClassifier]
Get dataset¶
For tutorial purposes, we will use a toy dataset generated with scikit-learn.
from sklearn.datasets import make_classification
from modelgym.utils import XYCDataset
X, y = make_classification(n_samples=500, n_features=20, n_informative=10, n_classes=2)
dataset = XYCDataset(X, y)
Create a TPE trainer¶
from modelgym.trainers import TpeTrainer
trainer = TpeTrainer(models)
Optimize hyperparams¶
We choose accuracy as the main metric to rely on when optimizing hyperparameters.
We also keep track of RocAuc and F1 alongside accuracy for our best models.
Please keep in mind that we are now optimizing hyperparameters over the default hyperparameter space. The values found this way are not necessarily optimal; for better ones and a complete understanding, follow the advanced tutorial.
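To make concrete what tracking accuracy, ROC AUC and F1 under cross-validation involves, here is a minimal sketch that uses scikit-learn directly rather than Modelgym. The model, its settings, the number of folds and the random seeds are all assumptions chosen for illustration; this is not how Modelgym computes its scores internally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Same toy-dataset shape as in this tutorial; random_state is an added
# assumption so that the run is reproducible.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=2, random_state=0)

# One fixed configuration; a hyperparameter search would repeat this
# evaluation for many candidate configurations.
model = RandomForestClassifier(n_estimators=50, random_state=0)

metric_names = ["accuracy", "roc_auc", "f1"]
scores = cross_validate(model, X, y, cv=3, scoring=metric_names)

# Average each metric over the folds.
mean_scores = {name: scores["test_" + name].mean() for name in metric_names}
for name, value in mean_scores.items():
    print(f"{name}: {value:.3f}")
```

A search procedure such as TPE would use one of these averaged numbers (accuracy, in this tutorial) as the objective and report the others alongside it.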
from modelgym.metrics import Accuracy, RocAuc, F1
Of course, it will take some time.
%%time
trainer.crossval_optimize_params(Accuracy(), dataset, metrics=[Accuracy(), RocAuc(), F1()])
/Users/f-minkin/.pyenv/versions/3.6.2/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no predicted samples.
'precision', 'predicted', average, warn_for)
CPU times: user 2h 2min 45s, sys: 47min 59s, total: 2h 50min 45s
Wall time: 28min 17s
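The UndefinedMetricWarning in the output above is raised when a model predicts no positive samples at all, which can happen for poor hyperparameter candidates tried during the search. The sketch below reproduces that situation with scikit-learn alone. It assumes scikit-learn 0.22 or newer, where f1_score accepts the zero_division argument; the run shown in this tutorial used an older release.

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 1, 1, 0, 1])
# A degenerate classifier that never predicts the positive class.
y_pred = np.zeros_like(y_true)

# Precision is 0/0 here, so F1 is undefined; zero_division=0 makes the
# function return 0.0 explicitly instead of emitting a warning.
score = f1_score(y_true, y_pred, zero_division=0)
print(score)
```

The warning itself is harmless for the search: such candidates simply receive an F1 of 0.0.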
Report best results¶
from modelgym.report import Report
reporter = Report(trainer.get_best_results(), dataset, [Accuracy(), RocAuc(), F1()])
Report in text form¶
reporter.print_all_metric_results()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ accuracy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tuned
LGBMClassifier 0.776002 (0.00%)
XGBClassifier 0.838059 (8.00%)
RFClassifier 0.800075 (3.10%)
CtBClassifier 0.861963 (11.08%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ roc_auc ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tuned
LGBMClassifier 0.815768 (0.00%)
XGBClassifier 0.904991 (10.94%)
RFClassifier 0.875230 (7.29%)
CtBClassifier 0.926832 (13.61%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ f1_score ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tuned
LGBMClassifier 0.777157 (0.00%)
XGBClassifier 0.835813 (7.55%)
RFClassifier 0.792136 (1.93%)
CtBClassifier 0.859078 (10.54%)
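The percentages in parentheses appear to be each model's relative difference from the first row (LGBMClassifier, shown as 0.00%). This reading is inferred from the numbers above, not taken from Modelgym's documentation. A small sketch that recomputes the accuracy column under that assumption:

```python
# Accuracy values copied from the report above.
accuracy = {
    "LGBMClassifier": 0.776002,
    "XGBClassifier": 0.838059,
    "RFClassifier": 0.800075,
    "CtBClassifier": 0.861963,
}

# Assumption: the first model in the list serves as the baseline.
baseline = accuracy["LGBMClassifier"]

# Relative difference from the baseline, in percent, rounded to two decimals.
relative_gain = {
    name: round((score / baseline - 1) * 100, 2)
    for name, score in accuracy.items()
}

for name, score in accuracy.items():
    print(f"{name:<16}{score:.6f} ({relative_gain[name]:.2f}%)")
```

The recomputed values match the percentages printed in the report, which supports this interpretation.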
Report plots¶
reporter.plot_all_metrics()
(Plots not shown: one plot per metric, for accuracy, roc_auc and f1_score.)
Report heatmaps for each metric¶
reporter.plot_heatmaps()
(Heatmaps not shown: one heatmap per metric, for accuracy, roc_auc and f1_score.)
That’s it!
If you liked it, please continue with the advanced tutorial to learn about all the features Modelgym provides.