In this fourth industrial age, the Age of AI, knowledge of machine learning and its applications is essential for every business and industry. Incorporating it into your business holds a lot of potential and benefit. But not everyone has the time and energy to learn the intricacies of data science and the various ML algorithms required to benefit from machine learning.
This is where PyCaret comes into play. PyCaret is an open-source, simple-to-use Python library for doing machine learning. It is easy yet powerful in the sense that you don't need to know every ML algorithm and all the nitty-gritty before creating a production-ready model for your business.
PyCaret makes it easier for you. It acts as a wrapper around the most popular machine learning libraries such as scikit-learn, XGBoost, LightGBM, etc. It also offers a simple API of functions that you can use to build and evaluate several models without much stress.
In this tutorial we will explore PyCaret and see how to use it to predict mortality from heart failure among patients.
Installation
To install PyCaret, you can use pip as shown below:
pip install pycaret
So what can one do with PyCaret? With PyCaret you can easily do the following:
Compare Models
Create Models
Tune Models
Evaluate Models
Interpret Models
Make predictions with the Model
Save and Load Models (Model Serialization)
Deploy Models
PyCaret is low-code but powerful: you need only a little code to do all the required activities. For example, to compare different models you just call the compare_models() function and boom, you have a dataframe of several trained ML models from which you can select the one you wish.
Let us start with the basic workflow
Workflow
Prepare Data
Initialize Setup
Define the data and the target class
Compare Models
Create Model
Select the one you want
Check accuracy of a selected model (predict)
Tune Model
Evaluate Model
Interpret Model
Save Model
We will be using our heart failure dataset from UCI and will be working inside Google's Colab, but you can also try it locally on your system.
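The setup and model-comparison cells are not reproduced below, so here is a minimal sketch of those first steps. It assumes the heart failure CSV from UCI is available locally (the file name here is an assumption), that the target column is named class (as in the DataFrame shown later in this tutorial), and that PyCaret's classification module is imported as pc, the alias used by the rest of the code:
import pandas as pd
import pycaret.classification as pc

# Load the UCI heart failure dataset (file name is an assumption)
df = pd.read_csv("heart_failure_clinical_records_dataset.csv")

# Initialize the setup: define the data and the target class
pc.setup(data=df, target='class')

# Build and compare several ML models with cross-validated metrics
pc.compare_models()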
# Simple Tools to Get The Short/Abbrev for an Estimator/ML Algorithm
#!pip install neatutils
import neatutils
neatutils.get_abbrev('Extreme Gradient Boosting')
Out[24]:
'xgboost'
In [25]:
# Create the model
xgboost_model = pc.create_model('xgboost')
      Accuracy     AUC  Recall   Prec.      F1   Kappa
0       0.8095  0.8556  0.6667  0.6667  0.6667  0.5333
1       0.8095  0.8444  0.8333  0.6250  0.7143  0.5758
2       0.8095  0.8673  0.5714  0.8000  0.6667  0.5385
3       0.7619  0.8776  0.7143  0.6250  0.6667  0.4828
4       0.7143  0.8469  0.5714  0.5714  0.5714  0.3571
5       0.9524  0.9490  0.8571  1.0000  0.9231  0.8889
6       0.7619  0.9286  0.4286  0.7500  0.5455  0.4000
7       0.8571  0.9796  0.7143  0.8333  0.7692  0.6667
8       0.8095  0.9490  0.5714  0.8000  0.6667  0.5385
9       0.9000  1.0000  0.6667  1.0000  0.8000  0.7368
Mean    0.8186  0.9098  0.6595  0.7671  0.6990  0.5718
SD      0.0661  0.0552  0.1233  0.1429  0.1047  0.1497
In [26]:
# LogReg Model
logreg_model = pc.create_model('lr')
      Accuracy     AUC  Recall   Prec.      F1   Kappa
0       0.8095  0.8222  0.6667  0.6667  0.6667  0.5333
1       0.6190  0.7556  0.5000  0.3750  0.4286  0.1515
2       0.7143  0.7551  0.5714  0.5714  0.5714  0.3571
3       0.7619  0.8367  0.5714  0.6667  0.6154  0.4444
4       0.7143  0.7755  0.5714  0.5714  0.5714  0.3571
5       0.9524  0.9898  0.8571  1.0000  0.9231  0.8889
6       0.9048  0.9592  0.7143  1.0000  0.8333  0.7692
7       0.8571  0.8673  0.5714  1.0000  0.7273  0.6400
8       0.7619  0.7449  0.4286  0.7500  0.5455  0.4000
9       0.8500  0.9286  0.6667  0.8000  0.7273  0.6250
Mean    0.7945  0.8435  0.6119  0.7401  0.6610  0.5167
SD      0.0949  0.0854  0.1137  0.2018  0.1388  0.2080
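Both create_model() calls above print 10-fold cross-validation scores on the training data. To check a selected model's accuracy on the hold-out set that setup() reserved (the "check accuracy of a selected model" step in the workflow), you can call predict_model() on the trained model without passing any data, for example:
# Score the trained models on the hold-out set created by setup()
pc.predict_model(xgboost_model)
pc.predict_model(logreg_model)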
In [27]:
# Tune the Model
tuned_xgb = pc.tune_model('xgboost')

# Optimize threshold for trained model
pc.optimize_threshold(tuned_xgb, true_negative=1500, false_negative=-5000)
Optimized Probability Threshold: 0.11 | Optimized Cost Function: 45000
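The optimized threshold is not applied automatically. In later PyCaret releases (an assumption about the version you are running), predict_model() accepts a probability_threshold argument that lets you score data with the cost-optimized cut-off; a sketch:
# Sketch: apply the cost-optimized threshold at prediction time
# (assumes a PyCaret version whose predict_model() supports probability_threshold)
pc.predict_model(tuned_xgb, probability_threshold=0.11)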
In [ ]:
# Save Models
pc.save_model(tuned_xgb, 'xgb_saved_model_02072020')
Transformation Pipeline and Model Succesfully Saved
In [ ]:
# Loading the saved model
loaded_model = pc.load_model('xgb_saved_model_02072020')
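Because save_model() stores the whole transformation pipeline together with the estimator, the loaded object can be used for prediction directly; a quick sketch, assuming some unseen DataFrame new_data (a hypothetical name) with the same feature columns as the training data:
# The loaded pipeline applies the same preprocessing before predicting
pc.predict_model(loaded_model, data=new_data)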
In [38]:
# Interpret Model
pc.interpret_model(tuned_xgb)
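interpret_model() is based on SHAP and by default displays a summary plot of feature importance for the model (the plot itself is not reproduced here). Other SHAP plots can be requested through the plot argument, for example:
# SHAP correlation (dependence) plot instead of the default summary plot
pc.interpret_model(tuned_xgb, plot='correlation')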
In [70]:
# Finalize Model For Prediction
final_xgb_model = pc.finalize_model(tuned_xgb)
Making a Single Prediction with PyCaret
When making a prediction, you must supply your unseen data as a DataFrame. The predict_model() function takes the model you have built and a DataFrame of unseen data, and returns that DataFrame with two additional columns: one for the prediction label and the other for the probability score of that prediction.
In [ ]:
## Making A Simple Prediction with PyCaret
#### Create A Dataframe
#### Dictionary(columns_name:values)
In [71]:
# Method 1
df.iloc[1]
Out[71]:
age 55.00
anaemia 0.00
creatinine_phosphokinase 7861.00
diabetes 0.00
ejection_fraction 38.00
high_blood_pressure 0.00
platelets 263358.03
serum_creatinine 1.10
serum_sodium 136.00
sex 1.00
smoking 0.00
time 6.00
class 1.00
Name: 1, dtype: float64
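Note that df.iloc[1] with single brackets returns a pandas Series; predict_model() expects a DataFrame, so use double brackets to select the row as a one-row DataFrame instead: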
In [72]:
df.iloc[[1]]
Out[72]:
    age  anaemia  creatinine_phosphokinase  diabetes  ejection_fraction  high_blood_pressure  \
1  55.0        0                      7861         0                 38                    0

   platelets  serum_creatinine  serum_sodium  sex  smoking  time  class
1  263358.03               1.1           136    1        0     6      1
In [73]:
unseen_data = df.iloc[[1], :-1]
In [74]:
unseen_data
Out[74]:
    age  anaemia  creatinine_phosphokinase  diabetes  ejection_fraction  high_blood_pressure  \
1  55.0        0                      7861         0                 38                    0

   platelets  serum_creatinine  serum_sodium  sex  smoking  time
1  263358.03               1.1           136    1        0     6
In [75]:
type(unseen_data)
Out[75]:
pandas.core.frame.DataFrame
In [76]:
# Predict with Model
prediction = pc.predict_model(final_xgb_model, data=unseen_data)
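A second way to build the unseen data, hinted at in the comment cell above ("Dictionary(columns_name:values)"), is to construct the one-row DataFrame from a dictionary. Below is a sketch with hypothetical patient values (every number here is made up for illustration); in the PyCaret versions of that era the appended columns are named Label and Score, so check your version's output if they differ:
import pandas as pd

# Method 2: build a single-row DataFrame from a dictionary (hypothetical values)
single_patient = pd.DataFrame({
    'age': [60.0],
    'anaemia': [0],
    'creatinine_phosphokinase': [250],
    'diabetes': [1],
    'ejection_fraction': [38],
    'high_blood_pressure': [0],
    'platelets': [262000.0],
    'serum_creatinine': [1.1],
    'serum_sodium': [137],
    'sex': [1],
    'smoking': [0],
    'time': [115],
})

prediction = pc.predict_model(final_xgb_model, data=single_patient)
print(prediction[['Label', 'Score']])  # prediction label and its probability score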