In this tutorial we will be building a simple ML app with Streamlit, an awesome framework for turning machine learning models into shareable web apps.
We will be using the Car Evaluation dataset from the UCI Machine Learning Repository. Let us see the basic workflow we will be using for this simple project.
- Building the ML Model
- Interpreting the Model
- Building the ML App with Streamlit
Let us start.
Building the Machine Learning Model
First of all we will get the dataset from UCI and do some data pre-processing.
# Load EDA Pkgs
import pandas as pd
import numpy as np
# Load Data Vis Pkg
import matplotlib.pyplot as plt
import seaborn as sns
# Load ML Pkgs
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# For Neural network (MultiLayerPerceptron)
from sklearn.neural_network import MLPClassifier
col_names = ['buying','maint','doors','persons','lug_boot','safety','class']
# Load dataset
df = pd.read_csv("data/car.data",names=col_names)
We will then label-encode our dataset using any of these methods:
- Custom Function
- Label Encoder from Sklearn
- OneHot Encoding
- Pandas Get Dummies
In our case we will be using a custom function to encode our dataset, mapping the values of each column to numbers respectively. We will then save these labels as dictionaries and use them when building the options section of our ML app.
# Custom Function
buying_label = {ni: n for n, ni in enumerate(set(df['buying']))}
maint_label = {ni: n for n, ni in enumerate(set(df['maint']))}
doors_label = {ni: n for n, ni in enumerate(set(df['doors']))}
persons_label = {ni: n for n, ni in enumerate(set(df['persons']))}
lug_boot_label = {ni: n for n, ni in enumerate(set(df['lug_boot']))}
safety_label = {ni: n for n, ni in enumerate(set(df['safety']))}
class_label = {ni: n for n, ni in enumerate(set(df['class']))}

# Work on a copy of the original dataframe
df1 = df.copy()
df1['buying'] = df1['buying'].map(buying_label)
df1['maint'] = df1['maint'].map(maint_label)
df1['doors'] = df1['doors'].map(doors_label)
df1['persons'] = df1['persons'].map(persons_label)
df1['lug_boot'] = df1['lug_boot'].map(lug_boot_label)
df1['safety'] = df1['safety'].map(safety_label)
df1['class'] = df1['class'].map(class_label)
We can also use the LabelEncoder option from scikit-learn.
Using LabelEncoder
from sklearn.preprocessing import LabelEncoder
lb = LabelEncoder()
# Work on a copy so the original dataframe stays untouched
df2 = df.copy()
for i in df2.columns:
    df2[i] = lb.fit_transform(df2[i])
Building the Model
To summarize, we will be using 3 different ML algorithms (Logistic Regression, Naive Bayes, and a Multi-Layer Perceptron classifier).
We will first split our dataset into training and test sets.
Xfeatures = df1[['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety']]
ylabels = df1['class']
# Split Dataset
X_train, X_test, Y_train, Y_test = train_test_split(Xfeatures, ylabels, test_size=0.30, random_state=7)
# Using Logistic Regression
logit = LogisticRegression()
logit.fit(X_train, Y_train)
We can then check for the accuracy of our model using accuracy_score from sklearn.metrics.
print("Accuracy Score:",accuracy_score(Y_test, logit.predict(X_test)))
Our model gave us an accuracy of 0.76. Let us try another algorithm, this time a neural network.
Using Multi-Layer Perceptron (Neural Network)
# Using Neural Network
nn_clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
nn_clf.fit(X_train,Y_train)
print("Accuracy Score:",accuracy_score(Y_test, nn_clf.predict(X_test)))
This gave us a slightly higher accuracy than the Logistic Regression.
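The saving step below also serializes a Naive Bayes model (nb). Its training step is omitted in the original text, but a minimal sketch using the MultinomialNB imported earlier, on the same train/test split, could look like this:

# Using Naive Bayes (MultinomialNB) - minimal sketch, assuming the same split as above
nb = MultinomialNB()
nb.fit(X_train, Y_train)
print("Accuracy Score:", accuracy_score(Y_test, nb.predict(X_test)))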
Saving the Model For Our App
To build our ML app we will need to save or serialize our ML models so we can use them in the next section. You can use pickle or joblib, but make sure to use the same library when loading/de-serializing your model.
# Save Models
import joblib
logit_model = open("logit_car_model.pkl","wb")
joblib.dump(logit,logit_model)
logit_model.close()
nb_model = open("nb_car_model.pkl","wb")
joblib.dump(nb,nb_model)
nb_model.close()
nn_clf_model = open("nn_clf_car_model.pkl","wb")
joblib.dump(nn_clf,nn_clf_model)
nn_clf_model.close()
It is also recommended to use the standalone joblib package instead of the deprecated sklearn.externals.joblib.
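To load a serialized model back, use the same library you saved it with. Here is a minimal sketch with joblib, assuming the logistic regression model saved above:

# De-serialize the model with the same library that saved it
import joblib
loaded_logit = joblib.load(open("logit_car_model.pkl", "rb"))
print(loaded_logit.predict(X_test[:5]))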
Interpreting the Model
Having finished building our models, we can use packages such as LIME, ELI5, and SHAP to help us interpret them.
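For example, here is a minimal sketch of inspecting the weights of our logistic regression model with ELI5 (this assumes the eli5 package is installed; it is not part of the code above):

# Minimal sketch: explain the logistic regression weights with ELI5 (assumed installed)
import eli5
explanation = eli5.explain_weights(logit, feature_names=['buying','maint','doors','persons','lug_boot','safety'])
print(eli5.format_as_text(explanation))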
Building the ML App with Streamlit
Streamlit makes it much easier to build ML apps and productionize your machine learning models. Let us see the structure of our ML app.
Our ML app will have 3 sections or activities.
- EDA
- Prediction
- About
Below is the entire code for the app.
# Core Pkg
import streamlit as st
import os
# EDA Pkgs
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib
matplotlib.use('Agg')
import joblib
@st.cache
def load_data(dataset):
    df = pd.read_csv(dataset)
    return df

def load_prediction_models(model_file):
    loaded_model = joblib.load(open(os.path.join(model_file),"rb"))
    return loaded_model
buying_label = {'vhigh': 0, 'low': 1, 'med': 2, 'high': 3}
maint_label = {'vhigh': 0, 'low': 1, 'med': 2, 'high': 3}
doors_label = {'2': 0, '3': 1, '5more': 2, '4': 3}
persons_label = {'2': 0, '4': 1, 'more': 2}
lug_boot_label = {'small': 0, 'big': 1, 'med': 2}
safety_label = {'high': 0, 'med': 1, 'low': 2}
class_label = {'good': 0, 'acceptable': 1, 'very good': 2, 'unacceptable': 3}
# Find the Value From the Dictionary
def get_value(val, my_dict):
    for key, value in my_dict.items():
        if val == key:
            return value

# Find the Key From the Dictionary
def get_key(val, my_dict):
    for key, value in my_dict.items():
        if val == value:
            return key
def main():
    """Car Evaluation with ML Streamlit App"""
    st.title("Car Evaluation")
    st.subheader("Streamlit ML App")
    # st.image(load_image("cars_images/car1.jpg"),width=300, caption='Images')

    activities = ['EDA','Prediction','Gallery','About']
    choices = st.sidebar.selectbox("Select Activity",activities)

    if choices == 'EDA':
        st.subheader("EDA")
        data = load_data('data/car_eval_dataset.csv')
        st.dataframe(data.head(5))

        if st.checkbox("Show Summary of Dataset"):
            st.write(data.describe())

        # Show Plots
        if st.checkbox("Simple Value Plots"):
            st.write(sns.countplot(data['class']))
            # Use Matplotlib to render seaborn
            st.pyplot()

        # Show Columns By Selection
        if st.checkbox("Select Columns To Show"):
            all_columns = data.columns.tolist()
            selected_columns = st.multiselect('Select',all_columns)
            new_df = data[selected_columns]
            st.dataframe(new_df)

        if st.checkbox("Pie Plot"):
            all_columns_names = data.columns.tolist()
            if st.button("Generate Pie Plot"):
                st.write(data.iloc[:,-1].value_counts().plot.pie(autopct="%1.1f%%"))
                st.pyplot()

    if choices == 'Prediction':
        st.subheader("Prediction")

        buying = st.selectbox('Select Buying Level',tuple(buying_label.keys()))
        maint = st.selectbox('Select Maintenance Level',tuple(maint_label.keys()))
        doors = st.selectbox('Select Doors',tuple(doors_label.keys()))
        persons = st.number_input('Select Num of Persons',2,10)
        lug_boot = st.selectbox("Select Lug Boot",tuple(lug_boot_label.keys()))
        safety = st.selectbox('Select Safety',tuple(safety_label.keys()))

        k_buying = get_value(buying,buying_label)
        k_maint = get_value(maint,maint_label)
        k_doors = get_value(doors,doors_label)
        # k_persons = get_value(persons,persons_label)
        k_lug_boot = get_value(lug_boot,lug_boot_label)
        k_safety = get_value(safety,safety_label)

        pretty_data = {
            "buying":buying,
            "maint":maint,
            "doors":doors,
            "persons":persons,
            "lug_boot":lug_boot,
            "safety":safety,
        }
        st.subheader("Options Selected")
        st.json(pretty_data)

        st.subheader("Data Encoded As")
        # Data To Be Used
        sample_data = [k_buying,k_maint,k_doors,persons,k_lug_boot,k_safety]
        st.write(sample_data)

        prep_data = np.array(sample_data).reshape(1, -1)

        model_choice = st.selectbox("Model Type",['logit','naive bayes','MLP classifier'])
        if st.button('Evaluate'):
            if model_choice == 'logit':
                predictor = load_prediction_models("models/logit_car_model.pkl")
                prediction = predictor.predict(prep_data)
                st.write(prediction)

            if model_choice == 'naive bayes':
                predictor = load_prediction_models("models/nb_car_model.pkl")
                prediction = predictor.predict(prep_data)
                st.write(prediction)

            if model_choice == 'MLP classifier':
                predictor = load_prediction_models("models/nn_clf_car_model.pkl")
                prediction = predictor.predict(prep_data)
                st.write(prediction)

            final_result = get_key(prediction,class_label)
            st.success(final_result)

if __name__ == '__main__':
    main()
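To launch the app locally (assuming the code above is saved as app.py), run it with the Streamlit CLI:

streamlit run app.py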
You can check out the video tutorial here.
Thanks For Your Time
Jesus Saves
By Jesse E.Agbe(JCharis)
How can I get the model in .pkl format? Like this: nn_clf_car_model.pkl
Hi Maruf, you will need to use joblib or pickle to save your model to .pkl format, e.g.
import joblib
nn_clf_file = open("nn_clf_car_model.pkl","wb")
joblib.dump(your_model,nn_clf_file)
nn_clf_file.close()
Hope it helps