Simple Machine Learning App with Streamlit using Car Evaluation Dataset

In this tutorial we will be building a simple ML app with the awesome ML framework Streamlit.

We will be using the car evaluation data set from here. Let us see the basic workflow we will be using for this simple project.

  • Building the ML Model
  • Interpreting the Model
  • Building the ML App with Streamlit

Let us start

Building the Machine Learning Model

First of all we will get the dataset from UCI and do some data pre-processing.

# Load EDA Pkgs
import pandas as pd
import numpy as np

# Load Data Vis Pkg
import matplotlib.pyplot as plt
import seaborn as sns

# Load ML Pkgs
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# For Neural network (MultiLayerPerceptron)
from sklearn.neural_network import MLPClassifier

col_names = ['buying','maint','doors' ,'persons','lug_boot','safety','class']

# Load dataset
df = pd.read_csv("data/car.data",names=col_names)

We will then encode our data set using any of these methods:

  1. Custom Function
  2. Label Encoder from Sklearn
  3. OneHot Encoding
  4. Pandas Get Dummies

In our case we will be using a custom function to build a label-to-integer dictionary for each column and then map each column to its encoded values. We will then save these dictionaries and reuse them to build the options sections of our ML app.

# Custom Function
# Note: set() ordering is arbitrary, so the integer given to each label can differ between runs
buying_label = { ni: n for n,ni in enumerate(set(df['buying']))}
maint_label = { ni: n for n,ni in enumerate(set(df['maint']))}
doors_label = { ni: n for n,ni in enumerate(set(df['doors']))}
persons_label = { ni: n for n,ni in enumerate(set(df['persons']))}
lug_boot_label = { ni: n for n,ni in enumerate(set(df['lug_boot']))}
safety_label = { ni: n for n,ni in enumerate(set(df['safety']))}
class_label = { ni: n for n,ni in enumerate(set(df['class']))}

# Work on a copy so the original df keeps its string labels
df1 = df.copy()
df1['buying'] = df1['buying'].map(buying_label)
df1['maint'] = df1['maint'].map(maint_label)
df1['doors'] = df1['doors'].map(doors_label)
df1['persons'] = df1['persons'].map(persons_label)
df1['lug_boot'] = df1['lug_boot'].map(lug_boot_label)
df1['safety'] = df1['safety'].map(safety_label)
df1['class'] = df1['class'].map(class_label)
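Since the app later hard-codes these dictionaries, it helps if the mapping comes out the same on every run. Below is a minimal sketch of one way to do that, by sorting the labels before numbering them. The helper name make_label_map and the tiny DataFrame are only for illustration and are not part of the original notebook. If you encode this way, the dictionaries in the app must be regenerated to match.

```python
import pandas as pd

# Tiny stand-in for the car data (the tutorial itself loads data/car.data)
df = pd.DataFrame({
    "buying": ["vhigh", "low", "med", "high", "low"],
    "safety": ["low", "med", "high", "med", "low"],
})

def make_label_map(series):
    """Map each distinct value to an integer, in sorted order so the result is reproducible."""
    return {value: code for code, value in enumerate(sorted(series.unique()))}

buying_label = make_label_map(df["buying"])
safety_label = make_label_map(df["safety"])

# Encode a copy so the original strings stay available
encoded = df.copy()
encoded["buying"] = encoded["buying"].map(buying_label)
encoded["safety"] = encoded["safety"].map(safety_label)

print(buying_label)
print(encoded["buying"].tolist())
```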

We can also use the label encoder option.

Using LabelEncoder
from sklearn.preprocessing import LabelEncoder
lb=LabelEncoder()
# Copy first so the original df is not overwritten
df2 = df.copy()
for i in df2.columns:
    df2[i]=lb.fit_transform(df2[i])
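One thing to note about the loop above is that a single LabelEncoder is refitted for every column, so only the mapping of the last column is kept. If you want to decode predictions later, a possible variation is to keep one encoder per column. This is just a sketch on a made-up two-column DataFrame, and the encoders dictionary is an assumption for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample with two of the car evaluation columns
df = pd.DataFrame({
    "doors": ["2", "3", "4", "5more"],
    "class": ["unacc", "acc", "good", "vgood"],
})

encoders = {}          # one fitted encoder per column
df2 = df.copy()
for col in df2.columns:
    enc = LabelEncoder()
    df2[col] = enc.fit_transform(df2[col])
    encoders[col] = enc

# classes_ holds the original labels, indexed by their integer code
print(list(encoders["class"].classes_))
# Turn integer codes back into the original labels
print(list(encoders["class"].inverse_transform([0, 3])))
```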
Building the Model

We will be using 3 different ML algorithms: LogisticRegression, Naive Bayes and a Multi-Layer Perceptron classifier.

We will first split our dataset into training and test sets.

Xfeatures = df1[['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety']]
ylabels = df1['class']
Split Dataset
X_train, X_test, Y_train, Y_test = train_test_split(Xfeatures, ylabels, test_size=0.30, random_state=7)
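If some classes are much rarer than others, a plain random split can leave the test set with very few rows of a rare class. As an optional variation (not what the tutorial does), train_test_split accepts a stratify argument that keeps the class proportions the same in both parts. The arrays below are synthetic and only stand in for the encoded data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels standing in for the encoded 'class' column
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 70 + [1] * 20 + [2] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=7, stratify=y
)

# Count how many rows of each class ended up in each part
print(np.bincount(y_train))
print(np.bincount(y_test))
```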
Using LogisticRegression
# Using - Logistic Regression
logit = LogisticRegression()
logit.fit(X_train, Y_train)

We can then check for the accuracy of our model using accuracy_score from sklearn.metrics.

print("Accuracy Score:",accuracy_score(Y_test, logit.predict(X_test)))
Accuracy Score: 0.7610789980732178

Our model gave us an accuracy of 0.76. Let us try another algorithm, this time a neural network algorithm.
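An accuracy score is easier to judge next to a simple baseline, especially when one class dominates the data. The sketch below uses sklearn's DummyClassifier, which always predicts the most frequent class, on made-up labels. It is an extra check and not part of the original workflow.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up imbalanced labels; the features are ignored by this baseline
y_train = np.array([0] * 70 + [1] * 20 + [2] * 10)
y_test = np.array([0] * 21 + [1] * 6 + [2] * 3)
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
pred = baseline.predict(X_test)

baseline_acc = accuracy_score(y_test, pred)
matrix = confusion_matrix(y_test, pred)
print("Baseline Accuracy:", baseline_acc)
print(matrix)  # rows are true classes, columns are predicted classes
```

A model is only useful if it clearly beats this kind of baseline on the same test set.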

Using Multi-Layer Perceptron (Neural Network)
# Using Neural Network
nn_clf = MLPClassifier(solver='lbfgs', alpha=1e-5,hidden_layer_sizes=(5, 2), random_state=1)
nn_clf.fit(X_train,Y_train)
Out:
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(5, 2), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=1, shuffle=True, solver='lbfgs',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)
print("Accuracy Score:",accuracy_score(Y_test, nn_clf.predict(X_test)))
Accuracy Score: 0.7707129094412332

This gave us a slightly higher accuracy than the LogisticRegression.
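The Naive Bayes model is trained the same way as the other two. Here is a minimal sketch, assuming the MultinomialNB class imported at the top and the variable name nb that the saving step below refers to. The data here is synthetic, so the printed accuracy says nothing about the car dataset. In the notebook you would call fit on X_train and Y_train from the split above.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in for the label-encoded features (non-negative integers)
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(300, 6))
y = (X[:, 5] + X[:, 3] > 3).astype(int)  # arbitrary rule, for illustration only

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.30, random_state=7
)

# Using Naive Bayes
nb = MultinomialNB()
nb.fit(X_train, Y_train)

acc = accuracy_score(Y_test, nb.predict(X_test))
print("Accuracy Score:", acc)
```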

 

Saving the Model For Our App

To build our ML app we will need to save (serialize) our ML models so that we can use them in the next section. You can use pickle or joblib, but make sure to use the same library when loading (de-serializing) your model.

# Save Models
import joblib

logit_model = open("logit_car_model.pkl","wb")
joblib.dump(logit,logit_model)
logit_model.close()

# nb is the fitted Naive Bayes (MultinomialNB) model
nb_model = open("nb_car_model.pkl","wb")
joblib.dump(nb,nb_model)
nb_model.close()

nn_clf_model = open("nn_clf_car_model.pkl","wb")
joblib.dump(nn_clf,nn_clf_model)
nn_clf_model.close()

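It is also recommended to use the standalone joblib package (import joblib) instead of the copy that used to ship with sklearn (sklearn.externals.joblib), which was deprecated and removed in newer scikit-learn releases.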
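To convince yourself that saving and loading works, you can do a quick round trip and compare predictions. The sketch below trains a throwaway model on made-up data and writes it to a temporary folder. The file name mirrors the one used above, and joblib.dump is given a path directly instead of an open file.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Throwaway model on made-up data, just to have something to save
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [3, 3]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "logit_car_model.pkl")
    joblib.dump(model, model_path)      # save with joblib ...
    restored = joblib.load(model_path)  # ... and load with joblib as well

# The restored model should give exactly the same predictions
same = bool(np.array_equal(model.predict(X), restored.predict(X)))
print(same)
```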

Interpreting the Model

Having finished building our model we can use packages such as Lime, Eli5, Shap, etc to help us interpret our model.
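If you do not want to install an extra package, scikit-learn itself ships a simple interpretation tool, permutation importance. Note that this is a different technique from Lime, Eli5 or Shap and is shown here only as a quick starting point. The data and the feature names below are made up for illustration.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Made-up encoded data in which only the first feature decides the label
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(400, 3))
y = (X[:, 0] >= 2).astype(int)

model = LogisticRegression().fit(X, y)

# Shuffle one column at a time and measure how much the score drops
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

feature_names = ["safety", "persons", "doors"]  # hypothetical names for the 3 columns
for name, score in zip(feature_names, result.importances_mean):
    print(f"{name}: {score:.3f}")
```

A feature whose shuffling barely changes the score contributes little to the model's predictions.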

 

Building the ML App with Streamlit

Streamlit makes it much easier to build ML apps and productionize your machine learning models. Let us see the structure of our ML app.

Our ML app will have 3 sections or activities.

  • EDA
  • Prediction
  • About

Below is the entire code for the app. It expects the saved models in a models/ folder and the dataset at data/car_eval_dataset.csv.

# Core Pkg
import streamlit as st
import os

# EDA Pkgs
import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt 
import matplotlib
matplotlib.use('Agg') 
import joblib

@st.cache
def load_data(dataset):
	df = pd.read_csv(dataset)
	return df


def load_prediction_models(model_file):
	# joblib.load accepts a file path directly, so no file handle is left open
	loaded_model = joblib.load(model_file)
	return loaded_model

buying_label = {'vhigh': 0, 'low': 1, 'med': 2, 'high': 3}
maint_label = {'vhigh': 0, 'low': 1, 'med': 2, 'high': 3}
doors_label = {'2': 0, '3': 1, '5more': 2, '4': 3}
persons_label = {'2': 0, '4': 1, 'more': 2}
lug_boot_label = {'small': 0, 'big': 1, 'med': 2}
safety_label = {'high': 0, 'med': 1, 'low': 2}
class_label = {'good': 0, 'acceptable': 1, 'very good': 2, 'unacceptable': 3}

# Get the Encoded Value for a Key
def get_value(val,my_dict):
	for key ,value in my_dict.items():
		if val == key:
			return value

# Find the Key From Dictionary
def get_key(val,my_dict):
	for key ,value in my_dict.items():
		if val == value:
			return key


def main():
	"""Car Evaluation with ML Streamlit App"""

	st.title("Car Evaluation")
	st.subheader("Streamlit ML App")
	# st.image(load_image("cars_images/car1.jpg"),width=300, caption='Images')

	activities = ['EDA','Prediction','About']
	choices = st.sidebar.selectbox("Select Activity",activities)

	if choices == 'EDA':
		st.subheader("EDA")
		data = load_data('data/car_eval_dataset.csv')
		st.dataframe(data.head(5))

		if st.checkbox("Show Summary of Dataset"):
			st.write(data.describe())

		# Show Plots
		if st.checkbox("Simple Value Plots "):
			# Draw the seaborn plot on an explicit figure and pass it to Streamlit
			fig, ax = plt.subplots()
			sns.countplot(x=data['class'], ax=ax)
			st.pyplot(fig)

		# Show Columns By Selection
		if st.checkbox("Select Columns To Show"):
			all_columns = data.columns.tolist()
			selected_columns = st.multiselect('Select',all_columns)
			new_df = data[selected_columns]
			st.dataframe(new_df)

		if st.checkbox("Pie Plot"):
			if st.button("Generate Pie Plot"):
				# Plot the class distribution (last column) on an explicit figure
				fig, ax = plt.subplots()
				data.iloc[:,-1].value_counts().plot.pie(autopct="%1.1f%%", ax=ax)
				st.pyplot(fig)


	if choices == 'Prediction':
		st.subheader("Prediction")

		buying = st.selectbox('Select Buying Level',tuple(buying_label.keys()))
		maint = st.selectbox('Select Maintenance Level',tuple(maint_label.keys()))
		doors = st.selectbox('Select Doors',tuple(doors_label.keys()))
		persons = st.selectbox('Select Num of Persons',tuple(persons_label.keys()))
		lug_boot = st.selectbox("Select Lug Boot",tuple(lug_boot_label.keys()))
		safety = st.selectbox('Select Safety',tuple(safety_label.keys()))

		k_buying = get_value(buying,buying_label)
		k_maint = get_value(maint,maint_label)
		k_doors = get_value(doors,doors_label)
		# persons was label-encoded for training, so encode it here as well
		k_persons = get_value(persons,persons_label)
		k_lug_boot = get_value(lug_boot,lug_boot_label)
		k_safety = get_value(safety,safety_label)

		
		pretty_data = {
		"buying":buying,
		"maint":maint,
		"doors":doors,
		"persons":persons,
		"lug_boot":lug_boot,
		"safety":safety,
		}
		st.subheader("Options Selected")
		st.json(pretty_data)

		st.subheader("Data Encoded As")
		# Data To Be Used
		sample_data = [k_buying,k_maint,k_doors,k_persons,k_lug_boot,k_safety]
		st.write(sample_data)

		prep_data = np.array(sample_data).reshape(1, -1)

		model_choice = st.selectbox("Model Type",['logit','naive bayes','MLP classifier'])
		if st.button('Evaluate'):
			if model_choice == 'logit':
				predictor = load_prediction_models("models/logit_car_model.pkl")
				prediction = predictor.predict(prep_data)
				st.write(prediction)

			if model_choice == 'naive bayes':
				predictor = load_prediction_models("models/nb_car_model.pkl")
				prediction = predictor.predict(prep_data)
				st.write(prediction)

			if model_choice == 'MLP classifier':
				predictor = load_prediction_models("models/nn_clf_car_model.pkl")
				prediction = predictor.predict(prep_data)
				st.write(prediction)


			# predict returns an array with one item, so take the first element
			final_result = get_key(prediction[0],class_label)
			st.success(final_result)


if __name__ == '__main__':
	main()
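The two lookups in the app (option to code, and predicted code back to label) can be tried out without running Streamlit. Below is a small sketch using plain dictionary lookups instead of loops. The function names encode_choice and decode_prediction are hypothetical, and the dictionaries are copied from the app code above.

```python
# Label dictionaries copied from the app
buying_label = {'vhigh': 0, 'low': 1, 'med': 2, 'high': 3}
persons_label = {'2': 0, '4': 1, 'more': 2}
class_label = {'good': 0, 'acceptable': 1, 'very good': 2, 'unacceptable': 3}

def encode_choice(choice, label_map):
    """Return the integer code for a selected option (raises KeyError if unknown)."""
    return label_map[choice]

def decode_prediction(code, label_map):
    """Return the label whose code matches a predicted integer."""
    inverse = {value: key for key, value in label_map.items()}
    return inverse[int(code)]

print(encode_choice('med', buying_label))
print(encode_choice('more', persons_label))
print(decode_prediction(3, class_label))
```

Unlike get_value and get_key, these raise an error for an unknown option instead of silently returning None, which makes mistakes in the dictionaries easier to spot.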

 

You can check out the video tutorial here

Thanks For Your Time

Jesus Saves

By Jesse E.Agbe(JCharis)

 

 

 

5 thoughts on “Simple Machine Learning App with Streamlit using Car Evaluation Dataset”

  1. Amazing! Really useful, you missed out the naive bayes model in the tutorial text above though; but really great.

    1. jesse_jcharis

      Hi Maruf, you will need to use joblib or pickle to save your model to pkl format
      eg

      import joblib
      nn_clf_file = open("nn_clf_car_model.pkl","wb")
      joblib.dump(your_model,nn_clf_file)
      nn_clf_file.close()

      Hope it helps
