
Building a News Classifier Machine Learning App with Streamlit

Text classification is one of the many applications of machine learning and AI. In this tutorial we will see how to build a news classifier app with Streamlit and Python. We will use our already prepared ML models to make the predictions.

First of all, let us install the various packages we will be using.

pip install streamlit scikit-learn joblib spacy wordcloud pandas matplotlib

The basic structure of our ML app will consist of two main sections.

  • Prediction with ML Section
  • NLP with spaCy and WordCloud

We will use Streamlit's sidebar to create a menu for selecting our activities. All our code will live in a function called main().

# IMPORT ALL PACKAGES HERE

def main():
	# OUR CODE GOES HERE
	pass

if __name__ == '__main__':
	main()

 

Building the News Classifier Section

We will be using Streamlit and scikit-learn to work on this section.

To build our ML app we need a way to receive input from the end user and then process that input with our models. We will use Streamlit's text_area() function to get input from the user, like this.

news_text = st.text_area("Enter Text","Type Here")

Since our models cannot work with raw text, we need to vectorize it, that is, convert it into numbers. We will use our saved CountVectorizer to turn the text into an array of numbers that our ML models can process.

# Load Our CountVectorizer
news_vectorizer = open("models/final_news_cv_vectorizer.pkl","rb")
news_cv = joblib.load(news_vectorizer)
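
To make the idea of vectorization concrete, here is a minimal plain-Python sketch of what a count vectorizer does. It is an illustration only, not scikit-learn's implementation: the helper names build_vocabulary and vectorize and the sample headlines are made up for this example, and the real CountVectorizer tokenizes text differently.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase word to a column index, in sorted order."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}

def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    # Dictionaries keep insertion order, so columns follow the sorted vocabulary.
    return [counts.get(word, 0) for word in vocabulary]

# Hypothetical training headlines, used only to build the vocabulary.
training_docs = ["stocks rise as markets rally", "team wins the cup final"]
vocabulary = build_vocabulary(training_docs)

vector = vectorize("markets rally as stocks rally", vocabulary)
print(vector)  # [1, 0, 0, 1, 2, 0, 1, 0, 0, 0]
```

Words that were not seen when the vocabulary was built are simply ignored, which is why the same saved vectorizer has to be used for both training and prediction.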

To make our predictions we will load our already prepared models using joblib, a serialization package.

# Load Our Models
def load_prediction_models(model_file):
	loaded_models = joblib.load(open(os.path.join(model_file),"rb"))
	return loaded_models

This approach will save us a lot of time and also reduce the size of our code.
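
If loading the same model on every button click turns out to be slow, one option is to cache the loaded object. The sketch below is an assumption about how that could be done, not part of the original app: load_model_cached is a hypothetical helper, and pickle stands in for joblib so the example runs with the standard library alone.

```python
import os
import pickle
import tempfile
from functools import lru_cache

@lru_cache(maxsize=None)
def load_model_cached(model_file):
    """Read a serialized object from disk once; later calls reuse it."""
    with open(model_file, "rb") as handle:
        # pickle is a stand-in here; in the app this line would call
        # joblib.load(handle) instead.
        return pickle.load(handle)

# Throwaway file standing in for a trained model (hypothetical content).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
with open(demo_path, "wb") as handle:
    pickle.dump({"name": "demo"}, handle)

first = load_model_cached(demo_path)
second = load_model_cached(demo_path)
print(first is second)  # True: the second call never touched the disk
```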

Finally, for our news classification section, we will convert the numeric result into a user-friendly label using a dictionary of our prediction labels. We will add a function, get_keys(), to do that for us.
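
As an alternative to scanning the dictionary on every prediction, the mapping can be inverted once. This is only a sketch: index_to_label and label_for are hypothetical names, and it assumes the model returns a single integer class index.

```python
# Same label dictionary as in the app.
prediction_labels = {'business': 0, 'tech': 1, 'sport': 2,
                     'health': 3, 'politics': 4, 'entertainment': 5}

# Invert it once so each lookup is a direct dictionary access.
index_to_label = {index: label for label, index in prediction_labels.items()}

def label_for(predicted_index):
    """Return the category name for a class index, or 'unknown'."""
    return index_to_label.get(int(predicted_index), "unknown")

print(label_for(2))   # sport
print(label_for(42))  # unknown
```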

Let us see the code for our ML prediction section.

	if choice == 'Prediction':
		st.info("Prediction with ML")

		news_text = st.text_area("Enter Text","Type Here")
		all_ml_models = ["LR","NB","RFOREST","DECISION_TREE"]
		model_choice = st.selectbox("Choose ML Model",all_ml_models)
		prediction_labels = {'business':0,'tech':1,'sport':2,'health':3,'politics':4,'entertainment':5}
		if st.button("Classify"):
			st.text("Original Text ::\n{}".format(news_text))
			vect_text = news_cv.transform([news_text]).toarray()
			if model_choice == 'LR':
				predictor = load_prediction_models("models/newsclassifier_Logit_model.pkl")
				prediction = predictor.predict(vect_text)
				# st.write(prediction)
			elif model_choice == 'RFOREST':
				predictor = load_prediction_models("models/newsclassifier_RFOREST_model.pkl")
				prediction = predictor.predict(vect_text)
				# st.write(prediction)
			elif model_choice == 'NB':
				predictor = load_prediction_models("models/newsclassifier_NB_model.pkl")
				prediction = predictor.predict(vect_text)
				# st.write(prediction)
			elif model_choice == 'DECISION_TREE':
				predictor = load_prediction_models("models/newsclassifier_CART_model.pkl")
				prediction = predictor.predict(vect_text)
				# st.write(prediction)

			final_result = get_keys(prediction,prediction_labels)
			st.success("News Categorized as:: {}".format(final_result))
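
The four branches above differ only in the file they load, so the choice could also be expressed as a lookup table. The following is a sketch of that idea, not the code used in the app: MODEL_FILES and model_file_for are hypothetical names, while the file paths are the ones shown above.

```python
# Map each menu entry to the model file it should load.
MODEL_FILES = {
    "LR": "models/newsclassifier_Logit_model.pkl",
    "NB": "models/newsclassifier_NB_model.pkl",
    "RFOREST": "models/newsclassifier_RFOREST_model.pkl",
    "DECISION_TREE": "models/newsclassifier_CART_model.pkl",
}

def model_file_for(model_choice):
    """Return the model path for a menu choice, or raise ValueError."""
    try:
        return MODEL_FILES[model_choice]
    except KeyError:
        raise ValueError("Unknown model choice: {}".format(model_choice)) from None

print(model_file_for("NB"))  # models/newsclassifier_NB_model.pkl
```

With such a table, the body of the Classify button could shrink to a single call to load_prediction_models(model_file_for(model_choice)) followed by predict().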

 

Building the NLP Section of our App

For our natural language processing we will be using spaCy and wordcloud. spaCy is a powerful NLP library for tasks such as tokenization, named entity recognition, dependency parsing and more. In our case we will use spaCy for tokenization, named entity recognition, lemmatization and part-of-speech tagging.

We will then display our results both as JSON and as a table using a pandas DataFrame.

We will then use wordcloud to build a picture of the most common words in our text.

This is the entire code for the NLP Section of our App.

	if choice == 'NLP':
		st.info("Natural Language Processing")
		news_text = st.text_area("Enter Text","Type Here")
		nlp_task = ["Tokenization","NER","Lemmatization","POS Tags"]
		task_choice = st.selectbox("Choose NLP Task",nlp_task)
		if st.button("Analyze"):
			st.info("Original Text {}".format(news_text))

			docx = nlp(news_text)
			if task_choice == 'Tokenization':
				result = [ token.text for token in docx ]
				
			elif task_choice == 'Lemmatization':
				result = ["'Token':{},'Lemma':{}".format(token.text,token.lemma_) for token in docx]
			elif task_choice == 'NER':
				result = [(entity.text,entity.label_)for entity in docx.ents]
			elif task_choice == 'POS Tags':
				result = ["'Token':{},'POS':{},'Dependency':{}".format(word.text,word.tag_,word.dep_) for word in docx]

			st.json(result)

		if st.button("Tabulize"):
			docx = nlp(news_text)
			c_tokens = [ token.text for token in docx ]
			c_lemma = [token.lemma_ for token in docx]
			c_pos = [word.tag_ for word in docx]

			new_df = pd.DataFrame(zip(c_tokens,c_lemma,c_pos),columns=['Tokens','Lemma','POS'])
			st.dataframe(new_df)

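		if st.checkbox("Wordcloud"):
			wordcloud =  WordCloud().generate(news_text)
			# Draw on an explicit figure; calling st.pyplot() without one is deprecated in newer Streamlit releases
			fig, ax = plt.subplots()
			ax.imshow(wordcloud,interpolation='bilinear')
			ax.axis("off")
			st.pyplot(fig)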
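
A word cloud is essentially a picture of word frequencies. The sketch below counts words with the standard library to show the numbers behind such a picture. It is a rough approximation and not how the wordcloud package works internally: top_words is a hypothetical helper, and unlike WordCloud's defaults it does not filter out stopwords such as "the".

```python
import re
from collections import Counter

def top_words(text, limit=5):
    """Return the most frequent lowercase words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(limit)

sample = "The market rose. The market fell. Investors watched the market."
print(top_words(sample, 2))  # [('the', 3), ('market', 3)]
```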

In summary, our entire code looks like this:

import streamlit as st 
import joblib,os

# NLP Pkgs
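import spacy
# spaCy 3 removed the 'en' shortcut, so load the model by its full name
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')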

# EDA pkgs
import pandas as pd

# Wordcloud
from wordcloud import WordCloud
from PIL import Image

import matplotlib.pyplot as plt 
import matplotlib
matplotlib.use('Agg')
# Vectorizer
news_vectorizer = open("models/final_news_cv_vectorizer.pkl","rb")
news_cv = joblib.load(news_vectorizer)

# Load Our Models
def load_prediction_models(model_file):
	loaded_models = joblib.load(open(os.path.join(model_file),"rb"))
	return loaded_models

def get_keys(val,my_dict):
	for key,value in my_dict.items():
		if val == value:
			return key


def main():
	"""News Classifier App with Streamlit """
	st.title("News Classifier ML App")
	st.subheader("NLP and ML App with Streamlit")

	activities = ["Prediction","NLP"]

	choice = st.sidebar.selectbox("Choose Activity",activities)

	if choice == 'Prediction':
		st.info("Prediction with ML")

		news_text = st.text_area("Enter Text","Type Here")
		all_ml_models = ["LR","NB","RFOREST","DECISION_TREE"]
		model_choice = st.selectbox("Choose ML Model",all_ml_models)
		prediction_labels = {'business':0,'tech':1,'sport':2,'health':3,'politics':4,'entertainment':5}
		if st.button("Classify"):
			st.text("Original Text ::\n{}".format(news_text))
			vect_text = news_cv.transform([news_text]).toarray()
			if model_choice == 'LR':
				predictor = load_prediction_models("models/newsclassifier_Logit_model.pkl")
				prediction = predictor.predict(vect_text)
				# st.write(prediction)
			elif model_choice == 'RFOREST':
				predictor = load_prediction_models("models/newsclassifier_RFOREST_model.pkl")
				prediction = predictor.predict(vect_text)
				# st.write(prediction)
			elif model_choice == 'NB':
				predictor = load_prediction_models("models/newsclassifier_NB_model.pkl")
				prediction = predictor.predict(vect_text)
				# st.write(prediction)
			elif model_choice == 'DECISION_TREE':
				predictor = load_prediction_models("models/newsclassifier_CART_model.pkl")
				prediction = predictor.predict(vect_text)
				# st.write(prediction)

			final_result = get_keys(prediction,prediction_labels)
			st.success("News Categorized as:: {}".format(final_result))



	if choice == 'NLP':
		st.info("Natural Language Processing")
		news_text = st.text_area("Enter Text","Type Here")
		nlp_task = ["Tokenization","NER","Lemmatization","POS Tags"]
		task_choice = st.selectbox("Choose NLP Task",nlp_task)
		if st.button("Analyze"):
			st.info("Original Text {}".format(news_text))

			docx = nlp(news_text)
			if task_choice == 'Tokenization':
				result = [ token.text for token in docx ]
				
			elif task_choice == 'Lemmatization':
				result = ["'Token':{},'Lemma':{}".format(token.text,token.lemma_) for token in docx]
			elif task_choice == 'NER':
				result = [(entity.text,entity.label_)for entity in docx.ents]
			elif task_choice == 'POS Tags':
				result = ["'Token':{},'POS':{},'Dependency':{}".format(word.text,word.tag_,word.dep_) for word in docx]

			st.json(result)

		if st.button("Tabulize"):
			docx = nlp(news_text)
			c_tokens = [ token.text for token in docx ]
			c_lemma = [token.lemma_ for token in docx]
			c_pos = [word.tag_ for word in docx]

			new_df = pd.DataFrame(zip(c_tokens,c_lemma,c_pos),columns=['Tokens','Lemma','POS'])
			st.dataframe(new_df)

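		if st.checkbox("Wordcloud"):
			wordcloud =  WordCloud().generate(news_text)
			# Draw on an explicit figure; calling st.pyplot() without one is deprecated in newer Streamlit releases
			fig, ax = plt.subplots()
			ax.imshow(wordcloud,interpolation='bilinear')
			ax.axis("off")
			st.pyplot(fig)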

if __name__ == '__main__':
	main()

 

You can check the entire video tutorial here.

To get more on building machine learning and natural language processing apps, you can check out this upcoming course.

Thanks For Your Time

Jesus Saves

By Jesse E.Agbe (JCharis)

 

 

6 thoughts on “Building a News Classifier Machine Learning App with Streamlit”

    1. Hello Bishesh, what kind of problems if I may ask?
      Is it possible to use a virtual environment such as pipenv or virtualenv.
      Installing Pipenv
      pip3 install pipenv

      Setting Up your Virtual environment
      pipenv install streamlit pandas matplotlib

      Hope it helps

      1. Bishesh amatya

        I created virtual environment but when i try to install spacy it shows error… Cannot run ‘rc.exe’

          1. Hi Bishesh,pls are you on windows. If so and you have space you
            can try installing anaconda if you like. It comes with several
            python packages and with that you can install spacy and wordcloud without any issues.
            Please let me know the outcome.
            Thanks

