Predicting Bible Verses and Their Location Using ML with Python
In this tutorial we will learn how to predict whether a particular Bible verse or passage belongs to the Old Testament or the New Testament using machine learning. This is a supervised machine learning approach: we have a set of features and a target label.
The features are built from the text of the Bible verses, and the target label is 0 for the Old Testament and 1 for the New Testament.
Since we are dealing with text documents, we want a machine learning algorithm that performs well on both text classification and binary classification problems.
We will use the Naive Bayes classifier to build our model, since it is a simple and commonly used baseline for text classification. First we need to convert the text into word vectors, using either CountVectorizer (raw word counts) or TfidfVectorizer (term frequency-inverse document frequency weights).
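To make the vectorization step concrete, here is a minimal sketch on two toy sentences. The variable names (`toy_docs`, `count_vec`, `tfidf_vec`) and the sentences are illustrative assumptions, not part of the tutorial's dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Two toy documents (illustrative only, not read from the KJV dataset)
toy_docs = [
    "in the beginning God created the heaven and the earth",
    "in the beginning was the Word",
]

# CountVectorizer lowercases the text and counts each word per document
count_vec = CountVectorizer()
counts = count_vec.fit_transform(toy_docs)   # sparse matrix: documents x vocabulary
the_column = count_vec.vocabulary_["the"]    # column index assigned to the word "the"
print(counts.shape)                          # (number of documents, vocabulary size)
print(counts[0, the_column], counts[1, the_column])

# TfidfVectorizer starts from the same counts, then down-weights words that
# appear in many documents and scales each row to unit length
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(toy_docs)
print(tfidf.shape)
```

Both vectorizers produce a matrix with one row per document and one column per vocabulary word, so either can be passed to the classifier in the same way.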
Requirements
- Python 3.x
- pandas
- scikit-learn
- Our dataset: the King James Version (KJV) of the Bible
Let us start
# Load EDA Packages
import pandas as pd
# Load ML Packages
from sklearn.feature_extraction.text import CountVectorizer
# Note: older scikit-learn releases used sklearn.cross_validation; it has since been replaced by sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
# Load Dataset
df = pd.read_csv("kjv_cleandata1.csv")
df.head()
# EDA
df.columns
df.shape
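# Check for missing values in each column
df.isnull().sum()
# Length (in characters) of the longest verse
df.text.str.len().max()
# Row index of the longest verse
df.text.str.len().idxmax()
# Inspect that row (12826 is the index reported by idxmax above)
df.loc[12826]
df.loc[12826].text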
### Model Building
- Label all Old Testament verses as 0
- Label all New Testament verses as 1
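# Work on a copy so the original DataFrame is left unchanged
df2 = df.copy()
# .loc slices are inclusive: rows 0 to 23144 are the Old Testament
df2.loc[0:23144,'label'] = 0
# The remaining rows (23145 onwards) are the New Testament
df2.loc[23145:,'label'] = 1
df2.head()
# Save the labelled data for later use
df2.to_csv("kjv2mindata.csv")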
Xfeatures = df2['text']
y = df2['label']
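The labelling above relies on a pandas detail: `.loc` slices by index label and includes both endpoints, so `0:23144` covers 23,145 rows. That matches the usual count of Old Testament verses in the KJV, assuming the file holds one verse per row in canonical order (an assumption about the dataset, which is not shown here). A toy sketch of the same pattern, using a hypothetical five-row frame:

```python
import pandas as pd

# Hypothetical five-row frame standing in for the verse table
toy = pd.DataFrame({"text": ["a", "b", "c", "d", "e"]})

# Label-based slicing with .loc includes the end label, so this marks rows 0, 1 and 2
toy.loc[0:2, "label"] = 0
# Everything from row 3 onwards gets the other label
toy.loc[3:, "label"] = 1

# The column is created as float; cast to int once every row has a value
toy["label"] = toy["label"].astype(int)

label_counts = toy["label"].value_counts().to_dict()
print(label_counts)
```

Running `value_counts()` on the real `label` column is a quick way to confirm that the split falls where expected.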
# Feature Extraction
cv = CountVectorizer()
X = cv.fit_transform(Xfeatures)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
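One possible refinement: the code above fits the vectorizer on all verses before splitting, so the vocabulary is learned partly from the test rows. A sketch of the alternative is to split the raw text first and let a scikit-learn pipeline fit the vectorizer on the training part only. It is shown here on invented toy strings; the texts, labels, and names such as `toy_pipeline` are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy strings (not real verses); 0 = Old Testament, 1 = New Testament
toy_texts = [
    "the lord spake unto moses",
    "the children of israel went out",
    "the king of israel built an altar",
    "moses went up unto the mount",
    "jesus said unto his disciples",
    "the gospel of christ was preached",
    "paul wrote unto the church",
    "the disciples followed jesus",
]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Split the raw text first, keeping the class balance in both parts
texts_train, texts_test, labels_train, labels_test = train_test_split(
    toy_texts, toy_labels, test_size=0.25, random_state=42, stratify=toy_labels
)

# The pipeline fits the vectorizer on the training texts only,
# then reuses that vocabulary when transforming the test texts
toy_pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
toy_pipeline.fit(texts_train, labels_train)
toy_predictions = toy_pipeline.predict(texts_test)
print(toy_predictions)
```

With the real data, `df2['text']` and `df2['label']` would take the place of the toy lists.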
# Naive Bayes Classifier (MultinomialNB is already imported above)
clf = MultinomialNB()
clf.fit(X_train,y_train)
clf.score(X_test,y_test)
# Accuracy on the held-out test set
print("Test accuracy:",clf.score(X_test,y_test)*100,"%")
# Accuracy on the training set (compare with the test score to spot overfitting)
print("Training accuracy:",clf.score(X_train,y_train)*100,"%")
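Accuracy is easier to interpret next to a baseline. With the labelling used here, Old Testament rows outnumber New Testament rows, so a model that always answers 0 already scores well. The sketch below shows this on hypothetical toy labels (6 Old, 2 New) and builds a confusion matrix, which separates the two kinds of error.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels: six Old Testament rows, two New Testament rows
toy_true = [0, 0, 0, 0, 0, 0, 1, 1]
# A "model" that ignores its input and always predicts Old Testament
toy_always_old = [0] * len(toy_true)

baseline_accuracy = accuracy_score(toy_true, toy_always_old)
# Rows are true classes, columns are predicted classes, in the order [0, 1]
baseline_matrix = confusion_matrix(toy_true, toy_always_old, labels=[0, 1])

print(baseline_accuracy)
print(baseline_matrix)
```

On the real data the same two calls would take `y_test` and `clf.predict(X_test)`. The second row of the matrix then shows how many New Testament verses were misclassified.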
Predicting a Text
- Whether therefore ye eat, or drink, or whatsoever ye do, do all to the glory of God. (1 Corinthians 10:31)
# Sample 1 prediction
sample_verse = ["Whether therefore ye eat, or drink, or whatsoever ye do, do all to the glory of God"]
vect = cv.transform(sample_verse).toarray()
# Old Testament is 0, New Testament is 1
clf.predict(vect)
### Example
- Isaiah 41:10
sample_verse2 = ["Fear thou not; for I am with thee: be not dismayed; for I am thy God: I will strengthen thee; yea, I will help thee; yea, I will uphold thee with the right hand of my righteousness."]
vect2 = cv.transform(sample_verse2).toarray()
clf.predict(vect2)
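The raw output of `predict` is an array holding 0 or 1. A small helper can turn that into a readable answer together with the model's estimated probability. This is a sketch only: `predict_testament`, `LABEL_NAMES`, and the toy training strings are hypothetical names and data introduced for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

LABEL_NAMES = {0: "Old Testament", 1: "New Testament"}

def predict_testament(verse, vectorizer, model):
    """Return (testament name, estimated probability) for one verse string."""
    features = vectorizer.transform([verse])
    label = int(model.predict(features)[0])
    probabilities = model.predict_proba(features)[0]
    # predict_proba columns follow the order of model.classes_
    confidence = float(probabilities[list(model.classes_).index(label)])
    return LABEL_NAMES[label], confidence

# Tiny toy model trained on invented strings so the sketch runs on its own
toy_texts = [
    "the lord spake unto moses",
    "the children of israel",
    "jesus said unto his disciples",
    "the gospel of christ",
]
toy_labels = [0, 0, 1, 1]
toy_vectorizer = CountVectorizer()
toy_model = MultinomialNB().fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

testament, confidence = predict_testament(
    "moses and the children of israel", toy_vectorizer, toy_model
)
print(testament, round(confidence, 3))
```

With the tutorial's objects, the call would be `predict_testament(sample_verse2[0], cv, clf)`.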
### Save Model
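# sklearn.externals.joblib has been removed from recent scikit-learn releases; import joblib directly
import joblib
# Write the trained classifier to disk; the file is closed automatically
with open("biblepredNV_model.pkl","wb") as biblepredictionNV_model:
    joblib.dump(clf,biblepredictionNV_model)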
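Note that the saved file contains only the classifier. To predict on new text later, the fitted `CountVectorizer` is needed as well, since it holds the vocabulary. One way to keep the two together is to store a single pipeline. The sketch below does this with a toy model, hypothetical names, and a temporary directory:

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy strings so the sketch runs without the KJV file
toy_texts = [
    "the lord spake unto moses",
    "the children of israel",
    "jesus said unto his disciples",
    "the gospel of christ",
]
toy_labels = [0, 0, 1, 1]

# The pipeline bundles the vectorizer and the classifier into one object
toy_model = make_pipeline(CountVectorizer(), MultinomialNB())
toy_model.fit(toy_texts, toy_labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "toy_bible_pipeline.pkl")  # hypothetical file name
    joblib.dump(toy_model, model_path)       # save vectorizer + classifier together
    restored_model = joblib.load(model_path) # load them back as one object

# The restored pipeline accepts raw strings directly
same_predictions = bool(
    (toy_model.predict(toy_texts) == restored_model.predict(toy_texts)).all()
)
print(same_predictions)
```

An alternative that keeps the tutorial's structure is to dump `cv` to a second file next to `biblepredNV_model.pkl` and load both before predicting.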
Download the Full Code here
You can also check the video tutorial here
Thanks For Reading
Jesus Saves