Predicting Location of Bible Passages/Verses Using Machine Learning with Python

In this tutorial we will learn how to predict whether a particular Bible verse or passage comes from the Old Testament or the New Testament using machine learning. This is a supervised machine learning approach: we have a set of features and a target label.

The features are built from the text of the Bible verses, and the target label is 0 for the Old Testament and 1 for the New Testament.

Since we are dealing with text documents, we want a machine learning algorithm that performs well on both text classification and binary classification problems.

We will use a Naive Bayes classifier to build our model, since it works well with text. First we need to convert the text into word vectors, using either scikit-learn's CountVectorizer (raw word counts) or its TfidfVectorizer (term frequency-inverse document frequency weights).
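To make the idea of word vectors concrete, here is a minimal sketch on two invented phrases (they are not rows from the dataset, and the variable names are only for illustration). It shows how CountVectorizer turns each text into a row of word counts and how TfidfVectorizer produces a weighted matrix of the same shape.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Two toy "verses", invented for illustration (not taken from the dataset)
docs = ["the lord is my shepherd", "the lord is good"]

# CountVectorizer: one column per word, each cell is a raw count
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)

# Recover the column order from the learned vocabulary (word -> column index)
vocab = sorted(count_vec.vocabulary_, key=count_vec.vocabulary_.get)
print(vocab)
print(counts.toarray())

# TfidfVectorizer: same columns, but counts are reweighted so that
# words appearing in every document contribute less
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(docs)
print(tfidf.toarray().round(2))
```

Each row is one text and each column is one word, which is the kind of input MultinomialNB expects.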

Requirements

  • Python 3.x
  • scikit-learn
  • pandas
  • Our dataset: the King James Version (KJV) of the Bible as a CSV file

Let us start

In [9]:
# Load EDA Packages
import pandas as pd
In [10]:
# Load ML Packages
from sklearn.feature_extraction.text import CountVectorizer
# Note: scikit-learn releases before 0.18 used sklearn.cross_validation instead of model_selection
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
In [11]:
# Load Dataset
df = pd.read_csv("kjv_cleandata1.csv")
In [12]:
df.head()
Out[12]:
Unnamed: 0 id book chapter verse text
0 0 1001001 Genesis 1 1 In the beginning God created the heaven and th…
1 1 1001002 Genesis 1 2 And the earth was without form, and void; and …
2 2 1001003 Genesis 1 3 And God said, Let there be light: and there wa…
3 3 1001004 Genesis 1 4 And God saw the light, that it was good: and G…
4 4 1001005 Genesis 1 5 And God called the light Day, and the darkness…
In [13]:
# EDA
df.columns
Out[13]:
Index(['Unnamed: 0', 'id', 'book', 'chapter', 'verse', 'text'], dtype='object')
In [14]:
df.shape
Out[14]:
(31103, 6)
In [15]:
# Check for missing values
df.isnull().sum()
Out[15]:
Unnamed: 0    0
id            0
book          0
chapter       0
verse         0
text          0
dtype: int64
In [23]:
# Find the longest verse
df.text.str.len().max()
Out[23]:
528
In [24]:
# Row index of the longest verse
df.text.str.len().idxmax()
Out[24]:
12826
In [25]:
df.loc[12826]
Out[25]:
Unnamed: 0                                                12826
id                                                     17008009
book                                                     Esther
chapter                                                       8
verse                                                         9
text          Then were the king's scribes called at that ti...
Name: 12826, dtype: object
In [26]:
df.loc[12826].text
Out[26]:
"Then were the king's scribes called at that time in the third month, that is, the month Sivan, on the three and twentieth day thereof; and it was written according to all that Mordecai commanded unto the Jews, and to the lieutenants, and the deputies and rulers of the provinces which are from India unto Ethiopia, an hundred twenty and seven provinces, unto every province according to the writing thereof, and unto every people after their language, and to the Jews according to their writing, and according to their language."

Model Building

  • Label all Old Testament verses as 0
  • Label all New Testament verses as 1

In [27]:
# Work on a copy so the original DataFrame is left unchanged
df2 = df.copy()
In [28]:
df2.loc[0:23144,'label'] = 0
In [30]:
df2.loc[23145:,'label'] = 1
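
The hard-coded row positions used here only work while the file keeps its original order. As an alternative sketch, the label could be derived from the id column. This assumes, based on the two ids visible in the outputs (1001001 for Genesis 1:1 and 17008009 for Esther 8:9), that id encodes book × 1,000,000 + chapter × 1,000 + verse. The toy DataFrame and the last three ids below are made up to follow that pattern.

```python
import pandas as pd

# Tiny stand-in for the real DataFrame; the last three ids are hypothetical
toy = pd.DataFrame({
    "id": [1001001, 17008009, 39004006, 40001001, 66022021],
    "book": ["Genesis", "Esther", "Malachi", "Matthew", "Revelation"],
})

# Assumption: the leading digits of id are the book number (1 to 66)
book_number = toy["id"] // 1_000_000

# Books 1-39 form the Old Testament (0); books 40-66 form the New Testament (1)
toy["label"] = (book_number >= 40).astype(int)
print(toy)
```

A side benefit of this approach is that the label column comes out as integers rather than floats.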
In [31]:
df2.head()
Out[31]:
Unnamed: 0 id book chapter verse text label
0 0 1001001 Genesis 1 1 In the beginning God created the heaven and th… 0.0
1 1 1001002 Genesis 1 2 And the earth was without form, and void; and … 0.0
2 2 1001003 Genesis 1 3 And God said, Let there be light: and there wa… 0.0
3 3 1001004 Genesis 1 4 And God saw the light, that it was good: and G… 0.0
4 4 1001005 Genesis 1 5 And God called the light Day, and the darkness… 0.0
In [32]:
# index=False avoids writing the row index as an extra "Unnamed: 0" column
df2.to_csv("kjv2mindata.csv", index=False)
In [33]:
Xfeatures = df2['text']
y = df2['label']
In [34]:
# Feature Extraction 
cv = CountVectorizer()
X = cv.fit_transform(Xfeatures)
In [35]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
In [36]:
# Naive Bayes Classifier (MultinomialNB was already imported above)
clf = MultinomialNB()
clf.fit(X_train,y_train)
clf.score(X_test,y_test)
Out[36]:
0.9158222915042868
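
Note that the vectorizer in this tutorial is fitted on all verses before the split, so the vocabulary also includes words that occur only in the test verses. A common alternative is to split the raw text first and wrap the vectorizer and classifier in a Pipeline, so the vocabulary is learned from the training portion only. The sketch below uses invented phrases rather than the KJV data, so its score says nothing about the real model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in data (invented phrases, not real verses); 0 = Old, 1 = New
texts = [
    "tabernacle altar offering priest",
    "children of israel in the wilderness",
    "the king of judah reigned",
    "burnt offering upon the altar",
    "the apostle wrote to the church",
    "the gospel of grace and faith",
    "disciples followed him in galilee",
    "faith in christ and the gospel",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Split the raw text first, keeping the class balance in both parts
text_train, text_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits CountVectorizer on the training text only
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(text_train, y_train)
print(model.score(text_test, y_test))
```

With the real data, `texts` would be `df2['text']` and `labels` would be `df2['label']`.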
In [37]:
# Accuracy on the held-out test set
print("Accuracy of Model",clf.score(X_test,y_test)*100,"%")
Accuracy of Model 91.58222915042869 %
In [38]:
# Accuracy on the training set (for comparison with the test accuracy)
print("Accuracy of Model",clf.score(X_train,y_train)*100,"%")
Accuracy of Model 93.61773597581458 %
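
Accuracy alone can flatter a model when the classes are unbalanced. From the labelling step, 23,145 of the 31,103 rows are Old Testament, so always answering 0 would already be right about 74% of the time. The sketch below computes that baseline and shows how a confusion matrix and per-class report could be produced. The `y_true` and `y_pred` lists are hypothetical values used only to demonstrate the calls; with the real model they would be `y_test` and `clf.predict(X_test)`.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Class sizes implied by the labelling step: rows 0..23144 are 0, the rest are 1
n_total = 31103
n_old = 23145
n_new = n_total - n_old
baseline_accuracy = n_old / n_total
print(f"Majority-class baseline: {baseline_accuracy:.1%}")

# Hypothetical labels and predictions, only to demonstrate the calls
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_true, y_pred)
print(matrix)

# Precision, recall, and F1 for each testament
print(classification_report(
    y_true, y_pred, target_names=["Old Testament", "New Testament"]
))
```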

Predicting A Text

  • Whether therefore ye eat, or drink, or whatsoever ye do, do all to the glory of God. (1 Corinthians 10:31)
In [39]:
# Sample1 Prediction
sample_verse = ["Whether therefore ye eat, or drink, or whatsoever ye do, do all to the glory of God"]
vect = cv.transform(sample_verse).toarray()
In [40]:
# Old Testament is 0, New Testament is 1
clf.predict(vect)
Out[40]:
array([1.])
In [41]:
# Example: Isaiah 41:10
sample_verse2 = ["Fear thou not; for I am with thee: be not dismayed; for I am thy God: I will strengthen thee; yea, I will help thee; yea, I will uphold thee with the right hand of my righteousness."]
In [42]:
vect2 = cv.transform(sample_verse2).toarray()
In [43]:
clf.predict(vect2)
Out[43]:
array([0.])
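
The two prediction steps (transform, then predict) can be wrapped in a small helper that also reports how confident the model is. This is a minimal sketch: `predict_testament` and `TESTAMENTS` are hypothetical names, not part of the original notebook, and the toy vectorizer and classifier exist only so the example runs on its own. In the tutorial you would pass `cv` and `clf` instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Human-readable names for the two labels used in this tutorial
TESTAMENTS = {0: "Old Testament", 1: "New Testament"}


def predict_testament(text, vectorizer, classifier):
    """Return (testament name, probability of that class) for one text."""
    features = vectorizer.transform([text])
    probabilities = classifier.predict_proba(features)[0]
    best = probabilities.argmax()
    # classes_ may hold floats (0.0, 1.0), so convert before the lookup
    label = int(classifier.classes_[best])
    return TESTAMENTS[label], float(probabilities[best])


# Toy fit on invented phrases so the sketch is self-contained
toy_texts = ["tabernacle altar offering", "apostle church gospel"]
toy_cv = CountVectorizer()
toy_clf = MultinomialNB().fit(toy_cv.fit_transform(toy_texts), [0, 1])

name, prob = predict_testament("the gospel preached in the church", toy_cv, toy_clf)
print(name, round(prob, 2))
```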

Save Model

In [44]:
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib
In [45]:
biblepredictionNV_model = open("biblepredNV_model.pkl","wb")

joblib.dump(clf,biblepredictionNV_model)
In [46]:
biblepredictionNV_model.close()
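
The saved classifier cannot make predictions without the fitted CountVectorizer, because new text has to be mapped to the same columns the model was trained on. One way to handle this is to save both objects together. The sketch below is an illustration under stated assumptions: the file name `bible_model_bundle.pkl` and the dictionary keys are hypothetical, and a toy model stands in for `cv` and `clf`.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy fit on invented phrases; in the tutorial these would be cv and clf
toy_cv = CountVectorizer()
toy_clf = MultinomialNB().fit(
    toy_cv.fit_transform(["tabernacle altar offering", "apostle church gospel"]),
    [0, 1],
)

# A temporary directory keeps the example from leaving files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bible_model_bundle.pkl")  # hypothetical file name

    # Save the vectorizer and classifier together in one file
    joblib.dump({"vectorizer": toy_cv, "classifier": toy_clf}, path)

    # Later, or in another program: load both back
    bundle = joblib.load(path)

restored_cv = bundle["vectorizer"]
restored_clf = bundle["classifier"]
prediction = restored_clf.predict(restored_cv.transform(["gospel church"]))
print(prediction)
```

joblib.dump also accepts a file path directly, so opening and closing the file by hand is optional.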

Download the Full Code here

You can also check the video tutorial here

 

Thanks For Reading

Jesus Saves

 
