Natural Language Processing with Spark NLU

Natural Language Processing (NLP for short) is an exciting and useful field of Data Science. Some applications of NLP include:

  • Text Classification
  • Sentiment Analysis
  • Machine Translation
  • Chatbot Creation
  • Keyword Extraction
  • Named Entity Recognition (NER)
  • etc

With the growth of textual data has come a growth in performant, fine-tuned state-of-the-art (SoTA) models. These models can take several days and considerable compute to train. Fortunately, people in IT and Data Science are generous about open-sourcing their models and work for others to use. In this tutorial we will explore a simple NLP package created by John Snow Labs that offers several pretrained NLP models through a simple interface.

By the end of this tutorial you will have learned:

  • Difference between Spark NLU and Spark NLP
  • How to Perform Sentiment Analysis with Spark NLU
  • Text Classification with Spark NLU
  • Question Classification & Q and A with Spark NLU
  • etc

Let us start.

Difference Between Spark NLU and Spark NLP

Both Spark NLU and Spark NLP were developed by the same company, John Snow Labs, and both are built on top of Apache Spark. The difference is that Spark NLU is a simplified one-liner API that gives you access to several pretrained language models for NLP tasks such as text classification, sentiment analysis, Q&A, and NER. It can be installed with pip via the command

pip install nlu pyspark

Spark NLP, on the other hand, is the full, more robust NLP library. It is available in Python under the name sparknlp.

Overview of Spark NLU

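As we stated earlier, Spark NLU is a simple one-liner API that gives you access to several pretrained models for your task. The general pattern is to load a model by name and then call predict on your text, as the examples in the following sections show.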
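
The load-then-predict pattern can be sketched as a small helper. This is only an illustration: run_nlu and its loader parameter are hypothetical names that are not part of the library, and the helper simply falls back to nlu.load when no loader is given.

```python
from typing import Any, Callable, Optional


def run_nlu(model_ref: str, text: str,
            loader: Optional[Callable[[str], Any]] = None) -> Any:
    """Load a pretrained pipeline by its reference name and run it on `text`.

    `run_nlu` and `loader` are hypothetical names used for illustration only.
    """
    if loader is None:
        # Imported lazily so that nlu and pyspark are only required
        # when the real library is actually used.
        import nlu
        loader = nlu.load
    pipeline = loader(model_ref)    # e.g. 'sentiment' or 'ner'
    return pipeline.predict(text)   # run the loaded pipeline on the text
```

With this helper, run_nlu('sentiment', 'I like coding and writing') does the same work as calling nlu.load('sentiment').predict('I like coding and writing') directly; only the model name changes from task to task.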

Text Classification with Spark NLU

Spark NLU from John Snow Labs has several models for different text classification tasks such as sentiment analysis, emotion classification, and spam classification. There is no separate download step: the same command that loads a model also downloads it. Any time you load a model that is not available on your system, it is downloaded from their servers (model hub). Hence you will need an internet connection and disk space the first time you load a model.

Let us see a simple example

# Load packages
import nlu

# Usage: load the pretrained sentiment model and predict in one line
nlu.load('sentiment').predict('I like coding and writing')

Alternatively, you can load the model once, keep it in a variable, and reuse it:

sentiment_model = nlu.load('sentiment')
sentiment_model.predict('I like coding and writing')
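
Since loading is the expensive step, it can help to keep each loaded model around and reuse it. The sketch below is one possible way to do that, not something the library provides: PipelineCache and its loader argument are hypothetical names, and the class falls back to nlu.load when no loader is passed.

```python
from typing import Any, Callable, Dict, Optional


class PipelineCache:
    """Keep one loaded pipeline per model reference (hypothetical helper)."""

    def __init__(self, loader: Optional[Callable[[str], Any]] = None) -> None:
        self._loader = loader
        self._pipelines: Dict[str, Any] = {}

    def get(self, model_ref: str) -> Any:
        # Load the model only the first time this reference is requested.
        if model_ref not in self._pipelines:
            loader = self._loader
            if loader is None:
                import nlu  # lazy import: only needed for real models
                loader = nlu.load
            self._pipelines[model_ref] = loader(model_ref)
        return self._pipelines[model_ref]
```

Calling cache.get('sentiment') twice on the same PipelineCache returns the same object, so the model is loaded only once per process.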

To list all the available components and supported language models, you can use

import nlu
nlu.print_components()
nlu.print_all_models()

NER and Visualizing NER with Spark NLU

You can also perform Named Entity Recognition with Spark NLU. John Snow Labs also offers powerful clinical NER models for clinical NLP, which are part of their enterprise version. To perform NER you can use

nlu.load('ner').predict('John lives in Accra but works in a remote job at London')
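
For more than one sentence, the same call can be wrapped in a small helper. This is a sketch under assumptions: tag_entities is a hypothetical name, and it assumes that predict accepts a list of strings and an output_level argument, where 'chunk' asks for one row per detected entity. Check the documentation of your installed version before relying on either.

```python
from typing import Any, Iterable, Optional


def tag_entities(texts: Iterable[str], pipeline: Optional[Any] = None) -> Any:
    """Run an NER pipeline over several texts in one call.

    `tag_entities` is a hypothetical helper. `pipeline` is any object with a
    `.predict` method; when omitted, nlu.load('ner') is used.
    """
    if pipeline is None:
        import nlu  # lazy import: only needed when no pipeline is supplied
        pipeline = nlu.load("ner")
    # Assumption: predict takes a list of strings plus `output_level`;
    # 'chunk' requests one output row per entity chunk.
    return pipeline.predict(list(texts), output_level="chunk")
```

Passing an already loaded pipeline avoids reloading the model on every call.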

To visualize your NER results, Spark NLU offers two options: .viz() and .viz_streamlit().

nlu.load('ner').viz('John lives in Accra but works in a remote job at London')

There are other tasks we can perform with John Snow Labs NLU, but we will limit ourselves to these for now.

You can check out the video tutorial below for more

To conclude, Spark NLU makes it easy to perform several NLP tasks with state-of-the-art language and ML models.

Thanks For your Time
Jesus Saves

By Jesse E. Agbe (JCharis)
