Natural Language Processing with Polyglot

In this tutorial we will be exploring another Python NLP package called Polyglot.

Polyglot is a natural language pipeline that supports massive multilingual applications. Its learning curve is similar to TextBlob's, so it is easy to pick up if you already know TextBlob.

Installation on Unix

sudo apt-get install python-numpy libicu-dev
pip install polyglot

Installation on Windows

To install on Windows, you can either use the normal pip method or try the alternative method that follows it.

pip install polyglot

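Or try this method.

Download the PyCLD2 and PyICU wheels that match your Python version (the file names below are for 64-bit Python 3.6) from the page below, then install them together with Morfessor and build Polyglot from source: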

https://www.lfd.uci.edu/~gohlke/pythonlibs/
- pip install pycld2-0.31-cp36-cp36m-win_amd64.whl
- pip install PyICU-1.9.8-cp36-cp36m-win_amd64.whl
- pip install Morfessor-2.0.4-py2.py3-none-any.whl
- git clone https://github.com/aboSamoor/polyglot.git
- cd polyglot
- python setup.py install

Most tasks need a pre-trained model for the language you are working with. Download the ones used in this tutorial:

polyglot download embeddings2.en
polyglot download ner2.en
polyglot download sentiment2.en
polyglot download pos2.en
polyglot download morph2.en
polyglot download transliteration2.ar
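
The model names above appear to follow a `<task>2.<language code>` pattern. As a minimal sketch under that assumption, the hypothetical helper below (it is not part of Polyglot) builds the download commands for another language. It does not check whether a given model actually exists for that language.

```python
# Hypothetical helper, not part of Polyglot: builds "polyglot download"
# commands, assuming model names follow the "<task>2.<lang>" pattern above.
TASKS = ("embeddings", "ner", "sentiment", "pos", "morph")


def download_commands(lang_code, tasks=TASKS):
    """Return one shell command string per task for the given language code."""
    return ["polyglot download {}2.{}".format(task, lang_code) for task in tasks]


# Print the commands so they can be copied into a terminal.
for command in download_commands("en"):
    print(command)
```

Printing the commands rather than running them keeps the sketch free of side effects.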

Uses and Applications

Fundamentals or Basics of NLP
Transliteration
Named Entity Recognition
Sentiment Analysis

Let us begin with Polyglot.

Tokenization

Splitting text into words

In [47]:

# Load packages
import polyglot
from polyglot.text import Text,Word

In [48]:

# Word Tokens
docx = Text(u"He likes reading and painting")

In [49]:

docx.words

Out[49]:

WordList(['He', 'likes', 'reading', 'and', 'painting'])

In [50]:

docx2 = Text(u"He exclaimed, 'what're you doing? Reading?'.")

In [51]:

docx2.words

Out[51]:

WordList(['He', 'exclaimed', ',', "'", "what're", 'you', 'doing', '?', 'Reading', '?', "'", '.'])
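
As the output shows, punctuation marks come back as separate tokens. If only the words are wanted, they can be filtered out afterwards. The sketch below uses a hypothetical helper, `drop_punctuation`, and a plain list copied from the output above so that it runs without Polyglot; passing `docx2.words` instead should behave the same way.

```python
def drop_punctuation(tokens):
    """Keep only tokens that contain at least one letter or digit."""
    return [tok for tok in tokens if any(ch.isalnum() for ch in tok)]


# Token list copied from the output above.
tokens = ['He', 'exclaimed', ',', "'", "what're", 'you', 'doing',
          '?', 'Reading', '?', "'", '.']

print(drop_punctuation(tokens))
# ['He', 'exclaimed', "what're", 'you', 'doing', 'Reading']
```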

In [52]:

# Sentence tokens
docx3 = Text(u"He likes reading and painting.He exclaimed, 'what're you doing? Reading?'.")

In [53]:

docx3.sentences

Out[53]:

[Sentence("He likes reading and painting.He exclaimed, 'what're you doing?"),
 Sentence("Reading?'.")]
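
In the output above the first two sentences were not separated, apparently because there is no space after the period in "painting.He". A simple pre-processing step can restore that space before the text is handed to Polyglot. The sketch below is one assumed way of doing it with a regular expression; `add_space_after_period` is a hypothetical helper, and the rule is crude (it would also split abbreviations such as "U.S.A").

```python
import re


def add_space_after_period(text):
    """Insert a space after a period that is directly followed by a capital letter."""
    return re.sub(r"\.(?=[A-Z])", ". ", text)


raw = "He likes reading and painting.He exclaimed, 'what're you doing? Reading?'."
print(add_space_after_period(raw))
# He likes reading and painting. He exclaimed, 'what're you doing? Reading?'.
```

The cleaned string can then be wrapped in `Text(...)` as before.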

Parts of Speech Tagging

Needs the embeddings and POS models for the language of the text (here embeddings2.en and pos2.en, downloaded earlier)
pos_tags

In [54]:

docx

Out[54]:

Text("He likes reading and painting")

In [55]:

docx.pos_tags

Out[55]:

[('He', 'PRON'),
 ('likes', 'VERB'),
 ('reading', 'VERB'),
 ('and', 'CONJ'),
 ('painting', 'NOUN')]
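
Since `pos_tags` is shown as a list of (word, tag) pairs, it is easy to reorganise. The sketch below groups the words by tag; `group_by_tag` is a hypothetical helper, and the pairs are copied from the output above so the example runs without Polyglot.

```python
from collections import defaultdict


def group_by_tag(tagged_words):
    """Collect words under their part-of-speech tag, preserving word order."""
    grouped = defaultdict(list)
    for word, tag in tagged_words:
        grouped[tag].append(word)
    return dict(grouped)


# Pairs copied from the pos_tags output above.
tags = [('He', 'PRON'), ('likes', 'VERB'), ('reading', 'VERB'),
        ('and', 'CONJ'), ('painting', 'NOUN')]

print(group_by_tag(tags)['VERB'])
# ['likes', 'reading']
```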

Language Detection

polyglot.detect
language.name
language.code

In [56]:

docx

Out[56]:

Text("He likes reading and painting")

In [57]:

docx.language.name

Out[57]:

'English'

In [58]:

docx.language.code

Out[58]:

'en'

In [59]:

from polyglot.detect  import Detector

In [60]:

en_text = "He is a student "
fr_text = "Il est un étudiant"
ru_text = "Он студент"

In [67]:

detect_en = Detector(en_text)
detect_fr = Detector(fr_text)
detect_ru = Detector(ru_text)

Detector is not able to detect the language reliably.
Detector is not able to detect the language reliably.

In [63]:

print(detect_en.language)

name: English     code: en       confidence:  94.0 read bytes:   704

In [66]:

print(detect_fr.language)

name: French      code: fr       confidence:  95.0 read bytes:   870

In [68]:

print(detect_ru.language)

name: Serbian     code: sr       confidence:  95.0 read bytes:   614
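
Note that in the run above the short Russian sample was reported as Serbian with 95% confidence, and the detector warned that it could not detect the language reliably. For very short inputs it is therefore worth checking more than the confidence figure. The sketch below shows one possible guard. It assumes the detector object exposes a `reliable` flag (my understanding of `Detector`, not shown in this tutorial); `trusted_code` and the `DetectedLanguage` stand-in are hypothetical, and the `reliable` values passed in are illustrative rather than taken from the run above.

```python
from collections import namedtuple

# Stand-in for the object returned by Detector(...).language; the field names
# mirror what print(detect_en.language) shows above.
DetectedLanguage = namedtuple("DetectedLanguage", "name code confidence read_bytes")


def trusted_code(language, reliable, min_confidence=90.0):
    """Return the language code only if detection was reliable and confident."""
    if reliable and language.confidence >= min_confidence:
        return language.code
    return None


english = DetectedLanguage("English", "en", 94.0, 704)
serbian = DetectedLanguage("Serbian", "sr", 95.0, 614)

print(trusted_code(english, reliable=True))   # en
print(trusted_code(serbian, reliable=False))  # None
```

With Polyglot installed, the call would look something like `trusted_code(detect_ru.language, detect_ru.reliable)`, assuming that attribute is available.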

Sentiment Analysis

polarity

In [71]:

docx4 = Text(u"He hates reading and playing")

In [69]:

docx

Out[69]:

Text("He likes reading and painting")

In [70]:

docx.polarity

Out[70]:

1.0

In [72]:

docx4.polarity

Out[72]:

-1.0
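
Scores of exactly 1.0 and -1.0 are consistent with a document score that averages per-word polarities (each -1, 0, or +1) over the non-neutral words only. The sketch below illustrates that idea under this assumption. It is not Polyglot's own code, `average_polarity` is a hypothetical helper, and the per-word scores are made up for illustration.

```python
def average_polarity(word_scores):
    """Mean of the non-zero word scores, or 0.0 if every word is neutral."""
    scored = [score for score in word_scores if score != 0]
    if not scored:
        return 0.0
    return sum(scored) / float(len(scored))


# Hypothetical per-word scores: only "likes" / "hates" carries sentiment.
print(average_polarity([0, 1, 0, 0, 0]))   # 1.0
print(average_polarity([0, -1, 0, 0, 0]))  # -1.0
```

Under this scheme a single sentiment-bearing word is enough to push a short sentence to either extreme, so the score says little about how strong the sentiment is.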

Named Entities

entities

In [73]:

docx5 = Text(u"John Jones was a FBI detector")

In [74]:

docx5.entities

Out[74]:

[I-PER(['John', 'Jones']), I-ORG(['FBI'])]
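
Each entity in the output pairs a tag (I-PER, I-ORG) with the words it covers. The sketch below turns such a result into a dictionary of readable names. `entities_by_tag` is a hypothetical helper, and the input is written as plain (tag, words) pairs copied from the output above so that it runs without Polyglot. With Polyglot, the pairs could probably be built with `[(e.tag, list(e)) for e in docx5.entities]`, assuming each entity exposes a `tag` attribute.

```python
def entities_by_tag(entities):
    """Map each tag to a list of entity strings, joining multi-word entities."""
    result = {}
    for tag, words in entities:
        result.setdefault(tag, []).append(" ".join(words))
    return result


# (tag, words) pairs copied from the entities output above.
found = [("I-PER", ["John", "Jones"]), ("I-ORG", ["FBI"])]

print(entities_by_tag(found))
# {'I-PER': ['John Jones'], 'I-ORG': ['FBI']}
```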

Morphology

A morpheme is the smallest grammatical unit in a language.
A morpheme may or may not stand alone, whereas a word, by definition, is freestanding.
morphemes

In [75]:

docx6 = Text(u"preprocessing")

In [76]:

docx6.morphemes

Out[76]:

WordList(['pre', 'process', 'ing'])

Transliteration

In [77]:

# Load 
from polyglot.transliteration import Transliterator
translit = Transliterator(source_lang='en',target_lang='fr')

In [78]:

translit.transliterate(u"working")

Out[78]:

'working'

The output is unchanged here, most likely because English and French share the Latin script: transliteration converts text from one writing system to another rather than translating it.

Thanks, and happy coding!

By Jesse E. Agbe (JCharis)
