Introduction to Natural Language Processing with Polyglot

In this tutorial we will be exploring another Python NLP package called Polyglot.

Polyglot is a natural language pipeline that supports massive multilingual applications. Its learning curve is similar to TextBlob's, so it is quick to pick up if you already know TextBlob.

Installation on Unix

  • sudo apt-get install python-numpy libicu-dev
  • pip install polyglot

Installation on Windows

To install on Windows, you can either use the normal pip method or try the alternative method that follows.

  • pip install polyglot

Or try using this method

Download the PyCLD2 and PyICU From

Whichever platform you use, you will need to download the models that the individual tasks depend on.

  • polyglot download embeddings2.en
  • polyglot download ner2.en
  • polyglot download sentiment2.en
  • polyglot download pos2.en
  • polyglot download morph2.en
  • polyglot download transliteration2.ar
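The model names in the list above follow a `<task>2.<language code>` pattern, so the list for another language can be generated instead of typed by hand. Here is a minimal sketch, assuming the pattern holds for the language you need (not every task has a model for every language). The helpers `model_names`, `download_commands` and `run_downloads` are hypothetical names for this illustration and are not part of Polyglot.

```python
import subprocess

# Task prefixes taken from the download list above.
TASKS = ["embeddings2", "ner2", "sentiment2", "pos2", "morph2"]


def model_names(lang, tasks=TASKS):
    """Build model identifiers such as 'ner2.en' for one language code."""
    return ["{}.{}".format(task, lang) for task in tasks]


def download_commands(lang, tasks=TASKS):
    """Return one `polyglot download <model>` command per task."""
    return [["polyglot", "download", name] for name in model_names(lang, tasks)]


def run_downloads(lang, tasks=TASKS):
    """Run the commands; this needs the polyglot command on your PATH."""
    for cmd in download_commands(lang, tasks):
        subprocess.run(cmd, check=True)


# Only builds and prints the names; nothing is downloaded here.
print(model_names("en"))
```

Call `run_downloads("en")` yourself once Polyglot is installed. It is kept out of the example so that running the snippet never starts a download by surprise.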

Uses and Applications

  • Fundamentals or Basics of NLP
  • Transliteration
  • Named Entity Recognition
  • Sentiment Analysis

Let us begin with Polyglot.

Tokenization

  • Splitting text into words
In [47]:
# Load packages
import polyglot
from polyglot.text import Text, Word
In [48]:
# Word Tokens
docx = Text(u"He likes reading and painting")
In [49]:
docx.words
Out[49]:
WordList(['He', 'likes', 'reading', 'and', 'painting'])
In [50]:
docx2 = Text(u"He exclaimed, 'what're you doing? Reading?'.")
In [51]:
docx2.words
Out[51]:
WordList(['He', 'exclaimed', ',', "'", "what're", 'you', 'doing', '?', 'Reading', '?', "'", '.'])
In [52]:
# Sentence tokens
docx3 = Text(u"He likes reading and painting.He exclaimed, 'what're you doing? Reading?'.")
In [53]:
docx3.sentences
Out[53]:
[Sentence("He likes reading and painting.He exclaimed, 'what're you doing?"),
 Sentence("Reading?'.")]
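Notice that the first sentence in Out[53] still contains "painting.He". The splitter most likely did not break there because no space follows the period. One way to deal with it is to clean the text before building the `Text` object. This is a small sketch using only the standard library; `add_space_after_period` is a made-up helper name, and the rule is deliberately naive (it would also split abbreviations such as "U.S.A").

```python
import re


def add_space_after_period(text):
    """Insert a space after '.', '!' or '?' when a capital letter follows directly."""
    return re.sub(r"([.!?])(?=[A-Z])", r"\1 ", text)


raw = "He likes reading and painting.He exclaimed, 'what're you doing? Reading?'."
cleaned = add_space_after_period(raw)
print(cleaned)
```

You can then pass `cleaned` to `Text(...)` and check `.sentences` again.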
Parts of Speech Tagging

  • polyglot download embeddings2.en pos2.en
  • pos_tags
In [54]:
docx
Out[54]:
Text("He likes reading and painting")
In [55]:
docx.pos_tags
Out[55]:
[('He', 'PRON'),
 ('likes', 'VERB'),
 ('reading', 'VERB'),
 ('and', 'CONJ'),
 ('painting', 'NOUN')]
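Since `pos_tags` gives back plain (word, tag) pairs, you can work with them like any other Python list. As a sketch, the pairs from Out[55] are copied in by hand below so the snippet runs without Polyglot; `words_by_tag` is a hypothetical helper name.

```python
from collections import defaultdict

# The (word, tag) pairs shown in Out[55].
pos_tags = [
    ("He", "PRON"),
    ("likes", "VERB"),
    ("reading", "VERB"),
    ("and", "CONJ"),
    ("painting", "NOUN"),
]


def words_by_tag(tagged):
    """Group words under their part-of-speech tag, keeping the original order."""
    groups = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


grouped = words_by_tag(pos_tags)
print(grouped["VERB"])
```

With Polyglot installed you would pass `docx.pos_tags` instead of the hand-copied list.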

Language Detection

  • polyglot.detect
  • language.name
  • language.code
In [56]:
docx
Out[56]:
Text("He likes reading and painting")
In [57]:
docx.language.name
Out[57]:
'English'
In [58]:
docx.language.code
Out[58]:
'en'
In [59]:
from polyglot.detect import Detector
In [60]:
en_text = "He is a student "
fr_text = "Il est un étudiant"
ru_text = "Он студент"
In [67]:
detect_en = Detector(en_text)
detect_fr = Detector(fr_text)
detect_ru = Detector(ru_text)
Detector is not able to detect the language reliably.
Detector is not able to detect the language reliably.
In [63]:
print(detect_en.language)
name: English     code: en       confidence:  94.0 read bytes:   704
In [66]:
print(detect_fr.language)
name: French      code: fr       confidence:  95.0 read bytes:   870
In [68]:
print(detect_ru.language)
name: Serbian     code: sr       confidence:  95.0 read bytes:   614
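The Russian sentence was reported as Serbian with 95.0 confidence, so the confidence value alone is not enough to trust a guess on short text. As far as I know, `Detector` also exposes a `reliable` flag and accepts `quiet=True` to silence the warning. The sketch below combines the flag with a confidence threshold. `summarize` and `min_confidence` are hypothetical names, and a stand-in object is used so the snippet runs without Polyglot.

```python
from types import SimpleNamespace


def summarize(detector, min_confidence=90.0):
    """Return the top guess plus a flag saying whether to trust it."""
    lang = detector.language
    trusted = bool(detector.reliable) and lang.confidence >= min_confidence
    return {
        "name": lang.name,
        "code": lang.code,
        "confidence": lang.confidence,
        "trusted": trusted,
    }


# Stand-in mirroring the Russian example above. reliable=False is an
# assumption for illustration: the notebook output does not say which
# two of the three inputs triggered the warning.
fake_detector = SimpleNamespace(
    reliable=False,
    language=SimpleNamespace(name="Serbian", code="sr", confidence=95.0),
)
print(summarize(fake_detector))

# With Polyglot installed (assumed usage):
# from polyglot.detect import Detector
# print(summarize(Detector("Он студент", quiet=True)))
```

Longer input text generally gives the detector more to work with, so try a full paragraph before relying on the result.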
Sentiment Analysis

  • polarity
In [71]:
docx4 = Text(u"He hates reading and playing")
In [69]:
docx
Out[69]:
Text("He likes reading and painting")
In [70]:
docx.polarity
Out[70]:
1.0
In [72]:
docx4.polarity
Out[72]:
-1.0
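Polyglot's sentiment lexicons score each word as +1 (positive), -1 (negative) or 0 (neutral). My understanding is that the text polarity is then the average of the non-zero word scores, which would explain why one positive or negative word is enough to give 1.0 or -1.0 above. The sketch below illustrates that idea with a toy two-word lexicon. It is an assumption about the aggregation, not Polyglot's actual code, and `text_polarity` and `toy_lexicon` are made-up names.

```python
def text_polarity(words, lexicon):
    """Average the non-zero word scores; return 0.0 when no word carries sentiment."""
    scores = [lexicon.get(word.lower(), 0) for word in words]
    scores = [score for score in scores if score != 0]
    if not scores:
        return 0.0
    return sum(scores) / float(len(scores))


# Toy lexicon for illustration only; the real one comes from sentiment2.en.
toy_lexicon = {"likes": 1, "hates": -1}

positive = text_polarity(["He", "likes", "reading", "and", "painting"], toy_lexicon)
negative = text_polarity(["He", "hates", "reading", "and", "playing"], toy_lexicon)
print(positive, negative)
```

A sentence with one positive and one negative word would average out to 0.0 under this scheme, so treat the score as a rough signal.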

Named Entities

  • entities
In [73]:
docx5 = Text(u"John Jones was a FBI detector")
In [74]:
docx5.entities
Out[74]:
[I-PER(['John', 'Jones']), I-ORG(['FBI'])]
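Each entity in the result behaves like a list of words with a `tag` attribute (I-PER for persons, I-ORG for organizations, I-LOC for locations). A small sketch of collecting the entities per tag follows. `FakeChunk` is a stand-in class written for this example so it runs without Polyglot, and `entities_by_tag` is a hypothetical helper name.

```python
class FakeChunk(list):
    """Stand-in for a Polyglot entity: a list of words with a .tag attribute."""

    def __init__(self, tag, words):
        super().__init__(words)
        self.tag = tag


def entities_by_tag(entities):
    """Map each tag to the entity strings found under it."""
    grouped = {}
    for entity in entities:
        grouped.setdefault(entity.tag, []).append(" ".join(entity))
    return grouped


# Mirrors Out[74].
entities = [FakeChunk("I-PER", ["John", "Jones"]), FakeChunk("I-ORG", ["FBI"])]
people_and_orgs = entities_by_tag(entities)
print(people_and_orgs)
```

With Polyglot installed you would pass `docx5.entities` in place of the hand-built list.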

Morphology

  • A morpheme is the smallest grammatical unit in a language.
  • A morpheme may or may not stand alone, whereas a word, by definition, is freestanding.
  • morphemes
In [75]:
docx6 = Text(u"preprocessing")
In [76]:
docx6.morphemes
Out[76]:
WordList(['pre', 'process', 'ing'])

Transliteration

In [77]:
# Load the transliterator
from polyglot.transliteration import Transliterator
translit = Transliterator(source_lang='en', target_lang='fr')
In [78]:
translit.transliterate(u"working")
Out[78]:
'working'

Thanks, and Happy Coding

By Jesse E. Agbe (JCharis)
