Exploratory data analysis (EDA) is an important part of any data science project. It forms the initial steps before moving on to the machine learning stages.

In this tutorial we will explore the drug review dataset in detail using Python. When doing EDA, it is recommended to keep in mind the basic questions you want your dataset to answer. These questions will guide which analyses to use and how deeply to explore the data for more insight. In our case we will group our questions under the following topics:

Drugs
Reviews
Ratings
Conditions
Combinations

We will be using the dataset from the UCI Machine Learning Repository, which already has some basic information about what we will be doing.

By the end of this tutorial you will learn about

The various libraries to use for EDA
Descriptive analytics
How to do value counts
How to generate some plots for more insights
How to classify drugs based on their suffixes
How to do sentiment analysis on drug reviews
How to find and identify genuine reviews
Time series analysis on drug reviews and ratings
Distribution analysis
And more

You can get the entire code on GitHub here.

Let us start.

Data Science EDA Project From Scratch with Python

Tools & Libraries
- EDA: Pandas
- Viz: Seaborn, Matplotlib
- NLP: spaCy, TextBlob, NeatText
- ML: sklearn, xgboost, pycaret

Data Source

https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Drugs.com%29

Attributes

drugName (categorical): name of drug
condition (categorical): name of condition
review (text): patient review
rating (numerical): 10 star patient rating
date (date): date of review entry
usefulCount (numerical): number of users who found review useful

Questions

Types of questions we can ask (drugs, reviews, ratings, conditions, time, genuineness, etc.):
What is the most popular drug?
What are the groups/classifications of drugs used?
Which drug has the best reviews?
How many drugs do we have?
How many drugs are there per condition?
How many patients searched for a particular drug?
How genuine is a review? (Using sentiment analysis)
How many reviews are positive, negative or neutral?
What is the correlation between the rating, the review and the number of users who found the review useful?
Can you predict the rating from the review?
What is the distribution of ratings?
How many reviews were made per year and per month?
Which condition has the most drug reviews?

In [2]:

# Load EDA Pkgs
import pandas as pd
import numpy as np

In [3]:

# Load Data Viz
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [4]:

# Load Sentiment Pkgs
from textblob import TextBlob

Question on Drugs

How many drugs do we have?
What is the most popular drug?
What are the groups/classifications of drugs used?
Which drug has the best reviews?
How many drugs are there per condition?
How many patients searched for a particular drug?

In [27]:

# Load Dataset
df = pd.read_csv("drugsCom_raw/drugsComTrain_raw.tsv",sep='\t')
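
As a side note on loading, the first column of the file has no name and the dates are stored as text. The sketch below shows one possible way to handle both on a tiny in-memory sample. The sample rows and the placeholder review text are assumptions made for illustration, not the real file contents.

```python
import io
import pandas as pd

# Toy stand-in for the TSV file; the review text is a placeholder, not real data
sample_tsv = (
    "\tdrugName\tcondition\treview\trating\tdate\tusefulCount\n"
    "206461\tValsartan\tLeft Ventricular Dysfunction\tplaceholder review\t9.0\tMay 20, 2012\t27\n"
    "95260\tGuanfacine\tADHD\tplaceholder review\t8.0\tApril 27, 2010\t192\n"
    "138000\tOrtho Evra\tBirth Control\tplaceholder review\t8.0\tNovember 3, 2015\t10\n"
)

# index_col=0 uses the unnamed first column as the index instead of keeping 'Unnamed: 0'
sample_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t", index_col=0)

# Convert date strings such as "May 20, 2012" to datetimes for later time-based analysis
sample_df["date"] = pd.to_datetime(sample_df["date"], format="%B %d, %Y")

print(sample_df.dtypes)
```

The rest of this tutorial keeps the default loading, so the 'Unnamed: 0' column still appears in the outputs that follow.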

In [28]:

# Preview Dataset
df.head()

Out[28]:

	Unnamed: 0	drugName	condition	review	rating	date	usefulCount
0	206461	Valsartan	Left Ventricular Dysfunction	“It has no side effect, I take it in combinati…	9.0	May 20, 2012	27
1	95260	Guanfacine	ADHD	“My son is halfway through his fourth week of …	8.0	April 27, 2010	192
2	92703	Lybrel	Birth Control	“I used to take another oral contraceptive, wh…	5.0	December 14, 2009	17
3	138000	Ortho Evra	Birth Control	“This is my first time using any form of birth…	8.0	November 3, 2015	10
4	35696	Buprenorphine / naloxone	Opiate Dependence	“Suboxone has completely turned my life around…	9.0	November 27, 2016	37

In [29]:

# Columns
df.columns

Out[29]:

Index(['Unnamed: 0', 'drugName', 'condition', 'review', 'rating', 'date',
       'usefulCount'],
      dtype='object')

In [30]:

# Missing Values
df.isnull().sum()

Out[30]:

Unnamed: 0       0
drugName         0
condition      899
review           0
rating           0
date             0
usefulCount      0
dtype: int64

Narrative

All of the missing values (899) are in the condition column
This suggests that some reviewers did not know the name of their condition or left it out for privacy reasons
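
To put the missing values in perspective, it can help to express them as a share of all rows. The sketch below uses the 899 missing conditions reported above and the 161297 rows of the training set, then repeats the same check on a small made-up frame. The `toy` frame and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Counts taken from the outputs in this tutorial: 899 missing out of 161297 rows
missing_pct = 899 / 161297 * 100
print(f"{missing_pct:.2f}% of rows have no condition")

# Toy frame (made-up rows) showing the same check on a DataFrame
toy = pd.DataFrame({
    "drugName": ["A", "B", "C", "D"],
    "condition": ["Pain", np.nan, "Acne", "Pain"],
})

# Fraction of missing values per column
missing_share = toy.isnull().mean()

# One option among several: drop the rows that have no condition
toy_clean = toy.dropna(subset=["condition"])
```

Since well under 1% of the rows are affected, dropping them or keeping them as a separate "unknown" group are both reasonable choices here.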

Question on Drugs

How many drugs do we have?

In [32]:

# How many drugs do we have?
df['drugName'].unique().tolist()

Out[32]:

['Valsartan',
 'Guanfacine',
 'Lybrel',
 'Ortho Evra',
 'Buprenorphine / naloxone',
 'Cialis',
 'Levonorgestrel',
 'Aripiprazole',
 'Keppra',
 'Ethinyl estradiol / levonorgestrel',
 'Topiramate',
 'L-methylfolate',
 'Pentasa',
 'Dextromethorphan',
 'Nexplanon',
 'Liraglutide',
 'Trimethoprim',
 'Amitriptyline',
 'Lamotrigine',
 'Nilotinib',
 'Atripla',
 'Trazodone',
 'Etonogestrel',
 'Etanercept',
 'Tioconazole',
 'Azithromycin',
 'Eflornithine',
 'Daytrana',
 'Ativan',
 'Imitrex',
 'Sertraline',
 'Toradol',
 'Viberzi',
 'Mobic',
 'Dulcolax',
 'Morphine',
 'MoviPrep',
 'Trilafon',
 'Fluconazole',
 'Contrave',
 'Clonazepam',
 'Metaxalone',
 'Venlafaxine',
 'Ledipasvir / sofosbuvir',
 'Symbyax',
 'Tamsulosin',
 'Doxycycline',
 'Dulaglutide',
 'Intuniv',
 'Buprenorphine',
 'Qvar',
 'Opdivo',
 'Pyridium',
 'Latuda',
 'Bupropion',
 'Implanon',
 'Effexor XR',
 'Drospirenone / ethinyl estradiol',
 'NuvaRing',
 'Prepopik',
 'Tretinoin',
 'Gildess Fe 1 / 20',
 'Ethinyl estradiol / norgestimate',
 'Elbasvir / grazoprevir',
 'Clomiphene',
 'Docusate / senna',
 'Amitiza',
 'Sildenafil',
 'Lo Loestrin Fe',
 'Oxcarbazepine',
 'Wellbutrin',
 "Phillips' Milk of Magnesia",
 'Nature-Throid',
 'Lithium',
 'Oxycodone',
 'Estradiol',
 'Sronyx',
 'Augmentin XR',
 'Monistat 7-Day Combination Pack',
 'Plan B One-Step',
 'Alprazolam',
 'Fluoxetine',
 'Spironolactone',
 'Fluvoxamine',
 'Macrobid',
 'Lurasidone',
 'Adapalene / benzoyl peroxide',
 'Brimonidine',
 'Amlodipine / olmesartan',
 'Loestrin 24 Fe',
 'Linaclotide',
 'Mirtazapine',
 'Acetaminophen / hydrocodone',
 'Isotretinoin',
 'Ropinirole',
 'Zoledronic acid',
 'Lamictal',
 'Buspirone',
 'Propranolol',
 'Focalin',
 'Jolivette',
 'Levofloxacin',
 'Phentermine / topiramate',
 'Cephalexin',
 'Aviane',
 'Saxenda',
 'Clomipramine',
 'Medroxyprogesterone',
 'Aczone',
 'Nicoderm CQ',
 'Naltrexone',
 'Restasis',
 'Depo-Provera',
 'Olanzapine',
 'Oxytrol',
 'Fentanyl',
 'Epiduo',
 'Accutane',
 'Xanax',
 'Desvenlafaxine',
 'Urea',
 'Lyrica',
 'Phenergan',
 'Loestrin 21 1 / 20',
 'Loratadine',
 'Cardura XL',
 'Viibryd',
 'Mirena',
 'Ethinyl estradiol / norelgestromin',
 'Propofol',
 'Camphor / menthol',
 'Hydroxychloroquine',
 'Lorcaserin',
 'Insulin degludec',
 'Trintellix',
 'Lupron Depot',
 'Zanaflex',
 'Miconazole',
 'Opana ER',
 'Provera',
 'Diflucan',
 'Ibrance',
 'Reclipsen',
 'Lisinopril',
 'Empagliflozin',
 'Naproxen',
 'Amoxicillin / clarithromycin / lansoprazole',
 'Metoprolol',
 'Naloxegol',
 'Skyla',
 'Leuprolide',
 'Ulipristal',
 'Benzonatate',
 'Sulfamethoxazole / trimethoprim',
 'Eletriptan',
 'Escitalopram',
 'Dulera',
 'Prempro',
 'Gemfibrozil',
 'Depakote',
 'Testosterone',
 'Zomig',
 'Vyvanse',
 'Solodyn',
 'Efavirenz / emtricitabine / tenofovir',
 'Methimazole',
 'Ortho Tri-Cyclen',
 'Aleve',
 'Tylenol with Codeine #3',
 'Victoza',
 'Lubiprostone',
 'Ethinyl estradiol / norethindrone',
 'Sovaldi',
 'Pristiq',
 'Temozolomide',
 'Nabumetone',
 'Meloxicam',
 'Cevimeline',
 'ProAir RespiClick',
 'Gabapentin',
 'Relpax',
 'Levomilnacipran',
 'Yaz',
 'Valtrex',
 'Clindamycin',
 'BuSpar',
 'Plan B',
 'Trolamine salicylate',
 'Lisdexamfetamine',
 'Qsymia',
 'Rizatriptan',
 'Ziana',
 'Boudreaux Butt Paste',
 'Cymbalta',
 'Zoloft',
 'Tizanidine',
 'Gastrocrom',
 'Seasonique',
 'Amphetamine / dextroamphetamine',
 'Liletta',
 'Exenatide',
 'Paroxetine',
 'Bontril Slow Release',
 'Levothroid',
 'Carbamazepine',
 'Adipex-P',
 'Bydureon',
 'Bupropion / naltrexone',
 'Voltaren-XR',
 'Pimecrolimus',
 'Acetaminophen / oxycodone',
 'Monistat 7',
 'Pramipexole',
 'AndroGel',
 'Nitrofurantoin',
 'Metronidazole',
 'Ziprasidone',
 'Acetaminophen / butalbital / caffeine',
 'Nuvigil',
 'Moxifloxacin',
 'Methadone',
 'Celecoxib',
 'Aspirin / butalbital / caffeine',
 'Montelukast',
 'Saliva substitutes',
 'Atomoxetine',
 'Anastrozole',
 'Phenol',
 'Duloxetine',
 'Magnesium sulfate / potassium sulfate / sodium sulfate',
 'Lansoprazole',
 'Nardil',
 'Milnacipran',
 'Oxymorphone',
 'Acetaminophen / aspirin / caffeine',
 'Levora',
 'ParaGard',
 'Levaquin',
 'Ciprofloxacin',
 'Avelox',
 'Acidophilus',
 'Metformin',
 'Terconazole',
 'Saphris',
 'Augmentin',
 'Lexapro',
 'Tamiflu',
 'Prazosin',
 'Liothyronine',
 'Seroquel',
 'Terbinafine',
 'Valium',
 'Norco',
 'Progesterone',
 'Concerta',
 'Ocella',
 'Strattera',
 'Mylanta',
 'TriNessa',
 'Goserelin',
 'Quetiapine',
 'Testim',
 'Emend',
 'Methylphenidate',
 'Acyclovir',
 'Linzess',
 'Orthovisc',
 'Silodosin',
 'Metoclopramide',
 'Indomethacin',
 'Copper',
 'Meclizine',
 'Gilenya',
 'Microgestin Fe 1 / 20',
 'Klonopin',
 'Codeine / guaifenesin',
 'Citalopram',
 'Colazal',
 'Fiorinal with Codeine',
 'Zolpidem',
 'Wellbutrin XL',
 'Climara Pro',
 'Clarithromycin',
 'Bactrim DS',
 'Varenicline',
 'Amoxicillin / clavulanate',
 'Tolterodine',
 'Hydroxyzine',
 'Blisovi Fe 1 / 20',
 'Ramelteon',
 'Infliximab',
 'Rabeprazole',
 'Dexilant',
 'Immune globulin oral',
 'Nucynta',
 'Hysingla ER',
 'Pantoprazole',
 'Sprycel',
 'Tri-Sprintec',
 'Doxepin',
 'Zofran',
 'Versed',
 'Cipro',
 'MS Contin',
 'Avonex',
 'Focalin XR',
 'Junel Fe 1 / 20',
 'Imdur',
 'Diazepam',
 'Bisacodyl',
 'Nortrel 1 / 35',
 'Suvorexant',
 'Risperidone',
 'Sprintec',
 'Risperdal',
 'Simponi',
 'Euflexxa',
 'Diclofenac',
 'Dimenhydrinate',
 'Benztropine',
 'Ortho Tri-Cyclen Lo',
 'Ibuprofen',
 'Sumatriptan',
 'Polyethylene glycol 3350',
 'Xenical',
 'Glatiramer',
 'Atarax',
 'Orsythia',
 'Alirocumab',
 'Sublimaze',
 'Ethinyl estradiol / etonogestrel',
 'femhrt',
 'Lortab',
 'Haldol',
 'Tapentadol',
 'Ketorolac',
 'Phentermine',
 'Effexor',
 'Remicade',
 'Flexeril',
 'SMZ-TMP DS',
 'Dutasteride',
 'Thyroid desiccated',
 'Opana',
 'Zyprexa',
 'Vedolizumab',
 'Chantix',
 'Suboxone',
 'Mirapex',
 'Uribel',
 'Vicks Sinex Nasal Spray (old formulation)',
 'Bactrim',
 'Depo-Provera Contraceptive',
 'Ceftriaxone',
 'Ranexa',
 'Trulicity',
 'Percocet',
 'Sitagliptin',
 'Formoterol / mometasone',
 'Ambien',
 'Belviq',
 'Retapamulin',
 'Vigamox',
 'Levetiracetam',
 'Elidel',
 'Abilify Discmelt',
 'Delsym 12 Hour Cough Relief',
 'Baclofen',
 'Halobetasol',
 'Azelastine / fluticasone',
 'Drysol',
 'Alavert D-12 Hour Allergy and Sinus',
 'Cimzia',
 'Rexulti',
 'Proctofoam',
 'Alli',
 'Cefdinir',
 'Tramadol',
 'Norflex',
 'Humira',
 'Tecfidera',
 'Acetaminophen / dichloralphenazone / isometheptene mucate',
 'Benzoyl peroxide / clindamycin',
 'Methyl salicylate',
 'Monistat 3-Day Combination Pack',
 'Clindamycin / tretinoin',
 'Flonase',
 'Norethindrone',
 'Alvesco',
 'Nystatin',
 'EContra EZ',
 'TriCor',
 'Diphenhydramine',
 'Neulasta',
 'Zolmitriptan',
 'Minastrin 24 Fe',
 'Levothyroxine',
 'Mononessa',
 'Differin',
 'Ibandronate',
 'Zithromax',
 'Compazine',
 'Topamax',
 'Ustekinumab',
 'Minocycline',
 'Ultram ER',
 'Nortriptyline',
 'Pregabalin',
 'Suprep Bowel Prep Kit',
 'Armour Thyroid',
 'Jolessa',
 'Mirvaso',
 'Atralin',
 'Crestor',
 'Rozerem',
 'Cryselle',
 'Sucralfate',
 'Efinaconazole',
 'Cetirizine',
 'Amoxicillin',
 'Soma',
 'Neupro',
 'Valacyclovir',
 'Toprol-XL',
 'Sodium oxybate',
 'Mesalamine',
 'Orlistat',
 'Butorphanol',
 'Humatrope',
 'Diltiazem',
 'Hydrocodone',
 'Ritalin',
 'Kapvay',
 'Prozac',
 'Vicodin',
 'Falmina',
 'Relafen',
 'Restoril',
 'Frovatriptan',
 'Losartan',
 'Sharobel',
 'Xyrem',
 'Apri',
 'ella',
 'Spiriva',
 'Tasigna',
 'Dupixent',
 'Lorazepam',
 'Cyproheptadine',
 'Repatha',
 'Docusate',
 'Hydrochlorothiazide',
 'Scopolamine',
 'Flurbiprofen',
 'Femara',
 'Methotrexate',
 'Hypercare',
 'Epinephrine',
 'Brimonidine / timolol',
 'Mucinex D',
 'Azor',
 'Voltaren',
 'Vortioxetine',
 'Velivet',
 'Necon 1 / 35',
 'Seroquel XR',
 'Detrol LA',
 'Sporanox',
 'Febuxostat',
 'Eluxadoline',
 'Neurontin',
 'Ondansetron',
 'Gardasil',
 'Avastin',
 'Paliperidone',
 'Nora-Be',
 'Penicillin v potassium',
 'Tofacitinib',
 "St. john's wort",
 'Rythmol SR',
 'Vytorin',
 'Dilaudid',
 'Vilazodone',
 'Librium',
 'Duac',
 'Dapsone',
 'Vicodin ES',
 'Hydromorphone',
 'Zyban',
 'Ranitidine',
 'Arava',
 'Acetaminophen / caffeine',
 'Adderall',
 'Ruconest',
 'Pseudoephedrine / triprolidine',
 'Etodolac',
 'Emsam',
 'Diovan',
 'Ammonium lactate',
 'Desmopressin',
 'Zepatier',
 'Magnesium hydroxide',
 'Butrans',
 'Botox',
 'Azelastine',
 'Xalkori',
 'Zyvox',
 'Rocephin',
 'Kyleena',
 'Toujeo',
 'Dexlansoprazole',
 'Brexpiprazole',
 'Roxicodone Intensol',
 'Reglan',
 'Divalproex sodium',
 'Tindamax',
 'Camrese',
 'MetroGel-Vaginal',
 'Tadalafil',
 'Flecainide',
 'Junel Fe 1.5 / 30',
 'Flagyl',
 'Ovace Plus',
 'Secobarbital',
 'Empagliflozin / linagliptin',
 'Raltegravir',
 'Tavaborole',
 'Ampyra',
 'Celebrex',
 'Colchicine',
 'Geodon',
 'Aluminum chloride hexahydrate',
 'Tryptophan',
 'Myrbetriq',
 'Malarone',
 'Fetzima',
 'Omeprazole',
 'Fluoride',
 'Jardiance',
 'Turmeric',
 'Acetaminophen / butalbital / caffeine / codeine',
 'Desogestrel / ethinyl estradiol',
 'Secukinumab',
 'Ethinyl estradiol / norgestrel',
 'Nifedipine',
 'Celexa',
 'Prednisone',
 'Methocarbamol',
 'Haldol Decanoate',
 'Beyaz',
 'Taclonex',
 'Decadron',
 'Vardenafil',
 'Oxazepam',
 'Dexmethylphenidate',
 'Firmagon',
 'Phenazopyridine',
 'Tiotropium',
 'Savella',
 'Cataflam',
 'Cobicistat / elvitegravir / emtricitabine / tenofovir',
 'Ramipril',
 'Relistor',
 'Paxil',
 'Stelara',
 'Cambia',
 'Ezetimibe',
 'Mefenamic acid',
 'Budesonide / formoterol',
 'Doryx',
 'Dymista',
 'Omalizumab',
 'Conjugated estrogens',
 'Lunesta',
 'Mometasone',
 'Phenylephrine',
 'VESIcare',
 'Kapidex',
 'Errin',
 'Lomotil',
 'Clomid',
 'Clozapine',
 'Olopatadine',
 'Narcan Injection',
 'Mirabegron',
 'Wellbutrin SR',
 'Cyclobenzaprine',
 'Tinidazole',
 'Asenapine',
 'Penicillin VK',
 'Oxymetazoline',
 'EpiCeram',
 'Temazepam',
 'Oxybutynin',
 'Armodafinil',
 'Epclusa',
 'Dalfampridine',
 'OnabotulinumtoxinA',
 'Doxylamine / pyridoxine',
 'Estarylla',
 'Vancomycin',
 'Naproxen / sumatriptan',
 'Fastin',
 'Protonix',
 'Bazedoxifene / conjugated estrogens',
 'Chloral hydrate',
 'Lialda',
 'Maxalt',
 'Denosumab',
 'Boniva',
 'Sklice',
 'Acetazolamide',
 'Clinpro 5000',
 'Zelapar',
 'Desloratadine',
 'Docosanol',
 'Acetaminophen',
 'Chaparral',
 'Hyaluronan',
 'Polyethylene glycol 3350 with electrolytes',
 'Mestranol / norethindrone',
 'Melatonin',
 'Symbicort',
 'Lutera',
 'Emollients',
 'Colesevelam',
 'Zyclara',
 'Aspirin / caffeine',
 'Abreva',
 'Isocarboxazid',
 'Correctol',
 'Plecanatide',
 'Xarelto',
 'Prevacid',
 'Simvastatin',
 'Carbatrol',
 'Dapagliflozin',
 'Tresiba',
 'Oracea',
 'Abilify',
 'Meperidine',
 'Tamoxifen',
 'Harvoni',
 'Imiquimod',
 'Trospium',
 'Limbitrol',
 'Zegerid',
 'Diethylpropion',
 'Limbrel',
 'Verapamil',
 'Premarin',
 'Pentazocine',
 'Apremilast',
 'Ciclesonide',
 'Canagliflozin',
 'Megestrol',
 'LoSeasonique',
 'Otezla',
 'Zenzedi',
 'Phosphorated carbohydrate solution',
 'Influenza virus vaccine, live, trivalent',
 'Deplin',
 'Methylprednisolone',
 'Invega',
 'Cutar',
 'Serzone',
 'Biaxin XL',
 'Coreg',
 'Ortho Cyclen',
 'Lorcet 10 / 650',
 'Letrozole',
 'Cefuroxime',
 'Sectral',
 'Belladonna / opium',
 'Flomax',
 'My Way',
 'Belsomra',
 'Adapalene',
 'Promethazine',
 'Fentanyl Transdermal System',
 'Desoxyn',
 'Tegretol',
 'Latisse',
 'Oseltamivir',
 'Kombiglyze XR',
 'Minoxidil',
 'Enbrel',
 'Adalimumab',
 'Xulane',
 'Elavil',
 'Endocet',
 'Unisom SleepGels',
 'Invokana',
 'Naphazoline',
 'Hydrochlorothiazide / telmisartan',
 'Mycophenolate mofetil',
 'Eucrisa',
 'Biltricide',
 'Bystolic',
 'Ibuprofen / pseudoephedrine',
 'Alesse',
 'Bisoprolol / hydrochlorothiazide',
 'Fexofenadine',
 'Fentora',
 'Guaifenesin',
 'Modafinil',
 'Kadian',
 'Dexamethasone',
 'Atropine / diphenoxylate',
 'Metformin / sitagliptin',
 'Fluorouracil',
 'Clobetasol',
 'Commit',
 'Tri-Lo-Sprintec',
 'Guaifenesin / phenylephrine',
 'Dexbrompheniramine / pseudoephedrine',
 'FreshKote',
 'Racepinephrine',
 'Keflex',
 'Fluticasone',
 'Levemir',
 'Alprostadil',
 'Carbidopa / levodopa',
 'Tranexamic acid',
 'Esomeprazole',
 'Voltaren Gel',
 'Adderall XR',
 'Hydrocortisone',
 'Remeron',
 'Genvoya',
 'Podofilox',
 'Tri-Previfem',
 'Atorvastatin',
 'Carisoprodol',
 'Gildess Fe 1.5 / 30',
 'Viagra',
 'Famotidine / ibuprofen',
 'Selenium sulfide',
 'Aubra',
 'Tocilizumab',
 'Lacosamide',
 'Axiron',
 'Finacea',
 'Hydrocodone / ibuprofen',
 'Vantin',
 'Silenor',
 'Rivaroxaban',
 'Motrin',
 'Cholestyramine',
 'Nor-QD',
 'Ketamine',
 'Flurazepam',
 'Aubagio',
 'Nebivolol',
 'Vandazole',
 'Clopidogrel',
 'Imuran',
 'Avanafil',
 'Robaxin-750',
 'Hydrochlorothiazide / lisinopril',
 'Targiniq ER',
 'Drospirenone / ethinyl estradiol / levomefolate calcium',
 'Breo Ellipta',
 'Zonisamide',
 'Diamox',
 'Warfarin',
 'Amlodipine',
 'Midazolam',
 'Parnate',
 'Next Choice',
 'Gleevec',
 'Movantik',
 'Cabergoline',
 'Opcicon One-Step',
 'Vivelle-Dot',
 'Estrace Vaginal Cream',
 'Trihexyphenidyl',
 'Acetaminophen / propoxyphene',
 'Invega Sustenna',
 'Mephobarbital',
 'Nexium',
 'Insulin glargine',
 'Hydromet',
 'Tenofovir',
 'Chlordiazepoxide',
 'Taltz',
 'Blisovi 24 Fe',
 'Campral',
 'Loratadine / pseudoephedrine',
 'Sodium biphosphate / sodium phosphate',
 'Risedronate',
 'Lenalidomide',
 'Klor-Con',
 'Furosemide',
 'Exemestane',
 'Diclegis',
 'Monodox',
 'Bethanechol',
 'Portia',
 'Eliquis',
 'Prochlorperazine',
 'Diclofenac / misoprostol',
 'Amlodipine / benazepril',
 'Adzenys XR-ODT',
 'Natalizumab',
 'Aldactone',
 'Benadryl',
 'Niaspan',
 'Citric acid / magnesium oxide / sodium picosulfate',
 'Fioricet',
 'Clonidine',
 'Lactulose',
 'Guaifenesin / pseudoephedrine',
 'R-Tanna',
 'Jublia',
 'Atenolol',
 'Solifenacin',
 'Lidoderm',
 'Bismuth subcitrate potassium / metronidazole / tetracycline',
 'Nicotrol Inhaler',
 'Levlen',
 'Asacol',
 'Viorele',
 'Depakote ER',
 'Fortesta',
 'Silvadene',
 'Estring',
 'Duragesic',
 'Xyzal',
 'Artane',
 'Tafinlar',
 'Orencia',
 'Lyza',
 'Tegretol XR',
 'Ceftin',
 'Tambocor',
 'Previfem',
 'Duavee',
 'Microgestin 1 / 20',
 'Ocular lubricant',
 'Vascepa',
 'Estradiol Patch',
 'Protonix IV',
 'HC-Derma-Pax',
 'Welchol',
 'Tirosint',
 'Sandostatin',
 'Patanol',
 'Olmesartan',
 'Xolair',
 'Irbesartan',
 'Lidocaine',
 'Librax',
 'Chlorthalidone',
 'Naprelan',
 'Prolia',
 'Dolutegravir',
 'Cefpodoxime',
 'Clocortolone',
 'Ultram',
 'Afrezza',
 'Tysabri',
 'Lactobacillus acidophilus',
 'Oxaliplatin',
 'Mexiletine',
 'Halcion',
 'Somatropin',
 'Enjuvia',
 'Multivitamin, prenatal',
 'Omnicef',
 'Embeda',
 'Triamcinolone',
 'Uloric',
 'Caffeine',
 'Arthrotec',
 'Sodium hyaluronate',
 'Zarah',
 'Benicar HCT',
 'Hydrochlorothiazide / spironolactone',
 'Sumavel DosePro',
 'Immune globulin subcutaneous',
 'Certolizumab',
 'Aftera',
 'Pemoline',
 'Delsym',
 'Capzasin-HP',
 'Topicort',
 'Carvedilol',
 'Actiq',
 'Phendimetrazine',
 'Mysoline',
 'Acamprosate',
 'Prevnar 13',
 'Doxazosin',
 'Vistaril',
 'Rapaflo',
 'Vitamin B2',
 'Acthar',
 'Memantine',
 'Heather',
 'Apixaban',
 'Patanase',
 'Januvia',
 'Nicotine',
 'S-adenosylmethionine',
 'Zetia',
 'Feldene',
 'Pradaxa',
 'Abacavir / dolutegravir / lamivudine',
 'Synvisc',
 'Dextromethorphan / quinidine',
 'Acetaminophen / tramadol',
 'Tazarotene',
 'Opcon-A',
 'Ketoprofen',
 'Golimumab',
 'Kava',
 'Loestrin Fe 1.5 / 30',
 'Letairis',
 'Coartem',
 'Complera',
 'ProAir HFA',
 'Nasacort',
 'Pramoxine',
 'Clindesse',
 'Cozaar',
 'Multivitamin',
 'Chlorpromazine',
 'Protopic',
 'Migranal',
 'Teriparatide',
 'Dihydroergotamine',
 'Potassium chloride',
 'Chateal',
 'Quasense',
 'Anafranil',
 'Advair Diskus',
 'Nefazodone',
 'Ditropan',
 'Taytulla',
 'Kenalog',
 'Tenormin',
 'Rebif',
 'Copaxone',
 'Generess Fe',
 'Hydroxyurea',
 'Retisert',
 'Alfuzosin',
 'Penciclovir',
 'Betamethasone / calcipotriene',
 'Brisdelle',
 'Dyanavel XR',
 'Allerx Dose Pack DF',
 'Deltasone',
 'Amaryl',
 'Soolantra',
 'Mucinex',
 'Desyrel',
 'Levocetirizine',
 'Cobicistat / elvitegravir / emtricitabine / tenofovir alafenamide',
 'Lopinavir / ritonavir',
 'Byetta',
 'Catapres-TTS',
 'Chlorpheniramine',
 'Sustiva',
 'Zaleplon',
 'Farxiga',
 'Janumet',
 'Lo / Ovral-28',
 'Docetaxel',
 'Lantus Solostar',
 'Oxistat',
 'Selegiline',
 'Sunitinib',
 'Levorphanol',
 'Cheratussin AC',
 'Umeclidinium / vilanterol',
 'Erlotinib',
 'Forteo',
 'Seasonale',
 'Actemra',
 'Urso Forte',
 'Aspirin',
 'Zyprexa Zydis',
 'Cosentyx',
 'MiraLax',
 'Esgic',
 'Rituximab',
 'Fluticasone / vilanterol',
 'Tacrolimus',
 'Supartz',
 'Tenuate',
 'Beclomethasone',
 'Fenofibric acid',
 'Panlor SS',
 'Methylnaltrexone',
 'Benicar',
 'Sofosbuvir / velpatasvir',
 'Methadose',
 'Calcitriol',
 ...]

In [33]:

# How many drugs do we have?
len(df['drugName'].unique().tolist())

Out[33]:

3436

In [34]:

# What is the most popular drug?
df['drugName'].value_counts()

Out[34]:

Levonorgestrel                       3657
Etonogestrel                         3336
Ethinyl estradiol / norethindrone    2850
Nexplanon                            2156
Ethinyl estradiol / norgestimate     2117
                                     ... 
Mellaril                                1
Oxymetholone                            1
Ethchlorvynol                           1
Ginseng                                 1
Meningococcal group B vaccine           1
Name: drugName, Length: 3436, dtype: int64

In [35]:

# What is the most popular drug?
# Top 20 Drugs (Most Popular)
df['drugName'].value_counts().nlargest(20)

Out[35]:

Levonorgestrel                        3657
Etonogestrel                          3336
Ethinyl estradiol / norethindrone     2850
Nexplanon                             2156
Ethinyl estradiol / norgestimate      2117
Ethinyl estradiol / levonorgestrel    1888
Phentermine                           1543
Sertraline                            1360
Escitalopram                          1292
Mirena                                1242
Implanon                              1102
Gabapentin                            1047
Bupropion                             1022
Venlafaxine                           1016
Miconazole                            1000
Citalopram                             995
Medroxyprogesterone                    995
Lexapro                                952
Bupropion / naltrexone                 950
Duloxetine                             934
Name: drugName, dtype: int64
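
The counting steps above can also be written a little more compactly. This is a minimal sketch on a made-up Series (the `toy` values are assumptions), using `nunique()` for the number of distinct drugs and `value_counts(normalize=True)` for shares instead of raw counts.

```python
import pandas as pd

# Made-up drug names standing in for df['drugName']
toy = pd.Series(
    ["Levonorgestrel", "Etonogestrel", "Levonorgestrel", "Mirena", "Levonorgestrel"],
    name="drugName",
)

# nunique() ignores NaN by default; len(unique()) would count NaN as a value
n_drugs = toy.nunique()

# value_counts() sorts in descending order by default
counts = toy.value_counts()

# normalize=True returns the fraction of rows per drug instead of the count
shares = toy.value_counts(normalize=True)

# head(n) on the sorted counts gives the n most frequent drugs
top2 = counts.head(2)
```

Because drugName has no missing values in this dataset, `nunique()` and `len(unique())` agree here.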

In [36]:

# Top 20 Drugs (Most Popular)
plt.figure(figsize=(20,10))
df['drugName'].value_counts().nlargest(20).plot(kind='bar')
plt.title("Top 20 Most popular drugs based on counts")
plt.show()

Narrative

The most frequently reviewed drugs are dominated by hormonal drugs (the top six are all hormonal contraceptives)

In [37]:

# Bottom 20 Drugs (Least Popular)
df['drugName'].value_counts().nsmallest(20)

Out[37]:

Hyosyne                             1
Alimta                              1
Pamabrom                            1
Dallergy                            1
Reyataz                             1
Nor-QD                              1
Citric acid / potassium citrate     1
Doans Pills Extra Strength          1
Streptokinase                       1
Metolazone                          1
Dasetta 7 / 7 / 7                   1
Hexalen                             1
Nitro-Dur                           1
Stalevo 150                         1
Acrivastine / pseudoephedrine       1
Calcium / vitamin d                 1
Rifadin                             1
MVI Adult                           1
Bendroflumethiazide / nadolol       1
Hydroxyamphetamine / tropicamide    1
Name: drugName, dtype: int64
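
Every drug in this bottom 20 has exactly one review, so which 20 names appear is largely arbitrary among the ties. A possibly more informative summary is how many drugs were reviewed only once. The sketch below shows the idea on a made-up Series; the `toy` values are assumptions for illustration.

```python
import pandas as pd

# Made-up drug names standing in for df['drugName']
toy = pd.Series(["A", "A", "A", "B", "C", "D", "D"], name="drugName")
counts = toy.value_counts()

# Drugs that appear exactly once
singletons = counts[counts == 1]
n_singletons = len(singletons)

# Share of distinct drugs that have a single review
share_singletons = n_singletons / counts.size
```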

In [38]:

df['drugName'].value_counts().nsmallest(20).plot(kind='bar')

Out[38]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f916705d450>

What are the groups/classification of drugs used?

- By suffix or ending

In [39]:

drug_suffix = {
    "azole": "antifungal (except metronidazole)",
    "caine": "anesthetic",
    "cillin": "antibiotic(penicillins)",
    "mycin": "antibiotic",
    "micin": "antibiotic",
    "cycline": "antibiotic",
    "oxacin": "antibiotic",
    "ceph": "antibiotic(cephalosporins)",
    "cef": "antibiotic (cephalosporins)",
    "dine": "h2 blockers (anti-ulcers)",
    "done": "opiod analgesics",
    "ide": "oral hypoglycemics",
    "lam": "anti-anxiety",
    "pam": "anti-anxiety",
    "mide": "diuretics",
    "zide": "diuretics",
    "nium": "neuromuscular blocking agents",
    "olol": "beta blockers",
    "tidine": "h2 antagonist",
    "tropin": "pituitary hormone",
    "zosin": "alpha blocker",
    "ase": "thrombolytics",
    "plase": "thrombolytics",
    "azepam": "anti-anziety(benzodiazepine)",
    "azine": "antipyschotics (phenothiazine)",
    "barbital": "barbiturate",
    "dipine": "calcium channel blocker",
    "lol": "beta blocker",
    "zolam": "cns depressants",
    "pril": "ace inhibitor",
    "artan": "arb blocker",
    "statins": "lipid-lowering drugs",
    "parin": "anticoagulants",
    "sone": "corticosteroid (prednisone)",
}

In [40]:

def classify_drug(drugname):
    for i in drug_suffix.keys():
        if drugname.endswith(i):
            print(True)
            print(drug_suffix[i])

In [41]:

classify_drug('Valsartan')

True
arb blocker

In [43]:

classify_drug('losartan')

True
arb blocker

In [44]:

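def classify_drug(drugname):
    # Return the class of the first suffix (in dictionary order) that the name ends with
    for i in drug_suffix.keys():
        if drugname.endswith(i):
            return drug_suffix[i]
    # No suffix matched: the drug stays unclassified
    return None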

In [45]:

classify_drug('valsartan')

Out[45]:

'arb blocker'

In [46]:

df['drug_class'] = df['drugName'].apply(classify_drug)

In [47]:

df[['drugName','drug_class']]

Out[47]:

	drugName	drug_class
0	Valsartan	arb blocker
1	Guanfacine	None
2	Lybrel	None
3	Ortho Evra	None
4	Buprenorphine / naloxone	None
…	…	…
161292	Campral	None
161293	Metoclopramide	oral hypoglycemics
161294	Orencia	None
161295	Thyroid desiccated	None
161296	Lubiprostone	None

161297 rows × 2 columns
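
Note that Metoclopramide is labelled "oral hypoglycemics" above: the dictionary is scanned in insertion order, so the short suffix "ide" matches before the longer "mide". One possible refinement, sketched here under stated assumptions, is to try longer suffixes first and to ignore case. The name `classify_drug_longest` and the reduced `suffix_map` are hypothetical and not part of the original notebook.

```python
# Reduced, illustrative subset of the suffix dictionary used in this tutorial
suffix_map = {
    "ide": "oral hypoglycemics",
    "mide": "diuretics",
    "artan": "arb blocker",
    "lol": "beta blocker",
    "olol": "beta blockers",
}

# Try longer suffixes first so that "mide" is checked before "ide"
ordered_suffixes = sorted(suffix_map, key=len, reverse=True)


def classify_drug_longest(drugname):
    """Hypothetical variant: case-insensitive, longest matching suffix wins."""
    name = drugname.lower().strip()
    for suffix in ordered_suffixes:
        if name.endswith(suffix):
            return suffix_map[suffix]
    return None


print(classify_drug_longest("Furosemide"))
print(classify_drug_longest("VALSARTAN"))
print(classify_drug_longest("Lybrel"))
```

Suffix rules remain a rough heuristic either way. Many drug names share an ending without sharing a class, so the resulting labels are best treated as approximate.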

In [48]:

# How many Groups of Drugs By Class
df['drug_class'].unique().tolist()

Out[48]:

['arb blocker',
 None,
 'antifungal (except metronidazole)',
 'oral hypoglycemics',
 'opiod analgesics',
 'antibiotic',
 'anti-anxiety',
 'h2 blockers (anti-ulcers)',
 'beta blockers',
 'ace inhibitor',
 'thrombolytics',
 'alpha blocker',
 'corticosteroid (prednisone)',
 'antipyschotics (phenothiazine)',
 'antibiotic(penicillins)',
 'barbiturate',
 'calcium channel blocker',
 'anesthetic',
 'pituitary hormone',
 'antibiotic (cephalosporins)',
 'beta blocker',
 'neuromuscular blocking agents',
 'anticoagulants']

In [50]:

# How many Groups of Drugs By Class
len(df['drug_class'].unique().tolist())

Out[50]:

23

In [51]:

# Which class of drug is the most common?
df['drug_class'].value_counts()

Out[51]:

antifungal (except metronidazole)    4201
opiod analgesics                     3945
oral hypoglycemics                   3555
antibiotic                           3401
anti-anxiety                         2645
h2 blockers (anti-ulcers)            1228
beta blockers                         966
corticosteroid (prednisone)           886
antipyschotics (phenothiazine)        664
arb blocker                           560
ace inhibitor                         432
calcium channel blocker               233
alpha blocker                         153
anesthetic                            129
antibiotic(penicillins)               119
thrombolytics                         116
beta blocker                           97
neuromuscular blocking agents          45
antibiotic (cephalosporins)            29
pituitary hormone                      28
barbiturate                            19
anticoagulants                          9
Name: drug_class, dtype: int64

In [52]:

# Which class of drug is the most common?
plt.figure(figsize=(20,10))
df['drug_class'].value_counts().plot(kind='bar')
plt.title("Distribution of Drugs By Class")
plt.show()

Narrative

The most common classes/groups of drugs, by number of reviews, are
- Antifungal
- Opioid Analgesics (Pain Killers)
- Oral Hypoglycemics (DM)
- Antibiotic
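
When reading this ranking, keep in mind that `value_counts()` skips None by default, so it only covers reviews whose drug name matched a suffix. The sketch below adds up the per-class counts listed above to estimate that coverage, then shows `dropna=False` on a made-up Series. The `toy` values are assumptions for illustration.

```python
import pandas as pd

# Per-class counts as listed in the value_counts output of this tutorial
class_counts = [4201, 3945, 3555, 3401, 2645, 1228, 966, 886, 664, 560, 432,
                233, 153, 129, 119, 116, 97, 45, 29, 28, 19, 9]
total_rows = 161297  # number of rows reported for the DataFrame

classified = sum(class_counts)
classified_pct = classified / total_rows * 100
print(f"{classified} of {total_rows} reviews classified ({classified_pct:.1f}%)")

# Toy Series standing in for df['drug_class']
toy = pd.Series(["antibiotic", None, None, "arb blocker"], name="drug_class")

# dropna=False keeps the unclassified rows in the tally
with_missing = toy.value_counts(dropna=False)

# Fraction of rows without a class
unclassified_share = toy.isna().mean()
```

So the ranking describes only the minority of reviews that the suffix rules could label.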

In [69]:

# Distribution of Drugs Per Drug Group based on size
drug_groups = df.groupby('drug_class').size()

In [70]:

type(drug_groups)

Out[70]:

pandas.core.series.Series

In [71]:

# Convert to DF
# Method 1
drug_groups.to_frame()

Out[71]:

	0
drug_class
ace inhibitor	432
alpha blocker	153
anesthetic	129
anti-anxiety	2645
antibiotic	3401
antibiotic (cephalosporins)	29
antibiotic(penicillins)	119
anticoagulants	9
antifungal (except metronidazole)	4201
antipyschotics (phenothiazine)	664
arb blocker	560
barbiturate	19
beta blocker	97
beta blockers	966
calcium channel blocker	233
corticosteroid (prednisone)	886
h2 blockers (anti-ulcers)	1228
neuromuscular blocking agents	45
opiod analgesics	3945
oral hypoglycemics	3555
pituitary hormone	28
thrombolytics	116

In [73]:

# Convert to DF
# Method 2
drug_groups_df = pd.DataFrame({'drug_class':drug_groups.index,'counts':drug_groups.values})
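
A third way to get the same two-column DataFrame is `reset_index(name=...)` on the grouped sizes. This is a minimal sketch on a made-up frame; the `toy` rows and the name `toy_groups_df` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows standing in for df; None marks an unclassified drug
toy = pd.DataFrame({"drug_class": ["antibiotic", "antibiotic", "arb blocker", None]})

# size() returns a Series indexed by drug_class (rows with a missing class are dropped);
# reset_index(name=...) turns it into a DataFrame and names the count column
toy_groups_df = toy.groupby("drug_class").size().reset_index(name="counts")

print(toy_groups_df)
```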

In [75]:

# Seaborn Plot
plt.figure(figsize=(20,10))
g = sns.barplot(data=drug_groups_df,x='drug_class',y='counts')
plt.show()

In [76]:

# Seaborn Plot (rotate x tick labels with set_xticklabels)
plt.figure(figsize=(20,10))
g = sns.barplot(data=drug_groups_df,x='drug_class',y='counts')
g.set_xticklabels(drug_groups_df['drug_class'].values,rotation=30)
plt.show()

In [77]:

# Seaborn Plot (rotate x tick labels with plt.xticks)
plt.figure(figsize=(20,10))
g = sns.barplot(data=drug_groups_df,x='drug_class',y='counts')
plt.xticks(rotation=30)
plt.show()
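
An alternative to rotating long tick labels is a horizontal bar chart sorted by count, which leaves room for the class names. The sketch below uses plain Matplotlib on four of the class counts reported earlier. The `Agg` backend line is an assumption so that the snippet also runs without a display; in a notebook it can be omitted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A subset of the class counts reported in this tutorial
counts_df = pd.DataFrame({
    "drug_class": ["antibiotic", "anti-anxiety", "arb blocker", "anesthetic"],
    "counts": [3401, 2645, 560, 129],
})

# Sort ascending so that the largest bar ends up at the top of the chart
ordered = counts_df.sort_values("counts")

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(ordered["drug_class"], ordered["counts"])
ax.set_xlabel("Number of reviews")
ax.set_title("Reviews per drug class (subset)")
fig.tight_layout()
```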

Question on Conditions

How many conditions are there?
Which conditions are the most common?
Distribution of conditions and rating
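
Some values in the condition column of this dataset look like scraping residue rather than condition names, for example '2</span> users found this comment helpful.'. A hedged sketch of how such rows could be set aside before counting conditions is shown below on a made-up frame. The `toy` rows are assumptions, and matching on the literal '</span>' is only one possible rule.

```python
import pandas as pd

# Made-up rows standing in for df['condition']
toy = pd.DataFrame({
    "condition": [
        "ADHD",
        "2</span> users found this comment helpful.",
        None,
        "Birth Control",
        "ADHD",
    ]
})

# Flag values that contain the literal HTML fragment; missing values count as not flagged
is_residue = toy["condition"].str.contains("</span>", na=False, regex=False)

# Keep only real, non-missing condition names
clean = toy.loc[~is_residue & toy["condition"].notna(), "condition"]

n_conditions = clean.nunique()
top_conditions = clean.value_counts()
```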

In [54]:

# Number of Conditions
df['condition'].unique()

Out[54]:

array(['Left Ventricular Dysfunction', 'ADHD', 'Birth Control',
       'Opiate Dependence', 'Benign Prostatic Hyperplasia',
       'Emergency Contraception', 'Bipolar Disorde', 'Epilepsy',
       'Migraine Prevention', 'Depression', "Crohn's Disease", 'Cough',
       'Obesity', 'Urinary Tract Infection', 'ibromyalgia',
       'Chronic Myelogenous Leukemia', 'HIV Infection', 'Insomnia',
       'Rheumatoid Arthritis', 'Vaginal Yeast Infection',
       'Chlamydia Infection', 'Hirsutism', 'Panic Disorde', 'Migraine',
       nan, 'Pain', 'Irritable Bowel Syndrome', 'Osteoarthritis',
       'Constipation', 'Bowel Preparation', 'Psychosis', 'Muscle Spasm',
       'Hepatitis C', 'Overactive Bladde', 'Diabetes, Type 2',
       'Asthma, Maintenance', 'Non-Small Cell Lung Cance',
       'Schizophrenia', 'Dysuria', 'Smoking Cessation', 'Anxiety', 'Acne',
       'emale Infertility', 'Constipation, Acute',
       'Constipation, Drug Induced', 'Erectile Dysfunction',
       'Trigeminal Neuralgia', 'Underactive Thyroid', 'Chronic Pain',
       'Atrophic Vaginitis', 'Skin and Structure Infection', 'Tinnitus',
       'Major Depressive Disorde', 'Anxiety and Stress', 'Rosacea',
       'High Blood Pressure',
       '2</span> users found this comment helpful.',
       'Restless Legs Syndrome',
       'Osteolytic Bone Metastases of Solid Tumors', 'Bronchitis',
       'Skin or Soft Tissue Infection', 'Obsessive Compulsive Disorde',
       'Endometriosis', 'Keratoconjunctivitis Sicca', 'Breakthrough Pain',
       'Seizures', 'Neuropathic Pain', 'Sedation', 'Menstrual Disorders',
       'Allergic Rhinitis', 'Anesthesia',
       'Undifferentiated Connective Tissue Disease', 'Diabetes, Type 1',
       'Abnormal Uterine Bleeding', 'Weight Loss',
       'Constipation, Chronic', 'Breast Cancer, Metastatic',
       'Period Pain', 'Helicobacter Pylori Infection',
       'Atrial Fibrillation', 'Uterine Fibroids',
       '4</span> users found this comment helpful.', 'Kidney Infections',
       'Generalized Anxiety Disorde', 'Asthma', 'Postmenopausal Symptoms',
       'High Cholesterol', 'Hypogonadism, Male', 'Hyperthyroidism',
       'Back Pain', 'Anaplastic Oligodendroglioma', "Sjogren's Syndrome",
       'Asthma, acute', 'Hot Flashes',
       '3</span> users found this comment helpful.',
       'Herpes Simplex, Suppression', 'Bacterial Infection', 'Bursitis',
       'Diaper Rash', 'Systemic Mastocytosis', 'Trichotillomania',
       "Hashimoto's disease", 'Eczema', 'Dental Abscess', 'Headache',
       'Hypersomnia', 'Xerostomia', 'Breast Cance', 'Sore Throat',
       "Barrett's Esophagus", 'Pain/Feve', 'Diverticulitis', 'Sinusitis',
       'Polycystic Ovary Syndrome', 'Influenza',
       'Hypothyroidism, After Thyroid Removal', 'Onychomycosis, Toenail',
       'Progesterone Insufficiency',
       '11</span> users found this comment helpful.', 'GERD',
       'Nausea/Vomiting, Postoperative', 'Herpes Simplex',
       'Gastroparesis', 'Gout, Acute', 'Motion Sickness',
       'Multiple Sclerosis', 'Autism', 'Otitis Media',
       'Upper Respiratory Tract Infection', 'Surgical Prophylaxis',
       'Psoriatic Arthritis', 'Erosive Esophagitis',
       'Premature Ventricular Depolarizations', 'Stomach Ulce',
       'Nausea/Vomiting', 'Light Anesthesia',
       'Angina Pectoris Prophylaxis',
       '0</span> users found this comment helpful.', 'Paranoid Disorde',
       'Prostatitis', 'Extrapyramidal Reaction', 'mance Anxiety',
       'Night Terrors', 'High Cholesterol, Familial Heterozygous',
       'Spondyloarthritis', 'Clostridial Infection', 'Dermatomyositis',
       'Bronchiectasis', 'Nasal Congestion', 'Benign Essential Trem',
       'Angina', 'moterol / mometasone)', 'Impetig',
       'Conjunctivitis, Bacterial', 'Post Traumatic Stress Disorde',
       'Alcohol Withdrawal', 'Psoriasis', 'Cold Sores', 'Hyperhidrosis',
       '1</span> users found this comment helpful.',
       'Ankylosing Spondylitis', 'Hemorrhoids',
       '142</span> users found this comment helpful.',
       'Schizoaffective Disorde', 'Not Listed / Othe', 'Rhinitis',
       'Oral Thrush', 'Hyperlipoproteinemia',
       'Neutropenia Associated with Chemotherapy', 'Osteoporosis',
       'Reflex Sympathetic Dystrophy Syndrome', 'Urticaria', 'Narcolepsy',
       'Systemic Lupus Erythematosus', 'Ulcerative Colitis',
       'Adult Human Growth Hormone Deficiency', 'Bacterial Vaginitis',
       'COPD, Maintenance', 'Anorexia', 'TSH Suppression',
       'Breast Cancer, Adjuvant', 'Glaucoma',
       'Cough and Nasal Congestion',
       '8</span> users found this comment helpful.',
       'Inflammatory Conditions', 'Urinary Incontinence', 'Gout',
       'Bladder Infection', 'Human Papillomavirus Prophylaxis',
       'Glioblastoma Multiforme', 'Strep Throat',
       'Bacterial Skin Infection', 'Hereditary Angioedema',
       'Cold Symptoms', 'Labor Pain', 'Dry Skin', 'Diabetes Insipidus',
       'Methicillin-Resistant Staphylococcus Aureus Infection',
       'Borderline Personality Disorde', 'Amenorrhea', 'Pneumonia',
       'Seborrheic Dermatitis', 'Interstitial Cystitis',
       'Malaria Prevention', 'Prevention of Dental Caries',
       'Herbal Supplementation', 'Plaque Psoriasis', "Raynaud's Syndrome",
       "Addison's Disease", 'Prostate Cance', 'Allergies',
       'Opioid-Induced Constipation', 'moterol)',
       '13</span> users found this comment helpful.', 'Diarrhea',
       'Seasonal Allergic Conjunctivitis', 'Opioid Overdose',
       'Spondylolisthesis', 'Shift Work Sleep Disorde',
       'Obstructive Sleep Apnea/Hypopnea Syndrome',
       'Nausea/Vomiting of Pregnancy', 'Mucositis',
       'Ulcerative Colitis, Active', 'Head Lice',
       'Tonsillitis/Pharyngitis', 'Pseudotumor Cerebri',
       "Parkinson's Disease", 'Sciatica', 'Cance',
       'Bacterial Endocarditis Prevention', 'Diarrhea, Chronic',
       'Hypertensive Emergency', 'Keratosis', 'Ovarian Cysts',
       "Behcet's Disease", 'Chronic Idiopathic Constipation',
       'lic Acid Deficiency', 'Chronic Fatigue Syndrome',
       'Basal Cell Carcinoma', 'Cataplexy',
       "Crohn's Disease, Maintenance", 'Diabetic Peripheral Neuropathy',
       'Arrhythmia', 'Primary Ovarian Failure', 'Influenza Prophylaxis',
       'Agitated State', 'Heart Failure', 'atigue', 'Opiate Withdrawal',
       'Endometrial Hyperplasia, Prophylaxis', 'Immunosuppression',
       'Dystonia', 'Alopecia', 'Vulvodynia',
       'Premenstrual Dysphoric Disorde', 'Alcohol Dependence',
       'Myasthenia Gravis', 'Social Anxiety Disorde', 'Atopic Dermatitis',
       'Schistosoma japonicum', 'Sinus Symptoms', 'min / sitagliptin)',
       'Dermatitis', 'Eye Redness', 'Warts', 'Menorrhagia',
       'Seizure Prevention', 'Ophthalmic Surgery', 'Skin Rash',
       'Condylomata Acuminata', 'NSAID-Induced Ulcer Prophylaxis',
       'Tinea Versicol', 'Peripheral Neuropathy', 'Deep Vein Thrombosis',
       '6</span> users found this comment helpful.', 'Heart Attack',
       'Pulmonary Embolism, Recurrent Event', 'Light Sedation',
       'Acute Lymphoblastic Leukemia', 'Hyperprolactinemia',
       'Indigestion', 'Hepatitis B', 'Dysautonomia', 'Status Epilepticus',
       'Postpartum Depression', 'Multiple Myeloma',
       'Prevention of Hypokalemia', 'Edema', 'Urinary Retention',
       'Prevention of Thromboembolism in Atrial Fibrillation',
       'Cluster Headaches', 'Sexual Dysfunction, SSRI Induced',
       'Dermatitis Herpetiformis', 'Temporomandibular Joint Disorde',
       'Burns, External', 'Actinic Keratosis', 'Pharyngitis',
       'Melanoma, Metastatic', 'Atrial Flutte', 'Lyme Disease',
       'Dry Eye Disease', 'Allergic Reactions', 'Hypertriglyceridemia',
       'Pruritus', 'Carcinoid Tum', 'Muscle Pain', 'Colorectal Cance',
       'Vitamin/Mineral Supplementation during Pregnancy/Lactation',
       'Nausea/Vomiting, Chemotherapy Induced', 'Women (oxybutynin)',
       'Primary Immunodeficiency Syndrome',
       'New Daily Persistent Headache',
       'Pneumococcal Disease Prophylaxis', 'Burning Mouth Syndrome',
       'Urinary Tract Stones', 'Pseudobulbar Affect',
       '94</span> users found this comment helpful.',
       'Eye Redness/Itching', 'Deep Vein Thrombosis, First Event',
       'Pulmonary Hypertension', 'Malaria', 'Sarcoidosis',
       'Dietary Supplementation', 'Bulimia', 'Tendonitis', 'Nasal Polyps',
       'Hypokalemia', 'Anemia, Sickle Cell', 'Uveitis',
       'Streptococcal Infection', 'Perimenopausal Symptoms',
       'Asperger Syndrome', 'Tinea Corporis', 'Mania',
       'Renal Cell Carcinoma', 'COPD', 'Biliary Cirrhosis', 'Vertig',
       'Reversal of Opioid Sedation', "Non-Hodgkin's Lymphoma",
       'High Cholesterol, Familial Homozygous',
       'Periodic Limb Movement Disorde', 'Supraventricular Tachycardia',
       'Hypoestrogenism', 'Juvenile Idiopathic Arthritis', 'Swine Flu',
       'Giardiasis', 'Binge Eating Disorde', "Tourette's Syndrome",
       'Trichomoniasis', 'acial Wrinkles',
       '28</span> users found this comment helpful.',
       'Pulmonary Embolism', 'Conjunctivitis, Allergic',
       'Avian Influenza', '16</span> users found this comment helpful.',
       'Hemangioma', 'Nocturnal Leg Cramps', 'Thyroid Suppression Test',
       'Muscle Twitching', 'Pupillary Dilation',
       'Lennox-Gastaut Syndrome', 'Opiate Adjunct', 'Postoperative Pain',
       'Candida Urinary Tract Infection', 'Cerebral Spasticity',
       'Lipodystrophy', 'Androgenetic Alopecia', 'Computed Tomography',
       'Mitral Valve Prolapse', 'Vitamin D Deficiency',
       'Glaucoma, Open Angle', 'Endoscopy or Radiology Premedication',
       "Alzheimer's Disease", 'Gouty Arthritis',
       'Paroxysmal Supraventricular Tachycardia',
       'Deep Vein Thrombosis, Prophylaxis', 'Gaucher Disease',
       'Lymphocytic Colitis', 'Pancreatic Cance', 'Cystic Fibrosis',
       'Noninfectious Colitis',
       '27</span> users found this comment helpful.', 'Nephrocalcinosis',
       'Iron Deficiency Anemia', 'mulation) (phenylephrine)', 'Hiccups',
       '75</span> users found this comment helpful.',
       'Bronchospasm Prophylaxis', 'Chronic Spasticity',
       'min / saxagliptin)', 'Post-Cholecystectomy Diarrhea',
       'Postherpetic Neuralgia', 'Insomnia, Stimulant-Associated',
       'COPD, Acute', 'Herpes Simplex Dendritic Keratitis',
       'Oophorectomy', 'Cyclic Vomiting Syndrome',
       'Chronic Lymphocytic Leukemia', 'Lyme Disease, Arthritis',
       'Pseudomembranous Colitis', 'Conjunctivitis',
       '15</span> users found this comment helpful.', 'min)',
       'Intraocular Hypertension', 'Aphthous Ulce',
       'Ulcerative Colitis, Maintenance', 'Melasma',
       'Lyme Disease, Neurologic', 'ge (amlodipine / valsartan)',
       'Herpes Zoste', '12</span> users found this comment helpful.',
       'Cervical Dystonia', 'Labor Induction', 'Human Papilloma Virus',
       'Chronic Pancreatitis', 'Polycythemia Vera',
       '9</span> users found this comment helpful.',
       'Dermatological Disorders', 'Lewy Body Dementia',
       'amilial Mediterranean Feve', 'Neurosurgery', 'Gastroenteritis',
       'Macular Edema', 'Tinea Pedis',
       '7</span> users found this comment helpful.',
       'Diagnosis and Investigation',
       '35</span> users found this comment helpful.', 'Gas', 'Neuralgia',
       'Local Anesthesia', '54</span> users found this comment helpful.',
       'Acute Coronary Syndrome', 'Aspiration Pneumonia',
       'Idiopathic Thrombocytopenic Purpura', 'Onychomycosis, Fingernail',
       'Photoaging of the Skin', 'Premature Lab', 'Precocious Puberty',
       'Prevention of Bladder infection', 'Seasonal Affective Disorde',
       'Diabetic Kidney Disease', "Crohn's Disease, Acute",
       'Insulin Resistance Syndrome', 'Pudendal Neuralgia',
       "Reiter's Syndrome", '17</span> users found this comment helpful.',
       'Amyotrophic Lateral Sclerosis', 'Body Dysmorphic Disorde',
       'Prosthetic Heart Valves, Mechanical Valves - Thrombosis Prophylaxis',
       'Dandruff', 'Vitamin B12 Deficiency', 'Bone infection',
       'Prosthetic Heart Valves, Tissue Valves - Thrombosis Prophylaxis',
       'Iritis', 'Allergic Urticaria', 'Cardiovascular Risk Reduction',
       'Giant Cell Tumor of Bone', 'Babesiosis',
       'Secondary Hyperparathyroidism', 'Hypoparathyroidism',
       'Performance Anxiety', 'Abortion', 'Skin Cance',
       'Ovulation Induction', 'Liver Magnetic Resonance Imaging',
       'Vitamin/Mineral Supplementation and Deficiency',
       '79</span> users found this comment helpful.',
       'Herpes Simplex, Mucocutaneous/Immunocompetent Host',
       '10</span> users found this comment helpful.',
       'Anemia Associated with Chronic Renal Failure',
       'Hyperphosphatemia of Renal Failure',
       'Dissociative Identity Disorde', 'Anal Fissure and Fistula',
       '14</span> users found this comment helpful.',
       'Herpes Simplex, Mucocutaneous/Immunocompromised Host', 'Scabies',
       '5</span> users found this comment helpful.', 'Endometrial Cance',
       'Transient Ischemic Attack', 'Granuloma Annulare',
       "Traveler's Diarrhea", 'Candidemia',
       't Pac with Cyclobenzaprine (cyclobenzaprine)',
       'Hypoactive Sexual Desire Disorde', 'Epicondylitis, Tennis Elbow',
       'Nightmares', 'Dientamoeba fragilis', 'Ventricular Tachycardia',
       'Dumping Syndrome', 'Myelodysplastic Syndrome', 'Hypodermoclysis',
       'zen Shoulde', 'Topical Disinfection', 'Perioral Dermatitis',
       'Agitation', 'Intermittent Claudication',
       'Prevention of Osteoporosis', 'Leukemia', 'Dermatitis Herpeti',
       'mis', 'Eosinophilic Esophagitis',
       'Hyperlipoproteinemia Type IIa, Elevated LDL',
       'Endometrial Hyperplasia',
       '19</span> users found this comment helpful.', 'Peptic Ulce',
       'Chronic Myofascial Pain', 'Enterocolitis',
       'Secondary Cutaneous Bacterial Infections', 'Syringomyelia',
       'Postoperative Ocular Inflammation',
       'Persistent Depressive Disorde', 'Otitis Externa',
       'Organ Transplant, Rejection Prophylaxis',
       'Intermittent Explosive Disorde', 'Dermatophytosis',
       'Inflammatory Bowel Disease', 'Porphyria', 'Anemia',
       'Hyperuricemia Secondary to Chemotherapy',
       'Wolff-Parkinson-White Syndrome', 'eve', 'Ectopic Pregnancy',
       'Thyroid Cance', 'Tuberculosis, Latent',
       'Nasal Carriage of Staphylococcus aureus', 'Systemic Candidiasis',
       'Ear Wax Impaction', 'Hepatocellular Carcinoma', 'Dyspareunia',
       '41</span> users found this comment helpful.', 'Tic Disorde',
       'Head and Neck Cance', 'Klinefelter Syndrome', 'Rhinorrhea',
       'Soft Tissue Sarcoma', 'Diabetic Macular Edema',
       'Menopausal Disorders', 'Anesthetic Adjunct', 'Tinea Cruris',
       'tic (mycophenolic acid)', 'Ischemic Stroke', 'Malignant Glioma',
       'Thrombocythemia', 'Atrophic Urethritis', 'Systemic Sclerosis',
       'Macular Degeneration', 'AIDS Related Wasting', 'Hemophilia A',
       'Osteolytic Bone Lesions of Multiple Myeloma',
       'Autoimmune Hemolytic Anemia', 'ailure to Thrive',
       'Strongyloidiasis', 'Vitamin K Deficiency', 'Ulcerative Proctitis',
       'Premenstrual Syndrome',
       '23</span> users found this comment helpful.',
       'Primary Hyperaldosteronism', 'Lactose Intolerance',
       'Anal Itching', 'amilial Cold Autoinflammatory Syndrome',
       'Duodenal Ulce', 'Tuberculosis, Prophylaxis', 'Neurosis',
       "Turner's Syndrome", 'NSAID-Induced Gastric Ulce',
       'CNS Magnetic Resonance Imaging', 'Atherosclerosis',
       'Deep Vein Thrombosis Prophylaxis after Hip Replacement Surgery',
       'Gastritis/Duodenitis', 'Diarrhea, Acute', 'Costochondritis',
       'Portal Hypertension', 'Glaucoma/Intraocular Hypertension',
       'Toothache', 'Benzodiazepine Withdrawal', 'm Pain Disorde',
       'Esophageal Candidiasis',
       'Deep Vein Thrombosis Prophylaxis after Knee Replacement Surgery',
       'Peripheral Arterial Disease',
       'Deep Vein Thrombosis, Recurrent Event', 'Pseudogout, Prophylaxis',
       'Lichen Planus', 'CMV Prophylaxis',
       '64</span> users found this comment helpful.', 'Neuritis',
       'Typhoid Feve', 'Tardive Dyskinesia', 'Ichthyosis',
       'Juvenile Rheumatoid Arthritis', 'B12 Nutritional Deficiency',
       '18</span> users found this comment helpful.',
       'Primary Nocturnal Enuresis',
       '146</span> users found this comment helpful.', "Dercum's Disease",
       'Cutaneous Candidiasis', 'Gingivitis', 'Q Feve', 'Hyperekplexia',
       '44</span> users found this comment helpful.', 'Niacin Deficiency',
       'Dietary Fiber Supplementation', 'Nephrotic Syndrome',
       'Pinworm Infection (Enterobius vermicularis)',
       'Pancreatic Exocrine Dysfunction',
       'Nausea/Vomiting, Radiation Induced', 'Schilling Test',
       'Mild Cognitive Impairment', 'Ischemic Stroke, Prophylaxis',
       '20</span> users found this comment helpful.',
       'Gonococcal Infection, Uncomplicated', 'Ovarian Cance',
       'Eyelash Hypotrichosis', "Meniere's Disease", 'Tinea Capitis',
       '21</span> users found this comment helpful.', 'Lichen Sclerosus',
       'min / pioglitazone)', 'Renal Transplant', 'Gout, Prophylaxis',
       "von Willebrand's Disease",
       'Prevention of Atherothrombotic Events', 'Small Fiber Neuropathy',
       '110</span> users found this comment helpful.',
       'min / rosiglitazone)', "Peyronie's Disease",
       'Autoimmune Hepatitis', 'llicular Lymphoma',
       'Auditory Processing Disorde', 'Herpes Zoster, Prophylaxis',
       'Submental Fullness', 'Lactation Augmentation',
       'Radionuclide Myocardial Perfusion Study',
       'Prevention of Cardiovascular Disease', 'Varicella-Zoste',
       'Pelvic Inflammatory Disease', 'Intraabdominal Infection', 'Croup',
       '85</span> users found this comment helpful.',
       'Dermatologic Lesion',
       'Hyperlipoproteinemia Type IV, Elevated VLDL', 'Expectoration',
       'Primary Hyperaldosteronism Diagnosis', 'Abdominal Distension',
       'Salivary Gland Cance', 'Pulmonary Embolism, First Event',
       'Postpartum Breast Pain',
       'Postural Orthostatic Tachycardia Syndrome',
       '46</span> users found this comment helpful.',
       'Pediatric Growth Hormone Deficiency', 'Hypomagnesemia',
       'ge HCT (amlodipine / hydrochlorothiazide / valsartan)',
       'Hairy Cell Leukemia', 'Histoplasmosis', 'Hypoglycemia',
       '31</span> users found this comment helpful.', 'Brain Tum',
       'Gastrointestinal Stromal Tum', 'Tetanus',
       'Breast Cancer, Prevention', 'ICU Agitation', 'Women (minoxidil)',
       'Peripheral T-cell Lymphoma',
       'Chronic Inflammatory Demyelinating Polyradiculoneuropathy',
       'Pathological Hypersecretory Conditions',
       'Oral and Dental Conditions', 'Antiphospholipid Syndrome',
       'Ventricular Arrhythmia', 'Asystole', "Wegener's Granulomatosis",
       'Thromboembolic Stroke Prophylaxis',
       'Platelet Aggregation Inhibition', 'Sleep Paralysis',
       'Rejection Prophylaxis', 'Delayed Puberty, Male', 'Ascariasis',
       '25</span> users found this comment helpful.',
       'Acute Promyelocytic Leukemia',
       '32</span> users found this comment helpful.', 'Bartonellosis',
       'Cyclothymic Disorde', 'Hypokalemic Periodic Paralysis',
       'Varicose Veins', 'Mononucleosis', 'Cachexia', 'Hyperkalemia',
       "Still's Disease", '48</span> users found this comment helpful.',
       'Dementia', 'Ocular Rosacea', 'Hidradenitis Suppurativa', 'SIADH',
       'Bullous Pemphigoid', 'Angioedema',
       'Mountain Sickness / Altitude Sickness',
       'Severe Mood Dysregulation', 'Cutaneous T-cell Lymphoma',
       'Adrenocortical Insufficiency', 'Myxedema Coma',
       'Small Bowel Bacterial Overgrowth', 'Sunburn',
       '33</span> users found this comment helpful.',
       'Transverse Myelitis', 'Squamous Cell Carcinoma', 'Parkinsonism',
       '22</span> users found this comment helpful.', 'Thyrotoxicosis',
       '29</span> users found this comment helpful.',
       '30</span> users found this comment helpful.',
       'Epididymitis, Sexually Transmitted', 'Neck Pain',
       'Bleeding Disorde', '63</span> users found this comment helpful.',
       'actor IX Deficiency', 'Melanoma', 'Thrombocytopenia',
       'Esophageal Variceal Hemorrhage Prophylaxis', 'Glioblastoma Multi',
       'Cholera', 'Anorexia/Feeding Problems',
       '45</span> users found this comment helpful.', 'Peritonitis',
       'AV Heart Block', 'Pe', "Wilson's Disease",
       'Nonalcoholic Fatty Liver Disease',
       '34</span> users found this comment helpful.', 'Sepsis', 'Anthrax',
       'Body Imaging', 'Aggressive Behavi', 'Hepatic Tum', 'Ehrlichiosis',
       'Hypopituitarism', 'Gender Dysphoria', 'Infectious Diarrhea',
       'Ventricular Fibrillation', 'Anaphylaxis', 'Pemphigus',
       'Multiple Endocrine Adenomas', 'Pre-Exposure Prophylaxis',
       'Postoperative Increased Intraocular Pressure',
       'Pruritus of Partial Biliary Obstruction', 'Pertussis',
       'Periodontitis', 'Lymphoma', 'Hypercalcemia of Malignancy',
       'Pityriasis rubra pilaris', 'Amebiasis', 't Care',
       'Hepatic Encephalopathy',
       '55</span> users found this comment helpful.',
       'Deep Neck Infection', 'Meningitis, Meningococcal',
       'Parkinsonian Trem', 'Rabies Prophylaxis',
       '39</span> users found this comment helpful.', 'Hypotension',
       'Myelofibrosis', '98</span> users found this comment helpful.',
       'cal Segmental Glomerulosclerosis',
       'Gastric Ulcer Maintenance Treatment', "Paget's Disease",
       'Infection Prophylaxis', 'Gastrointestinal Decontamination',
       'Mixed Connective Tissue Disease',
       '24</span> users found this comment helpful.',
       'Somatoform Pain Disorde', 'Esophageal Spasm',
       'Campylobacter Gastroenteritis', 'Hyperphosphatemia',
       'Oligospermia', 'Wound Cleansing', 'Euvolemic Hyponatremia',
       'Gallbladder Disease',
       '84</span> users found this comment helpful.',
       'Mycobacterium avium-intracellulare, Treatment',
       'Oppositional Defiant Disorde', 'Legionella Pneumonia',
       'Breast Cancer, Palliative', 'Hydrocephalus',
       'Hyperlipoproteinemia Type III, Elevated beta-VLDL   IDL',
       '36</span> users found this comment helpful.',
       'Anaplastic Astrocytoma', "Dupuytren's contracture",
       '40</span> users found this comment helpful.', 'Mumps Prophylaxis',
       'Skin Disinfection, Preoperative', 'Hyperbilirubinemia',
       'Meningitis', 'Corneal Ulce', 'acial Lipoatrophy',
       '43</span> users found this comment helpful.',
       'Percutaneous Coronary Intervention', 'Hepatitis B Prevention',
       'Tuberculosis, Active', 'Cerebrovascular Insufficiency',
       'Head Injury', 'Anti NMDA Receptor Encephalitis',
       'Nonoccupational Exposure',
       '72</span> users found this comment helpful.',
       'Gonadotropin Inhibition', 'unctional Gastric Disorde',
       'Chronic Eosinophilic Leukemia', 'Acetaminophen Overdose',
       'Duodenal Ulcer Prophylaxis', 'Paragonimus westermani, Lung Fluke',
       'Alpha-1 Proteinase Inhibitor Deficiency', "Cogan's Syndrome",
       'Uterine Bleeding', 'Stomach Cance', 'Sporotrichosis',
       'Cluster-Tic Syndrome', 'Gestational Diabetes',
       'Stress Ulcer Prophylaxis',
       'Reversal of Nondepolarizing Muscle Relaxants', 'Solid Tumors',
       'mist (', 'Schnitzler Syndrome', 'Hypocalcemia',
       '26</span> users found this comment helpful.',
       'Meningococcal Meningitis Prophylaxis', 'Nocardiosis',
       'Hemophilia B', '42</span> users found this comment helpful.',
       'Microscopic polyangiitis', 'Gonococcal Infection, Disseminated',
       'Neurotic Depression', 'Keratitis',
       '99</span> users found this comment helpful.',
       "Hodgkin's Lymphoma", 'me', 'STD Prophylaxis',
       '123</span> users found this comment helpful.',
       'Small Bowel or Pancreatic Fistula',
       'Prevention of Perinatal Group B Streptococcal Disease',
       '74</span> users found this comment helpful.', 'Cerebral Edema',
       'Testicular Cance', 'Short Stature for Age',
       '47</span> users found this comment helpful.',
       'Aspergillosis, Aspergilloma', 'Pemphigoid',
       'Hyperparathyroidism Secondary to Renal Impairment',
       '76</span> users found this comment helpful.',
       'Ramsay Hunt Syndrome', 'Cutaneous Larva Migrans',
       'Occipital Neuralgia', 'Blepharitis', 'Patent Ductus Arteriosus',
       'Joint Infection', '77</span> users found this comment helpful.',
       'Manscaping Pain', 'Strabismus',
       'Organ Transplant, Rejection Reversal',
       'Leukocytoclastic Vasculitis', 'Coronary Artery Disease',
       'Gastric Cance', 'ibrocystic Breast Disease',
       '121</span> users found this comment helpful.',
       'ungal Infection Prophylaxis', 'Short Stature', 'Hypercalcemia',
       'Coccidioidomycosis', 'Cyclitis', 'Anemia, Chemotherapy Induced',
       'Upper Limb Spasticity',
       '95</span> users found this comment helpful.',
       '61</span> users found this comment helpful.',
       'Diagnostic Bronchograms', 'Neoplastic Diseases',
       '51</span> users found this comment helpful.',
       'Mycoplasma Pneumonia', 'Linear IgA Disease',
       'Subarachnoid Hemorrhage', 'Myeloproliferative Disorders',
       'ungal Pneumonia', '145</span> users found this comment helpful.',
       'Scleroderma', 'Zollinger-Ellison Syndrome', 'Tinea Barbae',
       'Acute Nonlymphocytic Leukemia',
       '62</span> users found this comment helpful.',
       '92</span> users found this comment helpful.', 'Neutropenia'],
      dtype=object)

In [55]:

len(df['condition'].unique().tolist())

Out[55]:

Narrative

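We have 885 unique values in the condition column. This count includes the missing value (nan) and the malformed entries of the form "N</span> users found this comment helpful." visible in the output above, so the number of real conditions is lower.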
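Before trusting this count, it can help to separate real conditions from scraping residue. The following is a minimal sketch, not part of the original notebook: it uses a small made-up `sample` frame in place of `df`, and it assumes that any value containing `</span>` is residue rather than a condition.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's df; only the 'condition' column matters here.
sample = pd.DataFrame({
    "condition": [
        "Birth Control",
        "Depression",
        None,
        "2</span> users found this comment helpful.",
        "Pain",
        "Depression",
    ]
})

# Assumption: a condition containing the HTML fragment '</span>' is scraping residue.
is_residue = sample["condition"].str.contains("</span>", na=False, regex=False)

# Replace residue with a missing value so it is no longer counted as a condition.
clean_condition = sample["condition"].mask(is_residue)

n_with_nan = len(sample["condition"].unique())  # counts the missing value as one entry
n_without_nan = sample["condition"].nunique()   # ignores missing values
n_clean = clean_condition.nunique()             # ignores missing values and residue

print(n_with_nan, n_without_nan, n_clean)
```

The same three counts on the real `df['condition']` column would show how much of the 885 is noise. Note that `len(unique())` counts nan while `nunique()` and `value_counts()` do not.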

In [56]:

#### Distribution of Conditions
df['condition'].value_counts()

Out[56]:

Birth Control                                   28788
Depression                                       9069
Pain                                             6145
Anxiety                                          5904
Acne                                             5588
                                                ...  
Gonadotropin Inhibition                             1
Anti NMDA Receptor Encephalitis                     1
Aspergillosis, Aspergilloma                         1
40</span> users found this comment helpful.         1
121</span> users found this comment helpful.        1
Name: condition, Length: 884, dtype: int64

In [57]:

# Most common conditions (top 20)
df['condition'].value_counts().nlargest(20)

Out[57]:

Birth Control                28788
Depression                    9069
Pain                          6145
Anxiety                       5904
Acne                          5588
Bipolar Disorde               4224
Insomnia                      3673
Weight Loss                   3609
Obesity                       3568
ADHD                          3383
Diabetes, Type 2              2554
Emergency Contraception       2463
High Blood Pressure           2321
Vaginal Yeast Infection       2274
Abnormal Uterine Bleeding     2096
Bowel Preparation             1859
ibromyalgia                   1791
Smoking Cessation             1780
Migraine                      1694
Anxiety and Stress            1663
Name: condition, dtype: int64

In [58]:

# Bar plot of the 20 most common conditions
df['condition'].value_counts().nlargest(20).plot(kind='bar',figsize=(20,10))

Out[58]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9167cd1cd0>

Narrative

The most common condition is Birth Control (28,788 reviews), followed by Depression (9,069), Pain (6,145) and Anxiety (5,904).
This is consistent with the drug distribution.
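Raw counts are easier to compare as shares of all reviews. Here is a small sketch using `value_counts(normalize=True)`. The `conditions` series is made-up illustration data, not the real dataset.

```python
import pandas as pd

# Made-up illustration data standing in for df['condition'].
conditions = pd.Series(
    ["Birth Control"] * 5 + ["Depression"] * 3 + ["Pain"] * 2,
    name="condition",
)

# normalize=True returns fractions instead of counts; multiply by 100 for percentages.
share = conditions.value_counts(normalize=True).mul(100).round(1)

print(share.head(3))
```

On the real data, `df['condition'].value_counts(normalize=True)` ignores missing values by default, so the percentages are relative to the rows that have a condition.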

In [60]:

df['condition'].value_counts().nsmallest(20)

Out[60]:

Hemophilia B                                   1
Legionella Pneumonia                           1
Upper Limb Spasticity                          1
ungal Infection Prophylaxis                    1
Dercum's Disease                               1
Stomach Cance                                  1
Ventricular Arrhythmia                         1
Corneal Ulce                                   1
Pemphigoid                                     1
34</span> users found this comment helpful.    1
Bartonellosis                                  1
Thyrotoxicosis                                 1
77</span> users found this comment helpful.    1
Strongyloidiasis                               1
Hemangioma                                     1
64</span> users found this comment helpful.    1
Epicondylitis, Tennis Elbow                    1
Esophageal Spasm                               1
Cerebrovascular Insufficiency                  1
Ramsay Hunt Syndrome                           1
Name: condition, dtype: int64
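Every entry in this output has a count of 1, so which 20 conditions `nsmallest(20)` returns is largely arbitrary. A more informative question may be how many conditions appear only once. The sketch below shows one way to answer it on made-up data. The `conditions` series is a hypothetical stand-in for `df['condition']`.

```python
import pandas as pd

# Made-up illustration data standing in for df['condition'].
conditions = pd.Series(
    ["Acne", "Acne", "Acne", "Pain", "Pain", "Hemangioma", "Keratitis", "Scabies"]
)

counts = conditions.value_counts()

# Frequency of frequencies: for each review count, how many conditions have it.
freq_of_freq = counts.value_counts().sort_index()

# Number of conditions that appear exactly once.
n_singletons = int((counts == 1).sum())

print(freq_of_freq)
print(n_singletons)
```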

In [59]:

# Bar plot of the 20 least common conditions
df['condition'].value_counts().nsmallest(20).plot(kind='bar',figsize=(20,10))

Out[59]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9166fab0d0>

Questions on Drugs and Conditions

How many drugs per condition

In [63]:

# How many Drugs per condition (Top 20)
df.groupby('condition')['drugName'].nunique().nlargest(20)

Out[63]:

condition
Not Listed / Othe                             214
Pain                                          200
Birth Control                                 172
High Blood Pressure                           140
Acne                                          117
Depression                                    105
Rheumatoid Arthritis                           98
Diabetes, Type 2                               89
Allergic Rhinitis                              88
Bipolar Disorde                                80
Osteoarthritis                                 80
Anxiety                                        78
Insomnia                                       78
Abnormal Uterine Bleeding                      74
Migraine                                       59
Psoriasis                                      58
3</span> users found this comment helpful.     57
Endometriosis                                  57
ADHD                                           55
Asthma, Maintenance                            54
Name: drugName, dtype: int64

In [66]:

# How many Drugs per condition (Top 20)
plt.figure(figsize=(15,10))
df.groupby('condition')['drugName'].nunique().nlargest(20).plot(kind='bar')
plt.title("Number of Drugs Per Condition")
plt.grid()
plt.show()

Narrative

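Leaving aside the catch-all "Not Listed / Othe" category (214 drugs), Pain (200), Birth Control (172) and High Blood Pressure (140) have the highest number of unique drugs. The malformed entry "3</span> users found this comment helpful." also appears in the top 20, which is another sign that the condition column needs cleaning.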
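The ranking can be recomputed without the noisy rows. The sketch below runs on a small made-up `reviews` frame. It assumes that values containing `</span>` are residue and that "Not Listed / Othe" is a catch-all we want to leave out. Both exclusions are choices for illustration, not something the original notebook does.

```python
import pandas as pd

# Made-up illustration data standing in for df[['drugName', 'condition']].
reviews = pd.DataFrame({
    "drugName": ["A", "B", "C", "A", "D", "E", "F"],
    "condition": [
        "Pain", "Pain", "Pain", "Acne",
        "Not Listed / Othe", "Not Listed / Othe",
        "3</span> users found this comment helpful.",
    ],
})

# Assumption: '</span>' marks scraping residue; the catch-all label is excluded by choice.
residue = reviews["condition"].str.contains("</span>", na=False, regex=False)
catch_all = reviews["condition"].eq("Not Listed / Othe")

drugs_per_condition = (
    reviews.loc[~(residue | catch_all)]
    .groupby("condition")["drugName"]
    .nunique()
    .sort_values(ascending=False)
)

print(drugs_per_condition)
```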

Questions on Rating

Distribution of rating
Average rating per drug and per drug class

In [78]:

df['rating']

Out[78]:

0          9.0
1          8.0
2          5.0
3          8.0
4          9.0
          ... 
161292    10.0
161293     1.0
161294     2.0
161295    10.0
161296     9.0
Name: rating, Length: 161297, dtype: float64

In [79]:

# Distribution of ratings (number of reviews per rating)
df.groupby('rating').size()

Out[79]:

rating
1.0     21619
2.0      6931
3.0      6513
4.0      5012
5.0      8013
6.0      6343
7.0      9456
8.0     18890
9.0     27531
10.0    50989
dtype: int64

In [80]:

# Bar plot of the distribution of ratings
df.groupby('rating').size().plot(kind='bar')

Out[80]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f91641b1890>

In [81]:

# Distribution of ratings using a histogram
plt.figure(figsize=(20,10))
df['rating'].hist()
plt.title("Distribution of Rating By Size Using Histogram")
plt.show()

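Narrative

Most people rated at the extremes: 10 is by far the most common rating (50,989 reviews), followed by 9 (27,531) and 1 (21,619), while each rating from 2 to 7 has fewer than 10,000 reviews.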
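To put a number on "the extremes", the ratings can be grouped into bands. The sketch below reuses the counts printed in Out[79]. The cut points (1-3 low, 4-7 mid, 8-10 high) are an assumption for illustration, not something defined by the dataset.

```python
import pandas as pd

# Counts per rating, copied from the groupby('rating').size() output above.
rating_table = pd.DataFrame({
    "rating": list(range(1, 11)),
    "n_reviews": [21619, 6931, 6513, 5012, 8013, 6343, 9456, 18890, 27531, 50989],
})

# Assumed bands: (0, 3] = low, (3, 7] = mid, (7, 10] = high.
rating_table["band"] = pd.cut(
    rating_table["rating"],
    bins=[0, 3, 7, 10],
    labels=["low (1-3)", "mid (4-7)", "high (8-10)"],
)

band_counts = rating_table.groupby("band", observed=False)["n_reviews"].sum()
band_share = (band_counts / band_counts.sum() * 100).round(1)

print(band_share)
```

With these counts, roughly 60% of reviews fall in the high band and about 22% in the low band, which leaves under a fifth for the four middle ratings.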

In [83]:

# Average rating per drug
avg_rating = df.groupby('drugName')['rating'].mean()

In [84]:

avg_rating

Out[84]:

drugName
A + D Cracked Skin Relief               10.000000
A / B Otic                              10.000000
Abacavir / dolutegravir / lamivudine     8.211538
Abacavir / lamivudine / zidovudine       9.000000
Abatacept                                7.157895
                                          ...    
Zyvox                                    9.000000
ZzzQuil                                  2.500000
depo-subQ provera 104                    1.000000
ella                                     6.980392
femhrt                                   4.000000
Name: rating, Length: 3436, dtype: float64

In [86]:

# Distribution of the average rating across all drugs
plt.figure(figsize=(20,10))
avg_rating.hist()
plt.title("Distribution of Average Rating For All Drugs")
plt.show()

In [92]:

# Average rating per drug class
avg_rating_per_drug_class = df.groupby('drug_class')['rating'].mean()

In [93]:

avg_rating_per_drug_class

Out[93]:

drug_class
ace inhibitor                        5.759259
alpha blocker                        6.954248
anesthetic                           5.937984
anti-anxiety                         8.543667
antibiotic                           6.500735
antibiotic (cephalosporins)          6.344828
antibiotic(penicillins)              7.033613
anticoagulants                       9.222222
antifungal (except metronidazole)    5.580100
antipyschotics (phenothiazine)       7.146084
arb blocker                          6.464286
barbiturate                          8.894737
beta blocker                         6.587629
beta blockers                        7.681159
calcium channel blocker              5.725322
corticosteroid (prednisone)          7.477427
h2 blockers (anti-ulcers)            7.280945
neuromuscular blocking agents        8.622222
opiod analgesics                     7.446388
oral hypoglycemics                   7.268917
pituitary hormone                    8.500000
thrombolytics                        7.103448
Name: rating, dtype: float64

In [94]:

# Distribution of the average rating across drug classes
plt.figure(figsize=(20,10))
avg_rating_per_drug_class.hist()
plt.title("Distribution of Average Rating For Drug Classes")
plt.show()

In [96]:

# Which drug classes have the highest mean/average rating
avg_rating_per_drug_class.nlargest(20)

Out[96]:

drug_class
anticoagulants                    9.222222
barbiturate                       8.894737
neuromuscular blocking agents     8.622222
anti-anxiety                      8.543667
pituitary hormone                 8.500000
beta blockers                     7.681159
corticosteroid (prednisone)       7.477427
opiod analgesics                  7.446388
h2 blockers (anti-ulcers)         7.280945
oral hypoglycemics                7.268917
antipyschotics (phenothiazine)    7.146084
thrombolytics                     7.103448
antibiotic(penicillins)           7.033613
alpha blocker                     6.954248
beta blocker                      6.587629
antibiotic                        6.500735
arb blocker                       6.464286
antibiotic (cephalosporins)       6.344828
anesthetic                        5.937984
ace inhibitor                     5.759259
Name: rating, dtype: float64

In [97]:

# Which drugs have the highest mean/average rating?
avg_rating.nlargest(20)

Out[97]:

drugName
A + D Cracked Skin Relief                              10.0
A / B Otic                                             10.0
Absorbine Jr.                                          10.0
Accolate                                               10.0
Acetaminophen / caffeine / magnesium salicylate        10.0
Acetaminophen / dextromethorphan / doxylamine          10.0
Acetaminophen / phenylephrine                          10.0
Acetaminophen / pseudoephedrine                        10.0
Acetic acid / antipyrine / benzocaine / polycosanol    10.0
Acrivastine / pseudoephedrine                          10.0
Acyclovir / hydrocortisone                             10.0
Advil Cold and Sinus Liqui-Gels                        10.0
Aerobid-M                                              10.0
Afrin 4 Hour Extra Moisturizing                        10.0
Ala-Quin                                               10.0
Alavert                                                10.0
Aldactazide                                            10.0
Alefacept                                              10.0
Alka-Seltzer Cold and Sinus                            10.0
Allegra ODT                                            10.0
Name: rating, dtype: float64
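
A perfect 10.0 mean is typical of drugs with only one or two reviews, so it may be worth checking how many reviews sit behind each average. The following is a minimal sketch on made-up data. The `toy` frame and the `min_reviews` threshold are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the review DataFrame (values are made up for illustration)
toy = pd.DataFrame({
    "drugName": ["A", "B", "B", "B", "C", "C", "C", "C"],
    "rating":   [10,  9,   8,   10,  7,   6,   8,   7],
})

# Mean rating and number of reviews per drug in a single pass
stats = toy.groupby("drugName")["rating"].agg(["mean", "count"])

# Hypothetical threshold: ignore drugs with fewer than `min_reviews` reviews
min_reviews = 3
top_rated = stats[stats["count"] >= min_reviews].sort_values("mean", ascending=False)
print(top_rated)
```

On the real data the same pattern would be applied to `df`, with a threshold chosen to suit the analysis.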

In [98]:

df.columns

Out[98]:

Index(['Unnamed: 0', 'drugName', 'condition', 'review', 'rating', 'date',
       'usefulCount', 'drug_class'],
      dtype='object')

In [ ]:

### Questions on Reviews
+ How genuine is the review? (Using sentiment analysis)
+ How many reviews are positive, negative, or neutral?
+ Correlation between the rating, the review sentiment, and the number of users who found the review useful
+ Distribution of ratings
+ Number of reviews made per year and per month
+ Which condition has the most reviews?
+ Can you predict the rating using the review?

In [99]:

# How genuine is the review? (Using sentiment analysis)
from textblob import TextBlob

In [100]:

df['review']

Out[100]:

0         "It has no side effect, I take it in combinati...
1         "My son is halfway through his fourth week of ...
2         "I used to take another oral contraceptive, wh...
3         "This is my first time using any form of birth...
4         "Suboxone has completely turned my life around...
                                ...                        
161292    "I wrote my first report in Mid-October of 201...
161293    "I was given this in IV before surgey. I immed...
161294    "Limited improvement after 4 months, developed...
161295    "I&#039;ve been on thyroid medication 49 years...
161296    "I&#039;ve had chronic constipation all my adu...
Name: review, Length: 161297, dtype: object
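
Some reviews contain HTML entities such as `&#039;` in place of an apostrophe. One option, sketched below with the standard library, is to decode them before scoring the text. Whether this changes the sentiment scores noticeably has not been checked here, and the variable names are illustrative only.

```python
import html

# The start of one of the reviews shown above, with an HTML-encoded apostrophe
raw = "I&#039;ve been on thyroid medication 49 years"

# html.unescape converts entities like &#039; back to plain characters
clean = html.unescape(raw)
print(clean)
```

On the full dataset this could be applied with `df['review'].apply(html.unescape)`.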

In [101]:

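def get_sentiment(text):
    # Polarity score in the range [-1.0, 1.0]; above 0 is positive, below 0 is negative
    blob = TextBlob(text)
    return blob.polarity

def get_sentiment_label(text):
    # Map the polarity score to a label; a score of exactly 0 counts as neutral
    blob = TextBlob(text)
    if blob.polarity > 0:
        result = 'positive'
    elif blob.polarity < 0:
        result = 'negative'
    else:
        result = 'neutral'
    return result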

In [102]:

# Test the function
get_sentiment("I love apples")

Out[102]:

0.5

In [104]:

# Test the function
get_sentiment_label("I love apples")

Out[104]:

'positive'

In [105]:

# Sentiment Score for Review
df['sentiment'] = df['review'].apply(get_sentiment)

In [106]:

# Sentiment Labels for Review
df['sentiment_label'] = df['review'].apply(get_sentiment_label)
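
Applying both functions runs TextBlob over every review twice. Since the label depends only on the polarity score, it can also be derived from the `sentiment` column once that exists. The sketch below uses a hypothetical helper, `label_from_score`, with the same thresholds as `get_sentiment_label`, on a few sample scores.

```python
import pandas as pd

def label_from_score(score):
    """Map a polarity score to a label, using the same thresholds as get_sentiment_label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Sample polarity scores standing in for df['sentiment']
scores = pd.Series([0.5, -0.276389, 0.0])
labels = scores.apply(label_from_score)
print(labels.tolist())
```

With the real data this would read `df['sentiment_label'] = df['sentiment'].apply(label_from_score)`.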

In [107]:

df[['review','sentiment','sentiment_label']]

Out[107]:

	review	sentiment	sentiment_label
0	“It has no side effect, I take it in combinati…	0.000000	neutral
1	“My son is halfway through his fourth week of …	0.168333	positive
2	“I used to take another oral contraceptive, wh…	0.067210	positive
3	“This is my first time using any form of birth…	0.179545	positive
4	“Suboxone has completely turned my life around…	0.194444	positive
…	…	…	…
161292	“I wrote my first report in Mid-October of 201…	0.262917	positive
161293	“I was given this in IV before surgey. I immed…	-0.276389	negative
161294	“Limited improvement after 4 months, developed…	-0.223810	negative
161295	“I've been on thyroid medication 49 years…	0.212597	positive
161296	“I've had chronic constipation all my adu…	0.085417	positive

161297 rows × 3 columns

In [109]:

# How many positive and negative and neutral reviews?
df['sentiment_label'].value_counts()

Out[109]:

positive    101041
negative     53303
neutral       6953
Name: sentiment_label, dtype: int64

In [110]:

# How many positive and negative and neutral reviews?
df['sentiment_label'].value_counts().plot(kind='bar')

Out[110]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9163e91c10>

In [111]:

#### Relationship between sentiment and rating (lineplot draws the mean sentiment per rating)
sns.lineplot(data=df,x='rating',y='sentiment')
plt.show()

Narrative

On average, the sentiment score rises as the rating rises, so higher-rated reviews tend to read as more positive.
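
A line plot shows the direction of the relationship but not its strength. A correlation coefficient is one way to put a number on it. The sketch below uses made-up values, not the real dataset. The Spearman coefficient is computed as the Pearson correlation of the ranks, which avoids any extra dependency.

```python
import pandas as pd

# Made-up ratings and sentiment scores for illustration
toy = pd.DataFrame({
    "rating":    [1, 3, 5, 8, 10],
    "sentiment": [-0.4, -0.1, 0.0, 0.2, 0.5],
})

# Pearson correlation: strength of the linear relationship
pearson = toy["rating"].corr(toy["sentiment"])

# Spearman correlation: Pearson correlation of the ranks (monotonic relationship)
spearman = toy["rating"].rank().corr(toy["sentiment"].rank())

print(round(pearson, 3), round(spearman, 3))
```

On the real data, `df['rating'].corr(df['sentiment'])` gives the Pearson coefficient directly.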

In [112]:

# Relationship between rating and sentiment, split by sentiment label
sns.lineplot(data=df,x='rating',y='sentiment',hue='sentiment_label')

Out[112]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9163e1ac10>

In [ ]:

# How many reviews are genuine when compared to the rating?
+ genuine good review = positive sentiment + rating of 6-10
+ genuine bad review = negative sentiment + rating of 1-4

In [119]:

# Genuine good reviews: rating of 6 or more and positive sentiment
good_review =  df[(df['rating'] >= 6) & (df['sentiment_label'] == 'positive')]

In [117]:

# Genuine bad reviews: rating of 4 or less and negative sentiment
bad_review = df[(df['rating'] <= 4) & (df['sentiment_label'] == 'negative')]
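
To answer "how many" rather than only filtering, the two conditions can be combined into a single boolean column and averaged. This is a sketch on made-up rows. The `genuine` column name is an assumption and does not appear in the original notebook.

```python
import pandas as pd

# Made-up ratings and sentiment labels for illustration
toy = pd.DataFrame({
    "rating": [9, 8, 2, 1, 10, 3, 5],
    "sentiment_label": ["positive", "negative", "negative", "positive",
                        "positive", "negative", "neutral"],
})

# Same rules as above: the rating and the sentiment must agree
good = (toy["rating"] >= 6) & (toy["sentiment_label"] == "positive")
bad = (toy["rating"] <= 4) & (toy["sentiment_label"] == "negative")

# Hypothetical flag column: True when the review counts as genuine
toy["genuine"] = good | bad

# The mean of a boolean column is the share of True values
genuine_share = toy["genuine"].mean()
print(toy["genuine"].sum(), round(genuine_share, 3))
```

Reviews with a rating of 5 or a neutral label fall into neither group under these rules.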

In [120]:

good_review.head()

Out[120]:

	Unnamed: 0	drugName	condition	review	rating	date	usefulCount	drug_class	sentiment	sentiment_label
1	95260	Guanfacine	ADHD	“My son is halfway through his fourth week of …	8.0	April 27, 2010	192	None	0.168333	positive
3	138000	Ortho Evra	Birth Control	“This is my first time using any form of birth…	8.0	November 3, 2015	10	None	0.179545	positive
4	35696	Buprenorphine / naloxone	Opiate Dependence	“Suboxone has completely turned my life around…	9.0	November 27, 2016	37	None	0.194444	positive
7	102654	Aripiprazole	Bipolar Disorde	“Abilify changed my life. There is hope. I was…	10.0	March 14, 2015	32	antifungal (except metronidazole)	0.074107	positive
9	48928	Ethinyl estradiol / levonorgestrel	Birth Control	“I had been on the pill for many years. When m…	8.0	December 8, 2016	1	None	0.079167	positive

In [122]:

good_review.iloc[0]['review']

Out[122]:

'"My son is halfway through his fourth week of Intuniv. We became concerned when he began this last week, when he started taking the highest dose he will be on. For two days, he could hardly get out of bed, was very cranky, and slept for nearly 8 hours on a drive home from school vacation (very unusual for him.) I called his doctor on Monday morning and she said to stick it out a few days. See how he did at school, and with getting up in the morning. The last two days have been problem free. He is MUCH more agreeable than ever. He is less emotional (a good thing), less cranky. He is remembering all the things he should. Overall his behavior is better. \r\nWe have tried many different medications and so far this is the most effective."'

In [ ]:

#### Questions on UsefulCount
+ Number of users who found a review useful
+ Top usefulCount by drug/class
+ Best drugs based on usefulCount

In [124]:

df.groupby('drugName')['usefulCount'].value_counts()

Out[124]:

drugName                              usefulCount
A + D Cracked Skin Relief             6              1
A / B Otic                            20             1
Abacavir / dolutegravir / lamivudine  9              6
                                      1              5
                                      12             5
                                                    ..
ella                                  32             1
                                      42             1
femhrt                                0              1
                                      2              1
                                      42             1
Name: usefulCount, Length: 54324, dtype: int64

In [126]:

# Top drugs by number of distinct usefulCount values (nunique counts unique values, not total votes)
df.groupby('drugName')['usefulCount'].nunique().nlargest(20)

Out[126]:

drugName
Fluoxetine       181
Gabapentin       181
Bupropion        177
Citalopram       176
Sertraline       172
Escitalopram     171
Prozac           171
Zoloft           171
Lexapro          169
Celexa           166
Amitriptyline    162
Lorcaserin       157
Trazodone        157
Duloxetine       153
Phentermine      150
Belviq           148
Alprazolam       146
Cymbalta         144
Venlafaxine      144
BuSpar           141
Name: usefulCount, dtype: int64
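
Note that `nunique()` counts how many different `usefulCount` values a drug has, which is not the same as how many users found its reviews useful. If total or typical usefulness is what matters, `sum` or `mean` may be closer to the intent. The sketch below compares the three on made-up data.

```python
import pandas as pd

# Made-up data: drug A has three reviews with the same usefulCount,
# drug B has two reviews with very different counts
toy = pd.DataFrame({
    "drugName":    ["A", "A", "A", "B", "B"],
    "usefulCount": [5,   5,   5,   100, 2],
})

# Compare distinct values, total votes, and average votes per review
per_drug = toy.groupby("drugName")["usefulCount"].agg(["nunique", "sum", "mean"])
print(per_drug)
```

On the real data, `df.groupby('drugName')['usefulCount'].sum().nlargest(20)` would rank drugs by total useful votes.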

In [127]:

# Plot the top drugs by number of distinct usefulCount values
df.groupby('drugName')['usefulCount'].nunique().nlargest(20).plot(kind='bar')

Out[127]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f91642abc10>

In [128]:

# Top drug classes by number of distinct usefulCount values
df.groupby('drug_class')['usefulCount'].nunique().nlargest(20)

Out[128]:

drug_class
opiod analgesics                     212
anti-anxiety                         198
oral hypoglycemics                   157
h2 blockers (anti-ulcers)            147
antifungal (except metronidazole)    139
arb blocker                          129
beta blockers                        123
antibiotic                           118
ace inhibitor                        111
calcium channel blocker              108
corticosteroid (prednisone)           97
antipyschotics (phenothiazine)        95
alpha blocker                         73
beta blocker                          65
antibiotic(penicillins)               60
thrombolytics                         59
anesthetic                            47
neuromuscular blocking agents         37
antibiotic (cephalosporins)           20
barbiturate                           16
Name: usefulCount, dtype: int64

In [129]:

# Plot the top drug classes by number of distinct usefulCount values
df.groupby('drug_class')['usefulCount'].nunique().nlargest(20).plot(kind='bar')
plt.title("Top Drug Class Per Usefulcount")
plt.show()

In [130]:

# Drug classes with the fewest distinct usefulCount values
df.groupby('drug_class')['usefulCount'].nunique().nsmallest(20).plot(kind='bar')
plt.title("Bottom Drug Classes Per UsefulCount")
plt.show()

In [131]:

### Correlation between Rating and Usefulcount
sns.lineplot(data=df,x='rating',y='usefulCount')

Out[131]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f916410b8d0>

Narrative

As the rating goes up, the average usefulCount goes up as well.
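
The numbers behind such a line plot can be inspected directly, because `sns.lineplot` draws the mean of `y` at each `x` by default. The sketch below uses made-up data and also reports the median and the number of reviews, which helps show whether a few very popular reviews drive the mean.

```python
import pandas as pd

# Made-up ratings and useful counts for illustration
toy = pd.DataFrame({
    "rating":      [1, 1, 5, 5, 10, 10],
    "usefulCount": [2, 4, 10, 20, 30, 50],
})

# Mean, median, and number of reviews at each rating
useful_by_rating = toy.groupby("rating")["usefulCount"].agg(["mean", "median", "count"])
print(useful_by_rating)
```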

In [133]:

#### Question on Date
df.columns

Out[133]:

Index(['Unnamed: 0', 'drugName', 'condition', 'review', 'rating', 'date',
       'usefulCount', 'drug_class', 'sentiment', 'sentiment_label'],
      dtype='object')

In [134]:

# Number of ratings per date (the dates are still strings, so they sort alphabetically, not chronologically)
df.groupby('date')['rating'].size()

Out[134]:

date
April 1, 2008        28
April 1, 2009        21
April 1, 2010        16
April 1, 2011        12
April 1, 2012        21
                     ..
September 9, 2013    44
September 9, 2014    45
September 9, 2015    90
September 9, 2016    99
September 9, 2017    55
Name: rating, Length: 3579, dtype: int64
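
Because the `date` column is still text at this point, the groups are per calendar day and come out in alphabetical order (April before September). To count reviews per year or per month, the strings can first be parsed into datetimes. This is a minimal sketch on made-up rows. The format string `"%B %d, %Y"` is an assumption based on the date values shown above.

```python
import pandas as pd

# Made-up rows using the same date style as the dataset
toy = pd.DataFrame({
    "date":   ["April 1, 2008", "September 9, 2008", "April 1, 2009", "April 27, 2010"],
    "rating": [8, 6, 9, 8],
})

# Parse the text dates into real datetimes
toy["date"] = pd.to_datetime(toy["date"], format="%B %d, %Y")

# Number of reviews per year and per calendar month
reviews_per_year = toy.groupby(toy["date"].dt.year)["rating"].size()
reviews_per_month = toy.groupby(toy["date"].dt.month)["rating"].size()
print(reviews_per_year)
print(reviews_per_month)
```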

In [135]:

# Average rating per date
df.groupby('date')['rating'].mean()

Out[135]:

date
April 1, 2008        8.285714
April 1, 2009        7.666667
April 1, 2010        7.812500
April 1, 2011        8.583333
April 1, 2012        9.238095
                       ...   
September 9, 2013    8.295455
September 9, 2014    8.800000
September 9, 2015    5.733333
September 9, 2016    6.777778
September 9, 2017    5.127273
Name: rating, Length: 3579, dtype: float64

In [138]:

# Average Rating Per Day of Every Year
df.groupby('date')['rating'].mean().plot(figsize=(20,10))
plt.title("Average Rating Per Day of Every Year")
plt.show()

In [139]:

# Average UsefulCount Per Day of Every Year
df.groupby('date')['usefulCount'].mean().plot(figsize=(20,10))
plt.title("Average UsefulCount Per Day of Every Year")
plt.show()

In [140]:

# Average Sentiment Per Day of Every Year
df.groupby('date')['sentiment'].mean().plot(figsize=(20,10))
plt.title("Average sentiment Per Day of Every Year")
plt.show()

In [144]:

# Number of reviews per day of every year
df.groupby('date')['review'].size().plot(figsize=(20,10))
plt.title("Number of Reviews Per Day of Every Year")
plt.show()

In [145]:

# Number of reviews per day of every year, as a bar chart
df.groupby('date')['review'].size().plot(kind='bar',figsize=(20,10))
plt.title("Number of Reviews Per Day of Every Year")
plt.show()

In [150]:

#### Using DatetimeIndex
# Aggregate per date: mean rating, total usefulCount, number of reviews
grouped_date = df.groupby('date').agg({'rating':'mean','usefulCount':'sum','review':'size'})

In [151]:

grouped_date

Out[151]:

	rating	usefulCount	review
date
April 1, 2008	8.285714	2303	28
April 1, 2009	7.666667	3698	21
April 1, 2010	7.812500	342	16
April 1, 2011	8.583333	216	12
April 1, 2012	9.238095	1178	21
…	…	…	…
September 9, 2013	8.295455	1941	44
September 9, 2014	8.800000	2935	45
September 9, 2015	5.733333	1901	90
September 9, 2016	6.777778	1728	99
September 9, 2017	5.127273	298	55

3579 rows × 3 columns

In [154]:

grouped_date.index

Out[154]:

Index(['April 1, 2008', 'April 1, 2009', 'April 1, 2010', 'April 1, 2011',
       'April 1, 2012', 'April 1, 2013', 'April 1, 2014', 'April 1, 2015',
       'April 1, 2016', 'April 1, 2017',
       ...
       'September 9, 2008', 'September 9, 2009', 'September 9, 2010',
       'September 9, 2011', 'September 9, 2012', 'September 9, 2013',
       'September 9, 2014', 'September 9, 2015', 'September 9, 2016',
       'September 9, 2017'],
      dtype='object', name='date', length=3579)

In [155]:

grouped_date['date'] = grouped_date.index

In [157]:

grouped_date['date'] = pd.DatetimeIndex(grouped_date['date'])

In [158]:

grouped_date.dtypes

Out[158]:

rating                float64
usefulCount             int64
review                  int64
date           datetime64[ns]
dtype: object

In [159]:

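# Use the parsed dates as the index and sort chronologically so that date slicing works
grouped_date = grouped_date.set_index('date').sort_index()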

In [161]:

# Select a particular date range (.loc is required for partial date strings in recent pandas versions)
grouped_date.loc['2008'].plot()

Out[161]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f915b5241d0>

In [164]:

# Number of reviews for 2008
grouped_date.loc['2008', 'review'].plot()
plt.title("Number of Reviews For 2008")
plt.show()

In [166]:

# Number of reviews for 2008-2009
grouped_date.loc['2008':'2009', 'review'].plot()
plt.title("Number of Reviews For 2008-2009")
plt.show()

In [167]:

# Average daily rating for 2008-2009
grouped_date.loc['2008':'2009', 'rating'].plot()
plt.title("Average Rating Over Time (2008-2009)")
plt.show()

In [169]:

# Average daily rating for 2008-2012
grouped_date.loc['2008':'2012', 'rating'].plot(figsize=(20,10))
plt.title("Average Rating Over Time (2008-2012)")
plt.show()

In [172]:

# Select a single month (April 2008)
grouped_date.loc['2008-04'].plot()

Out[172]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f915889f110>

In [173]:

# Average daily rating for April-May 2008
grouped_date.loc['2008-04':'2008-05', 'rating'].plot()
plt.title("Average Rating Over Time (April-May 2008)")
plt.show()
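
Daily values can be noisy, so aggregating to months may make trends easier to read. The sketch below resamples a small made-up frame with a DatetimeIndex. The `"MS"` (month start) frequency is just one possible choice. Note that averaging daily mean ratings is not identical to averaging all individual ratings in a month, since days with few reviews get the same weight as busy days.

```python
import pandas as pd

# Made-up daily aggregates with a DatetimeIndex, similar in shape to grouped_date
idx = pd.to_datetime(["2008-04-01", "2008-04-15", "2008-05-02", "2008-05-20"])
toy = pd.DataFrame({"rating": [8.0, 6.0, 9.0, 7.0], "review": [28, 12, 21, 9]}, index=idx)

# Monthly view: mean of the daily average ratings, total number of reviews
monthly = toy.resample("MS").agg({"rating": "mean", "review": "sum"})
print(monthly)
```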

In [174]:

# Save Dataset
df.to_csv("drug_review_dataset_with_sentiment.csv",index=False)

You can also check out the video tutorial on YouTube or below

Thanks for Your Time

Jesus Saves

By Jesse E. Agbe (JCharis)
