Exploratory Data Analysis of Drug Review Dataset using Python

Exploratory Data Analysis (EDA) is an important part of any data science project. It forms the initial steps before moving on to the machine learning stage.

In this tutorial we will explore the drug review dataset in detail using Python. When doing EDA, it is recommended to keep in mind the basic questions you want your dataset to answer. These questions will guide which analyses to use and how deeply to explore the data for more insight. In our case we will group our questions under the following topics:

  • Drugs
  • Reviews
  • Ratings
  • Conditions
  • Combinations

We will be using the dataset from the UCI Machine Learning Repository, which already has some basic information about what we will be doing.

By the end of this tutorial you will learn about

  • The various libraries to use for EDA
  • Descriptive analytics
  • How to do value counts
  • How to generate some plots for more insights
  • How to classify drugs based on their suffixes
  • How to do sentiment analysis on drug reviews
  • How to find and identify genuine reviews
  • Time series analysis on drug review and rating
  • Distribution Analysis
  • and More

You can get the entire code on Github here.

Let us start.

Data Science EDA Project From Scratch with Python

  • Tools & Libraries
    • EDA: Pandas
    • Viz: Seaborn, Matplotlib
    • NLP: spaCy, TextBlob, NeatText
    • ML: sklearn, xgboost, pycaret

DataSource

Attributes

  1. drugName (categorical): name of drug
  2. condition (categorical): name of condition
  3. review (text): patient review
  4. rating (numerical): 10-star patient rating
  5. date (date): date of review entry
  6. usefulCount (numerical): number of users who found review useful

Questions

  • Types of questions we can ask (Drugs, Reviews, Ratings, Conditions, Time, Genuineness, etc.)
  • What is the most popular drug?
  • What are the groups/classification of drugs used?
  • Which Drug has the best review?
  • How many drugs do we have?
  • The number of drugs per condition
  • Number of patients that searched on a particular drug
  • How genuine is the review? (Using sentiment analysis)
  • How many reviews are positive, negative, or neutral?
  • Correlation between rating and review and users who found the review useful
  • Can you predict the rating using the review?
  • Distribution of rating
  • Number of reviews made per year and per month
  • Which condition has the most drug reviews?
In [2]:
# Load EDA Pkgs
import pandas as pd
import numpy as np
In [3]:
# Load Data Viz
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [4]:
# Load Sentiment Pkgs
from textblob import TextBlob

Question on Drugs

  • How many drugs do we have?
  • What is the most popular drug?
  • What are the groups/classification of drugs used?
  • Which Drug has the best review?
  • The number of drugs per condition
  • Number of patients that searched on a particular drug
In [27]:
# Load Dataset
df = pd.read_csv("drugsCom_raw/drugsComTrain_raw.tsv",sep='\t')
In [28]:
# Preview Dataset
df.head()
Out[28]:
Unnamed: 0 drugName condition review rating date usefulCount
0 206461 Valsartan Left Ventricular Dysfunction “It has no side effect, I take it in combinati… 9.0 May 20, 2012 27
1 95260 Guanfacine ADHD “My son is halfway through his fourth week of … 8.0 April 27, 2010 192
2 92703 Lybrel Birth Control “I used to take another oral contraceptive, wh… 5.0 December 14, 2009 17
3 138000 Ortho Evra Birth Control “This is my first time using any form of birth… 8.0 November 3, 2015 10
4 35696 Buprenorphine / naloxone Opiate Dependence “Suboxone has completely turned my life around… 9.0 November 27, 2016 37
In [29]:
# Columns
df.columns
Out[29]:
Index(['Unnamed: 0', 'drugName', 'condition', 'review', 'rating', 'date',
       'usefulCount'],
      dtype='object')
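The `Unnamed: 0` column appears because the first column in the file has no header; it looks like a row identifier. As an optional variation, and assuming that interpretation is right, `read_csv` can use that column as the index and parse the `date` column while loading. The sketch below reads two made-up rows from an in-memory string (the names `raw` and `sample` are hypothetical) so that it runs without the file.

```python
import io
import pandas as pd

# Two made-up rows in the same tab-separated layout as the real file
raw = (
    "\tdrugName\tcondition\treview\trating\tdate\tusefulCount\n"
    "1\tDrugA\tPain\t\"Worked well\"\t9.0\tMay 20, 2012\t27\n"
    "2\tDrugB\tAcne\t\"No change\"\t5.0\tApril 27, 2010\t3\n"
)

sample = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    index_col=0,            # use the unnamed first column as the index
    parse_dates=["date"],   # convert 'May 20, 2012' style strings to datetimes
)
print(sample.dtypes)
```

On the real data the same arguments would be passed together with the file path used above. Having `date` as a datetime column is convenient later for the per-year and per-month questions.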
In [30]:
# Missing Values
df.isnull().sum()
Out[30]:
Unnamed: 0       0
drugName         0
condition      899
review           0
rating           0
date             0
usefulCount      0
dtype: int64

Narrative

  • All of the missing values are in the condition column (899 rows)
  • This may mean that some reviewers did not know the name of their condition, or left it out for privacy reasons
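To put the 899 missing conditions in perspective, it helps to look at the share of missing values and not only the raw count. The sketch below uses a small made-up DataFrame (`toy`, hypothetical data) so that it runs on its own; on the real data the same methods would be called on `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df (not taken from the real dataset)
toy = pd.DataFrame({
    "drugName": ["A", "B", "C", "D"],
    "condition": ["Pain", np.nan, "Acne", "Pain"],
    "rating": [9.0, 8.0, 5.0, 7.0],
})

# isnull().mean() gives the fraction of missing values per column
missing_share = toy.isnull().mean()
print(missing_share)

# Rows where the condition is missing, for manual inspection
print(toy[toy["condition"].isnull()])
```

For the training set, 899 missing values out of 161,297 rows is roughly 0.6% of the data.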

Question on Drugs

  • How many drugs do we have?
In [32]:
# How many drugs do we have?
df['drugName'].unique().tolist()
Out[32]:
['Valsartan',
 'Guanfacine',
 'Lybrel',
 'Ortho Evra',
 'Buprenorphine / naloxone',
 'Cialis',
 'Levonorgestrel',
 'Aripiprazole',
 'Keppra',
 'Ethinyl estradiol / levonorgestrel',
 'Topiramate',
 'L-methylfolate',
 'Pentasa',
 'Dextromethorphan',
 'Nexplanon',
 'Liraglutide',
 'Trimethoprim',
 'Amitriptyline',
 'Lamotrigine',
 'Nilotinib',
 'Atripla',
 'Trazodone',
 'Etonogestrel',
 'Etanercept',
 'Tioconazole',
 'Azithromycin',
 'Eflornithine',
 'Daytrana',
 'Ativan',
 'Imitrex',
 'Sertraline',
 'Toradol',
 'Viberzi',
 'Mobic',
 'Dulcolax',
 'Morphine',
 'MoviPrep',
 'Trilafon',
 'Fluconazole',
 'Contrave',
 'Clonazepam',
 'Metaxalone',
 'Venlafaxine',
 'Ledipasvir / sofosbuvir',
 'Symbyax',
 'Tamsulosin',
 'Doxycycline',
 'Dulaglutide',
 'Intuniv',
 'Buprenorphine',
 'Qvar',
 'Opdivo',
 'Pyridium',
 'Latuda',
 'Bupropion',
 'Implanon',
 'Effexor XR',
 'Drospirenone / ethinyl estradiol',
 'NuvaRing',
 'Prepopik',
 'Tretinoin',
 'Gildess Fe 1 / 20',
 'Ethinyl estradiol / norgestimate',
 'Elbasvir / grazoprevir',
 'Clomiphene',
 'Docusate / senna',
 'Amitiza',
 'Sildenafil',
 'Lo Loestrin Fe',
 'Oxcarbazepine',
 'Wellbutrin',
 "Phillips' Milk of Magnesia",
 'Nature-Throid',
 'Lithium',
 'Oxycodone',
 'Estradiol',
 'Sronyx',
 'Augmentin XR',
 'Monistat 7-Day Combination Pack',
 'Plan B One-Step',
 'Alprazolam',
 'Fluoxetine',
 'Spironolactone',
 'Fluvoxamine',
 'Macrobid',
 'Lurasidone',
 'Adapalene / benzoyl peroxide',
 'Brimonidine',
 'Amlodipine / olmesartan',
 'Loestrin 24 Fe',
 'Linaclotide',
 'Mirtazapine',
 'Acetaminophen / hydrocodone',
 'Isotretinoin',
 'Ropinirole',
 'Zoledronic acid',
 'Lamictal',
 'Buspirone',
 'Propranolol',
 'Focalin',
 'Jolivette',
 'Levofloxacin',
 'Phentermine / topiramate',
 'Cephalexin',
 'Aviane',
 'Saxenda',
 'Clomipramine',
 'Medroxyprogesterone',
 'Aczone',
 'Nicoderm CQ',
 'Naltrexone',
 'Restasis',
 'Depo-Provera',
 'Olanzapine',
 'Oxytrol',
 'Fentanyl',
 'Epiduo',
 'Accutane',
 'Xanax',
 'Desvenlafaxine',
 'Urea',
 'Lyrica',
 'Phenergan',
 'Loestrin 21 1 / 20',
 'Loratadine',
 'Cardura XL',
 'Viibryd',
 'Mirena',
 'Ethinyl estradiol / norelgestromin',
 'Propofol',
 'Camphor / menthol',
 'Hydroxychloroquine',
 'Lorcaserin',
 'Insulin degludec',
 'Trintellix',
 'Lupron Depot',
 'Zanaflex',
 'Miconazole',
 'Opana ER',
 'Provera',
 'Diflucan',
 'Ibrance',
 'Reclipsen',
 'Lisinopril',
 'Empagliflozin',
 'Naproxen',
 'Amoxicillin / clarithromycin / lansoprazole',
 'Metoprolol',
 'Naloxegol',
 'Skyla',
 'Leuprolide',
 'Ulipristal',
 'Benzonatate',
 'Sulfamethoxazole / trimethoprim',
 'Eletriptan',
 'Escitalopram',
 'Dulera',
 'Prempro',
 'Gemfibrozil',
 'Depakote',
 'Testosterone',
 'Zomig',
 'Vyvanse',
 'Solodyn',
 'Efavirenz / emtricitabine / tenofovir',
 'Methimazole',
 'Ortho Tri-Cyclen',
 'Aleve',
 'Tylenol with Codeine #3',
 'Victoza',
 'Lubiprostone',
 'Ethinyl estradiol / norethindrone',
 'Sovaldi',
 'Pristiq',
 'Temozolomide',
 'Nabumetone',
 'Meloxicam',
 'Cevimeline',
 'ProAir RespiClick',
 'Gabapentin',
 'Relpax',
 'Levomilnacipran',
 'Yaz',
 'Valtrex',
 'Clindamycin',
 'BuSpar',
 'Plan B',
 'Trolamine salicylate',
 'Lisdexamfetamine',
 'Qsymia',
 'Rizatriptan',
 'Ziana',
 'Boudreaux Butt Paste',
 'Cymbalta',
 'Zoloft',
 'Tizanidine',
 'Gastrocrom',
 'Seasonique',
 'Amphetamine / dextroamphetamine',
 'Liletta',
 'Exenatide',
 'Paroxetine',
 'Bontril Slow Release',
 'Levothroid',
 'Carbamazepine',
 'Adipex-P',
 'Bydureon',
 'Bupropion / naltrexone',
 'Voltaren-XR',
 'Pimecrolimus',
 'Acetaminophen / oxycodone',
 'Monistat 7',
 'Pramipexole',
 'AndroGel',
 'Nitrofurantoin',
 'Metronidazole',
 'Ziprasidone',
 'Acetaminophen / butalbital / caffeine',
 'Nuvigil',
 'Moxifloxacin',
 'Methadone',
 'Celecoxib',
 'Aspirin / butalbital / caffeine',
 'Montelukast',
 'Saliva substitutes',
 'Atomoxetine',
 'Anastrozole',
 'Phenol',
 'Duloxetine',
 'Magnesium sulfate / potassium sulfate / sodium sulfate',
 'Lansoprazole',
 'Nardil',
 'Milnacipran',
 'Oxymorphone',
 'Acetaminophen / aspirin / caffeine',
 'Levora',
 'ParaGard',
 'Levaquin',
 'Ciprofloxacin',
 'Avelox',
 'Acidophilus',
 'Metformin',
 'Terconazole',
 'Saphris',
 'Augmentin',
 'Lexapro',
 'Tamiflu',
 'Prazosin',
 'Liothyronine',
 'Seroquel',
 'Terbinafine',
 'Valium',
 'Norco',
 'Progesterone',
 'Concerta',
 'Ocella',
 'Strattera',
 'Mylanta',
 'TriNessa',
 'Goserelin',
 'Quetiapine',
 'Testim',
 'Emend',
 'Methylphenidate',
 'Acyclovir',
 'Linzess',
 'Orthovisc',
 'Silodosin',
 'Metoclopramide',
 'Indomethacin',
 'Copper',
 'Meclizine',
 'Gilenya',
 'Microgestin Fe 1 / 20',
 'Klonopin',
 'Codeine / guaifenesin',
 'Citalopram',
 'Colazal',
 'Fiorinal with Codeine',
 'Zolpidem',
 'Wellbutrin XL',
 'Climara Pro',
 'Clarithromycin',
 'Bactrim DS',
 'Varenicline',
 'Amoxicillin / clavulanate',
 'Tolterodine',
 'Hydroxyzine',
 'Blisovi Fe 1 / 20',
 'Ramelteon',
 'Infliximab',
 'Rabeprazole',
 'Dexilant',
 'Immune globulin oral',
 'Nucynta',
 'Hysingla ER',
 'Pantoprazole',
 'Sprycel',
 'Tri-Sprintec',
 'Doxepin',
 'Zofran',
 'Versed',
 'Cipro',
 'MS Contin',
 'Avonex',
 'Focalin XR',
 'Junel Fe 1 / 20',
 'Imdur',
 'Diazepam',
 'Bisacodyl',
 'Nortrel 1 / 35',
 'Suvorexant',
 'Risperidone',
 'Sprintec',
 'Risperdal',
 'Simponi',
 'Euflexxa',
 'Diclofenac',
 'Dimenhydrinate',
 'Benztropine',
 'Ortho Tri-Cyclen Lo',
 'Ibuprofen',
 'Sumatriptan',
 'Polyethylene glycol 3350',
 'Xenical',
 'Glatiramer',
 'Atarax',
 'Orsythia',
 'Alirocumab',
 'Sublimaze',
 'Ethinyl estradiol / etonogestrel',
 'femhrt',
 'Lortab',
 'Haldol',
 'Tapentadol',
 'Ketorolac',
 'Phentermine',
 'Effexor',
 'Remicade',
 'Flexeril',
 'SMZ-TMP DS',
 'Dutasteride',
 'Thyroid desiccated',
 'Opana',
 'Zyprexa',
 'Vedolizumab',
 'Chantix',
 'Suboxone',
 'Mirapex',
 'Uribel',
 'Vicks Sinex Nasal Spray (old formulation)',
 'Bactrim',
 'Depo-Provera Contraceptive',
 'Ceftriaxone',
 'Ranexa',
 'Trulicity',
 'Percocet',
 'Sitagliptin',
 'Formoterol / mometasone',
 'Ambien',
 'Belviq',
 'Retapamulin',
 'Vigamox',
 'Levetiracetam',
 'Elidel',
 'Abilify Discmelt',
 'Delsym 12 Hour Cough Relief',
 'Baclofen',
 'Halobetasol',
 'Azelastine / fluticasone',
 'Drysol',
 'Alavert D-12 Hour Allergy and Sinus',
 'Cimzia',
 'Rexulti',
 'Proctofoam',
 'Alli',
 'Cefdinir',
 'Tramadol',
 'Norflex',
 'Humira',
 'Tecfidera',
 'Acetaminophen / dichloralphenazone / isometheptene mucate',
 'Benzoyl peroxide / clindamycin',
 'Methyl salicylate',
 'Monistat 3-Day Combination Pack',
 'Clindamycin / tretinoin',
 'Flonase',
 'Norethindrone',
 'Alvesco',
 'Nystatin',
 'EContra EZ',
 'TriCor',
 'Diphenhydramine',
 'Neulasta',
 'Zolmitriptan',
 'Minastrin 24 Fe',
 'Levothyroxine',
 'Mononessa',
 'Differin',
 'Ibandronate',
 'Zithromax',
 'Compazine',
 'Topamax',
 'Ustekinumab',
 'Minocycline',
 'Ultram ER',
 'Nortriptyline',
 'Pregabalin',
 'Suprep Bowel Prep Kit',
 'Armour Thyroid',
 'Jolessa',
 'Mirvaso',
 'Atralin',
 'Crestor',
 'Rozerem',
 'Cryselle',
 'Sucralfate',
 'Efinaconazole',
 'Cetirizine',
 'Amoxicillin',
 'Soma',
 'Neupro',
 'Valacyclovir',
 'Toprol-XL',
 'Sodium oxybate',
 'Mesalamine',
 'Orlistat',
 'Butorphanol',
 'Humatrope',
 'Diltiazem',
 'Hydrocodone',
 'Ritalin',
 'Kapvay',
 'Prozac',
 'Vicodin',
 'Falmina',
 'Relafen',
 'Restoril',
 'Frovatriptan',
 'Losartan',
 'Sharobel',
 'Xyrem',
 'Apri',
 'ella',
 'Spiriva',
 'Tasigna',
 'Dupixent',
 'Lorazepam',
 'Cyproheptadine',
 'Repatha',
 'Docusate',
 'Hydrochlorothiazide',
 'Scopolamine',
 'Flurbiprofen',
 'Femara',
 'Methotrexate',
 'Hypercare',
 'Epinephrine',
 'Brimonidine / timolol',
 'Mucinex D',
 'Azor',
 'Voltaren',
 'Vortioxetine',
 'Velivet',
 'Necon 1 / 35',
 'Seroquel XR',
 'Detrol LA',
 'Sporanox',
 'Febuxostat',
 'Eluxadoline',
 'Neurontin',
 'Ondansetron',
 'Gardasil',
 'Avastin',
 'Paliperidone',
 'Nora-Be',
 'Penicillin v potassium',
 'Tofacitinib',
 "St. john's wort",
 'Rythmol SR',
 'Vytorin',
 'Dilaudid',
 'Vilazodone',
 'Librium',
 'Duac',
 'Dapsone',
 'Vicodin ES',
 'Hydromorphone',
 'Zyban',
 'Ranitidine',
 'Arava',
 'Acetaminophen / caffeine',
 'Adderall',
 'Ruconest',
 'Pseudoephedrine / triprolidine',
 'Etodolac',
 'Emsam',
 'Diovan',
 'Ammonium lactate',
 'Desmopressin',
 'Zepatier',
 'Magnesium hydroxide',
 'Butrans',
 'Botox',
 'Azelastine',
 'Xalkori',
 'Zyvox',
 'Rocephin',
 'Kyleena',
 'Toujeo',
 'Dexlansoprazole',
 'Brexpiprazole',
 'Roxicodone Intensol',
 'Reglan',
 'Divalproex sodium',
 'Tindamax',
 'Camrese',
 'MetroGel-Vaginal',
 'Tadalafil',
 'Flecainide',
 'Junel Fe 1.5 / 30',
 'Flagyl',
 'Ovace Plus',
 'Secobarbital',
 'Empagliflozin / linagliptin',
 'Raltegravir',
 'Tavaborole',
 'Ampyra',
 'Celebrex',
 'Colchicine',
 'Geodon',
 'Aluminum chloride hexahydrate',
 'Tryptophan',
 'Myrbetriq',
 'Malarone',
 'Fetzima',
 'Omeprazole',
 'Fluoride',
 'Jardiance',
 'Turmeric',
 'Acetaminophen / butalbital / caffeine / codeine',
 'Desogestrel / ethinyl estradiol',
 'Secukinumab',
 'Ethinyl estradiol / norgestrel',
 'Nifedipine',
 'Celexa',
 'Prednisone',
 'Methocarbamol',
 'Haldol Decanoate',
 'Beyaz',
 'Taclonex',
 'Decadron',
 'Vardenafil',
 'Oxazepam',
 'Dexmethylphenidate',
 'Firmagon',
 'Phenazopyridine',
 'Tiotropium',
 'Savella',
 'Cataflam',
 'Cobicistat / elvitegravir / emtricitabine / tenofovir',
 'Ramipril',
 'Relistor',
 'Paxil',
 'Stelara',
 'Cambia',
 'Ezetimibe',
 'Mefenamic acid',
 'Budesonide / formoterol',
 'Doryx',
 'Dymista',
 'Omalizumab',
 'Conjugated estrogens',
 'Lunesta',
 'Mometasone',
 'Phenylephrine',
 'VESIcare',
 'Kapidex',
 'Errin',
 'Lomotil',
 'Clomid',
 'Clozapine',
 'Olopatadine',
 'Narcan Injection',
 'Mirabegron',
 'Wellbutrin SR',
 'Cyclobenzaprine',
 'Tinidazole',
 'Asenapine',
 'Penicillin VK',
 'Oxymetazoline',
 'EpiCeram',
 'Temazepam',
 'Oxybutynin',
 'Armodafinil',
 'Epclusa',
 'Dalfampridine',
 'OnabotulinumtoxinA',
 'Doxylamine / pyridoxine',
 'Estarylla',
 'Vancomycin',
 'Naproxen / sumatriptan',
 'Fastin',
 'Protonix',
 'Bazedoxifene / conjugated estrogens',
 'Chloral hydrate',
 'Lialda',
 'Maxalt',
 'Denosumab',
 'Boniva',
 'Sklice',
 'Acetazolamide',
 'Clinpro 5000',
 'Zelapar',
 'Desloratadine',
 'Docosanol',
 'Acetaminophen',
 'Chaparral',
 'Hyaluronan',
 'Polyethylene glycol 3350 with electrolytes',
 'Mestranol / norethindrone',
 'Melatonin',
 'Symbicort',
 'Lutera',
 'Emollients',
 'Colesevelam',
 'Zyclara',
 'Aspirin / caffeine',
 'Abreva',
 'Isocarboxazid',
 'Correctol',
 'Plecanatide',
 'Xarelto',
 'Prevacid',
 'Simvastatin',
 'Carbatrol',
 'Dapagliflozin',
 'Tresiba',
 'Oracea',
 'Abilify',
 'Meperidine',
 'Tamoxifen',
 'Harvoni',
 'Imiquimod',
 'Trospium',
 'Limbitrol',
 'Zegerid',
 'Diethylpropion',
 'Limbrel',
 'Verapamil',
 'Premarin',
 'Pentazocine',
 'Apremilast',
 'Ciclesonide',
 'Canagliflozin',
 'Megestrol',
 'LoSeasonique',
 'Otezla',
 'Zenzedi',
 'Phosphorated carbohydrate solution',
 'Influenza virus vaccine, live, trivalent',
 'Deplin',
 'Methylprednisolone',
 'Invega',
 'Cutar',
 'Serzone',
 'Biaxin XL',
 'Coreg',
 'Ortho Cyclen',
 'Lorcet 10 / 650',
 'Letrozole',
 'Cefuroxime',
 'Sectral',
 'Belladonna / opium',
 'Flomax',
 'My Way',
 'Belsomra',
 'Adapalene',
 'Promethazine',
 'Fentanyl Transdermal System',
 'Desoxyn',
 'Tegretol',
 'Latisse',
 'Oseltamivir',
 'Kombiglyze XR',
 'Minoxidil',
 'Enbrel',
 'Adalimumab',
 'Xulane',
 'Elavil',
 'Endocet',
 'Unisom SleepGels',
 'Invokana',
 'Naphazoline',
 'Hydrochlorothiazide / telmisartan',
 'Mycophenolate mofetil',
 'Eucrisa',
 'Biltricide',
 'Bystolic',
 'Ibuprofen / pseudoephedrine',
 'Alesse',
 'Bisoprolol / hydrochlorothiazide',
 'Fexofenadine',
 'Fentora',
 'Guaifenesin',
 'Modafinil',
 'Kadian',
 'Dexamethasone',
 'Atropine / diphenoxylate',
 'Metformin / sitagliptin',
 'Fluorouracil',
 'Clobetasol',
 'Commit',
 'Tri-Lo-Sprintec',
 'Guaifenesin / phenylephrine',
 'Dexbrompheniramine / pseudoephedrine',
 'FreshKote',
 'Racepinephrine',
 'Keflex',
 'Fluticasone',
 'Levemir',
 'Alprostadil',
 'Carbidopa / levodopa',
 'Tranexamic acid',
 'Esomeprazole',
 'Voltaren Gel',
 'Adderall XR',
 'Hydrocortisone',
 'Remeron',
 'Genvoya',
 'Podofilox',
 'Tri-Previfem',
 'Atorvastatin',
 'Carisoprodol',
 'Gildess Fe 1.5 / 30',
 'Viagra',
 'Famotidine / ibuprofen',
 'Selenium sulfide',
 'Aubra',
 'Tocilizumab',
 'Lacosamide',
 'Axiron',
 'Finacea',
 'Hydrocodone / ibuprofen',
 'Vantin',
 'Silenor',
 'Rivaroxaban',
 'Motrin',
 'Cholestyramine',
 'Nor-QD',
 'Ketamine',
 'Flurazepam',
 'Aubagio',
 'Nebivolol',
 'Vandazole',
 'Clopidogrel',
 'Imuran',
 'Avanafil',
 'Robaxin-750',
 'Hydrochlorothiazide / lisinopril',
 'Targiniq ER',
 'Drospirenone / ethinyl estradiol / levomefolate calcium',
 'Breo Ellipta',
 'Zonisamide',
 'Diamox',
 'Warfarin',
 'Amlodipine',
 'Midazolam',
 'Parnate',
 'Next Choice',
 'Gleevec',
 'Movantik',
 'Cabergoline',
 'Opcicon One-Step',
 'Vivelle-Dot',
 'Estrace Vaginal Cream',
 'Trihexyphenidyl',
 'Acetaminophen / propoxyphene',
 'Invega Sustenna',
 'Mephobarbital',
 'Nexium',
 'Insulin glargine',
 'Hydromet',
 'Tenofovir',
 'Chlordiazepoxide',
 'Taltz',
 'Blisovi 24 Fe',
 'Campral',
 'Loratadine / pseudoephedrine',
 'Sodium biphosphate / sodium phosphate',
 'Risedronate',
 'Lenalidomide',
 'Klor-Con',
 'Furosemide',
 'Exemestane',
 'Diclegis',
 'Monodox',
 'Bethanechol',
 'Portia',
 'Eliquis',
 'Prochlorperazine',
 'Diclofenac / misoprostol',
 'Amlodipine / benazepril',
 'Adzenys XR-ODT',
 'Natalizumab',
 'Aldactone',
 'Benadryl',
 'Niaspan',
 'Citric acid / magnesium oxide / sodium picosulfate',
 'Fioricet',
 'Clonidine',
 'Lactulose',
 'Guaifenesin / pseudoephedrine',
 'R-Tanna',
 'Jublia',
 'Atenolol',
 'Solifenacin',
 'Lidoderm',
 'Bismuth subcitrate potassium / metronidazole / tetracycline',
 'Nicotrol Inhaler',
 'Levlen',
 'Asacol',
 'Viorele',
 'Depakote ER',
 'Fortesta',
 'Silvadene',
 'Estring',
 'Duragesic',
 'Xyzal',
 'Artane',
 'Tafinlar',
 'Orencia',
 'Lyza',
 'Tegretol XR',
 'Ceftin',
 'Tambocor',
 'Previfem',
 'Duavee',
 'Microgestin 1 / 20',
 'Ocular lubricant',
 'Vascepa',
 'Estradiol Patch',
 'Protonix IV',
 'HC-Derma-Pax',
 'Welchol',
 'Tirosint',
 'Sandostatin',
 'Patanol',
 'Olmesartan',
 'Xolair',
 'Irbesartan',
 'Lidocaine',
 'Librax',
 'Chlorthalidone',
 'Naprelan',
 'Prolia',
 'Dolutegravir',
 'Cefpodoxime',
 'Clocortolone',
 'Ultram',
 'Afrezza',
 'Tysabri',
 'Lactobacillus acidophilus',
 'Oxaliplatin',
 'Mexiletine',
 'Halcion',
 'Somatropin',
 'Enjuvia',
 'Multivitamin, prenatal',
 'Omnicef',
 'Embeda',
 'Triamcinolone',
 'Uloric',
 'Caffeine',
 'Arthrotec',
 'Sodium hyaluronate',
 'Zarah',
 'Benicar HCT',
 'Hydrochlorothiazide / spironolactone',
 'Sumavel DosePro',
 'Immune globulin subcutaneous',
 'Certolizumab',
 'Aftera',
 'Pemoline',
 'Delsym',
 'Capzasin-HP',
 'Topicort',
 'Carvedilol',
 'Actiq',
 'Phendimetrazine',
 'Mysoline',
 'Acamprosate',
 'Prevnar 13',
 'Doxazosin',
 'Vistaril',
 'Rapaflo',
 'Vitamin B2',
 'Acthar',
 'Memantine',
 'Heather',
 'Apixaban',
 'Patanase',
 'Januvia',
 'Nicotine',
 'S-adenosylmethionine',
 'Zetia',
 'Feldene',
 'Pradaxa',
 'Abacavir / dolutegravir / lamivudine',
 'Synvisc',
 'Dextromethorphan / quinidine',
 'Acetaminophen / tramadol',
 'Tazarotene',
 'Opcon-A',
 'Ketoprofen',
 'Golimumab',
 'Kava',
 'Loestrin Fe 1.5 / 30',
 'Letairis',
 'Coartem',
 'Complera',
 'ProAir HFA',
 'Nasacort',
 'Pramoxine',
 'Clindesse',
 'Cozaar',
 'Multivitamin',
 'Chlorpromazine',
 'Protopic',
 'Migranal',
 'Teriparatide',
 'Dihydroergotamine',
 'Potassium chloride',
 'Chateal',
 'Quasense',
 'Anafranil',
 'Advair Diskus',
 'Nefazodone',
 'Ditropan',
 'Taytulla',
 'Kenalog',
 'Tenormin',
 'Rebif',
 'Copaxone',
 'Generess Fe',
 'Hydroxyurea',
 'Retisert',
 'Alfuzosin',
 'Penciclovir',
 'Betamethasone / calcipotriene',
 'Brisdelle',
 'Dyanavel XR',
 'Allerx Dose Pack DF',
 'Deltasone',
 'Amaryl',
 'Soolantra',
 'Mucinex',
 'Desyrel',
 'Levocetirizine',
 'Cobicistat / elvitegravir / emtricitabine / tenofovir alafenamide',
 'Lopinavir / ritonavir',
 'Byetta',
 'Catapres-TTS',
 'Chlorpheniramine',
 'Sustiva',
 'Zaleplon',
 'Farxiga',
 'Janumet',
 'Lo / Ovral-28',
 'Docetaxel',
 'Lantus Solostar',
 'Oxistat',
 'Selegiline',
 'Sunitinib',
 'Levorphanol',
 'Cheratussin AC',
 'Umeclidinium / vilanterol',
 'Erlotinib',
 'Forteo',
 'Seasonale',
 'Actemra',
 'Urso Forte',
 'Aspirin',
 'Zyprexa Zydis',
 'Cosentyx',
 'MiraLax',
 'Esgic',
 'Rituximab',
 'Fluticasone / vilanterol',
 'Tacrolimus',
 'Supartz',
 'Tenuate',
 'Beclomethasone',
 'Fenofibric acid',
 'Panlor SS',
 'Methylnaltrexone',
 'Benicar',
 'Sofosbuvir / velpatasvir',
 'Methadose',
 'Calcitriol',
 ...]
In [33]:
# How many drugs do we have?
len(df['drugName'].unique().tolist())
Out[33]:
3436
In [34]:
# What is the most popular drug?
df['drugName'].value_counts()
Out[34]:
Levonorgestrel                       3657
Etonogestrel                         3336
Ethinyl estradiol / norethindrone    2850
Nexplanon                            2156
Ethinyl estradiol / norgestimate     2117
                                     ... 
Mellaril                                1
Oxymetholone                            1
Ethchlorvynol                           1
Ginseng                                 1
Meningococcal group B vaccine           1
Name: drugName, Length: 3436, dtype: int64
In [35]:
# What is the most popular drug?
# Top 20 Drugs (Most Popular)
df['drugName'].value_counts().nlargest(20)
Out[35]:
Levonorgestrel                        3657
Etonogestrel                          3336
Ethinyl estradiol / norethindrone     2850
Nexplanon                             2156
Ethinyl estradiol / norgestimate      2117
Ethinyl estradiol / levonorgestrel    1888
Phentermine                           1543
Sertraline                            1360
Escitalopram                          1292
Mirena                                1242
Implanon                              1102
Gabapentin                            1047
Bupropion                             1022
Venlafaxine                           1016
Miconazole                            1000
Citalopram                             995
Medroxyprogesterone                    995
Lexapro                                952
Bupropion / naltrexone                 950
Duloxetine                             934
Name: drugName, dtype: int64
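Raw counts are easier to interpret next to their share of all reviews. A minimal sketch using `value_counts(normalize=True)` on a made-up Series (`names`, hypothetical) is shown below; on the real data the same calls would be made on `df['drugName']`.

```python
import pandas as pd

# Hypothetical drug names standing in for df['drugName']
names = pd.Series(["A", "A", "A", "B", "B", "C", "D", "D", "D", "D"])

counts = names.value_counts()
shares = names.value_counts(normalize=True)  # fractions that sum to 1

# Put counts and shares side by side for the top 3 names
summary = pd.DataFrame({"count": counts, "share": shares}).nlargest(3, "count")
print(summary)
```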
In [36]:
# Top 20 Drugs (Most Popular)
plt.figure(figsize=(20,10))
df['drugName'].value_counts().nlargest(20).plot(kind='bar')
plt.title("Top 20 Most popular drugs based on counts")
plt.show()

Narrative

  • Most of the most frequently reviewed drugs are hormonal drugs (contraceptives)
In [37]:
# Bottom 20 Drugs (Least Popular)
df['drugName'].value_counts().nsmallest(20)
Out[37]:
Hyosyne                             1
Alimta                              1
Pamabrom                            1
Dallergy                            1
Reyataz                             1
Nor-QD                              1
Citric acid / potassium citrate     1
Doans Pills Extra Strength          1
Streptokinase                       1
Metolazone                          1
Dasetta 7 / 7 / 7                   1
Hexalen                             1
Nitro-Dur                           1
Stalevo 150                         1
Acrivastine / pseudoephedrine       1
Calcium / vitamin d                 1
Rifadin                             1
MVI Adult                           1
Bendroflumethiazide / nadolol       1
Hydroxyamphetamine / tropicamide    1
Name: drugName, dtype: int64
In [38]:
df['drugName'].value_counts().nsmallest(20).plot(kind='bar')
Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f916705d450>
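Since more than 20 drugs have exactly one review, `nsmallest(20)` shows only 20 of the tied entries. A quick way to see how long this tail is would be to count the drugs that were reviewed once. The sketch uses a made-up Series (`names`, hypothetical) in place of `df['drugName']`.

```python
import pandas as pd

# Hypothetical drug names standing in for df['drugName']
names = pd.Series(["A", "A", "B", "C", "C", "C", "D", "E"])
counts = names.value_counts()

# Keep only the drugs that appear once, then count and list them
single = counts[counts == 1]
print(len(single), "drugs have a single review:", sorted(single.index))
```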

What are the groups/classification of drugs used?

  • Suffix or endings
In [39]:
drug_suffix = {"azole":"antifungal (except metronidazole)",
"caine":"anesthetic",
"cillin":"antibiotic(penicillins)",
"mycin":"antibiotic",
"micin":"antibiotic",
"cycline":"antibiotic",
"oxacin":"antibiotic",
"ceph":"antibiotic(cephalosporins)",
"cef":"antibiotic (cephalosporins)",
"dine":"h2 blockers (anti-ulcers)",
"done":"opiod analgesics",
"ide":"oral hypoglycemics",
"lam":"anti-anxiety",
"pam":"anti-anxiety",
"mide":"diuretics",
"zide":"diuretics",
"nium":"neuromuscular blocking agents",
"olol":"beta blockers",
"tidine":"h2 antagonist",
"tropin":"pituitary hormone",
"zosin":"alpha blocker",
"ase":"thrombolytics",
"plase":"thrombolytics",
"azepam":"anti-anziety(benzodiazepine)",
"azine":"antipyschotics (phenothiazine)",
"barbital":"barbiturate",
"dipine":"calcium channel blocker",
"lol":"beta blocker",
"zolam":"cns depressants",
"pril":"ace inhibitor",
"artan":"arb blocker",
"statins":"lipid-lowering drugs",
"parin":"anticoagulants",
"sone":"corticosteroid (prednisone)"}
In [40]:
def classify_drug(drugname):
    for i in drug_suffix.keys():
        if drugname.endswith(i):
            print(True)
            print(drug_suffix[i])
In [41]:
classify_drug('Valsartan')
True
arb blocker
In [43]:
classify_drug('losartan')
True
arb blocker
In [44]:
def classify_drug(drugname):
    for i in drug_suffix.keys():
        if drugname.endswith(i):
            return drug_suffix[i]
In [45]:
classify_drug('valsartan')
Out[45]:
'arb blocker'
In [46]:
df['drug_class'] = df['drugName'].apply(classify_drug)
In [47]:
df[['drugName','drug_class']]
Out[47]:
drugName drug_class
0 Valsartan arb blocker
1 Guanfacine None
2 Lybrel None
3 Ortho Evra None
4 Buprenorphine / naloxone None
161292 Campral None
161293 Metoclopramide oral hypoglycemics
161294 Orencia None
161295 Thyroid desiccated None
161296 Lubiprostone None

161297 rows × 2 columns
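The simple `endswith` loop returns the first matching key in dictionary order. A short key therefore shadows any longer key listed after it (`lam` always wins over `zolam`), and a combination such as 'Buprenorphine / naloxone' is only checked as one string. One possible refinement, offered as a sketch and not as the method used in this tutorial, is to lower-case the name, split combinations on ' / ', and prefer the longest matching suffix. The helper `classify_drug_longest` and the four-entry `suffixes` table are made up for illustration.

```python
# Small illustrative subset of the suffix table (not the full drug_suffix dict)
suffixes = {
    "lam": "anti-anxiety",
    "zolam": "cns depressants",
    "artan": "arb blocker",
    "dipine": "calcium channel blocker",
}

def classify_drug_longest(drugname, table=suffixes):
    """Return one class (or None) per component of a drug name."""
    # Try longer suffixes before shorter ones so 'zolam' beats 'lam'
    ordered = sorted(table, key=len, reverse=True)
    classes = []
    for part in drugname.lower().split(" / "):
        match = next((s for s in ordered if part.strip().endswith(s)), None)
        classes.append(table[match] if match else None)
    return classes

print(classify_drug_longest("Alprazolam"))
print(classify_drug_longest("Amlodipine / olmesartan"))
```

Suffix rules remain a heuristic either way: Metoclopramide, for example, is an antiemetic, yet its name ends in both `ide` and `mide`, so the labels should be spot-checked before drawing conclusions from them.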

In [48]:
# How many Groups of Drugs By Class
df['drug_class'].unique().tolist()
Out[48]:
['arb blocker',
 None,
 'antifungal (except metronidazole)',
 'oral hypoglycemics',
 'opiod analgesics',
 'antibiotic',
 'anti-anxiety',
 'h2 blockers (anti-ulcers)',
 'beta blockers',
 'ace inhibitor',
 'thrombolytics',
 'alpha blocker',
 'corticosteroid (prednisone)',
 'antipyschotics (phenothiazine)',
 'antibiotic(penicillins)',
 'barbiturate',
 'calcium channel blocker',
 'anesthetic',
 'pituitary hormone',
 'antibiotic (cephalosporins)',
 'beta blocker',
 'neuromuscular blocking agents',
 'anticoagulants']
In [50]:
# How many Groups of Drugs By Class
len(df['drug_class'].unique().tolist())
Out[50]:
23
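Note that `unique()` includes the missing value `None` in its result, so the 23 above is made up of 22 named classes plus the unclassified group. `nunique()` skips missing values by default. A minimal sketch on a made-up Series (`classes`, hypothetical) shows the difference, along with the share of rows that received no class:

```python
import pandas as pd

# Hypothetical stand-in for df['drug_class']
classes = pd.Series(["arb blocker", None, "antibiotic", None, "arb blocker"])

n_with_missing = len(classes.unique())   # counts the missing value as an entry
n_named = classes.nunique()              # ignores missing values
unclassified_share = classes.isna().mean()

print(n_with_missing, n_named, unclassified_share)
```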
In [51]:
# Which class of drug is the most common?
df['drug_class'].value_counts()
Out[51]:
antifungal (except metronidazole)    4201
opiod analgesics                     3945
oral hypoglycemics                   3555
antibiotic                           3401
anti-anxiety                         2645
h2 blockers (anti-ulcers)            1228
beta blockers                         966
corticosteroid (prednisone)           886
antipyschotics (phenothiazine)        664
arb blocker                           560
ace inhibitor                         432
calcium channel blocker               233
alpha blocker                         153
anesthetic                            129
antibiotic(penicillins)               119
thrombolytics                         116
beta blocker                           97
neuromuscular blocking agents          45
antibiotic (cephalosporins)            29
pituitary hormone                      28
barbiturate                            19
anticoagulants                          9
Name: drug_class, dtype: int64
In [52]:
# Which class of drug is the most common?
plt.figure(figsize=(20,10))
df['drug_class'].value_counts().plot(kind='bar')
plt.title("Distribution of Drugs By Class")
plt.show()

Narrative

  • The most common classes/groups of drugs used are
    • Antifungals
    • Opioid Analgesics (Painkillers)
    • Oral Hypoglycemics (DM)
    • Antibiotics
In [69]:
# Distribution of Drugs Per Drug Group based on size
drug_groups = df.groupby('drug_class').size()
In [70]:
type(drug_groups)
Out[70]:
pandas.core.series.Series
In [71]:
# Convert to DF
# Method 1
drug_groups.to_frame()
Out[71]:
0
drug_class
ace inhibitor 432
alpha blocker 153
anesthetic 129
anti-anxiety 2645
antibiotic 3401
antibiotic (cephalosporins) 29
antibiotic(penicillins) 119
anticoagulants 9
antifungal (except metronidazole) 4201
antipyschotics (phenothiazine) 664
arb blocker 560
barbiturate 19
beta blocker 97
beta blockers 966
calcium channel blocker 233
corticosteroid (prednisone) 886
h2 blockers (anti-ulcers) 1228
neuromuscular blocking agents 45
opiod analgesics 3945
oral hypoglycemics 3555
pituitary hormone 28
thrombolytics 116
In [73]:
# Convert to DF
# Method 2
drug_groups_df = pd.DataFrame({'drug_class':drug_groups.index,'counts':drug_groups.values})
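A third option, shown here as a sketch on made-up data, is `reset_index(name=...)`, which names the count column and turns the group labels into a regular column in one step. The names `toy` and `toy_groups_df` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for df with a drug_class column
toy = pd.DataFrame({"drug_class": ["antibiotic", "arb blocker", "antibiotic", None]})

# size() returns a Series; reset_index(name=...) converts it to a DataFrame
toy_groups_df = (
    toy.groupby("drug_class")
       .size()
       .reset_index(name="counts")
       .sort_values("counts", ascending=False)
)
print(toy_groups_df)
```

Sorting by `counts` before plotting also gives a bar chart ordered from the largest group to the smallest. Rows without a class are dropped by `groupby` by default.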
In [75]:
# Seaborn Plot
plt.figure(figsize=(20,10))
g = sns.barplot(data=drug_groups_df,x='drug_class',y='counts')
plt.show()
In [76]:
# Seaborn Plot (rotate the tick labels through the axes object)
plt.figure(figsize=(20,10))
g = sns.barplot(data=drug_groups_df,x='drug_class',y='counts')
g.set_xticklabels(drug_groups_df['drug_class'].values,rotation=30)
plt.show()
In [77]:
# Seaborn Plot (rotate the tick labels with plt.xticks)
plt.figure(figsize=(20,10))
g = sns.barplot(data=drug_groups_df,x='drug_class',y='counts')
plt.xticks(rotation=30)
plt.show()

Question on Conditions

  • How many conditions are there?
  • Which conditions are the most common?
  • Distribution of conditions and rating
In [54]:
# Number of Conditions
df['condition'].unique()
Out[54]:
array(['Left Ventricular Dysfunction', 'ADHD', 'Birth Control',
       'Opiate Dependence', 'Benign Prostatic Hyperplasia',
       'Emergency Contraception', 'Bipolar Disorde', 'Epilepsy',
       'Migraine Prevention', 'Depression', "Crohn's Disease", 'Cough',
       'Obesity', 'Urinary Tract Infection', 'ibromyalgia',
       'Chronic Myelogenous Leukemia', 'HIV Infection', 'Insomnia',
       'Rheumatoid Arthritis', 'Vaginal Yeast Infection',
       'Chlamydia Infection', 'Hirsutism', 'Panic Disorde', 'Migraine',
       nan, 'Pain', 'Irritable Bowel Syndrome', 'Osteoarthritis',
       'Constipation', 'Bowel Preparation', 'Psychosis', 'Muscle Spasm',
       'Hepatitis C', 'Overactive Bladde', 'Diabetes, Type 2',
       'Asthma, Maintenance', 'Non-Small Cell Lung Cance',
       'Schizophrenia', 'Dysuria', 'Smoking Cessation', 'Anxiety', 'Acne',
       'emale Infertility', 'Constipation, Acute',
       'Constipation, Drug Induced', 'Erectile Dysfunction',
       'Trigeminal Neuralgia', 'Underactive Thyroid', 'Chronic Pain',
       'Atrophic Vaginitis', 'Skin and Structure Infection', 'Tinnitus',
       'Major Depressive Disorde', 'Anxiety and Stress', 'Rosacea',
       'High Blood Pressure',
       '2</span> users found this comment helpful.',
       'Restless Legs Syndrome',
       'Osteolytic Bone Metastases of Solid Tumors', 'Bronchitis',
       'Skin or Soft Tissue Infection', 'Obsessive Compulsive Disorde',
       'Endometriosis', 'Keratoconjunctivitis Sicca', 'Breakthrough Pain',
       'Seizures', 'Neuropathic Pain', 'Sedation', 'Menstrual Disorders',
       'Allergic Rhinitis', 'Anesthesia',
       'Undifferentiated Connective Tissue Disease', 'Diabetes, Type 1',
       'Abnormal Uterine Bleeding', 'Weight Loss',
       'Constipation, Chronic', 'Breast Cancer, Metastatic',
       'Period Pain', 'Helicobacter Pylori Infection',
       'Atrial Fibrillation', 'Uterine Fibroids',
       '4</span> users found this comment helpful.', 'Kidney Infections',
       'Generalized Anxiety Disorde', 'Asthma', 'Postmenopausal Symptoms',
       'High Cholesterol', 'Hypogonadism, Male', 'Hyperthyroidism',
       'Back Pain', 'Anaplastic Oligodendroglioma', "Sjogren's Syndrome",
       'Asthma, acute', 'Hot Flashes',
       '3</span> users found this comment helpful.',
       'Herpes Simplex, Suppression', 'Bacterial Infection', 'Bursitis',
       'Diaper Rash', 'Systemic Mastocytosis', 'Trichotillomania',
       "Hashimoto's disease", 'Eczema', 'Dental Abscess', 'Headache',
       'Hypersomnia', 'Xerostomia', 'Breast Cance', 'Sore Throat',
       "Barrett's Esophagus", 'Pain/Feve', 'Diverticulitis', 'Sinusitis',
       'Polycystic Ovary Syndrome', 'Influenza',
       'Hypothyroidism, After Thyroid Removal', 'Onychomycosis, Toenail',
       'Progesterone Insufficiency',
       '11</span> users found this comment helpful.', 'GERD',
       'Nausea/Vomiting, Postoperative', 'Herpes Simplex',
       'Gastroparesis', 'Gout, Acute', 'Motion Sickness',
       'Multiple Sclerosis', 'Autism', 'Otitis Media',
       'Upper Respiratory Tract Infection', 'Surgical Prophylaxis',
       'Psoriatic Arthritis', 'Erosive Esophagitis',
       'Premature Ventricular Depolarizations', 'Stomach Ulce',
       'Nausea/Vomiting', 'Light Anesthesia',
       'Angina Pectoris Prophylaxis',
       '0</span> users found this comment helpful.', 'Paranoid Disorde',
       'Prostatitis', 'Extrapyramidal Reaction', 'mance Anxiety',
       'Night Terrors', 'High Cholesterol, Familial Heterozygous',
       'Spondyloarthritis', 'Clostridial Infection', 'Dermatomyositis',
       'Bronchiectasis', 'Nasal Congestion', 'Benign Essential Trem',
       'Angina', 'moterol / mometasone)', 'Impetig',
       'Conjunctivitis, Bacterial', 'Post Traumatic Stress Disorde',
       'Alcohol Withdrawal', 'Psoriasis', 'Cold Sores', 'Hyperhidrosis',
       '1</span> users found this comment helpful.',
       'Ankylosing Spondylitis', 'Hemorrhoids',
       '142</span> users found this comment helpful.',
       'Schizoaffective Disorde', 'Not Listed / Othe', 'Rhinitis',
       'Oral Thrush', 'Hyperlipoproteinemia',
       'Neutropenia Associated with Chemotherapy', 'Osteoporosis',
       'Reflex Sympathetic Dystrophy Syndrome', 'Urticaria', 'Narcolepsy',
       'Systemic Lupus Erythematosus', 'Ulcerative Colitis',
       'Adult Human Growth Hormone Deficiency', 'Bacterial Vaginitis',
       'COPD, Maintenance', 'Anorexia', 'TSH Suppression',
       'Breast Cancer, Adjuvant', 'Glaucoma',
       'Cough and Nasal Congestion',
       '8</span> users found this comment helpful.',
       'Inflammatory Conditions', 'Urinary Incontinence', 'Gout',
       'Bladder Infection', 'Human Papillomavirus Prophylaxis',
       'Glioblastoma Multiforme', 'Strep Throat',
       'Bacterial Skin Infection', 'Hereditary Angioedema',
       'Cold Symptoms', 'Labor Pain', 'Dry Skin', 'Diabetes Insipidus',
       'Methicillin-Resistant Staphylococcus Aureus Infection',
       'Borderline Personality Disorde', 'Amenorrhea', 'Pneumonia',
       'Seborrheic Dermatitis', 'Interstitial Cystitis',
       'Malaria Prevention', 'Prevention of Dental Caries',
       'Herbal Supplementation', 'Plaque Psoriasis', "Raynaud's Syndrome",
       "Addison's Disease", 'Prostate Cance', 'Allergies',
       'Opioid-Induced Constipation', 'moterol)',
       '13</span> users found this comment helpful.', 'Diarrhea',
       'Seasonal Allergic Conjunctivitis', 'Opioid Overdose',
       'Spondylolisthesis', 'Shift Work Sleep Disorde',
       'Obstructive Sleep Apnea/Hypopnea Syndrome',
       'Nausea/Vomiting of Pregnancy', 'Mucositis',
       'Ulcerative Colitis, Active', 'Head Lice',
       'Tonsillitis/Pharyngitis', 'Pseudotumor Cerebri',
       "Parkinson's Disease", 'Sciatica', 'Cance',
       'Bacterial Endocarditis Prevention', 'Diarrhea, Chronic',
       'Hypertensive Emergency', 'Keratosis', 'Ovarian Cysts',
       "Behcet's Disease", 'Chronic Idiopathic Constipation',
       'lic Acid Deficiency', 'Chronic Fatigue Syndrome',
       'Basal Cell Carcinoma', 'Cataplexy',
       "Crohn's Disease, Maintenance", 'Diabetic Peripheral Neuropathy',
       'Arrhythmia', 'Primary Ovarian Failure', 'Influenza Prophylaxis',
       'Agitated State', 'Heart Failure', 'atigue', 'Opiate Withdrawal',
       'Endometrial Hyperplasia, Prophylaxis', 'Immunosuppression',
       'Dystonia', 'Alopecia', 'Vulvodynia',
       'Premenstrual Dysphoric Disorde', 'Alcohol Dependence',
       'Myasthenia Gravis', 'Social Anxiety Disorde', 'Atopic Dermatitis',
       'Schistosoma japonicum', 'Sinus Symptoms', 'min / sitagliptin)',
       'Dermatitis', 'Eye Redness', 'Warts', 'Menorrhagia',
       'Seizure Prevention', 'Ophthalmic Surgery', 'Skin Rash',
       'Condylomata Acuminata', 'NSAID-Induced Ulcer Prophylaxis',
       'Tinea Versicol', 'Peripheral Neuropathy', 'Deep Vein Thrombosis',
       '6</span> users found this comment helpful.', 'Heart Attack',
       'Pulmonary Embolism, Recurrent Event', 'Light Sedation',
       'Acute Lymphoblastic Leukemia', 'Hyperprolactinemia',
       'Indigestion', 'Hepatitis B', 'Dysautonomia', 'Status Epilepticus',
       'Postpartum Depression', 'Multiple Myeloma',
       'Prevention of Hypokalemia', 'Edema', 'Urinary Retention',
       'Prevention of Thromboembolism in Atrial Fibrillation',
       'Cluster Headaches', 'Sexual Dysfunction, SSRI Induced',
       'Dermatitis Herpetiformis', 'Temporomandibular Joint Disorde',
       'Burns, External', 'Actinic Keratosis', 'Pharyngitis',
       'Melanoma, Metastatic', 'Atrial Flutte', 'Lyme Disease',
       'Dry Eye Disease', 'Allergic Reactions', 'Hypertriglyceridemia',
       'Pruritus', 'Carcinoid Tum', 'Muscle Pain', 'Colorectal Cance',
       'Vitamin/Mineral Supplementation during Pregnancy/Lactation',
       'Nausea/Vomiting, Chemotherapy Induced', 'Women (oxybutynin)',
       'Primary Immunodeficiency Syndrome',
       'New Daily Persistent Headache',
       'Pneumococcal Disease Prophylaxis', 'Burning Mouth Syndrome',
       'Urinary Tract Stones', 'Pseudobulbar Affect',
       '94</span> users found this comment helpful.',
       'Eye Redness/Itching', 'Deep Vein Thrombosis, First Event',
       'Pulmonary Hypertension', 'Malaria', 'Sarcoidosis',
       'Dietary Supplementation', 'Bulimia', 'Tendonitis', 'Nasal Polyps',
       'Hypokalemia', 'Anemia, Sickle Cell', 'Uveitis',
       'Streptococcal Infection', 'Perimenopausal Symptoms',
       'Asperger Syndrome', 'Tinea Corporis', 'Mania',
       'Renal Cell Carcinoma', 'COPD', 'Biliary Cirrhosis', 'Vertig',
       'Reversal of Opioid Sedation', "Non-Hodgkin's Lymphoma",
       'High Cholesterol, Familial Homozygous',
       'Periodic Limb Movement Disorde', 'Supraventricular Tachycardia',
       'Hypoestrogenism', 'Juvenile Idiopathic Arthritis', 'Swine Flu',
       'Giardiasis', 'Binge Eating Disorde', "Tourette's Syndrome",
       'Trichomoniasis', 'acial Wrinkles',
       '28</span> users found this comment helpful.',
       'Pulmonary Embolism', 'Conjunctivitis, Allergic',
       'Avian Influenza', '16</span> users found this comment helpful.',
       'Hemangioma', 'Nocturnal Leg Cramps', 'Thyroid Suppression Test',
       'Muscle Twitching', 'Pupillary Dilation',
       'Lennox-Gastaut Syndrome', 'Opiate Adjunct', 'Postoperative Pain',
       'Candida Urinary Tract Infection', 'Cerebral Spasticity',
       'Lipodystrophy', 'Androgenetic Alopecia', 'Computed Tomography',
       'Mitral Valve Prolapse', 'Vitamin D Deficiency',
       'Glaucoma, Open Angle', 'Endoscopy or Radiology Premedication',
       "Alzheimer's Disease", 'Gouty Arthritis',
       'Paroxysmal Supraventricular Tachycardia',
       'Deep Vein Thrombosis, Prophylaxis', 'Gaucher Disease',
       'Lymphocytic Colitis', 'Pancreatic Cance', 'Cystic Fibrosis',
       'Noninfectious Colitis',
       '27</span> users found this comment helpful.', 'Nephrocalcinosis',
       'Iron Deficiency Anemia', 'mulation) (phenylephrine)', 'Hiccups',
       '75</span> users found this comment helpful.',
       'Bronchospasm Prophylaxis', 'Chronic Spasticity',
       'min / saxagliptin)', 'Post-Cholecystectomy Diarrhea',
       'Postherpetic Neuralgia', 'Insomnia, Stimulant-Associated',
       'COPD, Acute', 'Herpes Simplex Dendritic Keratitis',
       'Oophorectomy', 'Cyclic Vomiting Syndrome',
       'Chronic Lymphocytic Leukemia', 'Lyme Disease, Arthritis',
       'Pseudomembranous Colitis', 'Conjunctivitis',
       '15</span> users found this comment helpful.', 'min)',
       'Intraocular Hypertension', 'Aphthous Ulce',
       'Ulcerative Colitis, Maintenance', 'Melasma',
       'Lyme Disease, Neurologic', 'ge (amlodipine / valsartan)',
       'Herpes Zoste', '12</span> users found this comment helpful.',
       'Cervical Dystonia', 'Labor Induction', 'Human Papilloma Virus',
       'Chronic Pancreatitis', 'Polycythemia Vera',
       '9</span> users found this comment helpful.',
       'Dermatological Disorders', 'Lewy Body Dementia',
       'amilial Mediterranean Feve', 'Neurosurgery', 'Gastroenteritis',
       'Macular Edema', 'Tinea Pedis',
       '7</span> users found this comment helpful.',
       'Diagnosis and Investigation',
       '35</span> users found this comment helpful.', 'Gas', 'Neuralgia',
       'Local Anesthesia', '54</span> users found this comment helpful.',
       'Acute Coronary Syndrome', 'Aspiration Pneumonia',
       'Idiopathic Thrombocytopenic Purpura', 'Onychomycosis, Fingernail',
       'Photoaging of the Skin', 'Premature Lab', 'Precocious Puberty',
       'Prevention of Bladder infection', 'Seasonal Affective Disorde',
       'Diabetic Kidney Disease', "Crohn's Disease, Acute",
       'Insulin Resistance Syndrome', 'Pudendal Neuralgia',
       "Reiter's Syndrome", '17</span> users found this comment helpful.',
       'Amyotrophic Lateral Sclerosis', 'Body Dysmorphic Disorde',
       'Prosthetic Heart Valves, Mechanical Valves - Thrombosis Prophylaxis',
       'Dandruff', 'Vitamin B12 Deficiency', 'Bone infection',
       'Prosthetic Heart Valves, Tissue Valves - Thrombosis Prophylaxis',
       'Iritis', 'Allergic Urticaria', 'Cardiovascular Risk Reduction',
       'Giant Cell Tumor of Bone', 'Babesiosis',
       'Secondary Hyperparathyroidism', 'Hypoparathyroidism',
       'Performance Anxiety', 'Abortion', 'Skin Cance',
       'Ovulation Induction', 'Liver Magnetic Resonance Imaging',
       'Vitamin/Mineral Supplementation and Deficiency',
       '79</span> users found this comment helpful.',
       'Herpes Simplex, Mucocutaneous/Immunocompetent Host',
       '10</span> users found this comment helpful.',
       'Anemia Associated with Chronic Renal Failure',
       'Hyperphosphatemia of Renal Failure',
       'Dissociative Identity Disorde', 'Anal Fissure and Fistula',
       '14</span> users found this comment helpful.',
       'Herpes Simplex, Mucocutaneous/Immunocompromised Host', 'Scabies',
       '5</span> users found this comment helpful.', 'Endometrial Cance',
       'Transient Ischemic Attack', 'Granuloma Annulare',
       "Traveler's Diarrhea", 'Candidemia',
       't Pac with Cyclobenzaprine (cyclobenzaprine)',
       'Hypoactive Sexual Desire Disorde', 'Epicondylitis, Tennis Elbow',
       'Nightmares', 'Dientamoeba fragilis', 'Ventricular Tachycardia',
       'Dumping Syndrome', 'Myelodysplastic Syndrome', 'Hypodermoclysis',
       'zen Shoulde', 'Topical Disinfection', 'Perioral Dermatitis',
       'Agitation', 'Intermittent Claudication',
       'Prevention of Osteoporosis', 'Leukemia', 'Dermatitis Herpeti',
       'mis', 'Eosinophilic Esophagitis',
       'Hyperlipoproteinemia Type IIa, Elevated LDL',
       'Endometrial Hyperplasia',
       '19</span> users found this comment helpful.', 'Peptic Ulce',
       'Chronic Myofascial Pain', 'Enterocolitis',
       'Secondary Cutaneous Bacterial Infections', 'Syringomyelia',
       'Postoperative Ocular Inflammation',
       'Persistent Depressive Disorde', 'Otitis Externa',
       'Organ Transplant, Rejection Prophylaxis',
       'Intermittent Explosive Disorde', 'Dermatophytosis',
       'Inflammatory Bowel Disease', 'Porphyria', 'Anemia',
       'Hyperuricemia Secondary to Chemotherapy',
       'Wolff-Parkinson-White Syndrome', 'eve', 'Ectopic Pregnancy',
       'Thyroid Cance', 'Tuberculosis, Latent',
       'Nasal Carriage of Staphylococcus aureus', 'Systemic Candidiasis',
       'Ear Wax Impaction', 'Hepatocellular Carcinoma', 'Dyspareunia',
       '41</span> users found this comment helpful.', 'Tic Disorde',
       'Head and Neck Cance', 'Klinefelter Syndrome', 'Rhinorrhea',
       'Soft Tissue Sarcoma', 'Diabetic Macular Edema',
       'Menopausal Disorders', 'Anesthetic Adjunct', 'Tinea Cruris',
       'tic (mycophenolic acid)', 'Ischemic Stroke', 'Malignant Glioma',
       'Thrombocythemia', 'Atrophic Urethritis', 'Systemic Sclerosis',
       'Macular Degeneration', 'AIDS Related Wasting', 'Hemophilia A',
       'Osteolytic Bone Lesions of Multiple Myeloma',
       'Autoimmune Hemolytic Anemia', 'ailure to Thrive',
       'Strongyloidiasis', 'Vitamin K Deficiency', 'Ulcerative Proctitis',
       'Premenstrual Syndrome',
       '23</span> users found this comment helpful.',
       'Primary Hyperaldosteronism', 'Lactose Intolerance',
       'Anal Itching', 'amilial Cold Autoinflammatory Syndrome',
       'Duodenal Ulce', 'Tuberculosis, Prophylaxis', 'Neurosis',
       "Turner's Syndrome", 'NSAID-Induced Gastric Ulce',
       'CNS Magnetic Resonance Imaging', 'Atherosclerosis',
       'Deep Vein Thrombosis Prophylaxis after Hip Replacement Surgery',
       'Gastritis/Duodenitis', 'Diarrhea, Acute', 'Costochondritis',
       'Portal Hypertension', 'Glaucoma/Intraocular Hypertension',
       'Toothache', 'Benzodiazepine Withdrawal', 'm Pain Disorde',
       'Esophageal Candidiasis',
       'Deep Vein Thrombosis Prophylaxis after Knee Replacement Surgery',
       'Peripheral Arterial Disease',
       'Deep Vein Thrombosis, Recurrent Event', 'Pseudogout, Prophylaxis',
       'Lichen Planus', 'CMV Prophylaxis',
       '64</span> users found this comment helpful.', 'Neuritis',
       'Typhoid Feve', 'Tardive Dyskinesia', 'Ichthyosis',
       'Juvenile Rheumatoid Arthritis', 'B12 Nutritional Deficiency',
       '18</span> users found this comment helpful.',
       'Primary Nocturnal Enuresis',
       '146</span> users found this comment helpful.', "Dercum's Disease",
       'Cutaneous Candidiasis', 'Gingivitis', 'Q Feve', 'Hyperekplexia',
       '44</span> users found this comment helpful.', 'Niacin Deficiency',
       'Dietary Fiber Supplementation', 'Nephrotic Syndrome',
       'Pinworm Infection (Enterobius vermicularis)',
       'Pancreatic Exocrine Dysfunction',
       'Nausea/Vomiting, Radiation Induced', 'Schilling Test',
       'Mild Cognitive Impairment', 'Ischemic Stroke, Prophylaxis',
       '20</span> users found this comment helpful.',
       'Gonococcal Infection, Uncomplicated', 'Ovarian Cance',
       'Eyelash Hypotrichosis', "Meniere's Disease", 'Tinea Capitis',
       '21</span> users found this comment helpful.', 'Lichen Sclerosus',
       'min / pioglitazone)', 'Renal Transplant', 'Gout, Prophylaxis',
       "von Willebrand's Disease",
       'Prevention of Atherothrombotic Events', 'Small Fiber Neuropathy',
       '110</span> users found this comment helpful.',
       'min / rosiglitazone)', "Peyronie's Disease",
       'Autoimmune Hepatitis', 'llicular Lymphoma',
       'Auditory Processing Disorde', 'Herpes Zoster, Prophylaxis',
       'Submental Fullness', 'Lactation Augmentation',
       'Radionuclide Myocardial Perfusion Study',
       'Prevention of Cardiovascular Disease', 'Varicella-Zoste',
       'Pelvic Inflammatory Disease', 'Intraabdominal Infection', 'Croup',
       '85</span> users found this comment helpful.',
       'Dermatologic Lesion',
       'Hyperlipoproteinemia Type IV, Elevated VLDL', 'Expectoration',
       'Primary Hyperaldosteronism Diagnosis', 'Abdominal Distension',
       'Salivary Gland Cance', 'Pulmonary Embolism, First Event',
       'Postpartum Breast Pain',
       'Postural Orthostatic Tachycardia Syndrome',
       '46</span> users found this comment helpful.',
       'Pediatric Growth Hormone Deficiency', 'Hypomagnesemia',
       'ge HCT (amlodipine / hydrochlorothiazide / valsartan)',
       'Hairy Cell Leukemia', 'Histoplasmosis', 'Hypoglycemia',
       '31</span> users found this comment helpful.', 'Brain Tum',
       'Gastrointestinal Stromal Tum', 'Tetanus',
       'Breast Cancer, Prevention', 'ICU Agitation', 'Women (minoxidil)',
       'Peripheral T-cell Lymphoma',
       'Chronic Inflammatory Demyelinating Polyradiculoneuropathy',
       'Pathological Hypersecretory Conditions',
       'Oral and Dental Conditions', 'Antiphospholipid Syndrome',
       'Ventricular Arrhythmia', 'Asystole', "Wegener's Granulomatosis",
       'Thromboembolic Stroke Prophylaxis',
       'Platelet Aggregation Inhibition', 'Sleep Paralysis',
       'Rejection Prophylaxis', 'Delayed Puberty, Male', 'Ascariasis',
       '25</span> users found this comment helpful.',
       'Acute Promyelocytic Leukemia',
       '32</span> users found this comment helpful.', 'Bartonellosis',
       'Cyclothymic Disorde', 'Hypokalemic Periodic Paralysis',
       'Varicose Veins', 'Mononucleosis', 'Cachexia', 'Hyperkalemia',
       "Still's Disease", '48</span> users found this comment helpful.',
       'Dementia', 'Ocular Rosacea', 'Hidradenitis Suppurativa', 'SIADH',
       'Bullous Pemphigoid', 'Angioedema',
       'Mountain Sickness / Altitude Sickness',
       'Severe Mood Dysregulation', 'Cutaneous T-cell Lymphoma',
       'Adrenocortical Insufficiency', 'Myxedema Coma',
       'Small Bowel Bacterial Overgrowth', 'Sunburn',
       '33</span> users found this comment helpful.',
       'Transverse Myelitis', 'Squamous Cell Carcinoma', 'Parkinsonism',
       '22</span> users found this comment helpful.', 'Thyrotoxicosis',
       '29</span> users found this comment helpful.',
       '30</span> users found this comment helpful.',
       'Epididymitis, Sexually Transmitted', 'Neck Pain',
       'Bleeding Disorde', '63</span> users found this comment helpful.',
       'actor IX Deficiency', 'Melanoma', 'Thrombocytopenia',
       'Esophageal Variceal Hemorrhage Prophylaxis', 'Glioblastoma Multi',
       'Cholera', 'Anorexia/Feeding Problems',
       '45</span> users found this comment helpful.', 'Peritonitis',
       'AV Heart Block', 'Pe', "Wilson's Disease",
       'Nonalcoholic Fatty Liver Disease',
       '34</span> users found this comment helpful.', 'Sepsis', 'Anthrax',
       'Body Imaging', 'Aggressive Behavi', 'Hepatic Tum', 'Ehrlichiosis',
       'Hypopituitarism', 'Gender Dysphoria', 'Infectious Diarrhea',
       'Ventricular Fibrillation', 'Anaphylaxis', 'Pemphigus',
       'Multiple Endocrine Adenomas', 'Pre-Exposure Prophylaxis',
       'Postoperative Increased Intraocular Pressure',
       'Pruritus of Partial Biliary Obstruction', 'Pertussis',
       'Periodontitis', 'Lymphoma', 'Hypercalcemia of Malignancy',
       'Pityriasis rubra pilaris', 'Amebiasis', 't Care',
       'Hepatic Encephalopathy',
       '55</span> users found this comment helpful.',
       'Deep Neck Infection', 'Meningitis, Meningococcal',
       'Parkinsonian Trem', 'Rabies Prophylaxis',
       '39</span> users found this comment helpful.', 'Hypotension',
       'Myelofibrosis', '98</span> users found this comment helpful.',
       'cal Segmental Glomerulosclerosis',
       'Gastric Ulcer Maintenance Treatment', "Paget's Disease",
       'Infection Prophylaxis', 'Gastrointestinal Decontamination',
       'Mixed Connective Tissue Disease',
       '24</span> users found this comment helpful.',
       'Somatoform Pain Disorde', 'Esophageal Spasm',
       'Campylobacter Gastroenteritis', 'Hyperphosphatemia',
       'Oligospermia', 'Wound Cleansing', 'Euvolemic Hyponatremia',
       'Gallbladder Disease',
       '84</span> users found this comment helpful.',
       'Mycobacterium avium-intracellulare, Treatment',
       'Oppositional Defiant Disorde', 'Legionella Pneumonia',
       'Breast Cancer, Palliative', 'Hydrocephalus',
       'Hyperlipoproteinemia Type III, Elevated beta-VLDL   IDL',
       '36</span> users found this comment helpful.',
       'Anaplastic Astrocytoma', "Dupuytren's contracture",
       '40</span> users found this comment helpful.', 'Mumps Prophylaxis',
       'Skin Disinfection, Preoperative', 'Hyperbilirubinemia',
       'Meningitis', 'Corneal Ulce', 'acial Lipoatrophy',
       '43</span> users found this comment helpful.',
       'Percutaneous Coronary Intervention', 'Hepatitis B Prevention',
       'Tuberculosis, Active', 'Cerebrovascular Insufficiency',
       'Head Injury', 'Anti NMDA Receptor Encephalitis',
       'Nonoccupational Exposure',
       '72</span> users found this comment helpful.',
       'Gonadotropin Inhibition', 'unctional Gastric Disorde',
       'Chronic Eosinophilic Leukemia', 'Acetaminophen Overdose',
       'Duodenal Ulcer Prophylaxis', 'Paragonimus westermani, Lung Fluke',
       'Alpha-1 Proteinase Inhibitor Deficiency', "Cogan's Syndrome",
       'Uterine Bleeding', 'Stomach Cance', 'Sporotrichosis',
       'Cluster-Tic Syndrome', 'Gestational Diabetes',
       'Stress Ulcer Prophylaxis',
       'Reversal of Nondepolarizing Muscle Relaxants', 'Solid Tumors',
       'mist (', 'Schnitzler Syndrome', 'Hypocalcemia',
       '26</span> users found this comment helpful.',
       'Meningococcal Meningitis Prophylaxis', 'Nocardiosis',
       'Hemophilia B', '42</span> users found this comment helpful.',
       'Microscopic polyangiitis', 'Gonococcal Infection, Disseminated',
       'Neurotic Depression', 'Keratitis',
       '99</span> users found this comment helpful.',
       "Hodgkin's Lymphoma", 'me', 'STD Prophylaxis',
       '123</span> users found this comment helpful.',
       'Small Bowel or Pancreatic Fistula',
       'Prevention of Perinatal Group B Streptococcal Disease',
       '74</span> users found this comment helpful.', 'Cerebral Edema',
       'Testicular Cance', 'Short Stature for Age',
       '47</span> users found this comment helpful.',
       'Aspergillosis, Aspergilloma', 'Pemphigoid',
       'Hyperparathyroidism Secondary to Renal Impairment',
       '76</span> users found this comment helpful.',
       'Ramsay Hunt Syndrome', 'Cutaneous Larva Migrans',
       'Occipital Neuralgia', 'Blepharitis', 'Patent Ductus Arteriosus',
       'Joint Infection', '77</span> users found this comment helpful.',
       'Manscaping Pain', 'Strabismus',
       'Organ Transplant, Rejection Reversal',
       'Leukocytoclastic Vasculitis', 'Coronary Artery Disease',
       'Gastric Cance', 'ibrocystic Breast Disease',
       '121</span> users found this comment helpful.',
       'ungal Infection Prophylaxis', 'Short Stature', 'Hypercalcemia',
       'Coccidioidomycosis', 'Cyclitis', 'Anemia, Chemotherapy Induced',
       'Upper Limb Spasticity',
       '95</span> users found this comment helpful.',
       '61</span> users found this comment helpful.',
       'Diagnostic Bronchograms', 'Neoplastic Diseases',
       '51</span> users found this comment helpful.',
       'Mycoplasma Pneumonia', 'Linear IgA Disease',
       'Subarachnoid Hemorrhage', 'Myeloproliferative Disorders',
       'ungal Pneumonia', '145</span> users found this comment helpful.',
       'Scleroderma', 'Zollinger-Ellison Syndrome', 'Tinea Barbae',
       'Acute Nonlymphocytic Leukemia',
       '62</span> users found this comment helpful.',
       '92</span> users found this comment helpful.', 'Neutropenia'],
      dtype=object)
In [55]:
len(df['condition'].unique().tolist())
Out[55]:
885

Narrative

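  • The condition column has 885 unique values, one of which is NaN, so 884 non-null values
  • Not all of them are real conditions: some are scraping leftovers such as '2</span> users found this comment helpful.', and many names are truncated (for example 'Panic Disorde')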
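Before counting conditions, we may want to set the scraping leftovers aside. The sketch below is one possible way to do it, not a step from the original notebook. It uses a small made-up frame called toy instead of the real df, and it assumes that any value containing '</span>' is an artifact.

```python
import pandas as pd

# Toy stand-in for the tutorial's df (rows invented for illustration)
toy = pd.DataFrame({
    "condition": [
        "Birth Control",
        "Depression",
        "2</span> users found this comment helpful.",
        None,
        "Pain",
        "Birth Control",
    ]
})

# Flag rows whose 'condition' is leftover HTML rather than a condition.
# regex=False treats the pattern as plain text; na=False maps missing values to False.
is_artifact = toy["condition"].str.contains("</span>", na=False, regex=False)

# Keep real conditions only, and drop missing values as well
clean = toy.loc[~is_artifact].dropna(subset=["condition"])

n_unique_raw = toy["condition"].nunique(dropna=False)  # NaN counts as a value here
n_unique_clean = clean["condition"].nunique()
print(n_unique_raw, n_unique_clean)
```

On the real data the same mask could be applied to df['condition']; the truncated names (such as 'Panic Disorde') would still need separate handling.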
In [56]:
#### Distribution of Conditions
df['condition'].value_counts()
Out[56]:
Birth Control                                   28788
Depression                                       9069
Pain                                             6145
Anxiety                                          5904
Acne                                             5588
                                                ...  
Gonadotropin Inhibition                             1
Anti NMDA Receptor Encephalitis                     1
Aspergillosis, Aspergilloma                         1
40</span> users found this comment helpful.         1
121</span> users found this comment helpful.        1
Name: condition, Length: 884, dtype: int64
In [57]:
#### Most common conditions
df['condition'].value_counts().nlargest(20)
Out[57]:
Birth Control                28788
Depression                    9069
Pain                          6145
Anxiety                       5904
Acne                          5588
Bipolar Disorde               4224
Insomnia                      3673
Weight Loss                   3609
Obesity                       3568
ADHD                          3383
Diabetes, Type 2              2554
Emergency Contraception       2463
High Blood Pressure           2321
Vaginal Yeast Infection       2274
Abnormal Uterine Bleeding     2096
Bowel Preparation             1859
ibromyalgia                   1791
Smoking Cessation             1780
Migraine                      1694
Anxiety and Stress            1663
Name: condition, dtype: int64
In [58]:
#### Most common conditions (bar plot)
df['condition'].value_counts().nlargest(20).plot(kind='bar',figsize=(20,10))
Out[58]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9167cd1cd0>

Narrative

  • The most common condition is Birth Control (28,788 reviews), followed by Depression (9,069), Pain (6,145) and Anxiety (5,904)
  • This is consistent with the drug distribution seen earlier
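Raw counts are easier to compare when expressed as a share of all reviews. The following is a minimal sketch on a made-up frame (toy, not the real df) showing how value_counts(normalize=True) gives proportions instead of counts.

```python
import pandas as pd

# Toy stand-in for the tutorial's df (counts invented for illustration)
toy = pd.DataFrame(
    {"condition": ["Birth Control"] * 5 + ["Depression"] * 3 + ["Pain"] * 2}
)

# normalize=True returns proportions that sum to 1 instead of raw counts
share = toy["condition"].value_counts(normalize=True)

# Top 2 conditions as percentages, rounded to one decimal place
top_share_pct = (share.nlargest(2) * 100).round(1)
print(top_share_pct)
```

Note that value_counts ignores missing values by default, so the percentages are relative to the reviews that have a condition recorded.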
In [60]:
df['condition'].value_counts().nsmallest(20)
Out[60]:
Hemophilia B                                   1
Legionella Pneumonia                           1
Upper Limb Spasticity                          1
ungal Infection Prophylaxis                    1
Dercum's Disease                               1
Stomach Cance                                  1
Ventricular Arrhythmia                         1
Corneal Ulce                                   1
Pemphigoid                                     1
34</span> users found this comment helpful.    1
Bartonellosis                                  1
Thyrotoxicosis                                 1
77</span> users found this comment helpful.    1
Strongyloidiasis                               1
Hemangioma                                     1
64</span> users found this comment helpful.    1
Epicondylitis, Tennis Elbow                    1
Esophageal Spasm                               1
Cerebrovascular Insufficiency                  1
Ramsay Hunt Syndrome                           1
Name: condition, dtype: int64
In [59]:
#### Least common conditions (bar plot)
df['condition'].value_counts().nsmallest(20).plot(kind='bar',figsize=(20,10))
Out[59]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9166fab0d0>
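Every condition in the bottom 20 above has exactly one review, so nsmallest(20) simply returns 20 of the tied entries and the bar plot is flat. It can be more informative to count how many conditions appear only once. A minimal sketch on a made-up frame (toy, not the real df):

```python
import pandas as pd

# Toy stand-in for the tutorial's df (rows invented for illustration)
toy = pd.DataFrame(
    {
        "condition": ["Birth Control"] * 4
        + ["Pain"] * 2
        + ["Hemophilia B", "Cholera", "Tetanus"]
    }
)

counts = toy["condition"].value_counts()

# Conditions that occur exactly once
singletons = counts[counts == 1]
n_singletons = len(singletons)
singleton_names = sorted(singletons.index)
print(n_singletons, singleton_names)
```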

Questions on Drugs and Conditions

  • How many drugs are there per condition?
In [63]:
# How many Drugs per condition (Top 20)
df.groupby('condition')['drugName'].nunique().nlargest(20)
Out[63]:
condition
Not Listed / Othe                             214
Pain                                          200
Birth Control                                 172
High Blood Pressure                           140
Acne                                          117
Depression                                    105
Rheumatoid Arthritis                           98
Diabetes, Type 2                               89
Allergic Rhinitis                              88
Bipolar Disorde                                80
Osteoarthritis                                 80
Anxiety                                        78
Insomnia                                       78
Abnormal Uterine Bleeding                      74
Migraine                                       59
Psoriasis                                      58
3</span> users found this comment helpful.     57
Endometriosis                                  57
ADHD                                           55
Asthma, Maintenance                            54
Name: drugName, dtype: int64
In [66]:
# How many Drugs per condition (Top 20)
plt.figure(figsize=(15,10))
df.groupby('condition')['drugName'].nunique().nlargest(20).plot(kind='bar')
plt.title("Number of Drugs Per Condition")
plt.grid()
plt.show()

Narrative

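  • Leaving aside the catch-all 'Not Listed / Othe' category (214 drugs), Pain (200), Birth Control (172) and High Blood Pressure (140) have the highest number of unique drugs
  • A scraping artifact ('3</span> users found this comment helpful.') also shows up in the top 20 with 57 drugs, so the condition column should be cleaned before relying on these counts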
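The number of unique drugs is easier to interpret next to the number of reviews for the same condition. The sketch below uses pandas named aggregation to compute both at once and drops the catch-all category first. The frame toy and its rows are made up for illustration; the column names follow the dataset.

```python
import pandas as pd

# Toy stand-in for the tutorial's df (rows invented for illustration)
toy = pd.DataFrame({
    "condition": ["Pain", "Pain", "Pain", "Acne", "Acne", "Not Listed / Othe"],
    "drugName": ["Tramadol", "Tramadol", "Ibuprofen", "Epiduo", "Doxycycline", "Prednisone"],
})

per_condition = (
    toy[toy["condition"] != "Not Listed / Othe"]  # drop the catch-all category
    .groupby("condition")
    .agg(
        n_drugs=("drugName", "nunique"),   # distinct drugs per condition
        n_reviews=("drugName", "count"),   # reviews (rows) per condition
    )
    .sort_values("n_drugs", ascending=False)
)
print(per_condition)
```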

Questions on Rating

  • Distribution of rating
  • Average rating per drug and per drug class
In [78]:
df['rating']
Out[78]:
0          9.0
1          8.0
2          5.0
3          8.0
4          9.0
          ... 
161292    10.0
161293     1.0
161294     2.0
161295    10.0
161296     9.0
Name: rating, Length: 161297, dtype: float64
In [79]:
# Distribution of ratings by count
df.groupby('rating').size()
Out[79]:
rating
1.0     21619
2.0      6931
3.0      6513
4.0      5012
5.0      8013
6.0      6343
7.0      9456
8.0     18890
9.0     27531
10.0    50989
dtype: int64
In [80]:
# Distribution of ratings by count (bar plot)
df.groupby('rating').size().plot(kind='bar')
Out[80]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f91641b1890>
In [81]:
# Distribution of ratings by count, using a histogram
plt.figure(figsize=(20,10))
df['rating'].hist()
plt.title("Distribution of Ratings by Count (Histogram)")
plt.show()

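Narrative

  • Ratings cluster at the extremes: 10 is by far the most common score (50,989 of 161,297 reviews, about 32%), followed by 9 (27,531) and 1 (21,619), while each of the scores from 2 to 7 has fewer than 10,000 reviews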
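To put numbers on "the extremes", the ratings can be turned into percentages and grouped into coarse buckets. This is a minimal sketch on made-up ratings (toy, not the real df). The bucket boundaries (1-3, 4-7, 8-10) are an assumption chosen for illustration, not something defined by the dataset.

```python
import pandas as pd

# Toy stand-in for the tutorial's df (ratings invented for illustration)
toy = pd.DataFrame(
    {"rating": [1.0, 1.0, 2.0, 5.0, 6.0, 8.0, 9.0, 10.0, 10.0, 10.0]}
)

# Share of each rating value, in percent, ordered by rating
pct = (toy["rating"].value_counts(normalize=True).sort_index() * 100).round(1)

# pd.cut bins are right-inclusive by default: (0, 3], (3, 7], (7, 10]
bucket = pd.cut(
    toy["rating"],
    bins=[0, 3, 7, 10],
    labels=["low (1-3)", "mid (4-7)", "high (8-10)"],
)
bucket_counts = bucket.value_counts().sort_index()
print(pct)
print(bucket_counts)
```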
In [83]:
# Average rating per drug
avg_rating = df.groupby('drugName')['rating'].mean()
In [84]:
avg_rating
Out[84]:
drugName
A + D Cracked Skin Relief               10.000000
A / B Otic                              10.000000
Abacavir / dolutegravir / lamivudine     8.211538
Abacavir / lamivudine / zidovudine       9.000000
Abatacept                                7.157895
                                          ...    
Zyvox                                    9.000000
ZzzQuil                                  2.500000
depo-subQ provera 104                    1.000000
ella                                     6.980392
femhrt                                   4.000000
Name: rating, Length: 3436, dtype: float64
In [86]:
# Average Rating For All Drugs
plt.figure(figsize=(20,10))
avg_rating.hist()
plt.title("Distribution of Average Rating For All Drugs")
plt.show()
In [92]:
# Average rating per drug class
avg_rating_per_drug_class = df.groupby('drug_class')['rating'].mean()
In [93]:
avg_rating_per_drug_class
Out[93]:
drug_class
ace inhibitor                        5.759259
alpha blocker                        6.954248
anesthetic                           5.937984
anti-anxiety                         8.543667
antibiotic                           6.500735
antibiotic (cephalosporins)          6.344828
antibiotic(penicillins)              7.033613
anticoagulants                       9.222222
antifungal (except metronidazole)    5.580100
antipyschotics (phenothiazine)       7.146084
arb blocker                          6.464286
barbiturate                          8.894737
beta blocker                         6.587629
beta blockers                        7.681159
calcium channel blocker              5.725322
corticosteroid (prednisone)          7.477427
h2 blockers (anti-ulcers)            7.280945
neuromuscular blocking agents        8.622222
opiod analgesics                     7.446388
oral hypoglycemics                   7.268917
pituitary hormone                    8.500000
thrombolytics                        7.103448
Name: rating, dtype: float64
In [94]:
# Distribution of average rating across drug classes
plt.figure(figsize=(20,10))
avg_rating_per_drug_class.hist()
plt.title("Distribution of Average Rating For Drug Classes")
plt.show()
In [96]:
# Which drug classes have the highest mean/average rating?
avg_rating_per_drug_class.nlargest(20)
Out[96]:
drug_class
anticoagulants                    9.222222
barbiturate                       8.894737
neuromuscular blocking agents     8.622222
anti-anxiety                      8.543667
pituitary hormone                 8.500000
beta blockers                     7.681159
corticosteroid (prednisone)       7.477427
opiod analgesics                  7.446388
h2 blockers (anti-ulcers)         7.280945
oral hypoglycemics                7.268917
antipyschotics (phenothiazine)    7.146084
thrombolytics                     7.103448
antibiotic(penicillins)           7.033613
alpha blocker                     6.954248
beta blocker                      6.587629
antibiotic                        6.500735
arb blocker                       6.464286
antibiotic (cephalosporins)       6.344828
anesthetic                        5.937984
ace inhibitor                     5.759259
Name: rating, dtype: float64
In [97]:
# Which drugs have the highest mean/average rating?
avg_rating.nlargest(20)
Out[97]:
drugName
A + D Cracked Skin Relief                              10.0
A / B Otic                                             10.0
Absorbine Jr.                                          10.0
Accolate                                               10.0
Acetaminophen / caffeine / magnesium salicylate        10.0
Acetaminophen / dextromethorphan / doxylamine          10.0
Acetaminophen / phenylephrine                          10.0
Acetaminophen / pseudoephedrine                        10.0
Acetic acid / antipyrine / benzocaine / polycosanol    10.0
Acrivastine / pseudoephedrine                          10.0
Acyclovir / hydrocortisone                             10.0
Advil Cold and Sinus Liqui-Gels                        10.0
Aerobid-M                                              10.0
Afrin 4 Hour Extra Moisturizing                        10.0
Ala-Quin                                               10.0
Alavert                                                10.0
Aldactazide                                            10.0
Alefacept                                              10.0
Alka-Seltzer Cold and Sinus                            10.0
Allegra ODT                                            10.0
Name: rating, dtype: float64
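Every drug in the top 20 above has a perfect 10.0, which probably reflects drugs with only a handful of reviews rather than drugs that are truly the best. A minimal sketch of one way to guard against that is to require a minimum number of reviews before ranking. The helper name `top_rated_drugs`, the `min_reviews` threshold and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

def top_rated_drugs(df, min_reviews=50, n=20):
    """Rank drugs by mean rating, keeping only drugs with at least `min_reviews` reviews."""
    stats = df.groupby('drugName')['rating'].agg(['mean', 'count'])
    stats = stats[stats['count'] >= min_reviews]
    return stats.sort_values('mean', ascending=False).head(n)

# Toy data standing in for the real df (hypothetical values)
toy = pd.DataFrame({
    'drugName': ['A', 'B', 'B', 'B', 'C', 'C'],
    'rating':   [10, 8, 9, 7, 6, 4],
})

# Drug A has a single 10-star review, so it is dropped when min_reviews=2
ranked = top_rated_drugs(toy, min_reviews=2, n=5)
print(ranked)
```

On the real dataset you would call `top_rated_drugs(df)` and tune `min_reviews` to taste; the right threshold is a judgment call.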
In [98]:
df.columns
Out[98]:
Index(['Unnamed: 0', 'drugName', 'condition', 'review', 'rating', 'date',
       'usefulCount', 'drug_class'],
      dtype='object')
In [ ]:
### Questions on Reviews
+ How genuine is the review? (Using sentiment analysis)
+ How many reviews are positive, negative, or neutral?
+ Correlation between rating, review sentiment, and the number of users who found the review useful
+ Distribution of ratings
+ Number of reviews made per year and per month
+ Which condition has the most drug reviews?
+ Can you predict the rating using the review?
In [99]:
# How genuine is the review? (Using sentiment analysis)
from textblob import TextBlob
In [100]:
df['review']
Out[100]:
0         "It has no side effect, I take it in combinati...
1         "My son is halfway through his fourth week of ...
2         "I used to take another oral contraceptive, wh...
3         "This is my first time using any form of birth...
4         "Suboxone has completely turned my life around...
                                ...                        
161292    "I wrote my first report in Mid-October of 201...
161293    "I was given this in IV before surgey. I immed...
161294    "Limited improvement after 4 months, developed...
161295    "I&#039;ve been on thyroid medication 49 years...
161296    "I&#039;ve had chronic constipation all my adu...
Name: review, Length: 161297, dtype: object
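Notice that some reviews still contain HTML entities such as `&#039;` and are wrapped in double quotes. It may help to clean these before scoring sentiment. Below is a small sketch using only the standard library; the helper name `clean_review` and the sample sentence are made up for illustration.

```python
import html
import pandas as pd

def clean_review(text):
    """Decode HTML entities, then strip surrounding whitespace and wrapping double quotes."""
    return html.unescape(text).strip().strip('"').strip()

# Made-up sample in the same style as the raw reviews
sample = pd.Series(['"I&#039;ve had no side effects &amp; feel fine."'])
cleaned = sample.apply(clean_review)
print(cleaned[0])
```

On the real data this could be applied as `df['review'].apply(clean_review)` before computing sentiment. Whether it changes the polarity scores noticeably is something to check rather than assume.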
In [101]:
def get_sentiment(text):
    blob = TextBlob(text)
    return blob.polarity

def get_sentiment_label(text):
    blob = TextBlob(text)
    if blob.polarity > 0:
        result = 'positive'
    elif blob.polarity < 0:
        result = 'negative'
    else:
        result = 'neutral'
    return result
In [102]:
# Test the function
get_sentiment("I love apples")
Out[102]:
0.5
In [104]:
# Test the function
get_sentiment_label("I love apples")
Out[104]:
'positive'
In [105]:
# Sentiment Score for Review
df['sentiment'] = df['review'].apply(get_sentiment)
In [106]:
# Sentiment Labels for Review
df['sentiment_label'] = df['review'].apply(get_sentiment_label)
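Calling `get_sentiment_label` on the raw text runs TextBlob a second time over all 161,297 reviews. Since the polarity score is already stored in `df['sentiment']`, the label can be derived from that score instead. A minimal sketch follows, where `label_from_polarity` is a hypothetical helper that applies the same thresholds as `get_sentiment_label`.

```python
import pandas as pd

def label_from_polarity(score):
    """Map a polarity score to a label using the same cut-offs as get_sentiment_label."""
    if score > 0:
        return 'positive'
    if score < 0:
        return 'negative'
    return 'neutral'

# A few polarity scores standing in for df['sentiment']
scores = pd.Series([0.168333, -0.276389, 0.0])
labels = scores.apply(label_from_polarity)
print(labels.tolist())
```

On the real DataFrame this would be `df['sentiment_label'] = df['sentiment'].apply(label_from_polarity)`.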
In [107]:
df[['review','sentiment','sentiment_label']]
Out[107]:
                                                   review  sentiment  sentiment_label
0       "It has no side effect, I take it in combinati...   0.000000          neutral
1       "My son is halfway through his fourth week of ...   0.168333         positive
2       "I used to take another oral contraceptive, wh...   0.067210         positive
3       "This is my first time using any form of birth...   0.179545         positive
4       "Suboxone has completely turned my life around...   0.194444         positive
...                                                   ...        ...              ...
161292  "I wrote my first report in Mid-October of 201...   0.262917         positive
161293  "I was given this in IV before surgey. I immed...  -0.276389         negative
161294  "Limited improvement after 4 months, developed...  -0.223810         negative
161295  "I&#039;ve been on thyroid medication 49 years...   0.212597         positive
161296  "I&#039;ve had chronic constipation all my adu...   0.085417         positive

161297 rows × 3 columns

In [109]:
# How many positive and negative and neutral reviews?
df['sentiment_label'].value_counts()
Out[109]:
positive    101041
negative     53303
neutral       6953
Name: sentiment_label, dtype: int64
In [110]:
# How many positive and negative and neutral reviews?
df['sentiment_label'].value_counts().plot(kind='bar')
Out[110]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9163e91c10>
In [111]:
#### Relationship between sentiment and rating
sns.lineplot(data=df,x='rating',y='sentiment')
plt.show()

Narrative

  • The average sentiment score tends to rise as the rating increases
In [112]:
# Relationship between rating and sentiment, split by sentiment label
sns.lineplot(data=df,x='rating',y='sentiment',hue='sentiment_label')
Out[112]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9163e1ac10>
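The line plots show the trend visually, but they do not give a correlation coefficient. The sketch below computes both a Pearson correlation and a rank-based (Spearman-style) correlation, obtained by ranking the columns first and then applying Pearson. The function name `rating_correlations` and the toy values are assumptions for illustration.

```python
import pandas as pd

def rating_correlations(df, cols=('rating', 'sentiment', 'usefulCount')):
    """Return (pearson, rank_based) correlation matrices for the given columns."""
    cols = list(cols)
    pearson = df[cols].corr()            # linear correlation
    rank_based = df[cols].rank().corr()  # Pearson on ranks, i.e. Spearman correlation
    return pearson, rank_based

# Toy data standing in for the real df (hypothetical values)
toy = pd.DataFrame({
    'rating':      [1, 3, 5, 8, 10],
    'sentiment':   [-0.5, -0.1, 0.0, 0.2, 0.9],
    'usefulCount': [2, 9, 4, 30, 12],
})

pearson, rank_based = rating_correlations(toy)
print(pearson.round(3))
print(rank_based.round(3))
```

The rank-based version is less sensitive to outliers, which may matter here because `usefulCount` is heavily skewed.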
In [ ]:
# How many reviews are genuine when compared with the rating?
+ genuine good review = positive sentiment + rating from 6 to 10
+ genuine bad review = negative sentiment + rating from 1 to 4
In [119]:
# Genuine good reviews: positive sentiment with a rating of 6 or more
good_review =  df[(df['rating'] >= 6) & (df['sentiment_label'] == 'positive')]
In [117]:
# Genuine bad reviews: negative sentiment with a rating of 4 or less
bad_review = df[(df['rating'] <= 4) & (df['sentiment_label'] == 'negative')]
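To actually answer "how many reviews are genuine", the two filters can be turned into a single category column and counted. This is a rough sketch using the same thresholds as above; the helper name `genuineness`, the category names and the toy rows are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def genuineness(df):
    """Label each review as genuine_good, genuine_bad, or other (rating and sentiment disagree or are neutral)."""
    good = (df['rating'] >= 6) & (df['sentiment_label'] == 'positive')
    bad = (df['rating'] <= 4) & (df['sentiment_label'] == 'negative')
    labels = np.select([good, bad], ['genuine_good', 'genuine_bad'], default='other')
    return pd.Series(labels, index=df.index, name='genuineness')

# Toy rows standing in for the real df (hypothetical values)
toy = pd.DataFrame({
    'rating': [8, 2, 9, 3, 5],
    'sentiment_label': ['positive', 'negative', 'negative', 'positive', 'neutral'],
})

result = genuineness(toy)
print(result.value_counts())
print(result.value_counts(normalize=True))
```

Keep in mind that "genuine" here only means the TextBlob polarity agrees with the star rating. It is a heuristic, not proof that a review is authentic.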
In [120]:
good_review.head()
Out[120]:
Unnamed: 0 drugName condition review rating date usefulCount drug_class sentiment sentiment_label
1 95260 Guanfacine ADHD “My son is halfway through his fourth week of … 8.0 April 27, 2010 192 None 0.168333 positive
3 138000 Ortho Evra Birth Control “This is my first time using any form of birth… 8.0 November 3, 2015 10 None 0.179545 positive
4 35696 Buprenorphine / naloxone Opiate Dependence “Suboxone has completely turned my life around… 9.0 November 27, 2016 37 None 0.194444 positive
7 102654 Aripiprazole Bipolar Disorde “Abilify changed my life. There is hope. I was… 10.0 March 14, 2015 32 antifungal (except metronidazole) 0.074107 positive
9 48928 Ethinyl estradiol / levonorgestrel Birth Control “I had been on the pill for many years. When m… 8.0 December 8, 2016 1 None 0.079167 positive
In [122]:
good_review.iloc[0]['review']
Out[122]:
'"My son is halfway through his fourth week of Intuniv. We became concerned when he began this last week, when he started taking the highest dose he will be on. For two days, he could hardly get out of bed, was very cranky, and slept for nearly 8 hours on a drive home from school vacation (very unusual for him.) I called his doctor on Monday morning and she said to stick it out a few days. See how he did at school, and with getting up in the morning. The last two days have been problem free. He is MUCH more agreeable than ever. He is less emotional (a good thing), less cranky. He is remembering all the things he should. Overall his behavior is better. \r\nWe have tried many different medications and so far this is the most effective."'
In [ ]:
#### Questions on UsefulCount
+ Number of users who found a review useful
+ Top usefulCount by drug and by drug class
+ Best drugs based on usefulCount
In [124]:
df.groupby('drugName')['usefulCount'].value_counts()
Out[124]:
drugName                              usefulCount
A + D Cracked Skin Relief             6              1
A / B Otic                            20             1
Abacavir / dolutegravir / lamivudine  9              6
                                      1              5
                                      12             5
                                                    ..
ella                                  32             1
                                      42             1
femhrt                                0              1
                                      2              1
                                      42             1
Name: usefulCount, Length: 54324, dtype: int64
In [126]:
# Drugs with the most distinct usefulCount values (nunique counts distinct values, not total votes)
df.groupby('drugName')['usefulCount'].nunique().nlargest(20)
Out[126]:
drugName
Fluoxetine       181
Gabapentin       181
Bupropion        177
Citalopram       176
Sertraline       172
Escitalopram     171
Prozac           171
Zoloft           171
Lexapro          169
Celexa           166
Amitriptyline    162
Lorcaserin       157
Trazodone        157
Duloxetine       153
Phentermine      150
Belviq           148
Alprazolam       146
Cymbalta         144
Venlafaxine      144
BuSpar           141
Name: usefulCount, dtype: int64
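Note that `nunique()` counts how many different usefulCount values a drug has, so it mostly tracks how many reviews the drug received. If the goal is to rank drugs by how useful their reviews were found, the total or the mean of usefulCount may be closer to what we want. A minimal sketch, with `useful_summary` as a hypothetical helper and toy data:

```python
import pandas as pd

def useful_summary(df, by='drugName', n=20):
    """Total, mean and number of usefulCount entries per group, sorted by total."""
    summary = df.groupby(by)['usefulCount'].agg(total='sum', mean='mean', reviews='count')
    return summary.sort_values('total', ascending=False).head(n)

# Toy data standing in for the real df (hypothetical values)
toy = pd.DataFrame({
    'drugName':    ['A', 'A', 'B', 'B', 'B'],
    'usefulCount': [10, 10, 1, 2, 3],
})

summary = useful_summary(toy)
distinct = toy.groupby('drugName')['usefulCount'].nunique()
print(summary)
print(distinct)
```

In the toy data drug B has more distinct usefulCount values, yet drug A has the larger total, so the two rankings can disagree. The same helper works for classes with `useful_summary(df, by='drug_class')`.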
In [127]:
# Plot the drugs with the most distinct usefulCount values
df.groupby('drugName')['usefulCount'].nunique().nlargest(20).plot(kind='bar')
Out[127]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f91642abc10>
In [128]:
# Drug classes with the most distinct usefulCount values
df.groupby('drug_class')['usefulCount'].nunique().nlargest(20)
Out[128]:
drug_class
opiod analgesics                     212
anti-anxiety                         198
oral hypoglycemics                   157
h2 blockers (anti-ulcers)            147
antifungal (except metronidazole)    139
arb blocker                          129
beta blockers                        123
antibiotic                           118
ace inhibitor                        111
calcium channel blocker              108
corticosteroid (prednisone)           97
antipyschotics (phenothiazine)        95
alpha blocker                         73
beta blocker                          65
antibiotic(penicillins)               60
thrombolytics                         59
anesthetic                            47
neuromuscular blocking agents         37
antibiotic (cephalosporins)           20
barbiturate                           16
Name: usefulCount, dtype: int64
In [129]:
# Plot the drug classes with the most distinct usefulCount values
df.groupby('drug_class')['usefulCount'].nunique().nlargest(20).plot(kind='bar')
plt.title("Top Drug Class Per Usefulcount")
plt.show()
In [130]:
# Drug classes with the fewest distinct usefulCount values
df.groupby('drug_class')['usefulCount'].nunique().nsmallest(20).plot(kind='bar')
plt.title("Bottom Drug Classes Per Usefulcount")
plt.show()
In [131]:
### Correlation between Rating and Usefulcount
sns.lineplot(data=df,x='rating',y='usefulCount')
Out[131]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f916410b8d0>

Narrative

  • On average, the usefulCount goes up as the rating goes up
In [133]:
#### Question on Date
df.columns
Out[133]:
Index(['Unnamed: 0', 'drugName', 'condition', 'review', 'rating', 'date',
       'usefulCount', 'drug_class', 'sentiment', 'sentiment_label'],
      dtype='object')
In [134]:
# Number of ratings per date (the date column is still a string here, so the groups sort alphabetically)
df.groupby('date')['rating'].size()
Out[134]:
date
April 1, 2008        28
April 1, 2009        21
April 1, 2010        16
April 1, 2011        12
April 1, 2012        21
                     ..
September 9, 2013    44
September 9, 2014    45
September 9, 2015    90
September 9, 2016    99
September 9, 2017    55
Name: rating, Length: 3579, dtype: int64
In [135]:
# Average rating per date
df.groupby('date')['rating'].mean()
Out[135]:
date
April 1, 2008        8.285714
April 1, 2009        7.666667
April 1, 2010        7.812500
April 1, 2011        8.583333
April 1, 2012        9.238095
                       ...   
September 9, 2013    8.295455
September 9, 2014    8.800000
September 9, 2015    5.733333
September 9, 2016    6.777778
September 9, 2017    5.127273
Name: rating, Length: 3579, dtype: float64
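Because `date` is still a plain string, grouping on it gives one group per calendar day in alphabetical order, which does not yet answer the "per year and per month" question. One possible approach is to parse the dates first and group on the year or month. The sketch below assumes the `'April 1, 2008'` style seen above; the helper name `reviews_per_period` and the toy rows are made up for illustration.

```python
import pandas as pd

def reviews_per_period(df):
    """Return (reviews per year, reviews per calendar month) from a string date column."""
    dates = pd.to_datetime(df['date'])  # parses strings like 'April 1, 2008'
    per_year = df.groupby(dates.dt.year.rename('year'))['review'].size()
    per_month = df.groupby(dates.dt.month.rename('month'))['review'].size()
    return per_year, per_month

# Toy rows standing in for the real df (hypothetical values)
toy = pd.DataFrame({
    'date':   ['April 1, 2008', 'April 15, 2008', 'September 9, 2017'],
    'review': ['first', 'second', 'third'],
})

per_year, per_month = reviews_per_period(toy)
print(per_year)
print(per_month)
```

Either result can be plotted directly, for example with `per_year.plot(kind='bar')`.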
In [138]:
# Average Rating Per Day of Every Year
df.groupby('date')['rating'].mean().plot(figsize=(20,10))
plt.title("Average Rating Per Day of Every Year")
plt.show()
In [139]:
# Average Useful Per Day of Every Year
df.groupby('date')['usefulCount'].mean().plot(figsize=(20,10))
plt.title("Average UsefulCount Per Day of Every Year")
plt.show()
In [140]:
# Average Sentiment Per Day of Every Year
df.groupby('date')['sentiment'].mean().plot(figsize=(20,10))
plt.title("Average sentiment Per Day of Every Year")
plt.show()
In [144]:
# Number of Reviews Per Day of Every Year
df.groupby('date')['review'].size().plot(figsize=(20,10))
plt.title("Number of Reviews Per Day of Every Year")
plt.show()
In [145]:
# Number of Reviews Per Day of Every Year (bar chart)
df.groupby('date')['review'].size().plot(kind='bar',figsize=(20,10))
plt.title("Number of Reviews Per Day of Every Year")
plt.show()
In [150]:
####  Using DatetimeIndex
# String aggregation names avoid the deprecation warnings that newer pandas raises for np.mean/np.sum
grouped_date = df.groupby('date').agg({'rating':'mean','usefulCount':'sum','review':'size'})
In [151]:
grouped_date
Out[151]:
rating usefulCount review
date
April 1, 2008 8.285714 2303 28
April 1, 2009 7.666667 3698 21
April 1, 2010 7.812500 342 16
April 1, 2011 8.583333 216 12
April 1, 2012 9.238095 1178 21
September 9, 2013 8.295455 1941 44
September 9, 2014 8.800000 2935 45
September 9, 2015 5.733333 1901 90
September 9, 2016 6.777778 1728 99
September 9, 2017 5.127273 298 55

3579 rows × 3 columns

In [154]:
grouped_date.index
Out[154]:
Index(['April 1, 2008', 'April 1, 2009', 'April 1, 2010', 'April 1, 2011',
       'April 1, 2012', 'April 1, 2013', 'April 1, 2014', 'April 1, 2015',
       'April 1, 2016', 'April 1, 2017',
       ...
       'September 9, 2008', 'September 9, 2009', 'September 9, 2010',
       'September 9, 2011', 'September 9, 2012', 'September 9, 2013',
       'September 9, 2014', 'September 9, 2015', 'September 9, 2016',
       'September 9, 2017'],
      dtype='object', name='date', length=3579)
In [155]:
grouped_date['date'] = grouped_date.index
In [157]:
grouped_date['date'] = pd.DatetimeIndex(grouped_date['date'])
In [158]:
grouped_date.dtypes
Out[158]:
rating                float64
usefulCount             int64
review                  int64
date           datetime64[ns]
dtype: object
In [159]:
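# Sort chronologically: the index was built from strings, so it is still in alphabetical order
grouped_date = grouped_date.set_index('date').sort_index()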
In [161]:
# Select a particular date range (.loc is needed for row selection by date string in recent pandas)
grouped_date.loc['2008'].plot()
Out[161]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f915b5241d0>
In [164]:
# Number of Reviews For 2008
grouped_date.loc['2008', 'review'].plot()
plt.title("Number of Reviews For 2008")
plt.show()
In [166]:
# Number of Reviews For 2008-2009
grouped_date.loc['2008':'2009', 'review'].plot()
plt.title("Number of Reviews For 2008-2009")
plt.show()
In [167]:
# Distribution of Rating Over Time
grouped_date.loc['2008':'2009', 'rating'].plot()
plt.title("Distribution of Rating Over Time")
plt.show()
In [169]:
# Distribution of Rating Over Time
grouped_date.loc['2008':'2012', 'rating'].plot(figsize=(20,10))
plt.title("Distribution of Rating Over Time")
plt.show()
In [172]:
grouped_date.loc['2008-04'].plot()
Out[172]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f915889f110>
In [173]:
# Distribution of Rating Over Two Months (April to May 2008)
grouped_date.loc['2008-04':'2008-05', 'rating'].plot()
plt.title("Distribution of Rating, April-May 2008")
plt.show()
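Daily values are quite noisy, so it can help to roll them up to a monthly level once the index is a DatetimeIndex. Here is a rough sketch using `resample`. The function name `monthly_summary` and the toy frame are assumptions, and note that the monthly rating below is an unweighted mean of the daily means, not a mean over individual reviews.

```python
import pandas as pd

def monthly_summary(grouped):
    """Roll a daily frame (DatetimeIndex; rating, usefulCount, review columns) up to months."""
    grouped = grouped.sort_index()
    # 'MS' labels each bucket with the first day of its month
    return grouped.resample('MS').agg({
        'rating': 'mean',       # unweighted mean of the daily average ratings
        'usefulCount': 'sum',   # total useful votes in the month
        'review': 'sum',        # total number of reviews in the month
    })

# Toy frame shaped like grouped_date (hypothetical values)
toy = pd.DataFrame(
    {'rating': [8.0, 6.0, 9.0], 'usefulCount': [10, 20, 5], 'review': [2, 4, 1]},
    index=pd.DatetimeIndex(['2008-04-01', '2008-04-15', '2008-05-02'], name='date'),
)

monthly = monthly_summary(toy)
print(monthly)
```

With the real data, `monthly_summary(grouped_date)['review'].plot()` would give a smoother view of how review volume changes over time.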
In [174]:
# Save Dataset
df.to_csv("drug_review_dataset_with_sentiment.csv",index=False)

 

You can also check out the video tutorial on YouTube or below

 

Thanks for Your Time

Jesus Saves

By Jesse E.Agbe(JCharis)
