Data Analysis of Coronavirus Outbreak with Python and Geopandas

 

The coronavirus disease 2019 (COVID-19) outbreak is an ongoing outbreak that started in 2019, hence the 19 in COVID-19. It is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

In this tutorial we will see how to use Python to do some basic exploratory data analysis (EDA) of the coronavirus outbreak. We will be using a dataset collected from several sources.

Here are some basic questions we will answer with the data we have:

  • Number of cases (confirmed, recovered, deaths)
  • Which country has the highest cases?
  • List of countries affected
  • Distribution Per Continents and Country
  • Cases Per Day
  • Cases Per Country
  • Timeseries Analysis

Looking at our data, we can see that it consists of the following fields:

  • Country/Region
  • Latitude and Longitude
  • Date/Time
  • Numerical Data

From this simple overview, we can perform the following types of analysis:

  • Geo-spatial analysis from the Latitude and Longitudes.
  • Time series analysis from the Date/Time.
  • Statistical analysis from the numerical data.

Let us start with the basic EDA and then move on to the geospatial and time series analysis.

We will be using pandas, matplotlib and geopandas to help us with our analysis.

Installation

pip install pandas geopandas matplotlib seaborn descartes
Let us see the entire code for our analysis. You can get the notebook and the dataset here
In [1]:
# Load EDA Pkgs
import pandas as pd
import numpy as np
In [2]:
# Load Data Viz Packages
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [3]:
### Load Geopandas
import geopandas as gpd
from shapely.geometry import Point, Polygon
import descartes
In [4]:
# Load Dataset
df = pd.read_csv("data/coronavirus_data.csv")
In [5]:
df.head()
Out[5]:
Index Province/State\n Country/Region\n Lat\n Long\n\n\n\n\n Date\n\n\n\n\n Confirmed\n Deaths\n\n Recovered\n\n\n\n\n
0 1 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0
1 2 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0
2 3 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0
3 4 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0
4 5 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0
In [6]:
df.columns
Out[6]:
Index(['Index', 'Province/State\n', 'Country/Region\n', 'Lat\n',
       'Long\n\n\n\n\n', 'Date\n\n\n\n\n', 'Confirmed\n', 'Deaths\n\n',
       'Recovered\n\n\n\n\n'],
      dtype='object')
In [8]:
df.columns.str.replace(r'\n','', regex=True)
Out[8]:
Index(['Index', 'Province/State', 'Country/Region', 'Lat', 'Long', 'Date',
       'Confirmed', 'Deaths', 'Recovered'],
      dtype='object')
In [9]:
df.columns = df.columns.str.replace(r'\n','', regex=True)
In [10]:
df.columns
Out[10]:
Index(['Index', 'Province/State', 'Country/Region', 'Lat', 'Long', 'Date',
       'Confirmed', 'Deaths', 'Recovered'],
      dtype='object')
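The header cleanup above removes the embedded newlines with a regular expression. As a minimal sketch of a slightly more general variant, assuming the stray characters only appear at the start or end of each header, `str.strip()` does the same job. The toy frame below is made up for illustration and is not the tutorial's dataset.

```python
import pandas as pd

# Toy frame with the same kind of messy headers (illustrative data only)
raw = pd.DataFrame(
    [[1, "Anhui", "Mainland China"]],
    columns=["Index", "Province/State\n", "Country/Region\n\n"],
)

# str.strip() removes leading and trailing whitespace, including newlines
raw.columns = raw.columns.str.strip()

# Replace the slash so the names are easier to type and reference
raw.columns = raw.columns.str.replace("/", "_", regex=False)

print(list(raw.columns))
```

Doing both steps on `df.columns` in one place keeps the renaming logic together instead of spreading it over several cells.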
In [11]:
df.rename(columns={'Province/State':'Province_State','Country/Region':'Country_Region'},inplace=True)
In [12]:
df.columns
Out[12]:
Index(['Index', 'Province_State', 'Country_Region', 'Lat', 'Long', 'Date',
       'Confirmed', 'Deaths', 'Recovered'],
      dtype='object')
In [13]:
# Shape of Dataset
df.shape
Out[13]:
(3885, 9)
In [14]:
# Datatypes
df.dtypes
Out[14]:
Index               int64
Province_State     object
Country_Region     object
Lat               float64
Long              float64
Date               object
Confirmed           int64
Deaths              int64
Recovered           int64
dtype: object
In [15]:
# First 10
df.head(10)
Out[15]:
Index Province_State Country_Region Lat Long Date Confirmed Deaths Recovered
0 1 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0
1 2 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0
2 3 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0
3 4 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0
4 5 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0
5 6 Guangdong Mainland China 23.3417 113.4244 1/22/20 26 0 0
6 7 Guangxi Mainland China 23.8298 108.7881 1/22/20 2 0 0
7 8 Guizhou Mainland China 26.8154 106.8748 1/22/20 1 0 0
8 9 Hainan Mainland China 19.1959 109.7453 1/22/20 4 0 0
9 10 Hebei Mainland China 38.0428 114.5149 1/22/20 1 0 0
In [17]:
df = df[['Province_State', 'Country_Region', 'Lat', 'Long', 'Date',
       'Confirmed', 'Deaths', 'Recovered']]
In [18]:
df.isna().sum()
Out[18]:
Province_State    1665
Country_Region       0
Lat                  0
Long                 0
Date                 0
Confirmed            0
Deaths               0
Recovered            0
dtype: int64
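The only column with missing values is Province_State, which is empty for countries reported as a whole. One possible way to handle this, sketched here on a small hand-made frame, is to build a label column that falls back to the country name. The `Location` column name is hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real data (illustrative rows only)
sample = pd.DataFrame({
    "Province_State": ["Hubei", np.nan, "Victoria", np.nan],
    "Country_Region": ["Mainland China", "Italy", "Australia", "Iran"],
    "Confirmed": [65596, 655, 4, 245],
})

# Count missing values per column, as df.isna().sum() does above
missing = sample.isna().sum()

# Hypothetical label column: use the province when present, else the country
sample["Location"] = sample["Province_State"].fillna(sample["Country_Region"])

print(missing)
print(sample[["Location", "Confirmed"]])
```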
In [19]:
df.describe()
Out[19]:
Lat Long Confirmed Deaths Recovered
count 3885.000000 3885.000000 3885.000000 3885.000000 3885.000000
mean 32.252000 45.775760 396.487773 10.804118 78.544402
std 18.256877 84.338854 4017.397180 137.191519 846.918788
min -37.813600 -123.869500 0.000000 0.000000 0.000000
25% 27.610400 8.227500 0.000000 0.000000 0.000000
50% 35.191700 78.000000 2.000000 0.000000 0.000000
75% 42.315400 113.614000 40.000000 0.000000 4.000000
max 64.000000 153.400000 65596.000000 2641.000000 23383.000000
In [20]:
# Number of cases per date
df.head()
Out[20]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered
0 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0
1 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0
2 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0
3 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0
4 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0
In [21]:
df.columns
Out[21]:
Index(['Province_State', 'Country_Region', 'Lat', 'Long', 'Date', 'Confirmed',
       'Deaths', 'Recovered'],
      dtype='object')
In [22]:
df.groupby('Date')[['Confirmed','Deaths', 'Recovered']].sum()
Out[22]:
Confirmed Deaths Recovered
Date
1/22/20 555 17 28
1/23/20 653 18 30
1/24/20 941 26 36
1/25/20 1434 42 39
1/26/20 2118 56 52
1/27/20 2927 82 61
1/28/20 5578 131 107
1/29/20 6166 133 126
1/30/20 8234 171 143
1/31/20 9927 213 222
2/1/20 12038 259 284
2/10/20 42763 1013 3946
2/11/20 44803 1113 4683
2/12/20 45222 1118 5150
2/13/20 60370 1371 6295
2/14/20 66887 1523 8058
2/15/20 69032 1666 9395
2/16/20 71226 1770 10865
2/17/20 73260 1868 12583
2/18/20 75138 2007 14352
2/19/20 75641 2122 16121
2/2/20 16787 362 472
2/20/20 76199 2247 18177
2/21/20 76843 2251 18890
2/22/20 78599 2458 22886
2/23/20 78985 2469 23394
2/24/20 79570 2629 25227
2/25/20 80415 2708 27905
2/26/20 81397 2770 30384
2/27/20 82756 2814 33277
2/3/20 19881 426 623
2/4/20 23892 492 852
2/5/20 27636 564 1124
2/6/20 30818 634 1487
2/7/20 34392 719 2011
2/8/20 37121 806 2616
2/9/20 40151 906 3244
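Notice that the dates in the output above are not in calendar order: 2/10/20 comes right after 2/1/20. That is because the Date column still holds strings, so the groups are sorted alphabetically. A minimal sketch of one way around this, assuming the dates follow the month/day/two-digit-year pattern seen in the data, is to parse the column before grouping. The numbers below are made up.

```python
import pandas as pd

# Illustrative rows only; the date strings use the same format as the dataset
sample = pd.DataFrame({
    "Date": ["2/10/20", "1/22/20", "2/2/20", "2/10/20"],
    "Confirmed": [5, 1, 2, 7],
})

# Grouping on raw strings sorts them as text: '2/10/20' lands before '2/2/20'
by_string = sample.groupby("Date")[["Confirmed"]].sum()

# Parse the strings first, then group: the index is now in calendar order
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%y")
by_date = sample.groupby("Date")[["Confirmed"]].sum()

print(by_string)
print(by_date)
```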
In [23]:
df.groupby('Date')[['Confirmed','Deaths', 'Recovered']].max()
Out[23]:
Confirmed Deaths Recovered
Date
1/22/20 444 17 28
1/23/20 444 17 28
1/24/20 549 24 31
1/25/20 761 40 32
1/26/20 1058 52 42
1/27/20 1423 76 45
1/28/20 3554 125 80
1/29/20 3554 125 88
1/30/20 4903 162 90
1/31/20 5806 204 141
2/1/20 7153 249 168
2/10/20 31728 974 2222
2/11/20 33366 1068 2639
2/12/20 33366 1068 2686
2/13/20 48206 1310 3459
2/14/20 54406 1457 4774
2/15/20 56249 1596 5623
2/16/20 58182 1696 6639
2/17/20 59989 1789 7862
2/18/20 61682 1921 9128
2/19/20 62031 2029 10337
2/2/20 11177 350 295
2/20/20 62442 2144 11788
2/21/20 62662 2144 11881
2/22/20 64084 2346 15299
2/23/20 64084 2346 15343
2/24/20 64287 2495 16748
2/25/20 64786 2563 18971
2/26/20 65187 2615 20969
2/27/20 65596 2641 23383
2/3/20 13522 414 386
2/4/20 16678 479 522
2/5/20 19665 549 633
2/6/20 22112 618 817
2/7/20 24953 699 1115
2/8/20 27100 780 1439
2/9/20 29631 871 1795
In [24]:
df_per_day = df.groupby('Date')[['Confirmed','Deaths', 'Recovered']].max()
In [25]:
df_per_day.head()
Out[25]:
Confirmed Deaths Recovered
Date
1/22/20 444 17 28
1/23/20 444 17 28
1/24/20 549 24 31
1/25/20 761 40 32
1/26/20 1058 52 42
In [26]:
df_per_day.describe()
Out[26]:
Confirmed Deaths Recovered
count 37.000000 37.000000 37.000000
mean 32616.756757 1082.513514 5338.540541
std 25664.132012 915.678972 6895.411802
min 444.000000 17.000000 28.000000
25% 5806.000000 204.000000 141.000000
50% 29631.000000 871.000000 1795.000000
75% 61682.000000 1921.000000 9128.000000
max 65596.000000 2641.000000 23383.000000
In [29]:
# Max No of Cases
df_per_day['Confirmed'].max()
Out[29]:
65596
In [30]:
# Min No Of Cases
df_per_day['Confirmed'].min()
Out[30]:
444
In [31]:
# Date for Maximum Number Cases
df_per_day['Confirmed'].idxmax()
Out[31]:
'2/27/20'
In [32]:
# Date for Min Number Cases
df_per_day['Confirmed'].idxmin()
Out[32]:
'1/22/20'
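Keep in mind that `df_per_day` was built with `.max()`, so each row holds the largest single province or country entry for that day, not the worldwide total. That is why 1/22/20 shows 444 confirmed cases here but 555 in the `.sum()` table earlier. The sketch below shows the difference on a tiny hand-made frame (the Beijing value for 1/23/20 is invented).

```python
import pandas as pd

# Two locations over two days (partly invented values, for illustration)
sample = pd.DataFrame({
    "Date": ["1/22/20", "1/22/20", "1/23/20", "1/23/20"],
    "Province_State": ["Hubei", "Beijing", "Hubei", "Beijing"],
    "Confirmed": [444, 14, 444, 22],
})

# Largest single row per day: answers "what was the worst-hit location?"
largest_row = sample.groupby("Date")[["Confirmed"]].max()

# Sum over all rows per day: answers "how many cases in total?"
daily_total = sample.groupby("Date")[["Confirmed"]].sum()

print(largest_row)
print(daily_total)
```

Which of the two you want depends on the question being asked; for overall cases per day the sum is usually the relevant figure.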
In [33]:
# Number of cases per country
df.groupby(['Country_Region'])[['Confirmed','Deaths', 'Recovered']].max()
Out[33]:
Confirmed Deaths Recovered
Country_Region
Afghanistan 1 0 0
Algeria 1 0 0
Australia 8 0 4
Austria 3 0 0
Bahrain 33 0 0
Belgium 1 0 1
Brazil 1 0 0
Cambodia 1 0 1
Canada 7 0 3
Croatia 3 0 0
Denmark 1 0 0
Egypt 1 0 0
Estonia 1 0 0
Finland 2 0 1
France 38 2 11
Georgia 1 0 0
Germany 46 0 16
Greece 3 0 0
Hong Kong 92 2 24
India 3 0 3
Iran 245 26 49
Iraq 7 0 0
Israel 3 0 1
Italy 655 17 45
Japan 214 4 22
Kuwait 43 0 0
Lebanon 2 0 0
Macau 10 0 8
Mainland China 65596 2641 23383
Malaysia 23 0 18
Nepal 1 0 1
Netherlands 1 0 0
North Macedonia 1 0 0
Norway 1 0 0
Oman 4 0 0
Others 705 4 10
Pakistan 2 0 0
Philippines 3 1 1
Romania 1 0 0
Russia 2 0 2
San Marino 1 0 0
Singapore 93 0 62
South Korea 1766 13 22
Spain 15 0 2
Sri Lanka 1 0 1
Sweden 7 0 0
Switzerland 8 0 0
Taiwan 32 1 5
Thailand 40 0 22
UK 15 0 8
US 42 0 2
United Arab Emirates 13 0 4
Vietnam 16 0 16
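Because `.max()` is taken over every row of a country, a country that is split into several provinces shows the value of its largest province rather than its national total (Mainland China shows the Hubei figure, for example). A rough sketch of one alternative, assuming the counts are cumulative, is to keep only the latest date and sum the provinces. The country and province names below are placeholders.

```python
import pandas as pd

# Placeholder country "X" with two provinces over two days (invented values)
sample = pd.DataFrame({
    "Province_State": ["A", "A", "B", "B"],
    "Country_Region": ["X", "X", "X", "X"],
    "Date": pd.to_datetime(["2020-02-26", "2020-02-27"] * 2),
    "Confirmed": [3, 5, 1, 2],
})

# Keep only the most recent snapshot, then add up the provinces
latest = sample[sample["Date"] == sample["Date"].max()]
country_total = latest.groupby("Country_Region")[["Confirmed"]].sum()

# For comparison: max over all rows returns the largest single province
country_largest = sample.groupby("Country_Region")[["Confirmed"]].max()

print(country_total)
print(country_largest)
```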
In [34]:
# Number of cases per province and country
df.groupby(['Province_State','Country_Region'])[['Confirmed','Deaths', 'Recovered']].max()
Out[34]:
Confirmed Deaths Recovered
Province_State Country_Region
Anhui Mainland China 989 6 792
Beijing Mainland China 410 5 248
Boston, MA US 1 0 0
British Columbia Canada 7 0 3
Chicago, IL US 2 0 2
Chongqing Mainland China 576 6 401
Diamond Princess cruise ship Others 705 4 10
From Diamond Princess Australia 8 0 0
Fujian Mainland China 296 1 228
Gansu Mainland China 91 2 81
Guangdong Mainland China 1347 7 890
Guangxi Mainland China 252 2 161
Guizhou Mainland China 146 2 112
Hainan Mainland China 168 5 131
Hebei Mainland China 317 6 274
Heilongjiang Mainland China 480 13 270
Henan Mainland China 1272 20 1068
Hong Kong Hong Kong 92 2 24
Hubei Mainland China 65596 2641 23383
Humboldt County, CA US 1 0 0
Hunan Mainland China 1017 4 804
Inner Mongolia Mainland China 75 0 43
Jiangsu Mainland China 631 0 498
Jiangxi Mainland China 934 1 754
Jilin Mainland China 93 1 67
Lackland, TX (From Diamond Princess) US 2 0 0
Liaoning Mainland China 121 1 93
London, ON Canada 1 0 1
Los Angeles, CA US 1 0 0
Macau Macau 10 0 8
Madison, WI US 1 0 0
New South Wales Australia 4 0 4
Ningxia Mainland China 72 0 68
Omaha, NE (From Diamond Princess) US 11 0 0
Orange, CA US 1 0 0
Qinghai Mainland China 18 0 18
Queensland Australia 5 0 1
Sacramento County, CA US 2 0 0
San Antonio, TX US 1 0 0
San Benito, CA US 2 0 0
San Diego County, CA US 2 0 1
Santa Clara, CA US 2 0 1
Seattle, WA US 1 0 1
Shaanxi Mainland China 245 1 195
Shandong Mainland China 756 6 387
Shanghai Mainland China 337 3 276
Shanxi Mainland China 133 0 107
Sichuan Mainland China 534 3 321
South Australia Australia 2 0 2
Taiwan Taiwan 32 1 5
Tempe, AZ US 1 0 1
Tianjin Mainland China 136 3 102
Tibet Mainland China 1 0 1
Toronto, ON Canada 5 0 2
Travis, CA (From Diamond Princess) US 5 0 0
Unassigned Location (From Diamond Princess) US 42 0 0
Victoria Australia 4 0 4
Xinjiang Mainland China 76 2 43
Yunnan Mainland China 174 2 150
Zhejiang Mainland China 1205 1 932
In [35]:
df['Country_Region'].value_counts()
Out[35]:
Mainland China          1147
US                       629
Australia                185
Canada                   111
Philippines               37
UK                        37
Egypt                     37
India                     37
Sri Lanka                 37
Cambodia                  37
Spain                     37
Georgia                   37
Hong Kong                 37
South Korea               37
Oman                      37
Estonia                   37
Afghanistan               37
Algeria                   37
Vietnam                   37
Pakistan                  37
San Marino                37
Sweden                    37
Greece                    37
Taiwan                    37
Croatia                   37
Bahrain                   37
Iran                      37
Romania                   37
Netherlands               37
Brazil                    37
Lebanon                   37
Russia                    37
Switzerland               37
Austria                   37
France                    37
Belgium                   37
Germany                   37
North Macedonia           37
Iraq                      37
Norway                    37
Finland                   37
Others                    37
Nepal                     37
Japan                     37
Israel                    37
Thailand                  37
Denmark                   37
Italy                     37
Malaysia                  37
Macau                     37
Kuwait                    37
Singapore                 37
United Arab Emirates      37
Name: Country_Region, dtype: int64
In [36]:
df['Country_Region'].value_counts().plot(kind='bar',figsize=(20,10))
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f814b9716d0>
In [37]:
# Which countries are affected
df['Country_Region'].unique()
Out[37]:
array(['Mainland China', 'Thailand', 'Japan', 'South Korea', 'Taiwan',
       'US', 'Macau', 'Hong Kong', 'Singapore', 'Vietnam', 'France',
       'Nepal', 'Malaysia', 'Canada', 'Australia', 'Cambodia',
       'Sri Lanka', 'Germany', 'Finland', 'United Arab Emirates',
       'Philippines', 'India', 'Italy', 'UK', 'Russia', 'Sweden', 'Spain',
       'Belgium', 'Others', 'Egypt', 'Iran', 'Lebanon', 'Iraq', 'Oman',
       'Afghanistan', 'Bahrain', 'Kuwait', 'Algeria', 'Croatia',
       'Switzerland', 'Austria', 'Israel', 'Pakistan', 'Brazil',
       'Georgia', 'Greece', 'North Macedonia', 'Norway', 'Romania',
       'Denmark', 'Estonia', 'Netherlands', 'San Marino'], dtype=object)
In [38]:
# How many countries are affected
len(df['Country_Region'].unique())
Out[38]:
53
In [40]:
plt.figure(figsize=(20,10))
df['Country_Region'].value_counts().plot.pie(autopct="%1.1f%%")
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f814b1e6a90>

Check for Distribution on Map

  • Lat/Long
  • Geometry/ Point
In [41]:
dir(gpd)
Out[41]:
['GeoDataFrame',
 'GeoSeries',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_compat',
 '_config',
 '_version',
 'array',
 'base',
 'clip',
 'datasets',
 'geodataframe',
 'geopandas',
 'geoseries',
 'gpd',
 'io',
 'np',
 'options',
 'overlay',
 'pd',
 'plotting',
 'points_from_xy',
 'read_file',
 'read_postgis',
 'show_versions',
 'sjoin',
 'tools']
In [43]:
df.head()
Out[43]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered
0 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0
1 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0
2 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0
3 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0
4 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0
In [45]:
# Convert Data to GeoDataframe
gdf01 = gpd.GeoDataFrame(df,geometry=gpd.points_from_xy(df['Long'],df['Lat']))
In [46]:
gdf01.head()
Out[46]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered geometry
0 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0 POINT (117.22640 31.82570)
1 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0 POINT (116.41420 40.18240)
2 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0 POINT (107.87400 30.05720)
3 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0 POINT (117.98740 26.07890)
4 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0 POINT (103.83430 36.06110)
In [47]:
type(gdf01)
Out[47]:
geopandas.geodataframe.GeoDataFrame
In [48]:
# Method 2
points = [ Point(x,y) for x,y in zip(df.Long,df.Lat)]
In [49]:
gdf03 = gpd.GeoDataFrame(df,geometry=points)
In [50]:
gdf03
Out[50]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered geometry
0 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0 POINT (117.22640 31.82570)
1 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0 POINT (116.41420 40.18240)
2 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0 POINT (107.87400 30.05720)
3 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0 POINT (117.98740 26.07890)
4 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0 POINT (103.83430 36.06110)
3880 NaN Romania 45.9432 24.9668 2/27/20 1 0 0 POINT (24.96680 45.94320)
3881 NaN Denmark 56.2639 9.5018 2/27/20 1 0 0 POINT (9.50180 56.26390)
3882 NaN Estonia 58.5953 25.0136 2/27/20 1 0 0 POINT (25.01360 58.59530)
3883 NaN Netherlands 52.1326 5.2913 2/27/20 1 0 0 POINT (5.29130 52.13260)
3884 NaN San Marino 43.9424 12.4578 2/27/20 1 0 0 POINT (12.45780 43.94240)

3885 rows × 9 columns
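Both methods above produce point geometries but do not record which coordinate system the numbers are in. A minimal sketch, assuming the Lat/Long values are plain WGS84 degrees (EPSG:4326), is to pass `crs` when building the GeoDataFrame. Passing a copy of the frame is also a choice made here so the original DataFrame is left untouched.

```python
import pandas as pd
import geopandas as gpd

# Two rows with coordinates taken from the outputs above
sample = pd.DataFrame({
    "Country_Region": ["Mainland China", "San Marino"],
    "Lat": [31.8257, 43.9424],
    "Long": [117.2264, 12.4578],
})

# x is longitude and y is latitude; EPSG:4326 is assumed, not confirmed by the source
gdf = gpd.GeoDataFrame(
    sample.copy(),
    geometry=gpd.points_from_xy(sample["Long"], sample["Lat"]),
    crs="EPSG:4326",
)

print(gdf.crs)
print(gdf.head())
```

With a CRS attached, later steps such as reprojecting with `to_crs` or overlaying other layers can check that the coordinate systems match.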

In [51]:
# Map Plot
gdf01.plot(figsize=(20,10))
Out[51]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f814af95fd0>
In [52]:
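# Overlaying on a world map
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0, so this call needs an older release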
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
ax = world.plot(figsize=(20,10))
ax.axis('off')
Out[52]:
(-198.0, 198.00000000000006, -98.6822565, 92.32738650000002)
In [53]:
# Overlap
fig,ax = plt.subplots(figsize=(20,10))
gdf01.plot(cmap='Purples',ax=ax)
world.geometry.boundary.plot(color=None,edgecolor='k',linewidth=2,ax=ax)
Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f81491e8e10>
In [54]:
fig,ax = plt.subplots(figsize=(20,10))
gdf01.plot(cmap='Purples',ax=ax)
world.geometry.plot(color='Yellow',edgecolor='k',linewidth=2,ax=ax)
Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8149126050>
In [55]:
# Per Country
world
Out[55]:
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000…
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982…
2 603253 Africa W. Sahara ESH 906.5 POLYGON ((-8.66559 27.65643, -8.66512 27.58948…
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742…
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000…
172 7111024 Europe Serbia SRB 101800.0 POLYGON ((18.82982 45.90887, 18.82984 45.90888…
173 642550 Europe Montenegro MNE 10610.0 POLYGON ((20.07070 42.58863, 19.80161 42.50009…
174 1895250 Europe Kosovo -99 18490.0 POLYGON ((20.59025 41.85541, 20.52295 42.21787…
175 1218208 North America Trinidad and Tobago TTO 43570.0 POLYGON ((-61.68000 10.76000, -61.10500 10.890…
176 13026129 Africa S. Sudan SSD 20880.0 POLYGON ((30.83385 3.50917, 29.95350 4.17370, …

177 rows × 6 columns

In [56]:
world['continent'].unique()
Out[56]:
array(['Oceania', 'Africa', 'North America', 'Asia', 'South America',
       'Europe', 'Seven seas (open ocean)', 'Antarctica'], dtype=object)
In [57]:
asia = world[world['continent'] == 'Asia']
In [58]:
asia
Out[58]:
pop_est continent name iso_a3 gdp_md_est geometry
5 18556698 Asia Kazakhstan KAZ 460700.00 POLYGON ((87.35997 49.21498, 86.59878 48.54918…
6 29748859 Asia Uzbekistan UZB 202300.00 POLYGON ((55.96819 41.30864, 55.92892 44.99586…
8 260580739 Asia Indonesia IDN 3028000.00 MULTIPOLYGON (((141.00021 -2.60015, 141.01706 …
24 1291358 Asia Timor-Leste TLS 4975.00 POLYGON ((124.96868 -8.89279, 125.08625 -8.656…
76 8299706 Asia Israel ISR 297000.00 POLYGON ((35.71992 32.70919, 35.54567 32.39399…
77 6229794 Asia Lebanon LBN 85160.00 POLYGON ((35.82110 33.27743, 35.55280 33.26427…
79 4543126 Asia Palestine PSE 21220.77 POLYGON ((35.39756 31.48909, 34.92741 31.35344…
83 10248069 Asia Jordan JOR 86190.00 POLYGON ((35.54567 32.39399, 35.71992 32.70919…
84 6072475 Asia United Arab Emirates ARE 667200.00 POLYGON ((51.57952 24.24550, 51.75744 24.29407…
85 2314307 Asia Qatar QAT 334500.00 POLYGON ((50.81011 24.75474, 50.74391 25.48242…
86 2875422 Asia Kuwait KWT 301100.00 POLYGON ((47.97452 29.97582, 48.18319 29.53448…
87 39192111 Asia Iraq IRQ 596700.00 POLYGON ((39.19547 32.16101, 38.79234 33.37869…
88 3424386 Asia Oman OMN 173100.00 MULTIPOLYGON (((55.20834 22.70833, 55.23449 23…
90 16204486 Asia Cambodia KHM 58940.00 POLYGON ((102.58493 12.18659, 102.34810 13.394…
91 68414135 Asia Thailand THA 1161000.00 POLYGON ((105.21878 14.27321, 104.28142 14.416…
92 7126706 Asia Laos LAO 40960.00 POLYGON ((107.38273 14.20244, 106.49637 14.570…
93 55123814 Asia Myanmar MMR 311100.00 POLYGON ((100.11599 20.41785, 99.54331 20.1866…
94 96160163 Asia Vietnam VNM 594900.00 POLYGON ((104.33433 10.48654, 105.19991 10.889…
95 25248140 Asia North Korea PRK 40000.00 MULTIPOLYGON (((130.78000 42.22001, 130.78000 …
96 51181299 Asia South Korea KOR 1929000.00 POLYGON ((126.17476 37.74969, 126.23734 37.840…
97 3068243 Asia Mongolia MNG 37000.00 POLYGON ((87.75126 49.29720, 88.80557 49.47052…
98 1281935911 Asia India IND 8721000.00 POLYGON ((97.32711 28.26158, 97.40256 27.88254…
99 157826578 Asia Bangladesh BGD 628400.00 POLYGON ((92.67272 22.04124, 92.65226 21.32405…
100 758288 Asia Bhutan BTN 6432.00 POLYGON ((91.69666 27.77174, 92.10371 27.45261…
101 29384297 Asia Nepal NPL 71520.00 POLYGON ((88.12044 27.87654, 88.04313 27.44582…
102 204924861 Asia Pakistan PAK 988200.00 POLYGON ((77.83745 35.49401, 76.87172 34.65354…
103 34124811 Asia Afghanistan AFG 64080.00 POLYGON ((66.51861 37.36278, 67.07578 37.35614…
104 8468555 Asia Tajikistan TJK 25810.00 POLYGON ((67.83000 37.14499, 68.39203 38.15703…
105 5789122 Asia Kyrgyzstan KGZ 21010.00 POLYGON ((70.96231 42.26615, 71.18628 42.70429…
106 5351277 Asia Turkmenistan TKM 94720.00 POLYGON ((52.50246 41.78332, 52.94429 42.11603…
107 82021564 Asia Iran IRN 1459000.00 POLYGON ((48.56797 29.92678, 48.01457 30.45246…
108 18028549 Asia Syria SYR 50280.00 POLYGON ((35.71992 32.70919, 35.70080 32.71601…
109 3045191 Asia Armenia ARM 26300.00 POLYGON ((46.50572 38.77061, 46.14362 38.74120…
124 80845215 Asia Turkey TUR 1670000.00 MULTIPOLYGON (((44.77268 37.17044, 44.29345 37…
138 22409381 Asia Sri Lanka LKA 236700.00 POLYGON ((81.78796 7.52306, 81.63732 6.48178, …
139 1379302771 Asia China CHN 21140000.00 MULTIPOLYGON (((109.47521 18.19770, 108.65521 …
140 23508428 Asia Taiwan TWN 1127000.00 POLYGON ((121.77782 24.39427, 121.17563 22.790…
145 9961396 Asia Azerbaijan AZE 167900.00 MULTIPOLYGON (((46.40495 41.86068, 46.68607 41…
146 4926330 Asia Georgia GEO 37270.00 POLYGON ((39.95501 43.43500, 40.07696 43.55310…
147 104256076 Asia Philippines PHL 801900.00 MULTIPOLYGON (((120.83390 12.70450, 120.32344 …
148 31381992 Asia Malaysia MYS 863000.00 MULTIPOLYGON (((100.08576 6.46449, 100.25960 6…
149 443593 Asia Brunei BRN 33730.00 POLYGON ((115.45071 5.44773, 115.40570 4.95523…
155 126451398 Asia Japan JPN 4932000.00 MULTIPOLYGON (((141.88460 39.18086, 140.95949 …
157 28036829 Asia Yemen YEM 73450.00 POLYGON ((52.00001 19.00000, 52.78218 17.34974…
158 28571770 Asia Saudi Arabia SAU 1731000.00 POLYGON ((34.95604 29.35655, 36.06894 29.19749…
160 265100 Asia N. Cyprus -99 3600.00 POLYGON ((32.73178 35.14003, 32.80247 35.14550…
161 1221549 Asia Cyprus CYP 29260.00 POLYGON ((32.73178 35.14003, 32.91957 35.08783…
In [59]:
africa = world[world['continent'] == 'Africa']
north_america = world[world['continent'] == 'North America']
europe = world[world['continent'] == 'Europe']
In [60]:
# Cases in China
df.head()
Out[60]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered geometry
0 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0 POINT (117.22640 31.82570)
1 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0 POINT (116.41420 40.18240)
2 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0 POINT (107.87400 30.05720)
3 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0 POINT (117.98740 26.07890)
4 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0 POINT (103.83430 36.06110)
In [61]:
df[df['Country_Region'] == 'Mainland China']
Out[61]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered geometry
0 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0 POINT (117.22640 31.82570)
1 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0 POINT (116.41420 40.18240)
2 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0 POINT (107.87400 30.05720)
3 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0 POINT (117.98740 26.07890)
4 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0 POINT (103.83430 36.06110)
3806 Tianjin Mainland China 39.3054 117.3230 2/27/20 136 3 102 POINT (117.32300 39.30540)
3807 Tibet Mainland China 31.6927 88.0924 2/27/20 1 0 1 POINT (88.09240 31.69270)
3808 Xinjiang Mainland China 41.1129 85.2401 2/27/20 76 2 43 POINT (85.24010 41.11290)
3809 Yunnan Mainland China 24.9740 101.4870 2/27/20 174 2 150 POINT (101.48700 24.97400)
3810 Zhejiang Mainland China 29.1832 120.0934 2/27/20 1205 1 932 POINT (120.09340 29.18320)

1147 rows × 9 columns

In [62]:
gdf01[gdf01['Country_Region'] == 'Mainland China']
Out[62]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered geometry
0 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0 POINT (117.22640 31.82570)
1 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0 POINT (116.41420 40.18240)
2 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0 POINT (107.87400 30.05720)
3 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0 POINT (117.98740 26.07890)
4 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0 POINT (103.83430 36.06110)
3806 Tianjin Mainland China 39.3054 117.3230 2/27/20 136 3 102 POINT (117.32300 39.30540)
3807 Tibet Mainland China 31.6927 88.0924 2/27/20 1 0 1 POINT (88.09240 31.69270)
3808 Xinjiang Mainland China 41.1129 85.2401 2/27/20 76 2 43 POINT (85.24010 41.11290)
3809 Yunnan Mainland China 24.9740 101.4870 2/27/20 174 2 150 POINT (101.48700 24.97400)
3810 Zhejiang Mainland China 29.1832 120.0934 2/27/20 1205 1 932 POINT (120.09340 29.18320)

1147 rows × 9 columns

In [63]:
# Overlap
fig,ax = plt.subplots(figsize=(20,10))
gdf01[gdf01['Country_Region'] == 'Mainland China'].plot(cmap='Purples',ax=ax)
world.geometry.boundary.plot(color=None,edgecolor='k',linewidth=2,ax=ax)
Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8149bb84d0>
In [65]:
# Overlap
fig,ax = plt.subplots(figsize=(20,10))
gdf01[gdf01['Country_Region'] == 'Mainland China'].plot(cmap='Purples',ax=ax)
asia.geometry.boundary.plot(color=None,edgecolor='k',linewidth=2,ax=ax)
Out[65]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f81491e1b10>
In [66]:
# Overlap
fig,ax = plt.subplots(figsize=(20,10))
gdf01[gdf01['Country_Region'] == 'India'].plot(cmap='Purples',ax=ax)
asia.geometry.boundary.plot(color=None,edgecolor='k',linewidth=2,ax=ax)
Out[66]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8149021bd0>
In [67]:
# Overlap
fig,ax = plt.subplots(figsize=(20,10))
gdf01[gdf01['Country_Region'] == 'Egypt'].plot(cmap='Purples',ax=ax)
africa.geometry.boundary.plot(color=None,edgecolor='k',linewidth=2,ax=ax)
Out[67]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8148f946d0>
In [68]:
# Overlap
fig,ax = plt.subplots(figsize=(20,10))
gdf01[gdf01['Country_Region'] == 'US'].plot(cmap='Purples',ax=ax)
north_america.geometry.boundary.plot(color=None,edgecolor='k',linewidth=2,ax=ax)
Out[68]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8148f102d0>
In [69]:
# Overlap
fig,ax = plt.subplots(figsize=(20,10))
gdf01[gdf01['Country_Region'] == 'UK'].plot(cmap='Purples',ax=ax)
europe.geometry.boundary.plot(color=None,edgecolor='k',linewidth=2,ax=ax)
Out[69]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8148e85b50>
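The overlay cells above repeat the same three lines with a different country and continent each time. As a sketch only, the pattern could be wrapped in a small helper. The function name `plot_country_cases` is hypothetical, and a simple rectangle stands in for the continent outlines so the example runs without the Natural Earth dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from shapely.geometry import box


def plot_country_cases(points, boundaries, country, ax=None):
    """Plot the case points of one country on top of boundary outlines."""
    if ax is None:
        _, ax = plt.subplots(figsize=(20, 10))
    # Draw the outlines first so the points sit on top
    boundaries.geometry.boundary.plot(color="k", linewidth=2, ax=ax)
    subset = points[points["Country_Region"] == country]
    if not subset.empty:  # plotting an empty frame would only produce a warning
        subset.plot(ax=ax, color="purple")
    ax.set_title(country)
    return ax


# Stand-in data: invented coordinates and a rectangle instead of real borders
cases = pd.DataFrame({
    "Country_Region": ["India", "Egypt"],
    "Lat": [21.0, 26.0],
    "Long": [78.0, 30.0],
})
points = gpd.GeoDataFrame(
    cases, geometry=gpd.points_from_xy(cases["Long"], cases["Lat"]), crs="EPSG:4326"
)
region = gpd.GeoDataFrame({"name": ["demo box"]}, geometry=[box(60, 5, 100, 40)], crs="EPSG:4326")

ax = plot_country_cases(points, region, "India")
```

In the notebook the call would look like `plot_country_cases(gdf01, asia, 'India')`, reusing the frames defined earlier.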
In [70]:
# Time Series Analysis
df.head()
Out[70]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered geometry
0 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0 POINT (117.22640 31.82570)
1 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0 POINT (116.41420 40.18240)
2 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0 POINT (107.87400 30.05720)
3 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0 POINT (117.98740 26.07890)
4 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0 POINT (103.83430 36.06110)
In [71]:
df_per_day
Out[71]:
Confirmed Deaths Recovered
Date
1/22/20 444 17 28
1/23/20 444 17 28
1/24/20 549 24 31
1/25/20 761 40 32
1/26/20 1058 52 42
1/27/20 1423 76 45
1/28/20 3554 125 80
1/29/20 3554 125 88
1/30/20 4903 162 90
1/31/20 5806 204 141
2/1/20 7153 249 168
2/10/20 31728 974 2222
2/11/20 33366 1068 2639
2/12/20 33366 1068 2686
2/13/20 48206 1310 3459
2/14/20 54406 1457 4774
2/15/20 56249 1596 5623
2/16/20 58182 1696 6639
2/17/20 59989 1789 7862
2/18/20 61682 1921 9128
2/19/20 62031 2029 10337
2/2/20 11177 350 295
2/20/20 62442 2144 11788
2/21/20 62662 2144 11881
2/22/20 64084 2346 15299
2/23/20 64084 2346 15343
2/24/20 64287 2495 16748
2/25/20 64786 2563 18971
2/26/20 65187 2615 20969
2/27/20 65596 2641 23383
2/3/20 13522 414 386
2/4/20 16678 479 522
2/5/20 19665 549 633
2/6/20 22112 618 817
2/7/20 24953 699 1115
2/8/20 27100 780 1439
2/9/20 29631 871 1795
In [72]:
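# Note: this creates a second name for the same DataFrame, not an independent copy (use df.copy() for that)
df2 = df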
In [73]:
df.to_csv("coronavirus_data_clean.csv")
In [74]:
import datetime as dt
In [75]:
df['cases_date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')
In [76]:
df2.dtypes
Out[76]:
Province_State            object
Country_Region            object
Lat                      float64
Long                     float64
Date                      object
Confirmed                  int64
Deaths                     int64
Recovered                  int64
geometry                geometry
cases_date        datetime64[ns]
dtype: object
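The new cases_date column shows up in `df2.dtypes` even though it was added to `df`. That happens because `df2 = df` only binds a second name to the same object. A short sketch of the difference between an alias and a real copy, using a throwaway frame:

```python
import pandas as pd

original = pd.DataFrame({"Date": ["1/22/20", "1/23/20"]})

alias = original               # second name for the same object
independent = original.copy()  # separate object with its own data

# Adding a column through one name is visible through the alias
original["cases_date"] = pd.to_datetime(original["Date"], format="%m/%d/%y")

print("cases_date" in alias.columns)        # the alias sees the new column
print("cases_date" in independent.columns)  # the copy does not
```

If the goal is to keep an untouched version of the data before the time series work, `df.copy()` is the safer choice.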
In [77]:
df['cases_date'].plot(figsize=(20,10))
Out[77]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8148e09750>
In [78]:
ts = df2.set_index('cases_date')
ts
In [80]:
# Select For January
ts.loc['2020-01']
Out[80]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered geometry
cases_date
2020-01-22 Anhui Mainland China 31.8257 117.2264 1/22/20 1 0 0 POINT (117.22640 31.82570)
2020-01-22 Beijing Mainland China 40.1824 116.4142 1/22/20 14 0 0 POINT (116.41420 40.18240)
2020-01-22 Chongqing Mainland China 30.0572 107.8740 1/22/20 6 0 0 POINT (107.87400 30.05720)
2020-01-22 Fujian Mainland China 26.0789 117.9874 1/22/20 1 0 0 POINT (117.98740 26.07890)
2020-01-22 Gansu Mainland China 36.0611 103.8343 1/22/20 0 0 0 POINT (103.83430 36.06110)
2020-01-31 NaN Romania 45.9432 24.9668 1/31/20 0 0 0 POINT (24.96680 45.94320)
2020-01-31 NaN Denmark 56.2639 9.5018 1/31/20 0 0 0 POINT (9.50180 56.26390)
2020-01-31 NaN Estonia 58.5953 25.0136 1/31/20 0 0 0 POINT (25.01360 58.59530)
2020-01-31 NaN Netherlands 52.1326 5.2913 1/31/20 0 0 0 POINT (5.29130 52.13260)
2020-01-31 NaN San Marino 43.9424 12.4578 1/31/20 0 0 0 POINT (12.45780 43.94240)

1050 rows × 9 columns

In [81]:
ts.loc['2020-02-24':'2020-02-25']
Out[81]:
Province_State Country_Region Lat Long Date Confirmed Deaths Recovered geometry
cases_date
2020-02-24 Anhui Mainland China 31.8257 117.2264 2/24/20 989 6 663 POINT (117.22640 31.82570)
2020-02-24 Beijing Mainland China 40.1824 116.4142 2/24/20 399 4 198 POINT (116.41420 40.18240)
2020-02-24 Chongqing Mainland China 30.0572 107.8740 2/24/20 576 6 349 POINT (107.87400 30.05720)
2020-02-24 Fujian Mainland China 26.0789 117.9874 2/24/20 293 1 183 POINT (117.98740 26.07890)
2020-02-24 Gansu Mainland China 36.0611 103.8343 2/24/20 91 2 80 POINT (103.83430 36.06110)
2020-02-25 NaN Romania 45.9432 24.9668 2/25/20 0 0 0 POINT (24.96680 45.94320)
2020-02-25 NaN Denmark 56.2639 9.5018 2/25/20 0 0 0 POINT (9.50180 56.26390)
2020-02-25 NaN Estonia 58.5953 25.0136 2/25/20 0 0 0 POINT (25.01360 58.59530)
2020-02-25 NaN Netherlands 52.1326 5.2913 2/25/20 0 0 0 POINT (5.29130 52.13260)
2020-02-25 NaN San Marino 43.9424 12.4578 2/25/20 0 0 0 POINT (12.45780 43.94240)

210 rows × 9 columns
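The selections above rely on the DatetimeIndex: a partial date string such as '2020-01' selects the whole month, and a slice between two dates includes both end points. A minimal sketch on a four-row frame with invented values:

```python
import pandas as pd

# Tiny time-indexed frame (illustrative values only)
ts_demo = pd.DataFrame(
    {"Confirmed": [1, 2, 3, 4]},
    index=pd.to_datetime(["2020-01-22", "2020-01-31", "2020-02-24", "2020-02-25"]),
).sort_index()

# Partial string indexing: every row whose date falls in January 2020
january = ts_demo.loc["2020-01"]

# Date slice: both the start and the end date are included
window = ts_demo.loc["2020-02-24":"2020-02-25"]

print(january)
print(window)
```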

In [82]:
ts.loc['2020-02-24':'2020-02-25'][['Confirmed','Recovered']]
Out[82]:
Confirmed Recovered
cases_date
2020-02-24 989 663
2020-02-24 399 198
2020-02-24 576 349
2020-02-24 293 183
2020-02-24 91 80
2020-02-25 0 0
2020-02-25 0 0
2020-02-25 0 0
2020-02-25 0 0
2020-02-25 0 0

210 rows × 2 columns

In [83]:
ts.loc['2020-02-24':'2020-02-25'][['Confirmed','Recovered']].plot(figsize=(20,10))
Out[83]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8148e01950>
In [86]:
ts.loc['2020-02-2':'2020-02-25'][['Confirmed','Deaths']].plot(figsize=(20,10))
/usr/local/lib/python3.7/dist-packages/pandas/plotting/_matplotlib/core.py:1085: UserWarning: Attempting to set identical left == right == 737480.0 results in singular transformations; automatically expanding.
  ax.set_xlim(left, right)
Out[86]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8148bad7d0>
In [89]:
df_by_date = ts.groupby('cases_date').sum(numeric_only=True).reset_index()
In [90]:
df_by_date
Out[90]:
cases_date Lat Long Confirmed Deaths Recovered
0 2020-01-22 3386.46002 4806.4548 555 17 28
1 2020-01-23 3386.46002 4806.4548 653 18 30
2 2020-01-24 3386.46002 4806.4548 941 26 36
3 2020-01-25 3386.46002 4806.4548 1434 42 39
4 2020-01-26 3386.46002 4806.4548 2118 56 52
5 2020-01-27 3386.46002 4806.4548 2927 82 61
6 2020-01-28 3386.46002 4806.4548 5578 131 107
7 2020-01-29 3386.46002 4806.4548 6166 133 126
8 2020-01-30 3386.46002 4806.4548 8234 171 143
9 2020-01-31 3386.46002 4806.4548 9927 213 222
10 2020-02-01 3386.46002 4806.4548 12038 259 284
11 2020-02-02 3386.46002 4806.4548 16787 362 472
12 2020-02-03 3386.46002 4806.4548 19881 426 623
13 2020-02-04 3386.46002 4806.4548 23892 492 852
14 2020-02-05 3386.46002 4806.4548 27636 564 1124
15 2020-02-06 3386.46002 4806.4548 30818 634 1487
16 2020-02-07 3386.46002 4806.4548 34392 719 2011
17 2020-02-08 3386.46002 4806.4548 37121 806 2616
18 2020-02-09 3386.46002 4806.4548 40151 906 3244
19 2020-02-10 3386.46002 4806.4548 42763 1013 3946
20 2020-02-11 3386.46002 4806.4548 44803 1113 4683
21 2020-02-12 3386.46002 4806.4548 45222 1118 5150
22 2020-02-13 3386.46002 4806.4548 60370 1371 6295
23 2020-02-14 3386.46002 4806.4548 66887 1523 8058
24 2020-02-15 3386.46002 4806.4548 69032 1666 9395
25 2020-02-16 3386.46002 4806.4548 71226 1770 10865
26 2020-02-17 3386.46002 4806.4548 73260 1868 12583
27 2020-02-18 3386.46002 4806.4548 75138 2007 14352
28 2020-02-19 3386.46002 4806.4548 75641 2122 16121
29 2020-02-20 3386.46002 4806.4548 76199 2247 18177
30 2020-02-21 3386.46002 4806.4548 76843 2251 18890
31 2020-02-22 3386.46002 4806.4548 78599 2458 22886
32 2020-02-23 3386.46002 4806.4548 78985 2469 23394
33 2020-02-24 3386.46002 4806.4548 79570 2629 25227
34 2020-02-25 3386.46002 4806.4548 80415 2708 27905
35 2020-02-26 3386.46002 4806.4548 81397 2770 30384
36 2020-02-27 3386.46002 4806.4548 82756 2814 33277
In [91]:
df_by_date.columns
Out[91]:
Index(['cases_date', 'Lat', 'Long', 'Confirmed', 'Deaths', 'Recovered'], dtype='object')
In [92]:
df_by_date.plot(x='cases_date', y=['Confirmed', 'Deaths', 'Recovered'], kind='line', figsize=(20,10))
Out[92]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8148b40bd0>
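The totals per date only ever grow, which suggests the counts are cumulative. Under that assumption, a rough sketch of how to get the number of new cases per day is to take the difference between consecutive rows. The figures below are the first four daily totals from the table above.

```python
import pandas as pd

# First four daily totals from the summed table (Confirmed and Deaths)
totals = pd.DataFrame(
    {"Confirmed": [555, 653, 941, 1434], "Deaths": [17, 18, 26, 42]},
    index=pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24", "2020-01-25"]),
)

# diff() subtracts the previous row; the first day has no predecessor, so fill with 0
new_per_day = totals.diff().fillna(0).astype(int)

print(new_per_day)
```

Calling `new_per_day.plot(kind='bar')` on the result gives a day-by-day view that is often easier to read than the cumulative curve.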

 

You can check out the entire video here


Thanks for your time

Jesus Saves

By Jesse E. Agbe (JCharis)

 

 

 
