Data Analysis of Covid19 using Julia

In this tutorial we will do a simple data analysis of the coronavirus (COVID-19) pandemic using the Julia programming language.

We will start by fetching the dataset and then do some simple data preparation before continuing with our data analysis. We will fetch our dataset directly from GitHub (the Johns Hopkins CSSE repository).

The following packages will be useful for our analysis:

  • CSV.jl
  • DataFrames.jl
  • Plots.jl and StatsPlots.jl
  • PyCall
  • DataStructures.jl
  • Gadfly
  • etc

To install a package in Julia, you can use the Package Mode from the REPL: type ] at the julia> prompt to switch to Package Mode, then run add followed by the package name (for example, add CSV).

Alternatively, you can use the built-in Pkg package to do the same:

using Pkg
Pkg.add("CSV")

Let us start with our task

In [1]:
# URL
confirmed_cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
recovered_cases_url ="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"
death_cases_url ="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
Out[1]:
"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
In [ ]:
# Installing Pkgs
using Pkg
Pkg.add("DataFrames")
In [2]:
# Load Pkgs
using CSV,DataFrames
In [3]:
# Load Our Dataset
confirmed_df = CSV.read(download(confirmed_cases_url))
recovered_df = CSV.read(download(recovered_cases_url))
death_df = CSV.read(download(death_cases_url))
Out[3]:

264 rows × 89 columns (omitted printing of 84 columns)

Province/State Country/Region Lat Long 1/22/20
String⍰ String Float64 Float64 Int64
1 missing Afghanistan 33.0 65.0 0
2 missing Albania 41.1533 20.1683 0
3 missing Algeria 28.0339 1.6596 0
4 missing Andorra 42.5063 1.5218 0
5 missing Angola -11.2027 17.8739 0
6 missing Antigua and Barbuda 17.0608 -61.7964 0
7 missing Argentina -38.4161 -63.6167 0
8 missing Armenia 40.0691 45.0382 0
9 Australian Capital Territory Australia -35.4735 149.012 0
10 New South Wales Australia -33.8688 151.209 0
11 Northern Territory Australia -12.4634 130.846 0
12 Queensland Australia -28.0167 153.4 0
13 South Australia Australia -34.9285 138.601 0
14 Tasmania Australia -41.4545 145.971 0
15 Victoria Australia -37.8136 144.963 0
16 Western Australia Australia -31.9505 115.861 0
17 missing Austria 47.5162 14.5501 0
18 missing Azerbaijan 40.1431 47.5769 0
19 missing Bahamas 25.0343 -77.3963 0
20 missing Bahrain 26.0275 50.55 0
21 missing Bangladesh 23.685 90.3563 0
22 missing Barbados 13.1939 -59.5432 0
23 missing Belarus 53.7098 27.9534 0
24 missing Belgium 50.8333 4.0 0
25 missing Benin 9.3077 2.3158 0
26 missing Bhutan 27.5142 90.4336 0
27 missing Bolivia -16.2902 -63.5887 0
28 missing Bosnia and Herzegovina 43.9159 17.6791 0
29 missing Brazil -14.235 -51.9253 0
30 missing Brunei 4.5353 114.728 0
⋮ ⋮ ⋮ ⋮ ⋮ ⋮
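The `CSV.read(download(url))` calls above reflect the CSV.jl release in use when this notebook was written. More recent CSV.jl releases expect the target type as a second argument, as in `CSV.read(path, DataFrame)`. Here is a minimal sketch under that assumption. It uses a tiny inline sample (two rows taken from the output above, trimmed to two date columns) in place of the downloaded file, so it runs offline; `sample_csv` and `sample_df` are names made up for this illustration.

```julia
using CSV, DataFrames

# Two-row sample in the same wide layout as the downloaded file
sample_csv = """
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Afghanistan,33.0,65.0,0,0
New South Wales,Australia,-33.8688,151.209,0,0
"""

# Newer CSV.jl releases take the sink type (here DataFrame) as the second argument
sample_df = CSV.read(IOBuffer(sample_csv), DataFrame)

# For the real data you would pass the downloaded path instead:
# confirmed_df = CSV.read(download(confirmed_cases_url), DataFrame)

println(size(sample_df))                            # (rows, columns)
println(ismissing(sample_df[1, "Province/State"]))  # the empty field is read as missing
```

The empty `Province/State` field is read as `missing`, which is why that column shows up with a "may be missing" string type in the output above.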
In [4]:
# First 10 rows
# head() Deprecated
first(confirmed_df,10)
Out[4]:

10 rows × 89 columns (omitted printing of 84 columns)

Province/State Country/Region Lat Long 1/22/20
String⍰ String Float64 Float64 Int64
1 missing Afghanistan 33.0 65.0 0
2 missing Albania 41.1533 20.1683 0
3 missing Algeria 28.0339 1.6596 0
4 missing Andorra 42.5063 1.5218 0
5 missing Angola -11.2027 17.8739 0
6 missing Antigua and Barbuda 17.0608 -61.7964 0
7 missing Argentina -38.4161 -63.6167 0
8 missing Armenia 40.0691 45.0382 0
9 Australian Capital Territory Australia -35.4735 149.012 0
10 New South Wales Australia -33.8688 151.209 0
In [5]:
# last rows
last(confirmed_df,10)
Out[5]:

10 rows × 89 columns (omitted printing of 84 columns)

Province/State Country/Region Lat Long 1/22/20
String⍰ String Float64 Float64 Int64
1 missing Burundi -3.3731 29.9189 0
2 missing Sierra Leone 8.46056 -11.7799 0
3 Bonaire, Sint Eustatius and Saba Netherlands 12.1784 -68.2385 0
4 missing Malawi -13.2543 34.3015 0
5 Falkland Islands (Malvinas) United Kingdom -51.7963 -59.5236 0
6 Saint Pierre and Miquelon France 46.8852 -56.3159 0
7 missing South Sudan 6.877 31.307 0
8 missing Western Sahara 24.2155 -12.8858 0
9 missing Sao Tome and Principe 0.18636 6.61308 0
10 missing Yemen 15.5527 48.5164 0
In [6]:
# Columns
names(confirmed_df)
Out[6]:
89-element Array{Symbol,1}:
 Symbol("Province/State")
 Symbol("Country/Region")
 :Lat
 :Long
 Symbol("1/22/20")
 Symbol("1/23/20")
 Symbol("1/24/20")
 Symbol("1/25/20")
 Symbol("1/26/20")
 Symbol("1/27/20")
 Symbol("1/28/20")
 Symbol("1/29/20")
 Symbol("1/30/20")
 ⋮
 Symbol("4/4/20")
 Symbol("4/5/20")
 Symbol("4/6/20")
 Symbol("4/7/20")
 Symbol("4/8/20")
 Symbol("4/9/20")
 Symbol("4/10/20")
 Symbol("4/11/20")
 Symbol("4/12/20")
 Symbol("4/13/20")
 Symbol("4/14/20")
 Symbol("4/15/20")
In [7]:
size(confirmed_df)
Out[7]:
(264, 89)
In [8]:
size(recovered_df)
Out[8]:
(250, 89)
In [9]:
size(death_df)
Out[9]:
(264, 89)
In [ ]:
# Restructure our DF
In [11]:
#dir()
names(DataFrames,all=true)
Out[11]:
959-element Array{Symbol,1}:
 Symbol("##DataFrame!#113")
 Symbol("##DataFrame!#114")
 Symbol("##DataFrame!#115")
 Symbol("##DataFrame#100")
 Symbol("##DataFrame#103")
 Symbol("##DataFrame#104")
 Symbol("##DataFrame#105")
 Symbol("##DataFrame#108")
 Symbol("##DataFrame#109")
 Symbol("##DataFrame#110")
 Symbol("##DataFrame#111")
 Symbol("##DataFrame#112")
 Symbol("##DataFrame#156")
 ⋮
 :titlecase
 :uncompact
 :unique!
 :unstack
 :update_row_maps!
 :upgrade_scalar
 :uppercase
 :uppercasefirst
 :view
 :without
 :wrap
 :writetable
In [12]:
# Check for melt
:melt in names(DataFrames,all=true)
Out[12]:
true
In [13]:
names(confirmed_df)
Out[13]:
89-element Array{Symbol,1}:
 Symbol("Province/State")
 Symbol("Country/Region")
 :Lat
 :Long
 Symbol("1/22/20")
 Symbol("1/23/20")
 Symbol("1/24/20")
 Symbol("1/25/20")
 Symbol("1/26/20")
 Symbol("1/27/20")
 Symbol("1/28/20")
 Symbol("1/29/20")
 Symbol("1/30/20")
 ⋮
 Symbol("4/4/20")
 Symbol("4/5/20")
 Symbol("4/6/20")
 Symbol("4/7/20")
 Symbol("4/8/20")
 Symbol("4/9/20")
 Symbol("4/10/20")
 Symbol("4/11/20")
 Symbol("4/12/20")
 Symbol("4/13/20")
 Symbol("4/14/20")
 Symbol("4/15/20")
In [15]:
melt(confirmed_df,[Symbol("Province/State"),Symbol("Country/Region"),:Lat,:Long])
┌ Warning: `melt(df::AbstractDataFrame, id_vars; variable_name::Symbol = :variable, value_name::Symbol = :value, view::Bool = false)` is deprecated, use `stack(df, Not(id_vars); variable_name = variable_name, value_name = value_name, view = view)` instead.
│   caller = top-level scope at In[15]:1
└ @ Core In[15]:1
Out[15]:

22,440 rows × 6 columns (omitted printing of 1 columns)

variable value Province/State Country/Region Lat
Symbol Int64 String⍰ String Float64
1 1/22/20 0 missing Afghanistan 33.0
2 1/22/20 0 missing Albania 41.1533
3 1/22/20 0 missing Algeria 28.0339
4 1/22/20 0 missing Andorra 42.5063
5 1/22/20 0 missing Angola -11.2027
6 1/22/20 0 missing Antigua and Barbuda 17.0608
7 1/22/20 0 missing Argentina -38.4161
8 1/22/20 0 missing Armenia 40.0691
9 1/22/20 0 Australian Capital Territory Australia -35.4735
10 1/22/20 0 New South Wales Australia -33.8688
11 1/22/20 0 Northern Territory Australia -12.4634
12 1/22/20 0 Queensland Australia -28.0167
13 1/22/20 0 South Australia Australia -34.9285
14 1/22/20 0 Tasmania Australia -41.4545
15 1/22/20 0 Victoria Australia -37.8136
16 1/22/20 0 Western Australia Australia -31.9505
17 1/22/20 0 missing Austria 47.5162
18 1/22/20 0 missing Azerbaijan 40.1431
19 1/22/20 0 missing Bahamas 25.0343
20 1/22/20 0 missing Bahrain 26.0275
21 1/22/20 0 missing Bangladesh 23.685
22 1/22/20 0 missing Barbados 13.1939
23 1/22/20 0 missing Belarus 53.7098
24 1/22/20 0 missing Belgium 50.8333
25 1/22/20 0 missing Benin 9.3077
26 1/22/20 0 missing Bhutan 27.5142
27 1/22/20 0 missing Bolivia -16.2902
28 1/22/20 0 missing Bosnia and Herzegovina 43.9159
29 1/22/20 0 missing Brazil -14.235
30 1/22/20 0 missing Brunei 4.5353
⋮ ⋮ ⋮ ⋮ ⋮ ⋮
In [16]:
# Passing the id columns themselves stacks those four columns instead (not what we want)
stack(confirmed_df,[Symbol("Province/State"),Symbol("Country/Region"),:Lat,:Long])
Out[16]:

1,056 rows × 87 columns (omitted printing of 81 columns)

variable value 1/22/20 1/23/20 1/24/20 1/25/20
Symbol Any Int64 Int64 Int64 Int64
1 Province/State missing 0 0 0 0
2 Province/State missing 0 0 0 0
3 Province/State missing 0 0 0 0
4 Province/State missing 0 0 0 0
5 Province/State missing 0 0 0 0
6 Province/State missing 0 0 0 0
7 Province/State missing 0 0 0 0
8 Province/State missing 0 0 0 0
9 Province/State Australian Capital Territory 0 0 0 0
10 Province/State New South Wales 0 0 0 0
11 Province/State Northern Territory 0 0 0 0
12 Province/State Queensland 0 0 0 0
13 Province/State South Australia 0 0 0 0
14 Province/State Tasmania 0 0 0 0
15 Province/State Victoria 0 0 0 0
16 Province/State Western Australia 0 0 0 0
17 Province/State missing 0 0 0 0
18 Province/State missing 0 0 0 0
19 Province/State missing 0 0 0 0
20 Province/State missing 0 0 0 0
21 Province/State missing 0 0 0 0
22 Province/State missing 0 0 0 0
23 Province/State missing 0 0 0 0
24 Province/State missing 0 0 0 0
25 Province/State missing 0 0 0 0
26 Province/State missing 0 0 0 0
27 Province/State missing 0 0 0 0
28 Province/State missing 0 0 0 0
29 Province/State missing 0 0 0 0
30 Province/State missing 0 0 0 0
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
In [17]:
# Stack everything except these columns, so they are kept as id variables
stack(confirmed_df,Not([Symbol("Province/State"),Symbol("Country/Region"),:Lat,:Long]))
Out[17]:

22,440 rows × 6 columns (omitted printing of 1 columns)

variable value Province/State Country/Region Lat
Symbol Int64 String⍰ String Float64
1 1/22/20 0 missing Afghanistan 33.0
2 1/22/20 0 missing Albania 41.1533
3 1/22/20 0 missing Algeria 28.0339
4 1/22/20 0 missing Andorra 42.5063
5 1/22/20 0 missing Angola -11.2027
6 1/22/20 0 missing Antigua and Barbuda 17.0608
7 1/22/20 0 missing Argentina -38.4161
8 1/22/20 0 missing Armenia 40.0691
9 1/22/20 0 Australian Capital Territory Australia -35.4735
10 1/22/20 0 New South Wales Australia -33.8688
11 1/22/20 0 Northern Territory Australia -12.4634
12 1/22/20 0 Queensland Australia -28.0167
13 1/22/20 0 South Australia Australia -34.9285
14 1/22/20 0 Tasmania Australia -41.4545
15 1/22/20 0 Victoria Australia -37.8136
16 1/22/20 0 Western Australia Australia -31.9505
17 1/22/20 0 missing Austria 47.5162
18 1/22/20 0 missing Azerbaijan 40.1431
19 1/22/20 0 missing Bahamas 25.0343
20 1/22/20 0 missing Bahrain 26.0275
21 1/22/20 0 missing Bangladesh 23.685
22 1/22/20 0 missing Barbados 13.1939
23 1/22/20 0 missing Belarus 53.7098
24 1/22/20 0 missing Belgium 50.8333
25 1/22/20 0 missing Benin 9.3077
26 1/22/20 0 missing Bhutan 27.5142
27 1/22/20 0 missing Bolivia -16.2902
28 1/22/20 0 missing Bosnia and Herzegovina 43.9159
29 1/22/20 0 missing Brazil -14.235
30 1/22/20 0 missing Brunei 4.5353
⋮ ⋮ ⋮ ⋮ ⋮ ⋮
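The deprecation warning shown earlier mentions that `stack` also accepts `variable_name` and `value_name`. Using them lets us name the two new columns while reshaping, which can save a separate renaming step. The sketch below assumes a recent DataFrames.jl release (one where column names can be given as strings) and uses a two-row toy table rather than the full dataset; `wide`, `id_cols` and `long` are illustrative names.

```julia
using DataFrames

# Two rows from the table above, with only two date columns for brevity
wide = DataFrame(
    "Province/State" => [missing, "New South Wales"],
    "Country/Region" => ["Afghanistan", "Australia"],
    "Lat" => [33.0, -33.8688],
    "Long" => [65.0, 151.209],
    "1/22/20" => [0, 0],
    "1/23/20" => [0, 0],
)

id_cols = ["Province/State", "Country/Region", "Lat", "Long"]

# Stack every column except the id columns, naming the two new columns directly
long = stack(wide, Not(id_cols); variable_name = :Dates, value_name = :Confirmed)

println(size(long))    # one row per (original row, date column) pair
println(names(long))
```

With 2 rows and 2 date columns the long table has 4 rows, in the same way that 264 rows × 85 date columns give the 22,440 rows above.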
In [19]:
df_confirmed = stack(confirmed_df,Not([Symbol("Province/State"),Symbol("Country/Region"),:Lat,:Long]));
In [20]:
df_recovered = stack(recovered_df,Not([Symbol("Province/State"),Symbol("Country/Region"),:Lat,:Long]));
df_death = stack(death_df,Not([Symbol("Province/State"),Symbol("Country/Region"),:Lat,:Long]));
In [21]:
head(df_confirmed)
┌ Warning: `head(df::AbstractDataFrame)` is deprecated, use `first(df, 6)` instead.
│   caller = top-level scope at In[21]:1
└ @ Core In[21]:1
Out[21]:

6 rows × 6 columns

variable value Province/State Country/Region Lat Long
Symbol Int64 String⍰ String Float64 Float64
1 1/22/20 0 missing Afghanistan 33.0 65.0
2 1/22/20 0 missing Albania 41.1533 20.1683
3 1/22/20 0 missing Algeria 28.0339 1.6596
4 1/22/20 0 missing Andorra 42.5063 1.5218
5 1/22/20 0 missing Angola -11.2027 17.8739
6 1/22/20 0 missing Antigua and Barbuda 17.0608 -61.7964
In [22]:
first(df_confirmed,10)
Out[22]:

10 rows × 6 columns

variable value Province/State Country/Region Lat Long
Symbol Int64 String⍰ String Float64 Float64
1 1/22/20 0 missing Afghanistan 33.0 65.0
2 1/22/20 0 missing Albania 41.1533 20.1683
3 1/22/20 0 missing Algeria 28.0339 1.6596
4 1/22/20 0 missing Andorra 42.5063 1.5218
5 1/22/20 0 missing Angola -11.2027 17.8739
6 1/22/20 0 missing Antigua and Barbuda 17.0608 -61.7964
7 1/22/20 0 missing Argentina -38.4161 -63.6167
8 1/22/20 0 missing Armenia 40.0691 45.0382
9 1/22/20 0 Australian Capital Territory Australia -35.4735 149.012
10 1/22/20 0 New South Wales Australia -33.8688 151.209
In [23]:
size(df_confirmed)
Out[23]:
(22440, 6)
In [24]:
size(df_recovered)
Out[24]:
(21250, 6)
In [26]:
size(df_death)
Out[26]:
(22440, 6)
In [27]:
names(df_confirmed)
Out[27]:
6-element Array{Symbol,1}:
 :variable
 :value
 Symbol("Province/State")
 Symbol("Country/Region")
 :Lat
 :Long
In [28]:
# Renaming
rename!(df_confirmed,Dict(:variable => :Dates,:value => :Confirmed))
Out[28]:

22,440 rows × 6 columns (omitted printing of 1 columns)

Dates Confirmed Province/State Country/Region Lat
Symbol Int64 String⍰ String Float64
1 1/22/20 0 missing Afghanistan 33.0
2 1/22/20 0 missing Albania 41.1533
3 1/22/20 0 missing Algeria 28.0339
4 1/22/20 0 missing Andorra 42.5063
5 1/22/20 0 missing Angola -11.2027
6 1/22/20 0 missing Antigua and Barbuda 17.0608
7 1/22/20 0 missing Argentina -38.4161
8 1/22/20 0 missing Armenia 40.0691
9 1/22/20 0 Australian Capital Territory Australia -35.4735
10 1/22/20 0 New South Wales Australia -33.8688
11 1/22/20 0 Northern Territory Australia -12.4634
12 1/22/20 0 Queensland Australia -28.0167
13 1/22/20 0 South Australia Australia -34.9285
14 1/22/20 0 Tasmania Australia -41.4545
15 1/22/20 0 Victoria Australia -37.8136
16 1/22/20 0 Western Australia Australia -31.9505
17 1/22/20 0 missing Austria 47.5162
18 1/22/20 0 missing Azerbaijan 40.1431
19 1/22/20 0 missing Bahamas 25.0343
20 1/22/20 0 missing Bahrain 26.0275
21 1/22/20 0 missing Bangladesh 23.685
22 1/22/20 0 missing Barbados 13.1939
23 1/22/20 0 missing Belarus 53.7098
24 1/22/20 0 missing Belgium 50.8333
25 1/22/20 0 missing Benin 9.3077
26 1/22/20 0 missing Bhutan 27.5142
27 1/22/20 0 missing Bolivia -16.2902
28 1/22/20 0 missing Bosnia and Herzegovina 43.9159
29 1/22/20 0 missing Brazil -14.235
30 1/22/20 0 missing Brunei 4.5353
⋮ ⋮ ⋮ ⋮ ⋮ ⋮
In [30]:
# Renaming
rename!(df_recovered,Dict(:variable => :Dates,:value => :Recovered));
rename!(df_death,Dict(:variable => :Dates,:value => :Deaths));
In [31]:
df_recovered
Out[31]:

21,250 rows × 6 columns (omitted printing of 1 columns)

Dates Recovered Province/State Country/Region Lat
Symbol Int64 String⍰ String Float64
1 1/22/20 0 missing Afghanistan 33.0
2 1/22/20 0 missing Albania 41.1533
3 1/22/20 0 missing Algeria 28.0339
4 1/22/20 0 missing Andorra 42.5063
5 1/22/20 0 missing Angola -11.2027
6 1/22/20 0 missing Antigua and Barbuda 17.0608
7 1/22/20 0 missing Argentina -38.4161
8 1/22/20 0 missing Armenia 40.0691
9 1/22/20 0 Australian Capital Territory Australia -35.4735
10 1/22/20 0 New South Wales Australia -33.8688
11 1/22/20 0 Northern Territory Australia -12.4634
12 1/22/20 0 Queensland Australia -28.0167
13 1/22/20 0 South Australia Australia -34.9285
14 1/22/20 0 Tasmania Australia -41.4545
15 1/22/20 0 Victoria Australia -37.8136
16 1/22/20 0 Western Australia Australia -31.9505
17 1/22/20 0 missing Austria 47.5162
18 1/22/20 0 missing Azerbaijan 40.1431
19 1/22/20 0 missing Bahamas 25.0343
20 1/22/20 0 missing Bahrain 26.0275
21 1/22/20 0 missing Bangladesh 23.685
22 1/22/20 0 missing Barbados 13.1939
23 1/22/20 0 missing Belarus 53.7098
24 1/22/20 0 missing Belgium 50.8333
25 1/22/20 0 missing Belize 13.1939
26 1/22/20 0 missing Benin 9.3077
27 1/22/20 0 missing Bhutan 27.5142
28 1/22/20 0 missing Bolivia -16.2902
29 1/22/20 0 missing Bosnia and Herzegovina 43.9159
30 1/22/20 0 missing Brazil -14.235
⋮ ⋮ ⋮ ⋮ ⋮ ⋮
In [37]:
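# Joining or Merging
# Note: the key is only Country/Region, so each confirmed row is matched with
# every deaths row of the same country (all dates), which multiplies the row count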
df = join(df_confirmed,df_death[!,[:Deaths,Symbol("Country/Region")]],on =Symbol("Country/Region"))
Out[37]:

13,236,200 rows × 7 columns

Dates Confirmed Province/State Country/Region Lat Long Deaths
Symbol Int64 String⍰ String Float64 Float64 Int64
1 1/22/20 0 missing Afghanistan 33.0 65.0 0
2 1/22/20 0 missing Afghanistan 33.0 65.0 0
3 1/22/20 0 missing Afghanistan 33.0 65.0 0
4 1/22/20 0 missing Afghanistan 33.0 65.0 0
5 1/22/20 0 missing Afghanistan 33.0 65.0 0
6 1/22/20 0 missing Afghanistan 33.0 65.0 0
7 1/22/20 0 missing Afghanistan 33.0 65.0 0
8 1/22/20 0 missing Afghanistan 33.0 65.0 0
9 1/22/20 0 missing Afghanistan 33.0 65.0 0
10 1/22/20 0 missing Afghanistan 33.0 65.0 0
11 1/22/20 0 missing Afghanistan 33.0 65.0 0
12 1/22/20 0 missing Afghanistan 33.0 65.0 0
13 1/22/20 0 missing Afghanistan 33.0 65.0 0
14 1/22/20 0 missing Afghanistan 33.0 65.0 0
15 1/22/20 0 missing Afghanistan 33.0 65.0 0
16 1/22/20 0 missing Afghanistan 33.0 65.0 0
17 1/22/20 0 missing Afghanistan 33.0 65.0 0
18 1/22/20 0 missing Afghanistan 33.0 65.0 0
19 1/22/20 0 missing Afghanistan 33.0 65.0 0
20 1/22/20 0 missing Afghanistan 33.0 65.0 0
21 1/22/20 0 missing Afghanistan 33.0 65.0 0
22 1/22/20 0 missing Afghanistan 33.0 65.0 0
23 1/22/20 0 missing Afghanistan 33.0 65.0 0
24 1/22/20 0 missing Afghanistan 33.0 65.0 0
25 1/22/20 0 missing Afghanistan 33.0 65.0 0
26 1/22/20 0 missing Afghanistan 33.0 65.0 0
27 1/22/20 0 missing Afghanistan 33.0 65.0 0
28 1/22/20 0 missing Afghanistan 33.0 65.0 0
29 1/22/20 0 missing Afghanistan 33.0 65.0 0
30 1/22/20 0 missing Afghanistan 33.0 65.0 0
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
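A note on the row count: 13,236,200 is far more than the 22,440 rows in `df_confirmed`. Because the join key is only `Country/Region`, every confirmed row of a country is paired with every deaths row of that country across all 85 dates, so a country with one row per date contributes 85 × 85 = 7,225 rows. One way to avoid this is to put the date in the join key as well. The sketch below uses made-up toy data and assumes a newer DataFrames.jl release, where `join` has been replaced by `innerjoin`.

```julia
using DataFrames

# Toy long-format tables: 2 countries x 3 dates each (made-up numbers)
dates = ["1/22/20", "1/23/20", "1/24/20"]
toy_confirmed = DataFrame(
    Country = repeat(["A", "B"], inner = 3),
    Dates = repeat(dates, outer = 2),
    Confirmed = [1, 2, 3, 10, 20, 30],
)
toy_deaths = DataFrame(
    Country = repeat(["A", "B"], inner = 3),
    Dates = repeat(dates, outer = 2),
    Deaths = [0, 0, 1, 0, 1, 2],
)

# Key on country only: each country's 3 rows match all 3 rows on the other side
by_country = innerjoin(toy_confirmed, select(toy_deaths, :Country, :Deaths), on = :Country)

# Key on country and date: exactly one match per row
by_country_date = innerjoin(toy_confirmed, toy_deaths, on = [:Country, :Dates])

println(nrow(by_country))        # 2 countries * 3 * 3 rows
println(nrow(by_country_date))   # 2 countries * 3 rows
```

On the real data the `Province/State` column would need to be part of the key too. Since it contains `missing` values, newer DataFrames.jl releases would likely also need `matchmissing = :equal` in the `innerjoin` call.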
In [38]:
# Save DF
CSV.write("covid_current_cases_dataset.csv",df)
Out[38]:
"covid_current_cases_dataset.csv"

Analysis

  • Number of Case Per Country
  • Per day
  • Top countries affected
  • Number of Countries affected
In [39]:
# Number of Countries affected
first(df,10)
Out[39]:

10 rows × 7 columns

Dates Confirmed Province/State Country/Region Lat Long Deaths
Symbol Int64 String⍰ String Float64 Float64 Int64
1 1/22/20 0 missing Afghanistan 33.0 65.0 0
2 1/22/20 0 missing Afghanistan 33.0 65.0 0
3 1/22/20 0 missing Afghanistan 33.0 65.0 0
4 1/22/20 0 missing Afghanistan 33.0 65.0 0
5 1/22/20 0 missing Afghanistan 33.0 65.0 0
6 1/22/20 0 missing Afghanistan 33.0 65.0 0
7 1/22/20 0 missing Afghanistan 33.0 65.0 0
8 1/22/20 0 missing Afghanistan 33.0 65.0 0
9 1/22/20 0 missing Afghanistan 33.0 65.0 0
10 1/22/20 0 missing Afghanistan 33.0 65.0 0
In [41]:
unique(df[!,Symbol("Country/Region")])
Out[41]:
185-element Array{String,1}:
 "Afghanistan"
 "Albania"
 "Algeria"
 "Andorra"
 "Angola"
 "Antigua and Barbuda"
 "Argentina"
 "Armenia"
 "Australia"
 "Austria"
 "Azerbaijan"
 "Bahamas"
 "Bahrain"
 ⋮
 "Saint Kitts and Nevis"
 "Kosovo"
 "Burma"
 "MS Zaandam"
 "Botswana"
 "Burundi"
 "Sierra Leone"
 "Malawi"
 "South Sudan"
 "Western Sahara"
 "Sao Tome and Principe"
 "Yemen"
In [42]:
# Number of countries
length(unique(df[!,Symbol("Country/Region")]))
Out[42]:
185
In [43]:
# Sum of Confirmed per country
# (inflated: the values are cumulative and the join above repeats rows; see the maximum below)
by(df,Symbol("Country/Region"),counts=:Confirmed => sum)
Out[43]:

185 rows × 2 columns

Country/Region counts
String Int64
1 Afghanistan 699380
2 Albania 685610
3 Algeria 2400825
4 Andorra 938910
5 Angola 23885
6 Antigua and Barbuda 27540
7 Argentina 2770575
8 Armenia 1469055
9 Australia 84688560
10 Austria 23549080
11 Azerbaijan 1197310
12 Bahamas 54910
13 Bahrain 1882070
14 Bangladesh 534055
15 Barbados 99365
16 Belarus 2049605
17 Belgium 37760910
18 Benin 39270
19 Bhutan 10200
20 Bolivia 350455
21 Bosnia and Herzegovina 1263185
22 Brazil 24019640
23 Brunei 315520
24 Bulgaria 1056295
25 Burkina Faso 682720
26 Cabo Verde 17765
27 Cambodia 256020
28 Cameroon 941120
29 Canada 432352500
30 Central African Republic 13685
⋮ ⋮ ⋮
In [44]:
# Number of Cases Per Country with Max
by(df,Symbol("Country/Region"),counts=:Confirmed => maximum)
Out[44]:

185 rows × 2 columns

Country/Region counts
String Int64
1 Afghanistan 784
2 Albania 494
3 Algeria 2160
4 Andorra 673
5 Angola 19
6 Antigua and Barbuda 23
7 Argentina 2443
8 Armenia 1111
9 Australia 2886
10 Austria 14336
11 Azerbaijan 1253
12 Bahamas 49
13 Bahrain 1671
14 Bangladesh 1231
15 Barbados 73
16 Belarus 3728
17 Belgium 33573
18 Benin 35
19 Bhutan 5
20 Bolivia 397
21 Bosnia and Herzegovina 1110
22 Brazil 28320
23 Brunei 136
24 Bulgaria 747
25 Burkina Faso 542
26 Cabo Verde 56
27 Cambodia 122
28 Cameroon 848
29 Canada 14860
30 Central African Republic 12
⋮ ⋮ ⋮
In [45]:
cases_per_countries = by(df,Symbol("Country/Region"),counts=:Confirmed => maximum)
Out[45]:

185 rows × 2 columns

Country/Region counts
String Int64
1 Afghanistan 784
2 Albania 494
3 Algeria 2160
4 Andorra 673
5 Angola 19
6 Antigua and Barbuda 23
7 Argentina 2443
8 Armenia 1111
9 Australia 2886
10 Austria 14336
11 Azerbaijan 1253
12 Bahamas 49
13 Bahrain 1671
14 Bangladesh 1231
15 Barbados 73
16 Belarus 3728
17 Belgium 33573
18 Benin 35
19 Bhutan 5
20 Bolivia 397
21 Bosnia and Herzegovina 1110
22 Brazil 28320
23 Brunei 136
24 Bulgaria 747
25 Burkina Faso 542
26 Cabo Verde 56
27 Cambodia 122
28 Cameroon 848
29 Canada 14860
30 Central African Republic 12
⋮ ⋮ ⋮
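Two remarks on this step. First, `by` has since been deprecated in DataFrames.jl in favour of `groupby` plus `combine`. Second, taking the `maximum` over all rows of a country returns the largest single province row for countries that are reported by province (such as Australia and Canada here), not the national total. The sketch below uses made-up toy numbers and assumes a newer DataFrames.jl release. It first sums the provinces per date and then takes the latest cumulative value.

```julia
using DataFrames

# Toy cumulative counts (made-up): country "A" is split into two provinces
toy = DataFrame(
    Country = ["A", "A", "A", "A", "B", "B"],
    Province = ["P1", "P1", "P2", "P2", missing, missing],
    Dates = ["1/22/20", "1/23/20", "1/22/20", "1/23/20", "1/22/20", "1/23/20"],
    Confirmed = [5, 8, 2, 4, 1, 3],
)

# Equivalent of by(df, :Country, counts = :Confirmed => maximum)
row_max = combine(groupby(toy, :Country), :Confirmed => maximum => :counts)

# Sum provinces per (country, date) first, then take the per-country maximum
per_date = combine(groupby(toy, [:Country, :Dates]), :Confirmed => sum => :Confirmed)
country_total = combine(groupby(per_date, :Country), :Confirmed => maximum => :counts)

# Top rows, largest first
top = first(sort(country_total, :counts, rev = true), 1)

println(row_max)
println(country_total)
println(top)
```

For country "A" the row-wise maximum is the largest province value, while the two-step version adds both provinces for each date before taking the maximum.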
In [46]:
names(cases_per_countries)
Out[46]:
2-element Array{Symbol,1}:
 Symbol("Country/Region")
 :counts
In [62]:
# Top Countries Affected
sort(cases_per_countries,:counts,rev=true)
Out[62]:

185 rows × 2 columns

Country/Region counts
String Int64
1 US 636350
2 Spain 177644
3 Italy 165155
4 France 136779
5 Germany 134753
6 United Kingdom 98476
7 Iran 76389
8 Turkey 69392
9 China 67803
10 Belgium 33573
11 Brazil 28320
12 Netherlands 28153
13 Switzerland 26336
14 Russia 24490
15 Portugal 18091
16 Canada 14860
17 Austria 14336
18 Ireland 12547
19 Israel 12501
20 India 12322
21 Sweden 11927
22 Peru 11475
23 Korea, South 10591
24 Chile 8273
25 Japan 8100
26 Ecuador 7858
27 Poland 7582
28 Romania 7216
29 Norway 6740
30 Denmark 6681
⋮ ⋮ ⋮
In [69]:
sort(cases_per_countries,:counts,rev=true)[1:10,:]
Out[69]:

10 rows × 2 columns

Country/Region counts
String Int64
1 US 636350
2 Spain 177644
3 Italy 165155
4 France 136779
5 Germany 134753
6 United Kingdom 98476
7 Iran 76389
8 Turkey 69392
9 China 67803
10 Belgium 33573
In [70]:
# Group by Day
df
Out[70]:

13,236,200 rows × 7 columns

Dates Confirmed Province/State Country/Region Lat Long Deaths
Symbol Int64 String⍰ String Float64 Float64 Int64
1 1/22/20 0 missing Afghanistan 33.0 65.0 0
2 1/22/20 0 missing Afghanistan 33.0 65.0 0
3 1/22/20 0 missing Afghanistan 33.0 65.0 0
4 1/22/20 0 missing Afghanistan 33.0 65.0 0
5 1/22/20 0 missing Afghanistan 33.0 65.0 0
6 1/22/20 0 missing Afghanistan 33.0 65.0 0
7 1/22/20 0 missing Afghanistan 33.0 65.0 0
8 1/22/20 0 missing Afghanistan 33.0 65.0 0
9 1/22/20 0 missing Afghanistan 33.0 65.0 0
10 1/22/20 0 missing Afghanistan 33.0 65.0 0
11 1/22/20 0 missing Afghanistan 33.0 65.0 0
12 1/22/20 0 missing Afghanistan 33.0 65.0 0
13 1/22/20 0 missing Afghanistan 33.0 65.0 0
14 1/22/20 0 missing Afghanistan 33.0 65.0 0
15 1/22/20 0 missing Afghanistan 33.0 65.0 0
16 1/22/20 0 missing Afghanistan 33.0 65.0 0
17 1/22/20 0 missing Afghanistan 33.0 65.0 0
18 1/22/20 0 missing Afghanistan 33.0 65.0 0
19 1/22/20 0 missing Afghanistan 33.0 65.0 0
20 1/22/20 0 missing Afghanistan 33.0 65.0 0
21 1/22/20 0 missing Afghanistan 33.0 65.0 0
22 1/22/20 0 missing Afghanistan 33.0 65.0 0
23 1/22/20 0 missing Afghanistan 33.0 65.0 0
24 1/22/20 0 missing Afghanistan 33.0 65.0 0
25 1/22/20 0 missing Afghanistan 33.0 65.0 0
26 1/22/20 0 missing Afghanistan 33.0 65.0 0
27 1/22/20 0 missing Afghanistan 33.0 65.0 0
28 1/22/20 0 missing Afghanistan 33.0 65.0 0
29 1/22/20 0 missing Afghanistan 33.0 65.0 0
30 1/22/20 0 missing Afghanistan 33.0 65.0 0
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
In [71]:
# Max Confirmed value per day (the largest single row that day, not the worldwide total)
by(df,:Dates,counts=:Confirmed => maximum)
Out[71]:

85 rows × 2 columns

Dates counts
Symbol Int64
1 1/22/20 444
2 1/23/20 444
3 1/24/20 549
4 1/25/20 761
5 1/26/20 1058
6 1/27/20 1423
7 1/28/20 3554
8 1/29/20 3554
9 1/30/20 4903
10 1/31/20 5806
11 2/1/20 7153
12 2/2/20 11177
13 2/3/20 13522
14 2/4/20 16678
15 2/5/20 19665
16 2/6/20 22112
17 2/7/20 24953
18 2/8/20 27100
19 2/9/20 29631
20 2/10/20 31728
21 2/11/20 33366
22 2/12/20 33366
23 2/13/20 48206
24 2/14/20 54406
25 2/15/20 56249
26 2/16/20 58182
27 2/17/20 59989
28 2/18/20 61682
29 2/19/20 62031
30 2/20/20 62442
⋮ ⋮ ⋮
In [73]:
# Max Confirmed value per day (the largest single row that day, not the worldwide total)
cases_per_dates = by(df,:Dates,counts=:Confirmed => maximum)
Out[73]:

85 rows × 2 columns

Dates counts
Symbol Int64
1 1/22/20 444
2 1/23/20 444
3 1/24/20 549
4 1/25/20 761
5 1/26/20 1058
6 1/27/20 1423
7 1/28/20 3554
8 1/29/20 3554
9 1/30/20 4903
10 1/31/20 5806
11 2/1/20 7153
12 2/2/20 11177
13 2/3/20 13522
14 2/4/20 16678
15 2/5/20 19665
16 2/6/20 22112
17 2/7/20 24953
18 2/8/20 27100
19 2/9/20 29631
20 2/10/20 31728
21 2/11/20 33366
22 2/12/20 33366
23 2/13/20 48206
24 2/14/20 54406
25 2/15/20 56249
26 2/16/20 58182
27 2/17/20 59989
28 2/18/20 61682
29 2/19/20 62031
30 2/20/20 62442
⋮ ⋮ ⋮
In [74]:
sort(cases_per_dates,:counts,rev=true)[1:10,:]
Out[74]:

10 rows × 2 columns

Dates counts
Symbol Int64
1 4/15/20 636350
2 4/14/20 607670
3 4/13/20 580619
4 4/12/20 555313
5 4/11/20 526396
6 4/10/20 496535
7 4/9/20 461437
8 4/8/20 429052
9 4/7/20 396223
10 4/6/20 366667
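The `Dates` column still holds the raw `m/d/yy` labels from the CSV header, and as text these do not sort chronologically ("4/10/20" comes before "4/9/20"). If you want to sort or plot by date, one option is to convert the labels with the `Dates` standard library. This is a small sketch; `parse_jhu_date` is a helper name made up for this example, and a `Symbol` column like the one above would first need converting with `String`.

```julia
using Dates

# Hypothetical helper: turn a label like "1/22/20" into a Date in the 2000s
function parse_jhu_date(label::AbstractString)
    d = Date(label, dateformat"m/d/y")   # the two-digit year parses as year 20
    return d + Year(2000)
end

labels = ["4/9/20", "4/10/20", "1/22/20"]

# Text order is not chronological order
println(sort(labels))

# Date order is
println(sort(parse_jhu_date.(labels)))
```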

Data Visualization

  • Plots
  • StatsPlots
  • Gadfly
  • Plotly
In [75]:
using Plots, StatsPlots
In [76]:
@df cases_per_dates plot(x=:Dates,:counts)
Out[76]:
SVG Image
In [77]:
# Value Counts
value_counts = by(df,Symbol("Country/Region"),nrow)
Out[77]:

185 rows × 2 columns

Country/Region x1
String Int64
1 Afghanistan 7225
2 Albania 7225
3 Algeria 7225
4 Andorra 7225
5 Angola 7225
6 Antigua and Barbuda 7225
7 Argentina 7225
8 Armenia 7225
9 Australia 462400
10 Austria 7225
11 Azerbaijan 7225
12 Bahamas 7225
13 Bahrain 7225
14 Bangladesh 7225
15 Barbados 7225
16 Belarus 7225
17 Belgium 7225
18 Benin 7225
19 Bhutan 7225
20 Bolivia 7225
21 Bosnia and Herzegovina 7225
22 Brazil 7225
23 Brunei 7225
24 Bulgaria 7225
25 Burkina Faso 7225
26 Cabo Verde 7225
27 Cambodia 7225
28 Cameroon 7225
29 Canada 1625625
30 Central African Republic 7225
⋮ ⋮ ⋮
In [79]:
@df value_counts bar(x=Symbol("Country/Region"),:x1)
Out[79]:
SVG Image
In [80]:
cases_per_countries
Out[80]:

185 rows × 2 columns

Country/Region counts
String Int64
1 Afghanistan 784
2 Albania 494
3 Algeria 2160
4 Andorra 673
5 Angola 19
6 Antigua and Barbuda 23
7 Argentina 2443
8 Armenia 1111
9 Australia 2886
10 Austria 14336
11 Azerbaijan 1253
12 Bahamas 49
13 Bahrain 1671
14 Bangladesh 1231
15 Barbados 73
16 Belarus 3728
17 Belgium 33573
18 Benin 35
19 Bhutan 5
20 Bolivia 397
21 Bosnia and Herzegovina 1110
22 Brazil 28320
23 Brunei 136
24 Bulgaria 747
25 Burkina Faso 542
26 Cabo Verde 56
27 Cambodia 122
28 Cameroon 848
29 Canada 14860
30 Central African Republic 12
⋮ ⋮ ⋮
In [81]:
# Bar Chart
@df cases_per_countries bar(x=Symbol("Country/Region"),:counts)
Out[81]:
SVG Image
In [82]:
# Pie chart
@df cases_per_countries pie(x=Symbol("Country/Region"),:counts)
Out[82]:
SVG Image
In [83]:
using DataStructures
WARNING: using DataStructures.head in module Main conflicts with an existing identifier.
In [84]:
DataStructures.counter(df[!,Symbol("Country/Region")])
Out[84]:
Accumulator{String,Int64} with 185 entries:
  "Peru"               => 7225
  "Indonesia"          => 7225
  "Gabon"              => 7225
  "North Macedonia"    => 7225
  "Bangladesh"         => 7225
  "Kosovo"             => 7225
  "Ethiopia"           => 7225
  "Dominican Republic" => 7225
  "Vietnam"            => 7225
  "South Sudan"        => 7225
  "Morocco"            => 7225
  "Libya"              => 7225
  "US"                 => 7225
  "Sierra Leone"       => 7225
  "Serbia"             => 7225
  "Malaysia"           => 7225
  "Mali"               => 7225
  "West Bank and Gaza" => 7225
  "Western Sahara"     => 7225
  "Russia"             => 7225
  "Mongolia"           => 7225
  "Tunisia"            => 7225
  "Kuwait"             => 7225
  "Eswatini"           => 7225
  "Cuba"               => 7225
  ⋮                    => ⋮
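`counter` returns an `Accumulator` that maps each distinct value to the number of times it occurs. The same tally can be sketched in plain Julia with a `Dict`, which may help if DataStructures is not installed. `tally` is a helper name made up for this example and is not part of any package.

```julia
# Hypothetical helper: count how often each value occurs in a collection
function tally(values)
    counts = Dict{eltype(values),Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

countries = ["US", "Peru", "US", "Cuba", "US", "Peru"]
counts = tally(countries)
println(counts)
```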

 

You can check out the video tutorial below

 

Thanks For Your Time

Jesus Saves

By Jesse E. Agbe (JCharis)

 
