Project: Analysis of World Happiness Reports (2015-2019)

Author: Robert Zacchigna

Table of Contents

Problem Statement:

The citizens of the world are vast and diverse across the 150+ plus countries on the planet and thus the perceptions of one countries citizens to another can vary greatly. The World Happiness Report aimed to collect and quantify this information to see what people around the world think of their country and the direction it might be going in. This report has not been without controversy, specifically the metrics being measured are debated on being are skewed a particular direction that puts other countries at a disadvantage or misrepresents the citizen's true feelings of their country.

Proposal:

A detailed analysis of the World Happiness Reports from 2015-2019 to see what makes citizens happy with their country and what are the major contributors of that happiness. Along with this, analyze the metrics to see if the criticism about the measured metrics hold true for the happiness reports. This will be done by analyzing their relationship to the overall happiness score (which determines a countries ranking in the report) and plotting the data on geographic maps to bring everything into a single view to see how the data looks from a holistic perspective. This would hopefully expose trends between countries and make it easier to see not only what direction a country might be heading but what they might be lacking for their citizens.

Dataset - World Happiness Reports (2015-2019)

Download Location: https://www.kaggle.com/unsdsn/world-happiness

Columns:

  • Country – Name of the Country
  • Region – Region the country belongs to
  • Happiness Rank – Rank of the country based on the Happiness Score
  • Happiness Score - A metric measured in 2015 by asking the sampled people the question: "How would you rate your happiness on a scale of 0 to 10 where 10 is the happiest."
  • Standard Error – The standard error of the happiness score
  • Economy (GDP per Capita) – The extent to which GDP contributes to the calculation of the Happiness Score
  • Family - The extent to which Family contributes to the calculation of the Happiness Score
  • Health (Life Expectancy) – The extent to which Life expectancy contributed to the calculation of the Happiness Score
  • Freedom – The extent to which Freedom contributed to the calculation of the Happiness Score
  • Trust (Government Corruption) – The extent to which Perception of Corruption contributes to Happiness Score.

Imports

In [1]:
import ssl
import warnings
import pycountry
import numpy as np
import pandas as pd
import seaborn as sb
import pandas_profiling as pp

from notebook import __version__ as nbv

# Basemap
from mpl_toolkits.basemap import Basemap
from mpl_toolkits.basemap import __version__ as basev

# scipy Libraries
from scipy.stats import norm, stats
from scipy import __version__ as scipv

# matplotlib Libraries
import matplotlib.pyplot as plt
from matplotlib import __version__ as mpv

# plotly Libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly import __version__ as pvm

# Library Versions
lib_info = [('ssl', ssl.OPENSSL_VERSION.split(' ')[1]), ('scipy', scipv), ('numpy', np.__version__), 
            ('pandas', pd.__version__),('plotly', pvm), ('seaborn', sb.__version__), 
            ('pycountry', pycountry.__version__), ('matplotlib', mpv),('pandas_profiling', pp.__version__), 
            ('mpl_toolkits.basemap', basev), ('Jupyter Notebook (notebook)', nbv)]

print('Library Versions\n' + '='*16)

for name, vers in lib_info:
    print('{:>27} = {}'.format(name, vers))
Library Versions
================
                        ssl = 1.1.1d
                      scipy = 1.6.0
                      numpy = 1.19.5
                     pandas = 1.3.3
                     plotly = 4.14.3
                    seaborn = 0.11.1
                  pycountry = 20.7.3
                 matplotlib = 3.3.4
           pandas_profiling = 2.10.0
       mpl_toolkits.basemap = 1.2.2+dev
Jupyter Notebook (notebook) = 6.4.4

Part 1: Exploratory Data Analysis and Data Preprocessing

Step 1: Load Datasets

In [2]:
rep2015 = pd.read_csv('Report_Data/2015.csv')
rep2016 = pd.read_csv('Report_Data/2016.csv')
rep2017 = pd.read_csv('Report_Data/2017.csv')
rep2018 = pd.read_csv('Report_Data/2018.csv')
rep2019 = pd.read_csv('Report_Data/2019.csv')

Step 2: Datasets Dimensions and Heads

2015 Report

In [3]:
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2015.shape[1], rep2015.shape[0]))

rep2015.head()
Dataset Dimensions: 12 columns and 158 rows
Out[3]:
Country Region Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
0 Switzerland Western Europe 1 7.587 0.03411 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738
1 Iceland Western Europe 2 7.561 0.04884 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201
2 Denmark Western Europe 3 7.527 0.03328 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204
3 Norway Western Europe 4 7.522 0.03880 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531
4 Canada North America 5 7.427 0.03553 1.32629 1.32261 0.90563 0.63297 0.32957 0.45811 2.45176

2016 Report

In [4]:
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2016.shape[1], rep2016.shape[0]))

rep2016.head()
Dataset Dimensions: 13 columns and 157 rows
Out[4]:
Country Region Happiness Rank Happiness Score Lower Confidence Interval Upper Confidence Interval Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
0 Denmark Western Europe 1 7.526 7.460 7.592 1.44178 1.16374 0.79504 0.57941 0.44453 0.36171 2.73939
1 Switzerland Western Europe 2 7.509 7.428 7.590 1.52733 1.14524 0.86303 0.58557 0.41203 0.28083 2.69463
2 Iceland Western Europe 3 7.501 7.333 7.669 1.42666 1.18326 0.86733 0.56624 0.14975 0.47678 2.83137
3 Norway Western Europe 4 7.498 7.421 7.575 1.57744 1.12690 0.79579 0.59609 0.35776 0.37895 2.66465
4 Finland Western Europe 5 7.413 7.351 7.475 1.40598 1.13464 0.81091 0.57104 0.41004 0.25492 2.82596

2017 Report

In [5]:
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2017.shape[1], rep2017.shape[0]))

rep2017.head()
Dataset Dimensions: 12 columns and 155 rows
Out[5]:
Country Happiness.Rank Happiness.Score Whisker.high Whisker.low Economy..GDP.per.Capita. Family Health..Life.Expectancy. Freedom Generosity Trust..Government.Corruption. Dystopia.Residual
0 Norway 1 7.537 7.594445 7.479556 1.616463 1.533524 0.796667 0.635423 0.362012 0.315964 2.277027
1 Denmark 2 7.522 7.581728 7.462272 1.482383 1.551122 0.792566 0.626007 0.355280 0.400770 2.313707
2 Iceland 3 7.504 7.622030 7.385970 1.480633 1.610574 0.833552 0.627163 0.475540 0.153527 2.322715
3 Switzerland 4 7.494 7.561772 7.426227 1.564980 1.516912 0.858131 0.620071 0.290549 0.367007 2.276716
4 Finland 5 7.469 7.527542 7.410458 1.443572 1.540247 0.809158 0.617951 0.245483 0.382612 2.430182

2018 Report

In [6]:
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2018.shape[1], rep2018.shape[0]))

rep2018.head()
Dataset Dimensions: 9 columns and 156 rows
Out[6]:
Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption
0 1 Finland 7.632 1.305 1.592 0.874 0.681 0.202 0.393
1 2 Norway 7.594 1.456 1.582 0.861 0.686 0.286 0.340
2 3 Denmark 7.555 1.351 1.590 0.868 0.683 0.284 0.408
3 4 Iceland 7.495 1.343 1.644 0.914 0.677 0.353 0.138
4 5 Switzerland 7.487 1.420 1.549 0.927 0.660 0.256 0.357

2019 Report

In [7]:
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2019.shape[1], rep2019.shape[0]))

rep2019.head()
Dataset Dimensions: 9 columns and 156 rows
Out[7]:
Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption
0 1 Finland 7.769 1.340 1.587 0.986 0.596 0.153 0.393
1 2 Denmark 7.600 1.383 1.573 0.996 0.592 0.252 0.410
2 3 Norway 7.554 1.488 1.582 1.028 0.603 0.271 0.341
3 4 Iceland 7.494 1.380 1.624 1.026 0.591 0.354 0.118
4 5 Netherlands 7.488 1.396 1.522 0.999 0.557 0.322 0.298

From the heads of the various datasets above, we can see that none of them are in the same format, specially their column names. In order to combine all of the datasets correctly they will need to be parsed and remapped accordingly.

Step 3: Parse and Combine Datasets

Columns starting with Happiness, Whisker and the Dystopia.Residual are the targets, just differently named targets. Dystopia Residual compares each countries scores to the theoretical unhappiest country in the world. Since the data from the different report years have different naming conventions, a common name will need to be abstracted in order to combine them all correctly.

In [8]:
# This function takes the relevant report dataset and 
# year in order to parse the data into a usable format
def parse_report(report_df, year):
    
    # Rename columns of reports 2018 and 2019 to match 
    # that of the earlier reports (2015, 2016, 2017)
    if 2017 < year < 2020:
        report_df.rename(columns={'Overall rank': 'Happiness Rank', 'Country or region': 'Country',
                                  'Score': 'Happiness Score', 'GDP per capita': 'Economy (GDP per Capita)', 
                                  'Social support': 'Family', 'Healthy life expectancy': 'Health (Life Expectancy)', 
                                  'Freedom to make life choices': 'Freedom', 
                                  'Perceptions of corruption': 'Trust (Government Corruption)'}, inplace=True)
    
    targets = ['Low', 'Low-Mid', 'Top-Mid', 'Top']
    df_cols = ['Country', 'Rank', 'GDP', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']
    
    # Load report data into common columns
    target_cols = []
    for col in df_cols:
        target_cols.extend([new_col for new_col in report_df.columns if col in new_col])
    
    df = pd.DataFrame()
    df[df_cols] = report_df[target_cols]
    df['Happiness Score'] = report_df[[col for col in report_df.columns if 'Score' in col]]
    
    # Calculate quartiles on the data.
    df["Target"] = pd.qcut(df[df.columns[-1]], len(targets), labels=targets)
    df["Target_n"] = pd.qcut(df[df.columns[-2]], len(targets), labels=range(len(targets)))
    
    # Insert Year column
    df.insert(1, 'Year', pd.Series([year] * len(report_df)))
    
    return df

Combine Datasets

In [9]:
report_data = parse_report(rep2015, 2015)

for repData, year in [(rep2016, 2016), (rep2017.round(5), 2017), (rep2018, 2018), (rep2019, 2019)]:
    report_data = report_data.append(parse_report(repData, year), sort=False)

report_data = report_data.reset_index(drop=True)

Rename Columns and Fix Misc. Country Names to be Consistent

In [10]:
fix_names = [('Taiwan Province of China', 'Taiwan'), ('Macedonia', 'North Macedonia'), 
             ('Hong Kong S.A.R., China', 'Hong Kong'), ('Trinidad & Tobago', 'Trinidad and Tobago')]
    
for wrong_name, right_name in fix_names:
    report_data.loc[report_data.Country == wrong_name, 'Country'] = right_name
    
# Rename "Happiness Score" column to "Happiness_Score",
# "Health" column to "Life_Expectancy" and "Trust" to "Gov_Trustworthiness"
report_data.rename(columns={'Happiness Score': 'Happiness_Score', 'Health': 'Life_Expectancy',
                            'Trust': 'Gov_Trustworthiness'}, inplace=True)
In [11]:
print("Combined Dataset Dimensions: {:,} columns and {:,} rows".format(report_data.shape[1], report_data.shape[0]))
report_data.head()
Combined Dataset Dimensions: 12 columns and 782 rows
Out[11]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
0 Switzerland 2015 1 1.39651 1.34951 0.94143 0.66557 0.29678 0.41978 7.587 Top 3
1 Iceland 2015 2 1.30232 1.40223 0.94784 0.62877 0.43630 0.14145 7.561 Top 3
2 Denmark 2015 3 1.32548 1.36058 0.87464 0.64938 0.34139 0.48357 7.527 Top 3
3 Norway 2015 4 1.45900 1.33095 0.88521 0.66973 0.34699 0.36503 7.522 Top 3
4 Canada 2015 5 1.32629 1.32261 0.90563 0.63297 0.45811 0.32957 7.427 Top 3

Step 4: Check Combined Dataset for Rows with Missing Values

In [12]:
print('Missing Value Counts for Each Column\n' + '='*36)

print(report_data.isnull().sum())

print('\n\nRow(s) in dataset with missing data:')
report_data[report_data['Gov_Trustworthiness'].isna()]
Missing Value Counts for Each Column
====================================
Country                0
Year                   0
Rank                   0
GDP                    0
Family                 0
Life_Expectancy        0
Freedom                0
Generosity             0
Gov_Trustworthiness    1
Happiness_Score        0
Target                 0
Target_n               0
dtype: int64


Row(s) in dataset with missing data:
Out[12]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
489 United Arab Emirates 2018 20 2.096 0.776 0.67 0.284 0.186 NaN 6.774 Top 3

We can see that the row with the missing data came from the 2018 report. Because there is only one row with missing data and the extent of the analysis does not hinge on the missing data, the row will not be removed and left as is.

Step 5: Describe the Combined Dataset

In [13]:
print("Describe Dataset:")

report_data.describe()
Describe Dataset:
Out[13]:
Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score
count 782.000000 782.000000 782.000000 782.000000 782.000000 782.000000 782.000000 781.000000 782.000000
mean 2016.993606 78.698210 0.916047 1.078392 0.612416 0.411091 0.218576 0.125436 5.379018
std 1.417364 45.182384 0.407340 0.329548 0.248309 0.152880 0.122321 0.105816 1.127456
min 2015.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.693000
25% 2016.000000 40.000000 0.606500 0.869363 0.440183 0.309767 0.130000 0.054000 4.509750
50% 2017.000000 79.000000 0.982205 1.124735 0.647310 0.431000 0.201980 0.091000 5.322000
75% 2018.000000 118.000000 1.236187 1.327250 0.808000 0.531000 0.278832 0.156030 6.189500
max 2019.000000 158.000000 2.096000 1.644000 1.141000 0.724000 0.838080 0.551910 7.769000

Step 6: Numerical Column Histograms

In [14]:
fig = plt.figure()
fig.subplots_adjust(hspace=0.8, wspace=0.5)
fig.set_size_inches(13.5, 13)
sb.set(font_scale = 1.25)

warnings.filterwarnings('ignore')

i = 1
for var in report_data.columns:
    try:
        fig.add_subplot(4, 2, i)
        sb.distplot(pd.Series(report_data[var], name=''), bins=50,
                    fit=norm, kde=False).set_title(var + " Histogram")
        plt.ylabel('Count')

        i += 1
    except ValueError:
        pass

fig.tight_layout()
warnings.filterwarnings('default')

Step 7: Pandas Profiling Report: Summary, Correlation Matrices, and Missing Value Information.

In [15]:
# Combined Happiness Reports Profiling Report
pp.ProfileReport(report_data).to_notebook_iframe()

Step 8: Annotated Correlation Matrix of Combined Dataset

In [16]:
plt.rcParams['figure.figsize'] = (15, 10)
plt.rcParams.update({'font.size': 13})

sb.set(font_scale = 1.5)
sb.set_style(style='white')

sb.heatmap(report_data.corr(), annot=True, linewidth=1).set_title('Annotated Correlation Matrix of Combined Dataset')
Out[16]:
Text(0.5, 1.0, 'Annotated Correlation Matrix of Combined Dataset')

It looks like GDP, Family, and Life Expectancy are strongly correlated with the Happiness score. While Freedom correlates very well with the Happiness score, it's also correlated quite well with all data columns (except Rank). Gov_Trustworthiness still has a moderately good correlation with the Happiness score.

Step 9: Birds Eye of View of Column Distributions and Correlations

Below is a pairwise comparison of our variables to give us a birds eye view of the distributions and correlations of the dataset. The color is based on quartiles of the Happiness_Score so (0%-25%, 25%-50%, 50%-75%, 75%-100%).

Note: right-click the graph and select "Open Image in New Tab" to zoom in to get a better view.

In [17]:
fig = plt.figure()
fig.set_size_inches(12, 12)
sb.set(font_scale = 1.25)

sb.pairplot(report_data.drop(['Target_n'], axis=1), 
            hue='Target').fig.suptitle("Birds Eye of View of Column Distributions and Correlations", y=1.01)
Out[17]:
Text(0.5, 1.01, 'Birds Eye of View of Column Distributions and Correlations')
<Figure size 864x864 with 0 Axes>

In the scatterplots, we see that GDP, Family, and Life_Expectancy are quite linearly correlated with some noise. It is to see interesting that the correlation of Gov_Trustworthiness has distributions all over the place, with no straightforward pattern evident.

Part 1 Conclusion

Based on the preprocessing and analysis above, i can see that the data has (essentially) no missing or duplicated values and there are some strong correlations between several variables in the dataset. With EDA finished, we will move onto a deeper and more detailed analysis of the data.

Part 2: Deeper Analysis - Interactive Plots and Data Coordination

In this section we will take a deeper look into the various relationships (highs and lows) between the data columns using interactive plots and data coordination (how the data points connect to each other.

Step 1: Highs and Lows of Metric Values

Before we dive deeper into the dataset, lets take a look at the highs and lows for each of the metrics to get a better idea of our range of values.

GDP

Highs

In [18]:
report_data.sort_values(by='GDP', ascending=False).head()
Out[18]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
489 United Arab Emirates 2018 20 2.09600 0.77600 0.67000 0.28400 0.18600 NaN 6.774 Top 3
349 Qatar 2017 35 1.87077 1.27430 0.71010 0.60413 0.33047 0.43930 6.375 Top 3
193 Qatar 2016 36 1.82427 0.87964 0.71723 0.56679 0.32388 0.48049 6.375 Top 3
332 Luxembourg 2017 18 1.74194 1.45758 0.84509 0.59663 0.28318 0.31883 6.863 Top 3
177 Luxembourg 2016 20 1.69752 1.03999 0.84542 0.54870 0.27571 0.35329 6.871 Top 3

Lows

In [19]:
report_data.sort_values(by='GDP', ascending=True).head()
Out[19]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
567 Somalia 2018 98 0.0 0.71200 0.11500 0.67400 0.23800 0.28200 4.982 Low-Mid 1
233 Somalia 2016 76 0.0 0.33613 0.11466 0.56778 0.27225 0.31180 5.440 Top-Mid 2
469 Central African Republic 2017 155 0.0 0.00000 0.01877 0.27084 0.28088 0.05657 2.693 Low 0
737 Somalia 2019 112 0.0 0.69800 0.26800 0.55900 0.24300 0.27000 4.668 Low-Mid 1
119 Congo (Kinshasa) 2015 120 0.0 1.00120 0.09806 0.22605 0.24834 0.07625 4.517 Low 0

Family

Highs

In [20]:
report_data.sort_values(by='Family', ascending=False).head()
Out[20]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
473 Iceland 2018 4 1.34300 1.64400 0.91400 0.67700 0.35300 0.13800 7.495 Top 3
629 Iceland 2019 4 1.38000 1.62400 1.02600 0.59100 0.35400 0.11800 7.494 Top 3
317 Iceland 2017 3 1.48063 1.61057 0.83355 0.62716 0.47554 0.15353 7.504 Top 3
477 New Zealand 2018 8 1.26800 1.60100 0.87600 0.66900 0.36500 0.38900 7.324 Top 3
470 Finland 2018 1 1.30500 1.59200 0.87400 0.68100 0.20200 0.39300 7.632 Top 3

Lows

In [21]:
report_data.sort_values(by='Family', ascending=True).head()
Out[21]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
312 Togo 2016 155 0.28123 0.0 0.24811 0.34678 0.17517 0.11587 3.303 Low 0
624 Central African Republic 2018 155 0.02400 0.0 0.01000 0.30500 0.21800 0.03800 3.083 Low 0
469 Central African Republic 2017 155 0.00000 0.0 0.01877 0.27084 0.28088 0.05657 2.693 Low 0
780 Central African Republic 2019 155 0.02600 0.0 0.10500 0.22500 0.23500 0.03500 3.083 Low 0
147 Central African Republic 2015 148 0.07850 0.0 0.06699 0.48879 0.23835 0.08289 3.678 Low 0

Life_Expectancy

Highs

In [22]:
report_data.sort_values(by='Life_Expectancy', ascending=False).head()
Out[22]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
659 Singapore 2019 34 1.572 1.463 1.141 0.556 0.271 0.453 6.262 Top 3
701 Hong Kong 2019 76 1.438 1.277 1.122 0.440 0.258 0.287 5.430 Top-Mid 2
683 Japan 2019 58 1.327 1.419 1.088 0.445 0.069 0.140 5.886 Top-Mid 2
655 Spain 2019 30 1.286 1.484 1.062 0.362 0.153 0.079 6.354 Top 3
631 Switzerland 2019 6 1.452 1.526 1.052 0.572 0.263 0.343 7.480 Top 3

Lows

In [23]:
report_data.sort_values(by='Life_Expectancy', ascending=True).head()
Out[23]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
760 Swaziland 2019 135 0.81100 1.14900 0.0 0.31300 0.07400 0.13500 4.212 Low 0
122 Sierra Leone 2015 123 0.33024 0.95571 0.0 0.40840 0.21488 0.08786 4.507 Low 0
268 Sierra Leone 2016 111 0.36485 0.62800 0.0 0.30685 0.23897 0.08196 4.635 Low-Mid 1
453 Lesotho 2017 139 0.52102 1.19010 0.0 0.39066 0.15750 0.11909 3.808 Low 0
582 Sierra Leone 2018 113 0.25600 0.81300 0.0 0.35500 0.23800 0.05300 4.571 Low-Mid 1

Freedom

Highs

In [24]:
report_data.sort_values(by='Freedom', ascending=False).head()
Out[24]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
513 Uzbekistan 2018 44 0.719 1.584 0.605 0.724 0.328 0.259 6.096 Top-Mid 2
589 Cambodia 2018 120 0.549 1.088 0.457 0.696 0.256 0.065 4.433 Low 0
471 Norway 2018 2 1.456 1.582 0.861 0.686 0.286 0.340 7.594 Top 3
472 Denmark 2018 3 1.351 1.590 0.868 0.683 0.284 0.408 7.555 Top 3
470 Finland 2018 1 1.305 1.592 0.874 0.681 0.202 0.393 7.632 Top 3

Lows

In [25]:
report_data.sort_values(by='Freedom', ascending=True).head()
Out[25]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
290 Sudan 2016 133 0.63069 0.81928 0.29759 0.0 0.18077 0.10039 4.139 Low 0
779 Afghanistan 2019 154 0.35000 0.51700 0.36100 0.0 0.15800 0.02500 3.203 Low 0
454 Angola 2017 140 0.85843 1.10441 0.04987 0.0 0.09793 0.06972 3.795 Low 0
611 Angola 2018 142 0.73000 1.12500 0.26900 0.0 0.07900 0.06100 3.795 Low 0
111 Iraq 2015 112 0.98549 0.81889 0.60237 0.0 0.17922 0.13788 4.677 Low-Mid 1

Generosity

Highs

In [26]:
report_data.sort_values(by='Generosity', ascending=False).head()
Out[26]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
428 Myanmar 2017 114 0.36711 1.12324 0.39752 0.51449 0.83808 0.18882 4.545 Low-Mid 1
276 Myanmar 2016 119 0.34112 0.69981 0.39880 0.42692 0.81971 0.20243 4.395 Low 0
128 Myanmar 2015 129 0.27108 0.70905 0.48246 0.44017 0.79588 0.19034 4.307 Low 0
395 Indonesia 2017 81 0.99554 1.27444 0.49235 0.44332 0.61170 0.01532 5.262 Low-Mid 1
599 Myanmar 2018 130 0.68200 1.17400 0.42900 0.58000 0.59800 0.17800 4.308 Low 0

Lows

In [27]:
report_data.sort_values(by='Generosity', ascending=True).head()
Out[27]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
548 Greece 2018 79 1.15400 1.20200 0.87900 0.13100 0.0 0.04400 5.358 Low-Mid 1
101 Greece 2015 102 1.15406 0.92933 0.88213 0.07699 0.0 0.01397 4.857 Low-Mid 1
256 Greece 2016 99 1.24886 0.75473 0.80029 0.05822 0.0 0.04127 5.033 Low-Mid 1
707 Greece 2019 82 1.18100 1.15600 0.99900 0.06700 0.0 0.03400 5.287 Low-Mid 1
401 Greece 2017 87 1.28949 1.23941 0.81020 0.09573 0.0 0.04329 5.227 Low-Mid 1

Gov_Trustworthiness

Highs

In [28]:
report_data.sort_values(by='Gov_Trustworthiness', ascending=False).head()
Out[28]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
153 Rwanda 2015 154 0.22208 0.77370 0.42864 0.59201 0.22628 0.55191 3.465 Low 0
27 Qatar 2015 28 1.69042 1.07860 0.79733 0.64040 0.32573 0.52208 6.611 Top 3
309 Rwanda 2016 152 0.32846 0.61586 0.31865 0.54320 0.23552 0.50521 3.515 Low 0
23 Singapore 2015 24 1.52186 1.02000 1.02525 0.54252 0.31105 0.49210 6.798 Top 3
2 Denmark 2015 3 1.32548 1.36058 0.87464 0.64938 0.34139 0.48357 7.527 Top 3

Lows

In [29]:
report_data.sort_values(by='Gov_Trustworthiness', ascending=True).head()
Out[29]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
536 Moldova 2018 67 0.65700 1.30100 0.62000 0.23200 0.17100 0.0 5.640 Top-Mid 2
244 Bosnia and Herzegovina 2016 87 0.93383 0.64367 0.70766 0.09511 0.29889 0.0 5.163 Low-Mid 1
696 Moldova 2019 71 0.68500 1.32800 0.73900 0.24500 0.18100 0.0 5.529 Top-Mid 2
404 Bosnia and Herzegovina 2017 90 0.98241 1.06934 0.70519 0.20440 0.32887 0.0 5.182 Low-Mid 1
73 Indonesia 2015 74 0.82827 1.08708 0.63793 0.46611 0.51535 0.0 5.399 Top-Mid 2

Happiness_Score

Highs

In [30]:
report_data.sort_values(by='Happiness_Score', ascending=False).head()
Out[30]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
626 Finland 2019 1 1.34000 1.58700 0.98600 0.59600 0.15300 0.39300 7.769 Top 3
470 Finland 2018 1 1.30500 1.59200 0.87400 0.68100 0.20200 0.39300 7.632 Top 3
627 Denmark 2019 2 1.38300 1.57300 0.99600 0.59200 0.25200 0.41000 7.600 Top 3
471 Norway 2018 2 1.45600 1.58200 0.86100 0.68600 0.28600 0.34000 7.594 Top 3
0 Switzerland 2015 1 1.39651 1.34951 0.94143 0.66557 0.29678 0.41978 7.587 Top 3

Lows

In [31]:
report_data.sort_values(by='Happiness_Score', ascending=True).head()
Out[31]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n
469 Central African Republic 2017 155 0.00000 0.00000 0.01877 0.27084 0.28088 0.05657 2.693 Low 0
157 Togo 2015 158 0.20868 0.13995 0.28443 0.36453 0.16681 0.10731 2.839 Low 0
781 South Sudan 2019 156 0.30600 0.57500 0.29500 0.01000 0.20200 0.09100 2.853 Low 0
314 Burundi 2016 157 0.06831 0.23442 0.15747 0.04320 0.20290 0.09419 2.905 Low 0
468 Burundi 2017 154 0.09162 0.62979 0.15161 0.05990 0.20444 0.08415 2.905 Low 0

Step 2: Create Interactive Consolidated Graphs of Report Data

Plotly Scatter Plot Function

In [32]:
def plotlyScatterPlot(df, col1, col2, xaxis_range):
    slider = [dict(currentvalue={"prefix": "Year: "})]

    fig = px.scatter(df.sort_values('Year'), x=col1, y=col2, 
                     title=col2 + " vs. " + col1,
                     animation_frame="Year", animation_group="Country",
                     color="Target", hover_name="Country", 
                     hover_data=["Year", "Rank", "GDP", "Family", "Life_Expectancy", "Gov_Trustworthiness"],
                     width=980, height=800).update_layout(sliders=slider, xaxis_range=xaxis_range, yaxis_range=[2, 8])
    fig.show()

Happiness_Score vs. GDP

One of the biggest criticisms of the World Happiness Report is the almost linear correlation between a country's GDP and Happiness_Score. Meaning countries with a higher GPD will inherently have a higher Happiness_Score, (when in reality that might not be the case) and at the same time making lower GDP countries out to be more unhappier than they might actually be.

In [33]:
plotlyScatterPlot(report_data, 'GDP', 'Happiness_Score', [-0.05, 2.2])

Happiness_Score vs. Family

In [34]:
plotlyScatterPlot(report_data, 'Family', 'Happiness_Score', [-0.05, 1.8])

Happiness_Score vs. Life_Expectancy

In [35]:
plotlyScatterPlot(report_data, 'Life_Expectancy', 'Happiness_Score', [-0.05, 1.2])

Step 3: Parallel Coordinate Map to Show how Each Columns' Data Points Connect

In [36]:
coord_data = go.Parcoords(line = dict(color = report_data['Target_n'], colorscale = 'Temps'), 
                          dimensions=list([
                              dict(range=[report_data['Year'].min(), 
                                          report_data['Year'].max()],
                                   tickvals = report_data['Year'].unique(), 
                                   label='Year', values=report_data['Year']),
                              dict(range=[0, report_data['Target_n'].max()],
                                   tickvals = report_data['Target_n'].unique(), 
                                   ticktext = report_data['Target'].unique(),
                                   label='Targets', values=report_data['Target_n']),
                              dict(range=[(report_data['Rank'] * -1).min(), 
                                          (report_data['Rank'] * -1).max()],
                                   label='Rank', values=(report_data['Rank'] * -1)),
                              dict(range=[report_data['GDP'].min(), 
                                          report_data['GDP'].max()],
                                   label='GDP', values=report_data['GDP']),
                              dict(range=[report_data['Family'].min(), 
                                          report_data['Family'].max()],
                                   label='Family', values=report_data['Family']),
                              dict(range=[report_data['Life_Expectancy'].min(), 
                                          report_data['Life_Expectancy'].max()],
                                   label='Life_Expectancy', values=report_data['Life_Expectancy']),
                              dict(range=[report_data['Freedom'].min(), 
                                          report_data['Freedom'].max()],
                                   label='Freedom', values=report_data['Freedom']),
                              dict(range=[report_data['Generosity'].min(), 
                                          report_data['Generosity'].max()],
                                   label='Generosity', values=report_data['Generosity']),
                              dict(range=[report_data['Gov_Trustworthiness'].min(), 
                                          report_data['Gov_Trustworthiness'].max()],
                                   label='Gov_Trust', values=report_data['Gov_Trustworthiness']),
                              dict(range=[report_data['Happiness_Score'].min(), 
                                          report_data['Happiness_Score'].max()],
                                   label='Happy_Score', values=report_data['Happiness_Score'])
                          ]))

layout = go.Layout(
   title = '''Interactive Parallel Coordinate Plot
              <br><sup>(Click and Drag Vertically Along the Axes to Apply Filters)</sup>''',
   title_y=0.98, height=850, font=dict(size=15, color='black')
)

go.Figure(data=coord_data, layout=layout)

Part 2 Conclusion

From the interactive plots, we can see that overall countries seem to be heading towards the right (higher/better scores, which is good because it would not be a good look for the world if countries as whole were getting worse. There were some outliers here and there depending on the metric but so far it seems to hold true that the higher the three highly correlated metrics identified in Part 1, Step 8 are (GDP, Family, and Life_Expectancy), the happier the country is.

Part 3: Data Mapping (Geography)

This section will focus on plotting on geo-maps to bring all the data into perspective in the world view. I will be using Basemap from mpl_toolkits.basemap and Choropleth Maps from plotly.express to do the map plotting.

Step 1: Load and Parse World Country Codes

To plot maps with Plotly I'll need to use the 3 letter country codes (ISO_Alpha 3) and to do that I'll be scrapping the "Current codes" table from the ISO_3166-1 Wikipedia page using pandas.read_html().

In [37]:
# This ssl line is needed to allow for pandas to load in the table 
# from wikipedia, otherwise an SSL "Invalid Certifcate" error occures
# I'm unsure if this will happen on other systems but I was unable to fix it on mine
ssl._create_default_https_context = ssl._create_unverified_context

# Load in Wikipedia data table
world_codes = pd.read_html('https://en.wikipedia.org/wiki/ISO_3166-1')[1].rename(
    columns={'English short name (using title case)': 'World_Country',
             'Alpha-2 code': 'ISO_a2', 'Alpha-3 code': 'ISO_a3'})


# If for whatever reason pandas is unable to read the data correctly from the wikipedia page above, 
# I have included the data in a csv file to be read from instead: "Wikipedia_ISO_3166-1.csv"

# Uncomment the line below and comment the lines above to read from the csv file instead of the website 
# world_codes = pd.read_csv('Report_Data/Wikipedia_ISO_3166-1.csv')  # (Oct 9th, 2021)


# Get 3 letter country codes from pycountry
countries = {}
for country in pycountry.countries:
    countries[country.alpha_3] = country.name

world_codes = world_codes[world_codes.columns[:-3]]
world_codes['Country'] = [countries.get(country, 'Unknown Code') for country in list(world_codes['ISO_a3'])]

# Parse country names to make sure that they match the names in our dataset
# As you can see there a few that needed to be mapped manually
for country in world_codes['Country']:
    if "Unknown Code" in country:
        world_codes.loc[world_codes.Country == country, 
                        'Country'] = world_codes.loc[world_codes.Country == country, 'World_Country']
    elif "Côte d'Ivoire" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Ivory Coast"
    elif "Eswatini" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Swaziland"
    elif "Viet Nam" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Vietnam"
    elif "Congo" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Congo (Brazzaville)"
    elif "Congo," in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Congo (Kinshasa)"
    elif "Korea" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "South Korea"
    elif "Czech" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Czech Republic"
    elif "Russia" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Russia"
    elif "Somali" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Somalia"
    elif "Macedonia" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "North Macedonia"
    elif "Lao" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Laos"
    elif "Palestin" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Palestinian Territories"
    elif "Syria" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Syria"
    else:
        if ',' in country:
            country_part = country.split(',')[0]
            
            if country_part in country:
                world_codes.loc[world_codes.Country == country, 'Country'] = country_part
In [38]:
print("Dataset Dimensions: {:,} columns and {:,} rows".format(world_codes.shape[1], world_codes.shape[0]))
world_codes.head()
Dataset Dimensions: 4 columns and 249 rows
Out[38]:
World_Country ISO_a2 ISO_a3 Country
0 Afghanistan AF AFG Afghanistan
1 Åland Islands AX ALA Åland Islands
2 Albania AL ALB Albania
3 Algeria DZ DZA Algeria
4 American Samoa AS ASM American Samoa

Step 2: Load World Capital Coordinates

To visualize the maps using Basemap, I need coordinates (latitude and longitude) for the countries, in this case I'll be using country capitals. This data can be retrieved from this site: http://techslides.com/list-of-countries-and-capitals but i am specifically scraping the webpage (using pandas.read_html()) for the data table because of improper data formatting in the linked downloadable data sources.

In [39]:
map_coords = pd.read_html('http://techslides.com/list-of-countries-and-capitals')[0]

# Apply Headers to dataframe from first row of table
new_header = map_coords.iloc[0]
map_coords = map_coords[1:]
map_coords.columns = [head.replace(' ', '_') for head in new_header]
map_coords = map_coords.apply(pd.to_numeric, errors='ignore')


# If for whatever reason pandas is unable to read the data correctly from the website above, 
# I have included the data in a csv file to be read from instead: "country-capital_coordinates.csv"

# Uncomment the line below and comment the lines above to read from the csv file instead of the website 
# map_coords = pd.read_csv('Report_Data/country-capital_coordinates.csv')  # (Oct 9th, 2021)


# Some manual country parsing to match the dataset
for country in map_coords['Country_Name']:
    if "Cote d’Ivoire" == country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Ivory Coast"
    elif "Palestin" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Palestinian Territories"
    elif "Macedonia" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "North Macedonia"
    elif "Gambia" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Gambia"
    elif "Republic of Congo" == country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Congo (Brazzaville)"
    elif "Democratic Republic of the Congo" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Congo (Kinshasa)"
In [40]:
print("Dataset Dimensions: {:,} columns and {:,} rows".format(map_coords.shape[1], map_coords.shape[0]))
map_coords.head()
Dataset Dimensions: 6 columns and 245 rows
Out[40]:
Country_Name Capital_Name Capital_Latitude Capital_Longitude Country_Code Continent_Name
1 Afghanistan Kabul 34.516667 69.183333 AF Asia
2 Aland Islands Mariehamn 60.116667 19.900000 AX Europe
3 Albania Tirana 41.316667 19.816667 AL Europe
4 Algeria Algiers 36.750000 3.050000 DZ Africa
5 American Samoa Pago Pago -14.266667 -170.700000 AS Australia

Step 3: Merge Country Coordinates and Codes (ISO_a2 and ISO_a3) into Consolidated Report Dataset

Merge Codes

In [41]:
report_data_codes = report_data.merge(world_codes.drop('World_Country', axis=1), on='Country')

print("Dataset Dimensions: {:,} columns and {:,} rows".format(report_data_codes.shape[1], report_data_codes.shape[0]))
report_data_codes.head()
Dataset Dimensions: 14 columns and 775 rows
Out[41]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n ISO_a2 ISO_a3
0 Switzerland 2015 1 1.39651 1.34951 0.94143 0.66557 0.29678 0.41978 7.587 Top 3 CH CHE
1 Switzerland 2016 2 1.52733 1.14524 0.86303 0.58557 0.28083 0.41203 7.509 Top 3 CH CHE
2 Switzerland 2017 4 1.56498 1.51691 0.85813 0.62007 0.29055 0.36701 7.494 Top 3 CH CHE
3 Switzerland 2018 5 1.42000 1.54900 0.92700 0.66000 0.25600 0.35700 7.487 Top 3 CH CHE
4 Switzerland 2019 6 1.45200 1.52600 1.05200 0.57200 0.26300 0.34300 7.480 Top 3 CH CHE

Merge Coordinates

In [42]:
report_data_coords = pd.merge(report_data_codes, 
                              map_coords[['Country_Name', 'Capital_Name', 'Capital_Latitude', 'Capital_Longitude']], 
                              left_on='Country', right_on='Country_Name'
                             ).drop('Country_Name', axis=1).sort_values(by=['Country', 'Year'], ascending=True
                                                                       ).reset_index(drop=True)

print("Dataset Dimensions: {:,} columns and {:,} rows".format(report_data_coords.shape[1], report_data_coords.shape[0]))
report_data_coords.head()
Dataset Dimensions: 17 columns and 775 rows
Out[42]:
Country Year Rank GDP Family Life_Expectancy Freedom Generosity Gov_Trustworthiness Happiness_Score Target Target_n ISO_a2 ISO_a3 Capital_Name Capital_Latitude Capital_Longitude
0 Afghanistan 2015 153 0.31982 0.30285 0.30335 0.23414 0.36510 0.09719 3.575 Low 0 AF AFG Kabul 34.516667 69.183333
1 Afghanistan 2016 154 0.38227 0.11037 0.17344 0.16430 0.31268 0.07112 3.360 Low 0 AF AFG Kabul 34.516667 69.183333
2 Afghanistan 2017 141 0.40148 0.58154 0.18075 0.10618 0.31187 0.06116 3.794 Low 0 AF AFG Kabul 34.516667 69.183333
3 Afghanistan 2018 145 0.33200 0.53700 0.25500 0.08500 0.19100 0.03600 3.632 Low 0 AF AFG Kabul 34.516667 69.183333
4 Afghanistan 2019 154 0.35000 0.51700 0.36100 0.00000 0.15800 0.02500 3.203 Low 0 AF AFG Kabul 34.516667 69.183333

"Invalid" Countries

The below output is a list of countries that do not have a valid country code and thus were not merged correctly into the dataset.

  • North and Northern Cyprus is recognized as just Cyprus in the dataset
  • Somaliland Region is recognized as Somalia in the dataset
  • Kosovo is not recognized
In [43]:
for country in report_data['Country'].unique():
    if country not in list(report_data_coords['Country'].unique()):
        print(country)
North Cyprus
Kosovo
Somaliland region
Somaliland Region
Northern Cyprus

Step 4: World Maps (Basemap)

World Map Plotting Function

In [44]:
def worldBasemap(df, col1, col2):
    sb.set(style=("white"), font_scale=1.5)
    
    m = Basemap(projection='mill', llcrnrlat=-60, urcrnrlat=90,
                llcrnrlon=-180, urcrnrlon=180, resolution='c')
    
    m.drawcountries()
    m.drawparallels(np.arange(-90, 91., 30.))
    m.drawmeridians(np.arange(-90, 90., 60.))
    
    
    lat = df['Capital_Latitude'].values
    long = df['Capital_Longitude'].values
    
    col_color = df[col1].values
    col_size = df[col2].values
    
    m.scatter(long, lat, latlon=True, c=col_color, s=150*col_size, 
              linewidth=1, edgecolors='black', cmap='hot', alpha=1)
    
    m.fillcontinents(color='#072B57', lake_color='#FFFFFF', alpha=0.4)
    plt.title("World - " + col1 + " vs. " + col2, fontsize=25)
    m.colorbar(label=col1)

Happiness_Score vs. GDP

In [45]:
plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'GDP')

Happiness_Score vs. Family

In [46]:
plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'Family')

Happiness_Score vs. Life_Expectancy

In [47]:
plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'Life_Expectancy')

The world graphs above make it clear that Much of Europe and the Americas are doing the best in terms of the metrics of this report. The graphs would lead you to believe the all of Africa, and much of Asia has a lot more room for development.

Step 5: Europe Maps (Basemap)

Europe is kind of hard to see whats going on, so lets zoom in a little.

Europe Map Plotting Function

In [48]:
def europeBasemap(df, col1, col2):
    sb.set(style=("white"), font_scale=1.5)
    
    m = Basemap(projection='mill', llcrnrlat=30, urcrnrlat=72,
                llcrnrlon=-20, urcrnrlon=55, resolution='l')
    
    m.drawstates()
    m.drawcountries()
    m.drawparallels(np.arange(-90, 91., 30.))
    m.drawmeridians(np.arange(-90, 90., 60.))
    
    lat = df['Capital_Latitude'].values
    lon = df['Capital_Longitude'].values
    
    col_color = df[col1].values
    col_size = df[col2].values
    
    m.scatter(lon, lat, latlon=True, c=col_color, s=250*col_size, 
              linewidth=2, edgecolors='black', cmap='hot', alpha=1)
    
    m.fillcontinents(color='#072B57', lake_color='#FFFFFF', alpha=0.3)
    plt.title('Europe - ' + col1 + ' vs. ' + col2, fontsize=25)
    m.colorbar(label=col1)

Happiness_Score vs. GDP

In [49]:
plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'GDP')

Happiness_Score vs. Family

In [50]:
plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'Family')

Happiness_Score vs. Life_Expectancy

In [51]:
plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'Life_Expectancy')

From the Europe maps above, we can see that much of northern and central Europe is fairing the best in terms of the metrics, while much of southern Europe is lagging behind.

Step 6: World Maps (Plotly)

NOTE: If you are viewing this notebook in nbviewer, the plotly geo-maps will not be rendered because the connections to do so get blocked by the site and I am unable to find a workaround. As a result, if you want to view this notebook in its entirety, you will need to use Binder or a fully functional Jupyter environment instead

The huge benefit of using plotly is that the maps can be animated and/or have filters applied to view the data a bit more dynamically. It makes it much easier to view data on a timescale.

Map Plotting Function

In [52]:
def plotlyMap(df, col, scope, height):
    slider = [dict(currentvalue={"prefix": "Year: "})]

    fig = px.choropleth(df.sort_values('Year'), locations="ISO_a3", scope=scope.lower(),
                        color=col, animation_frame="Year", animation_group="Country",
                        hover_name="Country", hover_data=["Year", "Rank", "Family", "Life_Expectancy", 
                                                          "Gov_Trustworthiness"],
                        color_continuous_scale=px.colors.sequential.haline).update_layout(
        autosize=False, height=height, width=980, sliders=slider, 
        title_text = 'Interactive ' + scope.capitalize() + ' Map - ' + col)
    
    fig.show()

Happiness_Score

In [53]:
plotlyMap(report_data_coords, 'Happiness_Score', 'world', 600)

GDP

In [54]:
plotlyMap(report_data_coords, 'GDP', 'world', 600)

Notable to point out that there was quite the downturn in world GDP in 2018, which appears to be related to a number of economic factors around the world, article: Economic growth is slowing all around the world.

Family

In [55]:
plotlyMap(report_data_coords, 'Family', 'world', 600)

World Life_Expectancy

In [56]:
plotlyMap(report_data_coords, 'Life_Expectancy', 'world', 600)

It is very interesting to me to see how the world changes from year to year with this data, being able to quickly look at each year of the data and compare them is very beneficial when doing this kind of analysis.

Step 7: Europe Maps (Plotly)

Just like in Step 5: Europe Maps (Basemap) section, lets zoom in on Europe.

Happiness_Score

In [57]:
plotlyMap(report_data_coords, 'Happiness_Score', 'europe', 750)

GDP

In [58]:
plotlyMap(report_data_coords, 'GDP', 'europe', 750)

As we saw on the Plotly - World GDP map, there was quite a downturn in GDP in 2018. Article: Economic growth is slowing all around the world.

Family

In [59]:
plotlyMap(report_data_coords, 'Family', 'europe', 750)

Life_Expectancy

In [60]:
plotlyMap(report_data_coords, 'Life_Expectancy', 'europe', 750)

From the Europe maps above, we can see that much of northern and central Europe is fairing a bit better in terms of the metrics, while eastern and southern Europe are lagging a bit behind.

Part 3 Conclusion - Final Analysis

From the analysis in notebook, it seems like some of the criticism for "The World Happiness Report" ring true, there is a high focus on a country's GDP along with strongly correlated features such as Family and Life_Expectancy.

It does make sense to an extent that not only having money but also having a good social net (Family) is important and does make it easier for people to advance in life in whatever direction they so choose. This also translates quite well to Life_Expectancy because of a greater ability to provide for yourself (and your Family), thus having access to better options in general.

Suffice to say, money can indeed buy happiness.

In [ ]: