Project: Analysis of World Happiness Reports (2015-2019)¶

Author: Robert Zacchigna¶

Table of Contents¶

Problem Statement
Proposal
Dataset - World Happiness Reports (2015-2019)
- Download Location
- Columns
Imports

Part 1: Exploratory Data Analysis and Data Preprocessing

Part 2: Deeper Analysis - Interactive Plots and Data Coordination

Part 3: Data Mapping (Geography)

Problem Statement:¶

The world's population is vast and diverse, spread across more than 150 countries, so the perceptions of one country's citizens can differ greatly from another's. The World Happiness Report aims to collect and quantify this information to see what people around the world think of their country and the direction it might be going in. The report has not been without controversy: critics argue that the metrics being measured are skewed in a particular direction, which puts some countries at a disadvantage or misrepresents citizens' true feelings about their country.

Proposal:¶

A detailed analysis of the World Happiness Reports from 2015-2019 to see what makes citizens happy with their country and what the major contributors to that happiness are. Alongside this, the metrics are analyzed to see whether the criticism of them holds true for the happiness reports. This is done by analyzing their relationship to the overall happiness score (which determines a country's ranking in the report) and by plotting the data on geographic maps to bring everything into a single, holistic view. This should expose trends between countries and make it easier to see not only what direction a country might be heading in but also what it might be lacking for its citizens.

Dataset - World Happiness Reports (2015-2019)¶

Download Location: https://www.kaggle.com/unsdsn/world-happiness

Columns:

Country – Name of the Country
Region – Region the country belongs to
Happiness Rank – Rank of the country based on the Happiness Score
Happiness Score – A metric measured in 2015 by asking the sampled people the question: "How would you rate your happiness on a scale of 0 to 10, where 10 is the happiest?"
Standard Error – The standard error of the Happiness Score
Economy (GDP per Capita) – The extent to which GDP contributes to the calculation of the Happiness Score
Family – The extent to which Family contributes to the calculation of the Happiness Score
Health (Life Expectancy) – The extent to which Life Expectancy contributes to the calculation of the Happiness Score
Freedom – The extent to which Freedom contributes to the calculation of the Happiness Score
Trust (Government Corruption) – The extent to which Perception of Corruption contributes to the calculation of the Happiness Score

Imports¶

import ssl
import warnings
import pycountry
import numpy as np
import pandas as pd
import seaborn as sb
import pandas_profiling as pp

from notebook import __version__ as nbv

# Basemap
from mpl_toolkits.basemap import Basemap
from mpl_toolkits.basemap import __version__ as basev

# scipy Libraries
from scipy.stats import norm
from scipy import __version__ as scipv

# matplotlib Libraries
import matplotlib.pyplot as plt
from matplotlib import __version__ as mpv

# plotly Libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly import __version__ as pvm

# Library Versions
lib_info = [('ssl', ssl.OPENSSL_VERSION.split(' ')[1]), ('scipy', scipv), ('numpy', np.__version__), 
            ('pandas', pd.__version__),('plotly', pvm), ('seaborn', sb.__version__), 
            ('pycountry', pycountry.__version__), ('matplotlib', mpv),('pandas_profiling', pp.__version__), 
            ('mpl_toolkits.basemap', basev), ('Jupyter Notebook (notebook)', nbv)]

print('Library Versions\n' + '='*16)

for name, vers in lib_info:
    print('{:>27} = {}'.format(name, vers))

Library Versions
================
                        ssl = 1.1.1d
                      scipy = 1.6.0
                      numpy = 1.19.5
                     pandas = 1.3.3
                     plotly = 4.14.3
                    seaborn = 0.11.1
                  pycountry = 20.7.3
                 matplotlib = 3.3.4
           pandas_profiling = 2.10.0
       mpl_toolkits.basemap = 1.2.2+dev
Jupyter Notebook (notebook) = 6.4.4

Part 1: Exploratory Data Analysis and Data Preprocessing¶

Step 1: Load Datasets¶

rep2015 = pd.read_csv('Report_Data/2015.csv')
rep2016 = pd.read_csv('Report_Data/2016.csv')
rep2017 = pd.read_csv('Report_Data/2017.csv')
rep2018 = pd.read_csv('Report_Data/2018.csv')
rep2019 = pd.read_csv('Report_Data/2019.csv')

Step 2: Datasets Dimensions and Heads¶

2015 Report¶

print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2015.shape[1], rep2015.shape[0]))

rep2015.head()

Dataset Dimensions: 12 columns and 158 rows

2016 Report¶

print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2016.shape[1], rep2016.shape[0]))

rep2016.head()

Dataset Dimensions: 13 columns and 157 rows

2017 Report¶

print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2017.shape[1], rep2017.shape[0]))

rep2017.head()

Dataset Dimensions: 12 columns and 155 rows

2018 Report¶

print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2018.shape[1], rep2018.shape[0]))

rep2018.head()

Dataset Dimensions: 9 columns and 156 rows

2019 Report¶

print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2019.shape[1], rep2019.shape[0]))

rep2019.head()

Dataset Dimensions: 9 columns and 156 rows

From the heads of the datasets above, we can see that they do not share a common format, especially in their column names. In order to combine the datasets correctly, their columns will need to be parsed and remapped to common names.
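One way to make the column-name mismatch concrete is to compare the headers of two reports directly. The sketch below is only an illustration: it uses empty stand-in frames that carry just the column names (the 2015 names as shown in the 2015 report head, the 2018 names as assumed from the rename mapping in Step 3 plus Generosity), and `compare_columns` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Stand-in frames: only the headers matter here, so they hold no rows.
cols_2015 = ['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Standard Error',
             'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom',
             'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']
# Assumed 2018 headers, inferred from the rename mapping used in Step 3.
cols_2018 = ['Overall rank', 'Country or region', 'Score', 'GDP per capita',
             'Social support', 'Healthy life expectancy', 'Freedom to make life choices',
             'Generosity', 'Perceptions of corruption']

demo_2015 = pd.DataFrame(columns=cols_2015)
demo_2018 = pd.DataFrame(columns=cols_2018)


def compare_columns(left, right):
    """Return (shared, only_left, only_right) column names as sorted lists."""
    left_cols, right_cols = set(left.columns), set(right.columns)
    return (sorted(left_cols & right_cols),
            sorted(left_cols - right_cols),
            sorted(right_cols - left_cols))


shared, only_2015, only_2018 = compare_columns(demo_2015, demo_2018)
print('Shared columns:', shared)
print('Only in 2015:  ', only_2015)
print('Only in 2018:  ', only_2018)
```

With these stand-in headers, only Generosity carries the same name in both reports, which is why a remapping step is needed before the data can be combined.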

Step 3: Parse and Combine Datasets¶

Columns starting with Happiness or Whisker, along with Dystopia.Residual, are the targets; they are simply named differently from report to report. Dystopia Residual compares each country's scores to those of the theoretical unhappiest country in the world. Since the reports from different years use different naming conventions, a common set of names needs to be abstracted in order to combine them all correctly.

# This function takes the relevant report dataset and 
# year in order to parse the data into a usable format
def parse_report(report_df, year):
    
    # Rename columns of reports 2018 and 2019 to match 
    # that of the earlier reports (2015, 2016, 2017)
    if 2017 < year < 2020:
        report_df.rename(columns={'Overall rank': 'Happiness Rank', 'Country or region': 'Country',
                                  'Score': 'Happiness Score', 'GDP per capita': 'Economy (GDP per Capita)', 
                                  'Social support': 'Family', 'Healthy life expectancy': 'Health (Life Expectancy)', 
                                  'Freedom to make life choices': 'Freedom', 
                                  'Perceptions of corruption': 'Trust (Government Corruption)'}, inplace=True)
    
    targets = ['Low', 'Low-Mid', 'Top-Mid', 'Top']
    df_cols = ['Country', 'Rank', 'GDP', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']
    
    # Load report data into common columns
    target_cols = []
    for col in df_cols:
        target_cols.extend([new_col for new_col in report_df.columns if col in new_col])
    
    df = pd.DataFrame()
    df[df_cols] = report_df[target_cols]
    df['Happiness Score'] = report_df[[col for col in report_df.columns if 'Score' in col]]
    
    # Calculate quartiles on the data.
    df["Target"] = pd.qcut(df[df.columns[-1]], len(targets), labels=targets)
    df["Target_n"] = pd.qcut(df[df.columns[-2]], len(targets), labels=range(len(targets)))
    
    # Insert Year column
    df.insert(1, 'Year', pd.Series([year] * len(report_df)))
    
    return df
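The quartile step in parse_report can be sketched in isolation. Because parse_report is called once per report, the quartiles are computed within each year rather than across the combined data. The scores below are made up purely for illustration; only the labels and the two pd.qcut calls mirror the function above.

```python
import pandas as pd

targets = ['Low', 'Low-Mid', 'Top-Mid', 'Top']

# Made-up scores, purely for illustration (not taken from the reports).
scores = pd.Series([3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 7.5])

# pd.qcut splits the values into equal-sized bins (here: quartiles).
labels = pd.qcut(scores, len(targets), labels=targets)
codes = pd.qcut(scores, len(targets), labels=list(range(len(targets))))

demo = pd.DataFrame({'Score': scores, 'Target': labels, 'Target_n': codes})
print(demo)
```

Each label covers a quarter of the rows, so the two lowest scores are tagged Low and the two highest are tagged Top.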

Combine Datasets¶

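reports = [(rep2015, 2015), (rep2016, 2016), (rep2017.round(5), 2017), (rep2018, 2018), (rep2019, 2019)]

# Parse each report and stack the results; pd.concat is used instead of
# DataFrame.append, which was removed in pandas 2.0
report_data = pd.concat([parse_report(rep_data, year) for rep_data, year in reports], sort=False)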

report_data = report_data.reset_index(drop=True)

Rename Columns and Fix Misc. Country Names to be Consistent¶

fix_names = [('Taiwan Province of China', 'Taiwan'), ('Macedonia', 'North Macedonia'), 
             ('Hong Kong S.A.R., China', 'Hong Kong'), ('Trinidad & Tobago', 'Trinidad and Tobago')]
    
for wrong_name, right_name in fix_names:
    report_data.loc[report_data.Country == wrong_name, 'Country'] = right_name
    
# Rename "Happiness Score" column to "Happiness_Score",
# "Health" column to "Life_Expectancy" and "Trust" to "Gov_Trustworthiness"
report_data.rename(columns={'Happiness Score': 'Happiness_Score', 'Health': 'Life_Expectancy',
                            'Trust': 'Gov_Trustworthiness'}, inplace=True)

print("Combined Dataset Dimensions: {:,} columns and {:,} rows".format(report_data.shape[1], report_data.shape[0]))
report_data.head()

Combined Dataset Dimensions: 12 columns and 782 rows

Step 4: Check Combined Dataset for Rows with Missing Values¶

print('Missing Value Counts for Each Column\n' + '='*36)

print(report_data.isnull().sum())

print('\n\nRow(s) in dataset with missing data:')
report_data[report_data['Gov_Trustworthiness'].isna()]

Missing Value Counts for Each Column
====================================
Country                0
Year                   0
Rank                   0
GDP                    0
Family                 0
Life_Expectancy        0
Freedom                0
Generosity             0
Gov_Trustworthiness    1
Happiness_Score        0
Target                 0
Target_n               0
dtype: int64


Row(s) in dataset with missing data:

We can see that the row with the missing data came from the 2018 report. Because there is only one row with missing data and the analysis does not hinge on that value, the row will be left as is rather than removed.
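The lookup above filters on the one column known to have a gap. A slightly more general sketch, assuming we do not know in advance which column is affected, selects every row with at least one missing value. The rows below are made up for illustration; only the column names mirror the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; only the column names mirror the notebook.
demo = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Year': [2018, 2018, 2019],
    'Gov_Trustworthiness': [0.2, np.nan, 0.1],
})

# Keep every row that has at least one missing value in any column.
rows_with_gaps = demo[demo.isna().any(axis=1)]
print(rows_with_gaps)
```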

Step 5: Describe the Combined Dataset¶

print("Describe Dataset:")

report_data.describe()

Describe Dataset:

Step 6: Numerical Column Histograms¶

fig = plt.figure()
fig.subplots_adjust(hspace=0.8, wspace=0.5)
fig.set_size_inches(13.5, 13)
sb.set(font_scale = 1.25)

warnings.filterwarnings('ignore')

i = 1
for var in report_data.columns:
    try:
        fig.add_subplot(4, 2, i)
        sb.distplot(pd.Series(report_data[var], name=''), bins=50,
                    fit=norm, kde=False).set_title(var + " Histogram")
        plt.ylabel('Count')

        i += 1
    except ValueError:
        pass

fig.tight_layout()
warnings.filterwarnings('default')

Step 7: Pandas Profiling Report: Summary, Correlation Matrices, and Missing Value Information.¶

# Combined Happiness Reports Profiling Report
pp.ProfileReport(report_data).to_notebook_iframe()

Step 8: Annotated Correlation Matrix of Combined Dataset¶

plt.rcParams['figure.figsize'] = (15, 10)
plt.rcParams.update({'font.size': 13})

sb.set(font_scale = 1.5)
sb.set_style(style='white')

sb.heatmap(report_data.corr(), annot=True, linewidth=1).set_title('Annotated Correlation Matrix of Combined Dataset')

Text(0.5, 1.0, 'Annotated Correlation Matrix of Combined Dataset')

It looks like GDP, Family, and Life Expectancy are strongly correlated with the Happiness score. While Freedom correlates very well with the Happiness score, it's also correlated quite well with all data columns (except Rank). Gov_Trustworthiness still has a moderately good correlation with the Happiness score.
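To read these relationships off as numbers rather than from the heatmap, the score column of the correlation matrix can be sorted by magnitude. This is a minimal sketch on made-up values (the column names mirror the notebook, the data does not); selecting numeric columns first is a precaution for pandas versions where corr() no longer skips text columns silently.

```python
import pandas as pd

# Made-up values for illustration; the column names mirror the notebook.
demo = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D', 'E'],
    'GDP': [0.2, 0.5, 0.9, 1.2, 1.4],
    'Generosity': [0.30, 0.10, 0.25, 0.15, 0.20],
    'Happiness_Score': [3.5, 4.4, 5.6, 6.5, 7.1],
})

# Restrict to numeric columns first so text columns cannot break corr().
numeric = demo.select_dtypes(include='number')

# Correlation of every metric with the score, strongest (by magnitude) first.
score_corr = (numeric.corr()['Happiness_Score']
              .drop('Happiness_Score')
              .sort_values(key=lambda s: s.abs(), ascending=False))
print(score_corr)
```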

Step 9: Bird's-Eye View of Column Distributions and Correlations¶

Below is a pairwise comparison of our variables to give us a bird's-eye view of the distributions and correlations in the dataset. The color is based on the quartiles of Happiness_Score (0%-25%, 25%-50%, 50%-75%, 75%-100%).

Note: right-click the graph and select "Open Image in New Tab" to zoom in to get a better view.

fig = plt.figure()
fig.set_size_inches(12, 12)
sb.set(font_scale = 1.25)

sb.pairplot(report_data.drop(['Target_n'], axis=1), 
            hue='Target').fig.suptitle("Bird's-Eye View of Column Distributions and Correlations", y=1.01)

Text(0.5, 1.01, "Bird's-Eye View of Column Distributions and Correlations")

<Figure size 864x864 with 0 Axes>

In the scatterplots, we see that GDP, Family, and Life_Expectancy are quite linearly correlated, with some noise. It is interesting to see that Gov_Trustworthiness has distributions all over the place, with no straightforward pattern evident.

Part 1 Conclusion¶

Based on the preprocessing and analysis above, I can see that the data has (essentially) no missing or duplicated values and that there are some strong correlations between several variables in the dataset. With EDA finished, we will move on to a deeper and more detailed analysis of the data.

Part 2: Deeper Analysis - Interactive Plots and Data Coordination¶

In this section we will take a deeper look into the various relationships (highs and lows) between the data columns using interactive plots and data coordination (how the data points connect to each other).

Step 1: Highs and Lows of Metric Values¶

Before we dive deeper into the dataset, let's take a look at the highs and lows for each of the metrics to get a better idea of our range of values.

GDP¶

Highs¶

report_data.sort_values(by='GDP', ascending=False).head()

Lows¶

report_data.sort_values(by='GDP', ascending=True).head()

Family¶

Highs¶

report_data.sort_values(by='Family', ascending=False).head()

Lows¶

report_data.sort_values(by='Family', ascending=True).head()

Life_Expectancy¶

Highs¶

report_data.sort_values(by='Life_Expectancy', ascending=False).head()

Lows¶

report_data.sort_values(by='Life_Expectancy', ascending=True).head()

Freedom¶

Highs¶

report_data.sort_values(by='Freedom', ascending=False).head()

Lows¶

report_data.sort_values(by='Freedom', ascending=True).head()

Generosity¶

Highs¶

report_data.sort_values(by='Generosity', ascending=False).head()

Lows¶

report_data.sort_values(by='Generosity', ascending=True).head()

Gov_Trustworthiness¶

Highs¶

report_data.sort_values(by='Gov_Trustworthiness', ascending=False).head()

Lows¶

report_data.sort_values(by='Gov_Trustworthiness', ascending=True).head()

Happiness_Score¶

Highs¶

report_data.sort_values(by='Happiness_Score', ascending=False).head()

Lows¶

report_data.sort_values(by='Happiness_Score', ascending=True).head()
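The highs and lows above repeat the same two calls per metric. As a possible shortcut, a small helper (the name `highs_and_lows` is hypothetical, and the rows below are made up) could return both ends at once using DataFrame.nlargest and DataFrame.nsmallest.

```python
import pandas as pd


def highs_and_lows(df, metric, n=5):
    """Return the n highest and the n lowest rows of df for the given metric."""
    return df.nlargest(n, metric), df.nsmallest(n, metric)


# Made-up rows for illustration; only the column names mirror the notebook.
demo = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'GDP': [1.4, 0.1, 0.8, 1.1],
    'Family': [0.9, 1.3, 0.4, 1.0],
})

gdp_highs, gdp_lows = highs_and_lows(demo, 'GDP', n=2)
print(gdp_highs)
print(gdp_lows)
```

Looping over a list of metric names with this helper would produce the same tables as the individual cells above.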

Step 2: Create Interactive Consolidated Graphs of Report Data¶

Plotly Scatter Plot Function¶

def plotlyScatterPlot(df, col1, col2, xaxis_range):
    slider = [dict(currentvalue={"prefix": "Year: "})]

    fig = px.scatter(df.sort_values('Year'), x=col1, y=col2, 
                     title=col2 + " vs. " + col1,
                     animation_frame="Year", animation_group="Country",
                     color="Target", hover_name="Country", 
                     hover_data=["Year", "Rank", "GDP", "Family", "Life_Expectancy", "Gov_Trustworthiness"],
                     width=980, height=800).update_layout(sliders=slider, xaxis_range=xaxis_range, yaxis_range=[2, 8])
    fig.show()

Happiness_Score vs. GDP¶

One of the biggest criticisms of the World Happiness Report is the almost linear correlation between a country's GDP and its Happiness_Score. This means that countries with a higher GDP will inherently have a higher Happiness_Score (when in reality that might not be the case), while lower-GDP countries are made out to be unhappier than they might actually be.

plotlyScatterPlot(report_data, 'GDP', 'Happiness_Score', [-0.05, 2.2])
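The "almost linear" relationship in the plot can also be checked numerically, one year at a time. The sketch below computes the Pearson correlation between GDP and Happiness_Score per year on made-up values; swapping in report_data for the demo frame is left as an assumption about how it would be used.

```python
import pandas as pd

# Made-up values for illustration only.
demo = pd.DataFrame({
    'Year': [2015] * 4 + [2016] * 4,
    'GDP': [0.2, 0.6, 1.0, 1.4, 0.3, 0.7, 1.1, 1.5],
    'Happiness_Score': [3.8, 4.9, 5.7, 7.0, 4.0, 4.6, 6.1, 6.9],
})

# Pearson correlation between GDP and the score, computed separately per year.
gdp_corr_by_year = {
    int(year): group['GDP'].corr(group['Happiness_Score'])
    for year, group in demo.groupby('Year')
}

for year, r in gdp_corr_by_year.items():
    print('{}: r = {:.3f}'.format(year, r))
```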

Happiness_Score vs. Family¶

plotlyScatterPlot(report_data, 'Family', 'Happiness_Score', [-0.05, 1.8])

Happiness_Score vs. Life_Expectancy¶

plotlyScatterPlot(report_data, 'Life_Expectancy', 'Happiness_Score', [-0.05, 1.2])

Step 3: Parallel Coordinate Map to Show How Each Column's Data Points Connect¶

coord_data = go.Parcoords(line = dict(color = report_data['Target_n'], colorscale = 'Temps'), 
                          dimensions=list([
                              dict(range=[report_data['Year'].min(), 
                                          report_data['Year'].max()],
                                   tickvals = report_data['Year'].unique(), 
                                   label='Year', values=report_data['Year']),
                              dict(range=[0, report_data['Target_n'].max()],
                                   tickvals = report_data['Target_n'].unique(), 
                                   ticktext = report_data['Target'].unique(),
                                   label='Targets', values=report_data['Target_n']),
                              dict(range=[(report_data['Rank'] * -1).min(), 
                                          (report_data['Rank'] * -1).max()],
                                   label='Rank', values=(report_data['Rank'] * -1)),
                              dict(range=[report_data['GDP'].min(), 
                                          report_data['GDP'].max()],
                                   label='GDP', values=report_data['GDP']),
                              dict(range=[report_data['Family'].min(), 
                                          report_data['Family'].max()],
                                   label='Family', values=report_data['Family']),
                              dict(range=[report_data['Life_Expectancy'].min(), 
                                          report_data['Life_Expectancy'].max()],
                                   label='Life_Expectancy', values=report_data['Life_Expectancy']),
                              dict(range=[report_data['Freedom'].min(), 
                                          report_data['Freedom'].max()],
                                   label='Freedom', values=report_data['Freedom']),
                              dict(range=[report_data['Generosity'].min(), 
                                          report_data['Generosity'].max()],
                                   label='Generosity', values=report_data['Generosity']),
                              dict(range=[report_data['Gov_Trustworthiness'].min(), 
                                          report_data['Gov_Trustworthiness'].max()],
                                   label='Gov_Trust', values=report_data['Gov_Trustworthiness']),
                              dict(range=[report_data['Happiness_Score'].min(), 
                                          report_data['Happiness_Score'].max()],
                                   label='Happy_Score', values=report_data['Happiness_Score'])
                          ]))

layout = go.Layout(
   title = '''Interactive Parallel Coordinate Plot
              <br><sup>(Click and Drag Vertically Along the Axes to Apply Filters)</sup>''',
   title_y=0.98, height=850, font=dict(size=15, color='black')
)

go.Figure(data=coord_data, layout=layout)
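Most of the dimension entries above follow the same min/max pattern, so they could be generated instead of written out by hand. The sketch below builds plain dicts with pandas only; `range_dimension` is a hypothetical helper and the rows are made up. A list like this could then be passed as the `dimensions` argument, since the cell above already supplies dicts there.

```python
import pandas as pd


def range_dimension(df, column, label=None):
    """Build one dimension dict whose axis spans the column's min..max."""
    return dict(range=[df[column].min(), df[column].max()],
                label=label or column, values=df[column])


# Made-up rows for illustration; only the column names mirror the notebook.
demo = pd.DataFrame({
    'GDP': [0.2, 0.9, 1.4],
    'Family': [0.5, 1.1, 1.3],
    'Gov_Trustworthiness': [0.05, 0.30, 0.45],
    'Happiness_Score': [3.5, 5.6, 7.1],
})

# Column name -> axis label (labels shortened as in the cell above).
metric_labels = {'GDP': 'GDP', 'Family': 'Family',
                 'Gov_Trustworthiness': 'Gov_Trust', 'Happiness_Score': 'Happy_Score'}

dimensions = [range_dimension(demo, col, label) for col, label in metric_labels.items()]

for dim in dimensions:
    print(dim['label'], dim['range'])
```

The Year, Targets, and Rank axes need extra settings (tick values, tick text, a sign flip), so they would still be written separately.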

Part 2 Conclusion¶

From the interactive plots, we can see that countries overall seem to be heading towards the right (higher/better scores), which is good because it would not be a good look for the world if countries as a whole were getting worse. There were some outliers here and there depending on the metric, but so far it seems to hold true that the higher the three highly correlated metrics identified in Part 1, Step 8 (GDP, Family, and Life_Expectancy) are, the happier the country is.

Part 3: Data Mapping (Geography)¶

This section will focus on plotting on geo-maps to bring all the data into perspective in the world view. I will be using Basemap from mpl_toolkits.basemap and Choropleth Maps from plotly.express to do the map plotting.

Step 1: Load and Parse World Country Codes¶

To plot maps with Plotly I'll need the 3-letter country codes (ISO alpha-3), and to get them I'll be scraping the "Current codes" table from the ISO_3166-1 Wikipedia page using pandas.read_html().

# This ssl line is needed to allow pandas to load the table from
# Wikipedia; otherwise an SSL "Invalid Certificate" error occurs.
# I'm unsure if this will happen on other systems, but I was unable to fix it on mine.
ssl._create_default_https_context = ssl._create_unverified_context

# Load in Wikipedia data table
world_codes = pd.read_html('https://en.wikipedia.org/wiki/ISO_3166-1')[1].rename(
    columns={'English short name (using title case)': 'World_Country',
             'Alpha-2 code': 'ISO_a2', 'Alpha-3 code': 'ISO_a3'})


# If for whatever reason pandas is unable to read the data correctly from the wikipedia page above, 
# I have included the data in a csv file to be read from instead: "Wikipedia_ISO_3166-1.csv"

# Uncomment the line below and comment the lines above to read from the csv file instead of the website 
# world_codes = pd.read_csv('Report_Data/Wikipedia_ISO_3166-1.csv')  # (Oct 9th, 2021)


# Get 3 letter country codes from pycountry
countries = {}
for country in pycountry.countries:
    countries[country.alpha_3] = country.name

world_codes = world_codes[world_codes.columns[:-3]]
world_codes['Country'] = [countries.get(country, 'Unknown Code') for country in list(world_codes['ISO_a3'])]

# Parse country names to make sure that they match the names in our dataset
# As you can see, there are a few that needed to be mapped manually
for country in world_codes['Country']:
    if "Unknown Code" in country:
        world_codes.loc[world_codes.Country == country, 
                        'Country'] = world_codes.loc[world_codes.Country == country, 'World_Country']
    elif "Côte d'Ivoire" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Ivory Coast"
    elif "Eswatini" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Swaziland"
    elif "Viet Nam" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Vietnam"
    elif "Congo" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Congo (Brazzaville)"
    elif "Congo," in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Congo (Kinshasa)"
    elif "Korea" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "South Korea"
    elif "Czech" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Czech Republic"
    elif "Russia" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Russia"
    elif "Somali" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Somalia"
    elif "Macedonia" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "North Macedonia"
    elif "Lao" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Laos"
    elif "Palestin" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Palestinian Territories"
    elif "Syria" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Syria"
    else:
        if ',' in country:
            country_part = country.split(',')[0]
            
            if country_part in country:
                world_codes.loc[world_codes.Country == country, 'Country'] = country_part

print("Dataset Dimensions: {:,} columns and {:,} rows".format(world_codes.shape[1], world_codes.shape[0]))
world_codes.head()

Dataset Dimensions: 4 columns and 249 rows
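For the exact-match cases in the chain above, a lookup table is a possible alternative to one branch per country. The sketch below covers only the four exact-match names from that cell (the partial matches using "in" are not reproduced), and the demo frame is made up for illustration.

```python
import pandas as pd

# Exact-match fixes taken from the branches above; the partial ("in")
# matches from the notebook are not reproduced in this sketch.
NAME_FIXES = {
    "Côte d'Ivoire": 'Ivory Coast',
    'Eswatini': 'Swaziland',
    'Viet Nam': 'Vietnam',
    'Congo': 'Congo (Brazzaville)',
}

# Made-up frame for illustration.
demo_codes = pd.DataFrame({'Country': ['Viet Nam', 'Eswatini', 'France', 'Congo']})

# Series.replace with a dict swaps whole values and leaves other names untouched.
demo_codes['Country'] = demo_codes['Country'].replace(NAME_FIXES)
print(demo_codes)
```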

Step 2: Load World Capital Coordinates¶

To visualize the maps using Basemap, I need coordinates (latitude and longitude) for the countries; in this case I'll be using the country capitals. This data can be retrieved from this site: http://techslides.com/list-of-countries-and-capitals. I am scraping the data table from the webpage itself (using pandas.read_html()) because the linked downloadable data sources are improperly formatted.

map_coords = pd.read_html('http://techslides.com/list-of-countries-and-capitals')[0]

# Apply Headers to dataframe from first row of table
new_header = map_coords.iloc[0]
map_coords = map_coords[1:]
map_coords.columns = [head.replace(' ', '_') for head in new_header]
map_coords = map_coords.apply(pd.to_numeric, errors='ignore')


# If for whatever reason pandas is unable to read the data correctly from the website above, 
# I have included the data in a csv file to be read from instead: "country-capital_coordinates.csv"

# Uncomment the line below and comment the lines above to read from the csv file instead of the website 
# map_coords = pd.read_csv('Report_Data/country-capital_coordinates.csv')  # (Oct 9th, 2021)


# Some manual country parsing to match the dataset
for country in map_coords['Country_Name']:
    if "Cote d’Ivoire" == country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Ivory Coast"
    elif "Palestin" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Palestinian Territories"
    elif "Macedonia" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "North Macedonia"
    elif "Gambia" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Gambia"
    elif "Republic of Congo" == country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Congo (Brazzaville)"
    elif "Democratic Republic of the Congo" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Congo (Kinshasa)"

print("Dataset Dimensions: {:,} columns and {:,} rows".format(map_coords.shape[1], map_coords.shape[0]))
map_coords.head()

Dataset Dimensions: 6 columns and 245 rows

Step 3: Merge Country Coordinates and Codes (ISO_a2 and ISO_a3) into Consolidated Report Dataset¶

Merge Codes¶

report_data_codes = report_data.merge(world_codes.drop('World_Country', axis=1), on='Country')

print("Dataset Dimensions: {:,} columns and {:,} rows".format(report_data_codes.shape[1], report_data_codes.shape[0]))
report_data_codes.head()

Dataset Dimensions: 14 columns and 775 rows

Merge Coordinates¶

report_data_coords = pd.merge(report_data_codes, 
                              map_coords[['Country_Name', 'Capital_Name', 'Capital_Latitude', 'Capital_Longitude']], 
                              left_on='Country', right_on='Country_Name'
                             ).drop('Country_Name', axis=1).sort_values(by=['Country', 'Year'], ascending=True
                                                                       ).reset_index(drop=True)

print("Dataset Dimensions: {:,} columns and {:,} rows".format(report_data_coords.shape[1], report_data_coords.shape[0]))
report_data_coords.head()

Dataset Dimensions: 17 columns and 775 rows

"Invalid" Countries¶

The output below lists the countries that do not have a valid country code and thus were not merged into the dataset.

North Cyprus and Northern Cyprus are recognized as just Cyprus in the dataset
Somaliland Region is recognized as Somalia in the dataset
Kosovo is not recognized

for country in report_data['Country'].unique():
    if country not in list(report_data_coords['Country'].unique()):
        print(country)

North Cyprus
Kosovo
Somaliland region
Somaliland Region
Northern Cyprus
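An alternative way to surface the unmatched countries, sketched here on made-up frames, is a left merge with `indicator=True`, which marks each row with where it was found. This is not what the notebook does (it uses an inner merge followed by the loop above); it is only a possible variant.

```python
import pandas as pd

# Made-up frames for illustration; the country names appear in the notebook.
demo_report = pd.DataFrame({'Country': ['Norway', 'Kosovo', 'North Cyprus'],
                            'Year': [2015, 2015, 2015]})
demo_codes = pd.DataFrame({'Country': ['Norway'], 'ISO_a3': ['NOR']})

# A left merge keeps every report row; the _merge column records the match status.
merged = demo_report.merge(demo_codes, on='Country', how='left', indicator=True)

# Rows tagged 'left_only' had no matching country code.
unmatched = sorted(merged.loc[merged['_merge'] == 'left_only', 'Country'].unique())
print(unmatched)
```

Keeping the unmatched rows visible this way makes it easier to decide whether to map them manually or drop them.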

Step 4: World Maps (Basemap)¶

World Map Plotting Function¶

def worldBasemap(df, col1, col2):
    sb.set(style=("white"), font_scale=1.5)
    
    m = Basemap(projection='mill', llcrnrlat=-60, urcrnrlat=90,
                llcrnrlon=-180, urcrnrlon=180, resolution='c')
    
    m.drawcountries()
    m.drawparallels(np.arange(-90, 91., 30.))
    m.drawmeridians(np.arange(-90, 90., 60.))
    
    
    lat = df['Capital_Latitude'].values
    long = df['Capital_Longitude'].values
    
    col_color = df[col1].values
    col_size = df[col2].values
    
    m.scatter(long, lat, latlon=True, c=col_color, s=150*col_size, 
              linewidth=1, edgecolors='black', cmap='hot', alpha=1)
    
    m.fillcontinents(color='#072B57', lake_color='#FFFFFF', alpha=0.4)
    plt.title("World - " + col1 + " vs. " + col2, fontsize=25)
    m.colorbar(label=col1)

Happiness_Score vs. GDP¶

plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'GDP')

Happiness_Score vs. Family¶

plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'Family')

Happiness_Score vs. Life_Expectancy¶

plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'Life_Expectancy')

The world maps above make it clear that much of Europe and the Americas are doing the best in terms of the metrics of this report. The maps would lead you to believe that all of Africa and much of Asia have a lot more room for development.

Step 5: Europe Maps (Basemap)¶

It is kind of hard to see what's going on in Europe, so let's zoom in a little.

Europe Map Plotting Function¶

def europeBasemap(df, col1, col2):
    sb.set(style=("white"), font_scale=1.5)
    
    m = Basemap(projection='mill', llcrnrlat=30, urcrnrlat=72,
                llcrnrlon=-20, urcrnrlon=55, resolution='l')
    
    m.drawstates()
    m.drawcountries()
    m.drawparallels(np.arange(-90, 91., 30.))
    m.drawmeridians(np.arange(-90, 90., 60.))
    
    lat = df['Capital_Latitude'].values
    lon = df['Capital_Longitude'].values
    
    col_color = df[col1].values
    col_size = df[col2].values
    
    m.scatter(lon, lat, latlon=True, c=col_color, s=250*col_size, 
              linewidth=2, edgecolors='black', cmap='hot', alpha=1)
    
    m.fillcontinents(color='#072B57', lake_color='#FFFFFF', alpha=0.3)
    plt.title('Europe - ' + col1 + ' vs. ' + col2, fontsize=25)
    m.colorbar(label=col1)

Happiness_Score vs. GDP¶

plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'GDP')

Happiness_Score vs. Family¶

plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'Family')

Happiness_Score vs. Life_Expectancy¶

plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'Life_Expectancy')

From the Europe maps above, we can see that much of northern and central Europe is faring the best in terms of the metrics, while much of southern Europe is lagging behind.

Step 6: World Maps (Plotly)¶

NOTE: If you are viewing this notebook in nbviewer, the plotly geo-maps will not be rendered because the connections needed to do so get blocked by the site, and I have been unable to find a workaround. As a result, if you want to view this notebook in its entirety, you will need to use Binder or a fully functional Jupyter environment instead.

The huge benefit of using plotly is that the maps can be animated and/or have filters applied to view the data a bit more dynamically. It makes it much easier to view data on a timescale.

Map Plotting Function¶

def plotlyMap(df, col, scope, height):
    slider = [dict(currentvalue={"prefix": "Year: "})]

    fig = px.choropleth(df.sort_values('Year'), locations="ISO_a3", scope=scope.lower(),
                        color=col, animation_frame="Year", animation_group="Country",
                        hover_name="Country", hover_data=["Year", "Rank", "Family", "Life_Expectancy", 
                                                          "Gov_Trustworthiness"],
                        color_continuous_scale=px.colors.sequential.haline).update_layout(
        autosize=False, height=height, width=980, sliders=slider, 
        title_text = 'Interactive ' + scope.capitalize() + ' Map - ' + col)
    
    fig.show()

Happiness_Score¶

plotlyMap(report_data_coords, 'Happiness_Score', 'world', 600)

GDP¶

plotlyMap(report_data_coords, 'GDP', 'world', 600)

It is worth pointing out that there was quite a downturn in world GDP in 2018, which appears to be related to a number of economic factors around the world (see the article "Economic growth is slowing all around the world").

Family¶

plotlyMap(report_data_coords, 'Family', 'world', 600)

Life_Expectancy¶

plotlyMap(report_data_coords, 'Life_Expectancy', 'world', 600)

It is very interesting to me to see how the world changes from year to year with this data. Being able to quickly look at each year of the data and compare them is very beneficial when doing this kind of analysis.

Step 7: Europe Maps (Plotly)¶

Just like in the Step 5: Europe Maps (Basemap) section, let's zoom in on Europe.

Happiness_Score¶

plotlyMap(report_data_coords, 'Happiness_Score', 'europe', 750)

GDP¶

plotlyMap(report_data_coords, 'GDP', 'europe', 750)

As we saw on the Plotly world GDP map in Step 6, there was quite a downturn in GDP in 2018 (see the article: Economic growth is slowing all around the world).

Family¶

plotlyMap(report_data_coords, 'Family', 'europe', 750)

Life_Expectancy¶

plotlyMap(report_data_coords, 'Life_Expectancy', 'europe', 750)

From the Europe maps above, we can see that much of northern and central Europe is faring a bit better in terms of the metrics, while eastern and southern Europe are lagging a bit behind.
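The regional impression from the maps could be spot-checked by grouping countries into sub-regions and averaging a metric. The sketch below uses 2019 Happiness_Score values copied from tables in this notebook. The sub-region labels are my own assumption, since the report data does not include them, and six countries are far too few to confirm the pattern.

```python
import pandas as pd

# 2019 Happiness_Score values copied from tables in this notebook.
europe_2019 = pd.DataFrame({
    "Country": ["Finland", "Denmark", "Switzerland", "Spain", "Greece", "Moldova"],
    "Happiness_Score": [7.769, 7.600, 7.480, 6.354, 5.287, 5.529],
})

# Hypothetical sub-region labels; not part of the report data.
sub_region = {
    "Finland": "Northern", "Denmark": "Northern", "Switzerland": "Central",
    "Spain": "Southern", "Greece": "Southern", "Moldova": "Eastern",
}
europe_2019["Sub_Region"] = europe_2019["Country"].map(sub_region)

# Mean score per sub-region, highest first.
region_means = (europe_2019.groupby("Sub_Region")["Happiness_Score"]
                .mean()
                .sort_values(ascending=False))

print(region_means.round(3))
```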

Part 3 Conclusion - Final Analysis¶

From the analysis in this notebook, it seems that some of the criticism of "The World Happiness Report" rings true: there is a heavy focus on a country's GDP, along with strongly correlated features such as Family and Life_Expectancy.

It does make sense, to an extent: having not only money but also a good social safety net (Family) is important and makes it easier for people to advance in life in whatever direction they choose. This also translates quite well to Life_Expectancy, because a greater ability to provide for yourself (and your Family) means access to better options in general.
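The relationship described here can be expressed as a correlation check. The following is a minimal sketch, assuming a handful of rows copied from tables in this notebook stand in for the full 782-row frame. The names `sample`, `score_corr` and `feature_corr` are illustrative, and the numbers it prints describe only this sample.

```python
import pandas as pd

# Rows copied from tables in this notebook; a small sample, not the full data.
sample = pd.DataFrame(
    [
        ("Switzerland", 2015, 1.39651, 1.34951, 0.94143, 7.587),
        ("Finland", 2019, 1.34000, 1.58700, 0.98600, 7.769),
        ("Singapore", 2019, 1.57200, 1.46300, 1.14100, 6.262),
        ("Japan", 2019, 1.32700, 1.41900, 1.08800, 5.886),
        ("Uzbekistan", 2018, 0.71900, 1.58400, 0.60500, 6.096),
        ("Cambodia", 2018, 0.54900, 1.08800, 0.45700, 4.433),
        ("Myanmar", 2015, 0.27108, 0.70905, 0.48246, 4.307),
        ("Rwanda", 2015, 0.22208, 0.77370, 0.42864, 3.465),
        ("Afghanistan", 2019, 0.35000, 0.51700, 0.36100, 3.203),
    ],
    columns=["Country", "Year", "GDP", "Family", "Life_Expectancy", "Happiness_Score"],
)

features = ["GDP", "Family", "Life_Expectancy"]

# Pearson correlation of each feature with the overall score.
score_corr = sample[features].corrwith(sample["Happiness_Score"])

# Pairwise correlation between the features themselves.
feature_corr = sample[features].corr()

print(score_corr.round(2))
print(feature_corr.round(2))
```

Running the same two calls on the full frame would show how strongly each feature tracks Happiness_Score and how much the features overlap with one another.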

Suffice it to say, money can indeed buy happiness.

	Country	Region	Happiness Rank	Happiness Score	Standard Error	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual
0	Switzerland	Western Europe	1	7.587	0.03411	1.39651	1.34951	0.94143	0.66557	0.41978	0.29678	2.51738
1	Iceland	Western Europe	2	7.561	0.04884	1.30232	1.40223	0.94784	0.62877	0.14145	0.43630	2.70201
2	Denmark	Western Europe	3	7.527	0.03328	1.32548	1.36058	0.87464	0.64938	0.48357	0.34139	2.49204
3	Norway	Western Europe	4	7.522	0.03880	1.45900	1.33095	0.88521	0.66973	0.36503	0.34699	2.46531
4	Canada	North America	5	7.427	0.03553	1.32629	1.32261	0.90563	0.63297	0.32957	0.45811	2.45176

	Country	Region	Happiness Rank	Happiness Score	Lower Confidence Interval	Upper Confidence Interval	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual
0	Denmark	Western Europe	1	7.526	7.460	7.592	1.44178	1.16374	0.79504	0.57941	0.44453	0.36171	2.73939
1	Switzerland	Western Europe	2	7.509	7.428	7.590	1.52733	1.14524	0.86303	0.58557	0.41203	0.28083	2.69463
2	Iceland	Western Europe	3	7.501	7.333	7.669	1.42666	1.18326	0.86733	0.56624	0.14975	0.47678	2.83137
3	Norway	Western Europe	4	7.498	7.421	7.575	1.57744	1.12690	0.79579	0.59609	0.35776	0.37895	2.66465
4	Finland	Western Europe	5	7.413	7.351	7.475	1.40598	1.13464	0.81091	0.57104	0.41004	0.25492	2.82596

	Country	Happiness.Rank	Happiness.Score	Whisker.high	Whisker.low	Economy..GDP.per.Capita.	Family	Health..Life.Expectancy.	Freedom	Generosity	Trust..Government.Corruption.	Dystopia.Residual
0	Norway	1	7.537	7.594445	7.479556	1.616463	1.533524	0.796667	0.635423	0.362012	0.315964	2.277027
1	Denmark	2	7.522	7.581728	7.462272	1.482383	1.551122	0.792566	0.626007	0.355280	0.400770	2.313707
2	Iceland	3	7.504	7.622030	7.385970	1.480633	1.610574	0.833552	0.627163	0.475540	0.153527	2.322715
3	Switzerland	4	7.494	7.561772	7.426227	1.564980	1.516912	0.858131	0.620071	0.290549	0.367007	2.276716
4	Finland	5	7.469	7.527542	7.410458	1.443572	1.540247	0.809158	0.617951	0.245483	0.382612	2.430182

	Overall rank	Country or region	Score	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices	Generosity	Perceptions of corruption
0	1	Finland	7.632	1.305	1.592	0.874	0.681	0.202	0.393
1	2	Norway	7.594	1.456	1.582	0.861	0.686	0.286	0.340
2	3	Denmark	7.555	1.351	1.590	0.868	0.683	0.284	0.408
3	4	Iceland	7.495	1.343	1.644	0.914	0.677	0.353	0.138
4	5	Switzerland	7.487	1.420	1.549	0.927	0.660	0.256	0.357

	Overall rank	Country or region	Score	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices	Generosity	Perceptions of corruption
0	1	Finland	7.769	1.340	1.587	0.986	0.596	0.153	0.393
1	2	Denmark	7.600	1.383	1.573	0.996	0.592	0.252	0.410
2	3	Norway	7.554	1.488	1.582	1.028	0.603	0.271	0.341
3	4	Iceland	7.494	1.380	1.624	1.026	0.591	0.354	0.118
4	5	Netherlands	7.488	1.396	1.522	0.999	0.557	0.322	0.298

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
0	Switzerland	2015	1	1.39651	1.34951	0.94143	0.66557	0.29678	0.41978	7.587	Top	3
1	Iceland	2015	2	1.30232	1.40223	0.94784	0.62877	0.43630	0.14145	7.561	Top	3
2	Denmark	2015	3	1.32548	1.36058	0.87464	0.64938	0.34139	0.48357	7.527	Top	3
3	Norway	2015	4	1.45900	1.33095	0.88521	0.66973	0.34699	0.36503	7.522	Top	3
4	Canada	2015	5	1.32629	1.32261	0.90563	0.63297	0.45811	0.32957	7.427	Top	3

	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score
count	782.000000	782.000000	782.000000	782.000000	782.000000	782.000000	782.000000	781.000000	782.000000
mean	2016.993606	78.698210	0.916047	1.078392	0.612416	0.411091	0.218576	0.125436	5.379018
std	1.417364	45.182384	0.407340	0.329548	0.248309	0.152880	0.122321	0.105816	1.127456
min	2015.000000	1.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	2.693000
25%	2016.000000	40.000000	0.606500	0.869363	0.440183	0.309767	0.130000	0.054000	4.509750
50%	2017.000000	79.000000	0.982205	1.124735	0.647310	0.431000	0.201980	0.091000	5.322000
75%	2018.000000	118.000000	1.236187	1.327250	0.808000	0.531000	0.278832	0.156030	6.189500
max	2019.000000	158.000000	2.096000	1.644000	1.141000	0.724000	0.838080	0.551910	7.769000

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
489	United Arab Emirates	2018	20	2.09600	0.77600	0.67000	0.28400	0.18600	NaN	6.774	Top	3
349	Qatar	2017	35	1.87077	1.27430	0.71010	0.60413	0.33047	0.43930	6.375	Top	3
193	Qatar	2016	36	1.82427	0.87964	0.71723	0.56679	0.32388	0.48049	6.375	Top	3
332	Luxembourg	2017	18	1.74194	1.45758	0.84509	0.59663	0.28318	0.31883	6.863	Top	3
177	Luxembourg	2016	20	1.69752	1.03999	0.84542	0.54870	0.27571	0.35329	6.871	Top	3

	Country	Year	Rank	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
567	Somalia	2018	98	0.71200	0.11500	0.67400	0.23800	0.28200	4.982	Low-Mid	1
233	Somalia	2016	76	0.33613	0.11466	0.56778	0.27225	0.31180	5.440	Top-Mid	2
469	Central African Republic	2017	155	0.00000	0.01877	0.27084	0.28088	0.05657	2.693	Low	0
737	Somalia	2019	112	0.69800	0.26800	0.55900	0.24300	0.27000	4.668	Low-Mid	1
119	Congo (Kinshasa)	2015	120	1.00120	0.09806	0.22605	0.24834	0.07625	4.517	Low	0

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
473	Iceland	2018	4	1.34300	1.64400	0.91400	0.67700	0.35300	0.13800	7.495	Top	3
629	Iceland	2019	4	1.38000	1.62400	1.02600	0.59100	0.35400	0.11800	7.494	Top	3
317	Iceland	2017	3	1.48063	1.61057	0.83355	0.62716	0.47554	0.15353	7.504	Top	3
477	New Zealand	2018	8	1.26800	1.60100	0.87600	0.66900	0.36500	0.38900	7.324	Top	3
470	Finland	2018	1	1.30500	1.59200	0.87400	0.68100	0.20200	0.39300	7.632	Top	3

	Country	Year	Rank	GDP	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target
312	Togo	2016	155	0.28123	0.24811	0.34678	0.17517	0.11587	3.303	Low
624	Central African Republic	2018	155	0.02400	0.01000	0.30500	0.21800	0.03800	3.083	Low
469	Central African Republic	2017	155	0.00000	0.01877	0.27084	0.28088	0.05657	2.693	Low
780	Central African Republic	2019	155	0.02600	0.10500	0.22500	0.23500	0.03500	3.083	Low
147	Central African Republic	2015	148	0.07850	0.06699	0.48879	0.23835	0.08289	3.678	Low

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
659	Singapore	2019	34	1.572	1.463	1.141	0.556	0.271	0.453	6.262	Top	3
701	Hong Kong	2019	76	1.438	1.277	1.122	0.440	0.258	0.287	5.430	Top-Mid	2
683	Japan	2019	58	1.327	1.419	1.088	0.445	0.069	0.140	5.886	Top-Mid	2
655	Spain	2019	30	1.286	1.484	1.062	0.362	0.153	0.079	6.354	Top	3
631	Switzerland	2019	6	1.452	1.526	1.052	0.572	0.263	0.343	7.480	Top	3

	Country	Year	Rank	GDP	Family	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
760	Swaziland	2019	135	0.81100	1.14900	0.31300	0.07400	0.13500	4.212	Low	0
122	Sierra Leone	2015	123	0.33024	0.95571	0.40840	0.21488	0.08786	4.507	Low	0
268	Sierra Leone	2016	111	0.36485	0.62800	0.30685	0.23897	0.08196	4.635	Low-Mid	1
453	Lesotho	2017	139	0.52102	1.19010	0.39066	0.15750	0.11909	3.808	Low	0
582	Sierra Leone	2018	113	0.25600	0.81300	0.35500	0.23800	0.05300	4.571	Low-Mid	1

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
513	Uzbekistan	2018	44	0.719	1.584	0.605	0.724	0.328	0.259	6.096	Top-Mid	2
589	Cambodia	2018	120	0.549	1.088	0.457	0.696	0.256	0.065	4.433	Low	0
471	Norway	2018	2	1.456	1.582	0.861	0.686	0.286	0.340	7.594	Top	3
472	Denmark	2018	3	1.351	1.590	0.868	0.683	0.284	0.408	7.555	Top	3
470	Finland	2018	1	1.305	1.592	0.874	0.681	0.202	0.393	7.632	Top	3

	Country	Year	Rank	GDP	Family	Life_Expectancy	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
290	Sudan	2016	133	0.63069	0.81928	0.29759	0.18077	0.10039	4.139	Low	0
779	Afghanistan	2019	154	0.35000	0.51700	0.36100	0.15800	0.02500	3.203	Low	0
454	Angola	2017	140	0.85843	1.10441	0.04987	0.09793	0.06972	3.795	Low	0
611	Angola	2018	142	0.73000	1.12500	0.26900	0.07900	0.06100	3.795	Low	0
111	Iraq	2015	112	0.98549	0.81889	0.60237	0.17922	0.13788	4.677	Low-Mid	1

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
428	Myanmar	2017	114	0.36711	1.12324	0.39752	0.51449	0.83808	0.18882	4.545	Low-Mid	1
276	Myanmar	2016	119	0.34112	0.69981	0.39880	0.42692	0.81971	0.20243	4.395	Low	0
128	Myanmar	2015	129	0.27108	0.70905	0.48246	0.44017	0.79588	0.19034	4.307	Low	0
395	Indonesia	2017	81	0.99554	1.27444	0.49235	0.44332	0.61170	0.01532	5.262	Low-Mid	1
599	Myanmar	2018	130	0.68200	1.17400	0.42900	0.58000	0.59800	0.17800	4.308	Low	0

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Gov_Trustworthiness	Happiness_Score	Target	Target_n
548	Greece	2018	79	1.15400	1.20200	0.87900	0.13100	0.04400	5.358	Low-Mid	1
101	Greece	2015	102	1.15406	0.92933	0.88213	0.07699	0.01397	4.857	Low-Mid	1
256	Greece	2016	99	1.24886	0.75473	0.80029	0.05822	0.04127	5.033	Low-Mid	1
707	Greece	2019	82	1.18100	1.15600	0.99900	0.06700	0.03400	5.287	Low-Mid	1
401	Greece	2017	87	1.28949	1.23941	0.81020	0.09573	0.04329	5.227	Low-Mid	1

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
153	Rwanda	2015	154	0.22208	0.77370	0.42864	0.59201	0.22628	0.55191	3.465	Low	0
27	Qatar	2015	28	1.69042	1.07860	0.79733	0.64040	0.32573	0.52208	6.611	Top	3
309	Rwanda	2016	152	0.32846	0.61586	0.31865	0.54320	0.23552	0.50521	3.515	Low	0
23	Singapore	2015	24	1.52186	1.02000	1.02525	0.54252	0.31105	0.49210	6.798	Top	3
2	Denmark	2015	3	1.32548	1.36058	0.87464	0.64938	0.34139	0.48357	7.527	Top	3

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Happiness_Score	Target	Target_n
536	Moldova	2018	67	0.65700	1.30100	0.62000	0.23200	0.17100	5.640	Top-Mid	2
244	Bosnia and Herzegovina	2016	87	0.93383	0.64367	0.70766	0.09511	0.29889	5.163	Low-Mid	1
696	Moldova	2019	71	0.68500	1.32800	0.73900	0.24500	0.18100	5.529	Top-Mid	2
404	Bosnia and Herzegovina	2017	90	0.98241	1.06934	0.70519	0.20440	0.32887	5.182	Low-Mid	1
73	Indonesia	2015	74	0.82827	1.08708	0.63793	0.46611	0.51535	5.399	Top-Mid	2

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	Target_n
626	Finland	2019	1	1.34000	1.58700	0.98600	0.59600	0.15300	0.39300	7.769	Top	3
470	Finland	2018	1	1.30500	1.59200	0.87400	0.68100	0.20200	0.39300	7.632	Top	3
627	Denmark	2019	2	1.38300	1.57300	0.99600	0.59200	0.25200	0.41000	7.600	Top	3
471	Norway	2018	2	1.45600	1.58200	0.86100	0.68600	0.28600	0.34000	7.594	Top	3
0	Switzerland	2015	1	1.39651	1.34951	0.94143	0.66557	0.29678	0.41978	7.587	Top	3

	World_Country	ISO_a2	ISO_a3	Country
0	Afghanistan	AF	AFG	Afghanistan
1	Åland Islands	AX	ALA	Åland Islands
2	Albania	AL	ALB	Albania
3	Algeria	DZ	DZA	Algeria
4	American Samoa	AS	ASM	American Samoa

	Country_Name	Capital_Name	Capital_Latitude	Capital_Longitude	Country_Code	Continent_Name
1	Afghanistan	Kabul	34.516667	69.183333	AF	Asia
2	Aland Islands	Mariehamn	60.116667	19.900000	AX	Europe
3	Albania	Tirana	41.316667	19.816667	AL	Europe
4	Algeria	Algiers	36.750000	3.050000	DZ	Africa
5	American Samoa	Pago Pago	-14.266667	-170.700000	AS	Australia

	Country	Year	Rank	GDP	Family	Life_Expectancy	Freedom	Generosity	Gov_Trustworthiness	Happiness_Score	Target	ISO_a2	ISO_a3	Capital_Name	Capital_Latitude	Capital_Longitude
0	Afghanistan	2015	153	0.31982	0.30285	0.30335	0.23414	0.36510	0.09719	3.575	Low	AF	AFG	Kabul	34.516667	69.183333
1	Afghanistan	2016	154	0.38227	0.11037	0.17344	0.16430	0.31268	0.07112	3.360	Low	AF	AFG	Kabul	34.516667	69.183333
2	Afghanistan	2017	141	0.40148	0.58154	0.18075	0.10618	0.31187	0.06116	3.794	Low	AF	AFG	Kabul	34.516667	69.183333
3	Afghanistan	2018	145	0.33200	0.53700	0.25500	0.08500	0.19100	0.03600	3.632	Low	AF	AFG	Kabul	34.516667	69.183333
4	Afghanistan	2019	154	0.35000	0.51700	0.36100	0.00000	0.15800	0.02500	3.203	Low	AF	AFG	Kabul	34.516667	69.183333