Air Pollution Data Scraping and Analysis of Seoul (Districts), South Korea

Overview #

This project collects air pollution data for the districts of Seoul, South Korea from multiple data sources, merges the sources into a single dataset, and analyzes the result.

The AQI (Air Quality Index) is also calculated from the PM2.5 measurements, using the levels specified by the EPA (Environmental Protection Agency), and appended to the dataset.
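
As a rough illustration of that calculation, the sketch below maps a 24-hour PM2.5 concentration (µg/m³) to an AQI value with the EPA's piecewise-linear formula. The breakpoint table is an assumption: it is the one the EPA used before its 2024 revision, and the notebooks may use different cut-offs. The function names `pm25_to_aqi` and `aqi_level` are hypothetical and do not come from the project.

```python
from decimal import Decimal, ROUND_DOWN

# (C_low, C_high, I_low, I_high) for 24-hour PM2.5 in µg/m³.
# Assumption: the EPA breakpoints in use before the 2024 revision.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

# Upper AQI bound of each EPA level and its label.
AQI_LEVELS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def pm25_to_aqi(concentration):
    """Return the AQI for a PM2.5 concentration, or None if out of range."""
    if concentration is None or concentration < 0:
        return None
    # The EPA method truncates PM2.5 to one decimal place before the lookup.
    truncated = Decimal(str(concentration)).quantize(
        Decimal("0.1"), rounding=ROUND_DOWN
    )
    c = float(truncated)
    for c_low, c_high, i_low, i_high in PM25_BREAKPOINTS:
        if c_low <= c <= c_high:
            # Linear interpolation inside the matching breakpoint band.
            return round((i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low)
    return None  # above the top of the table


def aqi_level(aqi):
    """Return the EPA level name for an AQI value, or None if unavailable."""
    if aqi is None:
        return None
    for upper_bound, label in AQI_LEVELS:
        if aqi <= upper_bound:
            return label
    return None
```

Returning `None` for missing or out-of-range readings, rather than raising, keeps the function easy to apply to a whole column that contains gaps.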

This project has four parts:

Part 1 - Data Collection from CSV #

This part parses a CSV dataset (from Kaggle) containing air pollution data for the districts of Seoul.
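
A minimal sketch of this kind of parsing, using only the standard library, is shown here. The column names (`Measurement date`, `Station code`, `PM2.5`) and the sample rows are assumptions made up for illustration; the actual Kaggle file and the notebook's own approach may differ.

```python
import csv
import io
from datetime import datetime

# Made-up rows for illustration only; the column names are assumptions.
SAMPLE_CSV = """Measurement date,Station code,PM10,PM2.5
2017-01-01 00:00,101,73.0,57.0
2017-01-01 01:00,101,71.0,59.0
2017-01-01 00:00,102,,
"""


def parse_pollution_csv(handle):
    """Read rows from an open CSV handle, skipping rows without a PM2.5 value."""
    rows = []
    for record in csv.DictReader(handle):
        raw_pm25 = record["PM2.5"].strip()
        if not raw_pm25:
            continue  # missing measurement
        rows.append(
            {
                "measured_at": datetime.strptime(
                    record["Measurement date"], "%Y-%m-%d %H:%M"
                ),
                "station": record["Station code"],
                "pm25": float(raw_pm25),
            }
        )
    return rows


rows = parse_pollution_csv(io.StringIO(SAMPLE_CSV))
```

For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` wrapper.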

View Jupyter Notebook Part-1 #

The notebook goes through the project step by step. To replicate the results, follow the directions and run the cells in order.

  • Click the "View Notebook" button to open the rendered notebook in a new tab
  • Click the "GitHub" button to view the project in the GitHub portfolio repo
View Notebook GitHub

Part 2 - Data Collection from Web-Scraping #

This section scrapes air pollution data from KOSIS (KOrean Statistical Information Service), the official source of statistical information for South Korea.
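
To give a feel for the scraping step, here is a minimal sketch that pulls the rows out of an HTML table with the standard library's `html.parser`. The markup in `SAMPLE_HTML`, the district values, and the `TableParser` class are all hypothetical; the real KOSIS page structure is not described here, and the notebook may rely on a different scraping library.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up markup and values for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>PM2.5</th></tr>
  <tr><td>Jongno-gu</td><td>24</td></tr>
  <tr><td>Jung-gu</td><td>23</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)

# Treat the first row as the header and pair it with each data row.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
```

In practice the HTML string would come from an HTTP request (for example `urllib.request.urlopen`) rather than a literal.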

View Jupyter Notebook Part-2 #

The notebook goes through the project step by step. To replicate the results, follow the directions and run the cells in order.

  • Click the "View Notebook" button to open the rendered notebook in a new tab
  • Click the "GitHub" button to view the project in the GitHub portfolio repo
View Notebook GitHub

Part 3 - Data Collection from an API #

The data in this section is collected from the AQICN API (docs), an API for retrieving air quality data from countries around the world. The organization behind it works in conjunction with WAQI (World Air Quality Index) and was originally founded in Beijing, China.
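
A minimal sketch of a request against the city feed endpoint follows. It assumes the `https://api.waqi.info/feed/<city>/?token=<token>` URL form and the `status` / `data` response layout from the public API documentation; the helper names are hypothetical, and a personal API token is required. Note that the `iaqi` values in the response are per-pollutant index values, not raw concentrations.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "https://api.waqi.info/feed"


def build_feed_url(city, token):
    """Build the city feed URL; `token` is the caller's own API token."""
    return f"{API_ROOT}/{quote(city)}/?token={quote(token)}"


def extract_reading(payload):
    """Pull the fields of interest out of a decoded feed response."""
    if payload.get("status") != "ok":
        # On failure the API puts an error message in "data".
        raise ValueError(f"API error: {payload.get('data')}")
    data = payload["data"]
    return {
        "measured_at": data["time"]["s"],
        "aqi": data["aqi"],
        # Individual index for PM2.5; absent for stations without a sensor.
        "pm25_iaqi": data.get("iaqi", {}).get("pm25", {}).get("v"),
    }


def fetch_city(city, token):
    """Request the feed for one city and return the extracted reading."""
    with urlopen(build_feed_url(city, token), timeout=10) as response:
        return extract_reading(json.load(response))
```

Keeping the parsing in `extract_reading` separate from the network call makes it possible to test the parsing against a saved response without using up API quota.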

View Jupyter Notebook Part-3 #

The notebook goes through the project step by step. To replicate the results, follow the directions and run the cells in order.

  • Click the "View Notebook" button to open the rendered notebook in a new tab
  • Click the "GitHub" button to view the project in the GitHub portfolio repo
View Notebook GitHub

Part 4 - Merging, Querying and Graphing Data from SQLite Database #

This final section merges all of the datasets into a single SQLite database, which is then queried to create graphs.
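
One way such a merge could look with Python's built-in `sqlite3` module is sketched below. The table name, column names, and rows are hypothetical placeholders, not the project's actual schema or data; an in-memory database is used so the sketch leaves no file behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurements (
        measured_at TEXT NOT NULL,  -- ISO 8601 date
        district    TEXT NOT NULL,
        pm25        REAL,
        source      TEXT NOT NULL,  -- 'csv', 'web' or 'api'
        PRIMARY KEY (measured_at, district, source)
    )
    """
)


def load_rows(conn, source, rows):
    """Insert (measured_at, district, pm25) tuples, tagged with their source."""
    conn.executemany(
        "INSERT OR REPLACE INTO measurements VALUES (?, ?, ?, ?)",
        [(measured_at, district, pm25, source) for measured_at, district, pm25 in rows],
    )
    conn.commit()


# Placeholder rows for illustration only.
load_rows(conn, "csv", [("2019-12-31", "Jongno-gu", 20.0)])
load_rows(conn, "web", [("2019-12-31", "Jung-gu", 22.0)])
load_rows(conn, "api", [("2021-06-01", "Jongno-gu", 15.0)])

# Query the merged table in time order, ready for plotting.
timeline = conn.execute(
    "SELECT measured_at, district, pm25, source "
    "FROM measurements ORDER BY measured_at, district"
).fetchall()

# Count how many rows each source contributed.
per_source = dict(
    conn.execute("SELECT source, COUNT(*) FROM measurements GROUP BY source")
)
```

Storing the source alongside each row keeps the origin of every measurement visible after the merge, which helps when explaining discontinuities between collection periods.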

Below is a graph of all the merged data, color-coded by the AQI for that time period:

NOTE: The gap in the graph reflects the gap in time between the collection periods: the API data was collected far more recently than the CSV and web-scraped data.

View Jupyter Notebook Part-4 #

The notebook goes through the project step by step. To replicate the results, follow the directions and run the cells in order.

  • Click the "View Notebook" button to open the rendered notebook in a new tab
  • Click the "GitHub" button to view the project in the GitHub portfolio repo
View Notebook GitHub