Air Pollution Data Scraping and Analysis of Seoul (Districts), South Korea
Date: 31 Jul 2020
Tag(s): Jupyter Notebook, Python, Pandas, SQLite, Database, Air Pollution, South Korea, AQI
Categories: Data Analysis, Data Scraping
Download Project: 25 MB - Zip Archive
Overview #
This project collects air pollution data for the districts of Seoul, South Korea, from three different sources (a CSV dataset, a scraped website, and an API), merges them together, and analyzes the result.
The AQI (Air Quality Index) is also calculated from the PM2.5 measurements, following the levels specified by the EPA (U.S. Environmental Protection Agency), and appended to the dataset.
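As a rough illustration of how an AQI value can be derived from a PM2.5 reading, here is a minimal sketch of the EPA's piecewise-linear formula. It assumes the 24-hour PM2.5 breakpoints that were in force when this project was written (the EPA revised them in 2024); the function name `pm25_to_aqi` is hypothetical, and the notebooks may implement the calculation differently.

```python
import math

# EPA 24-hour PM2.5 breakpoints (µg/m³) from before the 2024 revision.
# Assumption: this is the table the project relied on in 2020.
# Each entry is (conc_low, conc_high, aqi_low, aqi_high).
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_to_aqi(concentration):
    """Return the AQI for a PM2.5 concentration, or None if it is out of range."""
    if concentration < 0:
        return None
    # The EPA truncates PM2.5 to one decimal place before the lookup.
    # The small epsilon guards against floating-point error (e.g. 12.1 * 10).
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            # Linear interpolation inside the matching band.
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    return None  # above the highest breakpoint


print(pm25_to_aqi(25.0))
```

Appending the result to a dataset then amounts to applying this function to every PM2.5 value in the PM2.5 column.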
This project has four parts:
- Part 1 - Data Collection from CSV
- Part 2 - Data Collection from Web-Scraping
- Part 3 - Data Collection from an API
- Part 4 - Merging, Querying and Graphing Data from SQLite Database
Part 1 - Data Collection from CSV #
This part deals with parsing a CSV dataset (from Kaggle) containing air pollution data from different districts in Seoul.
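To give a feel for this step, the sketch below parses a small in-memory CSV and keeps only rows with a usable PM2.5 value. The column names (`Measurement date`, `Station code`, `PM2.5`) and the sample values are assumptions made up for illustration, not taken from the Kaggle file. The notebook itself uses Pandas; the standard-library `csv` module is used here only to keep the example dependency-free.

```python
import csv
import io

# Made-up sample; the real Kaggle dataset's columns and values may differ.
SAMPLE_CSV = """Measurement date,Station code,PM2.5
2017-01-01 00:00,101,57.0
2017-01-01 01:00,101,
2017-01-01 00:00,102,44.0
"""


def load_pm25_rows(handle):
    """Read CSV rows from a file-like object, skipping rows without PM2.5."""
    rows = []
    for record in csv.DictReader(handle):
        raw = (record.get("PM2.5") or "").strip()
        if not raw:
            continue  # missing measurement
        rows.append(
            {
                "measured_at": record["Measurement date"],
                "station": record["Station code"],
                "pm25": float(raw),
            }
        )
    return rows


rows = load_pm25_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0])
```

With a real file, the same function can be called on `open(path, newline="")` instead of the `StringIO` wrapper.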
View Jupyter Notebook Part-1 #
The notebook goes through the project step by step; please follow the directions and cell order if you would like to replicate the results.
- Click the "View Notebook" button to open the rendered notebook in a new tab
- Click the "GitHub" button to view the project in the GitHub portfolio repo
Part 2 - Data Collection from Web-Scraping #
This section scrapes air pollution data from KOSIS (KOrean Statistical Information Service), the official source of statistical information for South Korea.
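The general idea of pulling tabular data out of a page can be sketched as follows. The HTML snippet is a made-up stand-in, not the actual KOSIS markup, and `TableParser` is a hypothetical helper built on the standard library's `html.parser`; the notebook may use a different scraping library and will need selectors matched to the real page.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up markup and values, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>PM2.5</th></tr>
  <tr><td>Jongno-gu</td><td>23</td></tr>
  <tr><td>Gangnam-gu</td><td>21</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

The scraped values arrive as strings, so a conversion step (for example `float(record["PM2.5"])`) is still needed before they can be merged with the other sources.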
View Jupyter Notebook Part-2 #
The notebook goes through the project step by step; please follow the directions and cell order if you would like to replicate the results.
- Click the "View Notebook" button to open the rendered notebook in a new tab
- Click the "GitHub" button to view the project in the GitHub portfolio repo
Part 3 - Data Collection from an API #
The data in this section is collected from the AQICN API (docs), which is an API for retrieving air quality data from countries around the world. This organization works in conjunction with WAQI (World Air Quality Index) and was originally founded in Beijing, China.
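A request to this kind of API might look like the sketch below. It assumes the city feed endpoint `https://api.waqi.info/feed/<city>/?token=<token>` and a JSON response with `status`, `data.aqi`, `data.iaqi.pm25.v`, and `data.time.s` fields; check these against the API docs before relying on them. The helper names and the sample payload values are hypothetical, and the notebook may call the API differently.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.waqi.info/feed"  # assumed endpoint, see lead-in


def build_feed_url(city, token):
    """Build the feed URL for a city name and a personal API token."""
    return f"{API_BASE}/{urllib.parse.quote(city)}/?token={urllib.parse.quote(token)}"


def parse_feed(payload):
    """Extract the fields of interest from a decoded JSON response."""
    if payload.get("status") != "ok":
        raise ValueError(f"API error: {payload.get('data')}")
    data = payload["data"]
    return {
        "aqi": data.get("aqi"),
        "pm25": data.get("iaqi", {}).get("pm25", {}).get("v"),
        "measured_at": data.get("time", {}).get("s"),
    }


def fetch_feed(city, token):
    """Perform the HTTP request (needs network access and a valid token)."""
    with urllib.request.urlopen(build_feed_url(city, token), timeout=10) as response:
        return parse_feed(json.load(response))


# Made-up payload in the assumed response shape, so the parser runs offline.
sample = {
    "status": "ok",
    "data": {
        "aqi": 68,
        "iaqi": {"pm25": {"v": 68}},
        "time": {"s": "2020-07-30 14:00:00"},
    },
}
reading = parse_feed(sample)
print(reading)
```

Keeping URL construction, parsing, and the network call in separate functions makes the parsing logic testable without a token.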
View Jupyter Notebook Part-3 #
The notebook goes through the project step by step; please follow the directions and cell order if you would like to replicate the results.
- Click the "View Notebook" button to open the rendered notebook in a new tab
- Click the "GitHub" button to view the project in the GitHub portfolio repo
Part 4 - Merging, Querying and Graphing Data from SQLite Database #
This final section merges all the datasets into a single SQLite database, which is then queried to create graphs.
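One way to picture the merge is a single table that holds rows from all three sources, tagged by origin. The sketch below uses an in-memory database; the `measurements` table, its columns, and the sample rows are all hypothetical, and the notebook's actual schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would persist the database
conn.execute(
    """
    CREATE TABLE measurements (
        source      TEXT NOT NULL,  -- 'csv', 'web' or 'api'
        district    TEXT NOT NULL,
        measured_at TEXT NOT NULL,  -- ISO date, sorts chronologically as text
        pm25        REAL NOT NULL
    )
    """
)

# Made-up rows standing in for the output of Parts 1 to 3.
csv_rows = [("csv", "Jongno-gu", "2019-12-31", 30.0)]
web_rows = [
    ("web", "Jongno-gu", "2019-12-31", 28.0),
    ("web", "Gangnam-gu", "2019-12-31", 26.0),
]
api_rows = [("api", "Jongno-gu", "2020-07-30", 12.0)]

with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?, ?)",
        csv_rows + web_rows + api_rows,
    )

# Average PM2.5 per district and day across all sources.
daily = conn.execute(
    """
    SELECT district, measured_at, ROUND(AVG(pm25), 1) AS avg_pm25, COUNT(*) AS n
    FROM measurements
    GROUP BY district, measured_at
    ORDER BY measured_at, district
    """
).fetchall()

for row in daily:
    print(row)
```

Storing the source alongside each measurement keeps it possible to compare or filter the datasets after they have been merged.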
Below is a graph of all the merged data, color-coded by the AQI for that time period:
NOTE: The gap in the graph reflects the time between the two collection periods: the API data was collected far more recently than the CSV and Web data.
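Color-coding by AQI typically means mapping each value to one of the EPA's six categories. The sketch below shows one possible mapping; the function name `aqi_band` is hypothetical, the color names follow the EPA's standard scheme, and the graph in the notebook may use different shades or a different plotting approach.

```python
# (upper AQI bound, EPA category, EPA color name)
AQI_BANDS = [
    (50, "Good", "green"),
    (100, "Moderate", "yellow"),
    (150, "Unhealthy for Sensitive Groups", "orange"),
    (200, "Unhealthy", "red"),
    (300, "Very Unhealthy", "purple"),
    (500, "Hazardous", "maroon"),
]


def aqi_band(aqi):
    """Return (category, color) for a non-negative AQI value."""
    for upper, label, color in AQI_BANDS:
        if aqi <= upper:
            return label, color
    return "Hazardous", "maroon"  # values beyond the top of the scale


for value in (42, 78, 160):
    print(value, aqi_band(value))
```

The returned color names can be passed as point or bar colors to a plotting library that accepts named colors.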
View Jupyter Notebook Part-4 #
The notebook goes through the project step by step; please follow the directions and cell order if you would like to replicate the results.
- Click the "View Notebook" button to open the rendered notebook in a new tab
- Click the "GitHub" button to view the project in the GitHub portfolio repo