Machine Learning-Based Air Quality Prediction:

New Delhi Okhla Phase II Area Case Study

Sahil Negi

Machine learning is the discipline that lets computers learn without being explicitly programmed. It is one of the most intriguing technologies ever developed: as the name suggests, it gives computers a capability we usually think of as human, the ability to learn.

Over the last century, air pollution has become one of the most serious problems facing humanity because of its harmful effects on human health and the environment. The research presented in this article uses machine learning to predict air quality.

This study uses data from the Central Pollution Control Board (CPCB) to build and validate the model. The Okhla Phase II area of New Delhi, which suffers from severe air pollution, was selected as the study site, using CPCB data on major air pollutants from February 2018 to November 2022. The paper first presents monthly air quality assessments; the findings show that air quality in Okhla Phase II followed the same general trend as historical assessments over the study period.

According to this study, a significant proportion of the air pollution in Okhla Phase II is attributable to four pollutants: particulate matter 2.5 (PM2.5), particulate matter 10 (PM10), nitrogen dioxide (NO2), and carbon monoxide (CO). Air quality was therefore predicted from the concentrations of these four pollutants using five machine learning techniques: linear regression, decision tree regressor, random forest regressor, support vector regressor, and K-nearest neighbours (KNN). The study found the random forest regressor to be the most reliable algorithm for predicting air pollution, with a score of 99.3%.
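The article does not say which metric the 99.3% refers to. The sketch below assumes R² (the coefficient of determination), a common score for regression models, and shows the train/test/score loop for one of the five methods, K-nearest neighbours, in plain Python. The data are hypothetical synthetic values, not CPCB readings, and the helper names are illustrative; this is not the author's code. In practice one would typically use scikit-learn's `LinearRegression`, `DecisionTreeRegressor`, `RandomForestRegressor`, `SVR`, and `KNeighborsRegressor` with `sklearn.metrics.r2_score`.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_x, train_y, query, k=5):
    """Average the targets of the k training rows closest to `query`."""
    # Plain Euclidean distance; real features should be scaled first,
    # otherwise large-valued pollutants dominate the distance.
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


def train_test_split(rows, targets, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and split into train and test sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [targets[i] for i in train_idx], [targets[i] for i in test_idx])


# Hypothetical synthetic readings standing in for CPCB data:
# columns are PM2.5, PM10, NO2 (ug/m3) and CO (mg/m3).
rng = random.Random(42)
X = [[rng.uniform(20, 300), rng.uniform(50, 450),
      rng.uniform(10, 120), rng.uniform(0.5, 4.0)] for _ in range(200)]
# Hypothetical target driven mostly by particulate matter.
y = [0.6 * pm25 + 0.3 * pm10 + 0.1 * no2 for pm25, pm10, no2, _co in X]

X_train, X_test, y_train, y_test = train_test_split(X, y)
predictions = [knn_predict(X_train, y_train, row) for row in X_test]
score = r2_score(y_test, predictions)
print(f"KNN R^2 on held-out rows: {score:.3f}")
```

Because KNN relies on distances between rows, standardizing each pollutant column before fitting is usually advisable; tree-based models such as the random forest do not need this step.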

Figure: Pairplot of combined PM2.5, PM10, NO2, and CO concentrations.
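A pairplot draws each pollutant against every other one; a compact numeric counterpart is the matrix of pairwise Pearson correlation coefficients. The plain-Python sketch below computes it for a few hypothetical daily readings (not CPCB data). With pandas and seaborn, the same views are usually produced with `DataFrame.corr()` and `seaborn.pairplot()`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical daily readings, not CPCB data.
readings = {
    "PM2.5": [120.0, 180.0, 95.0, 240.0, 60.0],
    "PM10": [210.0, 300.0, 170.0, 390.0, 120.0],
    "NO2": [45.0, 60.0, 40.0, 70.0, 30.0],
    "CO": [1.2, 1.8, 1.1, 2.4, 0.8],
}

corr = correlation_matrix(readings)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a:>5} vs {b:<5} r = {r:+.2f}")
```

Strongly correlated inputs (for example PM2.5 and PM10) carry overlapping information, which is worth keeping in mind when reading feature importances from the regression models.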

Sources of pollution

There are five key sources of air pollution:

a. Vehicles: a growing number of highly polluting vehicles, such as trucks and diesel vehicles, offsets the gains from cleaner fuels and emission-control technologies.

b. Combustion in power plants and industries that burn dirty fuels such as petroleum coke, furnace oil (FO) and its variants, coal, and biomass.

c. Waste: incineration, landfills, and other waste collection and treatment activities.

d. Dust: poorly managed dust from roads, construction sites, and similar sources causes fine dust pollution.

e. Crop residue burning: farmers burn crop residues such as straw because they have no practical alternative for disposing of them.

The Air Quality Index

Figure: Air quality categories by AQI range (source: CPCB).
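As an illustration of how an AQI value is derived and categorized, the sketch below linearly interpolates a pollutant concentration into a sub-index, takes the highest sub-index as the overall AQI, and maps it to a category. The category ranges follow CPCB's National AQI; the PM2.5 breakpoints are included as an assumption (the Severe band is left out) and should be checked against CPCB's published table. CPCB also sets minimum-data rules (which pollutants must be available) that this sketch does not enforce. The function names are hypothetical.

```python
# CPCB National AQI categories and their AQI ranges.
AQI_CATEGORIES = [
    (0, 50, "Good"),
    (51, 100, "Satisfactory"),
    (101, 200, "Moderately Polluted"),
    (201, 300, "Poor"),
    (301, 400, "Very Poor"),
    (401, 500, "Severe"),
]

# Assumed 24-hour PM2.5 breakpoints (ug/m3) as (C_lo, C_hi, I_lo, I_hi).
# The Severe band (above 250 ug/m3) is omitted; verify every value
# against CPCB's published table before use.
PM25_BREAKPOINTS = [
    (0, 30, 0, 50),
    (31, 60, 51, 100),
    (61, 90, 101, 200),
    (91, 120, 201, 300),
    (121, 250, 301, 400),
]


def sub_index(conc, breakpoints):
    """Linearly interpolate a concentration onto the 0-500 AQI scale."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if conc <= c_hi:
            # I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo
            return (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
    raise ValueError("concentration is above the bands defined here")


def overall_aqi(sub_indices):
    """The overall AQI is the highest of the pollutant sub-indices."""
    return max(sub_indices.values())


def aqi_category(aqi):
    """Map an AQI value to its CPCB category name."""
    value = round(aqi)
    for lo, hi, name in AQI_CATEGORIES:
        if lo <= value <= hi:
            return name
    raise ValueError(f"AQI {aqi} is outside 0-500")


# Hypothetical readings: the PM10 sub-index (120) is given directly here.
pm25_index = sub_index(45.0, PM25_BREAKPOINTS)  # falls in the 31-60 band
aqi = overall_aqi({"PM2.5": pm25_index, "PM10": 120.0})
print(round(pm25_index), aqi, aqi_category(aqi))
```

Taking the maximum means a single pollutant in a poor band is enough to put the whole day in that band, which is why PM2.5 and PM10 so often determine Delhi's reported category.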

Air pollution is one of the biggest problems affecting people in metropolitan areas. It is caused by large numbers of motor vehicles, industrial emissions, and the combustion of petroleum products for power generation and transportation.

Over the past decades, two general approaches have been used to predict air pollution: deterministic and probabilistic. Diffusion modeling is one deterministic technique that has been developed in many places to analyze and monitor air pollution. The results of these models depend on their input data, and using them requires data on the distribution and diffusion of pollutants in the atmosphere. Because collecting the data that diffusion models need is difficult and impractical on a broad scale, researchers have moved to more effective techniques such as statistical modeling, and statistical approaches are now used more widely than deterministic methods for forecasting air pollution.
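To make the contrast concrete: a statistical forecaster learns directly from observed concentrations instead of simulating emissions and atmospheric dispersion. The minimal sketch below, a generic illustration not taken from the article, fits a lag-1 linear model `x[t+1] = a + b * x[t]` by ordinary least squares to a hypothetical series of monthly PM2.5 means (not CPCB data) and rolls it forward.

```python
def fit_lag1(series):
    """Least-squares fit of x[t+1] = a + b * x[t], an AR(1)-style model."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast(series, steps, a, b):
    """Roll the fitted model forward `steps` periods from the last value."""
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        predictions.append(last)
    return predictions


# Hypothetical monthly mean PM2.5 values (ug/m3), not CPCB data.
monthly_pm25 = [180.0, 150.0, 110.0, 85.0, 70.0, 60.0, 55.0, 65.0, 95.0, 160.0]
a, b = fit_lag1(monthly_pm25)
print(f"a = {a:.1f}, b = {b:.2f}, next 3 months: {forecast(monthly_pm25, 3, a, b)}")
```

A single lag ignores seasonal patterns; the machine learning regressors compared in this study instead predict air quality from several pollutant concentrations at once.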
