Table 3 Dataset details and counts before and after data pre-processing.

Pre-processing step	Before Pre-processing	After Pre-processing	Details
Total records	1,200,000	1,150,000	50,000 records were removed due to duplicates or incomplete timestamps
Missing values	15% of the total data	0%	Missing values were addressed using linear interpolation and advanced imputation techniques
Outliers	20,000	0	Outliers detected using IQR and z-score methods were either removed or replaced with median values
Categorical data (e.g., region)	10 unique values	10 one-hot encoded vectors	Regions were converted to binary vectors using one-hot encoding
Spatial features (e.g., AOD)	200,000 rows with missing AOD	200,000 rows completed	Missing satellite features were interpolated using geospatial mapping techniques
Time-series records	1,200,000	1,150,000	Temporal inconsistencies were corrected by aligning timestamps across all data sources
Noise reduction (PM2.5)	High variability	Smoothed trends	Noise was reduced using wavelet transforms, retaining meaningful patterns
Augmented data	0	50,000 additional samples	Time series were augmented with jittering and synthetic transformations
Dimensionality reduction	500 features	250 features	PCA was applied to reduce the redundant dimensions of satellite and meteorological data
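
Three of the steps listed in Table 3 (linear interpolation of missing values, IQR-based outlier replacement with the median, and one-hot encoding of regions) can be illustrated with a minimal sketch. The table does not give implementation details, so everything below is an assumption made for illustration: the function names are hypothetical, the IQR multiplier of 1.5 is the conventional default rather than a value stated in the source, and edge gaps are filled with the nearest observed value. It uses only the Python standard library and does not cover the z-score, wavelet, augmentation, or PCA steps.

```python
from statistics import median, quantiles


def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours. Gaps at either end take the nearest observed value
    (an assumption; the source does not say how edges were handled)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            # Weight by position between the two observed neighbours.
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def replace_iqr_outliers(values, k=1.5):
    """Replace points outside [Q1 - k*IQR, Q3 + k*IQR] with the median.
    k=1.5 is the common convention, assumed here."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    med = median(values)
    return [med if (v < low or v > high) else v for v in values]


def one_hot(labels):
    """Map each label to a binary vector with one column per category,
    columns ordered alphabetically."""
    categories = sorted(set(labels))
    return [[1 if lab == c else 0 for c in categories] for lab in labels]


if __name__ == "__main__":
    pm25 = [10, 12, None, 13, 12, 11, 500]   # toy readings, not real data
    filled = interpolate_linear(pm25)
    cleaned = replace_iqr_outliers(filled)
    print(filled)
    print(cleaned)
    print(one_hot(["north", "south", "north"]))
```

In this sketch interpolation runs before outlier handling so that the quartiles are computed on a complete series; the order used in the original pipeline is not stated in the table.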
