Predicting Air Quality in Wildfire Zones
Project Overview
- Developed machine learning models to predict air quality from diverse data sources
- Created two supporting R packages to streamline the research workflow
- Processed and analyzed large environmental and traffic datasets
Components
Air Quality Prediction Model
- Data cleansing and preparation
- Exploratory data analysis
- Geospatial operations
- Dataset integration
- Feature engineering
- Processing 100M+ rows of environmental data
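The cleansing, integration, geospatial, and feature engineering steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the column names (`sensor_id`, `lon`, `lat`, `pm25`), the synthetic readings, the single rectangular fire perimeter, and the choice of EPSG:3310 (California Albers) as the projected CRS are all hypothetical.

```r
library(sf)
library(dplyr)

# Synthetic sensor readings (values are made up for illustration only)
readings <- data.frame(
  sensor_id = c("a", "a", "b", "b", "c"),
  lon       = c(-122.40, -122.40, -121.90, -121.90, -122.10),
  lat       = c(37.80, 37.80, 37.30, 37.30, 37.60),
  pm25      = c(12.5, NA, 48.0, 52.0, -5.0)
)

# Data cleansing: drop missing and physically impossible PM2.5 values
clean <- readings %>%
  filter(!is.na(pm25), pm25 >= 0)

# Dataset integration: collapse to one row per sensor with its mean PM2.5
sensors <- clean %>%
  group_by(sensor_id, lon, lat) %>%
  summarise(mean_pm25 = mean(pm25), .groups = "drop")

# Geospatial operations: build sf points and project them to a CRS in metres
sensor_pts <- st_as_sf(sensors, coords = c("lon", "lat"), crs = 4326) %>%
  st_transform(3310)

# Hypothetical fire perimeter (a simple rectangle standing in for real data)
ring <- rbind(
  c(-122.0, 37.2), c(-121.8, 37.2), c(-121.8, 37.4),
  c(-122.0, 37.4), c(-122.0, 37.2)
)
fire <- st_sf(
  fire_id  = 1,
  geometry = st_sfc(st_polygon(list(ring)), crs = 4326)
) %>%
  st_transform(3310)

# Feature engineering: distance from each sensor to its nearest perimeter (km)
nearest <- st_nearest_feature(sensor_pts, fire)
sensor_pts$dist_fire_km <- as.numeric(
  st_distance(sensor_pts, fire[nearest, ], by_element = TRUE)
) / 1000

print(sensor_pts)
```

Projecting before measuring keeps distances in metres rather than degrees. A sensor that lies inside a perimeter gets a distance of zero, which may itself be a useful feature.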
Custom R Packages
- purpleAirAPI: Streamlined PurpleAir data collection
- DataOverviewR: Automated EDA and reporting
Technical Stack
- Languages: R
- Key Libraries: sf, dplyr, ggplot2, randomForest
- Data Sources: PurpleAir sensors, OpenStreetMap, Uber Movement, Air Quality System (US EPA), Wildland Fire Perimeters (CalFire), Weather (Iowa Environmental Mesonet)
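The stack lists randomForest, but the source does not say how the model was configured. As a minimal sketch of what a regression fit with that library might look like, the example below trains on purely synthetic data. The feature names (`dist_fire_km`, `wind_speed`, `traffic_idx`), the simulated PM2.5 relationship, and the `ntree` value are all assumptions made for illustration.

```r
library(randomForest)

set.seed(42)
n <- 200

# Synthetic features (hypothetical names, not the project's real schema)
features <- data.frame(
  dist_fire_km = runif(n, 0, 100),
  wind_speed   = runif(n, 0, 15),
  traffic_idx  = runif(n, 0, 1)
)

# Simulated target: PM2.5 decays with distance from fire, rises with traffic
features$pm25 <- 80 * exp(-features$dist_fire_km / 20) +
  10 * features$traffic_idx +
  rnorm(n, sd = 2)

# Simple random train/test split
train_idx <- sample(n, 150)
train <- features[train_idx, ]
test  <- features[-train_idx, ]

# Fit a regression forest and evaluate it on the held-out rows
fit  <- randomForest(pm25 ~ ., data = train, ntree = 200, importance = TRUE)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((pred - test$pm25)^2))

print(rmse)
print(importance(fit))
```

With real sensor data, a random split like this can leak information between nearby sensors or adjacent hours, so splitting by time or by location may give a more honest error estimate.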