
Data Preparation

The pre-analysis/ workspace sits upstream of EPM — it turns raw external datasets into clean, model-ready CSV inputs. It is split into two stages:

| Stage | Folder | Role |
|---|---|---|
| Open data | pre-analysis/open-data/ | Download, QA, and harmonize external datasets (APIs, shapefiles, atlases) |
| Prepare data | pre-analysis/prepare-data/ | Reshape curated datasets into EPM-ready CSVs (demand profiles, hydro availability, VRE profiles, representative days) |

Convention: drop raw and intermediate files in input/ and keep notebook outputs in output/. Copy only vetted deliverables into epm/input/data_<region>/ so the model folder stays clean.
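
The promotion step can be scripted so that only an explicit list of files ever leaves output/. The sketch below is a minimal illustration, assuming each notebook writes its deliverables flat into an output/ folder; the function name promote_deliverables and its arguments are hypothetical and not part of the EPM codebase.

```python
from pathlib import Path
import shutil


def promote_deliverables(output_dir, region, filenames, epm_root="epm"):
    """Copy vetted files from a notebook output/ folder into epm/input/data_<region>/.

    Hypothetical helper: only the files named in `filenames` are copied, so
    exploratory artefacts left in output/ never reach the model folder.
    """
    src_dir = Path(output_dir)
    dest_dir = Path(epm_root) / "input" / f"data_{region}"
    # Check every file first so a missing deliverable aborts before any copy.
    missing = [name for name in filenames if not (src_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Deliverables not found in {src_dir}: {missing}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in filenames:
        target = dest_dir / name
        shutil.copy2(src_dir / name, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Listing files explicitly (rather than globbing output/*.csv) is the design choice that enforces the convention: a call such as promote_deliverables("pre-analysis/prepare-data/output", region_code, ["pHours.csv", "pDemandProfile.csv"]) fails loudly if a deliverable is missing instead of silently copying whatever happens to be there.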


prepare-data workflows

| Notebook | Purpose | Key outputs |
|---|---|---|
| climatic_overview.ipynb | Profiles ERA5-Land climate to define seasons, wet/dry periods, and representative years per zone | Climate diagnostics and summary CSVs |
| load_profile.ipynb | Builds hourly demand profiles from monthly means and hourly shapes | load_profile.csv |
| load_profile_treatment.ipynb | Cleans historical load data (outlier removal, gap filling) | load_profile_treated.csv |
| load_plot.ipynb | QA plots for demand forecasts (peak vs. average, growth trends) | PNG/HTML dashboards |
| representative_days.ipynb | Clusters climate and load time series into reduced time slices | pHours.csv · pDemandProfile.csv · pVREProfile.csv |
| supply_demand_balance.ipynb | Checks that supply meets demand before running GAMS and flags deficits | Balance tables and plots |
| hydro_availability.ipynb | Converts monthly hydro shapes into reservoir availability and run-of-river (ROR) profiles | pAvailabilityCustom.csv · pVREgenProfile.csv |
| hydro_representative_years.ipynb | Selects representative hydropower years (dry/baseline/wet) | Candidate pAvailability_*.csv (review manually) |
| utils_climatic.py | Shared helpers for ERA5 extraction, aggregation, and plotting | |
| legacy_to_new_format/ | Migrates legacy SPLAT/EPM spreadsheets to the current column format | Intermediate CSVs |
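
The core idea behind load_profile.ipynb, scaling an hourly shape by monthly means, fits in a few lines. This is an illustration under stated assumptions, not the notebook's code: a single 24-value daily shape shared by all months, normalized to an average of 1.0 so each month's profile keeps that month's mean demand. The function name and the dict-based layout are hypothetical; the notebook may use separate shapes per month or day type before writing load_profile.csv.

```python
def build_hourly_profile(monthly_mean_mw, hourly_shape):
    """Scale a 24-hour shape by each month's mean demand (hypothetical helper).

    monthly_mean_mw: dict month (1-12) -> mean demand in MW.
    hourly_shape: 24 relative values; normalised here to average 1.0 so the
    daily mean of each month's profile equals that month's mean demand.
    Returns a dict (month, hour) -> demand in MW.
    """
    if len(hourly_shape) != 24:
        raise ValueError("hourly_shape must contain 24 values")
    shape_mean = sum(hourly_shape) / 24
    normalised = [v / shape_mean for v in hourly_shape]
    return {
        (month, hour): mean * normalised[hour]
        for month, mean in sorted(monthly_mean_mw.items())
        for hour in range(24)
    }


# A flat night and a doubled daytime shape, applied to two months.
profile = build_hourly_profile({1: 100.0, 7: 200.0}, [1.0] * 12 + [3.0] * 12)
print(profile[(1, 0)], profile[(7, 12)])  # 50.0 300.0
```

Normalizing the shape is what keeps energy consistent: the monthly means carry the volume, the shape only redistributes it across hours.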
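
For load_profile_treatment.ipynb, one common pattern is to mark outliers as missing and then fill all gaps in a single pass. The sketch below assumes a median-absolute-deviation (MAD) rule for outliers and linear interpolation for gaps; both rules, the threshold k, and the function name are illustrative assumptions, and the notebook's actual cleaning rules may differ.

```python
import bisect
import statistics


def treat_load_series(values, k=5.0):
    """Drop outliers and fill gaps (None) in an hourly load series.

    Assumed rules (illustrative only):
    - outlier: a value more than k * MAD away from the median, where MAD is
      the median absolute deviation of the observed values;
    - gap filling: linear interpolation between the nearest valid neighbours,
      with leading/trailing gaps filled by the nearest valid value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("series has no observations")
    median = statistics.median(observed)
    mad = statistics.median([abs(v - median) for v in observed])
    # Mark outliers as missing so they are filled like ordinary gaps.
    cleaned = [
        None if v is None or (mad > 0 and abs(v - median) > k * mad) else v
        for v in values
    ]
    valid = [i for i, v in enumerate(cleaned) if v is not None]
    filled = list(cleaned)
    for i, v in enumerate(cleaned):
        if v is not None:
            continue
        pos = bisect.bisect_left(valid, i)  # position of first valid index after i
        if 0 < pos < len(valid):
            j0, j1 = valid[pos - 1], valid[pos]
            weight = (i - j0) / (j1 - j0)
            filled[i] = cleaned[j0] + weight * (cleaned[j1] - cleaned[j0])
        else:
            # Leading or trailing gap: copy the nearest valid value.
            filled[i] = cleaned[valid[pos - 1] if pos > 0 else valid[0]]
    return filled
```

A median/MAD rule is used here because a single meter spike (e.g. 5000 MW in a ~100 MW series) would inflate a mean/standard-deviation threshold enough to hide itself.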
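
representative_days.ipynb reduces a year of daily climate and load series to a handful of weighted days. The sketch below uses plain k-means followed by a medoid step (each cluster is represented by the real day closest to its centroid, weighted by cluster size) purely as an illustration; the notebook's actual clustering method, feature scaling, and the layout of pHours.csv are not reproduced here. The function name and the feature layout are hypothetical.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def representative_days(days, k, iterations=50):
    """Select k representative days with k-means followed by a medoid step.

    days: equal-length feature vectors, one per day (e.g. 24 scaled hourly
    load values followed by 24 hourly solar capacity factors).
    Returns (day_index, weight) pairs; weight is the number of days the
    representative stands for, so the weights sum to len(days).
    """
    if not 0 < k <= len(days):
        raise ValueError("k must be between 1 and the number of days")

    def assign(centroids):
        # Index of the nearest centroid for every day.
        return [min(range(k), key=lambda c: _dist2(d, centroids[c])) for d in days]

    # Deterministic farthest-point initialisation (no random seed needed).
    centroids = [list(days[0])]
    while len(centroids) < k:
        far = max(days, key=lambda d: min(_dist2(d, c) for c in centroids))
        centroids.append(list(far))

    for _ in range(iterations):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [d for d, lab in zip(days, labels) if lab == c]
            # Mean of the members; an empty cluster keeps its old centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated

    labels = assign(centroids)
    result = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # Medoid: the real day closest to the cluster centroid.
            medoid = min(members, key=lambda i: _dist2(days[i], centroids[c]))
            result.append((medoid, len(members)))
    return result
```

Returning medoids rather than centroids keeps each representative day physically consistent (an averaged day would smooth out the coincident peaks that matter for capacity decisions). Features should be scaled comparably, for instance load in per-unit of peak, otherwise the largest-magnitude series dominates the distance.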
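
The check in supply_demand_balance.ipynb amounts to comparing hourly demand with available capacity before handing inputs to GAMS. This minimal sketch assumes each plant is described by an installed capacity and either a flat availability factor or an hourly capacity-factor profile; the dict keys, the reserve_margin parameter, and the function name are hypothetical. It ignores transmission limits between zones, so it only catches system-wide shortfalls.

```python
def flag_deficits(demand_mw, plants, reserve_margin=0.0):
    """Return the hours where available capacity cannot cover demand.

    demand_mw: list of hourly demand values (MW).
    plants: list of dicts with 'capacity_mw' and either a scalar
    'availability' (0-1) or an hourly 'profile' of capacity factors.
    reserve_margin: extra fraction of demand that supply must also cover.
    Returns a list of (hour, shortfall_mw) tuples.
    """
    deficits = []
    for hour, load in enumerate(demand_mw):
        supply = 0.0
        for plant in plants:
            # Hourly profile (VRE, ROR) takes precedence over a flat factor.
            if "profile" in plant:
                factor = plant["profile"][hour]
            else:
                factor = plant.get("availability", 1.0)
            supply += plant["capacity_mw"] * factor
        required = load * (1 + reserve_margin)
        if supply < required:
            deficits.append((hour, required - supply))
    return deficits
```

For example, a 100 MW thermal unit at 0.9 availability plus 50 MW of solar cannot meet a 100 MW evening load when the solar profile is zero, so hour 0 in a [100, 150, 80] demand series would be flagged with a 10 MW shortfall.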
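
hydro_representative_years.ipynb selects dry, baseline, and wet hydropower years. One simple way to shortlist candidates is to rank years by annual output. The sketch below is an assumption-labelled illustration: dry and wet are the extreme years and baseline is the year closest to the median; the notebook may rely on percentiles or inflow data instead, and its candidates still need the manual review noted in the table.

```python
import statistics


def pick_hydro_years(annual_output):
    """Shortlist dry / baseline / wet years from annual hydro output.

    annual_output: dict year -> annual energy (GWh) or mean capacity factor.
    Dry = lowest year, wet = highest year, baseline = year closest to the
    median (ties broken by the earlier year). Illustrative rule only.
    """
    if len(annual_output) < 3:
        raise ValueError("need at least three years")
    median = statistics.median(annual_output.values())
    dry = min(annual_output, key=annual_output.get)
    wet = max(annual_output, key=annual_output.get)
    baseline = min(annual_output, key=lambda y: (abs(annual_output[y] - median), y))
    return {"dry": dry, "baseline": baseline, "wet": wet}
```

Taking the absolute extremes makes the dry and wet cases sensitive to a single anomalous year; picking the years nearest, say, the 10th and 90th percentiles is a more robust variant of the same idea.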

open-data notebooks

| Notebook | Focus | Outputs |
|---|---|---|
| get_renewables_irena_data.ipynb | IRENA wind/solar capacity-factor profiles by zone and season | Hourly CF CSVs |
| get_renewable_ninja_data.ipynb | Renewables.ninja API: solar/wind profiles from plant coordinates | Hourly CF CSVs |
| get_renewables_coordinate.ipynb | Builds the coordinate list (lat/lon) from the generation catalog for Renewables.ninja | Coordinate CSV |
| get_generation_maps.ipynb | Interactive maps to verify generation database coverage and technology tagging | HTML/PNG maps |
| hydro_atlas_comparison.ipynb | Compares utility capacity factors with the African Hydropower Atlas | QA plots and comparison tables |
| hydro_basins.ipynb | Links plants to upstream GRDC catchments via HydroRIVERS shapefiles | GeoDataFrames and maps |
| hydro_capacity_factors.ipynb (WIP) | Merges the African Hydropower Atlas with the Global Hydropower Tracker | Draft merged tables |
| hydro_inflow.ipynb | Processes GRDC NetCDF inflow data and exports runoff diagnostics | Cleaned CSVs and Folium maps |
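
get_renewables_coordinate.ipynb turns the generation catalog into the lat/lon list that get_renewable_ninja_data.ipynb uses for its queries. A minimal sketch of that filtering step follows, assuming catalog columns named name, tech, lat, and lon (hypothetical; adapt to the real catalog headers). It stops at producing the CSV text and does not call the Renewables.ninja API.

```python
import csv
import io


def build_coordinate_list(catalog_rows, techs=("solar", "wind")):
    """Extract name, tech, lat, lon for VRE plants from a generation catalog.

    catalog_rows: iterable of dicts with hypothetical keys 'name', 'tech',
    'lat', 'lon'. Rows with missing or out-of-range coordinates are skipped.
    """
    records = []
    for row in catalog_rows:
        tech = (row.get("tech") or "").strip().lower()
        if tech not in techs:
            continue
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except (KeyError, TypeError, ValueError):
            continue  # blank or malformed coordinates
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            records.append({"name": row["name"], "tech": tech, "lat": lat, "lon": lon})
    return records


def to_csv(records):
    """Serialise the coordinate records as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "tech", "lat", "lon"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Dropping rows with blank or out-of-range coordinates here, before any API call, avoids spending request quota on points that cannot return a valid profile.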
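
At its core, hydro_atlas_comparison.ipynb is a plant-by-plant comparison of two capacity-factor sources. This sketch assumes both sources have already been matched to common plant names and flags plants whose relative difference exceeds a hypothetical 15% tolerance; the function name and threshold are illustrative. Plants present in only one source are listed separately so coverage gaps stay visible.

```python
def compare_capacity_factors(utility_cf, atlas_cf, tolerance=0.15):
    """Compare per-plant capacity factors from utility data and a hydropower atlas.

    utility_cf, atlas_cf: dict plant name -> capacity factor (0-1), assumed to
    use matching plant names. Returns (rows, unmatched): each row is
    (plant, utility, atlas, relative_diff, flagged) with
    relative_diff = (utility - atlas) / atlas; unmatched lists plants found
    in only one source.
    """
    rows = []
    for plant in sorted(utility_cf.keys() & atlas_cf.keys()):
        u, a = utility_cf[plant], atlas_cf[plant]
        if a == 0:
            rel = 0.0 if u == 0 else float("inf")
        else:
            rel = (u - a) / a
        rows.append((plant, u, a, rel, abs(rel) > tolerance))
    unmatched = sorted(utility_cf.keys() ^ atlas_cf.keys())
    return rows, unmatched
```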

Once exploratory outputs look correct, feed them into the deterministic routines in prepare-data/.