Preanalysis Folder Overview

This preanalysis folder contains all scripts, notebooks, and utilities used to prepare input data for the EPM (Electricity Planning Model).

The folder is organized into thematic subfolders, one per preprocessing area:

  • climatic data

  • generation data

  • hydro data

  • load data

  • representative days

Each thematic folder typically contains the following (load/ is currently only a placeholder):

  • Jupyter notebooks for data analysis or processing

  • Python utility modules with reusable functions, where needed

  • an input/ folder for raw or intermediate data

  • an output/ folder for processed results
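The layout convention above can be checked with a short script. This is a minimal sketch, assuming the thematic folders sit directly under a preanalysis root directory and that utility modules follow the utils_*.py naming used by utils_climatic.py, utils_ninja.py, and utils_reprdays.py. The functions summarize_theme and summarize_preanalysis are hypothetical and not part of the repository.

```python
from pathlib import Path
import tempfile


def summarize_theme(theme_dir):
    """Describe one thematic folder against the expected preanalysis layout."""
    theme_dir = Path(theme_dir)
    return {
        "notebooks": sorted(p.name for p in theme_dir.glob("*.ipynb")),
        # Assumes utility modules follow the utils_*.py naming used in this folder.
        "utils": sorted(p.name for p in theme_dir.glob("utils_*.py")),
        "has_input": (theme_dir / "input").is_dir(),
        "has_output": (theme_dir / "output").is_dir(),
    }


def summarize_preanalysis(root):
    """Summarize every thematic subfolder directly under the preanalysis root."""
    root = Path(root)
    return {
        d.name: summarize_theme(d)
        for d in sorted(root.iterdir())
        if d.is_dir() and not d.name.startswith(".")  # skip e.g. .ipynb_checkpoints
    }


if __name__ == "__main__":
    # Build a tiny throwaway tree instead of touching the real repository.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "climatic" / "input").mkdir(parents=True)
        (root / "climatic" / "output").mkdir()
        (root / "climatic" / "climatic_overview.ipynb").write_text("{}")
        (root / "climatic" / "utils_climatic.py").write_text("")
        (root / "load").mkdir()  # placeholder folder with no contents yet
        for theme, info in summarize_preanalysis(root).items():
            print(theme, info)
```

Reporting missing input/ or output/ folders, rather than raising, keeps the check usable for placeholder folders such as load/.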


Objective

The main objective of the preanalysis step is to produce clean, consistent input datasets in the format EPM expects. Each data area prepares specific inputs:

  • climatic → time series of renewable resource availability and climate conditions

  • generation → installed capacities and plant data

  • hydro → inflow, capacity, and basin-level data

  • load → demand profiles

  • representative days → reduced time slices for model efficiency

Organizing preanalysis this way makes updates easier and keeps data traceable when new data sources or regions are added to EPM.


Folder Structure Overview

Folder / File                               Description

climatic/                                   Prepares climate and renewable resource data, including retrieval from Renewables.ninja.
    ├─ climatic_overview.ipynb              Overview of climatic datasets and statistics.
    ├─ get_renewable_ninja_data.ipynb       Downloads and processes data from the Renewables.ninja API.
    ├─ utils_climatic.py                    Python functions for climate data manipulation.
    ├─ utils_ninja.py                       Utilities for accessing the Renewables.ninja API.
    ├─ input/                               Folder for raw climate-related input data.
    └─ output/                              Folder for processed climatic outputs.

generation/                                 Handles generation capacity, plant coordinates, and global power plant databases.
    ├─ clean_generation_epm.ipynb           Cleans generation data into the EPM input format.
    ├─ get_renewables_coordinate.ipynb      Retrieves geographic coordinates of renewable plants.
    ├─ global_database_overview.ipynb       Summarizes global generation databases.
    ├─ input/                               Folder for raw generation inputs.
    └─ output/                              Folder for processed generation outputs.

hydro/                                      Focuses on hydropower capacity, inflows, and atlas comparisons.
    ├─ hydro_atlas_comparison.ipynb         Compares hydropower datasets (e.g., Hydro Atlas).
    ├─ hydro_basins_maps.ipynb              Maps hydro basins and resources.
    ├─ hydro_capacity (in progress).ipynb   Under development; processes hydro capacity data.
    ├─ hydro_capacity_factor.ipynb          Computes hydro capacity factors for EPM.
    ├─ hydro_inflow_analysis.ipynb          Analyzes historical inflows for hydro modeling.
    ├─ input/                               Folder for raw hydro inputs.
    └─ output/                              Folder for processed hydro outputs.

load/                                       Placeholder for load-related preanalysis scripts and data.

representative_days/                        Clusters time series and builds representative days for EPM simulations.
    ├─ representative_days.ipynb            Computes representative days from time series data.
    ├─ utils_reprdays.py                    Python utilities for clustering and representative-day calculations.
    ├─ gams/                                GAMS-specific resources related to representative days.
    ├─ input/                               Folder for raw data used for clustering.
    └─ output/                              Folder for outputs such as cluster assignments and representative-day time series.
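The representative_days/ folder reduces a full time series to a few representative days by clustering. The sketch below illustrates that idea under stated assumptions: it runs plain k-means on the 24-hour daily profiles of a single hourly series, picks the member day closest to each cluster centre, and weights it by cluster size. The function name representative_days is hypothetical, and this is not the method implemented in utils_reprdays.py, which may use a different clustering technique, different features, or several series at once.

```python
import numpy as np


def representative_days(hourly, n_days, n_iter=100, seed=0):
    """Pick representative days from an hourly series using plain k-means.

    Returns (rep_days, weights, labels): for each non-empty cluster, the index
    of the member day closest to its centre and the number of days it stands
    for, plus the cluster label assigned to every day.
    """
    hourly = np.asarray(hourly, dtype=float)
    if hourly.ndim != 1 or hourly.size % 24 != 0:
        raise ValueError("expected a 1-D hourly series covering whole days")
    days = hourly.reshape(-1, 24)  # one row per day, one column per hour
    n = days.shape[0]
    if not 1 <= n_days <= n:
        raise ValueError("n_days must be between 1 and the number of days")

    def assign(centres):
        # Squared Euclidean distance from every day (rows) to every centre (columns).
        dist = ((days[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return dist, dist.argmin(axis=1)

    # Initialize centres with distinct randomly chosen days (seeded for repeatability).
    rng = np.random.default_rng(seed)
    centres = days[rng.choice(n, size=n_days, replace=False)]
    for _ in range(n_iter):
        _, labels = assign(centres)
        # Move each centre to the mean of its days; keep it if its cluster is empty.
        new_centres = np.array([
            days[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
            for k in range(n_days)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres

    dist, labels = assign(centres)
    rep_days, weights = [], []
    for k in range(n_days):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue  # drop clusters that ended up empty
        rep_days.append(int(members[dist[members, k].argmin()]))
        weights.append(int(members.size))
    return rep_days, weights, labels


if __name__ == "__main__":
    # Synthetic hourly series: a daily cycle plus a seasonal swing (illustrative only).
    hours = np.arange(365 * 24)
    series = (100 + 20 * np.sin(2 * np.pi * hours / 24)
              + 30 * np.sin(2 * np.pi * hours / (24 * 365)))
    rep, w, _ = representative_days(series, n_days=4)
    print("representative days:", rep, "weights:", w)
```

Choosing an actual member day rather than the cluster mean keeps the selected profiles physically observed, and because the weights sum to the number of input days, results on the representative days can be scaled back to the full period.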


Rationale: Input / Output Folders

Each thematic subfolder follows a consistent pattern:

  • input/ — stores:

    • raw external datasets

    • intermediate cleaned datasets

    • files downloaded from APIs or third-party tools

  • output/ — stores:

    • processed data ready to feed into EPM

    • summary statistics

    • visualizations and derived indicators

This separation supports:

  • reproducibility of data pipelines

  • clear tracking of data provenance

  • easy integration of updates from new input data sources
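The input/output convention can be illustrated with a small helper that reads a raw file from a theme's input/ folder, writes the processed result to its output/ folder without modifying the original, and records which input produced it. This is a minimal sketch: the function names theme_dirs and process_file, the text-in/text-out transform, and the .provenance.json sidecar file are hypothetical conventions chosen for illustration, not part of the repository.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def theme_dirs(root, theme):
    """Return (input_dir, output_dir) for a thematic folder such as 'hydro'."""
    base = Path(root) / theme
    return base / "input", base / "output"


def process_file(root, theme, raw_name, out_name, transform):
    """Apply `transform` (str -> str) to input/<raw_name> and save it as output/<out_name>.

    The raw file is only read, never overwritten. A sidecar JSON file
    (hypothetical convention) records the source path and its SHA-256 hash.
    """
    root = Path(root)
    input_dir, output_dir = theme_dirs(root, theme)
    src = input_dir / raw_name
    raw_bytes = src.read_bytes()
    output_dir.mkdir(parents=True, exist_ok=True)
    dst = output_dir / out_name
    dst.write_text(transform(raw_bytes.decode("utf-8")), encoding="utf-8")
    provenance = {
        "source": src.relative_to(root).as_posix(),
        "source_sha256": hashlib.sha256(raw_bytes).hexdigest(),
    }
    sidecar = output_dir / (out_name + ".provenance.json")
    sidecar.write_text(json.dumps(provenance, indent=2), encoding="utf-8")
    return dst


def strip_csv_fields(text):
    """Example transform: remove stray spaces around comma-separated fields."""
    lines = (",".join(f.strip() for f in line.split(",")) for line in text.splitlines())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        input_dir, _ = theme_dirs(tmp, "hydro")
        input_dir.mkdir(parents=True)
        (input_dir / "inflows_raw.csv").write_text("month, inflow\nJan, 12\n", encoding="utf-8")
        out = process_file(tmp, "hydro", "inflows_raw.csv", "inflows.csv", strip_csv_fields)
        print(out.read_text(encoding="utf-8"))
```

Hashing the raw bytes, rather than the decoded text, means the recorded fingerprint changes whenever the input file itself changes, which makes it easy to tell whether an output is stale.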