Get Renewables Coordinates

Get Renewables Coordinates#

Objective Identify the most representative solar and wind project coordinates per Central African country so other notebooks can reuse consistent lat/long inputs.

Data Requirements & Methods

Provide Global-Integrated-Power-April-2025.xlsx under pre-analysis/open-data/input/.
The workflow filters the Global Integrated Power dataset for focus countries and renewable technologies, prioritizes projects by status and capacity, and saves both tabular and dictionary outputs.

Overview of Steps

Step 1 - Import core libraries.
Step 2 - List the target countries.
Step 3 - Set up input and output folders.
Step 4 - Load and filter facility data.
Step 5 - Select the most relevant solar and wind projects.
Step 6 - Display and export the summary table.
Step 7 - Format coordinates for downstream scripts.

Step 1 - Import required libraries#

import os
import pandas as pd

Step 2 - Define target countries#

countries = ['Angola', 'Burundi', 'Cameroon', 'Central African Republic', 'Chad', 'Republic of the Congo', 'DR Congo', 'Equatorial Guinea', 'Gabon']

Step 3 - Prepare input and output folders#

folder_input = 'input'
if not os.path.exists(folder_input): os.mkdir(folder_input)
folder_output = 'output'
if not os.path.exists(folder_output): os.mkdir(folder_output)

Step 4 - Load and filter the power facilities data#

path = os.path.join(folder_input, 'Global-Integrated-Power-April-2025.xlsx')
if not os.path.exists(path):
    raise FileNotFoundError(f"File not found: {path}. Please ensure the file exists in the specified directory.")

data_power = pd.read_excel(path, sheet_name='Power facilities', index_col=None, header=[0])

# Print if missing countries
for c in countries:
    if c not in data_power['Country/area'].unique():
        print(f"Warning: {c} not found in the dataset.")

data_power = data_power[data_power['Country/area'].isin(countries)]
print(data_power['Country/area'].unique())
data_power = data_power[data_power['Type'].isin(['solar', 'wind'])]

/opt/anaconda3/envs/HydroData/lib/python3.10/site-packages/openpyxl/worksheet/_read_only.py:85: UserWarning: Unknown extension is not supported and will be removed
  for idx, row in parser.parse():

Step 5 - Extract the most relevant projects#

def get_most_relevant_projects(df, countries, tech_types):
    """
    Extract the most relevant energy project for each country and technology.

    Parameters:
    - df: pandas DataFrame with columns ['Country/area', 'Type', 'Status', 'Capacity (MW)',
                                         'Plant / Project name', 'Latitude', 'Longitude', 'City']
    - countries: list of country names to filter
    - tech_types: list of technology types to check (values from 'Type' column)

    Returns:
    - A DataFrame with the most relevant project per country and technology.
    """

    results = []

    status_priority = ['operating', 'construction', 'pre-construction', 'announced']

    for country in countries:
        for tech in tech_types:
            filtered = df[(df['Country/area'] == country) & (df['Type'] == tech)]

            found = False
            for status in status_priority:
                sub = filtered[filtered['Status'] == status]
                if not sub.empty:
                    top_project = sub.loc[sub['Capacity (MW)'].idxmax()]
                    results.append({
                        'Country': country,
                        'Type': tech,
                        'Plant / Project name': top_project['Plant / Project name'],
                        'Capacity (MW)': top_project['Capacity (MW)'],
                        'Latitude': top_project['Latitude'],
                        'Longitude': top_project['Longitude'],
                        'City': top_project['City']
                    })
                    found = True
                    break  # stop at the first status with a result
                    # Fallback: search all remaining statuses

            if not found:
                remaining_statuses = set(df['Status'].unique()) - set(status_priority)
                sub = filtered[filtered['Status'].isin(remaining_statuses)]
                if not sub.empty:
                    top_project = sub.loc[sub['Capacity (MW)'].idxmax()]
                    results.append({
                        'Country': country,
                        'Type': tech,
                        'Plant / Project name': top_project['Plant / Project name'],
                        'Capacity (MW)': top_project['Capacity (MW)'],
                        'Latitude': top_project['Latitude'],
                        'Longitude': top_project['Longitude'],
                        'City': top_project['City']
                    })

    return pd.DataFrame(results)

projects = get_most_relevant_projects(data_power, countries, ['solar', 'wind'])

Step 6 - Display and save results#

display(projects)
projects.to_csv(os.path.join(folder_output, 'most_relevant_projects_solar_wind.csv'), index=False, encoding='utf-8')

# Print missing countries and technologies
for tech in ['solar', 'wind']:
    temp = projects[projects['Type'] == tech]
    for c in countries:
        if c not in temp['Country'].unique():
            print(f"Warning: No projects found for {c} for {tech}.")

	Country	Type	Plant / Project name	Capacity (MW)	Latitude	Longitude	City
0	Angola	solar	Biópio solar farm	189.0	-12.4667	15.7442	Biópio
1	Angola	wind	Morro do Ouro wind farm	50.0	-15.2198	12.7258	NaN
2	Burundi	solar	Bubanza SB solar project	8.0	-3.1479	29.3625	Bubanza
3	Cameroon	solar	Garoua solar farm	30.0	9.3271	13.3621	Garoua
4	Cameroon	wind	Cameroon wind farm	100.0	4.8918	12.3598	NaN
5	Central African Republic	solar	Enerca Danzi solar farm	25.0	4.5167	18.4667	Danzi
6	Chad	solar	N'Djamena (Qair) solar farm	21.5	12.0905	15.1331	Gassi District
7	Chad	wind	Centrales d’Energie Renouvelable de N’Djamena ...	100.0	12.1232	15.0712	N'Djamena
8	Republic of the Congo	solar	District De Kabo solar project	2.0	1.3863	16.3364	NaN
9	DR Congo	solar	Goma Nyiragongo solar project	1.5	-1.6414	29.1938	Goma
10	Gabon	solar	Ayémé Plaine solar farm	60.0	0.2293	9.5315	Ayémé Plaine

Warning: No projects found for Equatorial Guinea for solar.
Warning: No projects found for Burundi for wind.
Warning: No projects found for Central African Republic for wind.
Warning: No projects found for Republic of the Congo for wind.
Warning: No projects found for DR Congo for wind.
Warning: No projects found for Equatorial Guinea for wind.
Warning: No projects found for Gabon for wind.

Step 7 - Format project coordinates for downstream scripts#

Transform the summary table into the dictionary structure expected by other notebooks (e.g., get_renewable_ninja_data.ipynb).

def format_project_locations(df_relevant):
    """
    Convert the most relevant projects DataFrame into a dictionary formatted like the original 'locations' dict.

    Returns:
        dict with structure: {tech: {country_key: (lat, lon)}}
    """
    result = {}

    for _, row in df_relevant.iterrows():
        tech = row['Type'].lower()  # e.g., 'solar', 'wind'
        #country_key = row['Country'].lower().replace(' ', '').replace('-', '')  # match the original format
        if tech not in result:
            result[tech] = {}

        result[tech][row['Country']] = (row['Latitude'], row['Longitude'])

    return result

# Format the projects into the desired structure
formatted_projects = format_project_locations(projects)
display(formatted_projects)

{'solar': {'Angola': (-12.4667, 15.7442),
  'Burundi': (-3.1479, 29.3625),
  'Cameroon': (9.3271, 13.3621),
  'Central African Republic': (4.5167, 18.4667),
  'Chad': (12.0905, 15.1331),
  'Republic of the Congo': (1.3863, 16.3364),
  'DR Congo': (-1.6414, 29.1938),
  'Gabon': (0.2293, 9.5315)},
 'wind': {'Angola': (-15.2198, 12.7258),
  'Cameroon': (4.8918, 12.3598),
  'Chad': (12.1232, 15.0712)}}