Time Series Clustering with cluster()¶
Accelerate investment optimization using typical periods (clustering).
This notebook demonstrates:
- Typical periods: Cluster similar time segments (e.g., days) and solve only representative ones
- Weighted costs: Automatically weight operational costs by cluster occurrence
- Two-stage workflow: Fast sizing with clustering, accurate dispatch at full resolution
!!! note "Requirements" This notebook requires the tsam package with ClusterConfig and ExtremeConfig support. Install with: pip install "flixopt[full]"
import timeit
import pandas as pd
import xarray as xr
import flixopt as fx
fx.CONFIG.notebook()
flixopt.config.CONFIG
Create the FlowSystem¶
We use a district heating system with real-world time series data (one month at hourly resolution):
from data.generate_example_systems import create_district_heating_system
flow_system = create_district_heating_system()
flow_system.connect_and_transform()
timesteps = flow_system.timesteps
flow_system
FlowSystem
==========
Timesteps: 744 (Hour) [2020-01-01 to 2020-01-31]
Periods: None
Scenarios: None
Status: ✓

Components (9 items)
--------------------
 * Boiler
 * CHP
 * CoalSupply
 * ElecDemand
 * GasGrid
 * GridBuy
 * GridSell
 * HeatDemand
 * Storage

Buses (4 items)
---------------
 * Coal
 * Electricity
 * Gas
 * Heat

Effects (2 items)
-----------------
 * CO2
 * costs

Flows (13 items)
----------------
 * Boiler(Q_fu)
 * Boiler(Q_th)
 * CHP(P_el)
 * CHP(Q_fu)
 * CHP(Q_th)
 * CoalSupply(Q_Coal)
 * ElecDemand(P_el)
 * GasGrid(Q_Gas)
 * GridBuy(P_el)
 * GridSell(P_el)
 ... (+3 more)
# Visualize input data
input_ds = xr.Dataset(
{
'Heat Demand': flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile,
'Electricity Price': flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs'],
}
)
input_ds.plotly.line(x='time', facet_row='variable', title='One Month of Input Data')
Method 1: Full Optimization (Baseline)¶
First, solve the complete problem with all 744 timesteps:
solver = fx.solvers.HighsSolver(mip_gap=0.01)
start = timeit.default_timer()
fs_full = flow_system.copy()
fs_full.name = 'Full Optimization'
fs_full.optimize(solver)
time_full = timeit.default_timer() - start
Method 2: Clustering with cluster()¶
The cluster() method:
- Clusters similar days using the TSAM (Time Series Aggregation Module) package
- Reduces timesteps to only typical periods (e.g., 8 typical days = 192 timesteps)
- Weights costs by how many original days each typical day represents (see the sketch below)
- Handles storage with configurable behavior via cluster_mode
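Conceptually, the weighted operational cost can be sketched as follows (this is our own notation for illustration, not flixopt's internal formulation):

$$C_\mathrm{op} \approx \sum_{c=1}^{n_\mathrm{clusters}} w_c \, C_c$$

where $C_c$ is the operational cost over one occurrence of typical period $c$ and $w_c$ is the number of original days assigned to cluster $c$ (the cluster_occurrences described further below).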
!!! warning "Peak Forcing" Always use extremes=ExtremeConfig(max_value=[...]) to ensure extreme demand days are captured. Without this, clustering may miss peak periods, causing undersized components.
from tsam import ExtremeConfig
start = timeit.default_timer()
# IMPORTANT: Force inclusion of peak demand periods!
peak_series = ['HeatDemand(Q_th)|fixed_relative_profile']
# Create reduced FlowSystem with 8 typical days
fs_clustered = flow_system.transform.cluster(
n_clusters=8, # 8 typical days
cluster_duration='1D', # Daily clustering
extremes=ExtremeConfig(
method='new_cluster', max_value=peak_series, preserve_n_clusters=True
), # Capture peak demand day
)
fs_clustered.name = 'Clustered (8 days)'
time_clustering = timeit.default_timer() - start
# Optimize the reduced system
start = timeit.default_timer()
fs_clustered.optimize(solver)
time_clustered = timeit.default_timer() - start
Understanding the Clustering¶
The clustering algorithm groups similar days together. Access all metadata via fs.clustering:
# Access clustering metadata directly
clustering = fs_clustered.clustering.results
clustering
ClusteringResults(n_clusters=8)
# Show clustering info using __repr__
fs_clustered.clustering
Clustering( 31 periods → 8 clusters timesteps_per_cluster=24 dims=[] )
# Quality metrics - how well do the clusters represent the original data?
# Lower RMSE/MAE = better representation
fs_clustered.clustering.metrics.to_dataframe().style.format('{:.3f}')
| time_series | RMSE | MAE | RMSE_duration |
|---|---|---|---|
| ElecDemand(P_el)\|fixed_relative_profile | 0.056 | 0.016 | 0.030 |
| GasGrid(Q_Gas)\|costs\|per_flow_hour | 0.109 | 0.079 | 0.079 |
| GridBuy(P_el)\|costs\|per_flow_hour | 0.108 | 0.070 | 0.030 |
| GridSell(P_el)\|costs\|per_flow_hour | 0.108 | 0.070 | 0.029 |
| HeatDemand(Q_th)\|fixed_relative_profile | 0.081 | 0.050 | 0.017 |
# Visual comparison: original vs clustered time series
fs_clustered.clustering.plot.compare()
Inspect Clustering Input Data¶
Before clustering, you can inspect which time-varying data will be used. The clustering_data() method returns only the arrays that vary over time (constant arrays are excluded since they don't affect clustering):
# See what data will be used for clustering
clustering_data = flow_system.transform.clustering_data()
print(f'Variables used for clustering ({len(clustering_data.data_vars)} total):')
for var in clustering_data.data_vars:
print(f' - {var}')
Variables used for clustering (5 total):
 - GasGrid(Q_Gas)|costs|per_flow_hour
 - GridBuy(P_el)|costs|per_flow_hour
 - GridSell(P_el)|costs|per_flow_hour
 - HeatDemand(Q_th)|fixed_relative_profile
 - ElecDemand(P_el)|fixed_relative_profile
# Visualize the time-varying data (select a few key variables)
key_vars = [v for v in clustering_data.data_vars if 'fixed_relative_profile' in v or 'per_flow_hour' in v]
clustering_data[key_vars].plotly.line(facet_row='variable', title='Time-Varying Data Used for Clustering')
Selective Clustering with data_vars¶
By default, clustering uses all time-varying data to determine typical periods. However, you may want to cluster based on only a subset of variables while still applying the clustering to all data.
Use the data_vars parameter to specify which variables determine the clustering:
- Cluster based on subset: Only the specified variables affect which days are grouped together
- Apply to all data: The resulting clustering is applied to ALL time-varying data
This is useful when:
- You want to cluster based on demand patterns only (ignoring price variations)
- You have dominant time series that should drive the clustering
- You want to ensure certain patterns are well-represented in typical periods
# Cluster based ONLY on heat demand pattern (ignore electricity prices)
demand_var = 'HeatDemand(Q_th)|fixed_relative_profile'
fs_demand_only = flow_system.transform.cluster(
n_clusters=8,
cluster_duration='1D',
data_vars=[demand_var], # Only this variable determines clustering
extremes=ExtremeConfig(method='new_cluster', max_value=[demand_var], preserve_n_clusters=True),
)
# Verify: clustering was determined by demand but applied to all data
print(f'Clustered using: {demand_var}')
print(f'But all {len(clustering_data.data_vars)} variables are included in the result')
Clustered using: HeatDemand(Q_th)|fixed_relative_profile
But all 5 variables are included in the result
# Compare metrics of the first time series: clustering with all data vs. demand-only
pd.DataFrame(
{
'All Variables': fs_clustered.clustering.metrics.to_dataframe().iloc[0],
'Demand Only': fs_demand_only.clustering.metrics.to_dataframe().iloc[0],
}
).round(4)
|  | All Variables | Demand Only |
|---|---|---|
| RMSE | 0.0563 | 0.1262 |
| MAE | 0.0157 | 0.0447 |
| RMSE_duration | 0.0303 | 0.0295 |
Advanced Clustering Options¶
The cluster() method exposes many parameters for fine-tuning:
from tsam import ClusterConfig
# Try different clustering algorithms
fs_kmeans = flow_system.transform.cluster(
n_clusters=8,
cluster_duration='1D',
cluster=ClusterConfig(method='kmeans'), # Alternative: 'hierarchical' (default), 'kmedoids', 'averaging'
)
fs_kmeans.clustering
Clustering( 31 periods → 8 clusters timesteps_per_cluster=24 dims=[] )
# Compare quality metrics between algorithms
pd.DataFrame(
{
'hierarchical': fs_clustered.clustering.metrics.to_dataframe().iloc[0],
'kmeans': fs_kmeans.clustering.metrics.to_dataframe().iloc[0],
}
)
|  | hierarchical | kmeans |
|---|---|---|
| RMSE | 0.056259 | 0.047047 |
| MAE | 0.015673 | 0.012596 |
| RMSE_duration | 0.030335 | 0.022595 |
# Visualize cluster structure with heatmap
fs_clustered.clustering.plot.heatmap()
Apply Existing Clustering¶
When comparing design variants or performing sensitivity analysis, you often want to use the same cluster structure across different FlowSystem configurations. Use apply_clustering() to reuse a clustering from another FlowSystem:
# First, create a reference clustering
fs_reference = flow_system.transform.cluster(n_clusters=8, cluster_duration='1D')
# Modify the FlowSystem (e.g., different storage size)
flow_system_modified = flow_system.copy()
flow_system_modified.components['Storage'].capacity_in_flow_hours.maximum_size = 2000
# Apply the SAME clustering for fair comparison
fs_modified = flow_system_modified.transform.apply_clustering(fs_reference.clustering)
This ensures both systems use identical typical periods for fair comparison.
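As a quick check, both variants can then be optimized and their objective values compared. This is a minimal sketch reusing the solver and accessors from earlier cells; the printed values depend on your run:

# Optimize both variants on the identical cluster structure
fs_reference.optimize(solver)
fs_modified.optimize(solver)

# Compare objective values (costs) between the two designs
print(f"Reference:        {fs_reference.solution['costs'].item():,.0f} €")
print(f"Modified storage: {fs_modified.solution['costs'].item():,.0f} €")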
Method 3: Two-Stage Workflow (Recommended)¶
The recommended approach for investment optimization:
- Stage 1: Fast sizing with cluster()
- Stage 2: Fix sizes (with safety margin) and dispatch at full resolution
!!! tip "Safety Margin" Typical periods aggregate similar days, so individual days may have higher demand than the typical day. Adding a 5-10% margin ensures feasibility.
# Apply safety margin to sizes
SAFETY_MARGIN = 1.05 # 5% buffer
sizes_with_margin = {name: float(size.item()) * SAFETY_MARGIN for name, size in fs_clustered.stats.sizes.items()}
# Stage 2: Fix sizes and optimize at full resolution
start = timeit.default_timer()
fs_dispatch = flow_system.transform.fix_sizes(sizes_with_margin)
fs_dispatch.name = 'Two-Stage'
fs_dispatch.optimize(solver)
time_dispatch = timeit.default_timer() - start
# Total two-stage time
total_two_stage = time_clustering + time_clustered + time_dispatch
Compare Results¶
results = {
'Full (baseline)': {
'Time [s]': time_full,
'Cost [€]': fs_full.solution['costs'].item(),
'CHP': fs_full.stats.sizes['CHP(Q_th)'].item(),
'Boiler': fs_full.stats.sizes['Boiler(Q_th)'].item(),
'Storage': fs_full.stats.sizes['Storage'].item(),
},
'Clustered (8 days)': {
'Time [s]': time_clustering + time_clustered,
'Cost [€]': fs_clustered.solution['costs'].item(),
'CHP': fs_clustered.stats.sizes['CHP(Q_th)'].item(),
'Boiler': fs_clustered.stats.sizes['Boiler(Q_th)'].item(),
'Storage': fs_clustered.stats.sizes['Storage'].item(),
},
'Two-Stage': {
'Time [s]': total_two_stage,
'Cost [€]': fs_dispatch.solution['costs'].item(),
'CHP': sizes_with_margin['CHP(Q_th)'],
'Boiler': sizes_with_margin['Boiler(Q_th)'],
'Storage': sizes_with_margin['Storage'],
},
}
comparison = pd.DataFrame(results).T
baseline_cost = comparison.loc['Full (baseline)', 'Cost [€]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [€]'] - baseline_cost) / abs(baseline_cost) * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)
comparison.style.format(
{
'Time [s]': '{:.1f}',
'Cost [€]': '{:,.0f}',
'CHP': '{:.1f}',
'Boiler': '{:.1f}',
'Storage': '{:.0f}',
'Cost Gap [%]': '{:.2f}',
'Speedup': '{:.1f}x',
}
)
|  | Time [s] | Cost [€] | CHP | Boiler | Storage | Cost Gap [%] | Speedup |
|---|---|---|---|---|---|---|---|
| Full (baseline) | 17.1 | -148,912 | 165.7 | 0.0 | 1000 | 0.00 | 1.0x |
| Clustered (8 days) | 6.7 | -137,579 | 171.7 | 0.0 | 1000 | 7.61 | 2.5x |
| Two-Stage | 13.7 | -150,096 | 180.3 | 0.0 | 1050 | -0.79 | 1.2x |
Expand Solution to Full Resolution¶
Use expand() to map the clustered solution back to all original timesteps. This repeats the typical period values for all days belonging to that cluster:
# Expand the clustered solution to full resolution
fs_expanded = fs_clustered.transform.expand()
# Compare heat production: Full vs Expanded
heat_flows = ['CHP(Q_th)|flow_rate', 'Boiler(Q_th)|flow_rate']
# Create comparison dataset
comparison_ds = xr.Dataset(
{
name.replace('|flow_rate', ''): xr.concat(
[fs_full.solution[name], fs_expanded.solution[name]], dim=pd.Index(['Full', 'Expanded'], name='method')
)
for name in heat_flows
}
)
comparison_ds.plotly.line(x='time', facet_col='variable', color='method', title='Heat Production Comparison')
Visualize Storage Operation¶
# Storage operation on the clustered (typical-period) system
fs_clustered.stats.plot.storage('Storage')
# Storage operation after expanding the solution to full resolution
fs_expanded.stats.plot.storage('Storage')
API Reference¶
transform.cluster() Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_clusters | int | - | Number of typical periods (e.g., 8 typical days) |
| cluster_duration | str \| float | - | Duration per cluster ('1D', '24h') or hours |
| data_vars | list[str] | None | Variables to cluster on (applies result to all) |
| weights | dict[str, float] | None | Optional weights for time series in clustering |
| cluster | ClusterConfig | None | Clustering algorithm configuration |
| extremes | ExtremeConfig | None | Essential: Force inclusion of peak/min periods |
| **tsam_kwargs | - | - | Additional tsam parameters |
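For example, the weights parameter can emphasize certain time series when the clusters are formed. A hedged sketch using names from this notebook (the weight value and the fs_weighted name are illustrative only):

# Give the heat demand profile twice the influence of the other series when forming clusters
fs_weighted = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    weights={'HeatDemand(Q_th)|fixed_relative_profile': 2.0},
)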
transform.clustering_data() Method¶
Inspect which time-varying data will be used for clustering:
# Get all time-varying variables
clustering_data = flow_system.transform.clustering_data()
print(list(clustering_data.data_vars))
# Get data for a specific period (multi-period systems)
clustering_data = flow_system.transform.clustering_data(period=2024)
Clustering Object Properties¶
After clustering, access metadata via fs.clustering:
| Property | Description |
|---|---|
| n_clusters | Number of clusters |
| n_original_clusters | Number of original time segments (e.g., 365 days) |
| timesteps_per_cluster | Timesteps in each cluster (e.g., 24 for daily) |
| cluster_assignments | xr.DataArray mapping original segment → cluster ID |
| cluster_occurrences | How many original segments each cluster represents |
| metrics | xr.Dataset with RMSE, MAE per time series |
| results | ClusteringResults with xarray-like interface |
| plot.compare() | Compare original vs clustered time series |
| plot.heatmap() | Visualize cluster structure |
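A quick sketch of reading this metadata on the clustered system from this notebook (property names as listed in the table above):

meta = fs_clustered.clustering
print(meta.n_clusters)             # 8 typical days
print(meta.timesteps_per_cluster)  # 24 (daily clusters at hourly resolution)
print(meta.cluster_occurrences)    # how many original days each typical day represents
print(meta.cluster_assignments)    # cluster ID assigned to each original day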
ClusteringResults (xarray-like)¶
Access the underlying tsam results via clustering.results:
# Dimension info (like xarray)
clustering.results.dims # ('period', 'scenario') or ()
clustering.results.coords # {'period': [2020, 2030], 'scenario': ['high', 'low']}
# Select specific result (like xarray)
clustering.results.sel(period=2020, scenario='high') # Label-based
clustering.results.isel(period=0, scenario=1) # Index-based
# Apply existing clustering to new data
agg_results = clustering.results.apply(dataset) # Returns AggregationResults
Storage Behavior¶
Each Storage component has a cluster_mode parameter:
| Mode | Description |
|---|---|
| 'intercluster_cyclic' | Links storage across clusters + yearly cyclic (default) |
| 'intercluster' | Links storage across clusters, free start/end |
| 'cyclic' | Each cluster is independent but cyclic (start = end) |
| 'independent' | Each cluster is independent, free start/end |
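For example, switching the storage to independent typical periods might look like the following sketch (assuming cluster_mode can be set as a plain attribute on the existing Storage component, analogous to the size change earlier in this notebook):

# Set the storage behavior before creating the clustered system
fs_variant = flow_system.copy()
fs_variant.components['Storage'].cluster_mode = 'independent'  # assumed attribute access
fs_variant_clustered = fs_variant.transform.cluster(n_clusters=8, cluster_duration='1D')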
For a detailed comparison of storage modes, see 08c2-clustering-storage-modes.
Peak Forcing with ExtremeConfig¶
from tsam import ExtremeConfig
extremes = ExtremeConfig(
method='new_cluster', # Creates new cluster for extremes
max_value=['ComponentName(FlowName)|fixed_relative_profile'], # Capture peak demand
preserve_n_clusters=True, # Keep total cluster count unchanged
)
Recommended Workflow¶
from tsam import ExtremeConfig
# Stage 1: Fast sizing
fs_sizing = flow_system.transform.cluster(
n_clusters=8,
cluster_duration='1D',
extremes=ExtremeConfig(method='new_cluster', max_value=['Demand(Flow)|fixed_relative_profile'], preserve_n_clusters=True),
)
fs_sizing.optimize(solver)
# Apply safety margin
sizes = {k: v.item() * 1.05 for k, v in fs_sizing.stats.sizes.items()}
# Stage 2: Accurate dispatch
fs_dispatch = flow_system.transform.fix_sizes(sizes)
fs_dispatch.optimize(solver)
Summary¶
You learned how to:
- Use cluster() to reduce time series into typical periods
- Inspect clustering data with clustering_data() before clustering
- Use data_vars to cluster based on specific variables only
- Apply peak forcing with ExtremeConfig to capture extreme demand days
- Use two-stage optimization for fast yet accurate investment decisions
- Expand solutions back to full resolution with expand()
- Access clustering metadata via fs.clustering (metrics, cluster_assignments, cluster_occurrences)
- Use advanced options like different algorithms with ClusterConfig
- Apply existing clustering to other FlowSystems using apply_clustering()
Key Takeaways¶
- Always use peak forcing (extremes=ExtremeConfig(max_value=[...])) for demand time series
- Inspect data first with clustering_data() to see available variables
- Use data_vars to cluster on specific variables (e.g., demand only, ignoring prices)
- Add a safety margin (5-10%) when fixing sizes from clustering
- Two-stage is recommended: clustering for sizing, full resolution for dispatch
- Storage handling is configurable via cluster_mode
- Check metrics to evaluate clustering quality
- Use apply_clustering() to apply the same clustering to different FlowSystem variants
Next Steps¶
- 08c2-clustering-storage-modes: Compare storage modes using a seasonal storage system
- 08d-clustering-multiperiod: Clustering with multiple periods and scenarios