Multi-Period Clustering with cluster()¶
Combine time series clustering with multi-period investment optimization.
This notebook demonstrates:
- Multi-period modeling: Optimize investments across multiple planning periods (years)
- Scenario analysis: Handle demand uncertainty with weighted scenarios
- Clustering per period: Apply typical-period clustering independently for each period/scenario
- Scalability: Reduce computational complexity for long-horizon planning
!!! note "Requirements"
    This notebook requires the tsam package with ExtremeConfig support. Install with: pip install "flixopt[full]"
import timeit
import numpy as np
import pandas as pd
import plotly.express as px
import flixopt as fx
fx.CONFIG.notebook()
Create the Multi-Period System¶
We use a multi-period heating system with:
- 3 planning periods (years 2024, 2025, 2026)
- 2 scenarios (high demand 30%, low demand 70%)
- 2 weeks at hourly resolution (336 timesteps)
This represents a capacity expansion problem where we optimize component sizes once, but operations are simulated across multiple future years and demand scenarios.
from data.generate_example_systems import create_multiperiod_system
flow_system = create_multiperiod_system()
print(f'Timesteps: {len(flow_system.timesteps)} ({len(flow_system.timesteps) // 24} days)')
print(f'Periods: {list(flow_system.periods.values)}')
print(f'Scenarios: {list(flow_system.scenarios.values)}')
print(f'Scenario weights: {flow_system.scenario_weights.values}')
print(f'\nComponents: {list(flow_system.components.keys())}')
Timesteps: 336 (14 days)
Periods: [np.int64(2024), np.int64(2025), np.int64(2026)]
Scenarios: ['high_demand', 'low_demand']
Scenario weights: [0.3 0.7]

Components: ['GasGrid', 'Boiler', 'ThermalStorage', 'Building']
Selecting a Subset with transform.isel()¶
For demonstration purposes, we'll use only the first week of data. The isel() method (index select) lets you slice FlowSystems by time:
# Select first week only (168 hours)
flow_system = flow_system.transform.isel(time=slice(0, 168))
print(f'After isel: {len(flow_system.timesteps)} timesteps ({len(flow_system.timesteps) // 24} days)')
After isel: 168 timesteps (7 days)
# Visualize demand scenarios (equal across periods)
heat_demand = flow_system.components['Building'].inputs[0].fixed_relative_profile
fig = px.line(heat_demand.to_dataframe('value').reset_index(), x='time', y='value', facet_row='scenario')
fig.update_layout(
height=350,
title='Heat Demand by Scenario (One Week)',
xaxis_title='Time',
yaxis_title='Heat Demand [kW]',
)
fig.show()
Full Optimization (Baseline)¶
First, solve the complete problem with all timesteps across all periods and scenarios:
solver = fx.solvers.HighsSolver(mip_gap=0.01)
start = timeit.default_timer()
fs_full = flow_system.copy()
fs_full.name = 'Full Optimization'
fs_full.optimize(solver)
time_full = timeit.default_timer() - start
print(f'Full optimization: {time_full:.2f} seconds')
print(f'Total cost (objective): {fs_full.solution["objective"].item():,.0f} €')
print('\nOptimized sizes:')
for name, size in fs_full.stats.sizes.items():
print(f' {name}: {size.max().item():.1f}')
Full optimization: 7.18 seconds
Total cost (objective): 31,141 €

Optimized sizes:
  Boiler(Heat): 146.1
  ThermalStorage: 53.7
Multi-Period Clustering with cluster()¶
When applied to a multi-period system, cluster() clusters each period/scenario combination independently. This is because demand patterns and optimal operations may differ across:
- Periods: Different years may have different characteristics
- Scenarios: High vs low demand scenarios need different representative days
The investment decisions (sizes) remain consistent across all periods and scenarios, while the operational patterns are optimized for each cluster.
from tsam import ExtremeConfig
start = timeit.default_timer()
# Force inclusion of peak demand periods
peak_series = ['Building(Heat)|fixed_relative_profile']
# Cluster to 3 typical days (from 7 days)
fs_clustered = flow_system.transform.cluster(
n_clusters=3,
cluster_duration='1D',
extremes=ExtremeConfig(method='replace', max_value=peak_series, preserve_n_clusters=True),
)
time_clustering = timeit.default_timer() - start
print(f'Clustering time: {time_clustering:.2f} seconds')
print(f'Reduced: {len(flow_system.timesteps)} → {len(fs_clustered.timesteps)} timesteps per period')
print('Total problem reduction: 7 days × 3 periods × 2 scenarios → 3 days × 3 × 2')
Clustering time: 0.86 seconds
Reduced: 168 → 24 timesteps per period
Total problem reduction: 7 days × 3 periods × 2 scenarios → 3 days × 3 × 2
# Optimize the reduced system
start = timeit.default_timer()
fs_clustered.optimize(solver)
time_clustered = timeit.default_timer() - start
print(f'Clustered optimization: {time_clustered:.2f} seconds')
print(f'Total cost (objective): {fs_clustered.solution["objective"].item():,.0f} €')
print(f'\nSpeedup vs full: {time_full / (time_clustering + time_clustered):.1f}x')
print('\nOptimized sizes:')
for name, size in fs_clustered.stats.sizes.items():
print(f' {name}: {size.max().item():.1f}')
Clustered optimization: 7.66 seconds
Total cost (objective): 29,656 €

Speedup vs full: 0.8x

Optimized sizes:
  Boiler(Heat): 143.8
  ThermalStorage: 46.0
Visualize Clustering Quality¶
The .plot accessor provides built-in visualizations with automatic faceting by period and scenario:
# Duration curves show how well the distribution is preserved per period/scenario
fs_clustered.clustering.plot.compare(
kind='duration_curve',
)
# Heatmap shows cluster assignments - faceted by period and scenario
fs_clustered.clustering.plot.heatmap()
Understand the Cluster Structure¶
Let's inspect how days were grouped into clusters:
clustering = fs_clustered.clustering
print('Clustering Configuration:')
print(f' Typical periods (clusters): {clustering.n_clusters}')
print(f' Timesteps per cluster: {clustering.timesteps_per_cluster}')
# Access underlying results via xarray-like interface
print(f'\nClusteringResults dimensions: {clustering.results.dims}')
print(f'ClusteringResults coords: {clustering.results.coords}')
# The cluster_assignments shows which cluster each original day belongs to
# For multi-period systems, select a specific period/scenario combination
cluster_assignments = clustering.cluster_assignments.isel(period=0, scenario=0).values
day_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print('\nCluster assignments per day (period=2024, scenario=High):')
for i, cluster_id in enumerate(cluster_assignments):
print(f' {day_names[i]}: Cluster {cluster_id}')
# Cluster occurrences (how many original days each cluster represents)
unique, counts = np.unique(cluster_assignments, return_counts=True)
print('\nCluster weights (days represented):')
for cluster_id, count in zip(unique, counts, strict=True):
print(f' Cluster {cluster_id}: {count} days')
Clustering Configuration:
Typical periods (clusters): 3
Timesteps per cluster: 24
ClusteringResults dimensions: ('period', 'scenario')
ClusteringResults coords: {'period': [2024, 2025, 2026], 'scenario': ['high_demand', 'low_demand']}
Cluster assignments per day (period=2024, scenario=High):
Mon: Cluster 0
Tue: Cluster 0
Wed: Cluster 0
Thu: Cluster 2
Fri: Cluster 0
Sat: Cluster 1
Sun: Cluster 1
Cluster weights (days represented):
Cluster 0: 4 days
Cluster 1: 2 days
Cluster 2: 1 days
Two-Stage Workflow for Multi-Period¶
For investment optimization across multiple periods, the recommended workflow is:
- Stage 1: Fast sizing with clustering (reduced timesteps)
- Stage 2: Fix sizes and run full-resolution dispatch
This gives accurate investment decisions while maintaining computational tractability.
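In code, the workflow condenses to a few lines. This is a sketch of the cells that follow (which add timing, the safety margin rationale, and reporting), using only calls demonstrated in this notebook:

```python
# Condensed two-stage sketch.
# Stage 1: size on the clustered system, then buffer the optimized sizes.
fs_sizing = flow_system.transform.cluster(n_clusters=3, cluster_duration='1D')
fs_sizing.optimize(solver)
sizes = {name: size.max().item() * 1.10 for name, size in fs_sizing.stats.sizes.items()}

# Stage 2: fix those sizes and re-optimize dispatch at full resolution.
fs_dispatch = flow_system.transform.fix_sizes(sizes)
fs_dispatch.optimize(solver)
```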
Safety Margin Rationale¶
A 10% safety margin is applied to compensate for:
- Peak underestimation: Clustering averages similar days, potentially underestimating true peak demands
- Temporal detail loss: Representative periods may miss short-duration extreme events
- Scenario averaging: Weighted scenarios smooth out worst-case conditions
For critical applications, consider 15-20% margins or validate with full-resolution runs.
# Stage 1 already done - apply safety margin
SAFETY_MARGIN = 1.10 # 10% buffer for multi-period uncertainty
sizes_with_margin = {name: size.max().item() * SAFETY_MARGIN for name, size in fs_clustered.stats.sizes.items()}
print('Stage 1: Sizing with clustering')
print(f' Time: {time_clustering + time_clustered:.2f} seconds')
print(f' Cost estimate: {fs_clustered.solution["objective"].item():,.0f} €')
print(f'\nSizes with {(SAFETY_MARGIN - 1) * 100:.0f}% safety margin:')
for name, size in sizes_with_margin.items():
original = fs_clustered.stats.sizes[name].max().item()
print(f' {name}: {original:.1f} → {size:.1f}')
Stage 1: Sizing with clustering
  Time: 8.52 seconds
  Cost estimate: 29,656 €

Sizes with 10% safety margin:
  Boiler(Heat): 143.8 → 158.2
  ThermalStorage: 46.0 → 50.6
# Stage 2: Full resolution dispatch with fixed sizes
print('Stage 2: Full resolution dispatch')
start = timeit.default_timer()
fs_dispatch = flow_system.transform.fix_sizes(sizes_with_margin)
fs_dispatch.name = 'Two-Stage'
fs_dispatch.optimize(solver)
time_dispatch = timeit.default_timer() - start
print(f' Time: {time_dispatch:.2f} seconds')
print(f' Actual cost: {fs_dispatch.solution["objective"].item():,.0f} €')
# Total comparison
total_two_stage = time_clustering + time_clustered + time_dispatch
print(f'\nTotal two-stage time: {total_two_stage:.2f} seconds')
print(f'Speedup vs full: {time_full / total_two_stage:.1f}x')
Stage 2: Full resolution dispatch
  Time: 4.28 seconds
  Actual cost: 32,816 €

Total two-stage time: 12.79 seconds
Speedup vs full: 0.6x
Compare Results Across Methods¶
results = {
'Full (baseline)': {
'Time [s]': time_full,
'Cost [€]': fs_full.solution['objective'].item(),
'Boiler': fs_full.stats.sizes['Boiler(Heat)'].max().item(),
'Storage': fs_full.stats.sizes['ThermalStorage'].max().item(),
},
'Clustered (3 days)': {
'Time [s]': time_clustering + time_clustered,
'Cost [€]': fs_clustered.solution['objective'].item(),
'Boiler': fs_clustered.stats.sizes['Boiler(Heat)'].max().item(),
'Storage': fs_clustered.stats.sizes['ThermalStorage'].max().item(),
},
'Two-Stage': {
'Time [s]': total_two_stage,
'Cost [€]': fs_dispatch.solution['objective'].item(),
'Boiler': sizes_with_margin['Boiler(Heat)'],
'Storage': sizes_with_margin['ThermalStorage'],
},
}
comparison = pd.DataFrame(results).T
baseline_cost = comparison.loc['Full (baseline)', 'Cost [€]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [€]'] - baseline_cost) / abs(baseline_cost) * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)
comparison.style.format(
{
'Time [s]': '{:.2f}',
'Cost [€]': '{:,.0f}',
'Boiler': '{:.1f}',
'Storage': '{:.0f}',
'Cost Gap [%]': '{:.2f}',
'Speedup': '{:.1f}x',
}
)
| | Time [s] | Cost [€] | Boiler | Storage | Cost Gap [%] | Speedup |
|---|---|---|---|---|---|---|
| Full (baseline) | 7.18 | 31,141 | 146.1 | 54 | 0.00 | 1.0x |
| Clustered (3 days) | 8.52 | 29,656 | 143.8 | 46 | -4.77 | 0.8x |
| Two-Stage | 12.79 | 32,816 | 158.2 | 51 | 5.38 | 0.6x |
Visualize Optimization Results¶
Use the built-in statistics plotting to compare results across periods and scenarios:
# Plot flow rates with automatic faceting by period and scenario
fs_full.stats.plot.flows(component='Boiler')
# Side-by-side comparison using the Comparison class
comp = fx.Comparison([fs_full, fs_dispatch])
comp.stats.plot.balance('Heat')
Expand Clustered Solution to Full Resolution¶
Use expand() to map the clustered results back to all original timesteps:
# Expand the clustered solution
fs_expanded = fs_clustered.transform.expand()
print(f'Expanded: {len(fs_clustered.timesteps)} → {len(fs_expanded.timesteps)} timesteps')
print(f'Cost (objective): {fs_expanded.solution["objective"].item():,.0f} €')
Expanded: 24 → 168 timesteps
Cost (objective): 29,656 €
# Compare expanded solution - shows the repeated cluster patterns
fs_expanded.stats.plot.flows(component='Boiler')
Key Considerations for Multi-Period Clustering¶
1. Independent Clustering per Period/Scenario¶
Each period and scenario combination is clustered independently because:
- Demand patterns may differ across years (growth, seasonality)
- Scenarios represent different futures that shouldn't be mixed
- Investment decisions must be robust across all combinations
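You can check this independence directly by comparing the assignments of the two scenarios for the same period. A minimal sketch, reusing the cluster_assignments accessor shown earlier (index order follows the coords printed above):

```python
# Sketch: assignments for period 2024 under both scenarios. They may differ,
# since each period/scenario combination is clustered independently.
clustering = fs_clustered.clustering
high = clustering.cluster_assignments.isel(period=0, scenario=0).values
low = clustering.cluster_assignments.isel(period=0, scenario=1).values
print('high_demand:', high)
print('low_demand: ', low)
```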
2. Safety Margins¶
Multi-period systems often warrant larger safety margins (10-15%) because:
- More uncertainty across multiple years
- Investments made once must work for all periods
- Scenario weights may not perfectly represent actual outcomes
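Widening the buffer is a one-line change to the sizing pattern used above, e.g. for a 15% margin:

```python
# Sketch: a more conservative 15% buffer, same pattern as sizes_with_margin.
sizes_15pct = {name: size.max().item() * 1.15 for name, size in fs_clustered.stats.sizes.items()}
```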
3. Computational Benefits¶
Clustering becomes more valuable as problem size grows:
| Configuration | Full Problem | With Clustering (10 typical days) |
|---|---|---|
| 1 period, 1 scenario, 365 days | 8,760 timesteps | 240 |
| 3 periods, 2 scenarios, 365 days | 52,560 timesteps | 1,440 |
| 10 periods, 3 scenarios, 365 days | 262,800 timesteps | 7,200 |
The speedup factor increases with problem size.
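The table entries are simple arithmetic, sketched here in plain Python (no flixopt calls):

```python
# Total time-indexed problem size scales with periods × scenarios × days × 24.
def total_timesteps(periods: int, scenarios: int, days: int) -> int:
    return periods * scenarios * days * 24

full = total_timesteps(10, 3, 365)      # 262,800
clustered = total_timesteps(10, 3, 10)  # 10 typical days per period/scenario
print(f'{full:,} → {clustered:,} timesteps ({full / clustered:.1f}x reduction)')
```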
Summary¶
You learned how to:
- Load multi-period systems with periods and scenarios
- Use transform.isel() to select time subsets
- Apply cluster() to multi-dimensional FlowSystems
- Visualize clustering with the .plot accessor (compare, duration curves, heatmaps)
- Use the two-stage workflow for robust investment optimization
- Expand solutions back to full resolution with expand()
Key Takeaways¶
- Clustering is applied per period/scenario: Each combination gets independent typical periods
- Investments are shared: Component sizes are optimized once across all periods/scenarios
- Use larger safety margins: Multi-period uncertainty warrants 10-15% buffers
- Two-stage is recommended: Fast sizing with clustering, accurate dispatch at full resolution
- Built-in plotting: Use the .plot accessor for automatic faceting by period/scenario
API Reference¶
from tsam import ExtremeConfig
# Load multi-period system
fs = fx.FlowSystem.from_netcdf('multiperiod_system.nc4')
# Select time subset (optional)
fs = fs.transform.isel(time=slice(0, 168)) # First 168 timesteps
# Cluster (applies per period/scenario)
fs_clustered = fs.transform.cluster(
n_clusters=10,
cluster_duration='1D',
extremes=ExtremeConfig(method='replace', max_value=['Demand(Flow)|fixed_relative_profile'], preserve_n_clusters=True),
)
# Visualize clustering quality
fs_clustered.clustering.plot.compare(variable='Demand(Flow)|profile')
fs_clustered.clustering.plot.heatmap()
# Access underlying results (xarray-like interface)
fs_clustered.clustering.results.dims # ('period', 'scenario')
fs_clustered.clustering.results.coords # {'period': [...], 'scenario': [...]}
fs_clustered.clustering.results.sel(period=2024, scenario='high_demand')  # Label-based
fs_clustered.clustering.results.isel(period=0, scenario=0) # Index-based
# Two-stage workflow
fs_clustered.optimize(solver)
sizes = {k: v.max().item() * 1.10 for k, v in fs_clustered.stats.sizes.items()}
fs_dispatch = fs.transform.fix_sizes(sizes)
fs_dispatch.optimize(solver)
# Visualize results
fs_dispatch.stats.plot.flows(component='Boiler')