Intra-Period Segmentation with cluster()¶
Reduce timesteps within each typical period using segmentation.
This notebook demonstrates:
- Segmentation: Aggregate timesteps within each cluster into fewer segments
- Variable durations: Each segment can have different duration (hours)
- Combined reduction: Use clustering AND segmentation for maximum speedup
- Expansion: Map segmented results back to original timesteps
!!! note "Requirements"
    This notebook requires the tsam package with SegmentConfig and ExtremeConfig support. Install with: pip install "flixopt[full]"
import timeit
import pandas as pd
import plotly.express as px
import flixopt as fx
fx.CONFIG.notebook()
flixopt.config.CONFIG
What is Segmentation?¶
Clustering groups similar time periods (e.g., days) into representative clusters.
Segmentation goes further by aggregating timesteps within each cluster into fewer segments with variable durations.
Original: | Day 1 (24h) | Day 2 (24h) | Day 3 (24h) | ... | Day 365 (24h) |
↓ ↓ ↓ ↓
Clustered: | Typical Day A (24h) | Typical Day B (24h) | Typical Day C (24h) |
↓ ↓ ↓
Segmented: | Seg1 (4h) | Seg2 (8h) | Seg3 (8h) | Seg4 (4h) | (per typical day)
This can dramatically reduce problem size:
- Original: 365 days × 24 hours = 8,760 timesteps
- Clustered (8 days): 8 × 24 = 192 timesteps
- Segmented (6 segments): 8 × 6 = 48 timesteps
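A quick sanity check of the arithmetic above (illustrative only, not part of the optimization):
# Illustrative arithmetic for the reductions listed above
n_original = 365 * 24  # 8,760 hourly timesteps
n_clustered = 8 * 24  # 192 timesteps across 8 typical days
n_segmented = 8 * 6  # 48 segments after intra-period aggregation
print(f'Clustering alone: {n_original / n_clustered:.1f}x smaller')
print(f'Clustering + segmentation: {n_original / n_segmented:.1f}x smaller')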
Create the FlowSystem¶
We use a district heating system with one month of data at hourly resolution:
from data.generate_example_systems import create_district_heating_system
flow_system = create_district_heating_system()
flow_system.connect_and_transform()
print(f'Timesteps: {len(flow_system.timesteps)}')
print(f'Duration: {(flow_system.timesteps[-1] - flow_system.timesteps[0]).days + 1} days')
Timesteps: 744
Duration: 31 days
# Visualize input data
heat_demand = flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile
heat_demand.plotly.line(title='Heat Demand Profile')
Full Optimization (Baseline)¶
solver = fx.solvers.HighsSolver(mip_gap=0.01)
start = timeit.default_timer()
fs_full = flow_system.copy()
fs_full.name = 'Full Optimization'
fs_full.optimize(solver)
time_full = timeit.default_timer() - start
print(f'Full optimization: {time_full:.2f} seconds')
print(f'Total cost: {fs_full.solution["costs"].item():,.0f} €')
Full optimization: 17.14 seconds
Total cost: -148,912 €
Clustering with Segmentation¶
Use SegmentConfig to enable intra-period segmentation:
from tsam import ExtremeConfig, SegmentConfig
start = timeit.default_timer()
# Cluster into 8 typical days with 6 segments each
fs_segmented = flow_system.transform.cluster(
n_clusters=8,
cluster_duration='1D',
segments=SegmentConfig(n_segments=6),  # 6 segments per day instead of 24 hourly timesteps
extremes=ExtremeConfig(
method='replace', max_value=['HeatDemand(Q_th)|fixed_relative_profile'], preserve_n_clusters=True
),
)
time_clustering = timeit.default_timer() - start
print(f'Clustering time: {time_clustering:.2f} seconds')
print(f'Original timesteps: {len(flow_system.timesteps)}')
print(
f'Segmented timesteps: {len(fs_segmented.timesteps)} × {len(fs_segmented.clusters)} clusters = {len(fs_segmented.timesteps) * len(fs_segmented.clusters)}'
)
Clustering time: 0.38 seconds
Original timesteps: 744
Segmented timesteps: 6 × 8 clusters = 48
Understanding Segmentation Properties¶
After segmentation, the clustering object has additional properties:
clustering = fs_segmented.clustering
print('Segmentation Properties:')
print(f' is_segmented: {clustering.is_segmented}')
print(f' n_segments: {clustering.n_segments}')
print(f' n_clusters: {clustering.n_clusters}')
print(f' timesteps_per_cluster (original): {clustering.timesteps_per_cluster}')
print(f'\nTime dimension uses RangeIndex: {type(fs_segmented.timesteps)}')
Segmentation Properties:
  is_segmented: True
  n_segments: 6
  n_clusters: 8
  timesteps_per_cluster (original): 24

Time dimension uses RangeIndex: <class 'pandas.core.indexes.range.RangeIndex'>
Variable Timestep Durations¶
Each segment has a different duration, determined by how many original timesteps it represents:
# Timestep duration is now a DataArray with (cluster, time) dimensions
timestep_duration = fs_segmented.timestep_duration
print(f'Timestep duration shape: {dict(timestep_duration.sizes)}')
print('\nSegment durations for cluster 0:')
cluster_0_durations = timestep_duration.sel(cluster=0).values
for i, dur in enumerate(cluster_0_durations):
print(f' Segment {i}: {dur:.2f} hours')
print(f' Total: {cluster_0_durations.sum():.2f} hours (should be 24h)')
Timestep duration shape: {'cluster': 8, 'time': 6}
Segment durations for cluster 0:
Segment 0: 6.00 hours
Segment 1: 2.00 hours
Segment 2: 5.00 hours
Segment 3: 6.00 hours
Segment 4: 4.00 hours
Segment 5: 1.00 hours
Total: 24.00 hours (should be 24h)
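The same invariant holds for every cluster, not just cluster 0. A minimal check (assumes numpy, which is not imported above):
import numpy as np

# Sanity check: each cluster's segment durations must sum to the 24 h cluster duration
assert np.allclose(timestep_duration.sum('time').values, 24.0)
print('All clusters sum to 24 h')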
# Visualize segment durations across clusters
duration_df = timestep_duration.to_dataframe('duration').reset_index()
fig = px.bar(
duration_df,
x='time',
y='duration',
facet_col='cluster',
facet_col_wrap=4,
title='Segment Durations by Cluster',
labels={'time': 'Segment', 'duration': 'Duration [hours]'},
)
fig.update_layout(height=400)
fig.show()
Optimize the Segmented System¶
start = timeit.default_timer()
fs_segmented.optimize(solver)
time_segmented = timeit.default_timer() - start
print(f'Segmented optimization: {time_segmented:.2f} seconds')
print(f'Total cost: {fs_segmented.solution["costs"].item():,.0f} €')
print(f'\nSpeedup vs full: {time_full / (time_clustering + time_segmented):.1f}x')
Segmented optimization: 5.65 seconds
Total cost: -135,857 €

Speedup vs full: 2.8x
Compare Clustering Quality¶
View how well the segmented data represents the original:
# Duration curves show how well the distribution is preserved
fs_segmented.clustering.plot.compare(kind='duration_curve')
# Clustering quality metrics
fs_segmented.clustering.metrics.to_dataframe().style.format('{:.3f}')
| time_series | RMSE | MAE | RMSE_duration |
|---|---|---|---|
| ElecDemand(P_el)\|fixed_relative_profile | 0.085 | 0.061 | 0.055 |
| GasGrid(Q_Gas)\|costs\|per_flow_hour | 0.089 | 0.064 | 0.060 |
| GridBuy(P_el)\|costs\|per_flow_hour | 0.114 | 0.080 | 0.033 |
| GridSell(P_el)\|costs\|per_flow_hour | 0.116 | 0.081 | 0.030 |
| HeatDemand(Q_th)\|fixed_relative_profile | 0.100 | 0.072 | 0.032 |
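If one series dominates the error, it is a good candidate for extreme-period handling via ExtremeConfig. A small sketch, assuming only the metrics table shown above:
# Identify the worst-represented time series as a guide for ExtremeConfig
metrics_df = fs_segmented.clustering.metrics.to_dataframe()
worst = metrics_df['RMSE'].idxmax()
print(f'Largest RMSE: {worst} ({metrics_df.loc[worst, "RMSE"]:.3f})')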
Expand to Original Timesteps¶
Use expand() to map the segmented solution back to all original timesteps:
start = timeit.default_timer()
fs_expanded = fs_segmented.transform.expand()
time_expand = timeit.default_timer() - start
print(f'Expansion time: {time_expand:.3f} seconds')
print(f'Expanded timesteps: {len(fs_expanded.timesteps)}')
print(f'Objective preserved: {fs_expanded.solution["costs"].item():,.0f} €')
Expansion time: 0.169 seconds
Expanded timesteps: 744
Objective preserved: -135,857 €
# Compare flow rates: Full vs Expanded
import xarray as xr
flow_var = 'CHP(Q_th)|flow_rate'
comparison_ds = xr.concat(
[fs_full.solution[flow_var], fs_expanded.solution[flow_var]],
dim=pd.Index(['Full', 'Expanded'], name='method'),
)
comparison_ds.plotly.line(color='method', title='CHP Heat Output Comparison')
Two-Stage Workflow with Segmentation¶
For investment optimization, use segmentation for fast sizing, then dispatch at full resolution:
# Stage 1: Sizing with segmentation (already done)
SAFETY_MARGIN = 1.05
sizes_with_margin = {name: float(size.item()) * SAFETY_MARGIN for name, size in fs_segmented.stats.sizes.items()}
print('Optimized sizes with safety margin:')
for name, size in sizes_with_margin.items():
print(f' {name}: {size:.1f}')
Optimized sizes with safety margin:
  CHP(Q_th): 181.5
  Boiler(Q_th): 0.0
  Storage: 1050.0
# Stage 2: Full resolution dispatch with fixed sizes
start = timeit.default_timer()
fs_dispatch = flow_system.transform.fix_sizes(sizes_with_margin)
fs_dispatch.name = 'Two-Stage'
fs_dispatch.optimize(solver)
time_dispatch = timeit.default_timer() - start
print(f'Dispatch time: {time_dispatch:.2f} seconds')
print(f'Final cost: {fs_dispatch.solution["costs"].item():,.0f} €')
Dispatch time: 7.45 seconds
Final cost: -150,083 €
Compare Results¶
total_segmented = time_clustering + time_segmented
total_two_stage = total_segmented + time_dispatch
results = {
'Full (baseline)': {
'Time [s]': time_full,
'Cost [€]': fs_full.solution['costs'].item(),
'CHP': fs_full.stats.sizes['CHP(Q_th)'].item(),
'Boiler': fs_full.stats.sizes['Boiler(Q_th)'].item(),
'Storage': fs_full.stats.sizes['Storage'].item(),
},
'Segmented (8×6)': {
'Time [s]': total_segmented,
'Cost [€]': fs_segmented.solution['costs'].item(),
'CHP': fs_segmented.stats.sizes['CHP(Q_th)'].item(),
'Boiler': fs_segmented.stats.sizes['Boiler(Q_th)'].item(),
'Storage': fs_segmented.stats.sizes['Storage'].item(),
},
'Two-Stage': {
'Time [s]': total_two_stage,
'Cost [€]': fs_dispatch.solution['costs'].item(),
'CHP': sizes_with_margin['CHP(Q_th)'],
'Boiler': sizes_with_margin['Boiler(Q_th)'],
'Storage': sizes_with_margin['Storage'],
},
}
comparison = pd.DataFrame(results).T
baseline_cost = comparison.loc['Full (baseline)', 'Cost [€]']
baseline_time = comparison.loc['Full (baseline)', 'Time [s]']
comparison['Cost Gap [%]'] = ((comparison['Cost [€]'] - baseline_cost) / abs(baseline_cost) * 100).round(2)
comparison['Speedup'] = (baseline_time / comparison['Time [s]']).round(1)
comparison.style.format(
{
'Time [s]': '{:.2f}',
'Cost [€]': '{:,.0f}',
'CHP': '{:.1f}',
'Boiler': '{:.1f}',
'Storage': '{:.0f}',
'Cost Gap [%]': '{:.2f}',
'Speedup': '{:.1f}x',
}
)
| | Time [s] | Cost [€] | CHP | Boiler | Storage | Cost Gap [%] | Speedup |
|---|---|---|---|---|---|---|---|
| Full (baseline) | 17.14 | -148,912 | 165.7 | 0.0 | 1000 | 0.00 | 1.0x |
| Segmented (8×6) | 6.03 | -135,857 | 172.9 | 0.0 | 1000 | 8.77 | 2.8x |
| Two-Stage | 13.48 | -150,083 | 181.5 | 0.0 | 1050 | -0.79 | 1.3x |
Segmentation with Multi-Period Systems¶
Segmentation works with multi-period systems (multiple years, scenarios). Each period/scenario combination is segmented independently:
from data.generate_example_systems import create_multiperiod_system
fs_multi = create_multiperiod_system()
# Use first week only for faster demo
fs_multi = fs_multi.transform.isel(time=slice(0, 168))
print(f'Periods: {list(fs_multi.periods.values)}')
print(f'Scenarios: {list(fs_multi.scenarios.values)}')
Periods: [np.int64(2024), np.int64(2025), np.int64(2026)]
Scenarios: ['high_demand', 'low_demand']
# Cluster with segmentation
fs_multi_seg = fs_multi.transform.cluster(
n_clusters=3,
cluster_duration='1D',
segments=SegmentConfig(n_segments=6),
extremes=ExtremeConfig(
method='replace', max_value=['Building(Heat)|fixed_relative_profile'], preserve_n_clusters=True
),
)
print(f'Original: {len(fs_multi.timesteps)} timesteps')
print(f'Segmented: {len(fs_multi_seg.timesteps)} × {len(fs_multi_seg.clusters)} clusters')
print(f'is_segmented: {fs_multi_seg.clustering.is_segmented}')
Original: 168 timesteps
Segmented: 6 × 3 clusters
is_segmented: True
# Cluster assignments have period/scenario dimensions
fs_multi_seg.clustering.cluster_assignments
<xarray.DataArray 'cluster_assignments' (original_cluster: 7, period: 3,
scenario: 2)> Size: 336B
array([[[0, 0],
[0, 0],
[0, 0]],
[[0, 0],
[0, 0],
[0, 0]],
[[0, 0],
[0, 0],
[0, 0]],
[[2, 2],
[2, 2],
[2, 2]],
[[0, 0],
[0, 0],
[0, 0]],
[[1, 1],
[1, 1],
[1, 1]],
[[1, 1],
[1, 1],
[1, 1]]])
Coordinates:
* period (period) int64 24B 2024 2025 2026
* scenario (scenario) object 16B 'high_demand' 'low_demand'
Dimensions without coordinates: original_cluster
# Optimize and expand
fs_multi_seg.optimize(solver)
fs_multi_expanded = fs_multi_seg.transform.expand()
print(f'Expanded timesteps: {len(fs_multi_expanded.timesteps)}')
print(f'Objective: {fs_multi_expanded.solution["objective"].item():,.0f} €')
Expanded timesteps: 168
Objective: 29,674 €
API Reference¶
SegmentConfig Parameters¶
from tsam import SegmentConfig
segments = SegmentConfig(
n_segments=6, # Number of segments per cluster period
representation_method='mean', # How to represent segment values ('mean', 'medoid', etc.)
)
Segmentation Properties¶
After segmentation, fs.clustering has additional properties:
| Property | Description |
|---|---|
| is_segmented | True if segmentation was used |
| n_segments | Number of segments per cluster |
| timesteps_per_cluster | Original timesteps per cluster (before segmentation) |
Timestep Duration¶
For segmented systems, fs.timestep_duration is a DataArray with (cluster, time) dimensions:
# Each segment has different duration
fs_segmented.timestep_duration # Shape: (n_clusters, n_segments)
# Sum should equal original period duration
fs_segmented.timestep_duration.sum('time') # Should be 24h for daily clusters
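Because durations vary per segment, time aggregations must weight by duration instead of counting timesteps. A sketch using the CHP flow-rate variable from earlier in this notebook:
# Energy per cluster: weight each segment's flow rate by its duration, then sum
flow = fs_segmented.solution['CHP(Q_th)|flow_rate']
energy_per_cluster = (flow * fs_segmented.timestep_duration).sum('time')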
Example Workflow¶
from tsam import ExtremeConfig, SegmentConfig
# Cluster with segmentation
fs_segmented = flow_system.transform.cluster(
n_clusters=8,
cluster_duration='1D',
segments=SegmentConfig(n_segments=6),
extremes=ExtremeConfig(method='new_cluster', max_value=['Demand|profile'], preserve_n_clusters=True),
)
# Optimize
fs_segmented.optimize(solver)
# Expand back to original timesteps
fs_expanded = fs_segmented.transform.expand()
# Two-stage workflow
sizes = {k: v.item() * 1.05 for k, v in fs_segmented.stats.sizes.items()}
fs_dispatch = flow_system.transform.fix_sizes(sizes)
fs_dispatch.optimize(solver)
Summary¶
You learned how to:
- Use SegmentConfig to enable intra-period segmentation
- Work with variable timestep durations for each segment
- Combine clustering and segmentation for maximum problem size reduction
- Expand segmented solutions back to original timesteps
- Use segmentation with multi-period systems
Key Takeaways¶
- Segmentation reduces problem size further: From 8×24=192 to 8×6=48 timesteps
- Variable durations preserve accuracy: Important periods get more timesteps
- Works with multi-period: Each period/scenario is segmented independently
- expand() works correctly: Maps segment values to all original timesteps
- Two-stage is still recommended: Use segmentation for sizing, full resolution for dispatch
Trade-offs¶
| More Segments | Fewer Segments |
|---|---|
| Higher accuracy | Lower accuracy |
| Slower solve | Faster solve |
| More memory | Less memory |
Start with 6-12 segments and adjust based on your accuracy needs.
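To pick a value empirically, a small sweep over segment counts can quantify the accuracy/speed trade-off. A sketch reusing this notebook's flow_system, solver, and full baseline:
# Sweep segment counts and report the cost gap against the full baseline
baseline = fs_full.solution['costs'].item()
for n_seg in [4, 6, 8, 12]:
    fs = flow_system.transform.cluster(
        n_clusters=8, cluster_duration='1D', segments=SegmentConfig(n_segments=n_seg)
    )
    fs.optimize(solver)
    gap = (fs.solution['costs'].item() - baseline) / abs(baseline)
    print(f'{n_seg:>2} segments: cost gap {gap:+.2%}')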