Time-Series Clustering¶

Time-series clustering reduces large optimization problems by aggregating timesteps into representative typical periods. This enables fast investment optimization while preserving key system dynamics.

When to Use Clustering¶

Use clustering when:

Optimizing over a full year or longer
Investment sizing is the primary goal (not detailed dispatch)
You need faster solve times and can accept approximation
The system has repeating patterns (daily, weekly, seasonal)

Skip clustering for:

Short optimization horizons (days to weeks)
Dispatch-only problems without investments
Systems requiring exact temporal sequences

Two-Stage Workflow¶

The recommended approach: cluster for fast sizing, then validate at full resolution.

import flixopt as fx
from tsam import ExtremeConfig

# Load or create your FlowSystem
flow_system = fx.FlowSystem(timesteps)
flow_system.add_elements(...)

# Stage 1: Cluster and optimize (fast)
fs_clustered = flow_system.transform.cluster(
    n_clusters=12,
    cluster_duration='1D',
    extremes=ExtremeConfig(method='new_cluster', max_value=['HeatDemand(Q)|fixed_relative_profile']),
)
fs_clustered.optimize(fx.solvers.HighsSolver())

# Stage 2: Expand back to full resolution
fs_expanded = fs_clustered.transform.expand()

# Access full-resolution results
charge_state = fs_expanded.solution['Storage|charge_state']
flow_rates = fs_expanded.solution['Boiler(Q_th)|flow_rate']

Clustering Parameters¶

Parameter	Description	Example
`n_clusters`	Number of typical periods	`12` (typical days for a year)
`cluster_duration`	Duration of each cluster	`'1D'`, `'24h'`, or `24` (hours)
`weights`	Clustering weights per time series	`{'demand': 2.0, 'solar': 1.0}`
`cluster`	tsam `ClusterConfig` for clustering options	`ClusterConfig(method='k_medoids')`
`extremes`	tsam `ExtremeConfig` for peak preservation	`ExtremeConfig(method='new_cluster', max_value=[...])`
`predef_cluster_assignments`	Manual cluster assignments	Array of cluster indices

Peak Selection with ExtremeConfig¶

Use ExtremeConfig to ensure extreme conditions are represented:

from tsam import ExtremeConfig

# Ensure the peak demand day is included
fs_clustered = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    extremes=ExtremeConfig(
        method='new_cluster',  # Create new cluster for extremes
        max_value=['HeatDemand(Q)|fixed_relative_profile'],  # Capture peak demand
    ),
)

Without peak selection, the clustering algorithm might average out extreme days, leading to undersized equipment.

ExtremeConfig options:

Field	Description
`method`	How extremes are handled: `'new_cluster'`, `'append'`, `'replace_cluster_center'`
`max_value`	Time series where maximum values should be preserved
`min_value`	Time series where minimum values should be preserved
`max_period`	Time series where period with maximum sum should be preserved
`min_period`	Time series where period with minimum sum should be preserved

Advanced Clustering Options with ClusterConfig¶

Fine-tune the clustering algorithm with ClusterConfig:

from tsam import ClusterConfig, ExtremeConfig

fs_clustered = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    cluster=ClusterConfig(
        method='hierarchical',  # Clustering algorithm
        representation='medoid',  # Use actual periods, not averages
    ),
    extremes=ExtremeConfig(method='new_cluster', max_value=['demand']),
)

Available clustering algorithms (ClusterConfig.method):

Method	Description
`'hierarchical'`	Produces consistent hierarchical groupings (default)
`'kmeans'`	Fast, good for most cases
`'kmedoids'`	Uses actual periods as representatives
`'kmaxoids'`	Maximizes representativeness
`'averaging'`	Simple averaging of similar periods

Representation methods (ClusterConfig.representation):

Method	Description
`'medoid'`	Use actual periods as representatives (default)
`'mean'`	Average of all periods in cluster
`'distribution'`	Preserve value distribution (duration curves)

For additional tsam parameters, pass them as keyword arguments:

# Pass any tsam.aggregate() parameter
fs_clustered = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
    normalize_column_means=True,  # Normalize all time series to same mean
    preserve_column_means=True,  # Rescale results to match original means
)

Clustering Quality Metrics¶

After clustering, access quality metrics to evaluate the aggregation accuracy:

fs_clustered = flow_system.transform.cluster(n_clusters=8, cluster_duration='1D')

# Access clustering metrics (xr.Dataset)
metrics = fs_clustered.clustering.metrics
print(metrics)  # Shows RMSE, MAE, etc. per time series

# Access specific metric
rmse = metrics['RMSE']  # xr.DataArray with dims [time_series, period?, scenario?]

Storage Modes¶

Storage behavior during clustering is controlled via the cluster_mode parameter:

storage = fx.Storage(
    'SeasonalPit',
    capacity_in_flow_hours=5000,
    cluster_mode='intercluster_cyclic',  # Default
    ...
)

Available Modes¶

Mode	Behavior	Best For
`'intercluster_cyclic'`	Links storage across clusters + yearly cycling	Seasonal storage (pit, underground)
`'intercluster'`	Links storage across clusters, free start/end	Multi-year optimization
`'cyclic'`	Each cluster independent, but start = end	Daily storage (battery, hot water tank)
`'independent'`	Each cluster fully independent	Quick estimates, debugging

How Inter-Cluster Linking Works¶

For 'intercluster' and 'intercluster_cyclic' modes, the optimizer tracks:

SOC_boundary: Absolute state-of-charge at the start of each original period
charge_state: Relative change (ΔE) within each typical period

During expansion, these combine with self-discharge decay:

actual_SOC(t) = SOC_boundary[period] × (1 - loss)^t + ΔE(t)

This enables accurate modeling of seasonal storage that charges in summer and discharges in winter.

Choosing the Right Mode¶

# Seasonal pit storage - needs yearly linking
pit_storage = fx.Storage(
    'SeasonalPit',
    cluster_mode='intercluster_cyclic',
    capacity_in_flow_hours=10000,
    relative_loss_per_hour=0.0001,
    ...
)

# Daily hot water tank - only needs daily cycling
tank = fx.Storage(
    'HotWaterTank',
    cluster_mode='cyclic',
    capacity_in_flow_hours=50,
    ...
)

# Battery with quick estimate
battery = fx.Storage(
    'Battery',
    cluster_mode='independent',  # Fastest, ignores long-term effects
    ...
)

Multi-Dimensional Support¶

Clustering works with periods and scenarios:

# FlowSystem with multiple periods and scenarios
flow_system = fx.FlowSystem(
    timesteps,
    periods=pd.Index([2025, 2030, 2035], name='period'),
    scenarios=pd.Index(['low', 'base', 'high'], name='scenario'),
)

# Cluster - dimensions are preserved
fs_clustered = flow_system.transform.cluster(
    n_clusters=8,
    cluster_duration='1D',
)

# Solution has all dimensions
# Dims: (time, cluster, period, scenario)
flow_rate = fs_clustered.solution['Boiler(Q_th)|flow_rate']

Expanding Solutions¶

After optimization, expand results back to full resolution:

fs_expanded = fs_clustered.transform.expand()

# Full timesteps are restored
print(f"Original: {len(flow_system.timesteps)} timesteps")
print(f"Clustered: {len(fs_clustered.timesteps)} timesteps")
print(f"Expanded: {len(fs_expanded.timesteps)} timesteps")

# Storage charge state correctly reconstructed
charge_state = fs_expanded.solution['Storage|charge_state']

The expansion:

Maps each original timestep to its assigned cluster
For storage with inter-cluster linking, combines SOC_boundary with within-cluster charge_state
Applies self-discharge decay factors

Performance Tips¶

Cluster Count Selection¶

Time Horizon	Cluster Duration	Suggested n_clusters
1 year	1 day	8-16
1 year	1 week	4-8
Multiple years	1 day	12-24

Speed vs Accuracy Trade-off¶

# Fast (less accurate) - for quick estimates
fs_fast = flow_system.transform.cluster(n_clusters=4, cluster_duration='1D')

# Balanced - typical production use
fs_balanced = flow_system.transform.cluster(n_clusters=12, cluster_duration='1D')

# Accurate (slower) - for final results
fs_accurate = flow_system.transform.cluster(n_clusters=24, cluster_duration='1D')