
flixopt.clustering.base

Clustering classes for time series aggregation.

This module provides wrapper classes around tsam's clustering functionality:

- ClusteringResults: Collection of tsam ClusteringResult objects for multi-dim (period, scenario) data
- Clustering: Top-level class stored on FlowSystem after clustering


Classes

ClusteringResults

ClusteringResults(results: dict[tuple, ClusteringResult], dim_names: list[str])

Collection of tsam ClusteringResult objects for multi-dimensional data.

Manages multiple ClusteringResult objects keyed by (period, scenario) tuples and provides convenient access and multi-dimensional DataArray building.

Follows xarray-like patterns with .dims, .coords, .sel(), and .isel().
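Here is a minimal sketch of the tuple-keyed pattern described above, written in plain Python. The class name TupleKeyedResults is hypothetical. It stores arbitrary objects in place of tsam ClusteringResult instances. It is not flixopt's implementation, and ordering coordinate labels by first appearance is an assumption.

```python
class TupleKeyedResults:
    """Hypothetical stand-in for ClusteringResults: objects keyed by coordinate tuples."""

    def __init__(self, results, dim_names):
        self._results = dict(results)
        self.dim_names = list(dim_names)

    @property
    def dims(self):
        return tuple(self.dim_names)

    @property
    def coords(self):
        # Unique labels per dimension, in first-seen order (an assumption).
        coords = {name: [] for name in self.dim_names}
        for key in self._results:
            for name, label in zip(self.dim_names, key):
                if label not in coords[name]:
                    coords[name].append(label)
        return coords

    def sel(self, **kwargs):
        # Label-based lookup: build the key tuple in dimension order.
        key = tuple(kwargs[name] for name in self.dim_names)
        if key not in self._results:
            raise KeyError(f"No result for {kwargs}")
        return self._results[key]

    def isel(self, **kwargs):
        # Index-based lookup: translate positions to labels, then reuse sel().
        coords = self.coords
        labels = {}
        for name in self.dim_names:
            values = coords[name]
            index = kwargs[name]
            if not 0 <= index < len(values):
                raise IndexError(f"Index {index} out of range for '{name}'")
            labels[name] = values[index]
        return self.sel(**labels)


results = TupleKeyedResults(
    {(2024, 'high'): 'cr1', (2024, 'low'): 'cr2'},
    dim_names=['period', 'scenario'],
)
print(results.coords)  # {'period': [2024], 'scenario': ['high', 'low']}
print(results.isel(period=0, scenario=1))  # cr2
```

The simple case without periods or scenarios works the same way. With dim_names=[], the only key is (), and sel() with no arguments returns that single entry.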

Attributes:

Name Type Description
dims tuple[str, ...]

Tuple of dimension names, e.g., ('period', 'scenario').

coords dict[str, list]

Dict mapping dimension names to their coordinate values.

Example

>>> results = ClusteringResults({(): cr}, dim_names=[])
>>> results.n_clusters
2
>>> results.cluster_assignments  # Returns DataArray

Multi-dimensional case:

>>> results = ClusteringResults(
...     {(2024, 'high'): cr1, (2024, 'low'): cr2},
...     dim_names=['period', 'scenario'],
... )
>>> results.dims
('period', 'scenario')
>>> results.coords
{'period': [2024], 'scenario': ['high', 'low']}
>>> results.sel(period=2024, scenario='high')  # Label-based
>>> results.isel(period=0, scenario=1)  # Index-based

Initialize ClusteringResults.

Parameters:

Name Type Description Default
results dict[tuple, ClusteringResult]

Dict mapping (period, scenario) tuples to tsam ClusteringResult objects. For simple cases without periods/scenarios, use {(): result}.

required
dim_names list[str]

Names of extra dimensions, e.g., ['period', 'scenario'].

required

Attributes

dims property
dims: tuple[str, ...]

Dimension names as tuple (xarray-like).

dim_names property
dim_names: list[str]

Dimension names as list (backwards compatibility).

coords property
coords: dict[str, list]

Coordinate values for each dimension (xarray-like).

Returns:

Type Description
dict[str, list]

Dict mapping dimension names to lists of coordinate values.

n_clusters property
n_clusters: int

Number of clusters (same for all results).

timesteps_per_cluster property
timesteps_per_cluster: int

Number of timesteps per cluster (same for all results).

n_original_periods property
n_original_periods: int

Number of original periods (same for all results).

n_segments property
n_segments: int | None

Number of segments per cluster, or None if not segmented.

cluster_assignments property
cluster_assignments: DataArray

Maps each original cluster to its typical cluster index.

Returns:

Type Description
DataArray

DataArray with dims [original_cluster, period?, scenario?].

cluster_occurrences property
cluster_occurrences: DataArray

How many original clusters map to each typical cluster.

Returns:

Type Description
DataArray

DataArray with dims [cluster, period?, scenario?].
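The following sketch shows how cluster_occurrences relates to cluster_assignments for a single (period, scenario) combination. It uses plain lists instead of DataArrays. The helper name and the data are hypothetical.

```python
from collections import Counter


def cluster_occurrences(cluster_assignments, n_clusters):
    """Count how many original periods map to each typical cluster."""
    counts = Counter(cluster_assignments)
    # Clusters that no original period maps to get an explicit zero.
    return [counts.get(c, 0) for c in range(n_clusters)]


# Seven original days assigned to three typical days (illustrative data).
assignments = [0, 1, 1, 2, 0, 1, 2]
occurrences = cluster_occurrences(assignments, n_clusters=3)
print(occurrences)  # [2, 3, 2]
```

Every original period is assigned to exactly one typical cluster, so the occurrences always sum to the number of original periods.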

cluster_centers property
cluster_centers: DataArray

Which original cluster is the representative (center) for each typical cluster.

Returns:

Type Description
DataArray

DataArray with dims [cluster, period?, scenario?].

segment_assignments property
segment_assignments: DataArray | None

For each timestep within a cluster, which segment it belongs to.

Returns:

Type Description
DataArray | None

DataArray with dims [cluster, time, period?, scenario?], or None if not segmented.

segment_durations property
segment_durations: DataArray | None

Duration of each segment in timesteps.

Returns:

Type Description
DataArray | None

DataArray with dims [cluster, segment, period?, scenario?], or None if not segmented.
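As a rough illustration, you can derive the segment durations for one cluster by counting how many timesteps carry each segment index. This sketch assumes that segment_assignments holds integer segment indices 0 to n_segments-1 for a single cluster. The helper is hypothetical and not part of flixopt or tsam.

```python
def segment_durations(segment_assignments, n_segments):
    """Count timesteps per segment for one typical cluster.

    segment_assignments[t] is the segment index of timestep t within the cluster.
    """
    durations = [0] * n_segments
    for segment in segment_assignments:
        durations[segment] += 1
    return durations


# One cluster of 8 timesteps split into 3 segments (illustrative data).
assignments = [0, 0, 0, 1, 1, 2, 2, 2]
print(segment_durations(assignments, n_segments=3))  # [3, 2, 3]
```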

segment_centers property
segment_centers: DataArray | None

Center of each intra-period segment.

Only available if segmentation was configured during clustering.

Returns:

Type Description
DataArray | None

DataArray or None if no segmentation.

position_within_segment property
position_within_segment: DataArray | None

Position of each timestep within its segment (0-indexed).

For each (cluster, time) position, returns how many timesteps into the segment that position is. Used for interpolation within segments.

Returns:

Type Description
DataArray | None

DataArray with dims [cluster, time, period?, scenario?], or None if not segmented.
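One way to compute position_within_segment for a single cluster is shown below. The sketch assumes that each segment occupies a contiguous run of timesteps. The function name and data are illustrative only.

```python
def position_within_segment(segment_assignments):
    """0-indexed offset of each timestep inside its segment (one cluster)."""
    positions = []
    previous = None
    offset = 0
    for segment in segment_assignments:
        # Reset the counter whenever a new segment begins.
        offset = offset + 1 if segment == previous else 0
        positions.append(offset)
        previous = segment
    return positions


assignments = [0, 0, 0, 1, 1, 2, 2, 2]
print(position_within_segment(assignments))  # [0, 1, 2, 0, 1, 0, 1, 2]
```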

Functions

sel
sel(**kwargs: Any) -> TsamClusteringResult

Select result by dimension labels (xarray-like).

Parameters:

Name Type Description Default
**kwargs Any

Dimension name=value pairs, e.g., period=2024, scenario='high'.

{}

Returns:

Type Description
ClusteringResult

The tsam ClusteringResult for the specified combination.

Raises:

Type Description
KeyError

If no result found for the specified combination.

Example

results.sel(period=2024, scenario='high')

isel
isel(**kwargs: int) -> TsamClusteringResult

Select result by dimension indices (xarray-like).

Parameters:

Name Type Description Default
**kwargs int

Dimension name=index pairs, e.g., period=0, scenario=1.

{}

Returns:

Type Description
ClusteringResult

The tsam ClusteringResult for the specified combination.

Raises:

Type Description
IndexError

If index is out of range for a dimension.

Example

results.isel(period=0, scenario=1)

items
items()

Iterate over (key, ClusteringResult) pairs.

keys
keys()

Iterate over keys.

values
values()

Iterate over ClusteringResult objects.

to_dict
to_dict() -> dict

Serialize to dict.

The dict can be used to reconstruct via from_dict().

from_dict classmethod
from_dict(d: dict) -> ClusteringResults

Reconstruct from dict.

Parameters:

Name Type Description Default
d dict

Dict from to_dict().

required

Returns:

Type Description
ClusteringResults

Reconstructed ClusteringResults.

apply
apply(data: Dataset) -> AggregationResults

Apply clustering to dataset for all (period, scenario) combinations.

Parameters:

Name Type Description Default
data Dataset

Dataset with time-varying data. Must have 'time' dimension. May have 'period' and/or 'scenario' dimensions matching this object.

required

Returns:

Type Description
AggregationResults

AggregationResults with full access to aggregated data. Use .clustering on the result to get ClusteringResults for IO.

Example

>>> agg_results = clustering_results.apply(dataset)
>>> agg_results.clustering  # Get ClusteringResults for IO
>>> for key, result in agg_results:
...     print(result.cluster_representatives)

Clustering

Clustering(results: ClusteringResults | dict | None = None, original_timesteps: DatetimeIndex | list[str] | None = None, original_data: Dataset | None = None, aggregated_data: Dataset | None = None, _metrics: Dataset | None = None, _original_data_refs: list[str] | None = None, _metrics_refs: list[str] | None = None, _aggregation_results: dict[tuple, AggregationResult] | None = None, _dim_names: list[str] | None = None)

Clustering information for a FlowSystem.

Thin wrapper around tsam 3.0's AggregationResult objects, providing:

1. Multi-dimensional access for (period, scenario) combinations
2. Structure properties (n_clusters, dims, coords, cluster_assignments)
3. JSON persistence via ClusteringResults

Use sel() to access individual tsam AggregationResult objects for detailed analysis (cluster_representatives, accuracy, plotting).

Attributes:

Name Type Description
results ClusteringResults

ClusteringResults for structure access (works after JSON load).

original_timesteps

Original timesteps before clustering.

dims tuple[str, ...]

Dimension names, e.g., ('period', 'scenario').

coords dict[str, list]

Coordinate values, e.g., {'period': [2024, 2025]}.

Example

>>> clustering = fs_clustered.clustering
>>> clustering.n_clusters
8
>>> clustering.dims
('period',)

Access tsam AggregationResult for detailed analysis:

>>> result = clustering.sel(period=2024)
>>> result.cluster_representatives  # DataFrame
>>> result.accuracy  # AccuracyMetrics
>>> result.plot.compare()  # tsam's built-in plotting

Initialize Clustering object.

Parameters:

Name Type Description Default
results ClusteringResults | dict | None

ClusteringResults instance, or dict from to_dict() (for deserialization). Not needed if _aggregation_results is provided.

None
original_timesteps DatetimeIndex | list[str] | None

Original timesteps before clustering.

None
original_data Dataset | None

Original dataset before clustering (for expand/plotting).

None
aggregated_data Dataset | None

Aggregated dataset after clustering (for plotting). After loading from file, this is reconstructed from FlowSystem data.

None
_metrics Dataset | None

Pre-computed metrics dataset.

None
_original_data_refs list[str] | None

Internal: resolved DataArrays from serialization.

None
_metrics_refs list[str] | None

Internal: resolved DataArrays from serialization.

None
_aggregation_results dict[tuple, AggregationResult] | None

Internal: dict of AggregationResult for full data access.

None
_dim_names list[str] | None

Internal: dimension names when using _aggregation_results.

None

Attributes

n_clusters property
n_clusters: int

Number of clusters (typical periods).

timesteps_per_cluster property
timesteps_per_cluster: int

Number of timesteps in each cluster.

timesteps_per_period property
timesteps_per_period: int

Alias for timesteps_per_cluster.

n_original_clusters property
n_original_clusters: int

Number of original periods (before clustering).

dim_names property
dim_names: list[str]

Names of extra dimensions, e.g., ['period', 'scenario'].

dims property
dims: tuple[str, ...]

Dimension names as tuple (xarray-like).

coords property
coords: dict[str, list]

Coordinate values for each dimension (xarray-like).

Returns:

Type Description
dict[str, list]

Dict mapping dimension names to lists of coordinate values.

Example

clustering.coords

is_segmented property
is_segmented: bool

Whether intra-period segmentation was used.

Segmented systems have variable timestep durations within each cluster, where each segment represents a different number of original timesteps.

n_segments property
n_segments: int | None

Number of segments per cluster, or None if not segmented.

cluster_assignments property
cluster_assignments: DataArray

Mapping from original periods to cluster IDs.

Returns:

Type Description
DataArray

DataArray with dims [original_cluster] or [original_cluster, period?, scenario?].

n_representatives property
n_representatives: int

Number of representative timesteps after clustering.

cluster_occurrences property
cluster_occurrences: DataArray

Count of how many original periods each cluster represents.

Returns:

Type Description
DataArray

DataArray with dims [cluster] or [cluster, period?, scenario?].

representative_weights property
representative_weights: DataArray

Weight for each cluster (number of original periods it represents).

This is the same as cluster_occurrences but named for API consistency. Used as cluster_weight in FlowSystem.
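This minimal sketch shows why the weights matter. A quantity summed over the full original horizon can be approximated by weighting each typical cluster's total by the number of original periods it represents. It illustrates the weighting idea only and is not flixopt's actual objective formulation. The helper and the numbers are hypothetical.

```python
def weighted_total(cluster_profiles, weights):
    """Approximate a full-horizon total from typical-cluster profiles.

    cluster_profiles[c] is the per-timestep series of typical cluster c;
    weights[c] is how many original periods cluster c represents.
    """
    return sum(w * sum(profile) for profile, w in zip(cluster_profiles, weights))


# Two typical days of 4 timesteps each, representing 5 and 2 original days.
profiles = [[1.0, 2.0, 3.0, 2.0], [4.0, 4.0, 4.0, 4.0]]
weights = [5, 2]
print(weighted_total(profiles, weights))  # 72.0
```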

timestep_mapping cached property
timestep_mapping: DataArray

Mapping from original timesteps to representative timestep indices.

Each value indicates which representative timestep index (0 to n_representatives-1) corresponds to each original timestep.

Note: This property is cached for performance since it's accessed frequently during expand() operations.
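For the non-segmented case, the mapping can be sketched as follows. The sketch assumes that representative timesteps are laid out cluster by cluster, which is consistent with cluster_start_positions being [0, T, 2T, ...]. It also assumes that the original horizon divides evenly into periods of T timesteps. The function is a hypothetical plain-list illustration, not the code behind the cached property. Segmented systems would index segments rather than timesteps.

```python
def timestep_mapping(cluster_assignments, timesteps_per_cluster):
    """Representative timestep index for each original timestep (non-segmented case)."""
    T = timesteps_per_cluster
    mapping = []
    for cluster in cluster_assignments:
        for t in range(T):
            # Cluster c occupies representative indices [c*T, (c+1)*T).
            mapping.append(cluster * T + t)
    return mapping


# Three original periods of 2 timesteps, mapped to 2 typical clusters.
print(timestep_mapping([1, 0, 1], timesteps_per_cluster=2))  # [2, 3, 0, 1, 2, 3]
```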

metrics property
metrics: Dataset

Clustering quality metrics (RMSE, MAE, etc.).

Returns:

Type Description
Dataset

Dataset with dims [time_series, period?, scenario?], or empty Dataset if no metrics.

cluster_start_positions property
cluster_start_positions: ndarray

Integer positions where clusters start in reduced timesteps.

Returns:

Type Description
ndarray

1D array: [0, T, 2T, ...] where T = timesteps_per_cluster (or n_segments if segmented).
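Here is a plain-Python sketch of the start positions. It assumes that each cluster occupies timesteps_per_cluster slots in the reduced time axis, or n_segments slots when segmented. Under that assumption, n_representatives would equal n_clusters times the same step. The helper is hypothetical and returns a list rather than an ndarray.

```python
def cluster_start_positions(n_clusters, timesteps_per_cluster, n_segments=None):
    """Start index of each cluster in the reduced (representative) time axis."""
    # Segmented systems have n_segments steps per cluster instead of timesteps_per_cluster.
    step = n_segments if n_segments is not None else timesteps_per_cluster
    return list(range(0, n_clusters * step, step))


print(cluster_start_positions(3, 24))                # [0, 24, 48]
print(cluster_start_positions(3, 24, n_segments=6))  # [0, 6, 12]
```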

cluster_centers property
cluster_centers: DataArray

Which original period is the representative (center) for each cluster.

Returns:

Type Description
DataArray

DataArray with dims [cluster] containing original period indices.

segment_assignments property
segment_assignments: DataArray | None

For each timestep within a cluster, which intra-period segment it belongs to.

Only available if segmentation was configured during clustering.

Returns:

Type Description
DataArray | None

DataArray with dims [cluster, time] or None if no segmentation.

segment_durations property
segment_durations: DataArray | None

Duration of each intra-period segment in hours.

Only available if segmentation was configured during clustering.

Returns:

Type Description
DataArray | None

DataArray with dims [cluster, segment] or None if no segmentation.

segment_centers property
segment_centers: DataArray | None

Center of each intra-period segment.

Only available if segmentation was configured during clustering.

Returns:

Type Description
DataArray | None

DataArray with dims [cluster, segment] or None if no segmentation.

plot property
plot: ClusteringPlotAccessor

Access plotting methods for clustering visualization.

Returns:

Type Description
ClusteringPlotAccessor

ClusteringPlotAccessor with compare(), heatmap(), and clusters() methods.

results property
results: ClusteringResults

ClusteringResults for structure access (derived from AggregationResults or cached).

Functions

sel
sel(period: int | str | None = None, scenario: str | None = None) -> AggregationResult

Select AggregationResult by period and/or scenario.

Access individual tsam AggregationResult objects for detailed analysis.

Note

This method is only available until the FlowSystem is saved and reloaded. After IO (to_dataset/from_dataset or to_json), the full AggregationResult data is not preserved, and calling sel() raises ValueError. Use results.sel() for structure-only access after loading.

Parameters:

Name Type Description Default
period int | str | None

Period value (e.g., 2024). Required if clustering has periods.

None
scenario str | None

Scenario name (e.g., 'high'). Required if clustering has scenarios.

None

Returns:

Type Description
AggregationResult

The tsam AggregationResult for the specified combination. Access its properties like cluster_representatives, accuracy, etc.

Raises:

Type Description
KeyError

If no result found for the specified combination.

ValueError

If accessed on a Clustering loaded from JSON/NetCDF.

Example

>>> result = clustering.sel(period=2024, scenario='high')
>>> result.cluster_representatives  # DataFrame with aggregated data
>>> result.accuracy  # AccuracyMetrics
>>> result.plot.compare()  # tsam's built-in comparison plot

expand_data
expand_data(aggregated: DataArray, original_time: DatetimeIndex | None = None) -> xr.DataArray

Expand aggregated data back to original timesteps.

Uses the timestep_mapping to map each original timestep to its representative value from the aggregated data. Fully vectorized using xarray's advanced indexing - no loops over period/scenario dimensions.

Parameters:

Name Type Description Default
aggregated DataArray

DataArray with aggregated (cluster, time) or (time,) dimensions.

required
original_time DatetimeIndex | None

Original time coordinates. Defaults to self.original_timesteps.

None

Returns:

Type Description
DataArray

DataArray expanded to original timesteps.
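At its core, the expansion is a gather operation: each original timestep looks up its representative value through timestep_mapping. The sketch below does this with plain lists for a single (period, scenario) slice, whereas flixopt uses xarray advanced indexing for the equivalent lookup. The helper name and data are illustrative.

```python
def expand(aggregated_flat, mapping):
    """Expand a flattened representative series back to original timesteps."""
    return [aggregated_flat[i] for i in mapping]


# Aggregated values for 2 clusters x 2 timesteps, flattened cluster by cluster.
aggregated = [[10.0, 11.0], [20.0, 21.0]]
flat = [value for cluster in aggregated for value in cluster]
mapping = [2, 3, 0, 1, 2, 3]  # e.g. from cluster_assignments [1, 0, 1]
print(expand(flat, mapping))  # [20.0, 21.0, 10.0, 11.0, 20.0, 21.0]
```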

build_expansion_divisor
build_expansion_divisor(original_time: DatetimeIndex | None = None) -> xr.DataArray

Build a divisor for correcting segment totals when expanding to hourly resolution.

For segmented systems, each segment value is a total that gets repeated N times when expanded to hourly resolution (where N = segment duration in timesteps). This divisor allows converting those totals back to hourly rates during expansion.

For each original timestep, returns the number of original timesteps that map to the same (cluster, segment) - i.e., the segment duration in timesteps.

Fully vectorized using xarray's advanced indexing - no loops over period/scenario.

Parameters:

Name Type Description Default
original_time DatetimeIndex | None

Original time coordinates. Defaults to self.original_timesteps.

None

Returns:

Type Description
DataArray

DataArray with dims ['time'] or ['time', 'period'?, 'scenario'?] containing the number of timesteps in each segment, aligned to original timesteps.
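The following sketch shows the divisor idea for a segmented system using plain lists. It assumes that each original period inherits the segmentation of the typical cluster it is assigned to. All names and values are hypothetical. Dividing a repeated segment total by its segment duration recovers a per-timestep rate, and the rates still sum to the segment total.

```python
def expansion_divisor(cluster_assignments, segment_assignments, segment_durations):
    """Segment duration (in timesteps) for every original timestep.

    segment_assignments[c][t]: segment of timestep t in typical cluster c.
    segment_durations[c][s]: number of timesteps in segment s of cluster c.
    """
    divisor = []
    for cluster in cluster_assignments:
        for segment in segment_assignments[cluster]:
            divisor.append(segment_durations[cluster][segment])
    return divisor


# Two typical clusters of 4 timesteps, each split into 2 segments.
segment_assignments = [[0, 0, 0, 1], [0, 1, 1, 1]]
segment_durations = [[3, 1], [1, 3]]
segment_totals = [[9.0, 2.0], [4.0, 6.0]]  # each segment value is a total
cluster_assignments = [1, 0]

divisor = expansion_divisor(cluster_assignments, segment_assignments, segment_durations)
# Repeat each segment total over its timesteps, then divide to get per-timestep rates.
repeated = [
    segment_totals[c][s]
    for c in cluster_assignments
    for s in segment_assignments[c]
]
rates = [total / n for total, n in zip(repeated, divisor)]
print(divisor)  # [1, 3, 3, 3, 3, 3, 3, 1]
print(rates)    # [4.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 2.0]
```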

get_result
get_result(period: Any = None, scenario: Any = None) -> TsamClusteringResult

Get the tsam ClusteringResult for a specific (period, scenario).

Parameters:

Name Type Description Default
period Any

Period label (if applicable).

None
scenario Any

Scenario label (if applicable).

None

Returns:

Type Description
ClusteringResult

The tsam ClusteringResult for the specified combination.

apply
apply(data: DataFrame, period: Any = None, scenario: Any = None) -> AggregationResult

Apply the saved clustering to new data.

Parameters:

Name Type Description Default
data DataFrame

DataFrame with time series data to cluster.

required
period Any

Period label (if applicable).

None
scenario Any

Scenario label (if applicable).

None

Returns:

Type Description
AggregationResult

tsam AggregationResult with the clustering applied.

to_json
to_json(path: str | Path) -> None

Save the clustering for reuse.

Uses ClusteringResults.to_dict(), which preserves the full tsam ClusteringResult. The saved file can be loaded later with Clustering.from_json() and used with flow_system.transform.apply_clustering().

Parameters:

Name Type Description Default
path str | Path

Path to save the JSON file.

required

from_json classmethod
from_json(path: str | Path, original_timesteps: DatetimeIndex | None = None) -> Clustering

Load a clustering from JSON.

The loaded Clustering has full apply() support because ClusteringResult is fully preserved via tsam's serialization.

Parameters:

Name Type Description Default
path str | Path

Path to the JSON file.

required
original_timesteps DatetimeIndex | None

Original timesteps for the new FlowSystem. If None, uses the timesteps stored in the JSON.

None

Returns:

Type Description
Clustering

A Clustering that can be used with apply_clustering().

items
items()

Iterate over (key, AggregationResult) pairs.

Raises:

Type Description
ValueError

If accessed on a Clustering loaded from JSON.

keys
keys()

Iterate over (period, scenario) keys.

values
values()

Iterate over AggregationResult objects.

Raises:

Type Description
ValueError

If accessed on a Clustering loaded from JSON.

ClusteringPlotAccessor

ClusteringPlotAccessor(clustering: Clustering)

Plot accessor for Clustering objects.

Provides visualization methods for comparing original vs aggregated data and understanding the clustering structure.

Functions

compare
compare(kind: str = 'timeseries', variables: str | list[str] | None = None, *, select: SelectType | None = None, colors: ColorType | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult

Compare original vs aggregated data.

Parameters:

Name Type Description Default
kind str

Type of comparison plot:
- 'timeseries': Time series comparison (default)
- 'duration_curve': Sorted duration curve comparison

'timeseries'
variables str | list[str] | None

Variable(s) to plot. Can be a string, list of strings, or None to plot all time-varying variables.

None
select SelectType | None

xarray-style selection dict, e.g. {'scenario': 'Base Case'}.

None
colors ColorType | None

Color specification (colorscale name, color list, or label-to-color dict).

None
show bool | None

Whether to display the figure. Defaults to CONFIG.Plotting.default_show.

None
data_only bool

If True, skip figure creation and return only data.

False
**plotly_kwargs Any

Additional arguments passed to plotly (e.g., color, line_dash, facet_col, facet_row). Defaults: x='time'/'duration', color='variable', line_dash='representation', symbol=None.

{}

Returns:

Type Description
PlotResult

PlotResult containing the comparison figure and underlying data.
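For reference, a duration curve is the series sorted from the highest to the lowest value. This discards timing and compares only the distribution of values. This minimal sketch assumes the conventional descending sort and is not the code behind compare(kind='duration_curve'). The data is made up.

```python
def duration_curve(values):
    """Values sorted from highest to lowest, as plotted on a duration curve."""
    return sorted(values, reverse=True)


original = [3.0, 7.0, 5.0, 1.0]
expanded = [4.0, 6.0, 4.0, 2.0]  # e.g. aggregated data expanded to original timesteps
print(duration_curve(original))  # [7.0, 5.0, 3.0, 1.0]
print(duration_curve(expanded))  # [6.0, 4.0, 4.0, 2.0]
```

Because both curves are sorted, they can be compared point by point even though the aggregated series no longer matches the original timing.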

heatmap
heatmap(*, select: SelectType | None = None, colors: str | list[str] | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult

Plot cluster assignments over time as a heatmap timeline.

Shows which cluster each timestep belongs to as a horizontal color bar. The x-axis is time, color indicates cluster assignment. This visualization aligns with time series data, making it easy to correlate cluster assignments with other plots.

For multi-period/scenario data, uses faceting and/or animation.

Parameters:

Name Type Description Default
select SelectType | None

xarray-style selection dict, e.g. {'scenario': 'Base Case'}.

None
colors str | list[str] | None

Colorscale name (str) or list of colors for heatmap coloring. Dicts are not supported for heatmaps. Defaults to plotly template's sequential colorscale.

None
show bool | None

Whether to display the figure. Defaults to CONFIG.Plotting.default_show.

None
data_only bool

If True, skip figure creation and return only data.

False
**plotly_kwargs Any

Additional arguments passed to plotly (e.g., facet_col, animation_frame).

{}

Returns:

Type Description
PlotResult

PlotResult containing the heatmap figure and cluster assignment data. The data has a 'cluster' variable with a time dimension matching the original timesteps.
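You can think of the heatmap data as the cluster ID of each original period, repeated over that period's timesteps. This sketch assumes a non-segmented layout with timesteps_per_cluster timesteps per original period. The helper is hypothetical.

```python
def cluster_per_timestep(cluster_assignments, timesteps_per_cluster):
    """Repeat each original period's cluster ID over its timesteps."""
    return [
        cluster
        for cluster in cluster_assignments
        for _ in range(timesteps_per_cluster)
    ]


print(cluster_per_timestep([1, 0, 1], timesteps_per_cluster=2))  # [1, 1, 0, 0, 1, 1]
```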

clusters
clusters(variables: str | list[str] | None = None, *, select: SelectType | None = None, colors: ColorType | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult

Plot each cluster's typical period profile.

Shows each cluster as a separate faceted subplot with all variables colored differently. Useful for understanding what each cluster represents.

Parameters:

Name Type Description Default
variables str | list[str] | None

Variable(s) to plot. Can be a string, list of strings, or None to plot all time-varying variables.

None
select SelectType | None

xarray-style selection dict, e.g. {'scenario': 'Base Case'}.

None
colors ColorType | None

Color specification (colorscale name, color list, or label-to-color dict).

None
show bool | None

Whether to display the figure. Defaults to CONFIG.Plotting.default_show.

None
data_only bool

If True, skip figure creation and return only data.

False
**plotly_kwargs Any

Additional arguments passed to plotly (e.g., color, facet_col, facet_col_wrap). Defaults: x='time', color='variable', symbol=None.

{}

Returns:

Type Description
PlotResult

PlotResult containing the figure and underlying data.