flixopt.clustering.base

Clustering classes for time series aggregation.

This module provides wrapper classes around tsam's clustering functionality:

- ClusteringResults: Collection of tsam ClusteringResult objects for multi-dimensional (period, scenario) data
- Clustering: Top-level class stored on FlowSystem after clustering

Attributes

Classes

ClusteringResults

ClusteringResults(results: dict[tuple, ClusteringResult], dim_names: list[str])

Collection of tsam ClusteringResult objects for multi-dimensional data.

Manages multiple ClusteringResult objects keyed by (period, scenario) tuples and provides convenient access and multi-dimensional DataArray building.

Follows xarray-like patterns with .dims, .coords, .sel(), and .isel().

Attributes:

Name	Type	Description
`dims`	`tuple[str, ...]`	Tuple of dimension names, e.g., ('period', 'scenario').
`coords`	`dict[str, list]`	Dict mapping dimension names to their coordinate values.

Example

>>> results = ClusteringResults({(): cr}, dim_names=[])
>>> results.n_clusters
2
>>> results.cluster_assignments  # Returns DataArray

>>> # Multi-dimensional case
>>> results = ClusteringResults(
...     {(2024, 'high'): cr1, (2024, 'low'): cr2},
...     dim_names=['period', 'scenario'],
... )
>>> results.dims
('period', 'scenario')
>>> results.coords
{'period': [2024], 'scenario': ['high', 'low']}
>>> results.sel(period=2024, scenario='high')  # Label-based
>>> results.isel(period=0, scenario=1)  # Index-based

Initialize ClusteringResults.

Parameters:

Name	Type	Description	Default
`results`	`dict[tuple, ClusteringResult]`	Dict mapping (period, scenario) tuples to tsam ClusteringResult objects. For simple cases without periods/scenarios, use {(): result}.	required
`dim_names`	`list[str]`	Names of extra dimensions, e.g., ['period', 'scenario'].	required

Attributes

dims `property`

dims: tuple[str, ...]

Dimension names as tuple (xarray-like).

dim_names `property`

dim_names: list[str]

Dimension names as a list (kept for backwards compatibility).

coords `property`

coords: dict[str, list]

Coordinate values for each dimension (xarray-like).

Returns:

Type	Description
`dict[str, list]`	Dict mapping dimension names to lists of coordinate values.
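The multi-dimensional example maps the keys (2024, 'high') and (2024, 'low') to {'period': [2024], 'scenario': ['high', 'low']}. A minimal sketch of how such coords can be collected from the tuple keys, assuming labels are kept in first-seen order; `coords_from_keys` is a hypothetical helper, not flixopt's implementation:

```python
from typing import Any


def coords_from_keys(keys: list[tuple], dim_names: list[str]) -> dict[str, list[Any]]:
    """Collect unique coordinate values per dimension, in first-seen order.

    Hypothetical helper: illustrates deriving coords from (period, scenario)
    tuple keys; it is not flixopt's actual implementation.
    """
    coords: dict[str, list[Any]] = {name: [] for name in dim_names}
    for key in keys:
        # Each key holds one label per dimension, in the order of dim_names
        for name, label in zip(dim_names, key):
            if label not in coords[name]:
                coords[name].append(label)
    return coords


keys = [(2024, 'high'), (2024, 'low')]
print(coords_from_keys(keys, ['period', 'scenario']))
# {'period': [2024], 'scenario': ['high', 'low']}

# Simple case without periods/scenarios: the single key is ()
print(coords_from_keys([()], []))
# {}
```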

n_clusters `property`

n_clusters: int

Number of clusters (same for all results).

timesteps_per_cluster `property`

timesteps_per_cluster: int

Number of timesteps per cluster (same for all results).

n_original_periods `property`

n_original_periods: int

Number of original periods (same for all results).

n_segments `property`

n_segments: int | None

Number of segments per cluster, or None if not segmented.

cluster_assignments `property`

cluster_assignments: DataArray

Maps each original period (the `original_cluster` dimension) to the index of its typical cluster.

Returns:

Type	Description
`DataArray`	DataArray with dims [original_cluster, period?, scenario?].

cluster_occurrences `property`

cluster_occurrences: DataArray

How many original periods map to each typical cluster.

Returns:

Type	Description
`DataArray`	DataArray with dims [cluster, period?, scenario?].
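cluster_occurrences can be read as a count over cluster_assignments. A plain-Python sketch for a single (period, scenario) slice, using lists instead of DataArrays; `occurrences_from_assignments` and the data are hypothetical:

```python
from collections import Counter


def occurrences_from_assignments(assignments: list[int], n_clusters: int) -> list[int]:
    """Count how many original periods are assigned to each typical cluster."""
    counts = Counter(assignments)
    # Clusters with no assigned period get an explicit 0
    return [counts.get(c, 0) for c in range(n_clusters)]


# Hypothetical data: 7 original days assigned to 3 typical clusters
assignments = [0, 0, 1, 2, 1, 0, 0]
occ = occurrences_from_assignments(assignments, n_clusters=3)
print(occ)  # [4, 2, 1]
```

The occurrences always sum to the number of original periods.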

cluster_centers `property`

cluster_centers: DataArray

Which original period is the representative (center) for each typical cluster.

Returns:

Type	Description
`DataArray`	DataArray with dims [cluster, period?, scenario?].

segment_assignments `property`

segment_assignments: DataArray | None

For each timestep within a cluster, which segment it belongs to.

Returns:

Type	Description
`DataArray \| None`	DataArray with dims [cluster, time, period?, scenario?], or None if not segmented.

segment_durations `property`

segment_durations: DataArray | None

Duration of each segment in timesteps.

Returns:

Type	Description
`DataArray \| None`	DataArray with dims [cluster, segment, period?, scenario?], or None if not segmented.

segment_centers `property`

segment_centers: DataArray | None

Center of each intra-period segment.

Only available if segmentation was configured during clustering.

Returns:

Type	Description
`DataArray \| None`	DataArray or None if no segmentation.

position_within_segment `property`

position_within_segment: DataArray | None

Position of each timestep within its segment (0-indexed).

For each (cluster, time) position, returns the number of timesteps between the start of its segment and that position. Used for interpolation within segments.

Returns:

Type	Description
`DataArray \| None`	DataArray with dims [cluster, time] or [cluster, time, period?, scenario?], or None if not segmented.
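A sketch of how positions within segments follow from a per-timestep segment assignment for one cluster, assuming segments are contiguous and in order; `position_within_segment` here is a hypothetical plain-Python function, not the property itself:

```python
def position_within_segment(segment_ids: list[int]) -> list[int]:
    """0-indexed offset of each timestep from the start of its segment.

    Assumes segments are contiguous and listed in order within one cluster.
    """
    positions: list[int] = []
    for i, seg in enumerate(segment_ids):
        # A new segment starts at the first timestep or when the id changes
        if i == 0 or seg != segment_ids[i - 1]:
            positions.append(0)
        else:
            positions.append(positions[-1] + 1)
    return positions


# One cluster of 6 timesteps split into segments of length 3, 2 and 1
seg = [0, 0, 0, 1, 1, 2]
print(position_within_segment(seg))  # [0, 1, 2, 0, 1, 0]
```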

Functions

sel

sel(**kwargs: Any) -> TsamClusteringResult

Select result by dimension labels (xarray-like).

Parameters:

Name	Type	Description	Default
`**kwargs`	`Any`	Dimension name=value pairs, e.g., period=2024, scenario='high'.	`{}`

Returns:

Type	Description
`ClusteringResult`	The tsam ClusteringResult for the specified combination.

Raises:

Type	Description
`KeyError`	If no result found for the specified combination.

Example

>>> results.sel(period=2024, scenario='high')

isel

isel(**kwargs: int) -> TsamClusteringResult

Select result by dimension indices (xarray-like).

Parameters:

Name	Type	Description	Default
`**kwargs`	`int`	Dimension name=index pairs, e.g., period=0, scenario=1.	`{}`

Returns:

Type	Description
`ClusteringResult`	The tsam ClusteringResult for the specified combination.

Raises:

Type	Description
`IndexError`	If index is out of range for a dimension.

Example

>>> results.isel(period=0, scenario=1)
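Label-based and index-based selection both reduce to looking up a tuple key in the results dict. A minimal sketch under that assumption, with hypothetical helpers `sel_key` and `isel_key` and placeholder strings standing in for tsam ClusteringResult objects:

```python
from typing import Any


def sel_key(dim_names: list[str], **labels: Any) -> tuple:
    """Build the dict key for a label-based selection (like .sel())."""
    return tuple(labels[name] for name in dim_names)


def isel_key(dim_names: list[str], coords: dict[str, list], **indices: int) -> tuple:
    """Translate positional indices into labels, then build the key (like .isel())."""
    # coords[name][idx] raises IndexError for an out-of-range index
    return tuple(coords[name][indices[name]] for name in dim_names)


dim_names = ['period', 'scenario']
coords = {'period': [2024], 'scenario': ['high', 'low']}
results = {(2024, 'high'): 'cr1', (2024, 'low'): 'cr2'}  # placeholder results

print(results[sel_key(dim_names, period=2024, scenario='high')])  # cr1
print(results[isel_key(dim_names, coords, period=0, scenario=1)])  # cr2
```

An unknown label combination surfaces as a KeyError from the dict lookup, and an out-of-range position as an IndexError, matching the documented exceptions.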

items

items()

Iterate over (key, ClusteringResult) pairs.

keys

keys()

Iterate over keys.

values

values()

Iterate over ClusteringResult objects.

to_dict

to_dict() -> dict

Serialize to dict.

The dict can be passed to from_dict() to reconstruct the object.

from_dict `classmethod`

from_dict(d: dict) -> ClusteringResults

Reconstruct from dict.

Parameters:

Name	Type	Description	Default
`d`	`dict`	Dict from to_dict().	required

Returns:

Type	Description
`ClusteringResults`	Reconstructed ClusteringResults.

apply

apply(data: Dataset) -> AggregationResults

Apply clustering to dataset for all (period, scenario) combinations.

Parameters:

Name	Type	Description	Default
`data`	`Dataset`	Dataset with time-varying data. Must have 'time' dimension. May have 'period' and/or 'scenario' dimensions matching this object.	required

Returns:

Type	Description
`AggregationResults`	AggregationResults with full access to aggregated data. Use `.clustering` on the result to get ClusteringResults for IO.

Example

>>> agg_results = clustering_results.apply(dataset)
>>> agg_results.clustering  # Get ClusteringResults for IO
>>> for key, result in agg_results:
...     print(result.cluster_representatives)

Clustering

Clustering(results: ClusteringResults | dict | None = None, original_timesteps: DatetimeIndex | list[str] | None = None, original_data: Dataset | None = None, aggregated_data: Dataset | None = None, _metrics: Dataset | None = None, _original_data_refs: list[str] | None = None, _metrics_refs: list[str] | None = None, _aggregation_results: dict[tuple, AggregationResult] | None = None, _dim_names: list[str] | None = None)

Clustering information for a FlowSystem.

Thin wrapper around tsam 3.0's AggregationResult objects, providing:

1. Multi-dimensional access for (period, scenario) combinations
2. Structure properties (n_clusters, dims, coords, cluster_assignments)
3. JSON persistence via ClusteringResults

Use sel() to access individual tsam AggregationResult objects for detailed analysis (cluster_representatives, accuracy, plotting).

Attributes:

Name	Type	Description
`results`	`ClusteringResults`	ClusteringResults for structure access (works after JSON load).
`original_timesteps`		Original timesteps before clustering.
`dims`	`tuple[str, ...]`	Dimension names, e.g., ('period', 'scenario').
`coords`	`dict[str, list]`	Coordinate values, e.g., {'period': [2024, 2025]}.

Example

>>> clustering = fs_clustered.clustering
>>> clustering.n_clusters
8
>>> clustering.dims
('period',)

>>> # Access tsam AggregationResult for detailed analysis
>>> result = clustering.sel(period=2024)
>>> result.cluster_representatives  # DataFrame
>>> result.accuracy  # AccuracyMetrics
>>> result.plot.compare()  # tsam's built-in plotting

Initialize Clustering object.

Parameters:

Name	Type	Description	Default
`results`	`ClusteringResults \| dict \| None`	ClusteringResults instance, or dict from to_dict() (for deserialization). Not needed if _aggregation_results is provided.	`None`
`original_timesteps`	`DatetimeIndex \| list[str] \| None`	Original timesteps before clustering.	`None`
`original_data`	`Dataset \| None`	Original dataset before clustering (for expand/plotting).	`None`
`aggregated_data`	`Dataset \| None`	Aggregated dataset after clustering (for plotting). After loading from file, this is reconstructed from FlowSystem data.	`None`
`_metrics`	`Dataset \| None`	Pre-computed metrics dataset.	`None`
`_original_data_refs`	`list[str] \| None`	Internal: resolved DataArrays from serialization.	`None`
`_metrics_refs`	`list[str] \| None`	Internal: resolved DataArrays from serialization.	`None`
`_aggregation_results`	`dict[tuple, AggregationResult] \| None`	Internal: dict of AggregationResult for full data access.	`None`
`_dim_names`	`list[str] \| None`	Internal: dimension names when using _aggregation_results.	`None`

Attributes

n_clusters `property`

n_clusters: int

Number of clusters (typical periods).

timesteps_per_cluster `property`

timesteps_per_cluster: int

Number of timesteps in each cluster.

timesteps_per_period `property`

timesteps_per_period: int

Alias for timesteps_per_cluster.

n_original_clusters `property`

n_original_clusters: int

Number of original periods (before clustering).

dim_names `property`

dim_names: list[str]

Names of extra dimensions, e.g., ['period', 'scenario'].

dims `property`

dims: tuple[str, ...]

Dimension names as tuple (xarray-like).

coords `property`

coords: dict[str, list]

Coordinate values for each dimension (xarray-like).

Returns:

Type	Description
`dict[str, list]`	Dict mapping dimension names to lists of coordinate values.

Example

>>> clustering.coords

is_segmented `property`

is_segmented: bool

Whether intra-period segmentation was used.

Segmented systems have variable timestep durations within each cluster, where each segment represents a different number of original timesteps.

n_segments `property`

n_segments: int | None

Number of segments per cluster, or None if not segmented.

cluster_assignments `property`

cluster_assignments: DataArray

Mapping from original periods to cluster IDs.

Returns:

Type	Description
`DataArray`	DataArray with dims [original_cluster] or [original_cluster, period?, scenario?].

n_representatives `property`

n_representatives: int

Number of representative timesteps after clustering.

cluster_occurrences `property`

cluster_occurrences: DataArray

Count of how many original periods each cluster represents.

Returns:

Type	Description
`DataArray`	DataArray with dims [cluster] or [cluster, period?, scenario?].

representative_weights `property`

representative_weights: DataArray

Weight for each cluster (number of original periods it represents).

This is the same as cluster_occurrences but named for API consistency. Used as cluster_weight in FlowSystem.
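The weights matter because each typical cluster stands in for several original periods. A minimal sketch, assuming a quantity summed over the original horizon is approximated by a weighted sum of per-cluster totals; the data and `weighted_total` are hypothetical, and how FlowSystem applies cluster_weight internally is not shown:

```python
def weighted_total(per_cluster_values: list[float], weights: list[int]) -> float:
    """Scale each typical cluster's value by how many original periods it stands for."""
    if len(per_cluster_values) != len(weights):
        raise ValueError("need one weight per cluster")
    return sum(v * w for v, w in zip(per_cluster_values, weights))


# Hypothetical: daily energy use (kWh) of 3 typical days and their occurrences
daily_kwh = [120.0, 80.0, 200.0]
weights = [4, 2, 1]  # e.g. cluster occurrences; sums to 7 original days
print(weighted_total(daily_kwh, weights))  # 840.0
```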

timestep_mapping `cached` `property`

timestep_mapping: DataArray

Mapping from original timesteps to representative timestep indices.

Each value indicates which representative timestep index (0 to n_representatives-1) corresponds to each original timestep.

Note: This property is cached for performance since it's accessed frequently during expand() operations.
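For the non-segmented case, a mapping with the stated 0 to n_representatives-1 range can be sketched as follows. This assumes representatives are laid out cluster by cluster (index = cluster * timesteps_per_cluster + offset) and that the original horizon divides evenly into periods; the function name is hypothetical:

```python
def timestep_mapping(assignments: list[int], timesteps_per_cluster: int) -> list[int]:
    """Map every original timestep to a flat representative timestep index.

    Assumes the non-segmented layout: index = cluster * T + offset,
    where T = timesteps_per_cluster.
    """
    T = timesteps_per_cluster
    mapping: list[int] = []
    for cluster in assignments:  # one entry per original period
        for offset in range(T):
            mapping.append(cluster * T + offset)
    return mapping


# 3 original periods of 2 timesteps each, assigned to clusters [1, 0, 1]
print(timestep_mapping([1, 0, 1], timesteps_per_cluster=2))
# [2, 3, 0, 1, 2, 3]
```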

metrics `property`

metrics: Dataset

Clustering quality metrics (RMSE, MAE, etc.).

Returns:

Type	Description
`Dataset`	Dataset with dims [time_series, period?, scenario?], or empty Dataset if no metrics.

cluster_start_positions `property`

cluster_start_positions: ndarray

Integer positions where clusters start in reduced timesteps.

Returns:

Type	Description
`ndarray`	1D array: [0, T, 2T, ...] where T = timesteps_per_cluster (or n_segments if segmented).
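The start positions follow directly from that formula. A sketch returning a plain list instead of an ndarray; the function signature is hypothetical:

```python
from typing import Optional


def cluster_start_positions(
    n_clusters: int, timesteps_per_cluster: int, n_segments: Optional[int] = None
) -> list[int]:
    """Offsets [0, T, 2T, ...] where each cluster begins in the reduced timesteps."""
    # Segmented systems store n_segments steps per cluster instead of T timesteps
    step = n_segments if n_segments is not None else timesteps_per_cluster
    return [c * step for c in range(n_clusters)]


print(cluster_start_positions(3, 24))                # [0, 24, 48]
print(cluster_start_positions(3, 24, n_segments=6))  # [0, 6, 12]
```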

cluster_centers `property`

cluster_centers: DataArray

Which original period is the representative (center) for each cluster.

Returns:

Type	Description
`DataArray`	DataArray with dims [cluster] containing original period indices.

segment_assignments `property`

segment_assignments: DataArray | None

For each timestep within a cluster, which intra-period segment it belongs to.

Only available if segmentation was configured during clustering.

Returns:

Type	Description
`DataArray \| None`	DataArray with dims [cluster, time] or None if no segmentation.

segment_durations `property`

segment_durations: DataArray | None

Duration of each intra-period segment in hours.

Only available if segmentation was configured during clustering.

Returns:

Type	Description
`DataArray \| None`	DataArray with dims [cluster, segment] or None if no segmentation.

segment_centers `property`

segment_centers: DataArray | None

Center of each intra-period segment.

Only available if segmentation was configured during clustering.

Returns:

Type	Description
`DataArray \| None`	DataArray with dims [cluster, segment] or None if no segmentation.

plot `property`

plot: ClusteringPlotAccessor

Access plotting methods for clustering visualization.

Returns:

Type	Description
`ClusteringPlotAccessor`	ClusteringPlotAccessor with compare(), heatmap(), and clusters() methods.

results `property`

results: ClusteringResults

ClusteringResults for structure access (derived from AggregationResults or cached).

Functions

sel

sel(period: int | str | None = None, scenario: str | None = None) -> AggregationResult

Select AggregationResult by period and/or scenario.

Access individual tsam AggregationResult objects for detailed analysis.

Note

This method is only available before saving/loading the FlowSystem. After IO (to_dataset/from_dataset or to_json), the full AggregationResult data is not preserved. Use results.sel() for structure-only access after loading.

Parameters:

Name	Type	Description	Default
`period`	`int \| str \| None`	Period value (e.g., 2024). Required if clustering has periods.	`None`
`scenario`	`str \| None`	Scenario name (e.g., 'high'). Required if clustering has scenarios.	`None`

Returns:

Type	Description
`AggregationResult`	The tsam AggregationResult for the specified combination. Access its properties like `cluster_representatives`, `accuracy`, etc.

Raises:

Type	Description
`KeyError`	If no result found for the specified combination.
`ValueError`	If accessed on a Clustering loaded from JSON/NetCDF.

Example

>>> result = clustering.sel(period=2024, scenario='high')
>>> result.cluster_representatives  # DataFrame with aggregated data
>>> result.accuracy  # AccuracyMetrics
>>> result.plot.compare()  # tsam's built-in comparison plot

expand_data

expand_data(aggregated: DataArray, original_time: DatetimeIndex | None = None) -> xr.DataArray

Expand aggregated data back to original timesteps.

Uses the timestep_mapping to map each original timestep to its representative value from the aggregated data. Fully vectorized using xarray's advanced indexing, with no loops over period/scenario dimensions.

Parameters:

Name	Type	Description	Default
`aggregated`	`DataArray`	DataArray with aggregated (cluster, time) or (time,) dimensions.	required
`original_time`	`DatetimeIndex \| None`	Original time coordinates. Defaults to self.original_timesteps.	`None`

Returns:

Type	Description
`DataArray`	DataArray expanded to original timesteps.
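The gather that expand_data performs can be sketched in plain Python for one (period, scenario) slice. This assumes the aggregated values are flattened cluster by cluster, so that a flat index matches the representative index held in a timestep mapping; `expand` is hypothetical and stands in for the xarray advanced indexing:

```python
def expand(aggregated: list[list[float]], mapping: list[int]) -> list[float]:
    """Expand (cluster, time) values back to the original timeline.

    aggregated[c][k] is the value of typical cluster c at intra-cluster step k;
    mapping holds one flat representative index per original timestep.
    """
    # Flatten cluster-major so that flat index = c * T + k
    flat = [value for cluster in aggregated for value in cluster]
    return [flat[i] for i in mapping]


aggregated = [[1.0, 2.0], [10.0, 20.0]]  # 2 clusters x 2 timesteps
mapping = [2, 3, 0, 1, 2, 3]             # hypothetical timestep mapping
print(expand(aggregated, mapping))
# [10.0, 20.0, 1.0, 2.0, 10.0, 20.0]
```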

build_expansion_divisor

build_expansion_divisor(original_time: DatetimeIndex | None = None) -> xr.DataArray

Build a divisor for correcting segment totals when expanding back to hourly resolution.

For segmented systems, each segment value is a total that gets repeated N times when expanded to hourly resolution (where N = segment duration in timesteps). This divisor converts those totals back to hourly rates during expansion.

For each original timestep, returns the number of original timesteps that map to the same (cluster, segment), i.e. the duration of that segment in timesteps.

Fully vectorized using xarray's advanced indexing, with no loops over period/scenario.

Parameters:

Name	Type	Description	Default
`original_time`	`DatetimeIndex \| None`	Original time coordinates. Defaults to self.original_timesteps.	`None`

Returns:

Type	Description
`DataArray`	DataArray with dims ['time'] or ['time', 'period'?, 'scenario'?] containing the number of timesteps in each segment, aligned to original timesteps.
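A plain-Python sketch of the divisor for one (period, scenario) slice, assuming segment durations are given in timesteps per typical cluster and that each original period expands to its cluster's segments in order; `expansion_divisor` and the data are hypothetical:

```python
def expansion_divisor(assignments: list[int], segment_durations: list[list[int]]) -> list[int]:
    """Per original timestep, the duration (in timesteps) of the segment it falls in.

    segment_durations[c] lists the segment lengths of typical cluster c;
    assignments lists the typical cluster of each original period.
    """
    divisor: list[int] = []
    for cluster in assignments:
        for duration in segment_durations[cluster]:
            # Every timestep of this segment is divided by the same duration
            divisor.extend([duration] * duration)
    return divisor


# 2 clusters with 4 timesteps each, segmented as [3, 1] and [2, 2]
durations = [[3, 1], [2, 2]]
div = expansion_divisor([0, 1], durations)
print(div)  # [3, 3, 3, 1, 2, 2, 2, 2]

# A segment total of 6.0 repeated over its 3 timesteps becomes a rate of 2.0
segment_totals = [6.0, 6.0, 6.0, 1.0, 4.0, 4.0, 5.0, 5.0]
print([t / d for t, d in zip(segment_totals, div)])
# [2.0, 2.0, 2.0, 1.0, 2.0, 2.0, 2.5, 2.5]
```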

get_result

get_result(period: Any = None, scenario: Any = None) -> TsamClusteringResult

Get the tsam ClusteringResult for a specific (period, scenario).

Parameters:

Name	Type	Description	Default
`period`	`Any`	Period label (if applicable).	`None`
`scenario`	`Any`	Scenario label (if applicable).	`None`

Returns:

Type	Description
`ClusteringResult`	The tsam ClusteringResult for the specified combination.

apply

apply(data: DataFrame, period: Any = None, scenario: Any = None) -> AggregationResult

Apply the saved clustering to new data.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	DataFrame with time series data to cluster.	required
`period`	`Any`	Period label (if applicable).	`None`
`scenario`	`Any`	Scenario label (if applicable).	`None`

Returns:

Type	Description
`AggregationResult`	tsam AggregationResult with the clustering applied.

to_json

to_json(path: str | Path) -> None

Save the clustering for reuse.

Uses ClusteringResults.to_dict(), which preserves the full tsam ClusteringResult. The file can later be loaded with Clustering.from_json() and used with flow_system.transform.apply_clustering().

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Path to save the JSON file.	required

from_json `classmethod`

from_json(path: str | Path, original_timesteps: DatetimeIndex | None = None) -> Clustering

Load a clustering from JSON.

The loaded Clustering has full apply() support because ClusteringResult is fully preserved via tsam's serialization.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Path to the JSON file.	required
`original_timesteps`	`DatetimeIndex \| None`	Original timesteps for the new FlowSystem. If None, uses the timesteps stored in the JSON.	`None`

Returns:

Type	Description
`Clustering`	A Clustering that can be used with apply_clustering().

items

items()

Iterate over (key, AggregationResult) pairs.

Raises:

Type	Description
`ValueError`	If accessed on a Clustering loaded from JSON.

keys

keys()

Iterate over (period, scenario) keys.

values

values()

Iterate over AggregationResult objects.

Raises:

Type	Description
`ValueError`	If accessed on a Clustering loaded from JSON.

ClusteringPlotAccessor

ClusteringPlotAccessor(clustering: Clustering)

Plot accessor for Clustering objects.

Provides visualization methods for comparing original vs aggregated data and understanding the clustering structure.

Functions

compare

compare(kind: str = 'timeseries', variables: str | list[str] | None = None, *, select: SelectType | None = None, colors: ColorType | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult

Compare original vs aggregated data.

Parameters:

Name	Type	Description	Default
`kind`	`str`	Type of comparison plot: 'timeseries' for a time series comparison (default) or 'duration_curve' for a sorted duration curve comparison.	`'timeseries'`
`variables`	`str \| list[str] \| None`	Variable(s) to plot. Can be a string, list of strings, or None to plot all time-varying variables.	`None`
`select`	`SelectType \| None`	xarray-style selection dict, e.g. {'scenario': 'Base Case'}.	`None`
`colors`	`ColorType \| None`	Color specification (colorscale name, color list, or label-to-color dict).	`None`
`show`	`bool \| None`	Whether to display the figure. Defaults to CONFIG.Plotting.default_show.	`None`
`data_only`	`bool`	If True, skip figure creation and return only data.	`False`
`**plotly_kwargs`	`Any`	Additional arguments passed to plotly (e.g., color, line_dash, facet_col, facet_row). Defaults: x='time'/'duration', color='variable', line_dash='representation', symbol=None.	`{}`

Returns:

Type	Description
`PlotResult`	PlotResult containing the comparison figure and underlying data.
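The 'duration_curve' kind compares sorted curves rather than chronological series. A minimal sketch of the underlying transformation, assuming the usual definition (values sorted from highest to lowest); this is not the plotting code itself and the data are made up:

```python
def duration_curve(values: list[float]) -> list[float]:
    """Sort values from highest to lowest, as in a load duration curve."""
    return sorted(values, reverse=True)


original = [3.0, 7.0, 5.0, 1.0]
aggregated = [4.0, 6.0, 4.0, 2.0]  # hypothetical expanded aggregated series
print(duration_curve(original))    # [7.0, 5.0, 3.0, 1.0]
print(duration_curve(aggregated))  # [6.0, 4.0, 4.0, 2.0]
```

Sorting removes the time ordering, so the comparison focuses on how well the aggregation preserves peaks and the overall value distribution.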

heatmap

heatmap(*, select: SelectType | None = None, colors: str | list[str] | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult

Plot cluster assignments over time as a heatmap timeline.

Shows which cluster each timestep belongs to as a horizontal color bar. The x-axis is time, color indicates cluster assignment. This visualization aligns with time series data, making it easy to correlate cluster assignments with other plots.

For multi-period/scenario data, uses faceting and/or animation.

Parameters:

Name	Type	Description	Default
`select`	`SelectType \| None`	xarray-style selection dict, e.g. {'scenario': 'Base Case'}.	`None`
`colors`	`str \| list[str] \| None`	Colorscale name (str) or list of colors for heatmap coloring. Dicts are not supported for heatmaps. Defaults to plotly template's sequential colorscale.	`None`
`show`	`bool \| None`	Whether to display the figure. Defaults to CONFIG.Plotting.default_show.	`None`
`data_only`	`bool`	If True, skip figure creation and return only data.	`False`
`**plotly_kwargs`	`Any`	Additional arguments passed to plotly (e.g., facet_col, animation_frame).	`{}`

Returns:

Type	Description
`PlotResult`	PlotResult containing the heatmap figure and cluster assignment data. The data has a 'cluster' variable with a time dimension matching the original timesteps.
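The 'cluster' variable can be pictured as each original period's cluster assignment repeated over that period's timesteps. A sketch assuming a non-segmented clustering with equal-length periods; `cluster_per_timestep` is hypothetical:

```python
def cluster_per_timestep(assignments: list[int], timesteps_per_cluster: int) -> list[int]:
    """Repeat each original period's cluster id across all of its timesteps."""
    return [c for c in assignments for _ in range(timesteps_per_cluster)]


# 3 original periods of 3 timesteps each, assigned to clusters [1, 0, 1]
print(cluster_per_timestep([1, 0, 1], 3))
# [1, 1, 1, 0, 0, 0, 1, 1, 1]
```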

clusters

clusters(variables: str | list[str] | None = None, *, select: SelectType | None = None, colors: ColorType | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult

Plot each cluster's typical period profile.

Shows each cluster as a separate faceted subplot with all variables colored differently. Useful for understanding what each cluster represents.

Parameters:

Name	Type	Description	Default
`variables`	`str \| list[str] \| None`	Variable(s) to plot. Can be a string, list of strings, or None to plot all time-varying variables.	`None`
`select`	`SelectType \| None`	xarray-style selection dict, e.g. {'scenario': 'Base Case'}.	`None`
`colors`	`ColorType \| None`	Color specification (colorscale name, color list, or label-to-color dict).	`None`
`show`	`bool \| None`	Whether to display the figure. Defaults to CONFIG.Plotting.default_show.	`None`
`data_only`	`bool`	If True, skip figure creation and return only data.	`False`
`**plotly_kwargs`	`Any`	Additional arguments passed to plotly (e.g., color, facet_col, facet_col_wrap). Defaults: x='time', color='variable', symbol=None.	`{}`

Returns:

Type	Description
`PlotResult`	PlotResult containing the figure and underlying data.
