flixopt.clustering.base
Clustering classes for time series aggregation.
This module provides wrapper classes around tsam's clustering functionality:
- ClusteringResults: collection of tsam ClusteringResult objects for multi-dimensional (period, scenario) data.
- Clustering: top-level class stored on the FlowSystem after clustering.
Classes
ClusteringResults
Collection of tsam ClusteringResult objects for multi-dimensional data.
Manages multiple ClusteringResult objects keyed by (period, scenario) tuples and provides convenient access and multi-dimensional DataArray building.
Follows xarray-like patterns with .dims, .coords, .sel(), and .isel().
Attributes:
| Name | Type | Description |
|---|---|---|
dims | tuple[str, ...] | Tuple of dimension names, e.g., ('period', 'scenario'). |
coords | dict[str, list] | Dict mapping dimension names to their coordinate values. |
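The relationship between the result keys and `coords` can be sketched in plain Python. This is a minimal sketch, assuming each coordinate list holds the unique labels found at that position of the keys, in first-seen order (which reproduces the `coords` shown in the example below); `coords_from_keys` is a hypothetical helper, not part of flixopt.

```python
def coords_from_keys(keys: list[tuple], dim_names: list[str]) -> dict[str, list]:
    """Hypothetical helper: unique labels per dimension, in first-seen order."""
    coords: dict[str, list] = {name: [] for name in dim_names}
    for key in keys:
        # Each key is a tuple aligned with dim_names, e.g. (period, scenario).
        for name, label in zip(dim_names, key):
            if label not in coords[name]:
                coords[name].append(label)
    return coords


keys = [(2024, 'high'), (2024, 'low')]
print(coords_from_keys(keys, ['period', 'scenario']))
# {'period': [2024], 'scenario': ['high', 'low']}
print(coords_from_keys([()], []))  # no extra dimensions
# {}
```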
Example
```python
>>> results = ClusteringResults({(): cr}, dim_names=[])
>>> results.n_clusters
2
>>> results.cluster_assignments  # Returns DataArray

>>> # Multi-dimensional case
>>> results = ClusteringResults(
...     {(2024, 'high'): cr1, (2024, 'low'): cr2},
...     dim_names=['period', 'scenario'],
... )
>>> results.dims
('period', 'scenario')
>>> results.coords
{'period': [2024], 'scenario': ['high', 'low']}
>>> results.sel(period=2024, scenario='high')  # Label-based
>>> results.isel(period=0, scenario=1)  # Index-based
```
Initialize ClusteringResults.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results | dict[tuple, ClusteringResult] | Dict mapping (period, scenario) tuples to tsam ClusteringResult objects. For simple cases without periods/scenarios, use {(): result}. | required |
dim_names | list[str] | Names of extra dimensions, e.g., ['period', 'scenario']. | required |
Attributes
coords property
timesteps_per_cluster property
Number of timesteps per cluster (same for all results).
n_original_periods property
Number of original periods (same for all results).
n_segments property
Number of segments per cluster, or None if not segmented.
cluster_assignments property
Maps each original cluster to its typical cluster index.
Returns:
| Type | Description |
|---|---|
DataArray | DataArray with dims [original_cluster, period?, scenario?]. |
cluster_occurrences property
How many original clusters map to each typical cluster.
Returns:
| Type | Description |
|---|---|
DataArray | DataArray with dims [cluster, period?, scenario?]. |
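Because `cluster_assignments` maps every original cluster to a typical cluster, `cluster_occurrences` can be read as a count of each typical-cluster index in that mapping. A plain-Python sketch for a single (period, scenario) slice, with illustrative values rather than real clustering output:

```python
from collections import Counter

# Illustrative data: 7 original clusters (e.g. days) assigned to 3 typical clusters.
cluster_assignments = [0, 2, 2, 1, 0, 2, 1]
n_clusters = 3

counts = Counter(cluster_assignments)
# One entry per typical cluster; a cluster with no members would get 0.
cluster_occurrences = [counts.get(c, 0) for c in range(n_clusters)]
print(cluster_occurrences)  # [2, 2, 3]
```

Every original cluster is counted exactly once, so the occurrences always sum to the number of original clusters.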
cluster_centers property
Which original cluster is the representative (center) for each typical cluster.
Returns:
| Type | Description |
|---|---|
DataArray | DataArray with dims [cluster, period?, scenario?]. |
segment_assignments property
For each timestep within a cluster, which segment it belongs to.
Returns:
| Type | Description |
|---|---|
DataArray \| None | DataArray with dims [cluster, time, period?, scenario?], or None if not segmented. |
segment_durations property
Duration of each segment in timesteps.
Returns:
| Type | Description |
|---|---|
DataArray \| None | DataArray with dims [cluster, segment, period?, scenario?], or None if not segmented. |
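For one cluster, `segment_durations` can be obtained from `segment_assignments` by counting how many timesteps carry each segment index. A sketch with made-up values (24 timesteps, 4 segments), assuming segments are numbered from 0:

```python
from collections import Counter

# Illustrative segment_assignments for one cluster: 24 timesteps, 4 contiguous segments.
segment_assignments = [0] * 6 + [1] * 4 + [2] * 10 + [3] * 4
n_segments = 4

counts = Counter(segment_assignments)
segment_durations = [counts[s] for s in range(n_segments)]
print(segment_durations)  # [6, 4, 10, 4]
```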
segment_centers property
Center of each intra-period segment.
Only available if segmentation was configured during clustering.
Returns:
| Type | Description |
|---|---|
DataArray \| None | DataArray, or None if no segmentation. |
position_within_segment property
Position of each timestep within its segment (0-indexed).
For each (cluster, time) position, returns how many timesteps into the segment that position is. Used for interpolation within segments.
Returns:
| Type | Description |
|---|---|
DataArray \| None | DataArray with dims [cluster, time] or [cluster, time, period?, scenario?], or None if no segmentation. |
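The 0-indexed position within a segment restarts at 0 whenever the segment index changes along the time axis. A minimal sketch for one cluster, assuming segments are contiguous; `position_within_segment` here is a hypothetical plain-Python function, not the flixopt property:

```python
def position_within_segment(segment_assignments: list[int]) -> list[int]:
    """Hypothetical sketch: 0-based offset of each timestep inside its contiguous segment."""
    positions: list[int] = []
    for i, seg in enumerate(segment_assignments):
        if i > 0 and seg == segment_assignments[i - 1]:
            positions.append(positions[-1] + 1)  # still inside the same segment
        else:
            positions.append(0)  # first timestep of a new segment
    return positions


print(position_within_segment([0, 0, 0, 1, 1, 2, 2, 2, 2]))
# [0, 1, 2, 0, 1, 0, 1, 2, 3]
```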
Functions
sel
Select result by dimension labels (xarray-like).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
**kwargs | Any | Dimension name=value pairs, e.g., period=2024, scenario='high'. | {} |
Returns:
| Type | Description |
|---|---|
ClusteringResult | The tsam ClusteringResult for the specified combination. |
Raises:
| Type | Description |
|---|---|
KeyError | If no result found for the specified combination. |
Example
results.sel(period=2024, scenario='high')
isel
Select result by dimension indices (xarray-like).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
**kwargs | int | Dimension name=index pairs, e.g., period=0, scenario=1. | {} |
Returns:
| Type | Description |
|---|---|
ClusteringResult | The tsam ClusteringResult for the specified combination. |
Raises:
| Type | Description |
|---|---|
IndexError | If index is out of range for a dimension. |
Example
results.isel(period=0, scenario=1)
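The label-based `sel()` and index-based `isel()` described above can resolve to the same dict key. A minimal sketch, assuming results are stored in a plain dict keyed by (period, scenario) tuples and that `isel()` translates indices into labels via `coords`; `MiniResults` is a hypothetical stand-in for ClusteringResults, and the strings stand in for tsam ClusteringResult objects:

```python
from typing import Any


class MiniResults:
    """Hypothetical stand-in for ClusteringResults: a dict keyed by label tuples."""

    def __init__(self, results: dict[tuple, Any], dim_names: list[str]):
        self._results = results
        self.dims = tuple(dim_names)
        # Unique labels per dimension, in first-seen order.
        self.coords: dict[str, list] = {d: [] for d in self.dims}
        for key in results:
            for d, label in zip(self.dims, key):
                if label not in self.coords[d]:
                    self.coords[d].append(label)

    def sel(self, **kwargs: Any) -> Any:
        # Build the tuple key in dimension order from the given labels.
        key = tuple(kwargs[d] for d in self.dims)
        if key not in self._results:
            raise KeyError(f'No result for {kwargs}')
        return self._results[key]

    def isel(self, **kwargs: int) -> Any:
        # Translate positional indices into labels, then reuse sel().
        labels = {}
        for d in self.dims:
            idx = kwargs[d]
            if not 0 <= idx < len(self.coords[d]):
                raise IndexError(f'Index {idx} out of range for dimension {d!r}')
            labels[d] = self.coords[d][idx]
        return self.sel(**labels)


results = MiniResults({(2024, 'high'): 'cr1', (2024, 'low'): 'cr2'}, ['period', 'scenario'])
print(results.sel(period=2024, scenario='high'))  # cr1
print(results.isel(period=0, scenario=1))  # cr2
```

With no extra dimensions (`{(): result}` and `dim_names=[]`), both methods take no arguments and resolve to the empty tuple key.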
from_dict classmethod
Reconstruct from dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
d | dict | Dict from to_dict(). | required |
Returns:
| Type | Description |
|---|---|
ClusteringResults | Reconstructed ClusteringResults. |
apply
Apply clustering to dataset for all (period, scenario) combinations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | Dataset | Dataset with time-varying data. Must have 'time' dimension. May have 'period' and/or 'scenario' dimensions matching this object. | required |
Returns:
| Type | Description |
|---|---|
AggregationResults | AggregationResults with full access to the aggregated data. Its `clustering` attribute returns the ClusteringResults for IO (see the example). |
Example
```python
>>> agg_results = clustering_results.apply(dataset)
>>> agg_results.clustering  # Get ClusteringResults for IO
>>> for key, result in agg_results:
...     print(result.cluster_representatives)
```
Clustering
Clustering(results: ClusteringResults | dict | None = None, original_timesteps: DatetimeIndex | list[str] | None = None, original_data: Dataset | None = None, aggregated_data: Dataset | None = None, _metrics: Dataset | None = None, _original_data_refs: list[str] | None = None, _metrics_refs: list[str] | None = None, _aggregation_results: dict[tuple, AggregationResult] | None = None, _dim_names: list[str] | None = None)
Clustering information for a FlowSystem.
Thin wrapper around tsam 3.0's AggregationResult objects, providing:
1. Multi-dimensional access for (period, scenario) combinations
2. Structure properties (n_clusters, dims, coords, cluster_assignments)
3. JSON persistence via ClusteringResults
Use sel() to access individual tsam AggregationResult objects for detailed analysis (cluster_representatives, accuracy, plotting).
Attributes:
| Name | Type | Description |
|---|---|---|
results | ClusteringResults | ClusteringResults for structure access (works after JSON load). |
original_timesteps | DatetimeIndex | Original timesteps before clustering. |
dims | tuple[str, ...] | Dimension names, e.g., ('period', 'scenario'). |
coords | dict[str, list] | Coordinate values, e.g., {'period': [2024, 2025]}. |
Example
```python
>>> clustering = fs_clustered.clustering
>>> clustering.n_clusters
8
>>> clustering.dims
('period',)

>>> # Access tsam AggregationResult for detailed analysis
>>> result = clustering.sel(period=2024)
>>> result.cluster_representatives  # DataFrame
>>> result.accuracy  # AccuracyMetrics
>>> result.plot.compare()  # tsam's built-in plotting
```
Initialize Clustering object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results | ClusteringResults \| dict \| None | ClusteringResults instance, or dict from to_dict() (for deserialization). Not needed if _aggregation_results is provided. | None |
original_timesteps | DatetimeIndex \| list[str] \| None | Original timesteps before clustering. | None |
original_data | Dataset \| None | Original dataset before clustering (for expand/plotting). | None |
aggregated_data | Dataset \| None | Aggregated dataset after clustering (for plotting). After loading from file, this is reconstructed from FlowSystem data. | None |
_metrics | Dataset \| None | Pre-computed metrics dataset. | None |
_original_data_refs | list[str] \| None | Internal: resolved DataArrays from serialization. | None |
_metrics_refs | list[str] \| None | Internal: resolved DataArrays from serialization. | None |
_aggregation_results | dict[tuple, AggregationResult] \| None | Internal: dict of AggregationResult for full data access. | None |
_dim_names | list[str] \| None | Internal: dimension names when using _aggregation_results. | None |
Attributes
n_original_clusters property
Number of original periods (before clustering).
coords property
is_segmented property
Whether intra-period segmentation was used.
Segmented systems have variable timestep durations within each cluster, where each segment represents a different number of original timesteps.
n_segments property
Number of segments per cluster, or None if not segmented.
cluster_assignments property
Mapping from original periods to cluster IDs.
Returns:
| Type | Description |
|---|---|
DataArray | DataArray with dims [original_cluster] or [original_cluster, period?, scenario?]. |
n_representatives property
Number of representative timesteps after clustering.
cluster_occurrences property
Count of how many original periods each cluster represents.
Returns:
| Type | Description |
|---|---|
DataArray | DataArray with dims [cluster] or [cluster, period?, scenario?]. |
representative_weights property
Weight for each cluster (number of original periods it represents).
This is the same as cluster_occurrences but named for API consistency. Used as cluster_weight in FlowSystem.
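To illustrate why these weights matter: a total over the original horizon can be approximated by weighting each typical cluster's total with the number of original periods it represents. The numbers below are illustrative (three typical days standing in for a 365-day year), and the sketch shows the general weighting idea rather than how flixopt builds its optimization model:

```python
# Illustrative daily energy totals (MWh) of three typical days.
cluster_totals = [120.0, 80.0, 150.0]
# representative_weights: how many original days each typical day stands for.
representative_weights = [200, 100, 65]

annual_estimate = sum(w * total for w, total in zip(representative_weights, cluster_totals))
print(annual_estimate)  # 41750.0
```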
timestep_mapping cached property
Mapping from original timesteps to representative timestep indices.
Each value indicates which representative timestep index (0 to n_representatives-1) corresponds to each original timestep.
Note: This property is cached for performance since it's accessed frequently during expand() operations.
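For the non-segmented case, the mapping can be sketched from `cluster_assignments` and `timesteps_per_cluster`, assuming the representative timesteps are laid out cluster by cluster (consistent with `cluster_start_positions` being [0, T, 2T, ...]). This is a plain-Python illustration, not flixopt's implementation, and it does not cover segmented systems:

```python
def timestep_mapping(cluster_assignments: list[int], timesteps_per_cluster: int) -> list[int]:
    """Hypothetical sketch: original timestep -> representative timestep index (no segmentation)."""
    T = timesteps_per_cluster
    mapping: list[int] = []
    for cluster in cluster_assignments:  # one entry per original period, in time order
        for offset in range(T):
            # The representative block for `cluster` starts at cluster * T.
            mapping.append(cluster * T + offset)
    return mapping


# 4 original periods of 3 timesteps each, clustered into 2 typical periods.
print(timestep_mapping([1, 0, 0, 1], 3))
# [3, 4, 5, 0, 1, 2, 0, 1, 2, 3, 4, 5]
```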
metrics property
Clustering quality metrics (RMSE, MAE, etc.).
Returns:
| Type | Description |
|---|---|
Dataset | Dataset with dims [time_series, period?, scenario?], or empty Dataset if no metrics. |
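As a reminder of what two of the named metrics measure, here are MAE and RMSE between an original series and its reconstruction from the clustered data. The values are illustrative, and tsam may normalize series before computing its metrics; that step is not reproduced here:

```python
import math

original = [3.0, 7.0, 5.0, 9.0]
reconstructed = [4.0, 7.0, 4.0, 8.0]  # e.g. aggregated data expanded back to original timesteps

errors = [o - r for o, r in zip(original, reconstructed)]
mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # root mean squared error
print(mae)  # 0.75
print(rmse)  # sqrt(0.75), about 0.866
```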
cluster_start_positions property
Integer positions where clusters start in reduced timesteps.
Returns:
| Type | Description |
|---|---|
ndarray | 1D array: [0, T, 2T, ...] where T = timesteps_per_cluster (or n_segments if segmented). |
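The start positions follow directly from the stride T. A minimal sketch returning a plain list (the property itself returns an ndarray); `cluster_start_positions` here is a hypothetical function, not the property:

```python
from typing import Optional


def cluster_start_positions(n_clusters: int, timesteps_per_cluster: int,
                            n_segments: Optional[int] = None) -> list[int]:
    """Hypothetical sketch: offsets where each cluster begins in the reduced time axis."""
    # Segmented systems have n_segments reduced timesteps per cluster instead of T.
    step = n_segments if n_segments is not None else timesteps_per_cluster
    return [i * step for i in range(n_clusters)]


print(cluster_start_positions(4, 24))  # [0, 24, 48, 72]
print(cluster_start_positions(4, 24, n_segments=6))  # [0, 6, 12, 18]
```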
cluster_centers property
Which original period is the representative (center) for each cluster.
Returns:
| Type | Description |
|---|---|
DataArray | DataArray with dims [cluster] containing original period indices. |
segment_assignments property
For each timestep within a cluster, which intra-period segment it belongs to.
Only available if segmentation was configured during clustering.
Returns:
| Type | Description |
|---|---|
DataArray \| None | DataArray with dims [cluster, time], or None if no segmentation. |
segment_durations property
Duration of each intra-period segment in hours.
Only available if segmentation was configured during clustering.
Returns:
| Type | Description |
|---|---|
DataArray \| None | DataArray with dims [cluster, segment], or None if no segmentation. |
segment_centers property
Center of each intra-period segment.
Only available if segmentation was configured during clustering.
Returns:
| Type | Description |
|---|---|
DataArray \| None | DataArray with dims [cluster, segment], or None if no segmentation. |
plot property
Access plotting methods for clustering visualization.
Returns:
| Type | Description |
|---|---|
ClusteringPlotAccessor | ClusteringPlotAccessor with compare(), heatmap(), and clusters() methods. |
results property
ClusteringResults for structure access (derived from AggregationResults or cached).
Functions
sel
Select AggregationResult by period and/or scenario.
Access individual tsam AggregationResult objects for detailed analysis.
Note
This method is only available before saving/loading the FlowSystem. After IO (to_dataset/from_dataset or to_json), the full AggregationResult data is not preserved. Use results.sel() for structure-only access after loading.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
period | int \| str \| None | Period value (e.g., 2024). Required if clustering has periods. | None |
scenario | str \| None | Scenario name (e.g., 'high'). Required if clustering has scenarios. | None |
Returns:
| Type | Description |
|---|---|
AggregationResult | The tsam AggregationResult for the specified combination. Access its properties like cluster_representatives and accuracy (see the example). |
Raises:
| Type | Description |
|---|---|
KeyError | If no result found for the specified combination. |
ValueError | If accessed on a Clustering loaded from JSON/NetCDF. |
Example
```python
>>> result = clustering.sel(period=2024, scenario='high')
>>> result.cluster_representatives  # DataFrame with aggregated data
>>> result.accuracy  # AccuracyMetrics
>>> result.plot.compare()  # tsam's built-in comparison plot
```
expand_data
Expand aggregated data back to original timesteps.
Uses timestep_mapping to map each original timestep to its representative value in the aggregated data. Fully vectorized using xarray's advanced indexing, with no loops over the period/scenario dimensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
aggregated | DataArray | DataArray with aggregated (cluster, time) or (time,) dimensions. | required |
original_time | DatetimeIndex \| None | Original time coordinates. Defaults to self.original_timesteps. | None |
Returns:
| Type | Description |
|---|---|
DataArray | DataArray expanded to original timesteps. |
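The core of the expansion is an indexing step: flatten the (cluster, time) values into representative-timestep order and pick one entry per original timestep using the mapping. A plain-Python sketch for one (period, scenario) slice with illustrative values; it assumes the non-segmented layout, whereas the method itself does this with xarray over all dimensions at once:

```python
# Aggregated values laid out as (cluster, time): 2 typical periods x 3 timesteps.
aggregated = [[10.0, 20.0, 30.0],
              [1.0, 2.0, 3.0]]
# timestep_mapping for 4 original periods with cluster_assignments [1, 0, 0, 1].
mapping = [3, 4, 5, 0, 1, 2, 0, 1, 2, 3, 4, 5]

flat = [v for row in aggregated for v in row]  # representative index -> value
expanded = [flat[i] for i in mapping]  # one value per original timestep
print(expanded)
# [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 1.0, 2.0, 3.0]
```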
build_expansion_divisor
Build a divisor for correcting segment totals when expanding to hourly resolution.
For segmented systems, each segment value is a total that gets repeated N times when expanded to hourly resolution (where N = segment duration in timesteps). This divisor converts those totals back to hourly rates during expansion.
For each original timestep, it returns the number of original timesteps that map to the same (cluster, segment), i.e., the segment duration in timesteps.
Fully vectorized using xarray's advanced indexing, with no loops over period/scenario.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
original_time | DatetimeIndex \| None | Original time coordinates. Defaults to self.original_timesteps. | None |
Returns:
| Type | Description |
|---|---|
DataArray | DataArray with dims ['time'] or ['time', 'period'?, 'scenario'?] containing the number of timesteps in each segment, aligned to original timesteps. |
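A worked example of the correction: each segment total is repeated N times on expansion, and dividing by the divisor (the segment duration N) turns the repeated totals into per-timestep rates whose sum equals the original segment totals. Illustrative values for one cluster; variable names are hypothetical:

```python
# One cluster with 3 segments lasting 2, 3 and 1 original timesteps.
segment_totals = [40.0, 90.0, 5.0]
segment_durations = [2, 3, 1]

expanded_totals: list[float] = []
divisor: list[int] = []
for total, duration in zip(segment_totals, segment_durations):
    expanded_totals += [total] * duration  # each segment value repeated N times
    divisor += [duration] * duration  # divisor = segment duration at every original timestep

rates = [v / d for v, d in zip(expanded_totals, divisor)]
print(rates)  # [20.0, 20.0, 30.0, 30.0, 30.0, 5.0]
```

Without the divisor, the expanded series would sum to 40 * 2 + 90 * 3 + 5 = 355 instead of the segment total of 135.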
get_result
Get the tsam ClusteringResult for a specific (period, scenario).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
period | Any | Period label (if applicable). | None |
scenario | Any | Scenario label (if applicable). | None |
Returns:
| Type | Description |
|---|---|
ClusteringResult | The tsam ClusteringResult for the specified combination. |
apply
Apply the saved clustering to new data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | DataFrame | DataFrame with time series data to cluster. | required |
period | Any | Period label (if applicable). | None |
scenario | Any | Scenario label (if applicable). | None |
Returns:
| Type | Description |
|---|---|
AggregationResult | tsam AggregationResult with the clustering applied. |
to_json
Save the clustering for reuse.
Uses ClusteringResults.to_dict(), which preserves the full tsam ClusteringResult. The saved file can be loaded later with Clustering.from_json() and used with flow_system.transform.apply_clustering().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path | str \| Path | Path to save the JSON file. | required |
from_json classmethod
Load a clustering from JSON.
The loaded Clustering has full apply() support because ClusteringResult is fully preserved via tsam's serialization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path | str \| Path | Path to the JSON file. | required |
original_timesteps | DatetimeIndex \| None | Original timesteps for the new FlowSystem. If None, uses the timesteps stored in the JSON. | None |
Returns:
| Type | Description |
|---|---|
Clustering | A Clustering that can be used with apply_clustering(). |
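JSON object keys must be strings, so a mapping keyed by (period, scenario) tuples needs some encoding before it can be written to disk. One possible encoding is sketched below; it is an assumption for illustration and does not describe the actual layout produced by `ClusteringResults.to_dict()`. The `encode`/`decode` functions are hypothetical, and the placeholder dicts stand in for serialized tsam ClusteringResult objects:

```python
import json
from typing import Any


def encode(results: dict[tuple, Any], dim_names: list[str]) -> str:
    """Hypothetical encoding: tuple keys become JSON lists inside an 'entries' list."""
    payload = {
        'dim_names': dim_names,
        'entries': [{'key': list(key), 'result': value} for key, value in results.items()],
    }
    return json.dumps(payload)


def decode(text: str) -> tuple[dict[tuple, Any], list[str]]:
    """Inverse of encode(): JSON lists are turned back into tuple keys."""
    payload = json.loads(text)
    results = {tuple(entry['key']): entry['result'] for entry in payload['entries']}
    return results, payload['dim_names']


original = {(2024, 'high'): {'n_clusters': 8}, (2024, 'low'): {'n_clusters': 8}}
restored, dims = decode(encode(original, ['period', 'scenario']))
print(restored == original, dims)  # True ['period', 'scenario']
```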
items
Iterate over (key, AggregationResult) pairs.
Raises:
| Type | Description |
|---|---|
ValueError | If accessed on a Clustering loaded from JSON. |
values
Iterate over AggregationResult objects.
Raises:
| Type | Description |
|---|---|
ValueError | If accessed on a Clustering loaded from JSON. |
ClusteringPlotAccessor
Plot accessor for Clustering objects.
Provides visualization methods for comparing original vs aggregated data and understanding the clustering structure.
Functions
compare
compare(kind: str = 'timeseries', variables: str | list[str] | None = None, *, select: SelectType | None = None, colors: ColorType | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult
Compare original vs aggregated data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kind | str | Type of comparison plot: 'timeseries' for a time series comparison (default) or 'duration_curve' for a sorted duration curve comparison. | 'timeseries' |
variables | str \| list[str] \| None | Variable(s) to plot: a string, a list of strings, or None to plot all time-varying variables. | None |
select | SelectType \| None | xarray-style selection dict, e.g. {'scenario': 'Base Case'}. | None |
colors | ColorType \| None | Color specification (colorscale name, color list, or label-to-color dict). | None |
show | bool \| None | Whether to display the figure. Defaults to CONFIG.Plotting.default_show. | None |
data_only | bool | If True, skip figure creation and return only data. | False |
**plotly_kwargs | Any | Additional arguments passed to plotly (e.g., color, line_dash, facet_col, facet_row). Defaults: x='time'/'duration', color='variable', line_dash='representation', symbol=None. | {} |
Returns:
| Type | Description |
|---|---|
PlotResult | PlotResult containing the comparison figure and underlying data. |
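The 'duration_curve' kind compares values sorted from highest to lowest instead of in chronological order, so it shows whether the aggregation preserves peaks and lows regardless of when they occur. A sketch of the concept with illustrative series; it does not reproduce the accessor's plotting code:

```python
def duration_curve(values: list[float]) -> list[float]:
    """Values sorted in descending order, as plotted on a duration curve."""
    return sorted(values, reverse=True)


original = [3.0, 7.0, 5.0, 9.0, 1.0, 6.0]
aggregated_expanded = [4.0, 7.0, 4.0, 8.0, 2.0, 7.0]

print(duration_curve(original))  # [9.0, 7.0, 6.0, 5.0, 3.0, 1.0]
print(duration_curve(aggregated_expanded))  # [8.0, 7.0, 7.0, 4.0, 4.0, 2.0]
```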
heatmap
heatmap(*, select: SelectType | None = None, colors: str | list[str] | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult
Plot cluster assignments over time as a heatmap timeline.
Shows which cluster each timestep belongs to as a horizontal color bar. The x-axis is time, color indicates cluster assignment. This visualization aligns with time series data, making it easy to correlate cluster assignments with other plots.
For multi-period/scenario data, uses faceting and/or animation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
select | SelectType \| None | xarray-style selection dict, e.g. {'scenario': 'Base Case'}. | None |
colors | str \| list[str] \| None | Colorscale name (str) or list of colors for heatmap coloring. Dicts are not supported for heatmaps. Defaults to the plotly template's sequential colorscale. | None |
show | bool \| None | Whether to display the figure. Defaults to CONFIG.Plotting.default_show. | None |
data_only | bool | If True, skip figure creation and return only data. | False |
**plotly_kwargs | Any | Additional arguments passed to plotly (e.g., facet_col, animation_frame). | {} |
Returns:
| Type | Description |
|---|---|
PlotResult | PlotResult containing the heatmap figure and cluster assignment data. The data has a 'cluster' variable with a time dimension matching the original timesteps. |
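The 'cluster' variable in the heatmap data assigns each original timestep the cluster of the original period it falls in. Assuming equally long original periods of T timesteps, that is a lookup by integer division, sketched here with illustrative values:

```python
cluster_assignments = [1, 0, 0, 1]  # one entry per original period
timesteps_per_cluster = 3  # T

n_original_timesteps = len(cluster_assignments) * timesteps_per_cluster
# Original timestep t lies in original period t // T.
cluster_per_timestep = [cluster_assignments[t // timesteps_per_cluster]
                        for t in range(n_original_timesteps)]
print(cluster_per_timestep)
# [1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```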
clusters
clusters(variables: str | list[str] | None = None, *, select: SelectType | None = None, colors: ColorType | None = None, show: bool | None = None, data_only: bool = False, **plotly_kwargs: Any) -> PlotResult
Plot each cluster's typical period profile.
Shows each cluster as a separate faceted subplot with all variables colored differently. Useful for understanding what each cluster represents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
variables | str \| list[str] \| None | Variable(s) to plot: a string, a list of strings, or None to plot all time-varying variables. | None |
select | SelectType \| None | xarray-style selection dict, e.g. {'scenario': 'Base Case'}. | None |
colors | ColorType \| None | Color specification (colorscale name, color list, or label-to-color dict). | None |
show | bool \| None | Whether to display the figure. Defaults to CONFIG.Plotting.default_show. | None |
data_only | bool | If True, skip figure creation and return only data. | False |
**plotly_kwargs | Any | Additional arguments passed to plotly (e.g., color, facet_col, facet_col_wrap). Defaults: x='time', color='variable', symbol=None. | {} |
Returns:
| Type | Description |
|---|---|
PlotResult | PlotResult containing the figure and underlying data. |