API Reference

This page provides an overview of the xeries API.

Main Classes

ConditionalPermutationImportance

ConditionalPermutationImportance(model, metric='mse', strategy='auto', partitioner=None, n_repeats=5, n_jobs=-1, random_state=None)

Bases: MetricBasedExplainer

Conditional Permutation Feature Importance calculator.

This implements conditional permutation importance where feature values are only shuffled within defined subgroups, preserving the correlation structure between features.
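The within-group shuffling idea can be sketched with plain NumPy (an illustration of the concept, not the library's implementation): because each value is permuted only among samples with the same group label, every permuted value stays inside its original subgroup.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
groups = np.array([0, 0, 0, 1, 1, 1])  # two subgroups

# Permute x only within each group label.
x_perm = x.copy()
for g in np.unique(groups):
    idx = np.flatnonzero(groups == g)
    x_perm[idx] = x[rng.permutation(idx)]

# Every permuted value still belongs to its original subgroup,
# so cross-group structure (here, small vs. large values) is preserved.
for g in np.unique(groups):
    assert set(x_perm[groups == g]) == set(x[groups == g])
```

An unconditional permutation would instead allow a value like 30.0 to land in the small-valued subgroup, breaking the correlation structure the conditional variant preserves.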

Supports two strategies:

- 'auto': uses tree-based cs-PFI to automatically learn subgroups
- 'manual': uses pre-defined groups provided by the user

Example

explainer = ConditionalPermutationImportance(model, metric='mse')
result = explainer.explain(X, y, features=['lag_1', 'lag_2'])
print(result.to_dataframe())

Initialize the conditional permutation importance calculator.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | ModelProtocol | A model with a predict method. | required |
| metric | MetricFunction \| str | Scoring metric ('mse', 'mae', 'rmse', 'r2') or callable. | 'mse' |
| strategy | str | Grouping strategy ('auto' for tree-based, 'manual' for user-defined). | 'auto' |
| partitioner | BasePartitioner \| None | Custom partitioner instance. If None, uses TreePartitioner for 'auto'. | None |
| n_repeats | int | Number of times to repeat permutation for each feature. | 5 |
| n_jobs | int | Number of parallel jobs (-1 for all cores). | -1 |
| random_state | int \| None | Random seed for reproducibility. | None |
Source code in src/xeries/importance/permutation.py
def __init__(
    self,
    model: ModelProtocol,
    metric: MetricFunction | str = "mse",
    strategy: str = "auto",
    partitioner: BasePartitioner | None = None,
    n_repeats: int = 5,
    n_jobs: int = -1,
    random_state: int | None = None,
) -> None:
    """Initialize the conditional permutation importance calculator.

    Args:
        model: A model with a predict method.
        metric: Scoring metric ('mse', 'mae', 'rmse', 'r2') or callable.
        strategy: Grouping strategy ('auto' for tree-based, 'manual' for user-defined).
        partitioner: Custom partitioner instance. If None, uses TreePartitioner for 'auto'.
        n_repeats: Number of times to repeat permutation for each feature.
        n_jobs: Number of parallel jobs (-1 for all cores).
        random_state: Random seed for reproducibility.
    """
    super().__init__(model, metric, random_state)
    self.strategy = strategy
    self.partitioner = partitioner
    self.n_repeats = n_repeats
    self.n_jobs = n_jobs

explain(X, y, features=None, groups=None, *args, **kwargs)

Compute conditional permutation importance for features.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | DataFrame | Input features DataFrame. | required |
| y | ArrayLike | Target values. | required |
| features | list[str] \| None | List of features to compute importance for. If None, uses all columns in X. | None |
| groups | GroupLabels \| None | Pre-defined group labels for 'manual' strategy. Required when strategy='manual' and no partitioner is provided. | None |

Returns:

| Type | Description |
|------|-------------|
| FeatureImportanceResult | FeatureImportanceResult containing importance scores. |

Source code in src/xeries/importance/permutation.py
def explain(
    self,
    X: pd.DataFrame,
    y: ArrayLike,
    features: list[str] | None = None,
    groups: GroupLabels | None = None,
    *args: Any,
    **kwargs: Any,
) -> FeatureImportanceResult:  # type: ignore[override]
    """Compute conditional permutation importance for features.

    Args:
        X: Input features DataFrame.
        y: Target values.
        features: List of features to compute importance for.
            If None, uses all columns in X.
        groups: Pre-defined group labels for 'manual' strategy.
            Required when strategy='manual' and no partitioner is provided.

    Returns:
        FeatureImportanceResult containing importance scores.
    """
    y_array = np.asarray(y)
    features = features or list(X.columns)

    baseline_pred = self.model.predict(X)
    baseline_score = self.metric(y_array, baseline_pred)

    results = Parallel(n_jobs=self.n_jobs)(
        delayed(self._compute_feature_importance)(X, y_array, feature, baseline_score, groups)
        for feature in features
    )

    importances = []
    stds = []
    permuted_scores: dict[str, list[float]] = {}

    for feature, scores in zip(features, results, strict=True):
        importance_values = [score - baseline_score for score in scores]
        importances.append(np.mean(importance_values))
        stds.append(np.std(importance_values))
        permuted_scores[feature] = scores

    return FeatureImportanceResult(
        feature_names=features,
        importances=np.array(importances),
        std=np.array(stds),
        baseline_score=baseline_score,
        permuted_scores=permuted_scores,
        method="conditional_permutation",
        n_repeats=self.n_repeats,
    )
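As the source shows, each importance value is the increase in error after permutation: permuted score minus baseline score. A standalone numeric sketch of that convention (illustrative values, plain NumPy, not the library code):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, as with the built-in 'mse' metric."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

y = np.array([1.0, 2.0, 3.0, 4.0])
baseline_pred = np.array([1.1, 1.9, 3.2, 3.8])  # predictions with the feature intact
permuted_pred = np.array([3.8, 3.2, 1.9, 1.1])  # predictions after shuffling a key feature

baseline_score = mse(y, baseline_pred)           # 0.025
permuted_score = mse(y, permuted_pred)           # 4.725
importance = permuted_score - baseline_score     # positive => the feature mattered
```

A feature whose permutation leaves the error unchanged gets an importance near zero.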

explain_per_series(X, y, series_col, features=None, min_samples=10)

Compute conditional permutation importance separately for each series.

This method filters the data by each unique series ID and computes feature importance independently for each series. Permutation is performed only within each individual series.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | DataFrame | Input features DataFrame. | required |
| y | ArrayLike | Target values. | required |
| series_col | str | Name of the column or MultiIndex level containing series IDs. | required |
| features | list[str] \| None | List of features to compute importance for. If None, uses all columns except series_col. | None |
| min_samples | int | Minimum number of samples required per series. Series with fewer samples are skipped. | 10 |

Returns:

| Type | Description |
|------|-------------|
| dict[Any, FeatureImportanceResult] | Dictionary mapping series IDs to FeatureImportanceResult objects. |

Example

explainer = ConditionalPermutationImportance(model, metric='mse')
results = explainer.explain_per_series(X, y, series_col='level')
for series_id, result in results.items():
    print(f"{series_id}: {result.to_dataframe()}")

Source code in src/xeries/importance/permutation.py
def explain_per_series(
    self,
    X: pd.DataFrame,
    y: ArrayLike,
    series_col: str,
    features: list[str] | None = None,
    min_samples: int = 10,
) -> dict[Any, FeatureImportanceResult]:
    """Compute conditional permutation importance separately for each series.

    This method filters the data by each unique series ID and computes
    feature importance independently for each series. Permutation is
    performed only within each individual series.

    Args:
        X: Input features DataFrame.
        y: Target values.
        series_col: Name of the column or MultiIndex level containing series IDs.
        features: List of features to compute importance for.
            If None, uses all columns except series_col.
        min_samples: Minimum number of samples required per series.
            Series with fewer samples are skipped.

    Returns:
        Dictionary mapping series IDs to FeatureImportanceResult objects.

    Example:
        >>> explainer = ConditionalPermutationImportance(model, metric='mse')
        >>> results = explainer.explain_per_series(X, y, series_col='level')
        >>> for series_id, result in results.items():
        ...     print(f"{series_id}: {result.to_dataframe()}")
    """
    y_array = np.asarray(y)

    series_ids = self._get_series_ids_from_data(X, series_col)
    unique_series = series_ids.unique()

    if features is None:
        exclude_cols = {series_col}
        features = [c for c in X.columns if c not in exclude_cols]

    results: dict[Any, FeatureImportanceResult] = {}

    for series_id in unique_series:
        mask = series_ids == series_id
        X_series = X.loc[mask]
        y_series = y_array[mask]

        if len(X_series) < min_samples:
            continue

        result = self._compute_series_importance(X_series, y_series, features)
        results[series_id] = result

    return results
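The per-series filtering and min_samples skip can be mimicked with a standalone pandas sketch (illustrative only; the real method returns FeatureImportanceResult objects rather than sample counts):

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({
    "series": ["A"] * 12 + ["B"] * 3,  # series B has too few samples
    "lag_1": np.arange(15, dtype=float),
})
y = np.arange(15, dtype=float)
min_samples = 10

results = {}
for series_id in X["series"].unique():
    mask = (X["series"] == series_id).to_numpy()
    if mask.sum() < min_samples:
        continue  # skip series with too few observations
    # ... compute importance on X.loc[mask], y[mask] for this series ...
    results[series_id] = int(mask.sum())  # stand-in for the per-series result
```

With these inputs only series "A" survives the min_samples filter, so the returned dictionary has a single key.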

ManualPartitioner

ManualPartitioner(mapping, series_col='level')

Bases: BasePartitioner

Partitioner using a user-defined mapping dictionary.

This partitioner assigns samples to groups based on a predefined mapping from series identifiers (or other categorical values) to group labels. Useful when domain knowledge suggests natural groupings.

Example

mapping = {'MT_001': 'group_A', 'MT_002': 'group_B', 'MT_003': 'group_A'}
partitioner = ManualPartitioner(mapping, series_col='level')
groups = partitioner.fit_get_groups(X, feature='lag_1')
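The map-then-encode behavior can be sketched standalone with pandas (a sketch of the behavior, not the class itself): group labels are sorted, assigned stable integers, and each sample's series ID is translated through both dictionaries.

```python
import pandas as pd

# Hypothetical mapping from series IDs to domain-knowledge groups.
mapping = {"MT_001": "group_A", "MT_002": "group_B", "MT_003": "group_A"}

# Encode sorted group labels to integers, mirroring what fit() does internally.
group_encoder = {g: i for i, g in enumerate(sorted(set(mapping.values())))}

series_ids = pd.Series(["MT_001", "MT_003", "MT_002", "MT_001"])
groups = series_ids.map(mapping).map(group_encoder).to_numpy()
# Samples from MT_001 and MT_003 share label 0 (group_A); MT_002 gets 1 (group_B).
```

A series ID missing from the mapping would produce a NaN after the first map, which is why get_groups raises a ValueError in that case.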

Initialize the manual partitioner.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| mapping | dict[Any, Any] | Dictionary mapping series identifiers to group labels. | required |
| series_col | str | Name of the column or index level containing series IDs. | 'level' |
Source code in src/xeries/partitioners/manual.py
def __init__(
    self,
    mapping: dict[Any, Any],
    series_col: str = "level",
) -> None:
    """Initialize the manual partitioner.

    Args:
        mapping: Dictionary mapping series identifiers to group labels.
        series_col: Name of the column or index level containing series IDs.
    """
    self.mapping = mapping
    self.series_col = series_col
    self._fitted = False
    self._group_encoder: dict[Any, int] = {}

n_groups property

Return the number of unique groups.

fit(X, feature)

Fit the partitioner (encodes group labels to integers).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | DataFrame | Input features DataFrame. | required |
| feature | str | The feature to condition on (not used for manual partitioner). | required |

Returns:

| Type | Description |
|------|-------------|
| ManualPartitioner | Self for method chaining. |

Source code in src/xeries/partitioners/manual.py
def fit(self, X: pd.DataFrame, feature: str) -> ManualPartitioner:
    """Fit the partitioner (encodes group labels to integers).

    Args:
        X: Input features DataFrame.
        feature: The feature to condition on (not used for manual partitioner).

    Returns:
        Self for method chaining.
    """
    unique_groups = sorted(set(self.mapping.values()))
    self._group_encoder = {g: i for i, g in enumerate(unique_groups)}
    self._fitted = True
    return self

get_groups(X)

Get group labels for each sample based on the mapping.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | DataFrame | Input features DataFrame with series identifiers. | required |

Returns:

| Type | Description |
|------|-------------|
| NDArray[intp] | Array of integer group labels. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If partitioner has not been fitted. |
| KeyError | If series_col is not found in X. |

Source code in src/xeries/partitioners/manual.py
def get_groups(self, X: pd.DataFrame) -> NDArray[np.intp]:
    """Get group labels for each sample based on the mapping.

    Args:
        X: Input features DataFrame with series identifiers.

    Returns:
        Array of integer group labels.

    Raises:
        ValueError: If partitioner has not been fitted.
        KeyError: If series_col is not found in X.
    """
    if not self._fitted:
        raise ValueError("Partitioner must be fitted before calling get_groups")

    series_ids = self._get_series_ids(X)
    group_labels = series_ids.map(self.mapping)

    if group_labels.isna().any():
        missing = series_ids[group_labels.isna()].unique()
        raise ValueError(f"Series IDs not found in mapping: {missing.tolist()}")

    encoded = group_labels.map(self._group_encoder)
    return encoded.to_numpy().astype(np.intp)

TreePartitioner

TreePartitioner(max_depth=4, min_samples_leaf=0.05, series_col=None, random_state=None)

Bases: BasePartitioner

Partitioner using decision tree leaf nodes for subgroup discovery.

This implements the Conditional Subgroup Permutation Feature Importance (cs-PFI) algorithm. A decision tree is trained to predict the feature of interest using all other features. The leaf nodes of this tree define homogeneous subgroups for conditional permutation.

Example

partitioner = TreePartitioner(max_depth=4, min_samples_leaf=0.05)
groups = partitioner.fit_get_groups(X, feature='lag_1')
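The grouping step of cs-PFI can be sketched standalone with scikit-learn (illustrative data and parameters; not the TreePartitioner source): a regression tree learns to predict the feature of interest from the remaining features, and its leaf assignments become the group labels.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = pd.DataFrame({
    "lag_1": rng.normal(size=200),
    "lag_2": rng.normal(size=200),
    "month": rng.integers(1, 13, size=200),
})
feature = "lag_1"

# Predict the feature of interest from all other features.
other = X.drop(columns=[feature])
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=0.05, random_state=0)
tree.fit(other, X[feature])

# Leaf indices define homogeneous subgroups for conditional permutation.
groups = tree.apply(other)
```

Samples landing in the same leaf have similar conditional distributions of the feature, so shuffling within a leaf approximates sampling from that conditional distribution.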

Initialize the tree partitioner.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| max_depth | int \| None | Maximum depth of the decision tree. | 4 |
| min_samples_leaf | int \| float | Minimum samples required in a leaf node. Can be int (absolute) or float (fraction of total samples). | 0.05 |
| series_col | str \| None | Column with series identifiers to one-hot encode. If None (the default), auto-detects _level_skforecast (skforecast 0.21+) or level (MultiIndex / legacy); auto-detection is skipped only when neither column exists. | None |
| random_state | int \| None | Random seed for reproducibility. | None |
Source code in src/xeries/partitioners/tree.py
def __init__(
    self,
    max_depth: int | None = 4,
    min_samples_leaf: int | float = 0.05,
    series_col: str | None = None,
    random_state: int | None = None,
) -> None:
    """Initialize the tree partitioner.

    Args:
        max_depth: Maximum depth of the decision tree.
        min_samples_leaf: Minimum samples required in a leaf node.
            Can be int (absolute) or float (fraction of total samples).
        series_col: Column with series identifiers to one-hot encode.
            ``None`` (default) auto-detects ``_level_skforecast`` (skforecast 0.21+)
            or ``level`` (MultiIndex / legacy); auto-detection is skipped
            only when neither column exists.
        random_state: Random seed for reproducibility.
    """
    self.max_depth = max_depth
    self.min_samples_leaf = min_samples_leaf
    self.series_col = series_col
    self.random_state = random_state

    self._tree: DecisionTreeRegressor | None = None
    self._encoder: OneHotEncoder | None = None
    self._feature: str | None = None
    self._fitted = False

n_groups property

Return the number of leaf nodes (groups).

tree property

Return the fitted decision tree.

fit(X, feature)

Fit the decision tree to predict the feature of interest.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | DataFrame | Input features DataFrame. | required |
| feature | str | The feature to condition on (will be predicted by the tree). | required |

Returns:

| Type | Description |
|------|-------------|
| TreePartitioner | Self for method chaining. |

Source code in src/xeries/partitioners/tree.py
def fit(self, X: pd.DataFrame, feature: str) -> TreePartitioner:
    """Fit the decision tree to predict the feature of interest.

    Args:
        X: Input features DataFrame.
        feature: The feature to condition on (will be predicted by tree).

    Returns:
        Self for method chaining.
    """
    self._feature = feature

    y_tree = X[feature].values
    X_tree = self._prepare_tree_features(X, feature)

    self._tree = DecisionTreeRegressor(
        max_depth=self.max_depth,
        min_samples_leaf=self.min_samples_leaf,
        random_state=self.random_state,
    )
    self._tree.fit(X_tree, y_tree)
    self._fitted = True

    return self

get_groups(X)

Get leaf node indices as group labels.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | DataFrame | Input features DataFrame. | required |

Returns:

| Type | Description |
|------|-------------|
| NDArray[intp] | Array of leaf node indices (group labels). |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If partitioner has not been fitted. |

Source code in src/xeries/partitioners/tree.py
def get_groups(self, X: pd.DataFrame) -> NDArray[np.intp]:
    """Get leaf node indices as group labels.

    Args:
        X: Input features DataFrame.

    Returns:
        Array of leaf node indices (group labels).

    Raises:
        ValueError: If partitioner has not been fitted.
    """
    if not self._fitted or self._tree is None or self._feature is None:
        raise ValueError("Partitioner must be fitted before calling get_groups")

    X_tree = self._prepare_tree_features(X, self._feature)
    return self._tree.apply(X_tree).astype(np.intp)

Result Types

FeatureImportanceResult

FeatureImportanceResult(feature_names, importances, std=None, baseline_score=0.0, permuted_scores=dict(), method='permutation', n_repeats=1) dataclass

Bases: BaseResult

Container for feature importance results.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| feature_names | list[str] | List of feature names. |
| importances | NDArray[floating[Any]] | Array of importance scores for each feature. |
| std | NDArray[floating[Any]] \| None | Standard deviations of importance scores (from multiple permutations). |
| baseline_score | float | The baseline model score before permutation. |
| permuted_scores | dict[str, list[float]] | Dictionary mapping feature names to their permuted scores. |
| method | str | The method used to compute importance ('permutation', 'shap', etc.). |
| n_repeats | int | Number of permutation repeats used. |

to_dataframe()

Convert results to a pandas DataFrame.

Source code in src/xeries/core/types.py
def to_dataframe(self) -> pd.DataFrame:
    """Convert results to a pandas DataFrame."""
    data = {
        "feature": self.feature_names,
        "importance": self.importances,
    }
    if self.std is not None:
        data["std"] = self.std
    return pd.DataFrame(data).sort_values("importance", ascending=False)

Visualization

plot_importance_bar

plot_importance_bar(result, max_features=20, ax=None, figsize=(10, 6), title=None, color='#1f77b4', show_std=True)

Plot feature importance as a horizontal bar chart.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| result | FeatureImportanceResult | FeatureImportanceResult from an explainer. | required |
| max_features | int \| None | Maximum number of features to display (top N). | 20 |
| ax | Axes \| None | Matplotlib axes to plot on. If None, creates new figure. | None |
| figsize | tuple[int, int] | Figure size (width, height) in inches. | (10, 6) |
| title | str \| None | Plot title. If None, uses default. | None |
| color | str | Bar color. | '#1f77b4' |
| show_std | bool | Whether to show error bars for standard deviation. | True |

Returns:

| Type | Description |
|------|-------------|
| tuple[Figure, Axes] | Tuple of (Figure, Axes). |

Source code in src/xeries/visualization/plots.py
def plot_importance_bar(
    result: FeatureImportanceResult,
    max_features: int | None = 20,
    ax: Axes | None = None,
    figsize: tuple[int, int] = (10, 6),
    title: str | None = None,
    color: str = "#1f77b4",
    show_std: bool = True,
) -> tuple[Figure, Axes]:
    """Plot feature importance as a horizontal bar chart.

    Args:
        result: FeatureImportanceResult from an explainer.
        max_features: Maximum number of features to display (top N).
        ax: Matplotlib axes to plot on. If None, creates new figure.
        figsize: Figure size (width, height) in inches.
        title: Plot title. If None, uses default.
        color: Bar color.
        show_std: Whether to show error bars for standard deviation.

    Returns:
        Tuple of (Figure, Axes).
    """
    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "matplotlib is required for plotting. Install it with: pip install matplotlib"
        ) from e

    df = result.to_dataframe()
    if max_features is not None:
        df = df.head(max_features)

    if ax is None:
        fig, ax = plt.subplots(figsize=figsize)
    else:
        fig = cast("Figure", ax.get_figure())

    y_pos = np.arange(len(df))
    importances = df["importance"].values

    xerr = df["std"].values if show_std and "std" in df.columns else None

    ax.barh(y_pos, importances, xerr=xerr, color=color, alpha=0.8, capsize=3)
    ax.set_yticks(y_pos)
    ax.set_yticklabels(df["feature"].values)
    ax.invert_yaxis()
    ax.set_xlabel("Importance (increase in error)")
    ax.set_title(title or "Conditional Permutation Feature Importance")

    plt.tight_layout()
    return fig, ax

plot_importance_heatmap

plot_importance_heatmap(results, features=None, ax=None, figsize=(12, 8), cmap='YlOrRd', annot=True, title=None)

Plot importance comparison across multiple conditions as a heatmap.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| results | dict[str, FeatureImportanceResult] | Dictionary mapping condition names to FeatureImportanceResult. | required |
| features | list[str] \| None | List of features to include. If None, uses union of all. | None |
| ax | Axes \| None | Matplotlib axes to plot on. If None, creates new figure. | None |
| figsize | tuple[int, int] | Figure size (width, height) in inches. | (12, 8) |
| cmap | str | Colormap name. | 'YlOrRd' |
| annot | bool | Whether to annotate cells with values. | True |
| title | str \| None | Plot title. | None |

Returns:

| Type | Description |
|------|-------------|
| tuple[Figure, Axes] | Tuple of (Figure, Axes). |

Source code in src/xeries/visualization/plots.py
def plot_importance_heatmap(
    results: dict[str, FeatureImportanceResult],
    features: list[str] | None = None,
    ax: Axes | None = None,
    figsize: tuple[int, int] = (12, 8),
    cmap: str = "YlOrRd",
    annot: bool = True,
    title: str | None = None,
) -> tuple[Figure, Axes]:
    """Plot importance comparison across multiple conditions as a heatmap.

    Args:
        results: Dictionary mapping condition names to FeatureImportanceResult.
        features: List of features to include. If None, uses union of all.
        ax: Matplotlib axes to plot on. If None, creates new figure.
        figsize: Figure size (width, height) in inches.
        cmap: Colormap name.
        annot: Whether to annotate cells with values.
        title: Plot title.

    Returns:
        Tuple of (Figure, Axes).
    """
    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "matplotlib is required for plotting. Install it with: pip install matplotlib"
        ) from e

    data_dict: dict[str, dict[str, float]] = {}
    for condition, result in results.items():
        data_dict[condition] = dict(zip(result.feature_names, result.importances, strict=True))

    df = pd.DataFrame(data_dict)

    if features is not None:
        df = df.loc[df.index.isin(features)]

    df = df.sort_values(by=list(df.columns), ascending=False)

    if ax is None:
        fig, ax = plt.subplots(figsize=figsize)
    else:
        fig = cast("Figure", ax.get_figure())

    im = ax.imshow(df.values, cmap=cmap, aspect="auto")
    ax.set_xticks(np.arange(len(df.columns)))
    ax.set_yticks(np.arange(len(df.index)))
    ax.set_xticklabels(df.columns)
    ax.set_yticklabels(df.index)

    plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")

    if annot:
        for i in range(len(df.index)):
            for j in range(len(df.columns)):
                value = df.iloc[i, j]
                text_color = "white" if value > df.values.max() * 0.5 else "black"
                ax.text(j, i, f"{value:.3f}", ha="center", va="center", color=text_color)

    fig.colorbar(im, ax=ax, label="Importance")
    ax.set_title(title or "Feature Importance Comparison")

    plt.tight_layout()
    return fig, ax

Planned APIs

The following APIs are planned for future releases and are not yet available:

  • Conditional SHAP
  • SHAP-IQ
  • Feature Dropping
  • Causal Feature Importance