sctop.utils

Functions

perform_anova_selection(...)

Perform ANOVA feature selection on the basis and optionally standardize.

calculate_metrics(→ Dict)

Calculate comprehensive metrics.

calculate_per_cell_type_accuracy(...)

Calculate per cell type accuracy.

print_metrics(metrics)

Pretty print metrics.

create_basis_optimized(...)

Original function - kept for backwards compatibility.

run_scoring_parallel(→ Tuple[dict, list, list, dict])

OPTIMIZED: Parallel scoring of test cells.

score_chunk_optimized(→ Tuple[dict, list, list, dict])

OPTIMIZED: Score a single chunk of cells.

plot_performance_summary(true_labels, predicted_labels)

Generates and displays a Confusion Matrix and a Per-Cell-Type F1 Score plot.

compute_predictivity(→ sctop.processing.pd.DataFrame)

Compute predictivity matrix from basis.

compute_gene_contributions(→ Dict[str, ...)

Compute gene-level contributions to cell type scores.

find_top_contributing_genes(→ sctop.processing.pd.Series)

Find top contributing genes from contribution matrix.

Module Contents

sctop.utils.perform_anova_selection(basis: sctop.processing.pd.DataFrame, adata: anndata.AnnData, training_IDs: sctop.processing.np.ndarray, cell_type_column: str, n_features: int = 2000, percentile: float | None = None, standardize: bool = True) → Tuple[sctop.processing.pd.DataFrame, sctop.processing.np.ndarray]

Perform ANOVA feature selection on the basis and optionally standardize.

Parameters:

basis : pd.DataFrame

The basis matrix (genes x cell types)

adata : ad.AnnData

The AnnData object

training_IDs : np.ndarray

Training cell IDs

cell_type_column : str

Column name for cell types

n_features : int

Number of top features to select (if percentile is None)

percentile : float, optional

Percentile of features to keep (overrides n_features)

standardize : bool

Whether to standardize the basis after selection (default: True)

Returns:

basis_selected : pd.DataFrame

Basis with selected features only (and standardized if requested)

selected_genes : np.ndarray

Array of selected gene names
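The selection step can be illustrated with NumPy alone. This is a minimal sketch, not sctop's implementation: `anova_f_scores` and `select_genes` are hypothetical helpers, the expression matrix `X` is assumed to be a dense (cells x genes) array of the training cells whose columns follow the same gene order as the basis rows, `percentile` is assumed to be on a 0-100 scale, and "standardize" is assumed to mean z-scoring each cell-type column over the selected genes.

```python
import numpy as np


def anova_f_scores(X, labels):
    """One-way ANOVA F statistic for every gene (column) of X.

    X is a (cells x genes) expression matrix; labels holds one cell type per row.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n_cells, n_groups = X.shape[0], len(groups)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for group in groups:
        Xg = X[labels == group]
        group_mean = Xg.mean(axis=0)
        ss_between += Xg.shape[0] * (group_mean - grand_mean) ** 2
        ss_within += ((Xg - group_mean) ** 2).sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        f = (ss_between / (n_groups - 1)) / (ss_within / (n_cells - n_groups))
    # Genes with no variance at all give 0/0; treat them as uninformative.
    return np.nan_to_num(f, nan=0.0)


def select_genes(basis, gene_names, X, labels, n_features=2000,
                 percentile=None, standardize=True):
    """Keep the basis rows (genes) with the highest ANOVA F scores."""
    f = anova_f_scores(X, labels)
    if percentile is not None:
        # Assumption: percentile is on a 0-100 scale, as in scikit-learn's SelectPercentile.
        n_features = max(1, int(round(len(f) * percentile / 100)))
    n_features = min(n_features, len(f))
    # Take the highest-scoring genes, then re-sort indices to keep the original gene order.
    top = np.sort(np.argsort(-f, kind="stable")[:n_features])
    selected = np.asarray(basis, dtype=float)[top]
    if standardize:
        # Assumption: z-score each cell-type column across the selected genes.
        selected = (selected - selected.mean(axis=0)) / selected.std(axis=0)
    return selected, np.asarray(gene_names)[top]
```

In the documented function the inputs are an AnnData object and a DataFrame basis, so a real version would first pull the training cells and their labels out of `adata` and align genes by name rather than by position.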

sctop.utils.calculate_metrics(true_labels: List, predicted_labels: List, total_cells: int, accuracies: Dict) → Dict

Calculate comprehensive metrics.

sctop.utils.calculate_per_cell_type_accuracy(cell_accuracies: Dict) → sctop.processing.pd.DataFrame

Calculate per cell type accuracy.

sctop.utils.print_metrics(metrics: Dict)

Pretty print metrics.

sctop.utils.create_basis_optimized(adata: anndata.AnnData, cell_type_column: str, threshold: int, test_size: float, random_state: int, n_jobs: int = -1) → Tuple[sctop.processing.pd.DataFrame, sctop.processing.np.ndarray, sctop.processing.np.ndarray]

Original function - kept for backwards compatibility.

sctop.utils.run_scoring_parallel(adata: anndata.AnnData, basis: sctop.processing.pd.DataFrame, test_IDs: sctop.processing.np.ndarray, cell_type_column: str, spec_value: float, outer_chunks: int, inner_chunk_size: int, n_jobs: int = 4) → Tuple[dict, list, list, dict]

OPTIMIZED: Parallel scoring of test cells. Uses ThreadPoolExecutor for shared-memory parallel processing.
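The outer/inner chunking pattern behind run_scoring_parallel and score_chunk_optimized can be sketched as follows. This is a hedged illustration, not the library's code: `score_chunk` and `score_parallel` are hypothetical names, scores are assumed to be `predictivity @ data` on already-processed (genes x cells) data, and the real functions also return bookkeeping (the extra dict and list outputs) that is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def score_chunk(data, predictivity, cell_ids, inner_chunk_size):
    """Hypothetical stand-in for score_chunk_optimized.

    data is a processed (genes x cells) matrix and predictivity is (cell_types x genes).
    Cells are scored in sub-batches of inner_chunk_size to bound peak memory.
    """
    predictions = []
    for start in range(0, len(cell_ids), inner_chunk_size):
        batch = cell_ids[start:start + inner_chunk_size]
        scores = predictivity @ data[:, batch]  # (cell_types x batch) score matrix
        predictions.extend(scores.argmax(axis=0).tolist())  # best cell type per cell
    return predictions


def score_parallel(data, predictivity, test_ids, outer_chunks, inner_chunk_size, n_jobs=4):
    """Split test_ids into outer chunks and score them on a thread pool."""
    chunks = [c for c in np.array_split(np.asarray(test_ids), outer_chunks) if len(c)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Threads share `data` and `predictivity` in memory; nothing is copied or pickled.
        results = pool.map(
            lambda ids: score_chunk(data, predictivity, ids, inner_chunk_size), chunks
        )
        # map() yields results in submission order, so predictions line up with test_ids.
        return [p for chunk_preds in results for p in chunk_preds]
```

Threads rather than processes fit this workload because the large matrices never need to be serialized, and NumPy typically releases the GIL inside BLAS-backed matrix products, so chunks can overlap in practice.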

sctop.utils.score_chunk_optimized(adata: anndata.AnnData, basis: sctop.processing.pd.DataFrame, sample_IDs: sctop.processing.np.ndarray, cell_type_column: str, spec_value: float, inner_chunk_size: int) → Tuple[dict, list, list, dict]

OPTIMIZED: Score a single chunk of cells. Extracted for parallel processing.

sctop.utils.plot_performance_summary(true_labels: List, predicted_labels: List, f1_df: sctop.processing.pd.DataFrame | None = None, figsize_base: int = 10)

Generates and displays a Confusion Matrix and a Per-Cell-Type F1 Score plot.
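The numbers behind the two plots can be computed without any plotting library. The sketch below uses a hypothetical helper, `confusion_and_f1`, which is not part of sctop: it builds a confusion matrix with true labels as rows and predicted labels as columns, and computes each cell type's F1 score as 2TP / (2TP + FP + FN). Drawing the heatmap and bar chart is left out, and the layout of the optional `f1_df` argument is not assumed.

```python
from collections import Counter


def confusion_and_f1(true_labels, predicted_labels):
    """Return (labels, confusion matrix as nested lists, {label: F1})."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    pair_counts = Counter(zip(true_labels, predicted_labels))
    # Row = true cell type, column = predicted cell type.
    confusion = [[pair_counts[(t, p)] for p in labels] for t in labels]
    f1 = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fp = sum(row[i] for row in confusion) - tp  # predicted as label, but wrong
        fn = sum(confusion[i]) - tp                 # truly label, but missed
        denom = 2 * tp + fp + fn
        f1[label] = 2 * tp / denom if denom else 0.0
    return labels, confusion, f1


labels, cm, f1 = confusion_and_f1(["A", "A", "B", "B", "C"], ["A", "B", "B", "B", "A"])
```

Here `f1` comes out as {'A': 0.5, 'B': 0.8, 'C': 0.0}: cell type C is never predicted correctly, so its F1 is zero even though it appears in the true labels.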

sctop.utils.compute_predictivity(basis: sctop.processing.pd.DataFrame) → sctop.processing.pd.DataFrame

Compute predictivity matrix from basis.

The predictivity shows how each gene contributes to each cell type’s score. Formula: predictivity = inv(B^T @ B) @ B^T, where B is the basis matrix (genes x cell_types).

Parameters:

basis (pd.DataFrame) – Basis matrix (genes x cell_types)

Returns:

predictivity – Predictivity matrix (cell_types x genes). Shows how each gene contributes to each cell type score.

Return type:

pd.DataFrame
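Because inv(B^T @ B) @ B^T is the left pseudo-inverse of B, multiplying it by the basis gives the identity, and applying it to a cell that is an exact mixture of basis columns recovers the mixing weights. A minimal NumPy sketch, assuming B has full column rank (`predictivity_from_basis` is a hypothetical name; the random basis is illustrative only):

```python
import numpy as np


def predictivity_from_basis(B):
    """Return inv(B^T B) @ B^T for a (genes x cell_types) basis B."""
    B = np.asarray(B, dtype=float)
    # Solving (B^T B) P = B^T is numerically safer than forming the inverse explicitly.
    return np.linalg.solve(B.T @ B, B.T)


rng = np.random.default_rng(0)
B = rng.normal(size=(50, 4))             # 50 genes, 4 cell types
P = predictivity_from_basis(B)           # shape (4, 50): cell_types x genes
weights = np.array([0.7, 0.0, 0.3, 0.0])
scores = P @ (B @ weights)               # a cell built from basis columns 0 and 2
```

For a full-rank basis this matches `np.linalg.pinv(B)`; the solve-based form is just a cheaper way to evaluate the documented formula.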

sctop.utils.compute_gene_contributions(data: sctop.processing.pd.DataFrame | sctop.processing.np.ndarray, basis: sctop.processing.pd.DataFrame, predictivity: sctop.processing.pd.DataFrame | None = None, cell_types: List[str] | None = None, process_data: bool = True) → Dict[str, sctop.processing.pd.DataFrame]

Compute gene-level contributions to cell type scores.

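For each cell type, computes the element-wise product contribution = expression * predictivity, multiplying each gene’s expression by that gene’s predictivity for the cell type.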

Parameters:
  • data (DataFrame or array) – Expression data (genes x samples)

  • basis (pd.DataFrame) – Basis matrix

  • predictivity (pd.DataFrame, optional) – Precomputed predictivity matrix. If None, computed from basis

  • cell_types (list, optional) – Cell types to compute contributions for. If None, uses all cell types

  • process_data (bool) – Whether to process the data first (default: True)

Returns:

contributions – Dictionary mapping cell_type -> contribution_matrix (genes x samples)

Return type:

dict
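A minimal NumPy sketch of the contribution formula, assuming the expression data has already been processed (the optional processing step is skipped) and that the predictivity rows follow the order of `cell_types`. `gene_contributions` is a hypothetical name, and the random inputs are illustrative only. A useful sanity check is that summing a cell type's contributions over genes gives back its ordinary score.

```python
import numpy as np


def gene_contributions(X, P, cell_types):
    """Per-gene contributions to each cell type score.

    X: processed expression (genes x samples); P: predictivity (cell_types x genes).
    Returns {cell_type: (genes x samples) array} with entry [g, s] = X[g, s] * P[c, g].
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=float)
    # P[i][:, None] is a (genes x 1) column that broadcasts across all samples.
    return {name: X * P[i][:, None] for i, name in enumerate(cell_types)}


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))   # 6 genes, 3 samples
P = rng.normal(size=(2, 6))   # 2 cell types, 6 genes
contribs = gene_contributions(X, P, ["T cell", "B cell"])
# Summing over genes (axis 0) reproduces the score matrix P @ X row by row.
```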

sctop.utils.find_top_contributing_genes(contributions: sctop.processing.pd.DataFrame, n_genes: int = 20, aggregate: str = 'mean') → sctop.processing.pd.Series

Find top contributing genes from contribution matrix.

Parameters:
  • contributions (pd.DataFrame) – Gene contributions (genes x samples)

  • n_genes (int) – Number of top genes to return

  • aggregate (str) – How to aggregate across samples: ‘mean’, ‘median’, ‘max’

Returns:

top_genes – Top contributing genes with their aggregated scores

Return type:

pd.Series
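The ranking step can be sketched with NumPy as below. `top_contributing_genes` is a hypothetical helper that returns (gene, score) pairs instead of a pd.Series, and ties are broken by original gene order, which is an assumption rather than documented behavior.

```python
import numpy as np


def top_contributing_genes(contrib, gene_names, n_genes=20, aggregate="mean"):
    """Rank genes (rows of a genes x samples contribution matrix) by an aggregate score."""
    reducers = {"mean": np.mean, "median": np.median, "max": np.max}
    if aggregate not in reducers:
        raise ValueError(f"aggregate must be one of {sorted(reducers)}, got {aggregate!r}")
    # Collapse the samples axis to one score per gene.
    scores = reducers[aggregate](np.asarray(contrib, dtype=float), axis=1)
    order = np.argsort(-scores, kind="stable")[:n_genes]  # highest score first
    return [(gene_names[i], float(scores[i])) for i in order]


contrib = np.array([[1.0, 3.0], [5.0, 5.0], [0.0, 8.0]])
genes = ["a", "b", "c"]
print(top_contributing_genes(contrib, genes, n_genes=2))                   # [('b', 5.0), ('c', 4.0)]
print(top_contributing_genes(contrib, genes, n_genes=1, aggregate="max"))  # [('c', 8.0)]
```

With a pandas DataFrame the same idea is roughly `contributions.agg(aggregate, axis=1).nlargest(n_genes)`, which returns a Series indexed by gene name.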