sctop.utils
Functions
- Perform ANOVA feature selection on the basis and optionally standardize.
- Calculate classification metrics from true and predicted labels.
- Calculate accuracy per cell type.
- Pretty-print metrics.
- Create the basis and split cells into training and test IDs (original function, kept for backwards compatibility).
- Score test cells in parallel.
- Score a single chunk of cells.
- Generate and display a confusion matrix and a per-cell-type F1 score plot.
- Compute the predictivity matrix from the basis.
- Compute gene-level contributions to cell-type scores.
- Find the top contributing genes in a contribution matrix.
Module Contents
- sctop.utils.perform_anova_selection(basis: pd.DataFrame, adata: anndata.AnnData, training_IDs: np.ndarray, cell_type_column: str, n_features: int = 2000, percentile: float | None = None, standardize: bool = True) -> Tuple[pd.DataFrame, np.ndarray]
Perform ANOVA feature selection on the basis and optionally standardize.
Parameters:
- basis : pd.DataFrame
The basis matrix (genes x cell types)
- adata : ad.AnnData
The AnnData object
- training_IDs : np.ndarray
IDs of the training cells
- cell_type_column : str
Name of the column in adata.obs that holds cell-type labels
- n_features : int
Number of top features to select when percentile is None (default: 2000)
- percentile : float, optional
Percentile of features to keep; overrides n_features when given
- standardize : bool
Whether to standardize the basis after selection (default: True)
Returns:
- basis_selected : pd.DataFrame
Basis with selected features only (and standardized if requested)
- selected_genes : np.ndarray
Array of selected gene names
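A minimal sketch of the selection step, assuming a one-way ANOVA F-statistic computed per gene over the training cells grouped by cell type, a top-n_features ranking, and z-scoring of each cell-type column afterwards. The helpers anova_f_scores and select_and_standardize are hypothetical, plain NumPy arrays stand in for the DataFrame and AnnData inputs, and the real function may use a library routine (for example scikit-learn's f_classif) and standardize along a different axis.

```python
import numpy as np


def anova_f_scores(expr: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """One-way ANOVA F-statistic per gene.

    expr: cells x genes matrix of training cells; labels: cell type of each cell.
    """
    groups = np.unique(labels)
    n_cells, k = expr.shape[0], len(groups)
    grand_mean = expr.mean(axis=0)
    ss_between = np.zeros(expr.shape[1])
    ss_within = np.zeros(expr.shape[1])
    for g in groups:
        block = expr[labels == g]
        group_mean = block.mean(axis=0)
        ss_between += block.shape[0] * (group_mean - grand_mean) ** 2
        ss_within += ((block - group_mean) ** 2).sum(axis=0)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_cells - k)
    # The small epsilon guards against genes that are constant within every group.
    return ms_between / (ms_within + 1e-12)


def select_and_standardize(basis, gene_names, expr, labels,
                           n_features=2000, percentile=None, standardize=True):
    """Keep the highest-F genes of a genes x cell_types basis.

    Assumes basis rows and expr columns refer to the same genes, in the order
    given by gene_names.
    """
    f = anova_f_scores(expr, labels)
    if percentile is not None:
        # percentile (0-100) overrides n_features, as in the docstring above.
        n_features = max(1, int(round(len(f) * percentile / 100)))
    top = np.argsort(f)[::-1][:n_features]  # gene indices, highest F first
    reduced = basis[top]
    if standardize:
        # z-score each cell-type column of the reduced basis (axis is an assumption).
        reduced = (reduced - reduced.mean(axis=0)) / reduced.std(axis=0)
    return reduced, gene_names[top]


rng = np.random.default_rng(0)
labels = np.array(["A"] * 20 + ["B"] * 20)
expr = rng.normal(size=(40, 5))
expr[labels == "B", 2] += 5.0            # only gene g2 separates A from B
genes = np.array([f"g{i}" for i in range(5)])
basis = rng.normal(size=(5, 2))          # 5 genes x 2 cell types
reduced, kept = select_and_standardize(basis, genes, expr, labels, n_features=2)
print(kept[0])  # g2
```

The F-statistic ranks genes by how much of their variance lies between cell types rather than within them, which is why the one gene shifted in type B comes out first.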
- sctop.utils.calculate_metrics(true_labels: List, predicted_labels: List, total_cells: int, accuracies: Dict) -> Dict
Calculate classification metrics from the true and predicted labels and return them as a dict.
- sctop.utils.calculate_per_cell_type_accuracy(cell_accuracies: Dict) -> pd.DataFrame
Calculate the accuracy for each cell type and return it as a DataFrame.
- sctop.utils.create_basis_optimized(adata: anndata.AnnData, cell_type_column: str, threshold: int, test_size: float, random_state: int, n_jobs: int = -1) -> Tuple[pd.DataFrame, np.ndarray, np.ndarray]
Create the basis and split the cells into training and test IDs. Original function, kept for backwards compatibility.
- sctop.utils.run_scoring_parallel(adata: anndata.AnnData, basis: pd.DataFrame, test_IDs: np.ndarray, cell_type_column: str, spec_value: float, outer_chunks: int, inner_chunk_size: int, n_jobs: int = 4) -> Tuple[dict, list, list, dict]
Score test cells in parallel. Uses ThreadPoolExecutor, so the worker threads share memory instead of each receiving a copy of the data.
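The outer level of parallelism can be pictured as follows: the test IDs are split into outer_chunks pieces, each piece is scored on a worker thread, and the per-chunk results are concatenated in order. This is a minimal sketch, not the function itself: score_one_chunk is a hypothetical stand-in for score_chunk_optimized that simply echoes the true labels, and the merge assumes each chunk returns a list of true and a list of predicted labels (the real return value has four parts whose contents are not documented here).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def score_one_chunk(ids, true_label_of):
    """Hypothetical stand-in for score_chunk_optimized.

    It "predicts" the true label so the merging logic is easy to check.
    """
    true = [true_label_of[i] for i in ids]
    predicted = list(true)
    return true, predicted


def run_chunks_in_threads(test_ids, true_label_of, outer_chunks=4, n_jobs=4):
    # Split the IDs into roughly equal outer chunks, dropping empty ones.
    chunks = [c for c in np.array_split(np.asarray(test_ids), outer_chunks) if len(c)]
    true_all, pred_all = [], []
    # Threads share the caller's memory, so large inputs are not copied per worker.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() yields results in submission order, keeping labels aligned with IDs.
        for true, predicted in pool.map(lambda c: score_one_chunk(c, true_label_of), chunks):
            true_all.extend(true)
            pred_all.extend(predicted)
    return true_all, pred_all


ids = np.array([f"cell{i}" for i in range(10)])
labels = {cid: ("T" if i % 2 else "B") for i, cid in enumerate(ids)}
true, pred = run_chunks_in_threads(ids, labels, outer_chunks=3, n_jobs=2)
print(len(true), true[:3])  # 10 ['B', 'T', 'B']
```

Threads avoid pickling a large AnnData object into separate processes; they give a real speed-up mainly when the heavy work happens inside NumPy/BLAS calls that release the GIL.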
- sctop.utils.score_chunk_optimized(adata: anndata.AnnData, basis: pd.DataFrame, sample_IDs: np.ndarray, cell_type_column: str, spec_value: float, inner_chunk_size: int) -> Tuple[dict, list, list, dict]
Score a single chunk of cells. Factored out as a standalone function so that chunks can be scored in parallel.
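Within one chunk, inner_chunk_size presumably bounds how many cells are scored at once. The sketch below illustrates that batching idea under the scTOP-style assumption that scores are the predictivity matrix multiplied by processed expression and that the predicted type is the highest-scoring one. score_in_batches is a hypothetical helper, and the role of spec_value is not modeled.

```python
import numpy as np


def score_in_batches(predictivity, expr, cell_types, inner_chunk_size=2):
    """Score cells in column batches to cap peak memory.

    predictivity: cell_types x genes; expr: genes x cells (already processed).
    Returns the full score matrix and the top-scoring cell type per cell.
    """
    n_cells = expr.shape[1]
    scores = np.empty((predictivity.shape[0], n_cells))
    for start in range(0, n_cells, inner_chunk_size):
        stop = min(start + inner_chunk_size, n_cells)
        # Only this slice of cells is multiplied at a time.
        scores[:, start:stop] = predictivity @ expr[:, start:stop]
    predicted = [cell_types[i] for i in scores.argmax(axis=0)]
    return scores, predicted


P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])    # 2 cell types x 3 genes
X = np.array([[5.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 2.0, 1.0]])    # 3 genes x 3 cells
scores, predicted = score_in_batches(P, X, ["A", "B"], inner_chunk_size=2)
print(predicted)  # ['A', 'B', 'B']
```

Batching changes only memory use: the assembled score matrix is identical to a single predictivity @ expr product.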
- sctop.utils.plot_performance_summary(true_labels: List, predicted_labels: List, f1_df: pd.DataFrame | None = None, figsize_base: int = 10)
Generate and display a confusion matrix and a per-cell-type F1 score plot.
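Both panels rest on standard quantities: a confusion matrix counting (true, predicted) label pairs and, for each cell type, F1 = 2PR / (P + R) from precision P and recall R. The NumPy sketch below computes both without plotting. confusion_and_f1 is a hypothetical helper; the f1_df parameter suggests the real function can take precomputed scores, whereas this sketch always derives them from the labels.

```python
import numpy as np


def confusion_and_f1(true_labels, predicted_labels):
    """Confusion matrix (rows = true, columns = predicted) and per-class F1."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[index[t], index[p]] += 1
    tp = np.diag(cm).astype(float)
    predicted_counts = cm.sum(axis=0)  # column sums: how often each class was predicted
    true_counts = cm.sum(axis=1)       # row sums: how often each class truly occurs
    # Treat precision/recall as 0 where the denominator is 0 (avoids 0/0).
    precision = np.divide(tp, predicted_counts, out=np.zeros_like(tp), where=predicted_counts > 0)
    recall = np.divide(tp, true_counts, out=np.zeros_like(tp), where=true_counts > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return classes, cm, dict(zip(classes, f1))


true = ["A", "A", "B", "B", "C"]
pred = ["A", "B", "B", "B", "C"]
classes, cm, f1 = confusion_and_f1(true, pred)
print(cm.tolist())        # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
print(round(f1["B"], 3))  # 0.8
```

For B, precision is 2/3 (one A cell was mislabeled as B) and recall is 1, giving F1 = 0.8.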
- sctop.utils.compute_predictivity(basis: pd.DataFrame) -> pd.DataFrame
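Compute the predictivity matrix from the basis.
The predictivity matrix shows how much each gene contributes to each cell type's score. Formula: predictivity = inv(B^T @ B) @ B^T, where B is the basis matrix (genes x cell types). When the columns of B are linearly independent, this is the Moore-Penrose pseudoinverse of B, so predictivity @ B is the identity matrix.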
- Parameters:
basis (pd.DataFrame) – Basis matrix (genes x cell_types)
- Returns:
predictivity – Predictivity matrix (cell_types x genes). Entry [c, g] is the weight of gene g in the score for cell type c.
- Return type:
pd.DataFrame
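The formula can be checked numerically. Because inv(B^T @ B) @ B^T is the pseudoinverse of a full-column-rank B, multiplying it by B yields the identity: a cell whose expression equals one basis column scores 1 for that type and 0 for the others. This sketch uses plain NumPy arrays rather than the DataFrames the function takes and returns; predictivity_matrix is a hypothetical name, and solving the normal equations with np.linalg.solve instead of forming the inverse explicitly is a numerical choice made here, not necessarily the function's.

```python
import numpy as np


def predictivity_matrix(B: np.ndarray) -> np.ndarray:
    """Return inv(B^T B) @ B^T for a genes x cell_types basis B."""
    # Solving (B^T B) P = B^T gives the same P as forming the inverse, but more stably.
    return np.linalg.solve(B.T @ B, B.T)


rng = np.random.default_rng(1)
B = rng.normal(size=(50, 4))              # 50 genes x 4 cell types
P = predictivity_matrix(B)                # 4 cell types x 50 genes
print(P.shape)                            # (4, 50)
print(np.allclose(P @ B, np.eye(4)))      # True
print(np.allclose(P, np.linalg.pinv(B)))  # True
```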
- sctop.utils.compute_gene_contributions(data: pd.DataFrame | np.ndarray, basis: pd.DataFrame, predictivity: pd.DataFrame | None = None, cell_types: List[str] | None = None, process_data: bool = True) -> Dict[str, pd.DataFrame]
Compute gene-level contributions to cell type scores.
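For each cell type c, computes the element-wise product contribution[g, s] = expression[g, s] * predictivity[c, g]. Summing a sample's contributions over all genes therefore gives predictivity @ expression for that sample, i.e. its score for c.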
- Parameters:
data (DataFrame or array) – Expression data (genes x samples)
basis (pd.DataFrame) – Basis matrix
predictivity (pd.DataFrame, optional) – Precomputed predictivity matrix. If None, it is computed from basis.
cell_types (list, optional) – Cell types to compute contributions for. If None, all cell types in the basis are used.
process_data (bool) – Whether to process the data first (default: True)
- Returns:
contributions – Dictionary mapping cell_type -> contribution_matrix (genes x samples)
- Return type:
dict
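The per-gene decomposition can be sketched as follows: row c of the predictivity matrix is broadcast across samples and multiplied element-wise into the expression matrix, and summing the result over genes reproduces predictivity @ data, so the contributions add up exactly to each score. Plain arrays stand in for DataFrames, gene_contributions is a hypothetical helper, and the data are assumed to be already processed (as with process_data=False).

```python
import numpy as np


def gene_contributions(data, predictivity, cell_types):
    """Map each cell type to a genes x samples contribution matrix.

    data: genes x samples (already processed); predictivity: cell_types x genes.
    """
    out = {}
    for row, cell_type in enumerate(cell_types):
        # Broadcast this cell type's per-gene weights across all samples.
        out[cell_type] = data * predictivity[row][:, np.newaxis]
    return out


rng = np.random.default_rng(2)
B = rng.normal(size=(30, 3))         # genes x cell types
P = np.linalg.pinv(B)                # cell types x genes
X = rng.normal(size=(30, 5))         # genes x samples
contrib = gene_contributions(X, P, ["A", "B", "C"])
scores = P @ X                       # cell types x samples
print(np.allclose(contrib["B"].sum(axis=0), scores[1]))  # True
```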
- sctop.utils.find_top_contributing_genes(contributions: pd.DataFrame, n_genes: int = 20, aggregate: str = 'mean') -> pd.Series
Find the top contributing genes in a contribution matrix.
- Parameters:
contributions (pd.DataFrame) – Gene contributions (genes x samples)
n_genes (int) – Number of top genes to return (default: 20)
aggregate (str) – How to aggregate across samples: ‘mean’ (default), ‘median’, or ‘max’
- Returns:
top_genes – Top contributing genes with their aggregated scores
- Return type:
pd.Series
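Ranking genes reduces to aggregating each row of a contribution matrix across samples and sorting. In this NumPy sketch, top_genes is a hypothetical helper that returns (gene, score) pairs in place of the pd.Series the function returns. The docstring does not say whether ranking uses signed or absolute values; this sketch ranks by signed value, so strongly negative contributors end up at the bottom.

```python
import numpy as np

AGGREGATORS = {"mean": np.mean, "median": np.median, "max": np.max}


def top_genes(contributions, gene_names, n_genes=20, aggregate="mean"):
    """Aggregate a genes x samples matrix per gene and return the n largest."""
    if aggregate not in AGGREGATORS:
        raise ValueError(f"aggregate must be one of {sorted(AGGREGATORS)}")
    per_gene = AGGREGATORS[aggregate](contributions, axis=1)
    order = np.argsort(per_gene)[::-1][:n_genes]  # descending by signed score
    return [(gene_names[i], float(per_gene[i])) for i in order]


C = np.array([[1.0, 3.0, 2.0],     # g0
              [0.5, 0.5, 9.0],     # g1
              [-4.0, 0.0, 0.0]])   # g2
genes = ["g0", "g1", "g2"]
print(top_genes(C, genes, n_genes=2, aggregate="median"))  # [('g0', 2.0), ('g1', 0.5)]
print(top_genes(C, genes, n_genes=1, aggregate="max"))     # [('g1', 9.0)]
```

The choice of aggregate matters: g1 wins under 'max' because of a single large value, but under 'median' that outlier is ignored and g0 ranks first.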