sctop

Submodules

Functions

`list_available_bases`(→ List[str])	List available premade bases that can be loaded.
`load_basis`(...)	Load a basis from an h5ad file hosted online.
`create_basis`(→ Dict)	Create basis and evaluate with optional ANOVA selection and cross-validation.
`analyze_sample_contributions`(→ Dict[str, ...])	Analyze gene contributions for multiple samples/clusters.
`process`(→ pandas.DataFrame)	Process scRNA-seq data with optional chunking.
`score`(→ Union[pandas.DataFrame, List])	Project sample onto basis with optional chunking.
`plot_highest`(projections[, n, ax, color, fontsize])	Plots a horizontal bar chart of the top N projections with a fixed x-axis scale.
`plot_expression_distribution`(scores[, n, ax, ...])	Plots boxplots of expression for top genes with a fixed y-axis scale.
`plot_two`(projections, celltype1, celltype2[, gene, ...])
`plot_all_contributions`(→ None)	Create and save contribution plots for all cell types and samples.

Package Contents

sctop.list_available_bases() → List[str]

List available premade bases that can be loaded.

Returns:

basis_keys (list) – List of available basis keys

sctop.load_basis(basis_key: str, cache_dir: Optional[Union[str, pathlib.Path]] = None, force_download: bool = False) → Tuple[pandas.DataFrame, pandas.DataFrame]

Load a basis from an h5ad file hosted online.

Parameters:

basis_key (str) – Name/key of the basis to load (e.g., "MCKO legacy")
cache_dir (str or pathlib.Path, optional) – Directory to cache downloaded files. If None, uses the system temp directory
force_download (bool) – If True, re-downloads even if a cached file exists

Returns:

basis (pd.DataFrame) – Basis matrix (genes x cell types)
metadata (pd.DataFrame) – Metadata for the basis (cell types x attributes)

Example:

>>> basis, metadata = load_basis(
...     basis_key="MCKO legacy"
... )
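The caching behaviour described by `cache_dir` and `force_download` can be sketched with the standard library alone. This is an illustration under stated assumptions, not sctop's implementation: the helpers `resolve_cache_file` and `needs_download` are hypothetical, and so is the file-naming scheme.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_cache_file(basis_key: str,
                       cache_dir: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical helper: choose where a downloaded basis file would live."""
    # Fall back to the system temp directory when no cache_dir is given.
    base = Path(cache_dir) if cache_dir is not None else Path(tempfile.gettempdir())
    # Assumed naming scheme: replace spaces so the key is a safe file name.
    return base / (basis_key.replace(" ", "_") + ".h5ad")


def needs_download(path: Path, force_download: bool = False) -> bool:
    """Download when forced, or when no cached copy exists yet."""
    return force_download or not path.exists()


with tempfile.TemporaryDirectory() as tmp:
    cached = resolve_cache_file("MCKO legacy", cache_dir=tmp)
    first = needs_download(cached)            # nothing cached yet
    cached.write_bytes(b"")                   # stand-in for a finished download
    second = needs_download(cached)           # cached copy would be reused
    forced = needs_download(cached, force_download=True)
```

Passing an explicit `cache_dir` keeps the downloaded file out of the temp directory, which many systems clear periodically.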

sctop.create_basis(adata: anndata.AnnData, cell_type_column: str, threshold: int, test_size: float = 0.2, random_state: int = 42, n_jobs: int = -1, do_anova: bool = False, n_features: int = 20000, anova_percentile: Optional[float] = None, spec_value: float = 0.1, outer_chunks: int = 10, inner_chunk_size: int = 1000, n_scoring_jobs: int = 4, cv_folds: Optional[int] = None, plot_results: bool = True) → Dict

Create basis and evaluate with optional ANOVA selection and cross-validation.

Parameters:

adata (ad.AnnData) – Annotated data object
cell_type_column (str) – Column name for cell types in adata.obs
threshold (int) – Minimum number of cells per cell type
test_size (float) – Fraction of data to use for testing (only used if cv_folds is None)
random_state (int) – Random seed
n_jobs (int) – Number of parallel jobs for basis creation
do_anova (bool) – Whether to perform ANOVA feature selection
n_features (int) – Number of features to select with ANOVA
anova_percentile (float, optional) – Percentile of features to keep (overrides n_features)
spec_value (float) – Threshold for unspecified predictions
outer_chunks (int) – Number of chunks for parallel scoring
inner_chunk_size (int) – Chunk size for internal processing
n_scoring_jobs (int) – Number of parallel jobs for scoring
cv_folds (int, optional) – Number of cross-validation folds. If None, uses a single train-test split
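Several of these parameters interact: `threshold` filters cell types, `cv_folds` decides whether `test_size` is used, and `anova_percentile` takes precedence over `n_features`. The sketch below restates those rules in plain Python. All three helper functions are hypothetical, and the use of `>=` for the threshold and the gating of `anova_percentile` on `do_anova` are assumptions rather than documented behaviour.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def cell_types_meeting_threshold(labels: Sequence[str], threshold: int) -> List[str]:
    """Keep cell types with at least `threshold` cells (">=" is an assumption)."""
    counts = Counter(labels)
    return sorted(ct for ct, n in counts.items() if n >= threshold)


def evaluation_mode(test_size: float = 0.2, cv_folds: Optional[int] = None) -> str:
    """test_size only matters when cv_folds is None."""
    if cv_folds is None:
        return f"single split, {test_size:.0%} held out"
    return f"{cv_folds}-fold cross-validation"


def feature_selection(do_anova: bool = False,
                      n_features: int = 20000,
                      anova_percentile: Optional[float] = None
                      ) -> Tuple[str, Optional[float]]:
    """anova_percentile overrides n_features; both assumed ignored without ANOVA."""
    if not do_anova:
        return ("none", None)
    if anova_percentile is not None:
        return ("percentile", anova_percentile)
    return ("n_features", n_features)


# Placeholder labels: 3 cells of type A, 5 of type B, 1 of type C.
labels = ["A"] * 3 + ["B"] * 5 + ["C"]
kept = cell_types_meeting_threshold(labels, threshold=3)
```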

Returns:

results (dict) – Dictionary containing:

- 'basis': final basis
- 'selected_genes': selected genes (if ANOVA)
- 'metrics': performance metrics
- 'cv_results': cross-validation results (if cv_folds is not None)
- 'confusion_matrix': confusion matrix
- 'per_cell_type': per cell type accuracy
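Because two of the documented keys are conditional, code that consumes the returned dictionary may want to check which entries are present before using them. A minimal sketch follows, using a mock dictionary in place of a real `create_basis` result. The helper `describe_results` is hypothetical, and whether a conditional key is omitted or set to None when unused is an assumption.

```python
from typing import Any, Dict, List

# Keys listed in the create_basis documentation, in documented order.
DOCUMENTED_KEYS = ["basis", "selected_genes", "metrics",
                   "cv_results", "confusion_matrix", "per_cell_type"]

# Keys the documentation marks as conditional, with the setting they depend on.
CONDITIONAL_KEYS = {
    "selected_genes": "do_anova=True",
    "cv_results": "cv_folds is not None",
}


def describe_results(results: Dict[str, Any]) -> List[str]:
    """Report which documented entries a results dictionary contains."""
    lines = []
    for key in DOCUMENTED_KEYS:
        if key in results:
            lines.append(f"{key}: present")
        elif key in CONDITIONAL_KEYS:
            lines.append(f"{key}: absent (needs {CONDITIONAL_KEYS[key]})")
        else:
            lines.append(f"{key}: missing")
    return lines


# Mock result of a run without ANOVA and without cross-validation.
mock_results = {"basis": object(), "metrics": {},
                "confusion_matrix": [], "per_cell_type": {}}
report = describe_results(mock_results)
```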

sctop.analyze_sample_contributions(sample_data_dict: Dict[str, Union[pandas.DataFrame, numpy.ndarray]], basis: pandas.DataFrame, cell_types: Optional[List[str]] = None, n_top_genes: int = 20, process_data: bool = True) → Dict[str, Dict]

Analyze gene contributions for multiple samples/clusters.

Parameters:

sample_data_dict (dict) – Dictionary mapping sample_name -> expression_data
basis (pd.DataFrame) – Basis matrix
cell_types (list, optional) – Cell types to analyze. If None, uses all
n_top_genes (int) – Number of top genes to identify per sample
process_data (bool) – Whether to process the data

Returns:

results – Nested dictionary with structure:

    {cell_type: {
        'contributions': {sample_name: contribution_matrix},
        'top_genes': {sample_name: [gene1, gene2, …]},
        'expressions': {sample_name: expression_matrix}
    }}

Return type:

dict
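The nested return structure is easiest to read by walking it. The sketch below flattens the 'top_genes' entries of a mock results dictionary into (cell type, sample, genes) rows. The helper `top_genes_rows` is hypothetical and every name in the mock data is a placeholder; real results would also carry matrices under 'contributions' and 'expressions'.

```python
from typing import Dict, List, Tuple


def top_genes_rows(results: Dict[str, Dict]) -> List[Tuple[str, str, List[str]]]:
    """Flatten {cell_type: {'top_genes': {sample: genes}}} into rows."""
    rows = []
    for cell_type, entry in results.items():
        for sample_name, genes in entry["top_genes"].items():
            rows.append((cell_type, sample_name, list(genes)))
    return rows


# Mock data following the documented structure; matrices are left as None.
mock_results = {
    "type_a": {
        "contributions": {"sample_1": None, "sample_2": None},
        "top_genes": {"sample_1": ["gene1", "gene2"], "sample_2": ["gene3"]},
        "expressions": {"sample_1": None, "sample_2": None},
    },
    "type_b": {
        "contributions": {"sample_1": None},
        "top_genes": {"sample_1": ["gene2"]},
        "expressions": {"sample_1": None},
    },
}

rows = top_genes_rows(mock_results)
for cell_type, sample_name, genes in rows:
    print(f"{cell_type} / {sample_name}: {', '.join(genes)}")
```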

sctop.process(df_in: pandas.DataFrame | numpy.ndarray | scipy.sparse.spmatrix, average: bool = False, chunk_size: int | None = None) → pandas.DataFrame

Process scRNA-seq data with optional chunking.

sctop.score(basis: pandas.DataFrame, sample: pandas.DataFrame, full_output: bool = False, chunk_size: int | None = None) → pandas.DataFrame | List

Project sample onto basis with optional chunking.
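Both `process` and `score` take an optional `chunk_size`. The general idea of chunked processing can be sketched as splitting the input into fixed-size slices and handling one slice at a time to bound memory use. The helper `iter_chunks` is hypothetical, and treating cells as the unit being chunked is an assumption; this page does not say which axis sctop splits.

```python
from typing import Iterator, Optional, Sequence


def iter_chunks(items: Sequence, chunk_size: Optional[int] = None) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size items.

    chunk_size=None mirrors the documented default: no chunking, one pass.
    """
    if chunk_size is None:
        yield items
        return
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Placeholder cell identifiers standing in for the columns of a sample matrix.
cells = ["cell_0", "cell_1", "cell_2", "cell_3", "cell_4"]

chunks = list(iter_chunks(cells, chunk_size=2))
unchunked = list(iter_chunks(cells))
```

The last chunk may be shorter than `chunk_size`, so any per-chunk result needs to be concatenated rather than stacked into a fixed shape.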

sctop.plot_highest(projections, n=10, ax=None, color='olive', fontsize=40, **kwargs)

Plots a horizontal bar chart of the top N projections with a fixed x-axis scale.

sctop.plot_expression_distribution(scores, n=10, ax=None, box_color='skyblue', fontsize=30, **kwargs)

Plots boxplots of expression for top genes with a fixed y-axis scale.

sctop.plot_two(projections, celltype1, celltype2, gene=None, gene_expressions=None, ax=None, **kwargs)

sctop.plot_all_contributions(results: Dict[str, Dict], sample_names: List[str], output_dir: Optional[str] = None, highlight_genes: Optional[Dict[str, List[str]]] = None, dpi: int = 150, **plot_kwargs) → None

Create and save contribution plots for all cell types and samples.

Parameters:

results (dict) – Results from analyze_sample_contributions
sample_names (list) – List of sample names to plot
output_dir (str, optional) – Base directory for saving plots. If None, uses the current directory
highlight_genes (dict, optional) – Dictionary mapping cell_type -> [genes_to_highlight]
dpi (int) – DPI for saved images
**plot_kwargs – Additional kwargs passed to plot_gene_contribution_scatter
