sctop

Functions

list_available_bases(→ List[str])

List available premade bases that can be loaded.

load_basis(...)

Load a basis from an h5ad file hosted online.

create_basis(→ Dict)

Create basis and evaluate with optional ANOVA selection and cross-validation.

analyze_sample_contributions(→ Dict[str, ...)

Analyze gene contributions for multiple samples/clusters.

process(→ pandas.DataFrame)

Process scRNA-seq data with optional chunking.

score(→ Union[pandas.DataFrame, List])

Project sample onto basis with optional chunking.

plot_highest(projections[, n, ax, color, fontsize])

Plots a horizontal bar chart of the top N projections with a fixed x-axis scale.

plot_expression_distribution(scores[, n, ax, ...])

Plots boxplots of expression for top genes with a fixed y-axis scale.

plot_two(projections, celltype1, celltype2[, gene, ...])

plot_all_contributions(→ None)

Create and save contribution plots for all cell types and samples.

Package Contents

sctop.list_available_bases() → List[str]

List available premade bases that can be loaded.

Returns:

basis_keys : list

List of available basis keys

sctop.load_basis(basis_key: str, cache_dir: Optional[Union[str, pathlib.Path]] = None, force_download: bool = False) → Tuple[pandas.DataFrame, pandas.DataFrame]

Load a basis from an h5ad file hosted online.

Parameters:

basis_key : str

Name/key of the basis to load (e.g., "MCKO legacy")

cache_dir : str or pathlib.Path, optional

Directory in which to cache downloaded files. If None, the system temp directory is used.

force_download : bool

If True, re-download the file even if a cached copy exists.

Returns:

basis : pd.DataFrame

Basis matrix (genes x cell types)

metadata : pd.DataFrame

Metadata for the basis (cell types x attributes)
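
The interaction between cache_dir and force_download can be pictured with a small standard-library sketch. This is not sctop's implementation: the helper names resolve_cache_path and needs_download, and the file-naming scheme (spaces replaced by underscores, .h5ad suffix), are assumptions made purely for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_cache_path(basis_key: str,
                       cache_dir: Optional[Union[str, Path]] = None) -> Path:
    """Return where a downloaded basis file would be cached (hypothetical layout)."""
    # Documented behavior: fall back to the system temp directory when cache_dir is None.
    base = Path(cache_dir) if cache_dir is not None else Path(tempfile.gettempdir())
    # Assumed naming scheme, not taken from sctop: spaces become underscores.
    return base / (basis_key.replace(" ", "_") + ".h5ad")


def needs_download(path: Path, force_download: bool = False) -> bool:
    """Download when forced, or when no cached copy exists yet."""
    return force_download or not path.exists()


path = resolve_cache_path("MCKO legacy")
print(path.name)                                  # MCKO_legacy.h5ad
print(needs_download(path, force_download=True))  # True
```

Passing an explicit cache_dir keeps the file out of the temp directory, which some systems clear on reboot.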

Example:

>>> basis, metadata = load_basis(
...     basis_key="MCKO legacy"
... )

sctop.create_basis(adata: anndata.AnnData, cell_type_column: str, threshold: int, test_size: float = 0.2, random_state: int = 42, n_jobs: int = -1, do_anova: bool = False, n_features: int = 20000, anova_percentile: Optional[float] = None, spec_value: float = 0.1, outer_chunks: int = 10, inner_chunk_size: int = 1000, n_scoring_jobs: int = 4, cv_folds: Optional[int] = None, plot_results: bool = True) → Dict

Create basis and evaluate with optional ANOVA selection and cross-validation.

Parameters:

adata : ad.AnnData

Annotated data object

cell_type_column : str

Column name for cell types in adata.obs

threshold : int

Minimum number of cells per cell type

test_size : float

Fraction of data to use for testing (if cv_folds is None)

random_state : int

Random seed

n_jobs : int

Number of parallel jobs for basis creation

do_anova : bool

Whether to perform ANOVA feature selection

n_features : int

Number of features to select with ANOVA

anova_percentile : float, optional

Percentile of features to keep (overrides n_features)

spec_value : float

Threshold for unspecified predictions

outer_chunks : int

Number of chunks for parallel scoring

inner_chunk_size : int

Chunk size for internal processing

n_scoring_jobs : int

Number of parallel jobs for scoring

cv_folds : int, optional

Number of cross-validation folds. If None, uses a single train-test split
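
Two of the parameters above are easy to misread, so here is a hedged, standard-library sketch of what they select. The helper names cell_types_to_keep and evaluation_mode are hypothetical, the inclusive comparison (at least threshold cells) is an assumption, and the labels are placeholders standing in for adata.obs[cell_type_column].

```python
from collections import Counter
from typing import List, Optional


def cell_types_to_keep(labels: List[str], threshold: int) -> List[str]:
    """Keep cell types with at least `threshold` cells (inclusive bound is assumed)."""
    counts = Counter(labels)
    return sorted(ct for ct, n in counts.items() if n >= threshold)


def evaluation_mode(test_size: float = 0.2, cv_folds: Optional[int] = None) -> str:
    """Describe which evaluation the documented parameters select."""
    if cv_folds is None:
        # test_size only matters when no cross-validation is requested.
        return f"single split, {test_size:.0%} held out"
    return f"{cv_folds}-fold cross-validation"


# Placeholder labels, not real data.
labels = ["T cell"] * 5 + ["B cell"] * 3 + ["NK cell"] * 1
print(cell_types_to_keep(labels, threshold=3))   # ['B cell', 'T cell']
print(evaluation_mode())                         # single split, 20% held out
print(evaluation_mode(cv_folds=5))               # 5-fold cross-validation
```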

Returns:

results : dict

Dictionary containing:

  • 'basis': final basis

  • 'selected_genes': selected genes (if ANOVA)

  • 'metrics': performance metrics

  • 'cv_results': cross-validation results (if cv_folds is not None)

  • 'confusion_matrix': confusion matrix

  • 'per_cell_type': per cell type accuracy
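
Because 'selected_genes' and 'cv_results' are conditional, code that consumes the result is safer reading them defensively. The sketch below uses a placeholder dictionary rather than real output, and the helper summarize_results is hypothetical. Whether sctop omits the conditional keys or sets them to None is not stated here, so the sketch handles both cases.

```python
from typing import Any, Dict, List


def summarize_results(results: Dict[str, Any]) -> List[str]:
    """List which documented keys are present in a create_basis-style result."""
    always = ["basis", "metrics", "confusion_matrix", "per_cell_type"]
    conditional = ["selected_genes", "cv_results"]
    lines = [f"{key}: present" for key in always if key in results]
    for key in conditional:
        # .get covers both a missing key and a key explicitly set to None.
        state = "present" if results.get(key) is not None else "absent"
        lines.append(f"{key}: {state}")
    return lines


# Placeholder values only; a real result would hold DataFrames and metrics.
mock = {
    "basis": "<basis>",
    "metrics": {},
    "confusion_matrix": "<matrix>",
    "per_cell_type": {},
    "cv_results": None,
}
for line in summarize_results(mock):
    print(line)
```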

sctop.analyze_sample_contributions(sample_data_dict: Dict[str, Union[pandas.DataFrame, numpy.ndarray]], basis: pandas.DataFrame, cell_types: Optional[List[str]] = None, n_top_genes: int = 20, process_data: bool = True) → Dict[str, Dict]

Analyze gene contributions for multiple samples/clusters.

Parameters:
  • sample_data_dict (dict) – Dictionary mapping sample_name -> expression_data

  • basis (pd.DataFrame) – Basis matrix

  • cell_types (list, optional) – Cell types to analyze. If None, uses all

  • n_top_genes (int) – Number of top genes to identify per sample

  • process_data (bool) – Whether to process the data

Returns:

results – Nested dictionary keyed by cell type, with structure:

    {cell_type: {
        'contributions': {sample_name: contribution_matrix},
        'top_genes': {sample_name: [gene1, gene2, …]},
        'expressions': {sample_name: expression_matrix},
    }}

Return type:

dict
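
The nested layout described above can be walked with two loops. The sketch below is illustrative only: the cell type, sample names, and gene names are made-up placeholders, the matrices are replaced by None, and shared_top_genes is a hypothetical helper, not part of sctop.

```python
from typing import Dict, List

# Placeholder results mirroring the documented layout; all values are made up.
results = {
    "Hepatocyte": {
        "contributions": {"sample_A": None, "sample_B": None},
        "top_genes": {"sample_A": ["Alb", "Apoa1"], "sample_B": ["Alb", "Ttr"]},
        "expressions": {"sample_A": None, "sample_B": None},
    },
}


def shared_top_genes(results: Dict[str, dict], cell_type: str) -> List[str]:
    """Genes that appear in the top-gene list of every sample for one cell type."""
    gene_lists = results[cell_type]["top_genes"].values()
    shared = set.intersection(*(set(genes) for genes in gene_lists))
    return sorted(shared)


# Outer key is the cell type; each inner dictionary is keyed by sample name.
for cell_type, entry in results.items():
    for sample_name, genes in entry["top_genes"].items():
        print(cell_type, sample_name, genes)

print(shared_top_genes(results, "Hepatocyte"))   # ['Alb']
```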

sctop.process(df_in: pandas.DataFrame | numpy.ndarray | scipy.sparse.spmatrix, average: bool = False, chunk_size: int | None = None) → pandas.DataFrame

Process scRNA-seq data with optional chunking.

sctop.score(basis: pandas.DataFrame, sample: pandas.DataFrame, full_output: bool = False, chunk_size: int | None = None) → pandas.DataFrame | List

Project sample onto basis with optional chunking.
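
As a rough picture of what "optional chunking" means for process and score, the sketch below splits a collection into consecutive pieces of at most chunk_size items, or leaves it whole when chunk_size is None. The helper iter_chunks is hypothetical, and the assumption that chunking runs over cells is not confirmed by this page.

```python
from typing import Iterator, Optional


def iter_chunks(n_items: int, chunk_size: Optional[int] = None) -> Iterator[slice]:
    """Yield slices covering range(n_items); a single slice if chunk_size is None."""
    if chunk_size is None:
        yield slice(0, n_items)
        return
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_items, chunk_size):
        yield slice(start, min(start + chunk_size, n_items))


cells = list(range(10))  # stand-in for the cells of a sample
chunks = [cells[s] for s in iter_chunks(len(cells), chunk_size=4)]
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Processing one chunk at a time bounds peak memory at the cost of some per-chunk overhead, which is the usual reason to expose such a parameter.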

sctop.plot_highest(projections, n=10, ax=None, color='olive', fontsize=40, **kwargs)

Plots a horizontal bar chart of the top N projections with a fixed x-axis scale.
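
The "top N" selection behind such a chart can be sketched without any plotting. Here a plain dictionary of made-up scores stands in for projections, whose actual type is not specified on this page, and top_n is a hypothetical helper.

```python
import heapq
from typing import Dict, List, Tuple


def top_n(projections: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n largest (label, score) pairs, highest score first."""
    return heapq.nlargest(n, projections.items(), key=lambda kv: kv[1])


# Placeholder scores, not real projections.
scores = {"Hepatocyte": 0.62, "Cholangiocyte": 0.21, "Fibroblast": -0.05, "T cell": 0.03}
print(top_n(scores, n=2))  # [('Hepatocyte', 0.62), ('Cholangiocyte', 0.21)]
```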

sctop.plot_expression_distribution(scores, n=10, ax=None, box_color='skyblue', fontsize=30, **kwargs)

Plots boxplots of expression for top genes with a fixed y-axis scale.

sctop.plot_two(projections, celltype1, celltype2, gene=None, gene_expressions=None, ax=None, **kwargs)

sctop.plot_all_contributions(results: Dict[str, Dict], sample_names: List[str], output_dir: Optional[str] = None, highlight_genes: Optional[Dict[str, List[str]]] = None, dpi: int = 150, **plot_kwargs) → None

Create and save contribution plots for all cell types and samples.

Parameters:
  • results (dict) – Results from analyze_sample_contributions

  • sample_names (list) – List of sample names to plot

  • output_dir (str, optional) – Base directory for saving plots. If None, uses current directory

  • highlight_genes (dict, optional) – Dictionary mapping cell_type -> [genes_to_highlight]

  • dpi (int) – DPI for saved images

  • **plot_kwargs – Additional kwargs passed to plot_gene_contribution_scatter
