sctop
Submodules
Functions
list_available_bases | List available premade bases that can be loaded.
load_basis | Load a basis from an h5ad file hosted online.
create_basis | Create basis and evaluate with optional ANOVA selection and cross-validation.
analyze_sample_contributions | Analyze gene contributions for multiple samples/clusters.
process | Process scRNA-seq data with optional chunking.
score | Project sample onto basis with optional chunking.
plot_highest | Plots a horizontal bar chart of the top N projections with a fixed x-axis scale.
plot_expression_distribution | Plots boxplots of expression for top genes with a fixed y-axis scale.
plot_two |
plot_all_contributions | Create and save contribution plots for all cell types and samples.
Package Contents
- sctop.list_available_bases() -> sctop.processing.List[str]
List available premade bases that can be loaded.
Returns:
- basis_keys : list
List of available basis keys
- sctop.load_basis(basis_key: str, cache_dir: sctop.processing.Optional[sctop.processing.Union[str, pathlib.Path]] = None, force_download: bool = False) -> sctop.processing.Tuple[sctop.processing.pd.DataFrame, sctop.processing.pd.DataFrame]
Load a basis from an h5ad file hosted online.
Parameters:
- basis_key : str
Name/key of the basis to load (e.g., "MCKO legacy")
- cache_dir : str or pathlib.Path, optional
Directory in which to cache downloaded files. If None, the system temp directory is used.
- force_download : bool
If True, re-downloads the file even if a cached copy exists
Returns:
- basis : pd.DataFrame
Basis matrix (genes x cell types)
- metadata : pd.DataFrame
Metadata for the basis (cell types x attributes)
Example:
>>> basis, metadata = load_basis(
...     basis_key="MCKO legacy"
... )
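The interaction between cache_dir and force_download described above can be sketched with the standard library alone. This is only an illustration of the documented behavior, not sctop's implementation: the helper names resolve_cache_path and needs_download and the file-naming scheme are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_cache_path(basis_key: str,
                       cache_dir: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical helper: choose where a downloaded basis file would live."""
    # Fall back to the system temp directory when no cache_dir is given.
    root = Path(cache_dir) if cache_dir is not None else Path(tempfile.gettempdir())
    # Assumed naming scheme (not taken from sctop): key with spaces replaced.
    return root / (basis_key.replace(" ", "_") + ".h5ad")


def needs_download(path: Path, force_download: bool = False) -> bool:
    """Download when forced, or when no cached copy exists yet."""
    return force_download or not path.exists()
```

With this sketch, a second call for the same key skips the download unless force_download=True is passed, which is the behavior the parameter descriptions imply.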
- sctop.create_basis(adata: sctop.utils.ad.AnnData, cell_type_column: str, threshold: int, test_size: float = 0.2, random_state: int = 42, n_jobs: int = -1, do_anova: bool = False, n_features: int = 20000, anova_percentile: sctop.processing.Optional[float] = None, spec_value: float = 0.1, outer_chunks: int = 10, inner_chunk_size: int = 1000, n_scoring_jobs: int = 4, cv_folds: sctop.processing.Optional[int] = None, plot_results: bool = True) -> sctop.utils.Dict
Create basis and evaluate with optional ANOVA selection and cross-validation.
Parameters:
- adata : ad.AnnData
Annotated data object
- cell_type_column : str
Column name for cell types in adata.obs
- threshold : int
Minimum number of cells per cell type
- test_size : float
Fraction of data to use for testing (if cv_folds is None)
- random_state : int
Random seed
- n_jobs : int
Number of parallel jobs for basis creation
- do_anova : bool
Whether to perform ANOVA feature selection
- n_features : int
Number of features to select with ANOVA
- anova_percentile : float, optional
Percentile of features to keep (overrides n_features)
- spec_value : float
Threshold for unspecified predictions
- outer_chunks : int
Number of chunks for parallel scoring
- inner_chunk_size : int
Chunk size for internal processing
- n_scoring_jobs : int
Number of parallel jobs for scoring
- cv_folds : int, optional
Number of cross-validation folds. If None, uses a single train-test split
Returns:
- results : dict
Dictionary containing:
- 'basis': final basis
- 'selected_genes': selected genes (if ANOVA)
- 'metrics': performance metrics
- 'cv_results': cross-validation results (if cv_folds is not None)
- 'confusion_matrix': confusion matrix
- 'per_cell_type': per cell type accuracy
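Two of the parameters above, threshold and spec_value, are easiest to read with a small example. The sketch below is a plain-Python illustration under stated assumptions, not sctop's code: it assumes cell types with fewer than threshold cells are dropped, and that a prediction is labeled "unspecified" when its best score falls below spec_value. The helper names are hypothetical.

```python
from collections import Counter


def keep_frequent_types(labels, threshold):
    """Return the cell types that have at least `threshold` cells."""
    counts = Counter(labels)
    return sorted(t for t, n in counts.items() if n >= threshold)


def label_or_unspecified(scores, spec_value=0.1):
    """Pick the top-scoring type, or 'unspecified' if no score reaches spec_value."""
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] >= spec_value else "unspecified"


# Toy labels: 5 T cells, 3 B cells, 1 NK cell.
labels = ["T cell"] * 5 + ["B cell"] * 3 + ["NK cell"] * 1
kept = keep_frequent_types(labels, threshold=3)           # NK cell is dropped
confident = label_or_unspecified({"T cell": 0.42, "B cell": 0.05})
uncertain = label_or_unspecified({"T cell": 0.04, "B cell": 0.02})
```

Whether the library compares with >= or >, and how it treats ties, is not stated on this page, so treat the boundary behavior here as a guess.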
- sctop.analyze_sample_contributions(sample_data_dict: sctop.utils.Dict[str, sctop.processing.Union[sctop.processing.pd.DataFrame, sctop.processing.np.ndarray]], basis: sctop.processing.pd.DataFrame, cell_types: sctop.processing.Optional[sctop.processing.List[str]] = None, n_top_genes: int = 20, process_data: bool = True) -> sctop.utils.Dict[str, sctop.utils.Dict]
Analyze gene contributions for multiple samples/clusters.
- Parameters:
sample_data_dict (dict) – Dictionary mapping sample_name -> expression_data
basis (pd.DataFrame) – Basis matrix
cell_types (list, optional) – Cell types to analyze. If None, uses all
n_top_genes (int) – Number of top genes to identify per sample
process_data (bool) – Whether to process the data
- Returns:
results – Nested dictionary with structure:
{cell_type: {
    'contributions': {sample_name: contribution_matrix},
    'top_genes': {sample_name: [gene1, gene2, ...]},
    'expressions': {sample_name: expression_matrix}
}}
- Return type:
dict
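The nested return structure can be navigated as shown in the sketch below. The results dictionary here is hand-built placeholder data shaped like the documented return value (real entries would hold matrices, not None), and shared_top_genes is a hypothetical helper, not part of sctop.

```python
# Placeholder data shaped like the documented return value.
results = {
    "Cell type A": {
        "contributions": {"sample_1": None, "sample_2": None},
        "top_genes": {
            "sample_1": ["gene1", "gene2"],
            "sample_2": ["gene2", "gene3"],
        },
        "expressions": {"sample_1": None, "sample_2": None},
    },
}


def shared_top_genes(results, cell_type):
    """Genes that appear in the top-gene list of every sample for one cell type."""
    gene_lists = results[cell_type]["top_genes"].values()
    return sorted(set.intersection(*(set(genes) for genes in gene_lists)))


common = shared_top_genes(results, "Cell type A")
```

The outer key is always the cell type and the inner key the sample name, so per-sample lookups take the form results[cell_type]['top_genes'][sample_name].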
- sctop.process(df_in: pandas.DataFrame | numpy.ndarray | scipy.sparse.spmatrix, average: bool = False, chunk_size: int | None = None) -> pandas.DataFrame
Process scRNA-seq data with optional chunking.
- sctop.score(basis: pandas.DataFrame, sample: pandas.DataFrame, full_output: bool = False, chunk_size: int | None = None) -> pandas.DataFrame | List
Project sample onto basis with optional chunking.
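Both process and score accept an optional chunk_size. The page does not say how chunks are formed, so the following is a generic sketch of the idea, assuming chunking splits the cells into consecutive groups and that chunk_size=None means a single pass. The helpers iter_chunks and apply_in_chunks are hypothetical, and the per-cell operation (a plain sum) merely stands in for the real computation.

```python
def iter_chunks(n_items, chunk_size=None):
    """Yield (start, stop) index pairs; chunk_size=None means one chunk."""
    if chunk_size is None:
        chunk_size = max(n_items, 1)
    for start in range(0, n_items, chunk_size):
        yield start, min(start + chunk_size, n_items)


def apply_in_chunks(columns, func, chunk_size=None):
    """Apply a per-column function chunk by chunk and concatenate the results."""
    out = []
    for start, stop in iter_chunks(len(columns), chunk_size):
        out.extend(func(col) for col in columns[start:stop])
    return out


# Five toy "cells", each a list of values.
cells = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
chunked = apply_in_chunks(cells, sum, chunk_size=2)
unchunked = apply_in_chunks(cells, sum)
```

For any operation that treats each cell independently, chunking of this kind changes peak memory use but not the result, which is why chunked and unchunked agree here.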
- sctop.plot_highest(projections, n=10, ax=None, color='olive', fontsize=40, **kwargs)
Plots a horizontal bar chart of the top N projections with a fixed x-axis scale.
- sctop.plot_expression_distribution(scores, n=10, ax=None, box_color='skyblue', fontsize=30, **kwargs)
Plots boxplots of expression for top genes with a fixed y-axis scale.
- sctop.plot_two(projections, celltype1, celltype2, gene=None, gene_expressions=None, ax=None, **kwargs)
- sctop.plot_all_contributions(results: sctop.utils.Dict[str, sctop.utils.Dict], sample_names: sctop.processing.List[str], output_dir: sctop.processing.Optional[str] = None, highlight_genes: sctop.processing.Optional[sctop.utils.Dict[str, sctop.processing.List[str]]] = None, dpi: int = 150, **plot_kwargs) -> None
Create and save contribution plots for all cell types and samples.
- Parameters:
results (dict) – Results from analyze_sample_contributions
sample_names (list) – List of sample names to plot
output_dir (str, optional) – Base directory for saving plots. If None, uses current directory
highlight_genes (dict, optional) – Dictionary mapping cell_type -> [genes_to_highlight]
dpi (int) – DPI for saved images
**plot_kwargs – Additional kwargs passed to plot_gene_contribution_scatter