Modules

StrainCascade v2.0.0 comprises 30 analysis modules (SC1–SC30), each operating within its own Apptainer container for full reproducibility and isolation.

Module Reference

Module	Tool	Description
SC1	Canu	Long-read correction & trimming
SC2	LJA	Multiplex de Bruijn graph assembly (HiFi-optimised)
SC3	SPAdes	Multi-k-mer de Bruijn graph assembly
SC4	Canu	Overlap-layout-consensus assembly with adaptive k-mer weighting
SC5	Flye	Repeat graph assembly for long error-prone reads
SC6	Unicycler	Hybrid (long+short) or long-read-only assembly (miniasm+Racon)
SC7	QUAST	Assembly quality evaluation round 1 (pre-merge)
SC8	MAC2	Consensus assembly merging from multiple assemblers
SC9	QUAST	Assembly quality evaluation round 2 (post-merge)
SC10	Circlator	Contig circularisation
SC11	QUAST	Assembly quality evaluation round 3 (post-circularisation)
SC12	Arrow/Medaka + Polypolish	Assembly error correction (long-read + optional short-read polishing)
SC13	minimap2 + BBMap	Read mapping & coverage statistics
SC14	CheckM2	Genome completeness & contamination QC
SC15	GTDB-Tk	Taxonomic classification
SC16	GTDB-Tk	De novo phylogenetic tree construction
SC17	Bakta	Genome annotation (v2)
SC18	Prokka	Prokaryotic genome annotation
SC19	DeepFRI	Deep-learning protein function prediction (GO + EC)
SC20	MicrobeAnnotator	Metabolic pathway annotation
SC21	PlasmidFinder	Plasmid replicon identification
SC22	AMRFinderPlus	Antimicrobial resistance gene detection
SC23	ResFinder	Resistance gene identification
SC24	dbCAN3	Carbohydrate-active enzyme (CAZyme) identification
SC25	IslandPath-DIMOB	Genomic island prediction
SC26	VirSorter2	Viral/phage sequence detection (HMM multi-classifier)
SC27	geNomad	Viral & plasmid identification (marker-based classification)
SC28	CRISPRCasFinder	CRISPR-Cas system detection
SC29	ISEScan	Insertion sequence element detection
SC30	—	Consolidated data integration & interactive HTML report

Module Groups

Assembly (SC1–SC8)

Five assemblers built on different algorithms are run so that their strengths complement one another:

Canu — adaptive k-mer weighting with overlap-layout-consensus
SPAdes — multi-k-mer de Bruijn graph approach
Flye — repeat graph construction for long error-prone reads
LJA — multiplex de Bruijn graphs optimised for HiFi reads
Unicycler — hybrid mode (long+short) or long-read-only (miniasm+Racon)

Draft assemblies are evaluated by QUAST (SC7), then merged into a consensus assembly by MAC2 (SC8).

Quality Control & Taxonomy (SC14–SC16)

CheckM2 (SC14) — completeness/contamination assessment using ML-based marker gene analysis
GTDB-Tk (SC15) — taxonomic classification against the Genome Taxonomy Database
GTDB-Tk (SC16) — de novo phylogenetic tree construction

Annotation (SC17–SC20)

A multi-tool annotation consensus framework:

Bakta (SC17) — primary annotation (NCBI-compliant, database-driven)
Prokka (SC18) — complementary gene calls
DeepFRI (SC19) — deep-learning protein function prediction across GO categories (MF, BP, CC) and EC numbers
MicrobeAnnotator (SC20) — metabolic pathway annotation via KEGG/COG

When Bakta and Prokka disagree, DeepFRI casts the third vote in a three-way consensus, with equal weight. If DeepFRI agrees with either tool, that annotation is preferred; if all three disagree, DeepFRI's structure-based prediction is considered alongside the database-derived annotations from Bakta and Prokka.
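The voting rule described above can be sketched as follows. This is a minimal illustration, not StrainCascade's actual implementation: the function name `consensus_annotation`, the use of plain strings for annotations, and case-insensitive string equality as the definition of "agreement" are all assumptions made for the example.

```python
from typing import Optional


def consensus_annotation(bakta: str, prokka: str, deepfri: str) -> Optional[str]:
    """Return the preferred annotation, or None when no two tools agree.

    Hypothetical helper: agreement is modelled as case-insensitive string
    equality, which is an assumption for illustration only.
    """
    b, p, d = (s.strip().lower() for s in (bakta, prokka, deepfri))
    if b == p:
        # Bakta and Prokka agree, so no third vote is needed.
        return bakta
    if d == b:
        # DeepFRI sides with Bakta.
        return bakta
    if d == p:
        # DeepFRI sides with Prokka.
        return prokka
    # All three disagree: there is no winner, so the caller should keep
    # every candidate and weigh them together.
    return None


print(consensus_annotation("beta-lactamase", "hypothetical protein", "Beta-lactamase"))
# → beta-lactamase
```

Returning `None` for a three-way disagreement leaves the decision to the caller, mirroring the text: no annotation is discarded, and the structure-based and database-derived candidates are considered together.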

Functional Screens (SC21–SC29)

Module	Target	Method
SC21	Plasmid replicons	PlasmidFinder database matching
SC22	AMR genes	AMRFinderPlus HMM + BLAST
SC23	Resistance genes	ResFinder database screening
SC24	CAZymes	dbCAN3 multi-tool consensus
SC25	Genomic islands	IslandPath-DIMOB dinucleotide bias
SC26	Phages/viruses	VirSorter2 HMM multi-classifier
SC27	Viruses & plasmids	geNomad marker-based classification
SC28	CRISPR-Cas	CRISPRCasFinder pattern matching
SC29	IS elements	ISEScan profile HMM scanning

Data Integration (SC30)

All module outputs are consolidated into a unified results directory with an interactive HTML report for visual exploration.

Execution Modes

Mode	Modules	Use Case
`minimal`	SC3, SC7, SC14, SC15, SC17, SC30	Quick assembly check
`efficient`	SC1–SC3, SC7–SC12, SC14–SC15, SC17, SC21, SC30	Balanced analysis
`standard`	SC1–SC15, SC17–SC30	Comprehensive analysis (default)
`comprehensive`	All 30 modules	Full analysis
`custom`	User-specified (e.g. `custom:SC1,SC2,SC3`)	Targeted analysis
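To make the table concrete, the sketch below expands each mode into an explicit module list. It is illustrative only: `MODES`, `expand_modules` and `resolve_mode` are hypothetical names, and accepting ranges inside a `custom:` specification is an assumption (the documented example uses a plain comma-separated list).

```python
import re

# Module specifications copied from the Execution Modes table.
MODES = {
    "minimal": "SC3, SC7, SC14, SC15, SC17, SC30",
    "efficient": "SC1–SC3, SC7–SC12, SC14–SC15, SC17, SC21, SC30",
    "standard": "SC1–SC15, SC17–SC30",
    "comprehensive": "SC1–SC30",
}

# One token is either "SC7" or a range such as "SC1–SC3" (en dash or hyphen).
TOKEN = re.compile(r"SC(\d+)(?:\s*[–-]\s*SC(\d+))?")


def expand_modules(spec: str) -> list[str]:
    """Expand 'SC1–SC3, SC7' into ['SC1', 'SC2', 'SC3', 'SC7']."""
    modules = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        match = TOKEN.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognised module token: {part!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        if not 1 <= start <= end <= 30:
            raise ValueError(f"module range out of bounds: {part!r}")
        modules.extend(f"SC{n}" for n in range(start, end + 1))
    return modules


def resolve_mode(mode: str) -> list[str]:
    """Resolve a named mode or a 'custom:...' specification."""
    if mode.startswith("custom:"):
        return expand_modules(mode[len("custom:"):])
    return expand_modules(MODES[mode])


print(resolve_mode("custom:SC1,SC2,SC3"))  # → ['SC1', 'SC2', 'SC3']
print(len(resolve_mode("standard")))       # → 29
```

Expanded this way, `standard` covers 29 of the 30 modules; the only one it leaves out is SC16.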

Bundles

Bundle	Focus	Modules
`assembly`	Genome assembly & QC	SC1–SC14
`annotation`	Annotation modules	SC17–SC20
`functional`	Functional screens	SC15, SC22–SC25
`phage`	Phage detection	SC26–SC29
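The bundle table can be read as a mapping from bundle name to a set of module numbers. The sketch below encodes it that way and reports which modules no bundle covers. The names `BUNDLES`, `modules_for` and `uncovered` are hypothetical, and combining several bundles in one call is an assumption, since the table does not say whether bundles may be combined.

```python
def sc_range(first: int, last: int) -> set[int]:
    """Inclusive range of module numbers, e.g. sc_range(1, 3) -> {1, 2, 3}."""
    return set(range(first, last + 1))


# Module numbers copied from the Bundles table.
BUNDLES = {
    "assembly": sc_range(1, 14),
    "annotation": sc_range(17, 20),
    "functional": {15} | sc_range(22, 25),
    "phage": sc_range(26, 29),
}


def modules_for(*bundles: str) -> list[str]:
    """Union of the named bundles, in module order (assumed behaviour)."""
    selected = set().union(*(BUNDLES[name] for name in bundles))
    return [f"SC{n}" for n in sorted(selected)]


def uncovered(total: int = 30) -> list[str]:
    """Modules that do not appear in any bundle."""
    covered = set().union(*BUNDLES.values())
    return [f"SC{n}" for n in range(1, total + 1) if n not in covered]


print(modules_for("annotation", "phage"))
print(uncovered())  # → ['SC16', 'SC21', 'SC30']
```

As listed, the bundles leave SC16, SC21 and SC30 unassigned, so those modules are reachable only through an execution mode or a `custom` specification.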

Assembly-Input Mode

When pre-assembled genomes are provided via `-a`, the following modules are skipped automatically:

SC1 (Canu trim), SC2–SC6 (assemblers), SC7–SC8 (evaluation 1 + merge), SC9 (evaluation 2), SC10 (circularisation), SC12 (polishing), SC13 (coverage)

The pipeline then runs SC11 (evaluation 3) and continues from SC14 (CheckM2) onwards.
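The effect of assembly-input mode on a requested module list can be sketched as a simple set filter. This is a hedged illustration: `ASSEMBLY_INPUT_SKIPPED` and `effective_modules` are hypothetical names, and the real pipeline may apply the skip differently.

```python
# Modules skipped when a pre-assembled genome is supplied:
# SC1–SC10, SC12 and SC13, as listed above.
ASSEMBLY_INPUT_SKIPPED = set(range(1, 11)) | {12, 13}


def effective_modules(requested: list[int], assembly_input: bool) -> list[int]:
    """Return the module numbers that would run, in pipeline order."""
    unique = set(requested)
    if assembly_input:
        # Drop every module that needs raw reads or draft assemblies.
        unique -= ASSEMBLY_INPUT_SKIPPED
    return sorted(unique)


# Example: the `standard` mode (SC1–SC15, SC17–SC30) with a pre-assembled genome.
standard = list(range(1, 16)) + list(range(17, 31))
remaining = effective_modules(standard, assembly_input=True)
print(", ".join(f"SC{n}" for n in remaining))
```

With the `standard` module set, the first modules left to run are SC11, SC14 and SC15, which matches the starting point described above.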