Usage
Input Types
StrainCascade accepts two mutually exclusive input types:
| Flag | Input Type | Accepted Formats |
|---|---|---|
| `-i` | Sequencing reads | .fasta, .fa, .fna, .fastq, .fastq.gz, .bam |
| `-a` | Pre-assembled genomes | .fasta, .fa, .fna |
Both flags accept a single file or a directory containing multiple files. When a directory is provided, all matching files are processed sequentially.
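As a pre-flight check, the accepted formats in the table above can be encoded in a small helper. This is a minimal sketch, not part of StrainCascade: the function name `accepted_for_flag` is hypothetical, and it only inspects file names, not file contents.

```shell
# Hypothetical helper (not shipped with StrainCascade).
# Returns 0 if the file name carries an extension accepted by the given flag.
accepted_for_flag() {
  case "$1" in
    -i)
      # Sequencing reads
      case "$2" in
        *.fasta|*.fa|*.fna|*.fastq|*.fastq.gz|*.bam) return 0 ;;
      esac ;;
    -a)
      # Pre-assembled genomes
      case "$2" in
        *.fasta|*.fa|*.fna) return 0 ;;
      esac ;;
  esac
  return 1
}

accepted_for_flag -i reads.fastq.gz && echo "reads.fastq.gz is valid for -i"
accepted_for_flag -a reads.fastq.gz || echo "reads.fastq.gz is not valid for -a"
```

Because `.fasta`, `.fa`, and `.fna` are accepted by both flags, the extension alone cannot tell reads from assemblies; the choice between `-i` and `-a` still has to be made by the user.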
Sequencing Types
| Parameter | Platform | Description |
|---|---|---|
| `pacbio-hifi` | PacBio | HiFi/CCS reads (default) |
| `pacbio-corr` | PacBio | Corrected subreads |
| `pacbio-raw` | PacBio | Raw subreads |
| `nano-hq` | ONT | High-quality (Q20+, R10.4+) |
| `nano-corr` | ONT | Corrected reads |
| `nano-raw` | ONT | Raw reads (older chemistry) |
straincascade -i reads.fastq.gz -s nano-hq -t 32
Hybrid Assembly
Providing paired Illumina short reads activates two features:
- Unicycler Hybrid Assembly (SC6) — combines long and short reads for improved assembly
- Short-Read Polishing (SC12) — BWA-MEM alignment + Polypolish consensus correction
straincascade -i longreads.fastq.gz \
-sr1 R1.fastq.gz -sr2 R2.fastq.gz \
-o output/ -s pacbio-hifi -t 32
When only long reads are provided, Unicycler falls back to long-read-only mode (miniasm+Racon).
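For scripted runs where short reads are only sometimes available, the command can be assembled conditionally. The sketch below is illustrative only: `build_cmd` is a hypothetical wrapper, the sequencing type, thread count, and output directory are copied from the example above, and the function prints the command instead of running it.

```shell
# Hypothetical wrapper: prints a straincascade command line, adding the
# short-read flags only when both paired files are supplied.
build_cmd() (
  long_reads="$1"; r1="${2:-}"; r2="${3:-}"
  set -- straincascade -i "$long_reads" -s pacbio-hifi -t 32 -o output/
  if [ -n "$r1" ] && [ -n "$r2" ]; then
    # Paired short reads present: request hybrid assembly and polishing.
    set -- "$@" -sr1 "$r1" -sr2 "$r2"
  fi
  echo "$*"
)

build_cmd longreads.fastq.gz
build_cmd longreads.fastq.gz R1.fastq.gz R2.fastq.gz
```

Printing first makes it easy to inspect the command before launching a long job.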
External Assemblies
Provide pre-existing assemblies from external tools for inclusion in the de novo phylogenetic tree (SC16):
straincascade -i reads.fastq.gz -ea /path/to/external_assemblies/ -o output/
Assembly Selection Algorithm
Two algorithms are available for selecting the best assembly from the multi-assembler suite:
| Algorithm | Method |
|---|---|
| `contig` (default) | MAD-based outlier filtering on genome size, then minimum contig count |
| `continuity` | Bivariate MAD filtering on both genome size and contig count |
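To make the `contig` strategy concrete, here is a rough sketch of MAD (median absolute deviation) filtering on genome size followed by a minimum-contig pick. It is not StrainCascade's implementation: the function names, the input layout (`name genome_size contig_count` per line), and the cutoff of 3 MADs are all assumptions, since the actual threshold is not stated here.

```shell
# Median of the numbers read from stdin (one per line).
median() {
  sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Hypothetical selector. stdin: "name genome_size contig_count" per line.
# $1: cutoff in MADs (assumed default 3; not StrainCascade's documented value).
select_assembly() (
  k="${1:-3}"
  table="$(cat)"
  med="$(printf '%s\n' "$table" | awk '{ print $2 }' | median)"
  mad="$(printf '%s\n' "$table" |
    awk -v m="$med" '{ d = $2 - m; if (d < 0) d = -d; print d }' | median)"
  # Keep assemblies whose size lies within k MADs of the median size,
  # then report the one with the fewest contigs.
  printf '%s\n' "$table" |
    awk -v m="$med" -v mad="$mad" -v k="$k" \
      '{ d = $2 - m; if (d < 0) d = -d; if (d <= k * mad) print }' |
    sort -k3,3n | head -n 1 | awk '{ print $1 }'
)

printf '%s\n' \
  'asm_a 5000000 3' \
  'asm_b 5100000 5' \
  'asm_c 4900000 2' \
  'asm_d 9000000 1' | select_assembly
```

In this toy input, `asm_d` has the fewest contigs but a genome size far from the others, so the size filter removes it before the contig count is compared. The `continuity` strategy would apply the same kind of filter to contig count as well, which this sketch does not cover.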
straincascade -i reads.fastq.gz -sa continuity
Locus Tag
By default, StrainCascade generates an automatic INSDC-compliant locus tag from the GTDB-Tk taxonomic classification. You can override this:
straincascade -i reads.fastq.gz -l MYORG
INSDC locus tag prefixes must contain 3–12 alphanumeric uppercase characters and start with a letter.
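A custom prefix can be checked against this rule before a run is started. The following is a minimal sketch; `is_valid_locus_tag` is a hypothetical helper, and it tests only the format rule stated above, not whether the prefix is registered with INSDC.

```shell
# Hypothetical helper: returns 0 if the prefix is 3-12 uppercase
# alphanumeric characters and starts with a letter.
is_valid_locus_tag() {
  # LC_ALL=C keeps [A-Z] from matching lowercase letters in some locales.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Z][A-Z0-9]{2,11}$'
}

is_valid_locus_tag MYORG && echo "MYORG: ok"
is_valid_locus_tag 1ABC || echo "1ABC: rejected (starts with a digit)"
```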
Reproducibility Modes
| Mode | Threads | Entropy | Use Case |
|---|---|---|---|
| `--heuristic` (default) | User-specified | System | Fast, standard analysis |
| `--deterministic` | Forced to 1 | Fixed file | Bit-identical reproducibility |
# Deterministic mode
straincascade -i reads.fastq.gz --deterministic
# Heuristic mode (default)
straincascade -i reads.fastq.gz --heuristic -t 32
Result Types
| Type | Description |
|---|---|
| `main` (default) | Standard output files |
| `all` | All intermediate and final files |
| `R` | R-friendly output format |
straincascade -i reads.fastq.gz -r all
Force Overwrite
By default, existing output files are overwritten. To skip existing outputs:
straincascade -i reads.fastq.gz -f no
Updating StrainCascade
# Update software scripts
straincascade -us
# Update Apptainer images
straincascade -uai
# Update databases
straincascade -udb
For instructions on full reinstallation, see Installation.
Caveats
Conda environment and PATH in SBATCH jobs: When submitting sbatch jobs, your conda environment will not remain active automatically. If you have multiple installations, activate the correct environment in your SBATCH script to ensure the appropriate paths are used.
Resource requirements: StrainCascade is resource-intensive and will overwhelm standard desktop or laptop computers. Use an HPC environment with at least 32 cores and 96 GB RAM for optimal performance.
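Both caveats can be addressed in the job script itself. The snippet below writes an example SBATCH script to `straincascade_job.sh`. Treat it as a sketch under assumptions: the environment name `straincascade`, the job name, and the input and output paths are placeholders, and your cluster may require further options such as a partition or a time limit.

```shell
# Write an example job script; submit it afterwards with:
#   sbatch straincascade_job.sh
cat > straincascade_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=straincascade
#SBATCH --cpus-per-task=32
#SBATCH --mem=96G

# Make `conda activate` available in the non-interactive batch shell.
source "$(conda info --base)/etc/profile.d/conda.sh"
# "straincascade" is a placeholder environment name; use your own.
conda activate straincascade

# Match the thread count to the cores granted by Slurm.
straincascade -i reads.fastq.gz -o output/ -t "$SLURM_CPUS_PER_TASK"
EOF
```

Activating the environment inside the script, and not relying on the submitting shell, ensures the job resolves `straincascade` from the intended installation.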