Modular bacterial genome assembly & annotation pipeline for long-read sequencing data
30 modules, 5 assemblers, hybrid assembly, deterministic mode, Apptainer containers

Pipeline Stages

Read Processing (SC1): Long-read correction & trimming with Canu
Multi-Assembler (SC2–SC8): LJA, SPAdes, Canu, Flye, Unicycler; consensus merging via MAC2
Refinement (SC9–SC13): Evaluation, circularization, polishing (Arrow/Racon/Medaka + Polypolish), coverage
Quality Control (SC14): CheckM2 completeness & contamination assessment
Taxonomy (SC15–SC16): GTDB-Tk classification & de novo phylogenetic trees
Annotation (SC17–SC20): Bakta, Prokka, DeepFRI (deep-learning GO/EC), MicrobeAnnotator
Functional Screens (SC21–SC29): Plasmids, AMR, CAZymes, genomic islands, phages, CRISPR, IS elements
Integration (SC30): Consolidated results & interactive HTML report

At a Glance

Analysis modules: 30
Assembly algorithms: 5
Sequencing platforms: PacBio + ONT
Short-read support: Hybrid

Install

# Clone and install
git clone https://github.com/SBUJordi/StrainCascade.git
cd StrainCascade
find scripts/ -type f -exec chmod +x {} \;
./scripts/StrainCascade_installation.sh
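
After installation, a quick sanity check can confirm that the tools are reachable from the current shell. The snippet below is a minimal sketch, not part of StrainCascade: the helper name have_cmd is hypothetical, and it assumes that the installer makes the straincascade command available on PATH (as the Quick Start commands suggest) and that the Apptainer runtime is invoked as apptainer.

```shell
# Sketch: report whether the commands the pipeline relies on are on PATH.
# have_cmd is a hypothetical helper, not something StrainCascade provides.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: both tools are expected on PATH after a successful install.
for tool in apptainer straincascade; do
    if have_cmd "$tool"; then
        echo "$tool found at: $(command -v "$tool")"
    else
        echo "$tool not found on PATH; check the installer output" >&2
    fi
done
```

If either tool is reported missing, the installer output is the first place to look; the check itself changes nothing on the system.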

Quick Start

# From long-read sequencing data
straincascade -i reads.fastq.gz -o output/ -s pacbio-hifi -t 32

# Hybrid assembly (long + short reads)
straincascade -i longreads.fastq.gz -sr1 R1.fastq.gz -sr2 R2.fastq.gz -o output/

# From pre-assembled genome
straincascade -a assembly.fasta -o output/ -t 32

# Deterministic mode for full reproducibility
straincascade -i reads.fastq.gz -o output/ --deterministic
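
For several isolates, the single-sample command can be wrapped in a small loop. The following is a sketch under stated assumptions: the function name sc_batch_commands and the one-file-per-sample directory layout are hypothetical, and only the -i, -o, -s and -t options shown above are used. It prints the commands rather than running them, so they can be reviewed first.

```shell
# Sketch: print one straincascade command per *.fastq.gz file in a directory.
# Hypothetical helper; assumes one long-read file per sample and PacBio HiFi data.
# Arguments: <reads_dir> <output_root> [threads, default 32]
sc_batch_commands() (
    reads_dir="$1"
    out_root="$2"
    threads="${3:-32}"
    for reads in "$reads_dir"/*.fastq.gz; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$reads" ] || continue
        sample="$(basename "$reads" .fastq.gz)"
        printf 'straincascade -i "%s" -o "%s/%s/" -s pacbio-hifi -t %s\n' \
            "$reads" "$out_root" "$sample" "$threads"
    done
)
```

Calling sc_batch_commands reads results 16 lists the commands for every read file under reads, each writing to its own subdirectory of results. Piping that output to sh runs them one after another; the double quotes protect paths with spaces, but not paths that themselves contain quote characters.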

Citation

Jordi SBU, Baertschi I, Li J, Fasel N, Misselwitz B, Yilmaz B. StrainCascade: An automated, modular workflow for high-throughput long-read bacterial genome reconstruction and characterization. bioRxiv (2026). doi:10.64898/2026.02.04.698786