Systems Engineer in Bio

Where Rust, Go, and TS actually land in the 2026 biotech stack
Verified from GitHub + job postings · June 2026

The bio stack has two layers: a C++/CUDA simulation core that nobody is replacing, and a Python everything-else layer that dominates orchestration, ML, and lab tooling. The entry points for systems engineers are at the edges: fast I/O, trajectory analysis, and a genuine gap in LLM-friendly simulation tools.


The 2026 Bio Stack

Production engines are C++/CUDA. Analysis and tooling are Python. Rust is emerging at the I/O and trajectory analysis layer. Go is rare.

| Layer | Dominant language | Rust/Go presence | Status |
|---|---|---|---|
| MD simulation core | C++/CUDA (GROMACS, AMBER, OpenMM) | None in production | C++ entrenched |
| I/O parsing (BAM/VCF/PDB) | C/C++ historically | noodles, alevin-fry, rust-bio (production-ready) | Rust viable |
| Trajectory analysis | Python (MDAnalysis, MDTraj) | molar (peer-reviewed, faster than Python) | Rust challenger |
| ML inference (folding) | Python/PyTorch (AF3, ESM3) | candle (Rust ML framework; no protein models yet) | Rust aspirational |
| Workflow orchestration | Python (Snakemake), Groovy/JVM (Nextflow) | None | Python/Groovy win |
| Lab automation | Python (Opentrons v2, PyLabRobot) | None | Python wins |
| Services / APIs | Python (FastAPI) | Go listed optionally (Recursion, Benchling infra) | Python default |

GPU backends (2026): GROMACS 2026.0 ships CUDA (primary), SYCL (AMD/Intel portable), and HIP (AMD native). AMBER 24 adds AMD ROCm/HIP for MI100–MI300A. OpenMM 8.2.0 adds HIP. All three major engines now support AMD GPUs, not just NVIDIA.

Best Contribution Targets

Ranked by realistic near-term impact. Stars and activity verified June 2026.

| Project | Lang | Stars | Why | Entry effort |
|---|---|---|---|---|
| pdbtbx (PDB/mmCIF structure parser) | Rust | 70 | 17 open issues with specific gaps: secondary structure (#139), bond parsing (#137), mmCIF helix (#140). Solo maintainer, welcomes help. PDB is the universal protein structure format. | Low–medium |
| alevin-fry (scRNA-seq preprocessing) | Rust | 207 | 24 open issues. Academic maintainers (Rob Patro group) receptive to external contributors. Nature Methods 2022. Beat salmon (C++) on preprocessing. 98.4% Rust. | Medium |
| molar (MolAR) (MD trajectory analysis) | Rust | 52 | Peer-reviewed (PMC11609497, J. Computational Chemistry Dec 2024). Benchmarked faster than MDAnalysis, MDTraj, and CPPTRAJ. Active June 2026. Low competition for contributions. | Medium |
| noodles (BAM/CRAM/VCF/FASTQ/GFF I/O) | Rust | 709 | Most production-ready Rust bio library. In use at St. Jude. Solo expert maintainer, so the bar is high. Requires reading format specs (BAM 1.6, CRAM 3.1). High credibility payoff per merged PR. | High |
| ferritin (Protein structure utilities) | Rust | 33 | Early-stage, active May 2026. Open issues: dependency slim-down (#110), MolViewSpec renderer (#18), LigandMPNN tests (#58). First-mover advantage. Risk: may stall. | Low–medium |
| SeqKit (FASTA/FASTQ toolkit) | Go | 1,600 | Most-starred Go bio tool. 13 open issues. Published iMeta 2024 (10-year anniversary). Solo maintainer. A Go contributor in bio is rare, which is a differentiating signal. | Medium |
| OpenFold-3 (Open AlphaFold3 reproduction) | Python | 750 | Apache-2.0, active June 2026, 5 open PRs. Best open contribution target in protein folding; AF3 itself only accepts bug fixes. Python-only, no Rust angle, but high scientific visibility. | Medium–high |

Avoid near-term: AlphaFold3 (Google CLA, CC BY-NC-SA non-commercial, bug fixes only), lumol (stalled Feb 2024), biogo (abandoned), candle protein port (multi-month investment as a first project).

Where Rust Uniquely Wins

BAM/CRAM/VCF I/O at scale: No GIL, no Python process overhead. noodles is in production at St. Jude. For pipelines that process thousands of samples, the Python bottleneck is real and Rust removes it.
scRNA-seq preprocessing: alevin-fry (Rust) beat salmon (C++) on memory efficiency and speed while being memory-safe. Proof that Rust can outperform C++ in bio without sacrificing correctness.
MD trajectory analysis: molar outperforms MDAnalysis, MDTraj, and CPPTRAJ on RMSD/alignment at 4,300-atom and 445,000-atom benchmarks (PMC11609497). The Python analysis layer is the bottleneck in large-trajectory workflows; Rust removes it.
Genomic data structures: Ginkgo chose Rust for gen, a Git-like VCS for genetic sequences with block graphs for polyploid genomes. Memory safety prevents data corruption in variant representation, the kind of bug that invalidates months of experiments.
Low-latency protein LM inference without Python: candle (HuggingFace, 20k stars) enables ML inference in production services where Python startup overhead is unacceptable. No protein models exist yet; porting ESM2 would be the highest-impact Rust-in-bio contribution possible.
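
For readers new to the trajectory-analysis workload named above, here is a minimal pure-Python sketch of per-frame RMSD against a reference frame. It assumes the frames are already superposed and skips the alignment step that the benchmarked tools perform. The function names `rmsd` and `rmsd_series` and the toy coordinates are illustrative assumptions, not any library's API.

```python
import math

def rmsd(frame, reference):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) atoms.

    Assumes the two frames are already superposed; real tools align first.
    """
    if len(frame) != len(reference) or not frame:
        raise ValueError("frames must be non-empty and the same length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(frame, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(frame))

def rmsd_series(trajectory, reference):
    """RMSD of every frame in a trajectory against one reference frame."""
    return [rmsd(frame, reference) for frame in trajectory]

# Toy two-atom "trajectory" (hypothetical data, arbitrary units).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [
    reference,                            # identical to the reference
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],   # both atoms shifted by 1 along z
]
series = rmsd_series(trajectory, reference)
```

The inner loop is plain per-atom arithmetic repeated for every frame, which is the kind of work where interpreter overhead tends to dominate as atom counts and frame counts grow.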

Where Python Wins and Rust Adds Nothing

Model training: AlphaFold3, OpenFold-3, ESM3, Boltz-2 all train in PyTorch/JAX. The training loop, custom attention ops, and gradient checkpointing are Python-controlled GPU operations. This is not a realistic contribution target in Rust.
Workflow orchestration: Nextflow (Groovy/JVM) and Snakemake (Python) are the two dominant pipeline tools. Neither has a Rust plugin surface. Contributing here means Groovy or Python.
Lab automation: Opentrons Protocol API v2, PyLabRobot, and the Hamilton/Tecan SDKs are all Python-only. Hardware vendors ship Python. A Rust client would have no users.
LIMS/ELN integration: Benchling SDK is Python-only. eLabFTW has only a Python codegen client. These are CRUD/REST domains where Python wins on velocity and ecosystem.
Exploratory analysis: scanpy, anndata, seaborn, biopython. Even polars (Rust-backed) is Python-interfaced. Contributing here means Python. The scientific workflow will not switch.

The Open Gap: Fast Simulation Tools for LLMs

Current bio simulation tools are mostly Jupyter/GUI tools designed for human scientists: slow to start, complex to invoke programmatically, and not designed for looping. roadrunner (C++/Python, ODE/SBML) is the fastest existing option but has a dated API that was not designed for AI agents. (Full landscape sweep below.)

What "uv for bio sim" looks like:

With: SBML input (the standard format β€” interop with everything), JSON/CSV output (LLMs can parse it), Python bindings via PyO3 (LLM agents using Python call it natively), sub-100ms cold start.
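
To make the JSON-in, JSON-out contract concrete, here is a minimal Python sketch of one Gillespie (SSA) run exposed as a function an agent could call in a loop. It is an illustration under stated assumptions, not a proposed implementation: the model (a single-species birth-death process) is hard-coded where a real tool would load SBML, and the function name `simulate_birth_death` and the request fields `k_prod`, `k_deg`, `x0`, `t_end`, and `seed` are hypothetical.

```python
import json
import random

def simulate_birth_death(request_json):
    """Run one Gillespie (SSA) trajectory of a birth-death process.

    Hypothetical request fields: k_prod, k_deg, x0, t_end, seed.
    The model is hard-coded here; a real tool would load it from SBML.
    """
    req = json.loads(request_json)
    rng = random.Random(req["seed"])      # seeded so a caller can reproduce a run
    k_prod, k_deg = req["k_prod"], req["k_deg"]
    t, x = 0.0, req["x0"]
    times, counts = [t], [x]
    while True:
        a_prod = k_prod                   # propensity of production (0 -> X)
        a_deg = k_deg * x                 # propensity of degradation (X -> 0)
        a_total = a_prod + a_deg
        if a_total <= 0.0:
            break                         # no reaction can fire
        t += rng.expovariate(a_total)     # exponential waiting time to next event
        if t > req["t_end"]:
            break
        if rng.random() * a_total < a_prod:
            x += 1                        # production fired
        else:
            x -= 1                        # degradation fired
        times.append(t)
        counts.append(x)
    return json.dumps({"species": "X", "time": times, "count": counts})

request = json.dumps(
    {"k_prod": 10.0, "k_deg": 1.0, "x0": 0, "t_end": 5.0, "seed": 42}
)
result = json.loads(simulate_birth_death(request))
```

The explicit seed is a deliberate choice in this sketch: a caller that gets a surprising trajectory can replay it exactly, and the whole exchange stays plain JSON strings in both directions.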

Why it doesn't exist yet: Scientists don't need 10,000 simulations/sec. LLMs do. The use case is new.

The existing field (verified June 2026)

A sweep of the open-source landscape: none of these delivers both clean JSON output and a sub-100ms cold start.

| Tool | Lang | Stars | Status | LLM-loop fitness |
|---|---|---|---|---|
| roadrunner | C++ + Python | 62 | Active (3 releases in 2026) | Best existing. SBML in, numpy out. 200–400ms startup (LLVM JIT). Not JSON-native, not sub-100ms. The engine behind Tellurium. |
| NFsim | C++ | 15 | Active (Jun 2026) | Fast compiled binary. Recently added .nfevent.json output, the only tool moving toward JSON. Rule-based stochastic. |
| BioNetGen | C++/Perl | 68 | Active (Jun 2026) | CLI scriptable, but tab-separated .gdat output (not JSON). 82 open issues. |
| COPASI / basico | C++ (SWIG) | 128 | Active (Jan 2026) | Most feature-complete. CopasiSE CLI is headless and SBML-aware, decent for orchestration. Heavy, dated API. |
| Smoldyn | C++ | 30 | Slowing (Jan 2025) | Spatial stochastic, pip-installable, simple text model format. |
| Tellurium | Python | 142 | Slowing (no release since Dec 2024) | Wraps roadrunner. ~1–2s import overhead (matplotlib, libsbml). 130 open issues. |
| GillesPy2 | Python | 84 | Abandoned (features stalled 2021) | Returns UserDict/UserList, not JSON. Don't build on this. |
| StochSS | JS/Python | 25 | Stalled (GUI-first) | Requires JupyterHub deployment. Not a library. |

Read of the field: roadrunner is the reference implementation worth studying (fast C++ core, SBML-native). NFsim's JSON output is the signal that someone else sees the gap. But nothing is Rust-native, JSON-first, and sub-100ms. The space is open.

Career fit: Demonstrates Rust + bio intersection, is benchmarkable (vs roadrunner/COPASI), publishable as a short Methods paper or bioRxiv preprint, and has a clear user (any lab doing AI-assisted circuit design). Also the natural foundation for a game simulation core.

One sentence

The bio simulation stack needs what uv gave Python: a Rust-native tool fast enough to run in an LLM loop without Python overhead.

Thinking Tools for Cells

A cell is a program. But asking whether a cell "understands itself" is like asking if a neuron has a model: it is the wrong level. The useful question is: what are the thinking tools that let us reason about cells, the way language and logic let us reason with neurons?

The simplification hierarchy

| Level | Model | What it captures | Speed |
|---|---|---|---|
| 1 | Boolean networks | Gene ON/OFF logic, regulatory circuits, cell fate attractors (Kauffman NK model) | Laptop, milliseconds |
| 2 | Gillespie / ODE | Stochastic gene expression noise, protein concentration dynamics, repressilator circuits | Laptop, seconds |
| 3 | Flux balance analysis | Metabolic flow: linear programming over reaction constraints. What can the cell produce? | Laptop, seconds |
| 4 | Coarse-grained MD (MARTINI) | Membrane dynamics, lipid bilayers, large protein complexes, with atoms grouped into beads | GPU, hours |
| 5 | All-atom MD | Every atom explicit. Nanosecond timescales. GROMACS/AMBER/OpenMM. | GPU cluster, days |
| 6 | Whole-cell model | Karr 2012 (Cell): complete Mycoplasma genitalium, 525 genes, 128 researchers. Still the state of the art. | HPC cluster, hours |

The LLM-friendly zone is levels 1–3. Boolean networks and Gillespie simulation are fast enough to run in an agent loop and rich enough to capture real biology, yet the existing tools are Python/Java GUIs that are not ergonomic for programmatic use.
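
As a sense of scale for level 1, here is a minimal Python sketch of a Boolean network: a two-gene toggle switch (mutual repression) with synchronous updates, plus an exhaustive attractor search. The network, the update rule, and the names `step` and `find_attractors` are illustrative assumptions chosen for brevity, not a model taken from any of the tools discussed here.

```python
from itertools import product

def step(state):
    """Synchronous update of a two-gene toggle switch.

    Each gene is ON in the next step only if its repressor is OFF now.
    """
    a, b = state
    return (int(not b), int(not a))

def find_attractors(update, n_genes):
    """Follow every start state until a state repeats; collect the cycles reached.

    Each attractor is stored as a frozenset of its states, so the same cycle
    found from different start states is counted once.
    """
    attractors = set()
    for start in product((0, 1), repeat=n_genes):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = update(state)
        cycle = seen[seen.index(state):]   # states from the first revisit onward
        attractors.add(frozenset(cycle))
    return attractors

attractors = find_attractors(step, 2)
fixed_points = sorted(next(iter(a)) for a in attractors if len(a) == 1)
```

With this rule the search finds two fixed points, (0, 1) and (1, 0), which correspond to the two stable "fates" of the switch, and one two-state oscillation between (0, 0) and (1, 1), which is a known artifact of updating all genes synchronously. Exhaustive enumeration only scales to small networks, since the state space doubles with every gene.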

Experimental thinking tools

None of these experimental tools benefit from Rust. Their speed limits are wet-lab biology, not compute.

Industry Hiring Reality

Verified from public job postings and GitHub org analysis, June 2026.

| Company | Stack (verified) | Rust/Go? |
|---|---|---|
| Recursion | Python primary. gflownet (289 stars) is most active OSS project. Go in 3 infra forks (logrus-stack, go-testutils), last active 2020. | Go optional in full-stack JDs alongside Python/Java/Ruby |
| Benchling | Python/Flask core, TypeScript frontend | Go optional in infra JDs; no Rust |
| Ginkgo Bioworks | Python + C# (Driver Developer JD). One internal Rust tool: gen (sequence VCS, 9 stars, last active Apr 2025) | Rust exists but not in hiring signals |
| Generate:Biomedicines | Python, ML Ops | No Rust/Go in JDs |
| Unnamed top-10 pharma | HN "Who is Hiring" May 2026: Rust SDK + TypeScript + workflow DSL. Hiring 4–5 Rust engineers. Company not named. | Rust explicitly required (only confirmed case) |

Bottom line: Zero named companies list Rust in job postings as of June 2026. The most likely employers for Rust/C++ simulation engineering (D.E. Shaw Research, Schrödinger, Relay Therapeutics) were not surveyed but are the correct targets if simulation performance is the goal.

The HFT → bio transfer works, but the bottleneck shifts: from network/syscall latency to GPU memory bandwidth and Python interop. The mental model adapts.

90-Day Entry Path

For a Rust/Go/TS senior engineer with no prior bio background, one merged PR in a peer-reviewed or production bio project by day 90 is achievable.

Month 1: Orient

Month 2: Ship

Month 3: Commit

What to avoid