Why Go is fast, what Rust could learn, and how it compares to TypeScript
Last updated: 2026-02-19
Go compiles in seconds, runs goroutines at 2KB each, pauses GC for <100 microseconds, and ships as a single binary — trading some raw CPU performance for dramatically faster development cycles and simpler operations.
Go was designed for fast compilation and simple concurrency at Google's scale. These are the technical decisions that make it fast.
| Stat | Meaning |
|---|---|
| 6x | faster compilation than Rust (0.23s vs 1.47s for the same problem)[1] |
| 2KB | initial goroutine stack size (vs 1-2MB for OS threads) — 500-1000x smaller |
| <100μs | typical GC stop-the-world pause (down from 300ms in Go 1.0 to sub-ms by Go 1.8) |
| Language | Same Problem | Notes |
|---|---|---|
| Go | 0.23s | Fastest |
| C++ | 0.56s | ~2.4x slower |
| Rust | 1.47s | ~6.4x slower (but catches more at compile time) |
Unlike C/C++ where the compiler parses thousands of header files, Go packages contain export data: a binary description of exported definitions. Importing a package reads one object file, not transitive dependencies. When imports don't affect exported types, dependent packages don't need recompilation.
Go isn't the fastest language at raw computation — it's 30-200% slower than Rust/C++ in CPU-intensive tasks — but it's fast enough for most server-side workloads.
When to use Go vs alternatives.
Go's scheduler multiplexes millions of goroutines onto a small pool of OS threads.
| Step | What happens |
|---|---|
| 1 | go func() creates G, places in P's Local Run Queue |
| 2 | M bound to P fetches G from LRQ |
| 3 | If LRQ empty, check Global Run Queue |
| 4 | If GRQ empty, work steal from other P's queues |
| 5 | G blocks on syscall: M detaches from P, P picked up by another M |
Before Go 1.14: Cooperative only. CPU-bound goroutines could freeze others.
Go 1.14+: Asynchronous preemption via SIGURG signal. Goroutines running >10ms get preempted at safe points.
Go uses a concurrent tri-color mark-and-sweep collector optimized for low latency. Objects are classified White (unvisited), Gray (reachable, children not yet scanned), or Black (fully scanned); White objects are collected.
| Parameter | Default | Effect |
|---|---|---|
| GOGC | 100 | % growth before GC. Higher = less CPU, more memory. |
| GOMEMLIMIT | unlimited | Hard memory ceiling. Triggers GC when approached. |
| Resource | Goroutine | OS Thread | Factor |
|---|---|---|---|
| Initial stack | 2KB | 1-2MB | 500-1000x smaller |
| Context switch | ~100ns (3 registers) | ~1-10μs (50+ registers) | 10-100x faster |
| Creation cost | ~300ns | ~10μs+ | 30x+ faster |
| Max concurrent | Millions possible | ~10k practical | 100x+ more |
With 2KB stacks: 0.5 million goroutines per GB of RAM.
Go's type system is deceptively simple on the surface but has sophisticated runtime machinery underneath. Understanding it explains both Go's power and its tradeoffs vs Rust and TypeScript.
Named types are distinct even with same underlying type:
type Celsius float64
type Fahrenheit float64

// Cannot assign Celsius to Fahrenheit — must explicitly convert
Interfaces are satisfied implicitly by method signature:
type Reader interface {
Read(p []byte) (n int, err error)
}
// *os.File satisfies Reader
// No "implements" keyword needed
Every type has an abi.Type descriptor in the binary (from internal/abi/type.go):
- Size_ — size in bytes
- PtrBytes — prefix bytes containing pointers (for GC scanning)
- Hash — pre-computed hash for O(1) type identity checks
- Kind_ — category: Int, Struct, Interface, Pointer, etc.
- Equal — function pointer for equality comparison
- GCData — pointer bitmap for the garbage collector

Key flag: TFlagDirectIface — when set, the concrete value is stored directly in the interface (not as a pointer to the heap). Used for pointers and other single-word values.
Interfaces are fat pointers — 16 bytes on 64-bit systems:
| Interface Type | Structure | Size |
|---|---|---|
| Non-empty (io.Reader) | iface { tab *itab; data unsafe.Pointer } | 16 bytes |
| Empty (any) | eface { _type *_type; data unsafe.Pointer } | 16 bytes |
Empty interfaces (any/interface{}) skip the itab lookup since there are no methods to dispatch.
The itab (interface table) is Go's mechanism for dynamic dispatch: the core dispatch structure containing the interface type, the concrete type, and the method function pointers, cached globally for reuse.
type itab struct {
inter *interfacetype // Interface being satisfied
_type *_type // Concrete type implementing it
hash uint32 // Copy of _type.hash (fast type switches)
_ [4]byte // Padding
fun [1]uintptr // Variable-sized method table (vtable)
}
fun is declared as [1]uintptr but the compiler allocates space for all interface methods. Runtime accesses via pointer arithmetic, bypassing bounds checking. fun[0] == 0 means "type doesn't implement interface."
1. Methods are stored at fixed indices: Read might be index 0, Write index 1.
2. When var r io.Reader = f runs, the runtime calls getitab() to build or look up the itab.
3. getitab() fills fun[i] with the concrete type's method addresses.
4. A call r.Read(buf) loads itab.fun[0] and calls it with r.data as the receiver — one pointer dereference plus an indexed load.

MOVQ (r.tab), DX    // Load itab pointer
MOVQ 24(DX), AX     // Load fun[0] (method at offset 24)
MOVQ 8(r), CX       // Load r.data (concrete value)
CALL AX             // Jump to method
Offset 24 = sizeof(inter) + sizeof(_type) + sizeof(hash) + padding = 8+8+4+4.
| Receiver Type | Method Set Contains |
|---|---|
| Value T | Methods with receiver T only |
| Pointer *T | Methods with receiver T and *T |
type Stringer interface { String() string }
type S struct{}
func (s *S) String() string { return "S" }
var _ Stringer = &S{} // OK: *S has String
var _ Stringer = S{} // ERROR: S doesn't have String
Values in interfaces are not addressable, so pointer-receiver methods can't be called on them.
When the compiler can prove the concrete type, it converts interface calls to direct calls:
h := sha1.New() // Returns hash.Hash, but compiler knows it's *sha1.digest
h.Write(data)   // Converted to direct call: sha1.(*digest).Write
Check with: go build -gcflags='-m'
PGO transforms hot interface calls based on runtime profile data:
// Before PGO
r.Read(buf)
// After PGO (if profile shows *os.File dominates)
if f, ok := r.(*os.File); ok {
f.Read(buf) // Direct call, inlinable
} else {
r.Read(buf) // Fallback indirect call
}
Result: 2-14% CPU reduction typical.
| Operation | Cost | Notes |
|---|---|---|
| Direct call | ~1.6 ns | Baseline |
| Interface call | ~15 ns | Includes potential allocation |
| itab lookup (cached) | ~0.15 ns | Global hash table |
| Type assertion | ~1 ns | Pointer comparison |
| Type switch | ~2-5 ns | Hash + comparison |
func maybeError() error {
var p *os.PathError = nil
return p // Returns (type=*os.PathError, data=nil)
}
err := maybeError()
fmt.Println(err == nil) // false! Interface has type but nil data
An interface is nil only when both tab (or _type) AND data are nil.
func maybeError() error {
var p *os.PathError = nil
if p == nil {
return nil // Returns (type=nil, data=nil)
}
return p
}
func process(r io.Reader) {
buf := make([]byte, 1024) // Escapes to heap!
r.Read(buf) // Compiler can't analyze through interface
}
Compiler doesn't know concrete Read implementation → assumes buffer could be retained → heap allocation.
| Version | Time | Allocations |
|---|---|---|
| With interface | ~24.5 ns/op | 1 alloc |
| With concrete type | ~5.5 ns/op | 0 alloc |
m := map[string]any{"count": 42}
count := m["count"].(int64) // PANIC: 42 is int, not int64
// JSON unmarshaling is worse:
var data map[string]any
json.Unmarshal([]byte(`{"count":42}`), &data)
count := data["count"].(int) // PANIC: it's float64!
Fix: Use comma-ok form or type switch.
| Aspect | Go | Rust | TypeScript |
|---|---|---|---|
| Interface model | Structural (implicit) | Nominal (explicit impl) | Structural (implicit) |
| Dispatch | itab vtable at runtime | Monomorphization OR dyn vtable | None (erased at compile) |
| Runtime type info | Full (reflect pkg) | Limited (TypeId) | None (erased) |
| Generic dispatch | GCShape stenciling + dict | Full monomorphization | N/A (erased) |
| Null safety | nil interface trap | Option type (no null) | null/undefined |
| Performance | ~15 ns/interface call | 0 ns (monomorph) or ~3 ns (dyn) | V8-dependent |
// Static dispatch (monomorphization) - zero cost
fn process<R: Read>(r: R) { ... }
// Dynamic dispatch (vtable) - explicit opt-in
fn process(r: &dyn Read) { ... }
Rust advantage: You choose. The default is zero-cost, and dyn vtable dispatch is ~3 ns — cheaper than Go's itab path.
Rust disadvantage: Monomorphization causes binary bloat and longer compile times.
interface Reader {
read(buf: Uint8Array): number;
}
// Types completely erased at runtime
// API response typed as User is actually `any`
TypeScript problem: No runtime type checking. I/O boundaries are unsafe. Need Zod or similar for validation.
Go accepts ~15 ns interface overhead in exchange for:
- No explicit implements declarations — structural typing
- The reflect package works on all types

For most server workloads, 15 ns per interface call is noise compared to network latency (1-100 ms).
Go 1.18+ generics use GCShape stenciling with dictionaries — a hybrid approach:
| Approach | Go | Rust |
|---|---|---|
| Strategy | Group types by "GC shape" (pointer vs non-pointer) | Full monomorphization per concrete type |
| Code generated | One per shape + runtime dictionary | One per concrete type |
| Binary size | Smaller | Larger (can be huge) |
| Compile time | Faster | Slower |
| Runtime perf | Slower (dictionary indirection) | Faster (fully specialized) |
Different philosophies: Go prioritizes developer velocity, Rust prioritizes correctness.
| Metric | Go | Rust | Winner |
|---|---|---|---|
| CPU-intensive | 1x baseline | 1.3-2x faster | Rust |
| Binary trees bench | 1x | 12x faster | Rust |
| Web service RPS | 2,001 RPS | 3,887 RPS | Rust |
| Memory usage | Higher (GC) | 30-50% less | Rust |
| Latency consistency | GC spikes | Deterministic | Rust |
| Development speed | Days to productive | Months to proficient | Go |
To understand why Go and Rust make different tradeoffs, we need to see how Rust handles dynamic dispatch and how it interacts with async.
// Interface always uses itab
var r io.Reader = file
r.Read(buf) // vtable lookup
16 bytes: {tab *itab, data *T}
Global hash table caches itabs by (interface, concrete) pair.
// Static dispatch (default)
fn process<R: Read>(r: R) { }
// Dynamic dispatch (explicit)
fn process(r: &dyn Read) { }
16 bytes: {data: *T, vtable: *const ()}
Vtable is statically allocated, no runtime lookup.
Rust's vtable layout is simpler than Go's itab:
// Rust vtable layout (conceptual)
struct Vtable {
drop_in_place: fn(*mut ()), // Destructor
size: usize, // sizeof(T)
align: usize, // alignof(T)
method_0: fn(*const ()) -> R, // First trait method
method_1: fn(*const ()) -> R, // Second trait method...
}
Key differences:
- Vtables are laid out at compile time — no getitab() equivalent
- drop_in_place enables Box<dyn Trait> to clean up

| Operation | Go | Rust dyn | Rust impl |
|---|---|---|---|
| Single call overhead | ~2 ns (cached) | ~2-3 ns | ~0.6 ns |
| First-time setup | 15-45 ns (itab) | 0 ns (static vtable) | 0 ns |
| Inlining possible | No | No | Yes |
| Branch prediction | Indirect jump | Indirect jump | Direct call |
| Cache locality | Pointer chase | Pointer chase | Inline |
| Approach | Time | vs Baseline | Why |
|---|---|---|---|
| Rust impl Trait (static) | 64 ms | 1.0x (baseline) | Inlined, zero dispatch |
| Rust &dyn Trait | 216 ms | 3.4x slower | Vtable lookup per call |
| Go interface | ~250 ms* | ~3.9x slower | itab + potential boxing |
*Estimated based on similar workloads; actual varies by GC pressure and escape analysis.
At 20M iterations, even 2 ns per call adds up to 40 ms. But the real cost is lost inlining: static dispatch lets the compiler inline the method body, eliminate bounds checks, and apply loop optimizations. Dynamic dispatch (Go or Rust dyn) blocks all of this. In hot loops, this is the difference between 64 ms and 216+ ms.
| Scenario | Best Choice | Why |
|---|---|---|
| Hot inner loop (millions of calls) | Rust impl | 3-4x faster than dyn |
| Heterogeneous collection | Rust dyn / Go | Must store different types |
| Plugin architecture | Rust dyn / Go | Types not known at compile time |
| API boundaries | Go / Rust dyn | Abstraction more important than speed |
| Rapid prototyping | Go | No need to choose dispatch strategy |
The deepest architectural difference between Go and Rust concurrency is "function colors."
// All functions are the same "color"
func process(items []int) {
for _, item := range items {
go handleItem(item) // Just spawn
}
}
func handleItem(x int) {
// Can do I/O, blocking, anything
time.Sleep(time.Second)
fmt.Println(x)
}
Goroutines are invisible to the type system. Any function can spawn concurrent work.
// Async functions are "colored"
async fn process(items: Vec<i32>) {
for item in items {
handle_item(item).await; // Must await
}
}
async fn handle_item(x: i32) {
tokio::time::sleep(Duration::from_secs(1)).await;
println!("{}", x);
}
Async infects the call stack. Can't call async from sync without runtime.
// Rust: Can't do this!
fn sync_function() {
// ERROR: `await` is only allowed inside `async` functions
async_database_query().await;
}
// Must propagate async up the call stack
async fn sync_function() { // Now async
async_database_query().await;
}
// Or use block_on (spawns new runtime, expensive)
fn sync_function() {
tokio::runtime::Runtime::new()
.unwrap()
.block_on(async_database_query());
}
Go avoids this entirely: The scheduler handles blocking transparently. When a goroutine blocks on I/O, the runtime parks it and runs another — no code changes needed.
Combining async with dyn Trait has been one of Rust's hardest problems.
// This doesn't work directly
trait Database {
async fn query(&self) -> Vec<Row>; // Returns impl Future, size unknown!
}
// Can't put in vtable: Future size varies per implementation
Why? Async functions return impl Future — the concrete Future type differs per implementation. Vtables need fixed-size entries.
#[async_trait]
trait Database {
async fn query(&self) -> Vec<Row>;
}
// Expands to:
trait Database {
fn query(&self) -> Pin<Box<dyn Future<Output = Vec<Row>> + Send>>;
}
Cost: Every call allocates a Box on the heap. ~10-50 ns overhead per call.
// Now works natively! (Rust 1.75, Dec 2023)
trait Database {
fn query(&self) -> impl Future<Output = Vec<Row>>;
}
// But: still can't use with dyn Database easily
// Need trait_variant or manual boxing for dynamic dispatch
Still evolving: async fn in traits landed, but dyn Trait with async methods requires trait_variant or explicit boxing.
| Approach | Allocation | Overhead | Use Case |
|---|---|---|---|
| Static (impl Trait) | None | ~0 ns | Known type at compile time |
| async_trait (boxed) | Per call | ~10-50 ns | dyn Trait needed |
| Go interface | Sometimes | ~2-15 ns | Always (no static option) |
Rust async functions capture their environment. Moving Futures across threads requires proving thread safety.
// This fails:
async fn process(data: &RefCell<Data>) {
// RefCell is not Sync, can't hold across .await
let guard = data.borrow();
async_operation().await; // ERROR: future is not Send
println!("{:?}", guard);
}
// Go equivalent just works:
func process(data *Data) {
mu.Lock()
defer mu.Unlock()
asyncOperation() // Blocks, but goroutine handles it
fmt.Printf("%v\n", data)
}
Rust forces you to think about:
- Which data is held across .await points
- Send (movable between threads)
- Sync (accessible from multiple threads)

Go hides this: The runtime manages goroutine migration. Data races are possible but not compile errors.
Rust: Complex to write, impossible to have data races (in safe code). Compiler errors can be cryptic for async + lifetimes.
Go: Simple to write; the race detector catches issues at runtime. Easier to get started, but also easier to ship subtle concurrency bugs.
Go's "no colors" comes from its runtime design: the scheduler intercepts all blocking operations. Rust's zero-cost abstraction philosophy means you manage concurrency — no hidden runtime magic. This is a fundamental philosophical difference, not a missing feature. Rust trades simplicity for control; Go trades control for simplicity.
| Use Case | Go | Rust |
|---|---|---|
| Web services | Best | Good |
| CLI tools | Best | Good |
| Microservices | Best | Good |
| Rapid prototyping | Best | Slower |
| Systems programming | Adequate | Best |
| Real-time systems | GC concern | Best |
| WebAssembly | Possible | Best |
| Embedded | No | Best |
2025 trend: Hybrid stacks are common. Go for services, Rust for performance-critical components.
Compiled native code vs JIT-compiled JavaScript.
| Metric | Go | Node.js | Factor |
|---|---|---|---|
| HTTP throughput | 180k+ req/s | <40k req/s | 4.5x |
| CPU-bound tasks | 1x | 2.6-30x slower | 2.6-30x |
| Memory (high load) | <150MB | >280MB | 2x less |
| Concurrent connections | 100k+ efficient | ~30k before overhead | 3x+ |
| Cold start | 50ms | 170ms | 3x faster |
| Feature | Go | TypeScript |
|---|---|---|
| Type checking | Compile-time, enforced | Compile-time, erased |
| Runtime type info | Full (reflect package) | None (erased) |
| Nil/null safety | Nil panics possible | Same (null/undefined) |
| Soundness | Sound (no any) | Unsound (any escape) |
TypeScript types are removed at compile time. API responses typed as User are just any at runtime. Go's reflect package provides full runtime type introspection. Use Zod or similar for runtime validation in TypeScript.
| Aspect | Go | Node.js |
|---|---|---|
| Model | Goroutines + channels | Event loop + async/await |
| CPU parallelism | Automatic (all cores) | Worker threads (explicit) |
| Unit memory | 2KB per goroutine | ~1MB per worker thread |
| CPU-bound work | Native parallel | Blocks event loop |
| I/O-bound work | Excellent | Excellent |
Microsoft is rewriting the TypeScript compiler in Go for 10x faster compilation:
| Project | Before | After | Speedup |
|---|---|---|---|
| VS Code (1.5M lines) | 77.8s | 7.5s | 10.4x |
| Playwright | 11.1s | 1.1s | 10.1x |
| TypeORM | 17.5s | 1.3s | 13.5x |
| Editor load time | 9.6s | 1.2s | 8x |
Why Go over Rust: "Programming style closely resembles existing TypeScript codebase. Faster port timeline (1 year vs years for Rust)."
| Resource | URL | For |
|---|---|---|
| Official docs | go.dev/doc | Reference |
| Go Playground | go.dev/play | Quick experiments |
| Go by Example | gobyexample.com | Pattern reference |
| Ardan Labs Blog | ardanlabs.com/blog | Advanced internals |
| TechEmpower Benchmarks | techempower.com/benchmarks | Performance comparison |