Your first benchmark

A `BenchmarkPluralize` in 10 lines.

7 min · easy

Go has the cleanest built-in benchmarking I've ever used. Write a function whose name starts with Benchmark, loop over b.N, and run `go test -bench=.`. Done. This lesson walks through that whole loop, end to end.

The shape of a benchmark

internal/generate/pluralize_test.go

package generate

import "testing"

func BenchmarkPluralize(b *testing.B) {
  for i := 0; i < b.N; i++ {
    Pluralize("category")
  }
}

Three things to absorb:

Name starts with Benchmark, so the go test tool finds it.
Takes a *testing.B.
Loops b.N times. Go picks b.N at runtime: it reruns the function with a growing b.N until one run lasts about the target time (1 second by default, adjustable with -benchtime).
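To watch the ramp-up happen, here is a minimal sketch that records every b.N the framework tries. It assumes a hypothetical `pluralize` stand-in (not the lesson's real Pluralize) and uses `testing.Benchmark`, which runs a benchmark function from an ordinary program instead of through go test.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// pluralize is a hypothetical stand-in for the lesson's Pluralize function.
func pluralize(word string) string {
	if strings.HasSuffix(word, "y") {
		return strings.TrimSuffix(word, "y") + "ies"
	}
	return word + "s"
}

// observedNs runs one benchmark and records every b.N the framework tried.
func observedNs() []int {
	// Init registers the testing flags; the testing docs recommend it when
	// calling Benchmark outside "go test". It has no effect if already called.
	testing.Init()

	var ns []int
	testing.Benchmark(func(b *testing.B) {
		ns = append(ns, b.N) // the function body is re-entered once per attempt
		for i := 0; i < b.N; i++ {
			pluralize("category")
		}
	})
	return ns
}

func main() {
	// Prints a growing sequence that starts at 1; exact values vary by machine.
	fmt.Println(observedNs())
}
```

The practical consequence: the benchmark function body runs several times, so anything outside the b.N loop executes once per attempt, not once per benchmark.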

Run it

go test -bench=BenchmarkPluralize -benchmem -count=3 ./internal/generate/

Three flags worth knowing:

-bench=BenchmarkXxx selects which benchmarks to run. The value is a regular expression, so -bench=. matches all of them.
-benchmem — also report bytes/op and allocs/op. Use it every time.
-count=3 — run each benchmark 3 times. You want this — single runs are noisy.

The output

goos: darwin
goarch: arm64
pkg: github.com/MUKE-coder/grit/internal/generate
BenchmarkPluralize-10  	5012461	   228.4 ns/op	    64 B/op	   2 allocs/op
BenchmarkPluralize-10  	5102937	   226.1 ns/op	    64 B/op	   2 allocs/op
BenchmarkPluralize-10  	5098102	   227.8 ns/op	    64 B/op	   2 allocs/op
PASS
ok  	github.com/MUKE-coder/grit/internal/generate	4.523s

Decode each column:

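BenchmarkPluralize-10 is the benchmark name plus a -10 suffix: the GOMAXPROCS value used for the run, which defaults to the number of logical CPUs. The same suffix on every line means every run used the same parallelism setting.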
5012461 — the b.N Go settled on. ~5 million iterations.
228.4 ns/op — average nanoseconds per operation. THIS is the headline metric.
64 B/op — bytes allocated per op. Lower = less GC pressure.
2 allocs/op — number of heap allocations per op. Lower = even less GC pressure.
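The column breakdown above can be made concrete by splitting one output line into typed fields. This is only an illustrative sketch: `benchLine` and `parseBenchLine` are hypothetical names, and the parser assumes the exact -benchmem layout shown in this lesson, including the GOMAXPROCS suffix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// benchLine holds the columns of one `go test -bench -benchmem` result line.
type benchLine struct {
	Name        string  // benchmark name without the GOMAXPROCS suffix
	Procs       int     // the -N suffix
	Iterations  int     // the b.N Go settled on
	NsPerOp     float64 // average nanoseconds per operation
	BytesPerOp  int     // bytes allocated per operation
	AllocsPerOp int     // heap allocations per operation
}

// parseBenchLine splits a result line on whitespace and converts each column.
func parseBenchLine(line string) (benchLine, error) {
	f := strings.Fields(line)
	if len(f) != 8 || f[3] != "ns/op" || f[5] != "B/op" || f[7] != "allocs/op" {
		return benchLine{}, fmt.Errorf("unexpected benchmark line: %q", line)
	}
	dash := strings.LastIndex(f[0], "-")
	if dash < 0 {
		return benchLine{}, fmt.Errorf("no GOMAXPROCS suffix in %q", f[0])
	}
	procs, err := strconv.Atoi(f[0][dash+1:])
	if err != nil {
		return benchLine{}, err
	}
	iters, err := strconv.Atoi(f[1])
	if err != nil {
		return benchLine{}, err
	}
	nsPerOp, err := strconv.ParseFloat(f[2], 64)
	if err != nil {
		return benchLine{}, err
	}
	bytesPerOp, err := strconv.Atoi(f[4])
	if err != nil {
		return benchLine{}, err
	}
	allocsPerOp, err := strconv.Atoi(f[6])
	if err != nil {
		return benchLine{}, err
	}
	return benchLine{f[0][:dash], procs, iters, nsPerOp, bytesPerOp, allocsPerOp}, nil
}

func main() {
	line := "BenchmarkPluralize-10  \t5012461\t   228.4 ns/op\t    64 B/op\t   2 allocs/op"
	r, err := parseBenchLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", r)
}
```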

Why "per op" not "total time"

b.N changes between runs (and between machines), so total elapsed time tells you nothing on its own. What matters is the per-op average, which is normalized by b.N and therefore comparable across runs and, more loosely, across machines (with caveats: see the note on benchmark noise below).

Multiple benchmarks in one file

func BenchmarkPluralize_short(b *testing.B) {
  for i := 0; i < b.N; i++ { Pluralize("dog") }
}

func BenchmarkPluralize_long(b *testing.B) {
  for i := 0; i < b.N; i++ { Pluralize("administrator") }
}

func BenchmarkPluralize_irregular(b *testing.B) {
  for i := 0; i < b.N; i++ { Pluralize("child") }
}

Compare different input shapes side by side. Often the revelation: "long words are 3x slower", "irregular words trigger a fallback path".

What about setup that shouldn't count?

func BenchmarkParseInline(b *testing.B) {
  // Setup — NOT measured
  input := buildLargeStringFromFile()

  b.ResetTimer()   // restart the clock from here

  for i := 0; i < b.N; i++ {
    ParseInlineFields(input)
  }
}

b.ResetTimer() says "everything before this is setup; start measuring now". Critical for benchmarks where the setup is expensive.
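To see what the reset discards, the sketch below reads the benchmark timer just before and just after `b.ResetTimer()`. It assumes Go 1.20 or newer (for `b.Elapsed`), uses a sleep as a hypothetical stand-in for expensive setup, and runs through `testing.Benchmark` so it works as a plain program.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// setupCost is a hypothetical stand-in for expensive setup work.
const setupCost = 50 * time.Millisecond

// timerAroundReset returns the timer reading just before and just after
// b.ResetTimer(), taken from the last attempt the framework made.
func timerAroundReset() (before, after time.Duration) {
	// Recommended by the testing docs when calling Benchmark outside "go test".
	testing.Init()

	testing.Benchmark(func(b *testing.B) {
		time.Sleep(setupCost) // setup: the timer is already running here
		before = b.Elapsed()
		b.ResetTimer() // discard everything measured so far
		after = b.Elapsed()
		for i := 0; i < b.N; i++ {
			_ = fmt.Sprint(i) // cheap placeholder for the code under test
		}
	})
	return before, after
}

func main() {
	before, after := timerAroundReset()
	fmt.Println("timer before reset:", before) // at least setupCost
	fmt.Println("timer after reset: ", after)  // close to zero
}
```

Without the reset, the setup time would be folded into the measured total on every attempt.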

Sub-benchmarks (table-driven)

func BenchmarkPluralize_table(b *testing.B) {
  for _, name := range []string{"dog", "cat", "administrator", "child", "deer"} {
    b.Run(name, func(b *testing.B) {
      for i := 0; i < b.N; i++ {
        Pluralize(name)
      }
    })
  }
}

Output shows each as its own line:

BenchmarkPluralize_table/dog-10           5102937   226.1 ns/op
BenchmarkPluralize_table/cat-10           5012461   228.4 ns/op
BenchmarkPluralize_table/administrator-10 4012461   299.8 ns/op
...

One file, comprehensive coverage of the function's input space. The Go convention for serious benchmarking.

Benchmarks are noisy on a laptop. Background Slack, Spotlight indexing, thermal throttling — all add jitter. For serious numbers: kill background apps, close browser tabs, plug in to power, run -count=10 and use benchstat (next chapter) to get a statistically sound comparison.
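Before reaching for benchstat, a rough feel for the noise can come from the spread between runs. The sketch below computes the mean and the (max - min) / mean spread for the three ns/op values from the output earlier in this lesson. `summarize` is a hypothetical helper and a crude indicator only, not a substitute for a proper statistical comparison.

```go
package main

import "fmt"

// summarize returns the mean of the samples and their spread,
// (max - min) / mean, as a rough noise indicator.
func summarize(samples []float64) (mean, spread float64) {
	if len(samples) == 0 {
		return 0, 0
	}
	lo, hi, sum := samples[0], samples[0], 0.0
	for _, s := range samples {
		sum += s
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	mean = sum / float64(len(samples))
	return mean, (hi - lo) / mean
}

func main() {
	// ns/op values from the three -count=3 runs shown earlier.
	mean, spread := summarize([]float64{228.4, 226.1, 227.8})
	fmt.Printf("mean %.1f ns/op, spread %.1f%%\n", mean, spread*100)
	// prints: mean 227.4 ns/op, spread 1.0%
}
```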

Quick check

You write `for i := 0; i < 100; i++` instead of `for i := 0; i < b.N; i++` in a benchmark. What happens?

Try it

Write your first three benchmarks:

Pick a function in your service layer. Anything pure and side-effect-free is easiest (a parser, a formatter).
Add func BenchmarkX(b *testing.B) { for i := 0; i < b.N; i++ { X() } } to the test file.
Run with go test -bench=. -benchmem -count=3 ./your/pkg/.
Make a 3-input table-driven version (sub-benchmarks).
Paste the output in notes.md. Circle the most interesting line — the one that's slower than you expected.

What's next

Next lesson — Reading the output. The numbers tell a story. After this lesson you'll be fluent enough to spot "there's a hidden allocation" or "this function is O(n²) and we didn't notice".
