AI (Claude + OpenAI)

Streaming chat, embeddings, and how the abstraction lets you swap providers.

8 min · medium

Last battery: AI. Grit ships a unified client for Claude (Anthropic) and OpenAI — same interface, swap with one env var. Streaming chat, embeddings, structured output, all behind a small Go service.

Why we need a wrapper

Provider switching — one SDK per provider gets you locked in. A common interface lets you A/B Claude vs. GPT vs. Llama-on-OpenRouter without rewriting features.
Streaming — both APIs use SSE differently. The wrapper normalises so your handler always emits the same shape to the frontend.
Cost tracking — every call logs tokens in, tokens out, cost (rate × tokens). Admin dashboard shows weekly spend.

Where it lives

apps/api/internal/ai/
├── ai.go            ← Service: Chat, ChatStream, Embed
├── claude.go        ← Anthropic implementation
├── openai.go        ← OpenAI implementation
└── usage.go         ← Token + cost logging

The unified interface

apps/api/internal/ai/ai.go

type Provider interface {
  Chat(ctx context.Context, req ChatRequest) (ChatResponse, error)
  ChatStream(ctx context.Context, req ChatRequest, send func(delta string) error) error
  Embed(ctx context.Context, text string) ([]float32, error)
}

type ChatRequest struct {
  Model       string         // e.g. "claude-opus-4-5", "gpt-4o"
  Messages    []ChatMessage
  Temperature float64
  MaxTokens   int
}

type ChatMessage struct {
  Role    string  // "system" | "user" | "assistant"
  Content string
}

type ChatResponse struct {
  Content    string
  TokensIn   int
  TokensOut  int
  ModelUsed  string
}

Three operations: blocking chat, streaming chat, embeddings. Same shape regardless of provider. Pick which provider via:

func New(cfg Config) Provider {
  switch cfg.Provider {
  case "claude":
    return NewClaude(cfg.AnthropicKey)
  case "openai":
    return NewOpenAI(cfg.OpenAIKey)
  default:
    return NewClaude(cfg.AnthropicKey)
  }
}

A streaming chat handler

apps/api/internal/handlers/ai_handler.go

func (h *AIHandler) Chat(c *gin.Context) {
  var in struct {
    Prompt string `json:"prompt" binding:"required"`
  }
  if err := c.ShouldBindJSON(&in); err != nil {
    c.JSON(400, gin.H{"error": err.Error()})
    return
  }

  // Server-Sent Events headers
  c.Header("Content-Type", "text/event-stream")
  c.Header("Cache-Control", "no-cache")
  c.Header("Connection", "keep-alive")
  c.Writer.Flush()

  err := h.ai.ChatStream(c.Request.Context(), ai.ChatRequest{
    Model: "claude-opus-4-5",
    Messages: []ai.ChatMessage{
      {Role: "system", Content: "You are a helpful assistant."},
      {Role: "user",   Content: in.Prompt},
    },
    MaxTokens: 1000,
  }, func(delta string) error {
    // Caveat: a delta that contains "\n" would break SSE framing as written.
    // Returning the write error stops the stream once the client has gone away.
    if _, err := fmt.Fprintf(c.Writer, "data: %s\n\n", delta); err != nil {
      return err
    }
    c.Writer.Flush()
    return nil
  })
  if err != nil {
    fmt.Fprintf(c.Writer, "event: error\ndata: %s\n\n", err.Error())
  } else {
    fmt.Fprintf(c.Writer, "event: done\ndata: \n\n")
  }
  c.Writer.Flush() // make sure the final event is sent before the handler returns
}

Notice: each chunk gets flushed immediately. The frontend reads the SSE stream and appends each delta to the UI as it arrives — that's the "typing" effect users expect from a modern AI app.
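
Model output often contains newlines, and in SSE a blank line ends an event, so a raw multi-line delta can be cut short on the wire. One way to guard against that is to give each line of the payload its own data: field. The sketch below assumes hypothetical helpers formatSSE and writeSSE; they are not part of Grit or Gin.

```go
package main

import (
	"io"
	"strings"
)

// formatSSE renders one Server-Sent Event. Every line of data gets its own
// "data: " field, so newlines inside a delta cannot terminate the event early.
func formatSSE(event, data string) string {
	var b strings.Builder
	if event != "" {
		b.WriteString("event: " + event + "\n")
	}
	// SSE treats CRLF and bare CR as line breaks too, so normalise them first.
	data = strings.NewReplacer("\r\n", "\n", "\r", "\n").Replace(data)
	for _, line := range strings.Split(data, "\n") {
		b.WriteString("data: " + line + "\n")
	}
	b.WriteString("\n") // the blank line marks the end of the event
	return b.String()
}

// writeSSE writes one event to w and reports any write error to the caller.
func writeSSE(w io.Writer, event, data string) error {
	_, err := io.WriteString(w, formatSSE(event, data))
	return err
}
```

A client that receives several data: lines in one event is expected to join them with "\n", which is what the browser's EventSource does. A hand-written reader would need to do the same join.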

The frontend consuming the stream

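async function streamChat(prompt: string, onDelta: (text: string) => void) {
  const res = await fetch('/api/ai/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  })
  if (!res.ok || !res.body) throw new Error(`chat request failed: ${res.status}`)
  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true })
    // A network chunk can end mid-event, so keep the unfinished tail for the next read
    const events = buffer.split('\n\n')
    buffer = events.pop() ?? ''
    for (const event of events) {
      if (event.startsWith('data: ')) onDelta(event.slice(6))
    }
  }
}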

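Browser-native streaming. No special SDK. Works on every modern browser. React Native is less certain: its built-in fetch has not always exposed a streaming response body, so check your version or use a streaming-capable fetch implementation there.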

Embeddings — for semantic search

// Convert text to a vector (1536 dimensions here; the length depends on the embedding model)
vec, err := h.ai.Embed(ctx, "How do I reset my password?")
if err != nil {
  return err
}
// vec is now a []float32

// Save in DB (pgvector or any vector store)
if err := db.Create(&Doc{Title: title, Body: body, Embedding: vec}).Error; err != nil {
  return err
}

// At query time:
queryVec, err := h.ai.Embed(ctx, userQuery)
if err != nil {
  return err
}
// db.Order("embedding <-> ?", queryVec).Limit(5).Find(&docs)  -- pgvector
// returns the 5 docs nearest to the query (<-> is pgvector's Euclidean distance)

Embeddings enable "find me docs that MEAN this", not just "match these words". Texts with similar meaning map to nearby vectors, so ranking by vector distance usually beats keyword search for question-answering, support bots, and recommendations.
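
"Nearby" needs a measure. A common one is cosine similarity, which compares the direction of two vectors and ignores their length. The function below is a plain-Go sketch of that formula for illustration; cosineSimilarity is a hypothetical helper, and in practice the vector store computes the distance for you.

```go
package main

import (
	"errors"
	"math"
)

// cosineSimilarity returns a value in [-1, 1]: 1 means the vectors point the
// same way, 0 means they are orthogonal, -1 means they point opposite ways.
func cosineSimilarity(a, b []float32) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and the same length")
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("an all-zero vector has no direction")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}
```

pgvector exposes the related cosine distance as the <=> operator, alongside <-> for Euclidean distance. Which one ranks better depends on the embedding model, so it is worth checking the model's documentation.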

Never send user input to AI without auth + rate limit. AI calls cost real money. An unauthenticated endpoint that prompts on user input is a $50,000-overnight bug. Require auth, throttle per user, set a hard MaxTokens cap.
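
A per-user throttle can be as simple as a fixed-window counter. The sketch below is an assumption about how one might look, not Grit's implementation: UserLimiter, NewUserLimiter, Allow, and AllowAt are hypothetical names, and the state lives in one process's memory.

```go
package main

import (
	"sync"
	"time"
)

// windowCount is how many calls a user has made in one time window.
type windowCount struct {
	window int64 // index of the window the count belongs to
	n      int
}

// UserLimiter allows each user at most limit calls per window.
type UserLimiter struct {
	mu        sync.Mutex
	limit     int
	windowSec int64
	counts    map[uint]windowCount
}

// NewUserLimiter builds a limiter; windows shorter than 1 second become 1 second.
func NewUserLimiter(limit int, windowSeconds int64) *UserLimiter {
	if windowSeconds < 1 {
		windowSeconds = 1
	}
	return &UserLimiter{limit: limit, windowSec: windowSeconds, counts: make(map[uint]windowCount)}
}

// AllowAt reports whether userID may make another call at the given Unix time.
// Taking the time as a parameter keeps the logic easy to test.
func (l *UserLimiter) AllowAt(userID uint, nowUnix int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	w := nowUnix / l.windowSec
	c := l.counts[userID]
	if c.window != w {
		c = windowCount{window: w} // a new window starts the count from zero
	}
	if c.n >= l.limit {
		return false
	}
	c.n++
	l.counts[userID] = c
	return true
}

// Allow is AllowAt using the current wall-clock time.
func (l *UserLimiter) Allow(userID uint) bool {
	return l.AllowAt(userID, time.Now().Unix())
}
```

A handler would call Allow after authentication and answer with HTTP 429 when it returns false. Two limits of this sketch: the map never evicts idle users, and with several API instances each one counts separately, so a shared store would be needed for a global limit.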

Cost tracking

apps/api/internal/ai/usage.go

type Usage struct {
  ID        uint
  UserID    uint
  Model     string
  TokensIn  int
  TokensOut int
  CostCents int     // computed: tokens * model rate
  CreatedAt time.Time
}

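// After every chat, log usage:
func (s *AIService) logUsage(ctx context.Context, userID uint, resp ChatResponse) {
  cost := costFor(resp.ModelUsed, resp.TokensIn, resp.TokensOut)
  err := s.db.WithContext(ctx).Create(&Usage{
    UserID: userID, Model: resp.ModelUsed,
    TokensIn: resp.TokensIn, TokensOut: resp.TokensOut,
    CostCents: cost,
  }).Error
  if err != nil {
    // A failed usage write should not fail the chat, but it should not vanish silently either.
    log.Printf("ai: failed to log usage for user %d: %v", userID, err)
  }
}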
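
The lesson does not show costFor, so here is one way it might look. Everything in it is an assumption: the rate struct, the rates table, the made-up model names, and above all the prices, which are placeholders and not any provider's real pricing. Integer arithmetic avoids floating-point drift in money values.

```go
package main

// rate holds prices in cents per one million tokens.
type rate struct {
	inPerMTok  int64
	outPerMTok int64
}

// rates uses PLACEHOLDER prices for made-up model names.
// Replace both with the figures from your provider's pricing page.
var rates = map[string]rate{
	"example-small": {inPerMTok: 100, outPerMTok: 500},
	"example-large": {inPerMTok: 1500, outPerMTok: 7500},
}

// costFor returns the cost of one call in whole cents, rounded up.
// An unknown model returns 0, so missing rates show up as free calls in reports.
func costFor(model string, tokensIn, tokensOut int) int {
	r, ok := rates[model]
	if !ok {
		return 0
	}
	// Cost in millionths of a cent: tokens × (cents per million tokens).
	micro := int64(tokensIn)*r.inPerMTok + int64(tokensOut)*r.outPerMTok
	return int((micro + 999_999) / 1_000_000) // ceiling division
}
```

With these placeholder rates, a call to "example-large" with 2,000 input and 1,000 output tokens costs 10.5 cents and is stored as 11. Rounding up per request overstates very small calls, so storing a finer unit than cents may be worth considering if most requests are tiny.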

Admin page /admin/system/ai shows:

Today / week / month spend by model.
Top users by token consumption.
Average cost per request.

When the bill arrives at end of month, you can correlate it to product usage. Without this, AI costs are a mystery.

How to modify this battery

Add a new provider (Llama via OpenRouter, Mistral, Gemini) — implement the Provider interface in a new file. Wire it in New(). Done.
Change the default model — edit the handler's Model: string. Or pull from config so it's env-driven.
Add prompt logging — for debugging or audit trail. Store the user's prompt + the assistant's reply in a table. Be aware of privacy implications; redact PII if your domain demands it.
Per-user budget — sum usage.cost_cents for the user this month. Reject if over.
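
The per-user budget idea can be sketched as follows. To stay self-contained it sums over an in-memory slice; in the app this would more likely be a single SUM query over the usage table. monthSpend, checkBudget, utcDate, and ErrBudgetExceeded are hypothetical names, the Usage struct is trimmed to the fields the check needs, and months are assumed to be calendar months in UTC.

```go
package main

import (
	"errors"
	"time"
)

// Usage is a trimmed copy of the lesson's struct with only the fields used here.
type Usage struct {
	UserID    uint
	CostCents int
	CreatedAt time.Time
}

// ErrBudgetExceeded is returned when a user has used up this month's budget.
var ErrBudgetExceeded = errors.New("ai: monthly budget exceeded")

// utcDate builds midnight UTC on the given day.
func utcDate(year int, month time.Month, day int) time.Time {
	return time.Date(year, month, day, 0, 0, 0, 0, time.UTC)
}

// monthSpend sums a user's cost over the calendar month (UTC) that contains now.
func monthSpend(rows []Usage, userID uint, now time.Time) int {
	n := now.UTC()
	start := utcDate(n.Year(), n.Month(), 1)
	end := start.AddDate(0, 1, 0) // first instant of the following month
	total := 0
	for _, u := range rows {
		if u.UserID == userID && !u.CreatedAt.Before(start) && u.CreatedAt.Before(end) {
			total += u.CostCents
		}
	}
	return total
}

// checkBudget rejects the next call once spend has reached the budget.
func checkBudget(rows []Usage, userID uint, budgetCents int, now time.Time) error {
	if monthSpend(rows, userID, now) >= budgetCents {
		return ErrBudgetExceeded
	}
	return nil
}
```

Running the check before the AI call, rather than after, means a user can overshoot by at most one request.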

Local dev — what you need

No mock; you need real keys for local dev.

AI_PROVIDER=claude         # or openai
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Use the cheapest model in dev (claude-haiku-4-5 or gpt-4o-mini) so accidental loops don't burn money. Switch to the big model only when shipping.
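
Reading those variables at startup might look like the sketch below. The Config field names match the ones New() uses in this lesson; configFrom and LoadConfig are hypothetical. One deliberate difference: where New() falls back to Claude for any unrecognised value, this sketch rejects unknown values and missing keys so that a typo fails at boot instead of at the first request.

```go
package main

import (
	"fmt"
	"os"
)

// Config carries the fields that New() reads in the lesson.
type Config struct {
	Provider     string
	AnthropicKey string
	OpenAIKey    string
}

// configFrom builds a Config from any lookup function, which keeps it testable.
func configFrom(getenv func(string) string) (Config, error) {
	cfg := Config{
		Provider:     getenv("AI_PROVIDER"),
		AnthropicKey: getenv("ANTHROPIC_API_KEY"),
		OpenAIKey:    getenv("OPENAI_API_KEY"),
	}
	if cfg.Provider == "" {
		cfg.Provider = "claude" // same default as New()
	}
	switch cfg.Provider {
	case "claude":
		if cfg.AnthropicKey == "" {
			return Config{}, fmt.Errorf("AI_PROVIDER=claude but ANTHROPIC_API_KEY is empty")
		}
	case "openai":
		if cfg.OpenAIKey == "" {
			return Config{}, fmt.Errorf("AI_PROVIDER=openai but OPENAI_API_KEY is empty")
		}
	default:
		return Config{}, fmt.Errorf("unknown AI_PROVIDER %q", cfg.Provider)
	}
	return cfg, nil
}

// LoadConfig reads the real process environment.
func LoadConfig() (Config, error) {
	return configFrom(os.Getenv)
}
```

Only the key for the selected provider is required, so a developer testing OpenAI does not need an Anthropic key set.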

Quick check

You want to A/B test Claude vs GPT for a summarisation feature. How does Grit's AI service help?

Try it

Build a real AI feature:

Add an endpoint: POST /api/notes/:id/summarise that loads a Note from the DB and asks the AI to summarise the body into 2 sentences.
Save the summary to a new field note.summary string.
Render the summary in the notes list.
Try with both providers. Set AI_PROVIDER=openai, restart, hit the endpoint again. Same code path, different model.
Confirm token + cost logging in the usage table.

You finished the Batteries chapter 🎉

Five batteries: Cache, Storage, Mail, Jobs, AI. You know what each does, where the code lives, how to call it from a service, and how to modify it. That's the entire surface of the Grit batteries-included offering.

What's next

Chapter 7 — Architecture Modes. With all the Grit fundamentals in place, the last orientation chapter: which architecture mode (kit) is right for which kind of product.
