AI (Claude + OpenAI)

Streaming chat, embeddings, and how the abstraction lets you swap providers.

8 min · medium

Last battery: AI. Grit ships a unified client for Claude (Anthropic) and OpenAI — same interface, swap with one env var. Streaming chat, embeddings, structured output, all behind a small Go service.

Why we need a wrapper

  • Provider switching: coding directly against one provider's SDK locks you in. A common interface lets you A/B Claude vs. GPT vs. Llama-on-OpenRouter without rewriting features.
  • Streaming: the two APIs structure their SSE events differently. The wrapper normalises them so your handler always emits the same shape to the frontend.
  • Cost tracking: every call logs tokens in, tokens out, and cost (rate × tokens). The admin dashboard shows weekly spend.

Where it lives

apps/api/internal/ai/
├── ai.go            ← Service: Chat, ChatStream, Embed
├── claude.go        ← Anthropic implementation
├── openai.go        ← OpenAI implementation
└── usage.go         ← Token + cost logging

The unified interface

apps/api/internal/ai/ai.go
type Provider interface {
	Chat(ctx context.Context, req ChatRequest) (ChatResponse, error)
	ChatStream(ctx context.Context, req ChatRequest, send func(delta string) error) error
	Embed(ctx context.Context, text string) ([]float32, error)
}

type ChatRequest struct {
	Model       string // e.g. "claude-opus-4-5", "gpt-4o"
	Messages    []ChatMessage
	Temperature float64
	MaxTokens   int
}

type ChatMessage struct {
	Role    string // "system" | "user" | "assistant"
	Content string
}

type ChatResponse struct {
	Content   string
	TokensIn  int
	TokensOut int
	ModelUsed string
}

Three operations: blocking chat, streaming chat, embeddings. Same shape regardless of provider. Pick which provider via:

func New(cfg Config) Provider {
	switch cfg.Provider {
	case "claude":
		return NewClaude(cfg.AnthropicKey)
	case "openai":
		return NewOpenAI(cfg.OpenAIKey)
	default:
		// Unknown or empty values fall back to Claude.
		return NewClaude(cfg.AnthropicKey)
	}
}

A streaming chat handler

apps/api/internal/handlers/ai_handler.go
func (h *AIHandler) Chat(c *gin.Context) {
	var in struct{ Prompt string }
	if err := c.ShouldBindJSON(&in); err != nil {
		c.JSON(400, gin.H{"error": err.Error()})
		return
	}

	// Server-Sent Events headers
	c.Header("Content-Type", "text/event-stream")
	c.Header("Cache-Control", "no-cache")
	c.Header("Connection", "keep-alive")
	c.Writer.Flush()

	err := h.ai.ChatStream(c.Request.Context(), ai.ChatRequest{
		Model: "claude-opus-4-5",
		Messages: []ai.ChatMessage{
			{Role: "system", Content: "You are a helpful assistant."},
			{Role: "user", Content: in.Prompt},
		},
		MaxTokens: 1000,
	}, func(delta string) error {
		// Returning the write error stops the stream when the client disconnects.
		if _, err := fmt.Fprintf(c.Writer, "data: %s\n\n", delta); err != nil {
			return err
		}
		c.Writer.Flush()
		return nil
	})
	if err != nil {
		fmt.Fprintf(c.Writer, "event: error\ndata: %s\n\n", err.Error())
	} else {
		fmt.Fprintf(c.Writer, "event: done\ndata: \n\n")
	}
	c.Writer.Flush() // make sure the final event reaches the client
}

Notice: each chunk gets flushed immediately. The frontend reads the SSE stream and appends each delta to the UI as it arrives — that's the "typing" effect users expect from a modern AI app.

The frontend consuming the stream

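async function streamChat(prompt: string, onDelta: (text: string) => void) {
  const res = await fetch('/api/ai/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  })
  if (!res.ok || !res.body) throw new Error(`chat request failed: ${res.status}`)

  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true })
    // A network chunk can end mid-event, so only process complete events
    const events = buffer.split('\n\n')
    buffer = events.pop() ?? ''
    for (const event of events) {
      if (event.startsWith('data: ')) onDelta(event.slice(6))
    }
  }
}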

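Browser-native streaming. No special SDK. Works in every modern browser. In React Native the same fetch + reader pattern applies, but the built-in fetch may not expose a streaming response body, so you may need a streaming-capable fetch implementation or polyfill there.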

Embeddings — for semantic search

// Convert text to a vector (length depends on the model; 1536 for OpenAI's text-embedding-3-small)
vec, err := h.ai.Embed(ctx, "How do I reset my password?")
if err != nil {
	return err
}
// Save in DB (pgvector or any vector store)
if err := db.Create(&Doc{Title: title, Body: body, Embedding: vec}).Error; err != nil {
	return err
}
// At query time:
queryVec, err := h.ai.Embed(ctx, userQuery)
if err != nil {
	return err
}
// db.Order("embedding <-> ?", queryVec).Limit(5).Find(&docs) -- pgvector
// returns the 5 nearest docs by vector distance

Embeddings enable "find me docs that MEAN this", not just "match these words". Texts with similar meaning map to nearby vectors, so a query can match a document that shares none of its keywords. That usually beats keyword search for question answering, support bots, and recommendations.

Never send user input to AI without auth + rate limit. AI calls cost real money. An unauthenticated endpoint that prompts on user input is a $50,000-overnight bug. Require auth, throttle per user, set a hard MaxTokens cap.

Cost tracking

apps/api/internal/ai/usage.go
type Usage struct {
	ID        uint
	UserID    uint
	Model     string
	TokensIn  int
	TokensOut int
	CostCents int // computed: tokens in/out × the model's rates
	CreatedAt time.Time
}

// After every chat, log usage:
func (s *AIService) logUsage(ctx context.Context, userID uint, resp ChatResponse) {
	cost := costFor(resp.ModelUsed, resp.TokensIn, resp.TokensOut)
	err := s.db.WithContext(ctx).Create(&Usage{
		UserID: userID, Model: resp.ModelUsed,
		TokensIn: resp.TokensIn, TokensOut: resp.TokensOut,
		CostCents: cost,
	}).Error
	if err != nil {
		// A failed usage write should not fail the chat itself, but it should be visible.
		log.Printf("ai: failed to log usage: %v", err)
	}
}
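The lesson does not show costFor. One way it could look is sketched below, assuming prices are stored in cents per million tokens. The model names and rates are placeholders invented for this example, not real provider prices.

```go
package main

import "fmt"

// rate holds a price in cents per one million tokens.
// The numbers below are placeholders, not real provider prices.
type rate struct {
	inPerMTok  int
	outPerMTok int
}

var rates = map[string]rate{
	"example-small": {inPerMTok: 100, outPerMTok: 500},
	"example-large": {inPerMTok: 1500, outPerMTok: 7500},
}

// costFor returns the cost of one call in cents, rounded up to a whole cent.
// Unknown models return 0 so that logging never fails on a missing rate.
func costFor(model string, tokensIn, tokensOut int) int {
	r, ok := rates[model]
	if !ok {
		return 0
	}
	const perMTok = 1_000_000
	total := tokensIn*r.inPerMTok + tokensOut*r.outPerMTok // cents × 1e6
	return (total + perMTok - 1) / perMTok                 // ceiling division
}

func main() {
	// 2000 in × 1500 + 1000 out × 7500 = 10,500,000, which rounds up to 11 cents.
	fmt.Println(costFor("example-large", 2000, 1000)) // 11
}
```

Rounding every call up to a whole cent overstates the cost of small calls. If that matters for your reporting, storing a finer unit than cents is one option.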

Admin page /admin/system/ai shows:

  • Today / week / month spend by model.
  • Top users by token consumption.
  • Average cost per request.

When the bill arrives at end of month, you can correlate it to product usage. Without this, AI costs are a mystery.

How to modify this battery

  • Add a new provider (Llama via OpenRouter, Mistral, Gemini) — implement the Provider interface in a new file. Wire it in New(). Done.
  • Change the default model — edit the handler's Model: string. Or pull from config so it's env-driven.
  • Add prompt logging — for debugging or audit trail. Store the user's prompt + the assistant's reply in a table. Be aware of privacy implications; redact PII if your domain demands it.
  • Per-user budget — sum usage.cost_cents for the user this month. Reject if over.

Local dev — what you need

No mock; you need real keys for local dev.

AI_PROVIDER=claude # or openai
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Use the cheapest model in dev (claude-haiku-4-5 or gpt-4o-mini) so accidental loops don't burn money. Switch to the big model only when shipping.
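How these variables reach New() is not shown in the lesson. The sketch below is one possible loader, assuming a Config struct with the three fields that New() reads. loadConfig is a hypothetical name, and failing fast on a missing key is a design choice of this example, not documented Grit behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config carries the fields that the lesson's New() reads.
type Config struct {
	Provider     string
	AnthropicKey string
	OpenAIKey    string
}

// loadConfig is a hypothetical loader. It takes a getenv function so tests
// can supply values without touching the real environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		Provider:     getenv("AI_PROVIDER"),
		AnthropicKey: getenv("ANTHROPIC_API_KEY"),
		OpenAIKey:    getenv("OPENAI_API_KEY"),
	}
	if cfg.Provider == "" {
		cfg.Provider = "claude" // mirrors the default branch of New()
	}
	switch cfg.Provider {
	case "claude":
		if cfg.AnthropicKey == "" {
			return cfg, errors.New("ANTHROPIC_API_KEY is required when AI_PROVIDER=claude")
		}
	case "openai":
		if cfg.OpenAIKey == "" {
			return cfg, errors.New("OPENAI_API_KEY is required when AI_PROVIDER=openai")
		}
	default:
		return cfg, fmt.Errorf("unknown AI_PROVIDER %q", cfg.Provider)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("using provider:", cfg.Provider)
}
```

Rejecting an unknown AI_PROVIDER at startup surfaces a typo immediately, instead of silently falling back to the default provider.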

Quick check

You want to A/B test Claude vs GPT for a summarisation feature. How does Grit's AI service help?

Try it

Build a real AI feature:

  1. Add an endpoint: POST /api/notes/:id/summarise that loads a Note from the DB and asks the AI to summarise the body into 2 sentences.
  2. Save the summary to a new field note.summary string.
  3. Render the summary in the notes list.
  4. Try with both providers. Set AI_PROVIDER=openai, restart, hit the endpoint again. Same code path, different model.
  5. Confirm token + cost logging in the usage table.
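For step 1, the request passed to the AI service might be built as in the sketch below. It is only a starting point: summaryRequest, the prompt wording, and the 200-token cap are assumptions made for this example, and the types are repeated from the lesson so the file compiles on its own.

```go
package main

import "fmt"

// Types repeated from the lesson so this file is self-contained.
type ChatMessage struct{ Role, Content string }

type ChatRequest struct {
	Model     string
	Messages  []ChatMessage
	MaxTokens int
}

// summaryRequest is a hypothetical helper. It builds a two-sentence summary
// request and truncates very long notes so input cost stays bounded.
func summaryRequest(model, body string, maxInputRunes int) ChatRequest {
	r := []rune(body) // truncate by runes so multi-byte characters are not split
	if maxInputRunes > 0 && len(r) > maxInputRunes {
		r = r[:maxInputRunes]
	}
	return ChatRequest{
		Model: model,
		Messages: []ChatMessage{
			{Role: "system", Content: "Summarise the user's note in exactly two sentences."},
			{Role: "user", Content: string(r)},
		},
		MaxTokens: 200, // hard cap on output length; the value is an assumption
	}
}

func main() {
	req := summaryRequest("claude-haiku-4-5", "Met with the design team about the new onboarding flow.", 4000)
	fmt.Println(len(req.Messages), req.MaxTokens) // 2 200
}
```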

You finished the Batteries chapter 🎉

Five batteries: Cache, Storage, Mail, Jobs, AI. You know what each does, where the code lives, how to call it from a service, and how to modify it. That's the entire surface of the Grit batteries-included offering.

What's next

Chapter 7 — Architecture Modes. With all the Grit fundamentals in place, the last orientation chapter: which architecture mode (kit) is right for which kind of product.
