AI (Claude + OpenAI)
Streaming chat, embeddings, and how the abstraction lets you swap providers.
Last battery: AI. Grit ships a unified client for Claude (Anthropic) and OpenAI: same interface, swap with one env var. Streaming chat, embeddings, structured output, all behind a small Go service.
Why we need a wrapper
- Provider switching: one SDK per provider gets you locked in. A common interface lets you A/B Claude vs. GPT vs. Llama-on-OpenRouter without rewriting features.
- Streaming: both APIs use SSE differently. The wrapper normalises so your handler always emits the same shape to the frontend.
- Cost tracking: every call logs tokens in, tokens out, cost (rate × tokens). Admin dashboard shows weekly spend.
Where it lives
apps/api/internal/ai/
├── ai.go        # Service: Chat, ChatStream, Embed
├── claude.go    # Anthropic implementation
├── openai.go    # OpenAI implementation
└── usage.go     # Token + cost logging
The unified interface
    type Provider interface {
        Chat(ctx context.Context, req ChatRequest) (ChatResponse, error)
        ChatStream(ctx context.Context, req ChatRequest, send func(delta string) error) error
        Embed(ctx context.Context, text string) ([]float32, error)
    }

    type ChatRequest struct {
        Model       string // e.g. "claude-opus-4-5", "gpt-4o"
        Messages    []ChatMessage
        Temperature float64
        MaxTokens   int
    }

    type ChatMessage struct {
        Role    string // "system" | "user" | "assistant"
        Content string
    }

    type ChatResponse struct {
        Content   string
        TokensIn  int
        TokensOut int
        ModelUsed string
    }
Three operations: blocking chat, streaming chat, embeddings. Same shape regardless of provider. Pick which provider via:
    func New(cfg Config) Provider {
        switch cfg.Provider {
        case "claude":
            return NewClaude(cfg.AnthropicKey)
        case "openai":
            return NewOpenAI(cfg.OpenAIKey)
        default:
            return NewClaude(cfg.AnthropicKey)
        }
    }
A streaming chat handler
    func (h *AIHandler) Chat(c *gin.Context) {
        var in struct {
            Prompt string `json:"prompt"`
        }
        if err := c.ShouldBindJSON(&in); err != nil {
            c.JSON(400, gin.H{"error": err.Error()})
            return
        }

        // Server-Sent Events headers
        c.Header("Content-Type", "text/event-stream")
        c.Header("Cache-Control", "no-cache")
        c.Header("Connection", "keep-alive")
        c.Writer.Flush()

        err := h.ai.ChatStream(c.Request.Context(), ai.ChatRequest{
            Model: "claude-opus-4-5",
            Messages: []ai.ChatMessage{
                {Role: "system", Content: "You are a helpful assistant."},
                {Role: "user", Content: in.Prompt},
            },
            MaxTokens: 1000,
        }, func(delta string) error {
            // Caveat: a delta that itself contains "\n\n" would end the event early.
            if _, err := fmt.Fprintf(c.Writer, "data: %s\n\n", delta); err != nil {
                return err // the client went away; stop streaming
            }
            c.Writer.Flush() // push each chunk to the client immediately
            return nil
        })
        if err != nil {
            fmt.Fprintf(c.Writer, "event: error\ndata: %s\n\n", err.Error())
        } else {
            fmt.Fprintf(c.Writer, "event: done\ndata: \n\n")
        }
    }
Notice: each chunk gets flushed immediately. The frontend reads the SSE stream and appends each delta to the UI as it arrives; that's the "typing" effect users expect from a modern AI app.
The frontend consuming the stream
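    async function streamChat(prompt: string, onDelta: (text: string) => void) {
      const res = await fetch('/api/ai/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
      })
      if (!res.ok || !res.body) throw new Error(`chat request failed: ${res.status}`)

      const reader = res.body.getReader()
      const decoder = new TextDecoder()
      let buffer = ''
      while (true) {
        const { value, done } = await reader.read()
        if (done) break
        // stream: true keeps multi-byte characters intact across chunk boundaries
        buffer += decoder.decode(value, { stream: true })
        // A network chunk can end mid-event, so only handle complete events
        const events = buffer.split('\n\n')
        buffer = events.pop() ?? ''
        for (const event of events) {
          if (event.startsWith('data: ')) onDelta(event.slice(6))
        }
      }
    }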
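Browser-native streaming. No special SDK. Works on every modern browser. The same fetch + reader pattern applies in React Native wherever fetch exposes a streaming response body, which may require a polyfill depending on your setup.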
Embeddings for semantic search
    // Convert text to a 1536-dim vector
    vec, err := h.ai.Embed(ctx, "How do I reset my password?")
    // vec is now []float32 of length 1536

    // Save in DB (pgvector or any vector store)
    db.Create(&Doc{Title: title, Body: body, Embedding: vec})

    // At query time:
    queryVec, _ := h.ai.Embed(ctx, userQuery)
    // db.Order("embedding <-> ?", queryVec).Limit(5).Find(&docs) -- pgvector
    // returns top 5 docs by semantic similarity
Embeddings enable "find me docs that MEAN this", not just "match these words". Because the dense vector captures meaning rather than exact wording, it usually works far better than keyword search for question-answering, support bots, and recommendations.
Cost tracking
    type Usage struct {
        ID        uint
        UserID    uint
        Model     string
        TokensIn  int
        TokensOut int
        CostCents int // computed: tokens * model rate
        CreatedAt time.Time
    }

    // After every chat, log usage:
    func (s *AIService) logUsage(ctx context.Context, userID uint, resp ChatResponse) {
        cost := costFor(resp.ModelUsed, resp.TokensIn, resp.TokensOut)
        s.db.WithContext(ctx).Create(&Usage{
            UserID: userID, Model: resp.ModelUsed,
            TokensIn: resp.TokensIn, TokensOut: resp.TokensOut,
            CostCents: cost,
        })
    }
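The lesson calls costFor without showing it. One way it might look is sketched below, assuming prices are stored in cents per million tokens. The rate type, the rates table, the model names, and every number in it are placeholders invented for illustration, so substitute your providers' real price list.

```go
package main

// rate holds prices in cents per one million tokens.
type rate struct {
	InPerMTok  int
	OutPerMTok int
}

// rates is a placeholder table: the model names and numbers are made up
// for illustration and must be replaced with real provider prices.
var rates = map[string]rate{
	"example-small": {InPerMTok: 100, OutPerMTok: 500},
	"example-large": {InPerMTok: 1500, OutPerMTok: 7500},
}

// costFor returns the cost of one call in cents, rounded up so that a
// small non-zero call is never recorded as free. Unknown models cost 0.
func costFor(model string, tokensIn, tokensOut int) int {
	r, ok := rates[model]
	if !ok {
		return 0
	}
	// tokens * (cents per million tokens) gives millionths of a cent.
	microCents := tokensIn*r.InPerMTok + tokensOut*r.OutPerMTok
	return (microCents + 999_999) / 1_000_000
}
```

Rounding each call up to a whole cent overstates spend when calls are small and frequent. If that matters for your reports, consider storing a finer unit than cents.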
Admin page /admin/system/ai shows:
- Today / week / month spend by model.
- Top users by token consumption.
- Average cost per request.
When the bill arrives at end of month, you can correlate it to product usage. Without this, AI costs are a mystery.
How to modify this battery
- Add a new provider (Llama via OpenRouter, Mistral, Gemini): implement the Provider interface in a new file. Wire it in New(). Done.
- Change the default model: edit the handler's Model: string. Or pull from config so it's env-driven.
- Add prompt logging: for debugging or audit trail. Store the user's prompt + the assistant's reply in a table. Be aware of privacy implications; redact PII if your domain demands it.
- Per-user budget: sum usage.cost_cents for the user this month. Reject if over.
Local dev: what you need
No mock; you need real keys for local dev.
AI_PROVIDER=claude # or openai
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
Use the cheapest model in dev (claude-haiku-4-5 or gpt-4o-mini) so accidental loops don't burn money. Switch to the big model only when shipping.
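Since local dev depends on real keys, it can help to fail fast at startup when the key for the selected provider is missing. The sketch below assumes a Config struct with the three fields that New reads. loadConfig and loadConfigFromEnv are hypothetical names, not Grit's actual loader.

```go
package main

import (
	"errors"
	"os"
)

// Config mirrors the fields used by New in the lesson.
type Config struct {
	Provider     string
	AnthropicKey string
	OpenAIKey    string
}

// loadConfig reads the three variables through getenv and returns an
// error when the provider that New would select has no API key.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		Provider:     getenv("AI_PROVIDER"),
		AnthropicKey: getenv("ANTHROPIC_API_KEY"),
		OpenAIKey:    getenv("OPENAI_API_KEY"),
	}
	switch cfg.Provider {
	case "openai":
		if cfg.OpenAIKey == "" {
			return cfg, errors.New("OPENAI_API_KEY is not set")
		}
	default: // New falls back to Claude for any other value
		if cfg.AnthropicKey == "" {
			return cfg, errors.New("ANTHROPIC_API_KEY is not set")
		}
	}
	return cfg, nil
}

// loadConfigFromEnv reads from the real process environment.
func loadConfigFromEnv() (Config, error) {
	return loadConfig(os.Getenv)
}
```

Taking getenv as a parameter keeps the function easy to test with a plain map instead of mutating the process environment.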
Try it
Build a real AI feature:
- Add an endpoint: POST /api/notes/:id/summarise that loads a Note from the DB and asks the AI to summarise the body into 2 sentences.
- Save the summary to a new field note.summary string.
- Render the summary in the notes list.
- Try with both providers. Set AI_PROVIDER=openai, restart, hit the endpoint again. Same code path, different model.
- Confirm token + cost logging in the usage table.
You finished the Batteries chapter
Five batteries: Cache, Storage, Mail, Jobs, AI. You know what each does, where the code lives, how to call it from a service, and how to modify it. That's the entire surface of the Grit batteries-included offering.
What's next
Chapter 7: Architecture Modes. With all the Grit fundamentals in place, the last orientation chapter: which architecture mode (kit) is right for which kind of product.