AI Gateway

Streaming + 100+ models.

8 min · medium

Most apps that wrap AI need three things: streaming responses, multi-provider support, and a single API key to manage. Grit's AI Gateway integration provides all three through Vercel AI Gateway, which fronts Claude, GPT-4, Mistral, Llama, and roughly 98 other models.

The Gateway shape

Your API → AI Gateway (Vercel) → Claude / OpenAI / Anthropic / OpenRouter / …

  • One API key.
  • One bill; swap models without code changes.
  • Streaming over SSE.

Setup

.env
AI_GATEWAY_API_KEY=vck_... # from vercel.com/ai-gateway
AI_GATEWAY_MODEL=anthropic/claude-sonnet-4-6 # provider/model format
AI_GATEWAY_URL=https://ai-gateway.vercel.sh/v1

Swapping from Claude to GPT-4 takes one env var change: update AI_GATEWAY_MODEL. No code changes are needed.
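To make the "config, not code" idea concrete, here is a minimal sketch of reading those three variables at startup. The `GatewayConfig` type and `loadGatewayConfig` function are hypothetical names for illustration, not Grit's actual loader, and the fallback values are simply copied from the .env example above.

```go
package main

import (
	"fmt"
	"os"
)

// GatewayConfig mirrors the three variables from the .env example.
// Hypothetical type for illustration; not part of Grit.
type GatewayConfig struct {
	APIKey string
	Model  string
	URL    string
}

// loadGatewayConfig reads settings through getenv so it can be exercised
// without touching the real environment. The fallbacks are assumptions
// taken from the .env example, not documented defaults.
func loadGatewayConfig(getenv func(string) string) (GatewayConfig, error) {
	cfg := GatewayConfig{
		APIKey: getenv("AI_GATEWAY_API_KEY"),
		Model:  getenv("AI_GATEWAY_MODEL"),
		URL:    getenv("AI_GATEWAY_URL"),
	}
	if cfg.APIKey == "" {
		return cfg, fmt.Errorf("AI_GATEWAY_API_KEY is not set")
	}
	if cfg.Model == "" {
		cfg.Model = "anthropic/claude-sonnet-4-6"
	}
	if cfg.URL == "" {
		cfg.URL = "https://ai-gateway.vercel.sh/v1"
	}
	return cfg, nil
}

func main() {
	cfg, err := loadGatewayConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("using model:", cfg.Model)
}
```

Because the model name is only ever read from the environment, switching providers is a deploy-time change rather than a code change.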

Non-streaming call

apps/api/internal/handlers/ai.go (excerpt)
func (h *AIHandler) Summarize(c *gin.Context) {
	var in struct {
		Text string `json:"text"`
	}
	// Reject malformed JSON instead of sending an empty prompt to the model.
	if err := c.ShouldBindJSON(&in); err != nil {
		respond.Error(c, 400, "INVALID_INPUT", err)
		return
	}
	out, err := h.ai.Chat(c.Request.Context(), []ai.Message{
		{Role: "system", Content: "Summarize the input in 2 sentences."},
		{Role: "user", Content: in.Text},
	})
	if err != nil {
		respond.Error(c, 500, "AI_ERROR", err)
		return
	}
	respond.OK(c, gin.H{"summary": out.Content}, "")
}
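For a sense of what such a call might look like on the wire, here is a sketch that builds (but does not send) the HTTP request. It assumes the gateway accepts an OpenAI-compatible chat completions payload at `/chat/completions` under AI_GATEWAY_URL; the lesson does not confirm what `h.ai.Chat` sends internally, and `buildChatBody` and `newChatRequest` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Message and chatRequest follow the OpenAI-compatible chat format.
// Assumption: the gateway accepts this shape; these are not Grit types.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Stream   bool      `json:"stream"`
}

// buildChatBody encodes the JSON payload for one chat call.
func buildChatBody(model string, msgs []Message, stream bool) ([]byte, error) {
	return json.Marshal(chatRequest{Model: model, Messages: msgs, Stream: stream})
}

// newChatRequest prepares the POST request with bearer authentication.
// The "/chat/completions" path is an assumption based on the /v1 base URL.
func newChatRequest(baseURL, apiKey string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	body, err := buildChatBody("anthropic/claude-sonnet-4-6", []Message{
		{Role: "system", Content: "Summarize the input in 2 sentences."},
		{Role: "user", Content: "Text to summarize goes here."},
	}, false)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	req, err := newChatRequest("https://ai-gateway.vercel.sh/v1", "vck_placeholder", body)
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	// Print the request instead of sending it, so the sketch runs offline.
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(string(body))
}
```

Only the `model` string differs between providers, which is why the handler above never needs to know which vendor answers.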

Streaming (SSE) for chat UIs

Streaming cuts perceived latency: the user sees tokens as they arrive instead of waiting for the full response.

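func (h *AIHandler) Stream(c *gin.Context) {
	// Assumption: the prompt arrives as a query parameter, because EventSource can only issue GET requests.
	prompt := c.Query("prompt")

	c.Header("Content-Type", "text/event-stream")
	c.Header("Cache-Control", "no-cache")
	c.Writer.Flush() // send the headers before the first token

	err := h.ai.ChatStream(c.Request.Context(),
		[]ai.Message{{Role: "user", Content: prompt}},
		func(token string) error {
			// A newline inside a token would end the SSE event early, so continue it on a new data: line.
			token = strings.ReplaceAll(token, "\n", "\ndata: ")
			if _, err := fmt.Fprintf(c.Writer, "data: %s\n\n", token); err != nil {
				return err // most likely the client disconnected
			}
			c.Writer.Flush()
			return nil
		},
	)
	if err != nil {
		// The headers are already sent, so the error can only be logged.
		log.Printf("stream: %v", err)
	}
}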

Each token goes out as an SSE event. The frontend opens an EventSource and renders tokens as they arrive, the same streaming pattern that ChatGPT and Claude.ai present to users.
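To show what a consumer of that stream has to do, here is a small sketch of an SSE reader in Go, useful for a CLI client or an integration test. `readSSE` and `collectSSE` are hypothetical helpers, not part of Grit; they handle only `data:` fields and ignore `event:`, `id:`, and `retry:`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readSSE reads a text/event-stream body and calls onData once per event.
// Multi-line data fields are joined with "\n", as the SSE format specifies.
// An event that is not terminated by a blank line is discarded.
func readSSE(r io.Reader, onData func(data string) error) error {
	scanner := bufio.NewScanner(r)
	var lines []string
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case line == "":
			// A blank line ends the current event.
			if len(lines) > 0 {
				data := strings.Join(lines, "\n")
				lines = lines[:0]
				if err := onData(data); err != nil {
					return err
				}
			}
		case strings.HasPrefix(line, "data:"):
			// Strip the field name and at most one space after the colon.
			value := strings.TrimPrefix(line, "data:")
			value = strings.TrimPrefix(value, " ")
			lines = append(lines, value)
		}
	}
	return scanner.Err()
}

// collectSSE is a convenience wrapper that gathers every event from a string.
func collectSSE(raw string) ([]string, error) {
	var events []string
	err := readSSE(strings.NewReader(raw), func(data string) error {
		events = append(events, data)
		return nil
	})
	return events, err
}

func main() {
	// Stand-in for an HTTP response body from the Stream handler.
	stream := "data: Hello\n\ndata: , world\n\n"
	tokens, err := collectSSE(stream)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(strings.Join(tokens, ""))
}
```

In a real client, the `io.Reader` would be the response body of a GET request to the streaming endpoint, and `onData` would append each token to the UI.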

Cost + rate-limiting per user

AI calls cost real money. Two patterns to control spend:

  • Per-user rate limits in Sentinel (covered in the next chapter): "max 100 AI calls per user per day".
  • Token budgets: track input + output tokens per user in an ai_usage table, and cap usage based on subscription tier.

Never expose your AI_GATEWAY_API_KEY to the frontend. All AI calls go through your API. If you let the frontend talk to the gateway directly, the key ends up in the JS source and anyone can run up your bill.
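The token-budget pattern can be sketched as follows. This is an in-memory stand-in for the ai_usage table, under stated assumptions: the tier names and caps are placeholder numbers, and `UsageTracker` is a hypothetical type, not a Grit API. A production version would persist counts and reset them per billing period.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrBudgetExceeded is returned when a user has used up their tier's cap.
var ErrBudgetExceeded = errors.New("token budget exceeded")

// tierCaps holds placeholder limits per subscription tier (assumed values).
var tierCaps = map[string]int{"free": 10000, "pro": 500000}

// UsageTracker is an in-memory stand-in for the ai_usage table.
type UsageTracker struct {
	mu   sync.Mutex
	used map[string]int // userID -> tokens used in the current period
}

func NewUsageTracker() *UsageTracker {
	return &UsageTracker{used: make(map[string]int)}
}

// Allow reports whether the user may make another AI call on this tier.
// Call it before sending the request to the gateway.
func (t *UsageTracker) Allow(userID, tier string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	limit, ok := tierCaps[tier]
	if !ok {
		return fmt.Errorf("unknown tier %q", tier)
	}
	if t.used[userID] >= limit {
		return ErrBudgetExceeded
	}
	return nil
}

// Record adds the input and output tokens of a finished call and
// returns the user's running total.
func (t *UsageTracker) Record(userID string, inputTokens, outputTokens int) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.used[userID] += inputTokens + outputTokens
	return t.used[userID]
}

func main() {
	tracker := NewUsageTracker()
	if err := tracker.Allow("user-1", "free"); err != nil {
		fmt.Println("blocked:", err)
		return
	}
	total := tracker.Record("user-1", 1200, 300)
	fmt.Println("tokens used so far:", total)
}
```

Checking before the call and recording after it means one request can overshoot the cap slightly; that trade-off keeps the check cheap, since output token counts are only known once the response finishes.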

Function calling / tools

For agentic workflows such as "Claude, look up this user's recent orders", Grit's AI module supports tool definitions:

tools := []ai.Tool{{
	Name:        "get_user_orders",
	Description: "Returns the user's last 10 orders",
	Parameters:  ...,
	Handler: func(args json.RawMessage) (any, error) {
		return ordersService.RecentForUser(userID, 10)
	},
}}
out, err := h.ai.Chat(ctx, messages, ai.WithTools(tools))

The model decides whether to call the tool; Grit dispatches the handler and feeds the result back to the model. Multi-turn tool use repeats that loop without extra code on your side.
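The dispatch step described above can be illustrated with a short sketch. This is not Grit's implementation; `Tool`, `ToolCall`, `dispatch`, and `demoTools` are hypothetical, simplified stand-ins that show the general idea of matching the model's requested tool name to a handler and encoding the result for the next turn.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Tool is a simplified stand-in for a tool definition (hypothetical type).
type Tool struct {
	Name    string
	Handler func(args json.RawMessage) (any, error)
}

// ToolCall is what the model emits when it decides to use a tool.
type ToolCall struct {
	Name string
	Args json.RawMessage
}

// dispatch finds the requested tool, runs its handler, and returns the
// JSON-encoded result that would be sent back to the model.
func dispatch(tools []Tool, call ToolCall) (string, error) {
	for _, tool := range tools {
		if tool.Name != call.Name {
			continue
		}
		result, err := tool.Handler(call.Args)
		if err != nil {
			return "", fmt.Errorf("tool %s: %w", call.Name, err)
		}
		encoded, err := json.Marshal(result)
		if err != nil {
			return "", fmt.Errorf("encode result of %s: %w", call.Name, err)
		}
		return string(encoded), nil
	}
	return "", fmt.Errorf("unknown tool %q", call.Name)
}

// demoTools returns one tool backed by stand-in data instead of a database.
func demoTools() []Tool {
	return []Tool{{
		Name: "get_user_orders",
		Handler: func(args json.RawMessage) (any, error) {
			var in struct {
				Limit int `json:"limit"`
			}
			if err := json.Unmarshal(args, &in); err != nil {
				return nil, err
			}
			orders := []string{"order-3", "order-2", "order-1"}
			if in.Limit > 0 && in.Limit < len(orders) {
				orders = orders[:in.Limit]
			}
			return orders, nil
		},
	}}
}

func main() {
	call := ToolCall{Name: "get_user_orders", Args: json.RawMessage(`{"limit":2}`)}
	out, err := dispatch(demoTools(), call)
	if err != nil {
		fmt.Println("dispatch error:", err)
		return
	}
	fmt.Println(out)
}
```

Rejecting unknown tool names matters: the tool name comes from model output, so it should be treated as untrusted input rather than used to look up arbitrary functions.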

Quick check

A user complains that AI responses 'take 10 seconds'. They actually take 10 seconds total, but the user expects sub-second feedback. What's the fix?

Try it

Call the AI Gateway directly from your bench-api:

  1. Get a Vercel AI Gateway key (free tier works) at vercel.com/ai-gateway.
  2. Set it in your .env.
  3. Add a debug handler that calls h.ai.Chat with the prompt "Write me a one-line haiku about Go programming."
  4. Hit the handler and paste the result in notes.md.

What's next

Chapter 5: Security + Observability. Sentinel WAF, Pulse dashboards, the tamper-evident audit log. The trio that keeps production alive.
