AI Gateway
Streaming + 100+ models.
Most apps that wrap AI need three things: streaming responses, multi-provider support, and a single API key to manage. Grit's AI Gateway integration gives you all three via Vercel AI Gateway: Claude, GPT-4, Mistral, Llama, and roughly 98 others.
The Gateway shape
Your API → AI Gateway (Vercel) → Claude / OpenAI / Anthropic / OpenRouter / …
- one API key
- you pay one bill, swap models without code changes
- streaming SSE
Setup
    AI_GATEWAY_API_KEY=vck_...                      # from vercel.com/ai-gateway
    AI_GATEWAY_MODEL=anthropic/claude-sonnet-4-6    # provider/model format
    AI_GATEWAY_URL=https://ai-gateway.vercel.sh/v1
Swapping from Claude to GPT-4 takes one env var change: set AI_GATEWAY_MODEL to the new model. No code change is needed.
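A minimal sketch of how these three variables might be read and validated at startup. The `GatewayConfig` type and `loadGatewayConfig` function are hypothetical names used for illustration, not part of Grit's API. The lookup function is injected so the logic can be exercised without touching the real process environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// GatewayConfig is a hypothetical holder for the three settings above.
type GatewayConfig struct {
	APIKey string
	Model  string
	URL    string
}

// loadGatewayConfig reads the settings through getenv
// (pass os.Getenv in production, a map lookup in tests).
func loadGatewayConfig(getenv func(string) string) (GatewayConfig, error) {
	cfg := GatewayConfig{
		APIKey: getenv("AI_GATEWAY_API_KEY"),
		Model:  getenv("AI_GATEWAY_MODEL"),
		// Trim a trailing slash so paths can be appended safely later.
		URL: strings.TrimRight(getenv("AI_GATEWAY_URL"), "/"),
	}
	if cfg.APIKey == "" {
		return cfg, errors.New("AI_GATEWAY_API_KEY is not set")
	}
	// Models use the provider/model format, e.g. anthropic/claude-sonnet-4-6.
	provider, model, ok := strings.Cut(cfg.Model, "/")
	if !ok || provider == "" || model == "" {
		return cfg, fmt.Errorf("AI_GATEWAY_MODEL %q is not in provider/model format", cfg.Model)
	}
	if cfg.URL == "" {
		return cfg, errors.New("AI_GATEWAY_URL is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadGatewayConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("using model", cfg.Model, "via", cfg.URL)
}
```

Failing fast on a malformed model name at startup tends to be easier to debug than a rejected request later.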
Non-streaming call
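    func (h *AIHandler) Summarize(c *gin.Context) {
    	var in struct {
    		Text string `json:"text"`
    	}
    	// Reject malformed JSON instead of sending an empty prompt to the model.
    	// The "INVALID_INPUT" code is illustrative; use whatever code your API defines.
    	if err := c.ShouldBindJSON(&in); err != nil {
    		respond.Error(c, 400, "INVALID_INPUT", err)
    		return
    	}
    	out, err := h.ai.Chat(c.Request.Context(), []ai.Message{
    		{Role: "system", Content: "Summarize the input in 2 sentences."},
    		{Role: "user", Content: in.Text},
    	})
    	if err != nil {
    		respond.Error(c, 500, "AI_ERROR", err)
    		return
    	}
    	respond.OK(c, gin.H{"summary": out.Content}, "")
    }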
Streaming (SSE) for chat UIs
Streaming cuts perceived latency: the user sees tokens as they arrive instead of waiting for the full response.
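    func (h *AIHandler) Stream(c *gin.Context) {
    	// Assumption: the prompt arrives as a query parameter,
    	// since EventSource can only issue GET requests.
    	prompt := c.Query("prompt")

    	c.Header("Content-Type", "text/event-stream")
    	c.Header("Cache-Control", "no-cache")
    	c.Writer.Flush()

    	err := h.ai.ChatStream(c.Request.Context(),
    		[]ai.Message{{Role: "user", Content: prompt}},
    		func(token string) error {
    			// One SSE event per token; stop streaming if the client has gone away.
    			if _, err := fmt.Fprintf(c.Writer, "data: %s\n\n", token); err != nil {
    				return err
    			}
    			c.Writer.Flush()
    			return nil
    		},
    	)
    	if err != nil {
    		log.Printf("stream: %v", err)
    	}
    }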
Each token goes out as an SSE event. The frontend opens an EventSource and renders tokens as they arrive, the same streaming pattern ChatGPT and Claude.ai use.
Cost + rate-limiting per user
AI calls cost real money. Two patterns to control spend:
- Per-user rate limits in Sentinel (covered in the next chapter): "max 100 AI calls per user per day".
- Token budgets: track input + output tokens per user in an ai_usage table, and cap usage based on subscription tier.
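The token-budget pattern can be sketched as follows, with an in-memory map standing in for the ai_usage table. The `UsageTracker` type, its methods, and the tier caps are all hypothetical and chosen only for illustration; a real version would persist counts in the database and reset them per billing period.

```go
package main

import (
	"fmt"
	"sync"
)

// tierCaps holds illustrative per-period token caps, not values from Grit.
var tierCaps = map[string]int{"free": 10_000, "pro": 500_000}

// UsageTracker is an in-memory stand-in for the ai_usage table.
type UsageTracker struct {
	mu   sync.Mutex
	used map[string]int // userID -> tokens consumed in the current period
}

func NewUsageTracker() *UsageTracker {
	return &UsageTracker{used: make(map[string]int)}
}

// Allow reports whether the user still has budget left on their tier.
// Call it before making an AI request. Unknown tiers are denied.
func (t *UsageTracker) Allow(userID, tier string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	limit, ok := tierCaps[tier]
	return ok && t.used[userID] < limit
}

// Record adds the input and output tokens of a finished call to the user's total.
func (t *UsageTracker) Record(userID string, inputTokens, outputTokens int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.used[userID] += inputTokens + outputTokens
}

// Used returns the tokens recorded so far for the user.
func (t *UsageTracker) Used(userID string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.used[userID]
}

func main() {
	tracker := NewUsageTracker()
	if tracker.Allow("user-1", "free") {
		// ...call the model here, then record the token counts it reports.
		tracker.Record("user-1", 1200, 350)
	}
	fmt.Println("tokens used:", tracker.Used("user-1"))
}
```

Checking before the call and recording after it means a user can overshoot the cap by at most one request, which is usually an acceptable trade-off for not having to predict output length.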
Function calling / tools
For agentic workflows (for example, "Claude, look up this user's recent orders"), Grit's AI module supports tool definitions:
    tools := []ai.Tool{{
    	Name:        "get_user_orders",
    	Description: "Returns the user's last 10 orders",
    	Parameters:  ...,
    	Handler: func(args json.RawMessage) (any, error) {
    		return ordersService.RecentForUser(userID, 10)
    	},
    }}
    out, err := h.ai.Chat(ctx, messages, ai.WithTools(tools))
The model decides whether to call the tool; Grit dispatches the handler and feeds the result back. Multi-turn tool use just works.
Try it
Call the AI Gateway directly from your bench-api:
- Get a Vercel AI Gateway key (free tier works) at vercel.com/ai-gateway.
- Set it in your .env.
- Add a debug handler that calls h.ai.Chat with the prompt "Write me a one-line haiku about Go programming."
- Hit the handler and paste the result in notes.md.
What's next
Chapter 5: Security + Observability. Sentinel WAF, Pulse dashboards, the tamper-evident audit log. The trio that keeps production alive.