AI-Powered Features
Every Grit project comes with a built-in AI service powered by Vercel AI Gateway. In this course, you will learn how AI integration works — completions, streaming, multi-turn chat, model switching, and building a real chat UI — all without writing provider-specific code.
What is AI Integration?
AI integration means connecting your application to a Large Language Model (LLM) so your users can interact with AI directly from your app. Think chatbots, content generators, code assistants, summarizers — any feature where the app sends text to an AI and gets text back.
The naive way to add AI is to call each provider's API directly — but that means writing different code for Anthropic, different code for OpenAI, different code for Google. If you want to switch models, you rewrite your integration. Grit takes a better approach.
Why Grit uses Vercel AI Gateway instead of direct provider APIs:
- One API key — a single key gives you access to hundreds of models from dozens of providers
- No provider-specific code — the same request format works for Claude, GPT, Gemini, Llama, and more
- Automatic fallbacks — if one provider is down, the gateway can route to another
- Switch models instantly — change one environment variable, no code changes needed
Challenge: Name 3 AI Providers
Name 3 AI providers and one model from each. For example: Anthropic makes Claude, OpenAI makes GPT, and Google makes Gemini. Can you name any others? (Hint: Meta makes Llama, Mistral makes Mistral/Mixtral, Cohere makes Command.)
How Vercel AI Gateway Works
Vercel AI Gateway uses the OpenAI-compatible API format. This is the most widely adopted format in the AI industry — almost every tool and library supports it. The gateway accepts requests in this format and routes them to the correct provider behind the scenes.
Every request contains a messages array, a model field, and optional parameters like temperature and max_tokens. Because this format is so widely adopted, most AI tools and gateways support it regardless of the actual provider. Models use a provider/model format. For example, anthropic/claude-sonnet-4-6 means "use the Claude Sonnet model from Anthropic." The gateway reads the provider prefix and routes the request accordingly.
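As a concrete illustration of the prefix routing (the helper below is ours, not part of Grit or the gateway), splitting a model identifier looks like this:

```go
package main

import (
	"fmt"
	"strings"
)

// splitModel separates a provider/model identifier into its two parts,
// mirroring the routing decision the gateway makes from the prefix.
func splitModel(id string) (provider, model string, ok bool) {
	i := strings.Index(id, "/")
	if i <= 0 || i == len(id)-1 {
		return "", "", false
	}
	return id[:i], id[i+1:], true
}

func main() {
	provider, model, ok := splitModel("anthropic/claude-sonnet-4-6")
	fmt.Println(provider, model, ok) // anthropic claude-sonnet-4-6 true
}
```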
Here's the architecture:
Your App → AI Gateway → Anthropic (Claude)
→ OpenAI (GPT)
→ Google (Gemini)
→ Meta (Llama)
→ Mistral (Mixtral)
           → ...hundreds more

Your app only talks to the gateway. The gateway talks to providers. You never need to learn each provider's unique API — the gateway handles translation, authentication, rate limits, and retries for you.
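To make "your app only talks to the gateway" concrete, here is a hedged Go sketch of an OpenAI-compatible request aimed at the gateway. The struct and function names are illustrative (not Grit's actual code), /chat/completions is the standard OpenAI-compatible path, and the sketch only builds the request — it doesn't send it:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage mirrors the OpenAI-compatible message shape.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest is the OpenAI-compatible request body the gateway accepts.
type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// buildGatewayRequest builds an HTTP request against the gateway's
// OpenAI-compatible /chat/completions endpoint. The gateway reads the
// provider prefix in the model name and routes to the right provider.
func buildGatewayRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", baseURL+"/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := buildGatewayRequest("https://ai-gateway.vercel.sh/v1", "demo-key", "anthropic/claude-sonnet-4-6", "What is Go?")
	fmt.Println(req.Method, req.URL.String())
}
```

Sending it would be one `http.Client.Do(req)` call; swapping the model string is the only change needed to target a different provider.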
Because the gateway is OpenAI-compatible, existing OpenAI client libraries work too — just point their baseURL to the gateway endpoint instead of OpenAI's endpoint.

Challenge: Explore the AI Gateway
Visit vercel.com/ai-gateway and look at the model list. How many providers are available? Pick 3 models you'd like to try and write down their provider/model names.
Configuration
Grit's AI integration needs just 3 environment variables. Open your .env file and you'll find these:
AI_GATEWAY_API_KEY=your-key # Get from vercel.com/ai-gateway
AI_GATEWAY_MODEL=anthropic/claude-sonnet-4-6 # provider/model format
AI_GATEWAY_URL=https://ai-gateway.vercel.sh/v1

Let's break down each one:
- AI_GATEWAY_API_KEY — Your single API key from Vercel AI Gateway. One key unlocks all providers. You get this from vercel.com/ai-gateway after creating an account. This key is billed through Vercel, so you don't need separate accounts with Anthropic, OpenAI, or Google.
- AI_GATEWAY_MODEL — The model to use, in provider/model-name format. For example, anthropic/claude-sonnet-4-6 routes to Anthropic's Claude Sonnet model. Change this one variable to switch to any other model — no code changes required.
- AI_GATEWAY_URL — The gateway endpoint URL. This is always https://ai-gateway.vercel.sh/v1 for Vercel's hosted gateway. The /v1 suffix matches the OpenAI API version path.
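A minimal sketch of how a service might read these three variables. The aiConfig type and the lookup-function pattern are illustrative, not Grit's actual code; taking a lookup function instead of calling os.Getenv directly just makes the logic easy to test:

```go
package main

import (
	"fmt"
	"os"
)

// aiConfig holds the three gateway settings; field names are illustrative.
type aiConfig struct {
	APIKey  string
	Model   string
	BaseURL string
}

// loadAIConfig reads the three variables via the supplied lookup function
// (os.Getenv in production, a map in tests).
func loadAIConfig(get func(string) string) aiConfig {
	return aiConfig{
		APIKey:  get("AI_GATEWAY_API_KEY"),
		Model:   get("AI_GATEWAY_MODEL"),
		BaseURL: get("AI_GATEWAY_URL"),
	}
}

// Configured reports whether AI features can be enabled at all.
func (c aiConfig) Configured() bool { return c.APIKey != "" }

func main() {
	cfg := loadAIConfig(os.Getenv)
	fmt.Println("AI configured:", cfg.Configured())
}
```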
If AI_GATEWAY_API_KEY is empty, the AI endpoints return a 503 AI_UNAVAILABLE response. Your app works perfectly fine without AI — no crashes, no errors, just a graceful "not configured" response.

Challenge: Find Your AI Config
Open your .env file in the project root. Find the 3 AI variables. They're empty by default — you need a Vercel AI Gateway key to use AI features. What is the default value of AI_GATEWAY_MODEL?
The AI Service
The AI service is a Go struct that handles all communication with the AI Gateway. It lives at apps/api/internal/ai/ai.go and exposes two methods: Complete (wait for the full response) and Stream (receive text chunks in real-time).
type AI struct {
apiKey string
model string
baseURL string
client *http.Client
}
func New(apiKey, model, baseURL string) *AI
func (a *AI) Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error)
func (a *AI) Stream(ctx context.Context, req CompletionRequest, handler StreamHandler) error

Explained: The AI struct stores your API key, model name, gateway URL, and an HTTP client. The New function creates an instance from your environment variables. Then you have two ways to call the AI:
- Complete — sends a request and waits for the entire response. Good for short tasks where you need the full answer before proceeding (e.g., generating a title, classifying text, extracting data).
- Stream — sends a request and receives the response in chunks as it's generated. Good for long responses where you want the user to see text appearing in real-time (e.g., chat conversations, long-form content).
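The difference is easiest to see from the caller's side. In the sketch below, StreamHandler's signature is an assumption for illustration (Grit's actual handler type may differ), and fakeStream stands in for the real Stream method:

```go
package main

import "fmt"

// StreamHandler is an assumed callback shape: it receives each text chunk
// as the gateway produces it.
type StreamHandler func(chunk string) error

// fakeStream stands in for the real Stream method, emitting three chunks.
func fakeStream(h StreamHandler) error {
	for _, chunk := range []string{"An interface", " in Go", " defines..."} {
		if err := h(chunk); err != nil {
			return err
		}
	}
	return nil
}

// collect shows the caller-side difference from Complete: instead of getting
// one full string back, the caller accumulates (or renders) chunks as they
// arrive.
func collect(stream func(StreamHandler) error) (string, error) {
	var full string
	err := stream(func(chunk string) error {
		full += chunk // a real UI would render each chunk immediately
		return nil
	})
	return full, err
}

func main() {
	full, err := collect(fakeStream)
	fmt.Println(full, err)
}
```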
The service uses Go's standard net/http package internally — no external AI SDKs are needed. Because the AI Gateway speaks the OpenAI-compatible format, a simple HTTP POST with the right headers is all you need.

Challenge: Read the AI Service Code
Open apps/api/internal/ai/ai.go. Read the Complete method. What HTTP endpoint does it call? What headers does it set? (Hint: look for Authorization and Content-Type headers.)
The Complete Endpoint
The simplest AI endpoint is POST /api/ai/complete. You send a prompt, and you get back the AI's full response. Here's the request and response format:
{
"prompt": "Explain Go interfaces in 2 sentences"
}

{
"data": {
"content": "An interface in Go defines a set of method signatures that a type must implement to satisfy the interface. Unlike other languages, Go interfaces are satisfied implicitly — there is no 'implements' keyword.",
"model": "anthropic/claude-sonnet-4-6",
"usage": {
"input_tokens": 12,
"output_tokens": 45
}
}
}

Explained: The request contains a single prompt string. The response follows Grit's standard API format with a data object containing the AI's content (the answer), the model that was used, and usage information showing how many tokens were consumed.
input_tokens is the cost of your prompt, and output_tokens is the cost of the AI's response. Shorter prompts and responses cost less.

The complete endpoint is ideal for one-shot tasks: generate a title, summarize an article, classify a support ticket, extract keywords from a document. Anything where you send one prompt and need one answer.
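Decoding this envelope in Go takes a few structs and one json.Unmarshal. The type names below are illustrative; the JSON tags mirror the example response above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Shapes mirror the example response; type names are illustrative.
type usage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
}

type completeData struct {
	Content string `json:"content"`
	Model   string `json:"model"`
	Usage   usage  `json:"usage"`
}

type completeResponse struct {
	Data completeData `json:"data"`
}

// parseCompleteResponse decodes the endpoint's JSON envelope.
func parseCompleteResponse(raw []byte) (completeResponse, error) {
	var resp completeResponse
	err := json.Unmarshal(raw, &resp)
	return resp, err
}

// totalTokens is the number billed for the call: prompt plus answer.
func totalTokens(r completeResponse) int {
	return r.Data.Usage.InputTokens + r.Data.Usage.OutputTokens
}

func main() {
	raw := `{"data":{"content":"...","model":"anthropic/claude-sonnet-4-6","usage":{"input_tokens":12,"output_tokens":45}}}`
	resp, err := parseCompleteResponse([]byte(raw))
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Data.Model, totalTokens(resp))
}
```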
Challenge: Test the Complete Endpoint
If you have an AI Gateway API key, start your project with grit dev and open the Swagger docs at localhost:8080/swagger/index.html. Find the POST /api/ai/complete endpoint. Send the prompt "What is Go?" and examine the response. How many tokens did it use?
The Chat Endpoint
The complete endpoint handles single prompts. But what about conversations? The chat endpoint POST /api/ai/chat supports multi-turn conversations where the AI remembers the context of previous messages.
{
"messages": [
{ "role": "user", "content": "What is Go?" },
{ "role": "assistant", "content": "Go is a statically typed, compiled programming language designed at Google. It is known for its simplicity, concurrency support, and fast compilation." },
{ "role": "user", "content": "How does it handle concurrency?" }
]
}

Explained: Instead of a single prompt, you send an array of messages. Each message has a role ("user" for the human, "assistant" for the AI) and content (the text). The AI reads the entire conversation history and generates a contextual response.
The role tells the AI who is speaking (user or assistant), and the full history is what makes responses contextual. For example, when the user asks "How does it handle concurrency?", the AI knows "it" refers to Go because of the previous messages.

The frontend is responsible for storing the conversation history and sending the full array with each request. The AI itself is stateless — it doesn't remember previous requests. The conversation context comes entirely from the messages array you send.
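A sketch of that client-side bookkeeping in Go (helper names are ours, for illustration): each turn is appended to the stored history, and the whole slice is sent with the next request to /api/ai/chat:

```go
package main

import "fmt"

// message matches the chat endpoint's message shape.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// withUserTurn returns the history plus the user's new message. Because the
// API is stateless, this full slice is what gets POSTed each time.
func withUserTurn(history []message, input string) []message {
	return append(append([]message{}, history...), message{Role: "user", Content: input})
}

// withAssistantTurn records the AI's reply so the next request carries it.
func withAssistantTurn(history []message, reply string) []message {
	return append(append([]message{}, history...), message{Role: "assistant", Content: reply})
}

func main() {
	h := withUserTurn(nil, "What is Go?")
	h = withAssistantTurn(h, "Go is a statically typed, compiled language...")
	h = withUserTurn(h, "How does it handle concurrency?")
	fmt.Println(len(h), h[2].Content)
}
```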
Challenge: Write a Conversation
Write a 3-message conversation about the Grit framework. Include the roles and content for each message. For example: the user asks what Grit is, the assistant explains, then the user asks a follow-up question. What would the messages array look like?
Streaming with SSE
When you use the complete or chat endpoints, you wait for the entire response before seeing anything. For short answers that's fine, but for longer responses the user stares at a loading spinner for several seconds. Streaming solves this.
The streaming endpoint is POST /api/ai/stream. Instead of returning a JSON response, it opens an SSE connection and sends text chunks as they're generated by the AI. The frontend displays each chunk immediately — creating the "typing effect" you see in ChatGPT and other AI products.
Here's what the SSE event stream looks like:
event: message
data: "An interface"
event: message
data: " in Go"
event: message
data: " defines a set"
event: message
data: " of method signatures..."
event: done
data: [DONE]

Explained: Each event: message contains a small chunk of text in the data field. The frontend reads these chunks one by one and appends them to the display. When the AI finishes, the server sends event: done to signal the end of the stream. The user sees text appearing word by word, just like watching someone type.
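Parsing such a stream is a line-by-line scan. The sketch below assumes chunk payloads are JSON-quoted strings, as in the example above; the real chunk encoding may differ, and the helper is illustrative rather than Grit's actual code:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readSSE scans an SSE body line by line, concatenating the data of each
// "message" event and stopping at the "done" event.
func readSSE(body string) string {
	var full strings.Builder
	event := ""
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "event: "):
			event = strings.TrimPrefix(line, "event: ")
		case strings.HasPrefix(line, "data: "):
			data := strings.TrimPrefix(line, "data: ")
			if event == "done" {
				return full.String() // [DONE] sentinel: stream is over
			}
			// Strip the surrounding JSON quotes from the chunk payload.
			full.WriteString(strings.Trim(data, `"`))
		}
	}
	return full.String()
}

func main() {
	stream := "event: message\ndata: \"An interface\"\n\nevent: message\ndata: \" in Go\"\n\nevent: done\ndata: [DONE]\n"
	fmt.Println(readSSE(stream)) // An interface in Go
}
```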
Use Complete for short, structured responses (titles, classifications, JSON extraction) and Stream for longer, conversational responses (explanations, content generation, chat).

Challenge: Stream vs. Complete
Explain why streaming is better UX than waiting for the complete response. Then give 2 examples where you'd use Complete instead of Stream. (Hint: think about cases where you need the full response before you can do anything with it — like parsing JSON or using the result in a calculation.)
Switching Models
One of the biggest advantages of AI Gateway is how easy it is to switch models. Because every model uses the same request format, switching is just a one-line environment variable change:
# Claude (Anthropic) — excellent for reasoning and code
AI_GATEWAY_MODEL=anthropic/claude-sonnet-4-6
# GPT (OpenAI) — widely used, great general-purpose model
AI_GATEWAY_MODEL=openai/gpt-5.4
# Gemini (Google) — strong at multimodal tasks
AI_GATEWAY_MODEL=google/gemini-2.5-pro
# Open source (Meta) — free, runs on many providers
AI_GATEWAY_MODEL=meta/llama-4-scout

No code changes needed. Change the AI_GATEWAY_MODEL value in your .env file, restart the API server, and every AI endpoint now uses the new model. Your Go service, your handlers, your frontend — nothing else changes.
This makes it easy to:
- Test different models — try Claude, GPT, and Gemini for the same prompt and compare quality
- Optimize costs — use a cheaper model for simple tasks and a premium model for complex ones
- Stay current — when a new model launches, just update the env var
- Avoid vendor lock-in — you're never tied to one provider
Challenge: Pick 3 Models
If you wanted to test 3 different models for the same prompt, what 3 AI_GATEWAY_MODEL values would you try? Write out the full provider/model strings. Why did you choose those 3? (Consider: quality, speed, cost, and what your app needs.)
Building a Chat UI
Now that you understand the backend endpoints, let's look at the frontend. Building a chat UI means managing a list of messages, sending them to the streaming endpoint, and displaying the AI's response in real-time as chunks arrive.
Here's a simplified React component that consumes the SSE stream:
const [messages, setMessages] = useState<Message[]>([])
const [input, setInput] = useState("")
async function sendMessage() {
// 1. Add the user's message to the list
const newMessages = [...messages, { role: "user", content: input }]
setMessages(newMessages)
setInput("")
// 2. Send all messages to the streaming endpoint
const response = await fetch("/api/ai/stream", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${token}`,
},
body: JSON.stringify({ messages: newMessages }),
})
// 3. Read SSE chunks and update UI in real-time
const reader = response.body.getReader()
const decoder = new TextDecoder()
let assistantMessage = ""
while (true) {
const { done, value } = await reader.read()
if (done) break
const chunk = decoder.decode(value)
assistantMessage += parseSSEChunk(chunk)
// Update the last message in the list with new content
setMessages([...newMessages, { role: "assistant", content: assistantMessage }])
}
}

Explained step by step:
1. The user types a message. We add it to the messages array with role: "user".
2. We send the full messages array to /api/ai/stream. The AI needs the full history to understand context.
3. We use the Fetch API's ReadableStream to read SSE chunks as they arrive.
4. Each chunk is appended to assistantMessage, and we update React state so the UI re-renders with each new word.
The result is a smooth typing effect: the user sends a message, and the AI's response appears word by word in real-time. This is the same pattern used by ChatGPT, Claude's web interface, and every modern AI chat product.
To let the user stop a long response mid-stream, cancel the in-flight fetch with an AbortController.

Challenge: Trace the AI Handler
Look at the AI handler code in apps/api/internal/handler/ai_handler.go. What happens if AI_GATEWAY_API_KEY is empty? What HTTP status code and error code does the handler return? (Hint: it returns 503 with error code AI_UNAVAILABLE.)
Summary
You now understand how Grit integrates AI into your application. Let's review what you learned:
- AI Gateway — one API key, one format, hundreds of models from dozens of providers
- Configuration — 3 env vars: API key, model (provider/model format), and gateway URL
- AI Service — Go struct with Complete (full response) and Stream (real-time chunks) methods
- Complete endpoint — POST /api/ai/complete for one-shot prompts
- Chat endpoint — POST /api/ai/chat for multi-turn conversations with message history
- Streaming (SSE) — POST /api/ai/stream for real-time word-by-word responses
- Model switching — change one env var to switch between Claude, GPT, Gemini, Llama, and more
- Chat UI — React component using ReadableStream to display SSE chunks in real-time
Final Challenge: Design an AI Feature
Design a product description generator for an e-commerce app. Here's the scenario:
- The user enters a product name and a list of features (e.g., "Wireless Headphones" with features "noise cancelling, 30hr battery, Bluetooth 5.3")
- The frontend calls the AI endpoint with a prompt that includes the product details
- The AI generates a marketing description
- The user can click "Regenerate" to get a new version
Write out your plan:
- Which endpoint would you call — /api/ai/complete, /api/ai/chat, or /api/ai/stream? Why?
- What would the messages array (or prompt) look like? Write the actual JSON.
- How would you display the response — wait for the full text or stream it word by word?
- What happens when the user clicks "Regenerate"? Do you send the same messages or different ones?
Enjoying the course?
Help us grow — star us on GitHub, subscribe on YouTube, and follow on LinkedIn.