Courses/Grit Web/AI-Powered Features
Course 7 of 8 · ~30 min · 10 challenges

AI-Powered Features

Every Grit project comes with a built-in AI service powered by Vercel AI Gateway. In this course, you will learn how AI integration works — completions, streaming, multi-turn chat, model switching, and building a real chat UI — all without writing provider-specific code.


What is AI Integration?

AI integration means connecting your application to a Large Language Model (LLM) so your users can interact with AI directly from your app. Think chatbots, content generators, code assistants, summarizers — any feature where the app sends text to an AI and gets text back.

Large Language Model (LLM): A type of AI trained on massive amounts of text data. It can understand and generate human-like text. Examples include Claude (by Anthropic), GPT (by OpenAI), and Gemini (by Google). Your app sends a prompt to the LLM, and it returns a response.

The naive way to add AI is to call each provider's API directly — but that means writing different code for Anthropic, different code for OpenAI, different code for Google. If you want to switch models, you rewrite your integration. Grit takes a better approach.

AI Gateway: A proxy service that sits between your application and AI providers. You send one standardized request to the gateway, and it routes it to whichever provider and model you choose. One API key, one request format, hundreds of models. Grit uses Vercel AI Gateway.

Why Grit uses Vercel AI Gateway instead of direct provider APIs:

  • One API key — a single key gives you access to hundreds of models from dozens of providers
  • No provider-specific code — the same request format works for Claude, GPT, Gemini, Llama, and more
  • Automatic fallbacks — if one provider is down, the gateway can route to another
  • Switch models instantly — change one environment variable, no code changes needed

Challenge 1: Name 3 AI Providers

Name 3 AI providers and one model from each. For example: Anthropic makes Claude, OpenAI makes GPT, and Google makes Gemini. Can you name any others? (Hint: Meta makes Llama, Mistral makes Mistral/Mixtral, Cohere makes Command.)

How Vercel AI Gateway Works

Vercel AI Gateway uses the OpenAI-compatible API format. This is the most widely adopted format in the AI industry — almost every tool and library supports it. The gateway accepts requests in this format and routes them to the correct provider behind the scenes.

OpenAI-compatible API: A standardized request/response format originally created by OpenAI for their Chat Completions API. It uses JSON with a messages array, a model field, and optional parameters like temperature and max_tokens. Because it's so widely adopted, most AI tools and gateways support this format regardless of the actual provider.

Models use a provider/model format. For example, anthropic/claude-sonnet-4-6 means "use the Claude Sonnet model from Anthropic." The gateway reads the provider prefix and routes the request accordingly.

Here's the architecture:

Architecture
Your App  →  AI Gateway  →  Anthropic (Claude)
                          →  OpenAI (GPT)
                          →  Google (Gemini)
                          →  Meta (Llama)
                          →  Mistral (Mixtral)
                          →  ...hundreds more

Your app only talks to the gateway. The gateway talks to providers. You never need to learn each provider's unique API — the gateway handles translation, authentication, rate limits, and retries for you.

Because AI Gateway uses the OpenAI-compatible format, any library or tool built for OpenAI also works with AI Gateway. Just point the baseURL to the gateway endpoint instead of OpenAI's endpoint.

Challenge 2: Explore the AI Gateway

Visit vercel.com/ai-gateway and look at the model list. How many providers are available? Pick 3 models you'd like to try and write down their provider/model names.

Configuration

Grit's AI integration needs just 3 environment variables. Open your .env file and you'll find these:

.env
AI_GATEWAY_API_KEY=your-key        # Get from vercel.com/ai-gateway
AI_GATEWAY_MODEL=anthropic/claude-sonnet-4-6  # provider/model format
AI_GATEWAY_URL=https://ai-gateway.vercel.sh/v1

Let's break down each one:

  • AI_GATEWAY_API_KEY — Your single API key from Vercel AI Gateway. One key unlocks all providers. You get this from vercel.com/ai-gateway after creating an account. This key is billed through Vercel, so you don't need separate accounts with Anthropic, OpenAI, or Google.
  • AI_GATEWAY_MODEL — The model to use, in provider/model-name format. For example, anthropic/claude-sonnet-4-6 routes to Anthropic's Claude Sonnet model. Change this one variable to switch to any other model — no code changes required.
  • AI_GATEWAY_URL — The gateway endpoint URL. This is always https://ai-gateway.vercel.sh/v1 for Vercel's hosted gateway. The /v1 suffix matches the OpenAI API version path.
AI features are optional. If AI_GATEWAY_API_KEY is empty, the AI endpoints return a 503 AI_UNAVAILABLE response. Your app works perfectly fine without AI — no crashes, no errors, just a graceful "not configured" response.

Challenge 3: Find Your AI Config

Open your .env file in the project root. Find the 3 AI variables. They're empty by default — you need a Vercel AI Gateway key to use AI features. What is the default value of AI_GATEWAY_MODEL?

The AI Service

The AI service is a Go struct that handles all communication with the AI Gateway. It lives at apps/api/internal/ai/ai.go and exposes two methods: Complete (wait for the full response) and Stream (receive text chunks in real-time).

apps/api/internal/ai/ai.go
type AI struct {
    apiKey  string
    model   string
    baseURL string
    client  *http.Client
}

func New(apiKey, model, baseURL string) *AI
func (a *AI) Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error)
func (a *AI) Stream(ctx context.Context, req CompletionRequest, handler StreamHandler) error

Explained: The AI struct stores your API key, model name, gateway URL, and an HTTP client. The New function creates an instance from your environment variables. Then you have two ways to call the AI:

  • Complete — sends a request and waits for the entire response. Good for short tasks where you need the full answer before proceeding (e.g., generating a title, classifying text, extracting data).
  • Stream — sends a request and receives the response in chunks as it's generated. Good for long responses where you want the user to see text appearing in real-time (e.g., chat conversations, long-form content).
The AI service uses Go's standard net/http package internally — no external AI SDKs are needed. Because the AI Gateway speaks the OpenAI-compatible format, a simple HTTP POST with the right headers is all you need.

Challenge 4: Read the AI Service Code

Open apps/api/internal/ai/ai.go. Read the Complete method. What HTTP endpoint does it call? What headers does it set? (Hint: look for Authorization and Content-Type headers.)

The Complete Endpoint

The simplest AI endpoint is POST /api/ai/complete. You send a prompt, and you get back the AI's full response. Here's the request and response format:

POST /api/ai/complete — Request
{
  "prompt": "Explain Go interfaces in 2 sentences"
}

POST /api/ai/complete — Response
{
  "data": {
    "content": "An interface in Go defines a set of method signatures that a type must implement to satisfy the interface. Unlike other languages, Go interfaces are satisfied implicitly — there is no 'implements' keyword.",
    "model": "anthropic/claude-sonnet-4-6",
    "usage": {
      "input_tokens": 12,
      "output_tokens": 45
    }
  }
}

Explained: The request contains a single prompt string. The response follows Grit's standard API format with a data object containing the AI's content (the answer), the model that was used, and usage information showing how many tokens were consumed.

Token: The basic unit AI models use to process text. A token is roughly 0.75 words — so 100 words is about 133 tokens. You pay per token: input_tokens is the cost of your prompt, and output_tokens is the cost of the AI's response. Shorter prompts and responses cost less.

The complete endpoint is ideal for one-shot tasks: generate a title, summarize an article, classify a support ticket, extract keywords from a document. Anything where you send one prompt and need one answer.

Challenge 5: Test the Complete Endpoint

If you have an AI Gateway API key, start your project with grit dev and open the Swagger docs at localhost:8080/swagger/index.html. Find the POST /api/ai/complete endpoint. Send the prompt "What is Go?" and examine the response. How many tokens did it use?

The Chat Endpoint

The complete endpoint handles single prompts. But what about conversations? The chat endpoint POST /api/ai/chat supports multi-turn conversations where the AI remembers the context of previous messages.

POST /api/ai/chat — Request
{
  "messages": [
    { "role": "user", "content": "What is Go?" },
    { "role": "assistant", "content": "Go is a statically typed, compiled programming language designed at Google. It is known for its simplicity, concurrency support, and fast compilation." },
    { "role": "user", "content": "How does it handle concurrency?" }
  ]
}

Explained: Instead of a single prompt, you send an array of messages. Each message has a role ("user" for the human, "assistant" for the AI) and content (the text). The AI reads the entire conversation history and generates a contextual response.

Multi-turn Conversation: A conversation with history. Each message has a role (user or assistant) and content. The AI uses the full history to generate contextual responses. For example, when the user asks "How does it handle concurrency?" — the AI knows "it" refers to Go because of the previous messages.

The frontend is responsible for storing the conversation history and sending the full array with each request. The AI itself is stateless — it doesn't remember previous requests. The conversation context comes entirely from the messages array you send.

Longer conversations use more tokens because you resend the entire history with each request. A 50-message conversation means the AI processes all 50 messages every time. For very long conversations, you may want to truncate or summarize older messages to reduce costs.
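A simple way to cap that cost is to keep only the most recent messages. The helper below is hypothetical, not a Grit API; summarizing older turns is the more sophisticated alternative mentioned above.

```go
package main

import "fmt"

type Message struct {
	Role    string
	Content string
}

// truncateHistory keeps only the most recent max messages, so a long
// conversation stops growing the token cost of every request.
func truncateHistory(msgs []Message, max int) []Message {
	if len(msgs) <= max {
		return msgs
	}
	return msgs[len(msgs)-max:]
}

func main() {
	history := []Message{
		{Role: "user", Content: "What is Go?"},
		{Role: "assistant", Content: "A compiled language from Google."},
		{Role: "user", Content: "How does it handle concurrency?"},
		{Role: "assistant", Content: "With goroutines and channels."},
		{Role: "user", Content: "What is a channel?"},
	}
	recent := truncateHistory(history, 3)
	fmt.Println(len(recent)) // 3
}
```

Note the trade-off: truncation discards context, so the AI may no longer know what earlier pronouns like "it" refer to.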

Challenge 6: Write a Conversation

Write a 3-message conversation about the Grit framework. Include the roles and content for each message. For example: the user asks what Grit is, the assistant explains, then the user asks a follow-up question. What would the messages array look like?

Streaming with SSE

When you use the complete or chat endpoints, you wait for the entire response before seeing anything. For short answers that's fine, but for longer responses the user stares at a loading spinner for several seconds. Streaming solves this.

SSE (Server-Sent Events): A web standard for sending a stream of events from server to client over a single HTTP connection. Unlike WebSockets (which are bidirectional), SSE is one-way: the server pushes data to the client. It's perfect for AI streaming because the server generates text and the client displays it in real-time.

The streaming endpoint is POST /api/ai/stream. Instead of returning a JSON response, it opens an SSE connection and sends text chunks as they're generated by the AI. The frontend displays each chunk immediately — creating the "typing effect" you see in ChatGPT and other AI products.

Here's what the SSE event stream looks like:

SSE Event Stream
event: message
data: "An interface"

event: message
data: " in Go"

event: message
data: " defines a set"

event: message
data: " of method signatures..."

event: done
data: [DONE]

Explained: Each event: message contains a small chunk of text in the data field. The frontend reads these chunks one by one and appends them to the display. When the AI finishes, the server sends event: done to signal the end of the stream. The user sees text appearing word by word, just like watching someone type.
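For a feel of what reading these chunks involves, here is a small Go parser for the stream format above. It is a sketch: a real client would read incrementally from the HTTP response body rather than from a complete string, appending each chunk to the UI as it arrives.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSSE walks an event stream like the one above, concatenating the
// data of each "message" event and stopping at the "done" event.
func parseSSE(stream string) string {
	var b strings.Builder
	var event string
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "event: "):
			event = strings.TrimPrefix(line, "event: ")
		case strings.HasPrefix(line, "data: "):
			if event == "done" {
				return b.String()
			}
			// Chunks in the example are JSON-quoted strings; strip the quotes.
			b.WriteString(strings.Trim(strings.TrimPrefix(line, "data: "), `"`))
		}
	}
	return b.String()
}

func main() {
	stream := "event: message\ndata: \"An interface\"\n\nevent: message\ndata: \" in Go\"\n\nevent: done\ndata: [DONE]\n"
	fmt.Println(parseSSE(stream)) // An interface in Go
}
```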

Streaming is better UX for any response longer than a sentence or two. The user sees progress immediately instead of waiting. Use Complete for short, structured responses (titles, classifications, JSON extraction) and Stream for longer, conversational responses (explanations, content generation, chat).

Challenge 7: Stream vs. Complete

Explain why streaming is better UX than waiting for the complete response. Then give 2 examples where you'd use Complete instead of Stream. (Hint: think about cases where you need the full response before you can do anything with it — like parsing JSON or using the result in a calculation.)

Switching Models

One of the biggest advantages of AI Gateway is how easy it is to switch models. Because every model uses the same request format, switching is just a one-line environment variable change:

.env — Model Options
# Claude (Anthropic) — excellent for reasoning and code
AI_GATEWAY_MODEL=anthropic/claude-sonnet-4-6

# GPT (OpenAI) — widely used, great general-purpose model
AI_GATEWAY_MODEL=openai/gpt-5.4

# Gemini (Google) — strong at multimodal tasks
AI_GATEWAY_MODEL=google/gemini-2.5-pro

# Open source (Meta) — free, runs on many providers
AI_GATEWAY_MODEL=meta/llama-4-scout

No code changes needed. Change the AI_GATEWAY_MODEL value in your .env file, restart the API server, and every AI endpoint now uses the new model. Your Go service, your handlers, your frontend — nothing else changes.

This makes it easy to:

  • Test different models — try Claude, GPT, and Gemini for the same prompt and compare quality
  • Optimize costs — use a cheaper model for simple tasks and a premium model for complex ones
  • Stay current — when a new model launches, just update the env var
  • Avoid vendor lock-in — you're never tied to one provider
Different models have different strengths. Claude excels at reasoning and following complex instructions. GPT is a strong general-purpose choice. Gemini handles multimodal inputs well. Llama is open-source and often available at lower cost. Try multiple models for your use case and pick the one that performs best.
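One hypothetical way to act on the cost advice is a per-task model picker; the task names and model assignments below are made up for illustration, not part of Grit.

```go
package main

import "fmt"

// pickModel routes simple tasks to a cheaper model and everything else to
// a premium one. Tune the mapping for your own app's needs.
func pickModel(task string) string {
	switch task {
	case "classify", "title", "extract":
		return "meta/llama-4-scout" // cheap, fine for short structured tasks
	default:
		return "anthropic/claude-sonnet-4-6" // premium, for reasoning and chat
	}
}

func main() {
	fmt.Println(pickModel("title"))
	fmt.Println(pickModel("chat"))
}
```

Because every model shares the same request format, this is the only place the model name appears; nothing else in the request changes.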

Challenge 8: Pick 3 Models

If you wanted to test 3 different models for the same prompt, what 3 AI_GATEWAY_MODEL values would you try? Write out the full provider/model strings. Why did you choose those 3? (Consider: quality, speed, cost, and what your app needs.)

Building a Chat UI

Now that you understand the backend endpoints, let's look at the frontend. Building a chat UI means managing a list of messages, sending them to the streaming endpoint, and displaying the AI's response in real-time as chunks arrive.

Here's a simplified React component that consumes the SSE stream:

Chat UI — Simplified
const [messages, setMessages] = useState<Message[]>([])
const [input, setInput] = useState("")

async function sendMessage() {
  // 1. Add the user's message to the list
  const newMessages = [...messages, { role: "user", content: input }]
  setMessages(newMessages)
  setInput("")

  // 2. Send all messages to the streaming endpoint
  const response = await fetch("/api/ai/stream", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ messages: newMessages }),
  })

  // 3. Read SSE chunks and update UI in real-time
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let assistantMessage = ""

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    const chunk = decoder.decode(value)
    // parseSSEChunk extracts the text from each "data:" line (helper not shown)
    assistantMessage += parseSSEChunk(chunk)
    // Update the last message in the list with new content
    setMessages([...newMessages, { role: "assistant", content: assistantMessage }])
  }
}

Explained step by step:

  • 1. The user types a message. We add it to the messages array with role: "user".
  • 2. We send the full messages array to /api/ai/stream. The AI needs the full history to understand context.
  • 3. We use the Fetch API's ReadableStream to read SSE chunks as they arrive.
  • 4. Each chunk is appended to assistantMessage, and we update React state so the UI re-renders with each new word.

The result is a smooth typing effect: the user sends a message, and the AI's response appears word by word in real-time. This is the same pattern used by ChatGPT, Claude's web interface, and every modern AI chat product.

In production, you'd also handle error states (network failure, rate limits), show a loading indicator while waiting for the first chunk, and add a "Stop generating" button that aborts the stream using an AbortController.

Challenge 9: Trace the AI Handler

Look at the AI handler code in apps/api/internal/handler/ai_handler.go. What happens if AI_GATEWAY_API_KEY is empty? What HTTP status code and error code does the handler return? (Hint: it returns 503 with error code AI_UNAVAILABLE.)

Summary

You now understand how Grit integrates AI into your application. Let's review what you learned:

  • AI Gateway — one API key, one format, hundreds of models from dozens of providers
  • Configuration — 3 env vars: API key, model (provider/model format), and gateway URL
  • AI Service — Go struct with Complete (full response) and Stream (real-time chunks) methods
  • Complete endpoint — POST /api/ai/complete for one-shot prompts
  • Chat endpoint — POST /api/ai/chat for multi-turn conversations with message history
  • Streaming (SSE) — POST /api/ai/stream for real-time word-by-word responses
  • Model switching — change one env var to switch between Claude, GPT, Gemini, Llama, and more
  • Chat UI — React component using ReadableStream to display SSE chunks in real-time

Challenge 10: Design an AI Feature

Design a product description generator for an e-commerce app. Here's the scenario:

  1. The user enters a product name and a list of features (e.g., "Wireless Headphones" with features "noise cancelling, 30hr battery, Bluetooth 5.3")
  2. The frontend calls the AI endpoint with a prompt that includes the product details
  3. The AI generates a marketing description
  4. The user can click "Regenerate" to get a new version

Write out your plan:

  • Which endpoint would you call — /api/ai/complete, /api/ai/chat, or /api/ai/stream? Why?
  • What would the messages array (or prompt) look like? Write the actual JSON.
  • How would you display the response — wait for the full text or stream it word by word?
  • What happens when the user clicks "Regenerate"? Do you send the same messages or different ones?