Complete Guide: Building a Voice Travel Agent with AI

Step-by-Step Process from Setup to Deployment

Created by Dr. Erin Jacques

Instagram & TikTok: @dr.erinjacques

LinkedIn: linkedin.com/in/drerinjacques

Table of Contents

  1. What This Voice Agent Does
  2. What You Need Before Starting
  3. Getting Your API Key
  4. Understanding Your Three Files
  5. Creating Your Perfect System Prompt
  6. Deploying to Vercel
  7. Complete Troubleshooting Guide
  8. Testing Your Agent
  9. Customizing for ANY Topic
  10. What to Do Next

What This Voice Agent Does

This AI-Powered Voice Assistant:

  • Listens to users through their microphone
  • Converts speech to text using OpenAI's Whisper
  • Scrapes live information from your chosen websites
  • Generates smart responses using OpenAI's GPT-4 with that real-time data
  • Speaks the answer back using OpenAI's text-to-speech
  • Does all of this through a clean web interface that users can open from any browser

The Flow

  1. User speaks into the microphone
  2. Whisper API converts the speech to text
  3. The system scrapes your chosen websites
  4. GPT-4 API generates a response using that live data
  5. TTS API converts the response to speech
  6. User hears the answer
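The flow above can be sketched as one client-side function that calls your three API routes in order. This is a minimal sketch, not the guide's actual frontend code: the endpoint paths `/api/transcribe`, `/api/chat`, and `/api/speak` and the JSON field names (`text`, `message`, `reply`) are hypothetical, so rename them to match your own files.

```typescript
// Hypothetical endpoint paths; match them to your own API files.
const ENDPOINTS = {
  transcribe: "/api/transcribe",
  chat: "/api/chat",
  speak: "/api/speak",
} as const;

type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface AgentTurn {
  question: string; // what Whisper heard
  reply: string; // GPT-4's answer
  audio: Blob; // TTS audio to play back
}

// Runs one full turn of the flow: speech -> text -> answer -> speech.
async function askAgent(recording: Blob, fetchFn: FetchLike = fetch): Promise<AgentTurn> {
  // Steps 1-2: send the recorded audio to the speech-to-text route.
  const form = new FormData();
  form.append("audio", recording, "question.webm");
  const sttRes = await fetchFn(ENDPOINTS.transcribe, { method: "POST", body: form });
  if (!sttRes.ok) throw new Error(`Transcription failed: ${sttRes.status}`);
  const { text: question } = (await sttRes.json()) as { text: string };

  // Steps 3-4: the chat route scrapes your sites and asks GPT-4.
  const chatRes = await fetchFn(ENDPOINTS.chat, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: question }),
  });
  if (!chatRes.ok) throw new Error(`Chat failed: ${chatRes.status}`);
  const { reply } = (await chatRes.json()) as { reply: string };

  // Step 5: turn the reply into audio.
  const ttsRes = await fetchFn(ENDPOINTS.speak, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: reply }),
  });
  if (!ttsRes.ok) throw new Error(`Speech failed: ${ttsRes.status}`);
  const audio = await ttsRes.blob();

  // Step 6 happens in the page, e.g. new Audio(URL.createObjectURL(audio)).play()
  return { question, reply, audio };
}
```

Each step checks `ok` before moving on, so a failure tells you which of the three services broke instead of failing silently.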

What You Need Before Starting

Important: You only need ONE API key for this entire project!

Required Accounts

1. OpenAI Account (The Only API Key You Need)

Sign up: https://platform.openai.com/signup

Add a payment method: required, since API usage is billed pay-as-you-go

What you'll use it for:

  • Whisper (speech-to-text)
  • GPT-4 (AI brain that generates responses)
  • TTS (text-to-speech)

2. Vercel Account (Free Hosting)

Go to: https://vercel.com/signup

Free tier available. This hosts your voice agent.

3. GitHub Account (Code Storage)

Go to: https://github.com/signup

Free. This stores your files.

Getting Your API Key

OpenAI API Key Setup

Step 1: Go to https://platform.openai.com/api-keys

Step 2: Click the green "Create new secret key" button

Step 3: Give it a name like "Voice Travel Agent"

Step 4: Copy the key (starts with sk-proj-...)

Step 5: Save it somewhere safe - you can't see it again!

CRITICAL: Make sure you have a payment method added to your OpenAI account, or the API won't work!

To add a payment method: in the OpenAI platform, open Settings → Billing and add a card.

Understanding Your Three Files

You have three code files that work together:

File 1: The Frontend (HTML/CSS/JavaScript)

What it does:

  • Shows the interface users see
  • Has a microphone button they hold to speak
  • Displays what they said and the AI's response
  • Plays the audio response back to them

You customize:

  • The title and header text
  • Colors and styling
  • Your branding and social media links
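The hold-to-speak microphone button can be sketched with the browser's MediaRecorder API. This is a browser-only sketch under assumptions: `pickMimeType` and `startRecording` are hypothetical names, and the format list reflects that Chrome and Firefox usually record WebM while Safari records MP4 (Whisper accepts both).

```typescript
// Formats to try, in order of preference.
const CANDIDATE_TYPES = ["audio/webm;codecs=opus", "audio/webm", "audio/mp4"];

// Returns the first format the browser can record, or "" to let the browser choose.
function pickMimeType(isSupported: (type: string) => boolean): string {
  return CANDIDATE_TYPES.find((t) => isSupported(t)) ?? "";
}

// Starts recording on button press; call the returned function on release to get the audio.
async function startRecording(): Promise<() => Promise<Blob>> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.start();

  return () =>
    new Promise<Blob>((resolve) => {
      recorder.onstop = () => {
        stream.getTracks().forEach((track) => track.stop()); // release the microphone
        resolve(new Blob(chunks, { type: recorder.mimeType }));
      };
      recorder.stop();
    });
}
```

To wire it up, call `startRecording()` in the button's `pointerdown` handler and the returned stop function in `pointerup`.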

File 2: The Chat API (GPT-4 + Web Scraping)

What it does:

  • Receives the user's question (as text from Whisper)
  • Scrapes your chosen websites for fresh information
  • Sends everything to GPT-4 with your system prompt
  • Returns GPT-4's response

You customize:

  • The URLs to scrape (your knowledge sources)
  • The system prompt (GPT-4's personality and instructions)
  • The keyword mapping (which URLs to use for which questions)
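The GPT-4 call inside the chat API can be sketched as a single request to OpenAI's Chat Completions endpoint. This is a sketch, not the guide's file: `askGpt` is a hypothetical name, the `gpt-4` model name and the `max_tokens: 150` cap are assumptions, and the API key is passed in (in a Vercel function you would pass `process.env.OPENAI_API_KEY`).

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Sends the user's question plus your system prompt (scraped text already inserted) to OpenAI.
async function askGpt(
  question: string,
  systemPrompt: string,
  apiKey: string,
  fetchFn: FetchLike = fetch,
  model = "gpt-4", // assumption: use any chat model your account can access
): Promise<string> {
  const res = await fetchFn("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: question },
      ],
      max_tokens: 150, // hard cap that backs up the "2-3 sentences" rule in the prompt
    }),
  });
  if (!res.ok) throw new Error(`OpenAI chat error ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content.trim();
}
```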

Current URLs Being Scraped (Brooklyn Example):

File 3: The Other APIs (Whisper + TTS)

What they do:

  • The first receives audio from the user and sends it to Whisper to convert to text
  • The second receives GPT-4's text response and sends it to TTS to convert to audio

You customize:

  • The voice (choose from 6 different voices)
  • The speaking speed (slower for clarity, faster for energy)
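The two audio APIs can be sketched as calls to OpenAI's transcription and speech endpoints. This is a sketch under assumptions: `transcribe` and `speak` are hypothetical names, the `whisper-1` and `tts-1` models are one valid choice, and the key is passed in rather than read from the environment.

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Speech-to-text: upload the recording to Whisper and return the transcript.
// Whisper detects the format from the file extension, so use ".mp4" for Safari recordings.
async function transcribe(
  audio: Blob,
  apiKey: string,
  filename = "question.webm",
  fetchFn: FetchLike = fetch,
): Promise<string> {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", "whisper-1");
  const res = await fetchFn("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` }, // fetch sets the multipart Content-Type itself
    body: form,
  });
  if (!res.ok) throw new Error(`Whisper error ${res.status}`);
  return ((await res.json()) as { text: string }).text;
}

type Voice = "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer";

// Text-to-speech: returns MP3 audio of the reply in the chosen voice and speed.
async function speak(
  text: string,
  apiKey: string,
  voice: Voice = "nova",
  speed = 1.0,
  fetchFn: FetchLike = fetch,
): Promise<ArrayBuffer> {
  // The API accepts speeds from 0.25 to 4.0, so clamp to that range.
  const safeSpeed = Math.min(4, Math.max(0.25, speed));
  const res = await fetchFn("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model: "tts-1", voice, input: text, speed: safeSpeed }),
  });
  if (!res.ok) throw new Error(`TTS error ${res.status}`);
  return res.arrayBuffer();
}
```

The `voice` and `speed` parameters are the two customization points listed above.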

Creating Your Perfect System Prompt

This is THE MOST IMPORTANT part - it's what makes your voice agent unique!

What is a System Prompt?

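The system prompt tells GPT-4:

  • Who it is (its role, such as guide, advisor, or coach)
  • What tone to use
  • How long its answers should be
  • What to focus on
  • How to handle questions it has no current information for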

Your Current Brooklyn Guide Prompt

You are a knowledgeable and enthusiastic Brooklyn travel guide. 
Your goal is to help visitors discover the best of Brooklyn - from 
trendy neighborhoods and delicious dining spots to exciting events 
and hidden gems.

IMPORTANT GUIDELINES:
- Be conversational, friendly, and enthusiastic about Brooklyn
- Keep responses concise (2-3 sentences max) since they will be 
  converted to speech
- Provide specific recommendations when possible
- If you don't have current information, acknowledge it and provide 
  general guidance
- Focus on the most relevant information from the context provided

CURRENT BROOKLYN INFORMATION:
[This is where the scraped website content gets inserted automatically]

Remember: You're speaking to someone who wants quick, helpful, and 
engaging information about Brooklyn. Be their friendly local guide!
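The "[This is where the scraped website content gets inserted automatically]" line can be filled in with a small helper like the one below. This is a sketch under assumptions: `buildSystemPrompt` is a hypothetical name, the 6000-character cap is an arbitrary budget to keep requests small, and the fallback sentence is illustrative.

```typescript
// The marker line from the prompt above; the scraped text replaces it.
const PLACEHOLDER = "[This is where the scraped website content gets inserted automatically]";

// Inserts scraped text into the prompt, trimmed so the request stays small and cheap.
function buildSystemPrompt(template: string, scraped: string, maxChars = 6000): string {
  const trimmed = scraped.length > maxChars ? scraped.slice(0, maxChars) + " ..." : scraped;
  const context = trimmed.trim() || "No live information is available right now.";
  // A replacer function inserts the text literally, so "$" signs in prices are not treated as patterns.
  return template.replace(PLACEHOLDER, () => context);
}
```

The empty-text fallback lets GPT-4 follow the "If you don't have current information, acknowledge it" guideline instead of seeing a blank section.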

The Universal Template for ANY Topic

You are a [ADJECTIVE] [ROLE] for [TOPIC/LOCATION].

IMPORTANT GUIDELINES:
- TONE: Be [describe tone - conversational/professional/energetic/calming/etc.]
- LENGTH: Keep responses to [2-4] sentences since they will be converted to speech
- SPECIFICITY: [Provide specific names and recommendations / Give general overviews / 
  Focus on unique features]
- UNCERTAINTY: If you don't have current information, [acknowledge it honestly / 
  provide general insights / suggest alternatives]
- FOCUS: Prioritize [what matters most - price/quality/convenience/experience/etc.]

CURRENT [TOPIC] INFORMATION:
[Scraped content goes here automatically]

Remember: You're helping someone [their goal]. Be their [relationship to them]!

Real Examples for Different Industries

Example 1: Miami Real Estate Agent

You are a professional and trustworthy real estate advisor for Miami, Florida.

IMPORTANT GUIDELINES:
- TONE: Be professional yet approachable and informative
- LENGTH: Keep responses to 3-4 sentences for voice clarity
- SPECIFICITY: Always mention specific neighborhoods, price ranges, and property features
- UNCERTAINTY: If you lack current market data, provide general Miami market insights
- FOCUS: Prioritize helping buyers and sellers make informed financial decisions

CURRENT MIAMI REAL ESTATE INFORMATION:
[Scraped content]

Remember: You're guiding someone through one of the biggest financial decisions 
of their life. Be accurate, helpful, and trustworthy!

Example 2: Los Angeles Fitness & Wellness Guide

You are an energetic and motivating fitness coach and wellness expert for Los Angeles.

IMPORTANT GUIDELINES:
- TONE: Be upbeat, positive, and encouraging with high energy
- LENGTH: Keep responses to 2-3 sentences since they'll be spoken aloud
- SPECIFICITY: Recommend specific gyms, studios, trails, and wellness spots by name
- UNCERTAINTY: If you don't have current class schedules, suggest general fitness options
- FOCUS: Prioritize variety - yoga studios, hiking trails, boutique fitness, outdoor activities

CURRENT LA FITNESS & WELLNESS INFORMATION:
[Scraped content]

Remember: You're inspiring someone to prioritize their health. 
Be their motivational fitness buddy!

Example 3: Austin Food Truck Guide

You are a passionate and enthusiastic food expert specializing in Austin's food truck scene.

IMPORTANT GUIDELINES:
- TONE: Be excited, descriptive, and fun when talking about food
- LENGTH: Keep responses to 3-4 sentences for speech conversion
- SPECIFICITY: Always mention specific truck names, locations, and signature dishes
- UNCERTAINTY: If you don't have current menus, focus on the truck's style and specialties
- FOCUS: Prioritize unique flavors, must-try dishes, and trucks that capture Austin's culture

CURRENT AUSTIN FOOD TRUCK INFORMATION:
[Scraped content]

Remember: You're helping someone discover their next amazing meal on wheels. 
Make them hungry!

How to Test Your System Prompt

After you create your prompt, test it with these 4 types of questions:

  1. Specific Question: "What's the best pizza place in [location]?"
    Tests if it gives specific recommendations
  2. Broad Question: "What should I do today?"
    Tests if responses are concise enough for speech
  3. Unknown Information: "What are the hours for [obscure place]?"
    Tests how it handles uncertainty
  4. Comparison Question: "What's better, X or Y?"
    Tests if it provides helpful guidance
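Checking whether replies are "concise enough for speech" can be partly automated. This is a rough sketch: `countSentences` and `isSpeechFriendly` are hypothetical helpers, and splitting on punctuation miscounts abbreviations like "St.", so treat the result as a smoke test rather than a grade.

```typescript
// Rough sentence counter: splits after ".", "!" or "?" when followed by whitespace.
function countSentences(reply: string): number {
  return reply
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.trim().length > 0).length;
}

// Flags replies that break the length rule from your system prompt.
function isSpeechFriendly(reply: string, maxSentences = 3): boolean {
  return countSentences(reply) <= maxSentences;
}
```

Run each of the four question types through your chat API and pass the replies to `isSpeechFriendly`; any `false` means the prompt's length rule needs tightening.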

Deploying to Vercel

Step 1: Push Your Files to GitHub

  1. Create a new repository on GitHub
  2. Upload your three files
  3. Commit the changes

Step 2: Connect Vercel to GitHub

  1. Go to https://vercel.com/dashboard
  2. Click "Add New" → "Project"
  3. Click "Import Git Repository"
  4. Select your GitHub repository
  5. Vercel will auto-detect your settings

Step 3: Add Your API Key to Vercel

CRITICAL STEP - Don't Skip This!
  1. In your Vercel project, click "Settings"
  2. Click "Environment Variables" on the left
  3. Add your API key:
    • Name: OPENAI_API_KEY
    • Value: Paste your OpenAI key (starts with sk-proj-)
    • Environments: Check all three boxes (Production, Preview, Development)
    • Click "Save"
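Your API functions read this key on the server only; it should never appear in the frontend file. A guard like the one below (a hypothetical helper, assuming you pass it `process.env` from inside your Vercel function) turns a missing or misnamed variable into a clear error instead of a vague 401.

```typescript
// Validates the key's presence and rough shape; pass process.env from your serverless function.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.OPENAI_API_KEY?.trim();
  if (!key) {
    throw new Error(
      "OPENAI_API_KEY is not set. Add it in Vercel → Settings → Environment Variables, then redeploy.",
    );
  }
  if (!key.startsWith("sk-")) {
    throw new Error("OPENAI_API_KEY does not look like an OpenAI key (expected it to start with sk-).");
  }
  return key;
}
```

Because the lookup is case-sensitive, a variable saved as `openai_api_key` fails this check, which matches the exact-name rule in the troubleshooting section.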

Step 4: Deploy

  1. Click "Deploy" in Vercel
  2. Wait 1-2 minutes for deployment
  3. Vercel gives you a URL like: your-project-name.vercel.app
  4. Click the URL to test your voice agent!

Step 5: Every Time You Update Your Files

  1. Push changes to GitHub
  2. Vercel automatically redeploys
  3. Your live site updates in 1-2 minutes

That's it - no manual redeployment needed for code changes. (Changes to environment variables are different: see Problem 9.)

Complete Troubleshooting Guide

Problem 1: Microphone Doesn't Work

Symptom: Button doesn't respond or browser doesn't ask for permission

Fix:

  1. Make sure you're using HTTPS (Vercel provides this automatically)
  2. Check your browser settings:
    • Chrome: Visit chrome://settings/content/microphone and allow access
    • Safari: Go to Safari → Settings → Websites → Microphone
    • Firefox: Open about:preferences#privacy, scroll to Permissions, and click Settings next to Microphone
  3. Try a different browser
  4. Check if another app is using your microphone
  5. Restart your browser

Problem 2: "401 Unauthorized" Error

Symptom: Nothing happens after speaking, or you see error messages

Fix:

  1. Check your API key is correctly entered in Vercel:
    • Go to Vercel → Your Project → Settings → Environment Variables
    • Verify OPENAI_API_KEY is correct
  2. Make sure your OpenAI account has a payment method (see Getting Your API Key)
  3. Regenerate your key if needed:
    • Create a new key on OpenAI platform
    • Update it in Vercel
    • Redeploy
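To confirm whether the key itself is the problem, you can make one lightweight authenticated request outside your app. This sketch assumes OpenAI's model-listing endpoint, which needs a valid key but does not generate any output; `checkOpenAiKey` is a hypothetical name.

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Makes one authenticated request and explains what the status code means.
async function checkOpenAiKey(apiKey: string, fetchFn: FetchLike = fetch): Promise<string> {
  const res = await fetchFn("https://api.openai.com/v1/models", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (res.ok) return "Key works.";
  if (res.status === 401) return "401: key is wrong, revoked, or copied with extra characters.";
  if (res.status === 429) return "429: rate limit or no remaining credit; check billing and usage.";
  return `Unexpected status ${res.status}.`;
}
```

If this reports "Key works." with the same key you pasted into Vercel, the problem is more likely the environment variable setup (Problem 9).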

Problem 3: "429 Too Many Requests"

Symptom: Works a few times, then stops working

Fix:

  1. You're hitting rate limits - wait 60 seconds and try again
  2. Check your usage tier at https://platform.openai.com/account/limits
  3. If testing a lot, space out your requests by 10-15 seconds
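  4. Consider upgrading your OpenAI account tier for higher limits
  5. If the error message mentions insufficient_quota, you are out of credit rather than sending too fast: check your billing and payment method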
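Instead of retrying by hand, your API functions can wait and retry automatically. This is a sketch of exponential backoff (1 s, 2 s, 4 s by default); `fetchWithBackoff` is a hypothetical name, and it assumes OpenAI's "out of credit" 429 responses contain the text `insufficient_quota`, which waiting will not fix.

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries on 429, doubling the wait each time; gives up after `retries` extra attempts.
async function fetchWithBackoff(
  url: string,
  init: RequestInit,
  fetchFn: FetchLike = fetch,
  retries = 3,
  baseDelayMs = 1000,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url, init);
    if (res.status !== 429 || attempt >= retries) return res;
    // A quota error means billing, not speed, so return it immediately.
    const body = await res.clone().text();
    if (body.includes("insufficient_quota")) return res;
    await sleep(baseDelayMs * 2 ** attempt);
  }
}
```

Keep the retry count low: on Vercel every second spent waiting counts toward the function timeout (Problem 6).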

Problem 4: No Audio Plays Back

Symptom: You see the text response but don't hear anything

Fix:

  1. Check your device volume
  2. Look for a "Click to play" button (some browsers block autoplay)
  3. Try a different browser (Safari sometimes has issues)
  4. Check browser console for errors (F12 key → Console tab)
  5. Make sure your OpenAI API key is working
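The "Click to play" fallback from step 2 can be sketched as follows. Browsers reject `play()` with a `NotAllowedError` when autoplay is blocked; `playOrPrompt` and the `Playable` interface are hypothetical names, and `showPlayButton` stands in for whatever reveals your own button.

```typescript
// Minimal shape of an <audio> element's play method (HTMLAudioElement satisfies it).
interface Playable {
  play(): Promise<void>;
}

// Tries to autoplay; if the browser blocks it, shows a play button instead.
async function playOrPrompt(audio: Playable, showPlayButton: () => void): Promise<boolean> {
  try {
    await audio.play();
    return true;
  } catch (err) {
    const name = typeof err === "object" && err !== null ? (err as { name?: string }).name : undefined;
    if (name === "NotAllowedError") {
      showPlayButton(); // autoplay blocked: let the user start playback with a tap
      return false;
    }
    throw err; // anything else (bad audio data, network) is a real error
  }
}
```

In the page this might look like `playOrPrompt(new Audio(url), () => { playButton.hidden = false; })`.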

Problem 5: Empty or Generic Responses

Symptom: GPT-4 gives generic answers or says it doesn't know anything

Fix:

  1. Check if your websites are being scraped correctly
  2. Your websites might be blocking scrapers - try different URLs
  3. Reduce the number of URLs you're scraping (max 3 at a time)
  4. Make sure the websites you're scraping have actual text content (not just images)
  5. Add fallback text in your code for when scraping fails

Problem 6: "Function Timeout" Error

Symptom: Request takes too long and fails

Fix:

  1. Reduce the number of websites you're scraping (stick to 2-3)
  2. Choose faster-loading websites
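  3. The Vercel Hobby (free) plan has historically defaulted to a 10-second function timeout
  4. Vercel Pro allows longer timeouts (60 seconds or more); check Vercel's current function duration limits, since they change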
  5. Optimize your scraping to only grab essential text
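Fetching pages one after another adds their load times together. A sketch that fetches all pages at once and gives each its own time budget keeps one slow site from using up the whole function timeout. `fetchAllWithTimeout` is a hypothetical name and the 3-second budget is an assumption; pick a value well under your plan's limit.

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Fetches all pages in parallel; each request is aborted if it exceeds timeoutMs.
async function fetchAllWithTimeout(
  urls: string[],
  timeoutMs = 3000,
  fetchFn: FetchLike = fetch,
): Promise<string[]> {
  const results = await Promise.allSettled(
    urls.map(async (url) => {
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      try {
        const res = await fetchFn(url, { signal: controller.signal });
        if (!res.ok) throw new Error(`${url} returned ${res.status}`);
        return await res.text();
      } finally {
        clearTimeout(timer);
      }
    }),
  );
  // Keep only pages that loaded in time, in the original order; slow or failed ones are skipped.
  return results.flatMap((r) => (r.status === "fulfilled" ? [r.value] : []));
}
```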

Problem 7: Website Scraping Fails

Symptom: Agent works but gives generic answers without recent info

Fix:

  1. Check if the websites allow scraping:
    • Visit https://yourwebsite.com/robots.txt
    • Look for "Disallow" rules
  2. Some websites block automated access:
    • Try alternative websites on the same topic
    • Use official APIs if available
  3. Test individual URLs:
    • Remove all but one URL
    • See if that one works
    • Add URLs back one at a time
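Step 1 (reading robots.txt) can be partly automated with a very small parser. This sketch is deliberately simplified: `disallowedPaths` and `isPathAllowed` are hypothetical names, and it only reads `Disallow` rules in the `User-agent: *` group, ignoring `Allow` lines, wildcards, and bot-specific groups. Treat "allowed" as "probably allowed", not a guarantee.

```typescript
// Collects Disallow rules that apply to all bots ("User-agent: *").
function disallowedPaths(robotsTxt: string): string[] {
  const rules: string[] = [];
  let inStarGroup = false;
  let lastWasAgent = false; // consecutive User-agent lines share one group
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split("#")[0].trim(); // drop comments
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      inStarGroup = (lastWasAgent && inStarGroup) || value === "*";
      lastWasAgent = true;
    } else {
      if (field === "disallow" && inStarGroup && value) rules.push(value); // empty Disallow allows all
      lastWasAgent = false;
    }
  }
  return rules;
}

// True if no "User-agent: *" Disallow rule is a prefix of the path.
function isPathAllowed(robotsTxt: string, path: string): boolean {
  return !disallowedPaths(robotsTxt).some((rule) => path.startsWith(rule));
}
```

Fetch the file from `https://yourwebsite.com/robots.txt`, then test the path of each page you plan to scrape.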

Problem 8: Deployment Fails

Symptom: Vercel shows "Build Failed" error

Fix:

  1. Check the error logs in Vercel:
    • Go to your project → Deployments
    • Click the failed deployment
    • Read the error message
  2. Common issues:
    • Missing files - make sure all three files are uploaded
    • Wrong file names - check spelling exactly
    • Node version - Vercel uses Node 18+ by default
  3. Redeploy:
    • Make a small change in GitHub (add a space, etc.)
    • Push the change
    • Vercel will try again

Problem 9: Environment Variables Not Loading

Symptom: Key doesn't work even though it's correct

Fix:

  1. Verify variable name is EXACT: OPENAI_API_KEY not openai_api_key or OPENAI_KEY
  2. Check all environments are selected:
    • Production ✓
    • Preview ✓
    • Development ✓
  3. After adding/changing variables, you MUST redeploy:
    • Go to Deployments
    • Click "..." menu on latest deployment
    • Click "Redeploy"

Problem 10: Audio Quality Issues

Symptom: Voice sounds robotic or unclear

Fix:

  1. Change the voice in your TTS settings:
    • Options: alloy, echo, fable, onyx, nova, shimmer
    • Try nova for friendly, onyx for professional
  2. Adjust speaking speed:
    • Default is 1.0
    • Try 0.9 for clearer speech
    • Try 1.1 for more energy
  3. Keep GPT-4's responses short (2-3 sentences):
    • Long responses are harder to listen to
    • Edit your system prompt to enforce brevity

Testing Your Agent

Pre-Launch Checklist

Test on Desktop:
  1. Open your Vercel URL
  2. Press and hold the microphone button
  3. Say: "What are the best restaurants?"
  4. Check if you get a response with audio
Test on Mobile:
  1. Open URL on your phone
  2. Test the microphone (may need to tap "Allow")
  3. Make sure audio plays back
Test Different Questions:
  1. Ask about events
  2. Ask about specific neighborhoods
  3. Ask for recommendations
  4. Ask something obscure (test how it handles uncertainty)
Test Error Handling:
  1. Try clicking the button without speaking
  2. Try speaking too quietly
  3. Try asking in a noisy environment

Questions to Test With

For Brooklyn Guide:

  • "What are fun things to do today?"
  • "Best pizza in Brooklyn?"
  • "Tell me about Williamsburg"
  • "Any events this weekend?"
  • "Where should I go shopping?"

For Your Custom Agent:

Create 5-10 test questions that real users would ask.

Customizing for ANY Topic

Step-by-Step Customization Process

Step 1: Choose Your Topic

  • Location-based: [City] restaurants, [City] real estate, [City] attractions
  • Interest-based: Fitness, music venues, coffee shops, hiking trails
  • Service-based: Home services, event planning, travel tips

Step 2: Find 5-10 Websites with Good Information

What makes a good website:
  • Has current, updated information
  • Lots of text content (not just images)
  • Loads quickly
  • Covers your topic well
  • Allows scraping (check robots.txt)
Where to find them:
  • Official tourism sites
  • Local government sites
  • Review sites (Yelp, TripAdvisor), though these often block scrapers, so check robots.txt or use their official APIs
  • Event calendars
  • Local blogs and magazines

Step 3: Map Keywords to URLs

Think about what people will ask, then decide which websites have those answers.

Example for Food Guide:

  • User says "restaurant" or "dining" → Use Yelp and local food blogs
  • User says "events" or "festivals" → Use event calendar sites
  • User says "cheap eats" → Use budget dining guides
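The keyword mapping can be sketched as a lookup table plus a function that picks URLs by matching words in the question. All URLs below are placeholders on `example.com`, and `KEYWORD_MAP`, `pickUrls`, and the cap of 3 URLs are assumptions; replace them with your own sources. Matching is plain substring search, so "event" also catches "events".

```typescript
// Hypothetical keyword map for a food guide; replace the URLs with your own sources.
const KEYWORD_MAP: { keywords: string[]; urls: string[] }[] = [
  { keywords: ["restaurant", "dining", "food"], urls: ["https://example.com/food-blog"] },
  { keywords: ["event", "festival"], urls: ["https://example.com/events"] },
  { keywords: ["cheap eats", "budget"], urls: ["https://example.com/cheap-eats"] },
];
const DEFAULT_URLS = ["https://example.com/general-guide"];

// Picks URLs whose keywords appear in the question, capped at maxUrls to keep scraping fast.
function pickUrls(question: string, map = KEYWORD_MAP, fallback = DEFAULT_URLS, maxUrls = 3): string[] {
  const q = question.toLowerCase();
  const urls = new Set<string>(); // a Set avoids scraping the same page twice
  for (const { keywords, urls: sources } of map) {
    if (keywords.some((k) => q.includes(k))) sources.forEach((u) => urls.add(u));
  }
  const picked = Array.from(urls).slice(0, maxUrls);
  return picked.length > 0 ? picked : fallback.slice(0, maxUrls);
}
```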

Step 4: Write Your System Prompt

Use the template from earlier and fill in:

  1. The role (guide, advisor, expert, coach)
  2. The tone (friendly, professional, energetic)
  3. Response length (2-3 sentences usually best)
  4. What to focus on (price, experience, quality, convenience)
  5. How to handle unknowns

Step 5: Choose Your Voice

Voice Personalities:

Voice     Personality              Best For
nova      Friendly and energetic   Travel, lifestyle
alloy     Neutral and balanced     Good for anything
echo      Clear and direct         Instructions
fable     Warm and expressive      Stories
onyx      Deep and authoritative   Professional topics
shimmer   Soft and calming         Wellness

Step 6: Update Your Frontend Text

Change these in your HTML:

  • The title and header text
  • Colors and styling
  • Your branding and social media links

Step 7: Test Everything

Run through the testing checklist above!

Quick Customization Worksheet

My Voice Agent Topic:

_______________________________________

My Target Audience:

_______________________________________

5-10 Websites I'll Scrape:
  1. _______________________________________
  2. _______________________________________
  3. _______________________________________
  4. _______________________________________
  5. _______________________________________
Keywords People Will Use:
  • _____________ → URLs: _____________
  • _____________ → URLs: _____________
  • _____________ → URLs: _____________
My Agent's Personality (3 words):

_____________, _____________, _____________

Response Length:

_____ sentences

TTS Voice Choice:

_______________________________________

Main Focus/Priority:

_______________________________________

Real Customization Examples

Example 1: San Francisco Coffee Shop Guide

Topic: SF Coffee Shops

Audience: Coffee lovers and remote workers

Websites:

  • https://www.sfgate.com/food/article/best-coffee-shops-san-francisco
  • https://sf.eater.com/maps/best-coffee-shops-san-francisco
  • https://www.timeout.com/san-francisco/restaurants/best-coffee-shops-in-san-francisco

Voice: nova (friendly and energetic)

Example 2: Denver Hiking Guide

Topic: Denver Area Hiking Trails

Audience: Outdoor enthusiasts and tourists

Websites:

  • https://www.alltrails.com/us/colorado/denver
  • https://www.denver.org/things-to-do/sports-recreation/hiking/
  • https://www.uncovercolorado.com/best-hikes-near-denver/

Voice: fable (warm and expressive)

What to Do Next

Launch Checklist

Before Public Launch:
  1. Test on 3+ different devices
  2. Test with 10+ different questions
  3. Ask friends to test it
  4. Check your API key is working
  5. Verify environment variables in Vercel
  6. Make sure your branding is correct
After Launch:
  1. Monitor your API usage (check costs)
  2. Read the Vercel logs for errors
  3. Get user feedback
  4. Iterate on your system prompt based on responses
Share Your Work:
  1. Post on Instagram/TikTok: @dr.erinjacques
  2. Share your Vercel URL
  3. Tag Dr. Erin Jacques!

Monitoring Costs

OpenAI Usage:

  • Check usage: https://platform.openai.com/usage
  • Whisper: ~$0.006 per minute of audio
  • GPT-4 Turbo: ~$0.01 per 1000 input tokens, ~$0.03 per 1000 output tokens (the original GPT-4 model costs more)
  • TTS: ~$0.015 per 1000 characters

Estimated Cost Per Interaction:

  • ~$0.02-0.04 per user interaction
  • 100 interactions per day = ~$2-4/day
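The per-interaction estimate comes from adding up the four charges. This sketch uses the prices listed above; the interaction sizes in the usage example (10 seconds of audio, 2000 input tokens including scraped text, 100 output tokens, a 300-character reply) are assumptions, and prices change, so check OpenAI's pricing page.

```typescript
// USD prices from the list above.
const PRICES = {
  whisperPerMinute: 0.006,
  inputPer1k: 0.01,
  outputPer1k: 0.03,
  ttsPer1kChars: 0.015,
};

interface Interaction {
  audioSeconds: number; // length of the user's spoken question
  inputTokens: number; // system prompt + scraped text + question
  outputTokens: number; // GPT's reply
  replyChars: number; // characters sent to TTS
}

function estimateCost(i: Interaction, p = PRICES): number {
  return (
    (i.audioSeconds / 60) * p.whisperPerMinute +
    (i.inputTokens / 1000) * p.inputPer1k +
    (i.outputTokens / 1000) * p.outputPer1k +
    (i.replyChars / 1000) * p.ttsPer1kChars
  );
}
```

For `{ audioSeconds: 10, inputTokens: 2000, outputTokens: 100, replyChars: 300 }` this gives $0.0285, or about $2.85 for 100 interactions. The scraped text in `inputTokens` is the largest share, which is why limiting scraping to 2-3 URLs saves the most.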

Tips to Reduce Costs:

  1. Keep responses short
  2. Limit scraping to 2-3 URLs
  3. Use GPT-3.5 instead of GPT-4 for cheaper option
  4. Cache scraped data for 1 hour
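Tip 4 can be sketched as a small in-memory cache with a one-hour time-to-live. `TtlCache` and `cachedScrape` are hypothetical names. One caveat: on Vercel each function instance has its own memory and instances can be recycled at any time, so this reduces repeat scrapes but does not guarantee only one scrape per hour; a shared store would be needed for that.

```typescript
// Minimal cache whose entries expire after ttlMs; `now` is injectable for testing.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expires) {
      this.store.delete(key); // stale: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expires: this.now() + this.ttlMs });
  }
}

const ONE_HOUR = 60 * 60 * 1000;
const pageCache = new TtlCache<string>(ONE_HOUR);

// Returns cached page text when fresh; otherwise calls scrape() and stores the result.
async function cachedScrape(
  url: string,
  scrape: (url: string) => Promise<string>,
  cache: TtlCache<string> = pageCache,
): Promise<string> {
  const cached = cache.get(url);
  if (cached !== undefined) return cached;
  const text = await scrape(url);
  cache.set(url, text);
  return text;
}
```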

Getting Help

Official Documentation:

  • OpenAI API docs: https://platform.openai.com/docs
  • Vercel docs: https://vercel.com/docs

Common Issues:

  • Re-read the troubleshooting section above
  • Check Vercel deployment logs
  • Verify API key is correct
  • Test each component separately

Connect with Dr. Erin Jacques:

  • Instagram & TikTok: @dr.erinjacques
  • LinkedIn: linkedin.com/in/drerinjacques

Final Tips for Success

  1. Start Simple: Get the basic version working before adding complex features
  2. Test Constantly: After every change, test your agent
  3. Keep Responses Short: 2-3 sentences work best for voice
  4. Pick the Right Voice: The voice makes a huge difference in user experience
  5. Monitor Your Costs: Check your API usage weekly
  6. Update Your Content: Change your scraped URLs if information gets stale
  7. Listen to Users: Their questions will help you improve your system prompt
  8. Be Patient: Getting the system prompt perfect takes iteration
  9. Share Your Work: Tag @dr.erinjacques when you launch!
  10. Have Fun: This is YOUR voice agent - make it unique!