Most audio guide RFPs ask the wrong questions. They focus on technical specs and feature checklists (number of languages supported, file formats, device compatibility) and miss the things that actually determine whether the system works for your museum.
We've seen the results. Museums locked into five-year contracts with vendors whose product looked fine in the demo but fell apart at scale. Institutions that chose the cheapest option and spent more on workarounds than they saved on the contract. Collections stuck with guides nobody uses because the procurement process optimized for the wrong criteria.
What follows is a practical framework for writing an RFP that surfaces what matters: the questions that separate strong vendors from weak ones, and the red flags that tell you to walk away.
Start with what you actually need
Before writing a single requirement, get clear on your situation. This sounds obvious, but most RFPs skip it and jump straight to feature lists.
What problem are you solving? Replacing aging hardware? Adding multilingual support? Launching a guide for the first time? Improving adoption rates on an existing guide? Each of these leads to a different set of priorities. A museum replacing a ten-year-old hardware fleet cares about things a first-time buyer doesn't.
Who are your visitors? International tourists need multilingual support. School groups need age-appropriate content. Accessibility requirements shape the entire evaluation. Be specific about your visitor demographics. That's what vendors need to propose something useful rather than generic.
What does your content look like? Do you have existing scripts and recordings? Wall texts and catalog entries? Scholarly research that hasn't been adapted for visitors? The answer determines how much content creation work is involved and what kind of system can handle it.
Write these answers into the RFP. They're more useful to vendors than a twenty-page technical specification, and they'll produce more relevant proposals.
The questions that actually matter
These are the areas where vendor answers diverge the most, and where the differences have real consequences.
How does content get created and updated? This is the single most important question in the entire RFP. The gap between vendors here is enormous.
With traditional systems, content updates mean new scripts, new recording sessions, new translations. Changing a single stop can take weeks and cost thousands. Some vendors make this process deliberately painful because updates are a revenue stream for them.
AI-powered systems handle this differently, but there's a wide range. Some use AI to generate a first draft that humans then manually polish, which looks efficient until you realize you're still doing most of the work. Others generate content in real time from your source materials, which means updates propagate the moment you change the underlying data.
Ask vendors to walk you through the exact process for three scenarios: adding a new temporary exhibition, correcting a factual error in an existing stop, and updating the interpretive angle on a permanent collection piece. The specificity of their answer tells you everything.
Does the system preserve your curatorial voice, and how? Every vendor will say yes. Most are lying, or at least overstating.
Try this: ask them to demonstrate their system speaking about one of your objects in two different tonal registers. A scholarly tone for an adult evening event. A playful tone for a family day. If the system can only do one, or if both sound like Wikipedia, the "curatorial voice" claim is marketing.
The real architecture for voice preservation is multi-layered. You need control at the institutional level (overall persona and tone), the tour level (narrative arc and pacing), and the individual stop level (specific interpretive choices). If a vendor's idea of "voice control" is picking from five preset narrator styles, that's not it.
Some vendors in the AI space have a specific problem worth probing: they ship AI-generated content that reads well enough on a first pass but sounds generic at scale. The output passes a casual review, so museums approve it. But visitors can tell. The content has no personality, no point of view, no sense that a human with opinions about this collection shaped what they're hearing.
How does the system handle visitor questions beyond the script? This question only applies to AI-powered systems, but it's where the differences are sharpest.
Some systems can't handle off-script questions at all; the visitor gets the prepared narration and nothing else. Some handle questions by letting a general-purpose AI respond, which means the answer might be accurate, might be wrong, and definitely won't sound like your museum. The best systems ground every response in your data and stay in your voice even when answering questions nobody anticipated.
Ask specifically: if a visitor asks about something the museum hasn't provided information on, what happens? The answer should be clear constraints and honest boundaries, not "our AI knows everything."
What languages are supported, and at what quality? "We support 30 languages" is meaningless without quality context. Machine-translated narration in Mandarin that sounds like a phrase book is not the same as native-quality Mandarin speech with natural phrasing.
Ask for demos in languages you can evaluate, not just the big European ones. If your visitors include speakers of smaller languages, test those specifically. The difference between good and bad multilingual support is the difference between serving your international visitors and insulting them.
Also ask about the cost model for languages. Traditional systems charge per language because each one requires separate production. AI systems that generate in real time often include all languages at no extra cost. This changes the math on multilingual support entirely.
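To see how the math shifts, here's a rough sketch of a five-year comparison. Every number in it is an assumption standing in for a real quote, so treat it as a template to plug your own proposals into, not as pricing guidance.

```python
# Hypothetical illustration only: the figures below are assumptions standing in
# for real vendor quotes, not actual market prices.

PER_LANGUAGE_PRODUCTION = 8_000   # assumed one-time scripting, recording, translation
PER_LANGUAGE_UPDATES = 1_500      # assumed annual re-recording for content changes
FLAT_ANNUAL_FEE = 20_000          # assumed subscription that includes every language


def traditional_cost(languages: int, years: int) -> int:
    """Per-language production up front, plus per-language update costs each year."""
    return languages * (PER_LANGUAGE_PRODUCTION + PER_LANGUAGE_UPDATES * years)


def realtime_cost(languages: int, years: int) -> int:
    """Flat subscription; adding a language adds no cost."""
    return FLAT_ANNUAL_FEE * years


for n in (3, 8, 15):
    print(f"{n} languages over 5 years: "
          f"per-language model ≈ {traditional_cost(n, 5):,}, "
          f"all-languages-included model ≈ {realtime_cost(n, 5):,}")
```

Run it with your own figures before assuming either model is cheaper; the crossover point depends heavily on how many languages you actually need and how often content changes.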
What analytics do you provide? Traditional audio guides give you device checkout numbers. Maybe completion rates. That's about it.
AI-powered guides that handle real-time conversation produce a fundamentally different kind of data. What questions do visitors ask at each stop? Which topics generate the most engagement? Where do people lose interest? What are visitors curious about that your current content doesn't address? Which languages are most used? Where do visitors diverge from the suggested path?
This data is valuable beyond the guide itself. It tells you what your visitors care about, which can inform exhibition planning, marketing, and content development. If a vendor's analytics offering is just a dashboard with page views and session counts, they're leaving the most useful data on the table.
What does accessibility look like? Screen reader support, high contrast modes, real-time transcripts for deaf and hard-of-hearing visitors, audio descriptions for blind and low-vision visitors. These aren't nice-to-haves. Many institutions require them, and even those that don't should want them.
Ask whether accessibility features are built in or bolted on. A system designed with accessibility from the start handles it differently than one that added a screen reader mode after a compliance complaint.
How does integration work with existing systems? Ticketing, CMS, website, membership databases. You probably have a technology stack already, and the audio guide needs to fit into it. Ask for specific examples of integrations with systems similar to yours, and ask how long they took.
Red flags in vendor responses
Some patterns in RFP responses should make you pause. Not necessarily walk away, but probe harder.
High upfront costs with long contracts. A vendor that wants a large payment before you've seen the system work with your content, locked into a multi-year contract, is shifting all the risk onto you. This model made more sense when audio guides required expensive hardware and production. It makes less sense now, and vendors that still insist on it are usually protecting themselves, not you.
No pilot option. Any vendor confident in their product will let you test it with a portion of your collection before committing. A pilot costs the vendor some time and resources, but it also demonstrates that their system actually works. Vendors that refuse pilots, or require you to sign a full contract before piloting, are telling you something about how they expect the evaluation to go.
Vague answers about content updates. If the process for updating a single stop isn't described in concrete, step-by-step terms, assume it's painful. "Our team works with you to update content" usually means "you submit a request and wait three weeks."
No analytics, or analytics limited to basic counts. If a vendor in 2026 can only tell you how many people used the guide and for how long, they're a decade behind on what's possible. Especially if they're offering an AI-powered system. The conversational data is sitting right there, and choosing not to surface it is a deliberate product gap.
Manual-only content updates. You should be able to update your own content through a CMS, not by emailing the vendor and waiting. This is non-negotiable for any museum that runs temporary exhibitions or makes regular interpretive changes.
Claims of AI with no explanation of guardrails. A vendor that proudly describes their AI capabilities but gets vague when you ask about hallucination prevention, content grounding, or off-topic responses is selling a feature without building the infrastructure to make it reliable.
What good responses look like
The strongest vendor responses share certain qualities, regardless of whether the system is traditional or AI-powered.
Specificity. When you ask about multilingual support, a good response names the languages, describes the quality assurance process, and offers to demo specific ones. A weak response says "we support many languages" and moves on.
Honesty about limitations. Every system has tradeoffs. A vendor that describes theirs ("our voice synthesis is strong in Romance languages but still improving in tonal languages") is more trustworthy than one that claims perfection across the board. You're going to discover the limitations eventually. Better to find out during evaluation than after signing.
Clear pricing with scenarios. A good response includes pricing for your actual situation: your collection size, your visitor volume, your language needs. It models what happens if usage doubles during a blockbuster show. It explains what's included and what costs extra. A weak response gives you a per-unit price and leaves you to figure out the total.
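If you want to sanity-check a proposal's scenario modeling yourself, a few lines of arithmetic go a long way. The sketch below assumes a flat-fee-plus-per-visitor pricing structure with placeholder numbers; swap in the actual terms from the proposal you're evaluating.

```python
# Placeholder numbers only; substitute the pricing terms from the actual proposal.

def annual_cost(platform_fee: float, per_visitor_fee: float, visitors: int) -> float:
    """A simple model: a flat platform fee plus a usage-based component."""
    return platform_fee + per_visitor_fee * visitors


normal_year = 120_000       # assumed annual guide usage
blockbuster_year = 240_000  # usage doubles during a blockbuster show

for label, visitors in [("normal year", normal_year), ("blockbuster year", blockbuster_year)]:
    print(f"{label}: ≈ {annual_cost(15_000, 0.40, visitors):,.0f}")
```

A proposal that can't be reduced to something this simple, with its thresholds and overage charges spelled out, hasn't really given you pricing.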
References you can actually call. Not just a list of logos. Specific contacts at specific institutions who used the system in conditions similar to yours. If a vendor has fifty museum clients but can't connect you with one that matches your size and type, ask why.
Building your evaluation framework
Resist the urge to score vendors on a twenty-column spreadsheet with equal weighting. Not all criteria matter equally for your museum.
Pick three to five dimensions that matter most for your specific situation and weight them heavily. For most museums in 2026, we'd suggest these carry the most weight:
- Content workflow — how easy is it to create, update, and manage content day-to-day?
- Visitor experience quality — does the guide produce an experience visitors actually enjoy and complete?
- Total cost of ownership — not just year one, but the realistic cost over three to five years including content updates, language additions, and staff time
- Analytics value — what do you learn about your visitors that you couldn't learn before?
Other criteria (integration, accessibility, scalability, vendor stability) matter, but they're more binary: either a vendor meets your requirements or it doesn't. The dimensions above are where the real differentiation shows up.
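If it helps to make the weighting concrete, here is a minimal sketch of how such a comparison might be tallied. The weights, scores, and pass/fail gate are illustrative assumptions, not recommendations.

```python
# Illustrative weights and scores only; adjust both to your own priorities.

WEIGHTS = {
    "content_workflow": 0.30,
    "visitor_experience": 0.30,
    "total_cost_of_ownership": 0.25,
    "analytics_value": 0.15,
}


def weighted_score(scores: dict[str, float], meets_binary_requirements: bool) -> float:
    """Scores run 1-5 per dimension; binary criteria gate the vendor rather than add points."""
    if not meets_binary_requirements:  # e.g. fails an accessibility or integration requirement
        return 0.0
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


vendor_a = {"content_workflow": 4, "visitor_experience": 5,
            "total_cost_of_ownership": 3, "analytics_value": 5}
vendor_b = {"content_workflow": 5, "visitor_experience": 3,
            "total_cost_of_ownership": 4, "analytics_value": 2}

print("Vendor A:", weighted_score(vendor_a, meets_binary_requirements=True))
print("Vendor B:", weighted_score(vendor_b, meets_binary_requirements=True))
```

The design choice worth keeping, whatever numbers you use: binary criteria disqualify a vendor, they don't earn points, so they never dilute the dimensions that actually differentiate proposals.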
Have at least two people from different departments evaluate the proposals independently before comparing notes. A curator and an operations director will weight things differently, and both perspectives matter. The curator catches content quality issues the operations director misses. The operations director catches workflow problems the curator doesn't think about.
The pilot: your best evaluation tool
An RFP on paper only tells you so much. The pilot tells you the rest.
A good pilot covers a representative section of your collection. Enough stops to test the full experience, not so many that setup takes months. Two weeks of preparation and four weeks of visitor testing is a reasonable timeline for most AI-powered systems. Traditional systems may need longer for content production.
During the pilot, measure the things you care about: visitor completion rates, qualitative feedback from staff and visitors, time spent with the guide, questions asked, languages used. Compare these against your current guide if you have one, or against visitor behavior without a guide if you don't.
Pay attention to what happens behind the scenes. How responsive is the vendor when you need a content change? How quickly does their system reflect updates? Is their CMS something your team can actually use, or does every change require a support ticket?
The pilot is also when you evaluate the relationship. Technology vendors are partners for years. How they handle a pilot (communication speed, willingness to adapt, honesty about what's working and what isn't) predicts how they'll behave during a full deployment.
Any vendor worth working with should offer a pilot. The structure and cost vary, but the willingness shouldn't. A vendor that insists on a full contract before you've seen the system work with your content and your visitors is asking you to take a bet they aren't willing to take themselves.
A note on the AI question
Your RFP will probably need to address whether you want a traditional or AI-powered system. Rather than prescribing the answer, ask vendors from both categories to respond, and let the proposals speak for themselves.
In our experience, AI-powered systems outperform traditional ones on multilingual delivery, content flexibility, analytics depth, and total cost of ownership. Traditional systems still hold an edge in prestige single-language narrations: a celebrity voice reading a carefully crafted script is a specific product that AI doesn't replicate yet. That gap is shrinking, but it's real today.
The strongest approach is to evaluate both categories against the same criteria and see which delivers more value for your specific situation. The RFP framework above works for either. Good questions produce revealing answers regardless of the underlying technology.
If you're writing an RFP and want a second opinion on what to include, or want to see how an AI-powered system handles your specific collection, reach out. We'd rather you write a better RFP than a worse one, even if we're not the vendor you choose.