A mid-sized city museum we spoke with last quarter got three quotes from AI audio guide vendors. The numbers were 4,800 per year, 22,000 per year, and "let's talk" (which turned into 48,000 once the contract draft arrived). Same museum. Same 60 stops. Same three languages.
The director asked us what was going on. The short answer: each vendor was pricing a different product shape, even though they were all selling "AI audio guides." One was a per-interaction model that billed on actual usage. One was a flat annual SaaS with a tier based on visitor count. One was a legacy audio-guide company that bolted AI onto the top of their old content-production workflow and kept charging like they were still recording voice actors in a studio.
So before anyone can tell you what an AI audio guide costs, you have to know which of those three things you're buying.
The three pricing shapes
AI audio guide pricing collapses to three structures. Names vary by vendor. The mechanics don't.
Per-interaction (usage-based). You pay a small amount each time a visitor starts a session or triggers AI generation. Typical range: 0.08-0.45 per session. No upfront fee. No minimum. If nobody uses the guide one month, you pay nothing that month. If adoption spikes during a summer blockbuster, your bill goes up, but only because more visitors are actually using the guide.
Revenue share on paid guides. Visitors pay a small ticket (typically 2-6 euros) through the vendor's platform. The vendor takes a cut — usually 20-40% — and the museum keeps the rest. Zero cost to the museum if nobody buys. The vendor earns only when visitors find the guide valuable enough to pay for it.
Flat monthly SaaS. A predictable subscription. Small museums start around 200-500/month. Mid-sized sites land at 800-1,800/month. Larger institutions hit 2,500-6,000/month at the enterprise tier, sometimes with extra fees for languages or analytics. The bill is the same whether fifty visitors use the guide or fifty thousand.
Each of these can be "AI-powered" on the product page. The economics underneath are completely different.
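As a rough sketch, the three shapes reduce to three small formulas. The function names and sample inputs below are illustrative assumptions, not any vendor's actual billing logic; real contracts layer tiers, minimums, and fees on top.

```python
def per_interaction_cost(sessions: int, rate_per_session: float) -> float:
    """Usage-based: the museum pays per session, so zero sessions costs zero."""
    return sessions * rate_per_session


def revenue_share_net(buyers: int, ticket_price: float, vendor_cut: float) -> float:
    """Revenue share: the museum keeps ticket revenue minus the vendor's cut."""
    return buyers * ticket_price * (1.0 - vendor_cut)


def flat_saas_cost(monthly_fee: float, months: int = 12) -> float:
    """Flat SaaS: the fee is the same no matter how many visitors use the guide."""
    return monthly_fee * months


if __name__ == "__main__":
    # Hypothetical inputs picked from inside the ranges quoted above.
    print(per_interaction_cost(sessions=10_000, rate_per_session=0.25))
    print(revenue_share_net(buyers=2_000, ticket_price=4.0, vendor_cut=0.25))
    print(flat_saas_cost(monthly_fee=1_200))
```

Only the flat function ignores usage entirely, which is the whole difference between the shapes.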
What the numbers actually look like
Here's what we see most often, broken down by museum size. These are real ranges from vendor quotes and from what Musa customers in each bucket pay today.
Small museum (under 50,000 annual visitors). Per-interaction: 600-3,500/year depending on adoption. Revenue share: 0-2,000/year to the museum after the split, often net positive. Flat SaaS: 2,400-6,000/year, plus whatever you spent on content production if the vendor required it. At this size, the flat SaaS often costs more than you'd ever recoup in value. Usage-based almost always wins.
Mid-sized museum (50,000-300,000 annual visitors). Per-interaction: 3,000-18,000/year. Revenue share: wildly variable. Net figures of 18,000-40,000 to the museum are common, and the result is very sensitive to adoption: a museum charging 3 euros with 15% adoption and a 30% vendor cut nets about 47,000 from a visitor pool of 150,000 (150,000 × 0.15 × 3 × 0.70 = 47,250). Flat SaaS: 10,000-22,000/year. This is the band where vendors fight hardest, and it's where the pricing model matters most. A mid-sized museum that misjudges adoption on a flat plan can overpay by 2-3x.
Large museum (300,000+ annual visitors). Per-interaction: 15,000-80,000/year. Revenue share: can run into six figures net to the museum at major sites. Flat SaaS: 25,000-90,000/year. At this scale the flat plan starts to make mathematical sense if your adoption is reliable, because your per-session cost drops below what per-interaction would bill. But you're buying cost predictability at the expense of the alignment between what you pay and what visitors actually use.
The pattern is consistent. At low volumes, flat-fee pricing punishes you. At high volumes with proven adoption, it starts to pay off. In the middle (which is most museums), usage-based pricing keeps the risk on the vendor's side of the table, which is where it belongs when nobody knows yet whether the guide will land.
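One way to see where the crossover sits is to compute the session count at which a flat fee and a per-session rate cost the same. This is a simplified sketch: it assumes a single per-session rate with no volume discounts, and the function names and figures are hypothetical, chosen from inside the ranges above.

```python
def break_even_sessions(flat_annual_fee: float, rate_per_session: float) -> float:
    """Sessions per year at which flat and per-interaction pricing cost the same."""
    if rate_per_session <= 0:
        raise ValueError("rate_per_session must be positive")
    return flat_annual_fee / rate_per_session


def flat_overpay_factor(flat_annual_fee: float, rate_per_session: float,
                        sessions: int) -> float:
    """How many times more the flat plan costs than usage-based at this volume."""
    usage_cost = sessions * rate_per_session
    if usage_cost == 0:
        return float("inf")  # any flat fee is infinitely worse than paying nothing
    return flat_annual_fee / usage_cost


if __name__ == "__main__":
    print(break_even_sessions(15_000, 0.25))
    print(flat_overpay_factor(15_000, 0.25, sessions=24_000))
```

With a 15,000 flat fee and a 0.25 rate, break-even is 60,000 sessions a year. A museum that actually sees 24,000 sessions is paying 2.5 times what usage-based billing would have cost.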
When each model actually fits
Flat SaaS is the right choice in a narrow set of cases. You have predictable, high volume. You've run a guide before and know your adoption rate within a few percentage points. Your finance team won't approve a variable line item, full stop. If all three apply, a flat plan is fine. Otherwise it's a bet the vendor has designed to win.
Per-interaction pricing fits everyone else. It scales with your reality. A slow February costs you less. A busy summer costs more, but it also means the guide is doing its job. If a new exhibition flops with visitors, you don't keep paying for a guide nobody opened. The cost curve tracks the value curve.
Revenue share is the right call when you want the vendor to have actual skin in the game. If they earn only when visitors buy the guide, they will help you with onboarding, signage, staff scripts, and anything else that drives adoption. A vendor on a flat plan has already been paid. A vendor on revenue share is still selling, every day, alongside you. That's a different working relationship.
We lean hard toward usage-based or revenue-share for most customers for exactly this reason. The incentive alignment is worth more than the paper savings of a flat deal that looked cheaper on the spreadsheet.
The hidden costs vendors don't lead with
The headline number on the quote is rarely the number you pay. Here's what gets buried.
Setup and onboarding fees. Some vendors charge 5,000-25,000 to "set up" the guide. When you ask what that covers, it's often content ingestion (your catalog is uploaded into their system) and a kickoff workshop. With an AI-native platform, this work is automated or takes an hour. The fee is a holdover from when human producers wrote scripts for each stop. Push back on it. If the vendor can't itemize the hours, it's padding.
Content migration. If you have an existing audio guide — old recordings, transcripts, wall text in a CMS — someone has to move that content into the new system. Some vendors do this free. Some charge by the stop (30-150 per stop is common). For a 60-stop museum, that's up to 9,000 sitting in the "services" line of the contract you haven't read yet.
Language fees. This one still surprises people. A vendor quotes you 1,200/month, then mentions that price includes three languages. Each additional language is 200-400/month extra. For a museum that wants ten languages for international visitors, you've more than doubled your bill before anyone pressed play. AI-generated translation doesn't need separate production runs per language, so this fee is legacy pricing logic applied to a new product.
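Worked through with the figures above, the language add-on looks like this. The function and parameter names are illustrative, not a vendor's billing API.

```python
def monthly_bill(base_fee: int, included_languages: int,
                 wanted_languages: int, fee_per_extra_language: int) -> int:
    """Monthly bill once languages beyond the included ones are added."""
    extra_languages = max(0, wanted_languages - included_languages)
    return base_fee + extra_languages * fee_per_extra_language


if __name__ == "__main__":
    # 1,200/month quote with three languages included; the museum wants ten.
    for fee in (200, 400):
        total = monthly_bill(1_200, 3, 10, fee)
        print(f"{fee}/language -> {total}/month ({total / 1_200:.2f}x the quote)")
```

Seven extra languages push the 1,200 quote to between 2,600 and 4,000 a month.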
Integrations and SSO. Ticketing integration, CRM sync, analytics export, SSO for staff accounts. Each one shows up as a one-time fee (1,500-8,000) or a monthly add-on. If you need the guide to talk to your ticketing system for admission bundling, budget for it explicitly before signing.
Device or kiosk hardware. AI audio guides are BYOD almost by definition, but some vendors still push loaner tablets or charging stations for accessibility compliance. If that's a hard requirement, fine — but it should be optional and clearly priced. We've seen 12,000 hardware lines in contracts for museums that never asked for hardware.
Minimum commitments and auto-renewal. Read the term carefully. A flat SaaS plan with a 36-month minimum and auto-renewal isn't a subscription. It's a capital expenditure disguised as an operating one. If the guide underperforms, you're locked in anyway. Usage-based contracts rarely have this problem because the vendor's incentive is to keep you using, not to trap you.
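To compare quotes on total commitment instead of the headline number, it can help to add everything up over the minimum term. The sketch below is a hypothetical model: the field names are made up for illustration, and the sample figures are drawn from the ranges in this section, not from any real contract.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    monthly_fee: float                    # headline subscription price
    setup_fee: float = 0.0                # one-time onboarding charge
    migration_per_stop: float = 0.0       # content migration, billed per stop
    stops: int = 0
    extra_language_monthly: float = 0.0   # total monthly add-on for extra languages
    integration_fee: float = 0.0          # one-time integration / SSO charge
    hardware: float = 0.0                 # one-time device or kiosk line
    minimum_term_months: int = 12

    def one_time_costs(self) -> float:
        return (self.setup_fee + self.migration_per_stop * self.stops
                + self.integration_fee + self.hardware)

    def committed_total(self) -> float:
        """Everything owed over the minimum term, whether or not the guide lands."""
        recurring = (self.monthly_fee + self.extra_language_monthly) * self.minimum_term_months
        return self.one_time_costs() + recurring


if __name__ == "__main__":
    quote = Quote(monthly_fee=1_200, setup_fee=5_000, migration_per_stop=150, stops=60,
                  extra_language_monthly=1_400, integration_fee=1_500,
                  minimum_term_months=36)
    print("headline:", quote.monthly_fee * quote.minimum_term_months)
    print("committed:", quote.committed_total())
```

For this hypothetical quote, the headline 1,200/month works out to 43,200 over 36 months, while the committed total comes to 109,100.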
Why the zero-capex case keeps winning
Strip away the pricing-page poetry and there's a structural argument underneath all of this.
The old audio guide business required a huge upfront investment: scripts, studios, voice actors, devices. Vendors charged upfront because they spent upfront. That logic is gone. AI generation happens at runtime, per visitor, at compute cost that's measured in fractions of a cent. The marginal cost of one more session is close to zero. The marginal cost of one more language is close to zero. The marginal cost of updating the whole tour because you rehung a gallery is close to zero.
When the underlying costs look like that, pricing should too. Usage-based or revenue-share reflects the actual shape of what's being delivered. A flat fee plus setup fee plus per-language fee is a pricing structure designed for a product that no longer exists.
The museums that benefit most from AI audio guides are the ones that couldn't afford the old model: small sites, regional collections, heritage locations, community museums. These are the institutions that most need the guide to cost zero until visitors actually use it. That's not a nice-to-have. It's the difference between launching and not launching at all.
We've watched boards approve 800/month usage-based deals that would never have cleared approval as 25,000 capital projects. The spend is the same order of magnitude over five years. The political and financial reality of getting it started is completely different.
What to ask every vendor before signing
When you're comparing AI audio guide quotes, don't compare the headline number. Compare the shape.
- What's the monthly cost at zero usage? If it's not zero, you're on a flat plan regardless of what the vendor calls it.
- What's the cost per additional language? If there is one, walk.
- What's the setup fee, and what specifically does it cover in hours?
- What's the minimum contract length? What happens if we cancel at month three?
- If we add a temporary exhibition with 15 new stops, what does that cost?
- What's the cost curve if our adoption doubles? Triples? Halves?
- Are analytics, accessibility features, and staff accounts included, or priced separately?
The answers will sort the vendors faster than any feature comparison. A vendor who gives straight, itemized answers is one you can build a long-term relationship with. A vendor who deflects into "let's schedule a call to discuss your needs" is one whose pricing model probably can't survive direct scrutiny.
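The cost-curve question from the list above is easy to model yourself before the vendor call. A minimal sketch, assuming a single per-session rate and a single flat fee (both hypothetical), that tabulates the annual cost of each shape when adoption halves, holds, doubles, or triples:

```python
def cost_curve(visitors: int, adoption: float, rate_per_session: float,
               flat_annual_fee: float, multipliers=(0.5, 1, 2, 3)) -> list[dict]:
    """Annual cost of usage-based vs. flat pricing across adoption scenarios."""
    rows = []
    for m in multipliers:
        sessions = round(visitors * adoption * m)
        rows.append({
            "adoption": adoption * m,
            "sessions": sessions,
            "per_interaction": sessions * rate_per_session,
            "flat": flat_annual_fee,  # unchanged by usage, by definition
        })
    return rows


if __name__ == "__main__":
    # Hypothetical mid-sized museum: 150,000 visitors, 10% baseline adoption.
    for row in cost_curve(150_000, 0.10, 0.25, 15_000):
        print(f"{row['adoption']:.0%} adoption: {row['sessions']} sessions, "
              f"usage-based {row['per_interaction']:.0f}, flat {row['flat']}")
```

Asking each vendor to fill in the same table with their own rates makes the shapes directly comparable.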
Where we land
If you're running procurement for an AI audio guide right now, our honest advice is to start with a usage-based or revenue-share model unless you have a specific reason not to. Put the vendor on the hook for adoption. Keep your downside small. Measure what happens for three to six months. If the guide lands, you can consider moving to a flat plan for cost predictability once your usage is known. If it doesn't land, you walk away having spent a few thousand, not a few hundred thousand.
This is how we price Musa — per-interaction or revenue share, no setup fee, no per-language fee, no minimum term — because it's the model that actually reflects what AI-generated content costs to deliver. Other vendors price this way too. Our argument isn't "pick us." It's "pick the shape that aligns your vendor's incentives with your visitors' experience," and most of the time that rules out the flat-SaaS-with-hidden-fees quote before you even get to feature comparison.
For broader context on how audio guide pricing works across all formats, see our piece on audio guide pricing models. If you want the full multi-year cost picture including hardware and BYOD alternatives, the total cost of ownership breakdown has the five-year math. And if you're specifically weighing revenue-share deals, we wrote about how to structure them in audio guide revenue share models.
The vendor quote on your desk right now probably isn't wrong. It's just one of several valid shapes. Make sure you're choosing the shape on purpose.