"Supports 40+ languages" is the line every AI audio guide vendor uses. What that claim doesn't tell you is that those 40 languages are not all the same. Some sound like a native speaker. Some sound like a native speaker who's reading from a bad translation. And a few sound like exactly what they are: an AI that saw this language maybe ten thousand times in training and is doing its best.
If you're evaluating AI audio guides for a museum, the number of supported languages is a vanity metric. The real question is: how do your visitors' actual languages sound, and what do you do about the gaps?
The tier system nobody publishes
Every AI model trained on language data produces output that tracks with that data's availability. This isn't a controversial claim. It's just how these systems work. More text in the training corpus means the model has seen more examples of how native speakers actually write and speak. Less text means more interpolation, more averaging, more tells.
Here's how we think about the tiers after running Musa across 40+ languages at dozens of institutions:
Tier 1 — near-flawless. English, Spanish, French, German, Italian, Japanese, Mandarin (Simplified Chinese), and Brazilian Portuguese. These languages have effectively unlimited training data. The AI produces output that passes for a native writer in every register we've tested — conversational, formal, academic. Regional accents work. Idioms work. Long-form narration holds together.
Tier 2 — very good. European Portuguese, Russian, Korean, Polish, Dutch, Arabic (Modern Standard), Hebrew, Turkish, Swedish, Czech. Strong output. Occasional stiffness on rare vocabulary. Regional variants are sometimes approximate — Peninsular Spanish vs Mexican Spanish is clean, but Andalusian vs Madrileño is more a gesture than a real distinction. A native speaker can tell that something is slightly off in maybe one sentence in twenty. A visitor won't notice.
Tier 3 — functional but needs a curator. Catalan, Basque, Welsh, Finnish, Thai, Vietnamese, Ukrainian, Norwegian, Danish, Greek, Romanian, Hungarian. Good enough that visitors get real value. Not so good that you want to ship it without a native speaker reviewing your top stops. The failure modes here are specific: a technical art term might come out wrong, a cultural reference might translate literally instead of being localized, a formal register might slip into casual mid-paragraph.
Tier 4 — custom work required. Endangered or micro-corpus languages. Quechua, Navajo, many African languages, most indigenous languages of the Americas and Oceania. Some AI systems will generate output in these — it might be grammatically passable, but voice synthesis is often missing, and the prose itself is unreliable. If you're a heritage site whose mission includes one of these languages, you're in custom-voice-work territory. Don't expect it to come in the box.
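If you're turning this into an evaluation spreadsheet or a CMS field, the tier system is just a lookup table. A minimal sketch in Python: the tier assignments mirror the lists above, and the review guidance is our rough rule of thumb, not a guarantee.

```python
# The tier system as a lookup table. Tier assignments mirror the lists
# above; the review guidance is a rough rule of thumb, not a guarantee.
TIERS = {
    1: (["English", "Spanish", "French", "German", "Italian", "Japanese",
         "Mandarin (Simplified)", "Brazilian Portuguese"],
        "ship without review"),
    2: (["European Portuguese", "Russian", "Korean", "Polish", "Dutch",
         "Arabic (MSA)", "Hebrew", "Turkish", "Swedish", "Czech"],
        "spot-check rare vocabulary and regional variants"),
    3: (["Catalan", "Basque", "Welsh", "Finnish", "Thai", "Vietnamese",
         "Ukrainian", "Norwegian", "Danish", "Greek", "Romanian", "Hungarian"],
        "native speaker reviews your top stops before launch"),
    4: ([],  # endangered / micro-corpus: Quechua, Navajo, and many more
        "custom voice and prose work"),
}

def tier_of(language: str) -> int:
    """Return the quality tier for a language; unknowns default to tier 4."""
    for tier, (languages, _review) in TIERS.items():
        if language in languages:
            return tier
    return 4  # anything unlisted gets the most cautious treatment

print(tier_of("Finnish"))  # 3 -- though see the note on tier drift below
```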
The vendor that says "we support 50 languages, all at native quality" is either lying or hasn't listened carefully. The honest pitch is: here are the 10 we can ship with no review, here are the 20 that will work with an hour of curator attention, and here's how we handle the rest.
The tiering isn't fixed, either. Every model release nudges the frontier, and languages migrate upward: Finnish was tier 3 in 2023 and is arguably tier 2 now. That's worth knowing when you're negotiating a multi-year contract — you're not just buying today's quality, you're buying whatever the platform's models become.
Why Portuguese is not Portuguese
Regional variants are where most AI audio guide platforms quietly fall short. "Portuguese" on a dropdown usually means one thing, not two. Which one you get matters more than you'd think.
Brazilian Portuguese and European Portuguese are mutually intelligible but feel genuinely different. Brazilian is warmer, more direct, uses different pronouns, runs at a different pace. European is more formal, clips its vowels, structures sentences differently. A Brazilian visitor hearing European Portuguese narration doesn't get a "slightly off" experience. They get a "this was clearly not made for me" experience.
Same problem with Spanish. Castilian Spanish (Madrid, formal register, vosotros for plural you) lands very differently from Mexican Spanish (Mexico City, slightly softer, ustedes) which lands differently again from Rioplatense (Buenos Aires, vos, distinctive melody). A Guatemalan museum running generic "Latin American Spanish" is serving its local visitors a weaker experience than it needs to.
Mandarin splits into Mainland Simplified and Taiwan Traditional, not just in script but in vocabulary and phrasing. French splits into Hexagonal, Québécois, and Belgian, each with its own rhythm. Arabic is arguably the worst case — Modern Standard Arabic is what every platform ships, but it's nobody's conversational language. A Cairo museum might prefer Egyptian Arabic. A Dubai one might prefer Gulf Arabic.
When you demo a platform, don't just ask "do you support Spanish?" Ask "can you give me a sample in Mexican Spanish, then in Castilian, then in Rioplatense?" If the three samples sound identical, the platform is treating Spanish as one language. That's a red flag.
The cultural translation problem
Language isn't just words. A painting described as "haunting" in English loses something when translated literally. The right word in German might be eindringlich or bedrückend or verstörend depending on what the curator meant, and a word-for-word translation will pick one of these more or less at random.
Museum content is full of these landmines. Religious descriptions (what's respectful in one tradition is patronizing in another). Colonial history (the English-language framing often doesn't land in languages spoken by formerly colonized populations). Humor. Pacing. Formality. Silence.
Japanese museum narration leans formal. It uses desu/masu endings and a slower cadence. An AI that cheerfully translates a casual English tour into Japanese without adjusting register gives visitors something that feels weirdly flippant. Brazilian Portuguese does the opposite — stiff, formal narration that worked in English feels cold and corporate in Brazil, where a warmer tone fits museum contexts better.
The point: the translation can be technically correct and culturally wrong. This is why real localization is different from translation. A guide that respects language respects the cultural register behind it, not just the words.
Which languages actually matter
For most museums, the list of languages that matter is shorter than the list available, and the right subset depends entirely on where you are.
A Paris museum should prioritize French, English, Spanish, German, Italian, Japanese, Mandarin, Russian, Portuguese, Dutch. That's ten languages covering something like 85% of international visitors. A second tier of Korean, Arabic, Polish, Hebrew, Turkish covers another 10%.
A Tokyo museum looks different: Japanese, English, Mandarin (Simplified), Korean, Mandarin (Traditional), Thai, Vietnamese, Spanish, French, German. The long tail skews toward Southeast Asia and the regional languages that drive Japanese tourism.
A small museum in Bilbao has a different problem. Basque is non-negotiable for institutional reasons. Spanish, French, English cover the dominant share. Then you're looking at German and Italian for European tourists, and the long tail looks like Catalan, Portuguese, Dutch. The Basque question is the hard one: AI Basque is tier 3 quality. It'll work, but you want a native speaker reviewing the top 30 stops before you ship.
Pull your own visitor data before you decide. Ticket nationality, website geo, post-visit surveys. The ranked list you'll produce tells you where to focus curator review hours. It also tells you which regional variants matter — if 40% of your Spanish-speaking visitors are from Mexico, you want Mexican Spanish, not generic. We've written separately about what international tourists actually want from an audio guide, which is a useful companion read when you're sorting priorities.
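The ranking exercise is simple enough to script. A sketch, assuming you can export visitor counts by language from ticketing or analytics (the counts below are invented for illustration):

```python
# Rank languages by visitor share and print cumulative coverage, so you
# can see where curator review hours buy the most. Counts are invented.
visitors_by_language = {
    "French": 41_000, "English": 22_000, "Spanish (Mexican)": 9_500,
    "German": 7_200, "Mandarin (Simplified)": 6_800, "Japanese": 4_100,
    "Italian": 3_900, "Brazilian Portuguese": 2_400, "Korean": 1_300,
    "Thai": 600,
}

total = sum(visitors_by_language.values())
cumulative = 0.0
for lang, count in sorted(visitors_by_language.items(),
                          key=lambda kv: kv[1], reverse=True):
    share = count / total
    cumulative += share
    print(f"{lang:<24} {share:6.1%}  cumulative {cumulative:6.1%}")
# The top of this list gets review hours first, and the variant labels
# (Mexican Spanish, not generic Spanish) come straight from the same data.
```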
For more on how to approach this as a strategic decision rather than a technical one, see our piece on when to add languages to your audio guide.
What to actually test in a demo
If you're evaluating vendors, don't accept the marketing pitch. Run these tests:
Sample in your top 5 visitor languages. Not a translation of a generic paragraph. Your actual content. Have a native speaker on staff (or a volunteer docent, or a board member) listen and rate it. Ask them to be specific about what sounds off.
Sample in a regional variant you care about. If you have Mexican visitors, ask for Mexican Spanish specifically. Listen for the difference. If there isn't one, the platform treats Spanish as one language; a quick way to check this programmatically is sketched after this list.
Sample a tier 3 language. Even if you don't plan to launch in it immediately. This tells you how honest the vendor is about quality. A vendor who admits "Basque works but you'll want to review it" is more trustworthy than one who insists everything is native-quality across all languages.
Test a specific art-historical term. "Tenebrism." "Chiaroscuro." "Ukiyo-e." Watch how the AI handles specialized vocabulary in each language. Generic platforms tend to either leave the term untranslated (sometimes fine) or produce an awkward calque (usually wrong).
Ask about adding a language not on the list. The answer reveals the architecture. "We'll add it in two weeks as a configuration change" is one kind of platform. "We need a statement of work and a three-month timeline" is another.
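For the variant test in particular, you don't have to rely on your ear alone. A minimal sketch using Python's standard-library difflib: paste in the scripts the vendor sends back for each variant, and near-identical text is the red flag. The 0.97 threshold is arbitrary.

```python
# Near-identical scripts across regional variants mean the platform is
# treating them as one language. Paste in the vendor's demo output below.
from difflib import SequenceMatcher

def variants_distinct(samples: dict[str, str], threshold: float = 0.97) -> bool:
    """Return False (red flag) if any two variant samples are near-identical."""
    items = list(samples.items())
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            (name_a, text_a), (name_b, text_b) = items[i], items[j]
            ratio = SequenceMatcher(None, text_a, text_b).ratio()
            if ratio >= threshold:
                print(f"red flag: {name_a} vs {name_b} are {ratio:.0%} identical")
                return False
    return True

# Usage: variants_distinct({"Mexican Spanish": text_mx,
#                           "Castilian": text_es, "Rioplatense": text_ar})
```

This only catches the laziest case, where the platform returns the same text under a different label; distinct-but-wrong variants still need a native ear.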
The economics flip
Here's the piece that changes how you should think about languages entirely.
On a traditional recorded guide, each language costs $5,000 to $15,000 in fixed production costs — script adaptation, voice talent, studio, QA. You pay that whether ten people use the Korean version or ten thousand. Every language you add is a bet.
On AI with per-interaction pricing, you pay when a visitor actually listens. Zero visitors in Hungarian means zero Hungarian cost. One visitor in Basque costs roughly what one visitor in English costs. All 40+ languages are there by default, and your budget exposure is proportional to actual usage.
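The arithmetic is worth running with your own numbers. A sketch with assumed inputs: the fixed cost is mid-range from the figures above, and the per-interaction price is a placeholder, not any platform's actual pricing.

```python
# Fixed-cost production vs per-interaction pricing for one language.
# Both price inputs are assumptions for illustration, not real pricing.
FIXED_COST = 9_000        # mid-range of the $5,000-15,000 figure above
PER_INTERACTION = 0.30    # placeholder per-listen price

break_even = FIXED_COST / PER_INTERACTION
print(f"break-even: {break_even:,.0f} listeners")  # 30,000

for listeners in (0, 500, 5_000, 30_000):
    per_use = listeners * PER_INTERACTION
    print(f"{listeners:>7,} listeners: fixed ${FIXED_COST:>7,}"
          f"  vs per-use ${per_use:>9,.2f}")
```

Below the break-even point, which is where almost every long-tail language lives, per-interaction pricing wins: the Korean version with ten listeners costs a few dollars instead of five figures.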
This flips the language-decision from "which ones can we afford to add" to "which ones need curator review first." You're not gatekeeping access anymore. Every visitor gets a guide in their language on day one. What you're deciding is where to spend the one or two hours of native-speaker review time to move a tier 3 language closer to tier 2 for your specific content.
Museums we work with at Musa tend to start by launching everything, watching usage data for two or three months, then investing review time in the languages that actually get used. The 2% of visitors who speak Thai get a functional guide from day one, and if Thai usage warrants it, the top 20 stops get a native review in month four. That sequence is only possible when adding a language doesn't cost anything upfront.
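That sequencing can be driven by the same usage export. A sketch, assuming per-language listen counts and the tier map from earlier (the counts and the threshold are invented):

```python
# Queue tier 3 languages for native review once usage justifies it.
# Usage counts, tiers, and the threshold are invented for illustration.
usage_after_3_months = {
    "French": 18_000, "English": 9_500, "Basque": 1_100,
    "Thai": 240, "Hungarian": 12,
}
language_tier = {"French": 1, "English": 1, "Basque": 3,
                 "Thai": 3, "Hungarian": 3}

REVIEW_THRESHOLD = 200  # below this, review hours buy very little

review_queue = sorted(
    (lang for lang, listens in usage_after_3_months.items()
     if language_tier[lang] >= 3 and listens >= REVIEW_THRESHOLD),
    key=lambda lang: usage_after_3_months[lang],
    reverse=True,
)
print(review_queue)  # ['Basque', 'Thai'] -- Hungarian waits for now
```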
The short version
The number of supported languages on the vendor's marketing page is the wrong question. Better questions: how do my top five visitor languages actually sound, which regional variants does the platform distinguish, how does the vendor describe quality variation across the long tail, and what does custom work for a rare language look like when I need it.
A vendor that talks honestly about tiers is more trustworthy than one that claims uniform quality. Uniform quality across 40 languages isn't a thing. Anyone selling it either doesn't know their own product, or is hoping you don't.