A visitor stands at a memorial to enslaved people. They tap a button on their phone. An AI voice begins speaking.
What happens in the next ten seconds determines whether the technology serves or betrays the site's purpose. The wrong tone (too casual, too clinical, too detached) doesn't just fail as interpretation. It causes harm.
This is the question we hear from memorial sites, Holocaust museums, conflict heritage organizations, and indigenous cultural centers: can AI be trusted with content where tone is everything?
The short answer is that we believe it can be, with the right design. The longer answer requires honesty about what the technology does well, where the risks are real, and what we haven't yet proven.
Why this is different from other audio guide work
Most audio guide design is about engagement: keeping visitors interested, handling multiple languages, reducing cognitive load. Getting something slightly wrong at an art museum is a missed opportunity. Getting something wrong at Auschwitz is a violation.
The asymmetry matters. At a contemporary art gallery, an AI guide that occasionally sounds a bit flat or overly enthusiastic is a minor quality issue. At the National Memorial for Peace and Justice, at Tuol Sleng, at Robben Island, the margin for error collapses to zero. One flippant phrase, one poorly worded response to a visitor's question, one moment where the AI sounds like it's summarizing atrocity the way it would summarize a Wikipedia article, and the institution's credibility is damaged in ways that are hard to recover from.
Memorial sites know this already. It's why many are cautious about any technology that puts generated content between the visitor and the experience.
That caution is warranted. We want to address it directly rather than wave it away.
What "tone control" actually means in AI guides
The fear is understandable: an AI model, trained on the general internet, producing audio narration at a genocide memorial. If you picture a chatbot with a museum skin, the fear is justified.
But that's not how purpose-built AI guide systems work. The architecture matters.
When a museum designs an AI guide character on a platform like Musa, they're not writing a prompt that says "be respectful." They're building a persona at multiple layers, and each layer constrains the AI's behavior more tightly than the last.
Character design. The fundamental personality, tone, and behavioral boundaries. For a memorial site, this might mean: speaks with gravity, never uses humor, acknowledges suffering without sensationalizing it, defers to primary sources wherever possible. This isn't a suggestion the AI might ignore; it's baked into every response the system generates.
Content guidelines. What data the AI draws from, what it refuses to speculate about, how it handles ambiguity. A memorial site can specify that the AI never offers interpretive opinions about atrocities, only presents documented facts and lets visitors draw their own meaning.
Per-stop instructions. At a specific location, say a room containing personal belongings of victims, the AI can be instructed to speak more quietly (in generated speech terms), to keep responses shorter, to not volunteer additional information unless asked. Different spaces within the same site can have different behavioral rules.
Tour-level narrative arc. The overall emotional trajectory of the visit. A memorial audio guide might be designed to build gradually from historical context to personal testimony to reflection, and the AI maintains that arc even when visitors ask questions that could pull it off course.
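The four layers above can be pictured as a single configuration object that the generation system consults on every response. The sketch below is purely illustrative: the class names, fields, and values are our invention for this post, not Musa's actual schema or API.

```python
from dataclasses import dataclass, field

# Hypothetical schema illustrating the four constraint layers described
# above. All names and fields are assumptions for illustration only.

@dataclass
class CharacterDesign:
    tone: str                       # e.g. "grave, unhurried"
    forbidden_behaviors: list[str]  # hard boundaries, not suggestions
    source_policy: str              # e.g. "defer to primary sources"

@dataclass
class ContentGuidelines:
    approved_sources: list[str]        # the only material drawn on
    refuse_speculation_on: list[str]   # topics with no generated opinion
    interpretive_opinions: bool        # False: facts only

@dataclass
class StopInstruction:
    stop_id: str
    max_response_words: int
    volunteer_extra_info: bool
    delivery_style: str             # e.g. "quiet, brief"

@dataclass
class TourPersona:
    character: CharacterDesign
    guidelines: ContentGuidelines
    stops: dict[str, StopInstruction] = field(default_factory=dict)
    narrative_arc: list[str] = field(default_factory=list)  # ordered phases

persona = TourPersona(
    character=CharacterDesign(
        tone="grave, unhurried",
        forbidden_behaviors=["humor", "sensationalism"],
        source_policy="defer to primary sources",
    ),
    guidelines=ContentGuidelines(
        approved_sources=["site archive", "testimony index"],
        refuse_speculation_on=["motives of perpetrators"],
        interpretive_opinions=False,
    ),
    narrative_arc=["historical context", "personal testimony", "reflection"],
)

# A per-stop rule for a room containing victims' personal belongings:
persona.stops["belongings-room"] = StopInstruction(
    stop_id="belongings-room",
    max_response_words=60,
    volunteer_extra_info=False,
    delivery_style="quiet, brief",
)
```

The point of the structure is that no single prompt carries the burden: the character, the content rules, the per-stop instruction, and the narrative arc are all consulted together, so a response that satisfies one layer can still be rejected by another.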
The result is not "we told the AI to be careful." It's a system where the character cannot easily stray from the tone the institution designed, because the constraints operate at every level of the generation process.
The adversarial question
Someone will try to make the AI say something terrible. This is not hypothetical. Any public-facing AI system attracts people who treat breaking it as a game.
At a memorial site, the stakes of that game are higher. An attacker who forces an AI guide to crack a joke about the Holocaust, even if no one else hears it, creates a screenshot that can go viral and destroy the institution's reputation.
Here's where things stand: modern AI safety systems are good at handling adversarial prompts. Multiple detection layers identify when a user is trying to force the system outside its guidelines. The character design itself adds resistance: a persona built around solemnity and respect doesn't shift to humor because someone asks it to. And any such attempt violates terms of service, with full conversation logs available as evidence.
Is it theoretically possible to break through? With enough persistence and creativity, probably. No software system is perfectly adversarial-proof. But we're talking about the kind of sustained, sophisticated attack that would require real effort, not a visitor casually testing boundaries. The safety layers make this extremely unlikely in normal operation.
For sites where even "extremely unlikely" feels like too much risk, there's a more conservative option: disable free-form Q&A entirely and run the guide in curated-only mode. The AI delivers the designed narration, responds to navigation requests, and handles language selection, but doesn't accept open-ended questions. This eliminates the adversarial surface while keeping the benefits of multilingual, AI-generated delivery.
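Curated-only mode amounts to a request router that only ever dispatches to pre-designed content. The intent labels and the mode flag below are illustrative assumptions, not a real platform's API, but they show how closing the adversarial surface can be a routing decision rather than a prompt-engineering one.

```python
# Illustrative router for curated-only mode: navigation, language
# selection, and scripted narration pass through; open-ended questions
# get a fixed redirect instead of generated text.

ALLOWED_INTENTS = {"play_stop", "navigate", "set_language"}

def handle_request(intent: str, curated_only: bool) -> str:
    if intent in ALLOWED_INTENTS:
        return f"dispatch:{intent}"       # deliver designed content
    if curated_only:
        # No free-form generation path exists in this mode.
        return "redirect:curated_content"
    return "dispatch:generate_answer"     # only for sites with Q&A enabled

print(handle_request("play_stop", curated_only=True))     # dispatch:play_stop
print(handle_request("ask_question", curated_only=True))  # redirect:curated_content
```

Because the check happens before any generation, there is nothing for an adversarial prompt to manipulate: the question never reaches a model.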
What we haven't proven yet
We should be direct about this: we don't have experience running AI guides at sensitive memorial sites. The use cases we know best are art museums, heritage houses, and cultural institutions where the interpretive stakes, while real, aren't in the same category as genocide memorials or sites of mass atrocity.
That means the architecture we've described (character design, content guardrails, per-stop instructions, adversarial detection) is proven in other contexts. We're confident in the technical foundation. But "confident in the technical foundation" is not the same as "tested under the specific pressures of a site where survivors and their families visit."
Every memorial site has its own sensitivities, its own community of stakeholders, its own red lines that outsiders might not anticipate. A technology provider can build good systems. Only the community and the institution can determine whether those systems are adequate for their particular context.
This is why we think the right approach is gradual, conservative, and always deferential to the people whose stories are being told.
What should stay human
Not everything at a sensitive site should be AI-generated. Some content needs human voices, literally.
Survivor testimony. Recordings from survivors, witnesses, and descendants should be presented exactly as recorded. The pauses, the emotion, the imperfect grammar, the ambient sound. This is primary source material. An AI should never paraphrase, summarize, or re-narrate testimony. It can introduce a recording ("You're about to hear from Maria, who was twelve years old when...") and provide context afterward. But the testimony itself is untouchable.
Core interpretive statements. The institution's position on what happened, why it matters, and what it means. These are curatorial statements that carry the weight of scholarship and institutional authority. Write them. Record them with a human voice if possible. The AI can deliver factual and contextual information around these statements, but the statements themselves should be authored and fixed.
Moments of silence. Some spaces call for nothing. No narration, no ambient sound, no helpful AI commentary. The guide should know when to stop talking. This is a design choice. Per-stop instructions can tell the AI to remain silent unless the visitor explicitly requests information.
What AI can handle well: multilingual delivery of curated content, wayfinding and practical information, factual Q&A within the boundaries of documented history, and adaptive pacing based on how long a visitor spends in each space. These are the operational tasks that benefit from AI without touching the interpretive core.
The consent question
Who gets to decide how the story is told?
For indigenous sacred sites, the answer should be the community. Full stop. Technology providers and even the institutions managing the sites are not the right arbiters of how indigenous narratives are presented, what can be shared publicly, and what must remain restricted.
The same principle applies, in different forms, to other sensitive heritage. Descendant communities of enslaved people should have a voice in how slavery heritage sites tell their ancestors' stories. Communities affected by conflict should be involved in how conflict heritage is interpreted.
An AI audio guide makes this both easier and harder. Easier because the system can be updated quickly when community feedback arrives: no re-recording sessions, no hardware updates. Harder because the AI generates new language in real time, and communities can't pre-approve every possible thing it might say.
The practical solution is layered. Community stakeholders approve the character design, the content boundaries, and the core narrative. They review sample outputs across a range of visitor interactions. They have ongoing access to conversation logs and the ability to flag problems. And for the most sensitive content, the AI doesn't generate at all. It delivers pre-approved, human-authored narration.
This is slower than a standard audio guide deployment. It should be.
A practical path forward
If you're responsible for a sensitive heritage site and considering any kind of AI-assisted audio guide, here's what we'd recommend.
Start with curated-only mode. Design the full tour with fixed narration. Use AI for multilingual generation and delivery, but don't enable free-form Q&A. This gives you the operational benefits (many languages, no hardware, easy updates) without the interpretive risk of open-ended AI responses.
Log everything. Every AI-generated response should be recorded and reviewable. For sensitive sites, this isn't optional. It's the mechanism that lets you catch problems, demonstrate accountability, and build confidence over time.
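In practice, "log everything" can be as simple as an append-only record written before any audio reaches the visitor. The JSON-lines sketch below is one common way to do this; the field names are our assumption, not a prescribed format.

```python
import json
import time

# Append-only JSON-lines log: one record per AI-generated response,
# written before the audio plays, so nothing reaches a visitor without
# leaving a reviewable trace.

def log_response(path: str, stop_id: str, question: str, response: str) -> dict:
    record = {
        "timestamp": time.time(),
        "stop_id": stop_id,
        "visitor_question": question,
        "ai_response": response,
        "flagged": False,  # reviewers and community stakeholders can flag later
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

rec = log_response(
    "guide_log.jsonl",
    "entrance-hall",
    "What year did this happen?",
    "The events began in 1942.",
)
```

A flat file like this is enough to start with; what matters is that the log is append-only, complete, and accessible to the stakeholders doing the reviewing.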
Open Q&A gradually. Once the curated tour is running and you've built confidence in the system's tone, consider enabling limited Q&A, perhaps only for factual and wayfinding questions at first, with interpretive questions redirected to the curated content. Expand the scope as your comfort grows.
Involve stakeholder communities from the start. Not after the guide is built. Not as a review step. From the initial design conversations. Their input shapes the character, the boundaries, and the red lines. Without it, you're guessing at sensitivities you may not understand.
Maintain a human fallback. For the foreseeable future, sensitive sites should have the ability to pull the AI back to fully curated mode at any time. A system update, a change in community sentiment, or a single problematic interaction should be enough reason to tighten the guardrails immediately.
The honest assessment
AI audio guides offer real benefits for sensitive heritage sites: multilingual accessibility, consistent tone across thousands of visitor interactions, easy updates as historical understanding evolves, and the ability to meet visitors where they are without dumbing down the content.
The risks are also real. A poorly designed system could cause genuine harm at a site where harm has already been done.
The question is not whether AI can be trusted with sensitive content in the abstract. It's whether a specific system, designed with a specific community, for a specific site, with the right safeguards, can meet the standard that site demands. That's an institutional decision, not a technology one.
We think the tools exist to do this well. We also think doing it well requires more care, more time, and more humility than a standard audio guide deployment. Sites that rush the process to save money or meet a deadline are taking a risk they shouldn't take.
If you're thinking about this for your site and want to talk through what a careful, staged approach looks like, we're available. No pressure, no timeline. Getting this right matters more than getting it done fast.