A digital lead at a mid-size art museum emailed me last month with a question I get a lot. "We already have Axiell for collections, Contentful for the website, and our audio guide vendor. Do we really need yet another content system just for the audio tour?"
Fair question. The honest answer: you probably don't need another one, but you do need one. And the audio guide vendor you've been working with may or may not count, depending on how they've set themselves up.
Here's the thing most procurement conversations miss. A modern audio guide isn't a piece of content you buy. It's a content system you operate. If nobody at your institution can change a stop, add a language, or publish a new tour variant without emailing the vendor, you don't have a CMS. You have a deliverable that happens to play audio.
What an audio guide actually needs to model
Before deciding whether your existing tools can handle audio guide content, look at what that content actually is. It isn't pages. It isn't rows in a collection database. It's a specific shape that general-purpose tools don't model well.
A working audio guide needs to represent at least these things as first-class objects:
- Tours — curated sequences with a theme, a target audience, a duration, and sometimes a physical route. "Highlights in 45 minutes." "Dutch Golden Age deep dive." "Kids' scavenger hunt."
- Stops — the actual moments of interpretation. Not the same thing as objects. A stop might cover one painting, three objects in a vitrine, an empty room with a story, or an architectural detail with no catalog record at all.
- Objects — the underlying collection records the stops usually reference. Provenance, materials, makers, dates.
- Translations — not as a content field, but as a workflow. English goes in, twelve other languages need review, approval, and versioning independently.
- Personas and voices — the curator, the chatty docent, the kid-friendly guide. Different framings of the same knowledge.
- Instructions and prompts — in AI-driven guides, the editorial guardrails that shape how the system talks. "Don't speculate about attribution." "Always mention the conservation history for this piece."
- Variants — the holiday tour, the accessibility tour, the after-hours tour. Same underlying knowledge, different presentation.
A website CMS treats all of this as "pages with fields." A collection management system treats it as "records with controlled vocabulary." Neither is wrong. Both are incomplete.
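To make that shape concrete, here is a minimal sketch of the content model in Python. Every class and field name is an assumption made for illustration, not any vendor's actual schema. The point is that a stop can reference zero or many objects, and that one stop can carry several framings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CollectionObject:
    """Catalog facts, owned by the collection management system."""
    object_id: str
    title: str
    maker: str
    date: str


@dataclass
class Framing:
    """One persona's script for a stop, in one language."""
    persona: str   # e.g. "curator", "kids"
    language: str  # e.g. "en", "nl"
    script: str


@dataclass
class Stop:
    """A moment of interpretation; may reference zero, one, or many objects."""
    stop_id: str
    object_ids: list[str] = field(default_factory=list)
    framings: list[Framing] = field(default_factory=list)

    def framing_for(self, persona: str, language: str) -> Framing | None:
        for framing in self.framings:
            if framing.persona == persona and framing.language == language:
                return framing
        return None


@dataclass
class Tour:
    """A curated sequence of stops, presented through one persona."""
    tour_id: str
    title: str
    persona: str
    stop_ids: list[str]


# One stop shared by two tours, each tour using its own framing.
portrait = Stop(
    stop_id="stop-1",
    object_ids=["obj-1"],
    framings=[
        Framing("curator", "en", "Look at how the light falls on the sitter."),
        Framing("kids", "en", "Can you find the animal hiding in this picture?"),
    ],
)
empty_room = Stop(stop_id="stop-2")  # a stop with no catalog record at all
highlights = Tour("tour-1", "Highlights in 45 minutes", "curator", ["stop-1", "stop-2"])
kids_hunt = Tour("tour-2", "Kids' scavenger hunt", "kids", ["stop-1"])
```

Neither tour duplicates the stop. Each one picks the framing that matches its persona, which is the "same knowledge, different presentation" idea from the list above.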
Why your website CMS isn't the answer
I've seen three museums try to run their audio guide out of Contentful or another headless CMS they already licensed. Each time, the same thing happened.
The first few stops work fine. You create a "Stop" content type, add fields for title, description, audio file, and transcript. You stand up a simple endpoint. The app on the visitor's phone fetches the data. Everything looks clean.
Then the second language shows up. The CMS supports localization, but only at the field level. Your review workflow for the Dutch translation doesn't actually exist. The translator edits a draft, but there's no way to preview how the Dutch version sounds in the actual tour without publishing it. So someone publishes, someone else catches an error, and by then a Dutch-speaking visitor has already heard it. The institution's answer is to build a parallel approval flow outside the CMS, usually in Google Docs.
Then the second tour shows up. The highlights tour and the deep-dive tour share most of their stops but frame them differently. The CMS has no concept of "same stop, different framing," so either you duplicate every stop (and now maintain two versions of everything) or you bolt a reference field onto the content model and pray editors understand the indirection.
Then the temporary exhibition shows up with a two-month lifespan. Your CMS doesn't have a concept of time-bounded tours. Someone sets a reminder in a shared calendar. The reminder gets missed.
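For contrast, a time-bounded tour is not hard to model when the system treats it as a first-class idea. The sketch below is illustrative only: `TourWindow`, `is_live`, and the dates are hypothetical names and values, not any product's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TourWindow:
    tour_id: str
    starts_on: Optional[date] = None  # None means no start limit
    ends_on: Optional[date] = None    # None means the tour never expires

    def is_live(self, today: date) -> bool:
        if self.starts_on is not None and today < self.starts_on:
            return False
        if self.ends_on is not None and today > self.ends_on:
            return False
        return True


def live_tours(windows: list, today: date) -> list:
    """Return the ids of the tours a visitor should see on a given day."""
    return [w.tour_id for w in windows if w.is_live(today)]


windows = [
    TourWindow("highlights"),
    TourWindow("temporary-exhibition", date(2026, 3, 1), date(2026, 4, 30)),
]
print(live_tours(windows, date(2026, 4, 15)))  # ['highlights', 'temporary-exhibition']
print(live_tours(windows, date(2026, 5, 1)))   # ['highlights']
```

The tour retires itself on its end date, so nobody has to remember a calendar reminder.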
None of these problems are unsolvable. They're just solved poorly by a CMS built for marketing pages. You can hammer a nail with a wrench. Most people prefer a hammer.
Why your collection management system isn't the answer either
The opposite instinct is to run audio guide content out of Axiell, Vernon, Mimsy, or TMS. The collection database already has the objects. Just add a "tour text" field and an "audio file" field and you're done.
This breaks for a different reason. Collection management systems are, correctly, built around the principle that a record represents an object that exists in your collection. Provenance, accessioning, condition reports, loans. That's the job. The controlled vocabularies and authority files that make collection systems valuable are exactly the wrong shape for interpretive content.
A tour stop often isn't about one catalog record. It might be about the relationship between two paintings. It might be about the room itself. It might be a pause-and-listen moment where the curator tells you about the donor. Trying to force that into a record-per-object schema mangles both the collection data and the tour content.
We've also watched what happens when interpretation lives in the collection management system. Curators get nervous about editing catalog records because those are the permanent scholarly record. So nobody touches the audio guide text, because touching it means touching the catalog. The audio guide becomes fossilised inside the most conservative tool in the building.
The integration pattern that works
The split that actually holds up in practice is simple. Your collection management system is the source of truth for what exists. Your audio guide CMS is the source of truth for how it's interpreted on tour. They talk to each other, but they don't merge.
The direction of the sync matters. Objects, makers, dates, images, and authority records flow from collections into the audio guide system. Interpretive content, tour structures, personas, and visitor-facing language live only in the audio guide system. If a curator fixes an attribution in Axiell, that update surfaces on the tour. If a tour writer reframes the stop, that change stays in the audio guide CMS and doesn't pollute the catalog.
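A rough sketch of that one-way sync, assuming each stop record is a plain dictionary and assuming a particular split of fields between the two systems. The field names and the `apply_collection_update` helper are hypothetical.

```python
# Fields the collection system owns. Everything else on a stop record
# belongs to the audio guide side. This split is an assumption.
COLLECTION_FIELDS = {"title", "maker", "date", "image_url"}


def apply_collection_update(guide_record: dict, collection_record: dict) -> dict:
    """Return a copy of guide_record with collection-owned fields refreshed.

    Interpretive fields (script, persona, and so on) are never overwritten,
    and nothing is written back to the collection record.
    """
    updated = dict(guide_record)
    for name in COLLECTION_FIELDS:
        if name in collection_record:
            updated[name] = collection_record[name]
    return updated


guide_record = {
    "title": "Portrait of a Man",
    "maker": "Attributed to Artist A",
    "script": "Notice how the sitter avoids your gaze.",
}
collection_record = {
    "title": "Portrait of a Man",
    "maker": "Workshop of Artist A",  # curator corrected the attribution
    "script": "should be ignored",    # not a collection-owned field
}
synced = apply_collection_update(guide_record, collection_record)
print(synced["maker"])   # Workshop of Artist A
print(synced["script"])  # Notice how the sitter avoids your gaze.
```

The corrected attribution reaches the tour, while the tour writer's script is left alone because it is not on the collection-owned list.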
This is the same pattern that works for museum websites talking to collections, and for good reason. The integrations that matter are the ones where each system does its job and defers to the others. A well-designed audio guide API makes this boring rather than heroic.
What good audio guide CMS UX actually looks like
If you're evaluating vendors, here's what separates a real CMS from a content deliverable dressed up as one.
Draft and publish states that mean something. You should be able to build an entire temporary exhibition tour over three weeks without any visitor seeing it. One click to go live. One click to roll back if something breaks.
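One way to picture meaningful draft and publish states is a draft plus a stack of published snapshots. This is a toy sketch under that assumption, and `TourPublisher` is a made-up name rather than a real product's interface.

```python
import copy


class TourPublisher:
    """Hypothetical publisher: edits touch only the draft until publish()."""

    def __init__(self) -> None:
        self.draft: dict = {"stops": []}
        self._snapshots: list = []  # published versions, oldest first

    @property
    def live(self):
        """What visitors currently get, or None if nothing is published."""
        return self._snapshots[-1] if self._snapshots else None

    def publish(self) -> None:
        # Deep copy so later draft edits cannot leak into the live version.
        self._snapshots.append(copy.deepcopy(self.draft))

    def rollback(self) -> None:
        # Drop the latest snapshot; the previous one becomes live again.
        if not self._snapshots:
            raise ValueError("nothing has been published yet")
        self._snapshots.pop()


publisher = TourPublisher()
publisher.draft["stops"].append("entrance hall")
publisher.publish()                           # version 1 goes live
publisher.draft["stops"].append("gallery 2")  # invisible to visitors
publisher.publish()                           # version 2 goes live
publisher.rollback()                          # back to version 1
print(publisher.live)  # {'stops': ['entrance hall']}
```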
Preview that matches the live experience. Not a separate preview app. Not a PDF. You should be able to walk through the tour on your phone exactly the way a visitor will, before publishing. If you have to imagine how it will sound, the tool isn't finished.
Multilingual review as a workflow, not a field. Each language needs its own draft state, its own reviewer, its own approval. A Japanese translator should be able to fix a nuance without blocking the German release.
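Modelled as a workflow, each language carries its own review state instead of sharing one flag with the whole stop. The states and allowed transitions below are assumptions chosen for illustration.

```python
from enum import Enum


class ReviewState(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    PUBLISHED = "published"


# Which moves are legal from each state (an assumed, simplified workflow).
ALLOWED = {
    ReviewState.DRAFT: {ReviewState.IN_REVIEW},
    ReviewState.IN_REVIEW: {ReviewState.DRAFT, ReviewState.APPROVED},
    ReviewState.APPROVED: {ReviewState.DRAFT, ReviewState.PUBLISHED},
    ReviewState.PUBLISHED: {ReviewState.DRAFT},
}


class StopTranslations:
    """Tracks one independent review state per language for a single stop."""

    def __init__(self, languages):
        self.states = {lang: ReviewState.DRAFT for lang in languages}

    def advance(self, language, new_state):
        current = self.states[language]
        if new_state not in ALLOWED[current]:
            raise ValueError(f"{language}: cannot go from {current.value} to {new_state.value}")
        self.states[language] = new_state

    def ready_to_publish(self):
        return sorted(
            lang for lang, state in self.states.items()
            if state is ReviewState.APPROVED
        )


stop = StopTranslations(["de", "ja"])
stop.advance("de", ReviewState.IN_REVIEW)
stop.advance("de", ReviewState.APPROVED)
stop.advance("ja", ReviewState.IN_REVIEW)
stop.advance("ja", ReviewState.DRAFT)  # reviewer sends Japanese back for a fix
print(stop.ready_to_publish())  # ['de']
```

German is ready to ship while Japanese is still being corrected, and neither blocks the other.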
AI-assisted drafting with human approval. The writing is the slow part. A good CMS drafts stops from your existing curatorial notes, catalog records, and wall text, then puts a human in the approval seat. The human edits, rewrites, or rejects. The AI never publishes on its own. A vendor that refuses to use AI for drafting and a vendor that uses it without a human checkpoint are both getting it wrong.
Versioning that distinguishes knowledge from presentation. When you update the base facts about an object, the interpretations built on top shouldn't all need rewriting. When you update the interpretation, the base facts stay stable. The schema has to separate these, or you'll end up redoing work every time a catalog record changes.
Access control that matches real roles. A curator editing tour copy has different needs than a front-desk staffer issuing QR codes. Both belong in the system. Neither should see the other's interface. Training museum staff on the audio guide gets dramatically easier when the tool matches the role.
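As a small illustration, role-based access can be as plain as a mapping from roles to permissions. The role names and permissions here are hypothetical, and a real system would likely be more granular.

```python
from enum import Enum, auto


class Permission(Enum):
    EDIT_COPY = auto()
    PUBLISH_TOUR = auto()
    ISSUE_QR_CODES = auto()


# Assumed roles; each one sees only the actions it needs.
ROLE_PERMISSIONS = {
    "curator": {Permission.EDIT_COPY, Permission.PUBLISH_TOUR},
    "front_desk": {Permission.ISSUE_QR_CODES},
}


def can(role: str, permission: Permission) -> bool:
    """Unknown roles get no permissions at all."""
    return permission in ROLE_PERMISSIONS.get(role, set())


print(can("curator", Permission.EDIT_COPY))     # True
print(can("front_desk", Permission.EDIT_COPY))  # False
```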
The honest vendor test
The fastest way to tell whether your current or prospective audio guide vendor has a real CMS: ask them how you'd add a new stop to an existing tour next Tuesday.
If the answer is "send us a script and an MP3 and we'll have it live within two weeks," that's not a CMS. That's a production service. Useful in some situations, but not what you're buying if you're buying a CMS.
If the answer is "your education lead logs in, pastes the curatorial notes, picks a persona, previews it, and hits publish," that's a CMS.
If the answer involves you emailing a spreadsheet, assume the worst. Spreadsheet-based update workflows are the tell that the "CMS" is really just the vendor's internal tooling with no customer-facing surface. There are still vendors in this market operating that way in 2026. Don't sign a multi-year contract with one of them.
The pricing model follows the product model
Here's the shift that most procurement documents haven't caught up to yet.
The old audio guide model treated content as a project-scoped deliverable. You paid a vendor to produce a tour. You got a locked artefact back. Every subsequent change was a new project with a new statement of work. The pricing reflected that: big upfront fee, long production cycle, per-edit charges thereafter. You were buying a thing, once.
A real audio guide CMS flips the relationship. The tour is a live product your team owns and evolves. Content isn't delivered, it's cultivated. In that world, the vendor's job is to keep the platform running, improving, and stable. Not to write your stops for you. The pricing has to follow.
The model we've seen work, and what we built toward ourselves, is zero capex with revenue-share or per-interaction pricing. No big upfront content fee. No per-edit charge. The vendor earns when visitors actually use the guide. If the guide is good, both sides win. If the guide stops delivering value, the museum isn't locked into a sunk cost.
This is a genuine category shift. A museum evaluating audio guide systems today should be asking vendors directly: are you selling me a deliverable, or a platform? If it's a deliverable, the pricing should reflect that it's a one-time purchase, and you should expect to replace it in five years. If it's a platform, the pricing should be operational — tied to usage, not production — and you should expect to keep improving it every month.
Purpose-built systems like Musa are designed around the second model. Curators edit directly. Tours go live in hours. The vendor doesn't stand between you and your own content. Pricing aligns with visitor interactions rather than editorial changes. Whether that's Musa or something else, the category test is the same: can your team change the tour today, without a ticket?
Where to start
If you're early in this evaluation, two practical moves.
First, look at how your content actually flows today. Draw the path from a curator having an idea to a visitor hearing something new. Count the handoffs. Count the weeks. That map tells you where your current stack is breaking, and what a CMS would actually have to fix.
Second, when you talk to vendors, skip the demo of the visitor app for the first meeting. Ask to see the authoring side. Ask to see a real museum's staging environment. Ask how an edit flows from draft to live. The quality of an audio guide CMS is almost never visible in the visitor-facing product. It's visible in what your team does on Tuesday morning.
Get that part right and the rest — the voices, the languages, the tours — follows from it. Get it wrong and you'll be paying someone else to make small changes for the next decade.