How to Pilot an AI Museum Guide

You're convinced an AI museum guide could work for your institution. Maybe you've seen a demo, read the case studies, or talked to a peer museum that's already using one. The question now isn't whether to try it. It's how to try it in a way that produces a clear answer: should we commit to this, adjust course, or walk away?

That's what a pilot is for. Not a soft launch you hope nobody notices. Not a proof of concept that lives on someone's laptop. A structured test with real visitors, defined metrics, and a decision framework at the end.

Here's how to run one well.

Step 1: Define the scope

The biggest mistake museums make with pilots is scoping them too broadly. "Let's test the AI guide across the entire museum" sounds ambitious. In practice, it creates too many variables, too many stakeholders, and too little clarity about what you're actually testing.

A good pilot scope is narrow enough to execute cleanly and broad enough to generate meaningful data.

Gallery scope. Pick one gallery, one wing, or one tour path. Your highlights tour is usually the best choice -- it's the experience most visitors would use, and it has the highest traffic. Alternatively, a temporary exhibition works well because it has a natural start and end date.

Visitor scope. During the pilot, the guide should be available to all visitors in the scoped area. Don't limit it to a hand-picked test group. You want real, unfiltered usage data from the full range of visitors who walk through your doors.

Content scope. You don't need to cover every object. Start with the 20-30 most important stops on the tour you've chosen. The guide doesn't have "gaps" -- it simply doesn't offer commentary on objects that aren't in the system yet. That's the same experience visitors have today without any guide at all.

Duration. 30 to 60 days of live visitor usage. Shorter doesn't generate enough data. Longer delays the decision without adding proportional insight.

Write the scope down. One paragraph. Share it with everyone involved. When scope creep starts -- and it will -- point back to the paragraph.

Step 2: Get the right stakeholders on board

A pilot doesn't need unanimous institutional buy-in. But it does need the right people aligned.

The sponsor. Someone with authority to approve the pilot and act on its results. Usually the director, deputy director, or head of visitor experience. This person doesn't need to run the pilot -- they need to protect it from being derailed by internal politics.

The champion. One person who owns the pilot day to day. They coordinate with the provider, manage internal feedback, and keep things moving. This is usually someone in visitor experience, digital, or education.

Curatorial input. Curators don't need to manage the pilot, but they need to review the guide's content before it goes live. Their role is quality control: does the guide say things we're comfortable with? Is the interpretation accurate? Does the tone match our institutional voice? Get their sign-off on the initial content, then invite ongoing feedback.

Front-of-house staff. The people who interact with visitors every day. They need to know what the guide is, how visitors access it, and what to say when someone asks about it. Their buy-in matters because they control the visitor's first impression of the guide. If staff don't mention it, adoption drops.

Who you don't need yet. Board approval (for a pilot, not a permanent commitment). Full IT integration. Marketing campaigns. Legal review of every possible edge case. These can come later if the pilot succeeds. Front-loading them is the fastest way to delay a pilot by three months.

Step 3: Define success metrics before you start

This step is non-negotiable. If you don't define success before the pilot, you'll define it afterward to match whatever happened. That's not a pilot. That's confirmation bias.

Your metrics should include:

Adoption rate. What percentage of visitors in the scoped area use the guide? Set a target. For a BYOD guide with QR code access, 10-15% in the first month is a reasonable starting point. If you're coming from zero (no prior audio guide), even 5% is meaningful because it represents visitors who now get interpretation they didn't have before.

Completion rate. Of visitors who start the guide, what percentage reach at least 70% of the stops? This tells you whether the guide is compelling enough to hold attention through the tour.

Visitor feedback. A simple question at the end of the guide or at the exit: "How was the audio guide?" Capture a satisfaction score and, optionally, a free-text comment. Set a target score before launch and track the weekly average against it.

Language distribution. Which languages are visitors using? This data alone can justify the pilot -- it often reveals unmet demand that the museum didn't know existed.

Qualitative staff feedback. Are front-of-house staff hearing positive comments? Are they seeing visitors engage differently with the collection? This is anecdotal but valuable for the decision at the end.

Write these metrics down before the pilot starts. Share them with the sponsor and the champion. At the end of the pilot, you'll evaluate against these specific targets, not against a vague sense of whether it "went well."
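To make the targets concrete and hard to redefine later, it can help to write them in a form you can evaluate mechanically at the end of the pilot. A minimal sketch in Python -- the target numbers and field names are illustrative placeholders, not recommendations:

    # Pre-defined pilot targets, recorded before launch.
    # All numbers are illustrative -- set your own.
    TARGETS = {
        "adoption_rate": 0.10,    # share of scoped-area visitors who start the guide
        "completion_rate": 0.50,  # share of starters who reach >= 70% of stops
        "satisfaction": 4.0,      # mean score on a 1-5 exit question
    }

    def adoption_rate(guide_sessions: int, gallery_visitors: int) -> float:
        """Visitors who started the guide, as a share of all visitors in scope."""
        return guide_sessions / gallery_visitors

    def completion_rate(stops_reached: list[int], total_stops: int) -> float:
        """Share of sessions that reached at least 70% of the tour's stops."""
        threshold = 0.7 * total_stops
        return sum(1 for n in stops_reached if n >= threshold) / len(stops_reached)

    def evaluate(results: dict[str, float]) -> dict[str, bool]:
        """Compare observed results against the pre-defined targets."""
        return {name: results[name] >= target for name, target in TARGETS.items()}

Freeze this file (or its equivalent on paper) when the pilot goes live. Step 7 evaluates against it unchanged.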

Step 4: Set up and test internally

Before visitors touch the guide, your team should break it.

The provider will set up the initial content based on your collection data. Once it's ready, distribute it to your internal team: curators, educators, front-of-house staff, the director. Ask them to use it as a visitor would. Walk the tour. Ask questions. Try to confuse it. Note what feels wrong.

Internal testing typically reveals:

  • Factual issues that need correcting
  • Tonal mismatches (too academic, too casual, too generic)
  • Missing connections between objects that experts consider obvious
  • Tour flow problems (the suggested order doesn't match how visitors actually move)
  • Questions the guide handles poorly

These are all quick fixes. With an AI guide, adjustments take minutes, not weeks. Edit a per-stop instruction, tweak the persona settings, add a curatorial note. The next visitor hears the updated version immediately.

Plan for one week of internal testing. Two rounds of feedback and adjustment are usually enough to get the guide to a state where you're comfortable putting it in front of visitors.

Step 5: Launch the pilot

Go live. Make the guide available and make sure visitors know about it.

Signage. QR codes at the gallery entrance, at the ticketing desk, and at individual stops. Clear, simple: "Free audio guide -- scan here." Don't bury it.
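If your provider gives you a guide link, generating the QR codes for signage takes a few lines. A sketch using the open-source Python qrcode package; the URL and stop parameter are placeholders for whatever your provider actually uses:

    # pip install qrcode[pil]
    import qrcode

    # Placeholder link -- substitute the guide URL your provider gives you.
    GUIDE_URL = "https://example.org/guide?gallery=highlights"

    # One code for the entrance and ticketing desk.
    qrcode.make(GUIDE_URL).save("entrance-qr.png")

    # Per-stop codes, assuming the guide accepts a stop parameter.
    for stop in range(1, 31):
        qrcode.make(f"{GUIDE_URL}&stop={stop}").save(f"stop-{stop:02d}-qr.png")

Print the codes large enough to scan from a comfortable distance, and test every one on a phone before it goes on a wall.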

Staff briefing. A ten-minute briefing for front-of-house staff: what it is, how it works, and what to say when a visitor asks. Give them the script: "We have a free audio guide you can access on your phone. Just scan the QR code at the entrance." That's enough.

No fanfare. Don't over-promote during the pilot. You want natural adoption data, not inflated numbers from a marketing push. Mention it on your website and at the desk. Let the guide prove itself through the experience.

During the pilot, monitor the metrics weekly. Not daily -- daily fluctuations are noise. Weekly trends are signal. If adoption is flat after two weeks, investigate. Are visitors seeing the QR codes? Are staff mentioning it? Is the guide loading properly on all devices? Usually the issue is visibility, not quality.
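Most providers expose usage through a dashboard or an export. If you're working from a raw export, a few lines are enough to turn noisy daily numbers into the weekly trend you actually want. A sketch using pandas, assuming a hypothetical CSV with one row per guide session and a started_at timestamp:

    import pandas as pd

    # Hypothetical export format: one row per session, with a start timestamp.
    sessions = pd.read_csv("guide_sessions.csv", parse_dates=["started_at"])

    # Weekly session counts -- the trend line to watch, not the daily wobble.
    weekly = sessions.set_index("started_at").resample("W").size()
    print(weekly)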

Step 6: Collect and analyze data

At the end of the pilot period, pull the numbers and compare them to your pre-defined metrics.

If adoption exceeded your target: The guide resonated. Visitors found it and used it. This is the strongest signal.

If adoption met your target but completion was low: Visitors started the guide but didn't finish. The content or tour design needs work, but the concept is validated. Adjust and re-pilot for two more weeks, or move to full rollout with a plan to iterate.

If adoption was below target: Investigate before concluding the guide failed. Low adoption usually means a distribution problem, not a product problem. Were QR codes visible? Did staff promote it? Was the onboarding frictionless? A guide that 3% of visitors used because nobody knew about it is different from one that 3% used despite heavy promotion.

Look at the qualitative data too. What did visitors say? What did staff observe? What questions did visitors ask the guide? This context gives meaning to the numbers. A 10% adoption rate with glowing feedback means something different from 10% with mixed reactions.

Step 7: Decide

The pilot ends with a decision. Three options:

Scale. The data supports full rollout. Expand the guide to the full museum, increase promotion, integrate it into ticketing and the website. This is the path for pilots that hit their metrics and generated positive feedback.

Adjust and re-pilot. The data is promising but not conclusive. Maybe adoption was lower than expected because of a distribution issue. Maybe the content needs significant rework. Run a second, shorter pilot (two to three weeks) with the adjustments in place.

Walk away. The data doesn't support continuing. This is a valid outcome and one of the pilot's main purposes: finding out cheaply. If the guide isn't right for your institution or your visitors right now, you've learned that at the cost of a small pilot rather than a full deployment.

The decision should be made by the sponsor, informed by the champion's analysis. It should be made against the pre-defined metrics, not against gut feeling.
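If you recorded your targets in the Step 3 sketch, the decision rule itself is almost mechanical. A sketch reusing TARGETS and evaluate() from that step; the 80% "near-miss" band is an illustrative assumption, not a standard:

    def decide(results: dict[str, float]) -> str:
        """Map pilot results onto the three outcomes.
        Assumes TARGETS and evaluate() from the Step 3 sketch."""
        hits = evaluate(results)
        if all(hits.values()):
            return "scale"
        # Within 80% of every target: promising but not conclusive.
        if all(results[k] >= 0.8 * TARGETS[k] for k in TARGETS):
            return "adjust and re-pilot"
        # Below that, walk away -- but only after the Step 6 check that
        # low adoption wasn't just a visibility problem.
        return "walk away"

The function doesn't replace judgment; it keeps the judgment anchored to the numbers you committed to before the pilot started.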

Common pilot mistakes

Not promoting the guide. A QR code taped to a wall behind a plant does not count as promotion. If visitors don't know the guide exists, they can't use it. Low adoption from poor visibility is not the guide's fault.

Letting perfect be the enemy of live. A curator who wants to review every possible AI response before launch will delay the pilot indefinitely. Get comfortable with "good enough to test" and iterate based on real usage.

Too many cooks. If seven stakeholders need to approve the pilot plan, you'll spend more time in meetings than testing. Keep the decision-making circle small: sponsor, champion, one curatorial reviewer.

No baseline. If you don't measure your current state before the pilot, you can't measure improvement afterward. Capture your current audio guide adoption rate (or confirm it's zero), recent visitor feedback scores, and any relevant operational metrics.

Moving the goalposts. If you defined success as 10% adoption and you hit 8%, that's a near-miss worth investigating, not a failure to redefine away. Stick to the metrics you set. If they were wrong, note that for next time.

The business case connection

A successful pilot generates the evidence you need to build a full business case. Instead of projecting hypothetical benefits, you can point to real data from your museum, your visitors, your collection.

For guidance on structuring that business case, see Building the Business Case for a Museum Audio Guide. For the tactical timeline of getting from pilot to full rollout, see How to Launch a Museum Audio Guide in 30 Days.


A pilot is the lowest-risk way to answer the question "will this work for us?" It costs less than a full deployment, generates real evidence, and gives you a clear decision point. The museums that move fastest are the ones that scope tightly, define metrics upfront, and commit to acting on the data.

If you're ready to scope a pilot for your institution, we can help you design one that answers your specific questions.

Frequently Asked Questions

How long should a museum AI guide pilot last?
30 to 60 days of live visitor usage is the sweet spot. Shorter pilots don't generate enough data to be statistically meaningful. Longer pilots delay the decision without adding proportional insight. You need about 4-6 weeks of visitor data to see stable adoption patterns and gather meaningful feedback.
What does a museum need to start an AI guide pilot?
Your existing collection data (catalogue entries, wall text, curatorial notes), a decision about which galleries or tours to include, and one internal champion who owns the pilot. You don't need new content creation, hardware procurement, or a large budget. The provider handles the technical setup.
How much does it cost to pilot an AI museum guide?
Most AI guide providers offer pilot-specific pricing -- either a flat fee for the pilot period or usage-based pricing that scales with visitor volume. The cost is typically a fraction of a traditional audio guide deployment because there's no hardware, no recording sessions, and no per-language production costs.
What metrics should a museum track during an AI guide pilot?
At minimum: adoption rate (percentage of visitors who use the guide), completion rate, language distribution, and qualitative visitor feedback. Advanced metrics include engagement per stop, questions asked, drop-off points, and comparison to baseline visitor satisfaction scores.
