Most museums don't have audio descriptions. Not because they don't care, but because producing them the traditional way is prohibitively slow, expensive, and almost impossible to scale across an entire collection. A blind visitor arrives, and the audio guide, if one exists, describes the room number and the artwork's title. What the painting actually looks like? That's left to a companion or to imagination.
This is a failure of tooling, not intent. Museums want to serve blind and low-vision visitors; the economics of manual audio description have simply made it impractical for all but the most well-funded institutions, and even then only for their marquee pieces.
AI image analysis changes this completely. Not in an incremental "slightly cheaper" way, but in a "every artwork in the collection gets described, for free, out of the box" way.
What audio descriptions actually require
A good audio description of a visual artwork does more than list what's there. "A painting of a woman" tells a blind visitor almost nothing useful. What they need is the kind of information a sighted person absorbs without thinking.
Composition and spatial layout. Where is the subject positioned? What dominates the frame? Is the scene crowded or sparse? A visitor who can't see the painting needs to build a mental map of it, and that requires explicit spatial language: left, right, foreground, background, center.
Color and light. The emotional tone of many artworks lives in their palette. A description that skips color strips out half the meaning. Dark blues and greens versus bright yellows and oranges — these aren't decorative details. They're part of the artist's intent.
Textures and materials. Is the brushwork visible? Is the surface smooth or impasto? For sculpture, what does the material look like: rough bronze, polished marble, weathered wood? These qualities change how a visitor understands the object.
Depicted content. The subjects, their actions, their expressions. Symbols, text, recurring motifs. Background details that sighted visitors might notice only on a second look.
Context the eye provides. A sighted visitor standing in front of a large painting gets an immediate sense of its scale and presence. A blind visitor doesn't. The description needs to supply that — how large the work is, how it's mounted, what surrounds it.
All of this takes skill. Which is exactly why the traditional approach doesn't scale.
The traditional approach and why it stalls
Professional audio description for museums works like this: a trained describer, usually someone with a background in both visual arts and accessibility, visits the museum, studies each artwork, and writes a description. That description gets reviewed by curators for accuracy, tested with blind users for clarity, and recorded by a narrator.
The output is excellent. A well-crafted audio description by an experienced professional is detailed, structured, and useful. Nobody disputes the quality.
The problem is coverage. A professional describer might produce five to ten polished descriptions per day. For a museum with hundreds or thousands of objects, the math doesn't work. Budget for fifty descriptions, and that covers the highlights tour — the same pieces that already get the most interpretive attention. The remaining 95% of the collection stays silent.
Then there's maintenance. Collections rotate. Temporary exhibitions arrive and leave. Labels get rewritten. Every change potentially invalidates existing descriptions. The professional approach assumes a static collection, which most museums don't have.
The result is predictable. A handful of major museums offer audio descriptions for their flagship works. Everyone else offers nothing, or a minimal set that covers a single tour. Blind and low-vision visitors learn to expect very little.
What AI image analysis makes possible
AI vision models can now look at an image and produce a detailed account of what's in it: colors, composition, subjects, text, symbols, spatial relationships, and materials. These are the same categories a human describer addresses.
The quality isn't identical to a specialist human describer. An experienced audio describer brings art-historical judgment, careful word choice honed over years, and an intuitive sense for what matters most to a blind listener. AI image analysis brings breadth, speed, and something the traditional approach can't offer: interactivity.
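To make that concrete, here is a minimal sketch of what a single description request could look like against a general-purpose vision model. It assumes an OpenAI-compatible chat endpoint; the model name, prompt wording, and `describe_artwork` helper are illustrative, not a description of any particular guide's internals.

```python
# Minimal sketch: ask a vision-capable model for a structured artwork
# description. Assumes an OpenAI-compatible endpoint; model name and
# prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()

DESCRIBE_PROMPT = (
    "Describe this artwork for a blind visitor. Cover, in order: "
    "overall impression and medium; composition and spatial layout "
    "(left/right, foreground/background, center); color and light; "
    "textures and materials; depicted subjects, actions, and any "
    "visible text. Describe what is visible before interpreting it."
)

def describe_artwork(image_url: str) -> str:
    """Generate an on-demand description script for one artwork image."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": DESCRIBE_PROMPT},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content
```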
When image analysis is built into an audio guide system, several things change.
Every artwork gets a description. Not only the fifty on the highlights tour. Every piece in the collection that has an image in the system, which for most museums is everything, becomes describable. A blind visitor can stop at any work and ask what they're looking at.
Descriptions are generated on demand. No pre-production, no recording sessions, no per-artwork cost. The analysis happens when the visitor needs it. Temporary exhibitions are covered from day one. A new acquisition is describable the moment its image enters the system.
Visitors can ask follow-up questions. This is where AI descriptions go beyond even the best pre-recorded ones. A traditional audio description is a fixed script. It covers what the describer chose to include, in the order they chose. An AI-powered guide can respond to specific questions. "What's in the bottom-left corner?" "What color is her dress?" "Are there any animals in the painting?" The visitor directs the description based on what they want to know, not what someone else decided to tell them.
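A sketch of what such a follow-up turn might look like, under the same assumptions as above: the image stays in the conversation, so the model answers from the artwork itself rather than from the wording of its first description.

```python
# Sketch of a follow-up turn. The image remains in the conversation, so a
# question like "what's in the bottom-left corner?" is answered from the
# artwork itself, not from the first description's wording.
from openai import OpenAI

client = OpenAI()

def answer_followup(image_url: str, first_description: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[
            {"role": "user", "content": [
                {"type": "text", "text": "Describe this artwork for a blind visitor."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]},
            {"role": "assistant", "content": first_description},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```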
A real example: seeing what the visitor didn't
We run a tour at Comma, a site connected to the painter Carl Frederik von Breda, with artworks by various artists including works associated with the af Klint legacy. One painting in the collection features Roman numerals — I, II, and III — marking different sections of the composition.
A visitor using the guide asked about section III. The AI had analyzed the image and detected all three numbered sections, so rather than answering about the third alone, it described all three, because it had identified sections I and II as related visual elements in the same work.
A small moment, but it shows something worth noting. The AI wasn't following a script. It understood the visual structure of the painting well enough to provide context the visitor didn't know to request. A pre-written description might have covered the same ground, but only if the describer had anticipated that specific question. The AI responded to what was actually there in the image.
The curb-cut effect
Shape a tool for one job and you sometimes discover it cuts better at another. Building image analysis for blind visitors worked the same way: it made the experience better for everyone.
Once a guide can analyze and discuss visual content, sighted visitors start using it too. Someone standing in front of a painting notices a symbol they don't recognize and asks the guide about it. The guide looks at the image, identifies the symbol, and explains its significance. A visitor who's colorblind asks about the palette. A parent asks the guide to describe a painting to their child in simple terms.
None of these people are blind. But they're all benefiting from a capability that was built for accessibility. The image analysis that lets a blind visitor understand what a painting looks like also lets a sighted visitor understand what they're looking at more deeply.
This pattern shows up repeatedly in accessibility work. Features designed for edge cases turn out to improve the mainstream experience. Curb cuts were built for wheelchair users, and now everyone with a stroller, a suitcase, or a delivery cart uses them. Closed captions were built for deaf viewers, and now people watch with subtitles in noisy bars and quiet bedrooms. Audio descriptions built for blind museum visitors give every visitor a richer way to engage with visual art.
We didn't build image analysis as an accessibility feature and then find general applications for it. We found that making the system work for blind visitors forced us to build something more capable than we would have otherwise, and that capability benefited everyone.
Writing good audio descriptions
Whether you're using AI-generated descriptions, professional human-written ones, or a combination of the two, the same principles matter; a prompt sketch that encodes them follows the list.
Start with the overview, then go specific. A blind visitor encountering a new artwork needs orientation first. What kind of work is it? How large? What's the dominant impression? Then move into specifics: individual figures, objects, text, fine detail. This mirrors how a sighted person takes in a work: general impression first, then focused looking.
Name colors and their qualities. Not just "blue" but "a deep, muted blue that dominates the upper third." Color descriptions carry emotional weight and help visitors understand the artwork's mood. For abstract works, color and form may be the entire description.
Describe spatial relationships explicitly. "A figure stands at the center" is less useful than "a figure stands at the center of the canvas, facing left, with a landscape stretching behind her to the right edge." Blind visitors build mental images from spatial language. Be precise about position, scale, and relationship.
Don't interpret — describe first. There's a difference between "a sad woman" and "a woman with downcast eyes and a hand resting against her cheek." The second lets the listener form their own impression. Good description provides the visual evidence and lets the visitor do the interpreting, just as a sighted visitor would.
Mention what's unusual or unexpected. If a painting has a hidden figure, an anachronistic object, a visual pun, or an odd compositional choice — say so. These are the details that make art interesting, and they're the ones most easily missed without description.
Don't over-describe. A three-minute description of every square inch exhausts the listener. Focus on what matters. The artist's signature location is less important than the expression on a subject's face. Prioritize what gives the visitor the most meaningful understanding.
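One way to operationalize these principles is to encode them in the instructions the AI works from. The system prompt below is a hypothetical sketch; its wording and length budget are starting points to test with real users, not a settled standard.

```python
# Hypothetical system prompt encoding the principles above. Pass it as the
# "system" message to a vision-capable chat model; wording is illustrative.
DESCRIPTION_GUIDELINES = """\
You write audio descriptions of artworks for blind and low-vision visitors.
1. Start with an overview (type of work, size if known, dominant impression),
   then move to specifics.
2. Name colors and their qualities ("a deep, muted blue that dominates the
   upper third"), not just hue words.
3. Make spatial relationships explicit: left/right, foreground/background,
   center, relative scale.
4. Describe before interpreting: "downcast eyes, a hand resting against her
   cheek", not "a sad woman".
5. Call out anything unusual or unexpected: hidden figures, visible text,
   odd compositional choices.
6. Prioritize what gives the most meaningful understanding; stay under
   roughly 200 words unless the visitor asks for more.
"""
```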
Making it work in practice
For museums implementing AI-powered audio descriptions, a few practical considerations.
Image quality matters. AI image analysis works from the images you provide. High-resolution photographs that capture the full artwork produce better descriptions than low-resolution crops or photos with glare. If your collection images are poor, the descriptions will be too. This is a one-time investment that benefits every digital use of your collection.
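A pre-flight check can flag unusable images before they ever reach the model. This sketch uses Pillow; the resolution threshold and the example file path are assumptions to tune against your own collection and model.

```python
# Flag collection images too small to support a detailed description.
# The threshold and file path are assumptions; tune them to your setup.
from PIL import Image

MIN_LONG_EDGE = 1024  # pixels; below this, fine detail is often lost

def image_ok_for_analysis(path: str) -> bool:
    with Image.open(path) as img:
        return max(img.size) >= MIN_LONG_EDGE

if not image_ok_for_analysis("collection/inv-1234.jpg"):
    print("Re-photograph: resolution too low for a reliable description.")
```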
Pair descriptions with human curation where it counts. AI-generated descriptions handle breadth. For your most important or most complex works (pieces where curatorial interpretation matters, where the visual content is ambiguous, or where cultural sensitivity requires careful language), add human-written descriptions or curatorial notes that the AI can draw on. The combination of AI breadth and human depth covers more ground than either approach alone.
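In practice, that pairing can be as simple as passing the curator's note alongside the image, so the model grounds its interpretation and terminology in human-written text. A sketch, with `curator_note` standing in for whatever field your collection management system actually stores:

```python
# Sketch: blend a human-written curatorial note into the request so AI
# breadth and human depth combine. `curator_note` is a stand-in for
# whatever your collection management system stores.
def build_messages(image_url: str, curator_note: str | None) -> list[dict]:
    prompt = "Describe this artwork for a blind visitor."
    if curator_note:
        prompt += (
            "\nA curator provided these notes; treat them as authoritative "
            "for interpretation and sensitive terminology:\n" + curator_note
        )
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]
```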
Test with actual blind users. The best descriptions in the world are useless if they don't work for the people they're designed for. Partner with local organizations for the blind during development; their feedback will reveal gaps you can't see, literally. Common issues include descriptions that assume visual knowledge ("the typical Impressionist palette"), ambiguous spatial language, and descriptions that run too long for the context.
Don't gate it behind a special mode. If a visitor has to find and activate an "accessibility mode" to get descriptions, most won't. Build visual description into the standard guide experience. Any visitor should be able to ask "what does this painting look like?" and get a useful answer. Screen reader compatibility should be the default, not an option buried in settings.
What this means for your museum
Audio descriptions have been a known need and a persistent gap in museum accessibility for decades. The barrier was never willingness. It was the cost and effort of producing descriptions at the scale of an actual collection.
AI image analysis removes that barrier. Not partially. Fully. Every image in your system becomes describable, in any language, with no per-artwork production cost. Visitors get descriptions on demand and can ask follow-up questions that no pre-written script could anticipate.
At Musa, this comes included. There's no accessibility add-on, no premium tier, no per-description charge. Image analysis runs on every artwork that has an image. Blind visitors get detailed descriptions. Sighted visitors get the ability to ask about visual details. The museum doesn't have to do anything extra to enable it.
If your museum has been putting off audio descriptions because the traditional approach felt too expensive or too slow to cover your full collection, that constraint no longer applies.