Way back in the 1980s in Silicon Valley, I began regularly hearing the words “voice UI.” In other words, controlling a device with your voice instead of with buttons or keys, a mouse, or, as some of us were working on at the time, touch.
I was a “nay” vote on voice control. The reason was simple: voice control really only works in private situations. I’ll use two simple examples, both travel-based:
- You’re driving your car and say “change radio channel.”
- You’re on a plane and want to change what’s shown on the seatback display, so you say “show latest movies.”
You already know what I’m going to write: #1 works because the odds are that you’re the only one in your car. #2 doesn’t work because the person in the seat next to you just said “show latest TV shows.” (Don’t laugh: I was just on a flight where four different people around me tried to pair their headphones with their seatback screens, but each attempt showed up on my iPad mini as a pairing request, which I denied ;~).
So how is this relevant to cameras?
There’s a company, Camera Intelligence, that just got another big round of funding and is building an m4/3 camera that supports AI-interpreted voice commands. Indeed, that seems to be the only “innovation” involved (at least the only one they’ve talked about in the press releases I’ve seen). Their video example has someone at a tourist destination saying “hey, could you record a 30-second video” and then “great, can you now make the colors more vibrant?” First problem: the camera takes significantly longer to respond to both requests than a button press would. Second problem: used anywhere other than a private setting, their camera would probably also respond to me, standing next to them, saying “stop” to my out-of-control child. Oopsie.
Of course, you might imagine that AI could learn to distinguish your voice from everyone else’s. Still, can you imagine standing in the Louvre gallery that holds the Mona Lisa while everyone around you is talking to their camera or phone?
I’m not saying voice control shouldn’t be attempted. It’s been shown to be a great way to help the disabled stay current with technology, for instance. The Mac I’m writing this on can be controlled almost entirely by voice, right down to moving the virtual mouse. However, voice still has that private/public issue. Some disabled users may already be isolated, so privacy isn’t an issue for them, but “open access” to voice control can be quite problematic.
We already have the problem of people talking incessantly on their smartphones in public places. On my recent plane trip I heard one end of fourteen conversations while in the waiting area, in the boarding line, and even in my seat waiting for the plane to take off. Clearly we can’t have voice both controlling everything and serving as our only communication channel; otherwise we’re all going to be living in a constant din of voices. Indeed, that would aggravate an existing disability: there’s already a group of people who have trouble hearing in situations where everyone is speaking, and using voice for everything would just make that problem more common for them. (Those folk, by the way, hate and avoid Apple stores, because all the reflective surfaces in those stores create a constant din of voices that interferes with their ability to hear a one-on-one conversation.)
When I published my “Hey Nikki” April Fool’s joke last year, I was surprised at how many people went all the way to the buying page (thousands!). Apparently quite a few folk think voice control is the answer to something. That “something,” by the way, is almost always “complexity.” The proper way to address complexity is better design, and that may or may not include some voice control. For devices used in public, however, I don’t see voice as the best way to control them.
I wish Camera Intelligence luck. But I think they’ll need something other than “voice control” to actually make a camera that’s viable in the market.