May 15, 2019
By Bob O'Donnell
FOSTER CITY, Calif. — The original promise was bold, but the reality has been far from it. Perhaps until now.
Personal digital assistants – the voice-based interfaces that started with Apple’s Siri – were supposed to enable entirely new and significantly more intuitive ways of interacting with our devices, but they ended up being so frustrating that many people gave up on them.
Thankfully, we’re starting to see some significant advances in phone-based digital assistants, and recent developments from both Microsoft’s Build and Google’s I/O (and likely from Apple’s upcoming WWDC) developer conferences all highlight the important progress that has been made.
These conferences, primarily intended for software developers to learn about new advances in the host company’s technology platforms, also serve as a great guidepost for consumers to understand how technology is evolving. What became clear at both Build and I/O is that digital assistants are finally getting smart enough to be able to interpret what we really mean when we speak to them.
The truth is, the human brain is a remarkably flexible computing engine that immediately recognizes the context of a conversation and can interpret, for example, how the third comment you make in a conversation relates to the first one, or foreshadows the fourth one.
Computing devices can’t easily do that, even if they can accurately translate each individual word you say. The problem is that as soon as we started talking to machines, we quickly assumed they could easily understand the context of something like “what are my options for booking a flight to Chicago next Friday?” Unfortunately, they couldn’t.
Finally, however, we’re starting to see assistant-based technology that can properly decipher a phrase like this and then actually take the action of looking up flights on your preferred airlines, reading them aloud, and even booking a reservation in your name once you’ve selected one.
Even better, it can do it in reaction to receiving an email or text requesting a meeting that merely implies the need to, say, fly to Chicago. Underneath the covers of these extended, multi-turn and multi-context conversations are a great many Artificial Intelligence-based software algorithms that have “learned” what the words you say mean, what your own preferences are, and then what discrete actions need to be taken toward an expected outcome.
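The "discrete actions toward an expected outcome" idea can be illustrated with a toy sketch (entirely hypothetical code — real assistants use learned language models, not regexes, and this is not any vendor's actual implementation): each utterance contributes "slots" such as intent, destination, and date, and later turns in the conversation inherit and override what earlier ones established.

```python
import re

def parse_turn(utterance):
    """Extract a few toy slots from one utterance with simple patterns."""
    slots = {}
    if "flight" in utterance or "fly" in utterance:
        slots["intent"] = "book_flight"
    city = re.search(r"to (\w+)", utterance)
    if city:
        slots["destination"] = city.group(1)
    day = re.search(r"next (\w+day)", utterance)
    if day:
        slots["date"] = "next " + day.group(1)
    return slots

def track_dialogue(turns):
    """Merge slots across turns so later utterances inherit earlier context."""
    state = {}
    for turn in turns:
        state.update(parse_turn(turn))  # later turns override earlier slots
    return state

state = track_dialogue([
    "What are my options for booking a flight to Chicago next Friday?",
    "Actually, make that next Monday.",
])
print(state)
# {'intent': 'book_flight', 'destination': 'Chicago', 'date': 'next Monday'}
```

The point of the sketch is the dialogue state: the second utterance never mentions flights or Chicago, yet the assistant still knows what "that" refers to because context persists across turns.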
Microsoft calls this work conversational AI and is working to bring it not only to its Cortana personal assistant for PCs but also to other companies that want to integrate conversational AI into their own devices. At Build, for example, Microsoft discussed a partnership with BMW, whose latest X7 SUV will incorporate this technology — though it will all be branded and presented as BMW's own.

In Google's case, they took a huge step forward in making Google Assistant work without any network connection at all. While that might sound like a minor technical detail, it's profoundly important because it means the AI engines required to run Google Assistant can run entirely on a smartphone.
As Google points out, this enables faster reaction times (up to 10x faster, they said) and, most importantly, means your assistant can work without having to send your personal data to the cloud. Given all the very legitimate privacy concerns that have been raised about smart speakers and digital assistants, this is an extremely important step.
To be clear, Google can and does anonymize some of the data it collects and send it to the cloud, using a technique it calls federated learning that keeps the data from being associated specifically with you. Essentially, federated learning allows Google to make its AI algorithms smarter by incorporating data from a wide variety of people, but it then sends the updated algorithms back down to the phone, where, over time, they provide even more accurate answers to your own questions. It's a clever approach that I expect to see other vendors adopt in one form or another.
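A minimal sketch of the federated-learning loop described above (illustrative only — Google's production system is far more sophisticated and layers on anonymization and secure aggregation): each device trains on its own private data, only the updated model parameters leave the phone, and the server averages those parameters and sends the improved model back down to every device.

```python
def local_update(weight, examples, lr=0.1):
    # One pass of gradient descent on this user's on-device data,
    # minimizing squared error for a toy one-parameter model y = w * x.
    for x, y in examples:
        grad = 2 * (weight * x - y) * x
        weight -= lr * grad
    return weight

def federated_round(global_weight, client_datasets):
    # Each phone trains locally; only the updated weight leaves the device.
    local_weights = [local_update(global_weight, data) for data in client_datasets]
    # The server averages the weights and pushes the result back down.
    return sum(local_weights) / len(local_weights)

# Three users whose private (x, y) data all reflects the relationship y = 2x.
clients = [[(1.0, 2.0)], [(2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(round(w, 2))  # converges toward 2.0
```

Note that the server never sees any user's raw (x, y) examples — only the averaged weight — which is the privacy property the column describes.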
Another important announcement Google made about the forthcoming Google Assistant 2.0 (expected to be available with the next release of Android, codenamed Q, this fall) is the integration of the Google Lens camera-based functions into the Assistant.
Google Lens can do things like translate text into your native language when you point your phone's camera at it, read text aloud for those who have difficulty seeing or reading, recognize locations or objects to better understand the physical place in which you find yourself, and more. In essence, the Google Lens functions give "eyes" to your digital assistant, which can leverage that visual data to provide more accurate responses to your requests.
We’re clearly not completely past all the frustrations that plagued the earliest versions of voice-based digital assistants, but we are finally starting to see some of the capabilities that many people hoped would be in this technology. As they start to become more widely available, these voice-based assistants really will be able to transform our interactions with not just our digital devices, but the world around us.
Here’s a link to the original column: https://www.usatoday.com/story/tech/columnist/2019/05/15/alexa-google-assistant-and-siri-finally-get-smart-enough-keep-up/1192238001/
USA TODAY columnist Bob O'Donnell is the president and chief analyst of TECHnalysis Research, a market research and consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. His clients are major technology firms including Microsoft, HP, Dell, and Intel. You can follow him on Twitter @bobodtech.