Multimodal learning and the future of artificial intelligence
Lian Jye Su of ABI Research
According to our research, the total installed base of devices with Artificial Intelligence (AI) will grow from 2.694 billion in 2019 to 4.471 billion in 2024.
Enormous volumes of data flow through these AI devices every day, yet most of them currently work independently of one another. As the amount of data they handle grows over the coming years, technology companies and implementers will need to find a way for these devices to learn, think, and work together if the full potential of AI is to be realised.
The key to making that a reality is multimodal learning, and it is fast becoming one of the most exciting – and potentially transformative – fields of AI.
What is multimodal learning?
Multimodal learning consolidates disconnected, heterogeneous data from various sensors and data inputs into a single model. Unlike traditional unimodal learning systems, which model each data source in isolation, multimodal systems combine modalities that carry complementary information about one another, information that only becomes evident when all of them are included in the learning process. Learning-based methods that combine signals from different modalities can therefore generate more robust inferences, and even new insights, that would be impossible in a unimodal system.
Multimodal learning presents two primary benefits:
- Multiple sensors observing the same phenomenon can support more robust predictions, because some changes are only detectable when both modalities are present.
- Fusing multiple sensors can capture complementary information or trends that individual modalities would miss, as illustrated in the sketch below.
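To make the concept concrete, here is a minimal late-fusion sketch in Python using PyTorch. It is purely illustrative, not any vendor's implementation: the image/audio pairing, feature dimensions, and class count are assumptions chosen for the example. Two modality-specific encoders feed a single shared classifier, so the model learns from both signals jointly rather than in separate unimodal silos.

```python
# Illustrative late-fusion model: combine features from two modalities
# (e.g. a camera and a microphone) into one prediction head.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, image_dim=512, audio_dim=128, hidden_dim=256, num_classes=10):
        super().__init__()
        # Each modality gets its own encoder (the unimodal front end).
        self.image_encoder = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        # Fusion step: concatenate both embeddings and classify them jointly.
        self.classifier = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, image_features, audio_features):
        fused = torch.cat(
            [self.image_encoder(image_features), self.audio_encoder(audio_features)],
            dim=-1,
        )
        return self.classifier(fused)

# Example usage with a batch of 4 samples of pre-extracted (random) features.
model = LateFusionClassifier()
image_batch = torch.randn(4, 512)   # e.g. embeddings from a vision DNN
audio_batch = torch.randn(4, 128)   # e.g. embeddings from a speech/NLP front end
logits = model(image_batch, audio_batch)
print(logits.shape)  # torch.Size([4, 10])
```

Late fusion of pre-extracted features is only one approach; early fusion of raw inputs and cross-attention architectures are common alternatives, and the right choice depends on how tightly the modalities are correlated.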
A gap between strategy and demand
Multimodal learning is well placed to scale, because the underlying technologies, such as Deep Neural Networks (DNNs), have already scaled in unimodal applications: image recognition in camera surveillance, for example, or voice recognition and Natural Language Processing (NLP) in virtual assistants like Amazon’s Alexa. Furthermore, the cost of developing new multimodal systems has fallen because the market for both hardware sensors and perception software is already highly competitive.
In addition, organisations are beginning to embrace the need to invest in multimodal learning in order to break out of AI silos. Instead of independent AI devices, they want to manage and automate processes that span the entirety of their operations.
Given these factors, ABI Research projects that the total number of devices shipped with multimodal learning applications will grow from 3.94 million in 2017 to 514.12 million in 2023, at a Compound Annual Growth Rate (CAGR) of 83%.
However, most AI platform companies, including IBM, Microsoft, Amazon, and Google, continue to focus on predominantly unimodal systems. Even the best-known platforms with multimodal capabilities, IBM Watson and Microsoft Azure, have failed to gain much commercial traction – a result of poor marketing and positioning of those capabilities.
This gap between demand and supply presents opportunities for platform companies and other partners. Multimodal learning will also create an opportunity for chip vendors, as some use cases will need to be implemented at the edge. The requirements of sophisticated edge multimodal learning systems will favour heterogeneous chip architectures, which can handle both sequential and parallel processing.
Opportunities in key end markets
Momentum around driving multimodal applications into devices continues to build, with five end-market verticals most eagerly on board:
In the automotive space, multimodal learning is being introduced to Advanced Driver Assistance Systems (ADAS), In-Vehicle Human Machine Interface (HMI) assistants, and Driver Monitoring Systems (DMS) for real-time inferencing and prediction.
Robotics vendors are incorporating multimodal systems into HMIs and movement automation to broaden consumer appeal and enable closer collaboration between workers and robots in the industrial space.
Consumer device companies, especially those in the smartphone and smart home markets, are in fierce competition to demonstrate the value of their products over competitors’. New features and refined systems are critical to generating a marketing edge, making consumer electronics companies prime candidates for integrating multimodal learning-enabled systems into their products. Growing use cases include security and payment authentication, recommendation and personalisation engines, and personal assistants.
Medical companies and hospitals are still relatively early in their exploration of multimodal learning techniques, but there are already some promising emerging applications in medical imaging. The value of multimodal learning to patients and doctors will be a difficult proposition for health services to resist, even if adoption starts out slowly.
Media and entertainment companies are already using multimodal learning to structure their content into labelled metadata, improving content recommendation systems, personalised advertising, and automated compliance marking. So far, deployments of metadata tagging systems have been limited, as the technology has only recently become available to the industry.
Where does multimodal learning go from here?
Multimodal learning has the potential to connect the disparate landscape of AI devices and truly power business intelligence and enterprise-wide optimisation. Learn more about the technology, and its impact on key verticals, in our free whitepaper – Artificial Intelligence Meets Business Intelligence, which is part of ABI Research’s AI & Machine Learning service.
The author is Lian Jye Su, principal analyst of ABI Research