How AI is poised to make video calls as good as live face-to-face meetings
The pandemic response provided a foretaste of what remote conferencing can eventually become.
For nearly a third of America’s workforce in the early months of 2020, the opportunity to keep working and meeting with co-workers from home was something approaching a miracle. Had an office-clearing pandemic like Covid-19 arrived even one generation earlier, that sort of transition couldn’t have happened. Yet for most organizations, having personnel work from home proved to be a remarkably productive alternative to conventional office-based work, even though it relied heavily on employees’ own consumer-grade devices running internet-based video conferencing software.
Of course, it wasn’t perfect. Users experienced issues with both the sounds and images of video conferencing. There were background noises and echoes. Some forgot to mute themselves when other family members came into the room. The voices of people speaking directly into their microphones would boom, while others came across as barely audible. People’s faces were often poorly framed and badly lit. Resolution was frequently terrible, accompanied by jerky movements. Virtual backgrounds, when used, came across as laughably unconvincing, frequently swallowing up the user’s body parts as they talked. Participants’ gazes were typically focused on their screens or on their notes rather than toward their cameras and viewers. For a generation that grew up watching television, it was strictly Amateur Hour.
But imagine, for a moment, that a knowledgeable TV director had been available to run the conference’s technical side. Issues with sound and lighting would have been identified and corrected right away. A participant framed from the nose up, for example, would be reframed to show their whole face. Camera angles could follow the speaker’s movements, sound levels would be balanced, and light levels adjusted. Speakers could be cued to avoid talking over one another, and filters could soften unwanted background noise.
All these steps would improve the quality of remote conferences, but having human directors run the board, as they might in a studio, is only a fantasy. Fortunately, none of these functions actually requires a person on hand doing live direction. All of these measures and more can be automated, markedly improving the technical quality of video conferences. Not only that, the companies behind the most popular conferencing software are going even further.
Improving Video Conferencing with AI
By building artificial intelligence into their applications, leading software vendors are starting to provide video conference enhancements that even a well-equipped commercial TV studio would struggle to offer. Nor are the benefits limited to home-based workers, though they are certainly among the beneficiaries. AI can improve video compression, for example, yielding levels of image resolution that rival HD. In the corporate office, AI can provide meeting-room analytics based on usage patterns, automating the rescheduling of calls and rebooking of meeting rooms, sending important notifications, and suggesting resources that participants might need in their meeting.
For both home and office use, Microsoft and Google are rolling out software features that can recognize and cancel unwanted background noises such as lawn mowers, traffic, barking dogs, or the clatter of eating utensils. Unlike the broad-spectrum noise cancellation built into today’s headphones, these features use a cloud-based algorithm to analyze the sound and suppress unwanted noise while leaving human speech in place; running the analysis in the cloud rather than on the user’s own device puts the heavy lifting where the computational resources are greatest. Beyond that, both Google and Microsoft are on the cusp of offering multilingual transcriptions of everything said during a conference.
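To get a feel for the idea of suppressing noise while leaving speech intact, consider a toy spectral-gating filter: estimate a per-frequency noise floor from a stretch of non-speech audio, then mute anything that doesn’t rise well above it. The sketch below, in Python with NumPy, is a deliberately crude illustration of that principle; the vendors’ actual systems use learned models and far smoother processing, and every name and threshold here is invented for the example.

```python
import numpy as np

def spectral_gate(signal, frame_size=256, noise_frames=10, factor=2.0):
    """Crude stationary-noise suppressor via spectral gating.

    Estimates a per-frequency noise floor from the first few frames
    (assumed to contain no speech) and zeroes any spectral bin that
    does not rise well above that floor, leaving louder speech-like
    energy intact. Real systems use overlapping windows, smooth
    masks, and learned models rather than a hard threshold.
    """
    n_frames = len(signal) // frame_size
    frames = signal[: n_frames * frame_size].reshape(n_frames, frame_size)
    spectra = np.fft.rfft(frames, axis=1)
    # Per-frequency noise floor from the leading "silence" frames.
    noise_floor = np.abs(spectra[:noise_frames]).mean(axis=0)
    # Keep only bins well above the floor; zero the rest.
    mask = np.abs(spectra) > factor * noise_floor
    cleaned = np.fft.irfft(spectra * mask, n=frame_size, axis=1)
    return cleaned.reshape(-1)
```

Run on a signal whose first half is pure background hiss and whose second half is a tone plus hiss, the filter strongly attenuates the noise-only stretch while the tonal (speech-like) energy passes through largely untouched.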
But it doesn’t stop there. Software tools similar to those used to create “deep fakes” are being demonstrated that can shift a participant’s gaze, and even adjust their whole posture, to provide better camera views. Also in the works is what is being called “keypoint extraction,” in which a speaker’s facial movements are captured as a small set of keypoints and re-animated into a lifelike image at the receiving end, requiring only a tiny bandwidth channel, as well as entire avatars that stand in for speakers. Participant backgrounds can be replaced with far more convincing virtual ones, and it’s even possible to simulate a studio audience using a cluster of head-and-shoulders shots taken from another environment.
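The bandwidth appeal of the keypoint approach shows up in simple back-of-the-envelope arithmetic. The sketch below compares one uncompressed 720p RGB frame against one frame’s worth of 68 facial landmarks; the landmark count, frame size, and packing scheme are illustrative assumptions, not any vendor’s actual protocol.

```python
import struct

WIDTH, HEIGHT = 1280, 720    # a 720p video frame (illustrative)
BYTES_PER_PIXEL = 3          # uncompressed RGB
NUM_KEYPOINTS = 68           # classic facial-landmark count (assumed)

def raw_frame_bytes():
    """Bytes needed to ship one uncompressed frame."""
    return WIDTH * HEIGHT * BYTES_PER_PIXEL

def keypoint_packet_bytes():
    """Bytes for one frame's worth of (x, y) keypoints,
    each packed as two 32-bit floats."""
    payload = b"".join(struct.pack("ff", 0.0, 0.0)
                       for _ in range(NUM_KEYPOINTS))
    return len(payload)

# Roughly a 5,000-fold reduction, before any conventional
# video compression is even applied.
ratio = raw_frame_bytes() / keypoint_packet_bytes()
```

Even granting that real video is already heavily compressed, shipping a few hundred bytes of keypoints per frame instead of pixels is what makes a lifelike reconstruction feasible over a very thin channel.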
Of course, many people look forward to returning to the traditional office setting, at least for part of their work week. And once the pandemic is under control, that return is a strong likelihood. But the pattern of working from home is likely to remain a prominent element of the work environment going forward, and workstations specifically designed to improve the experience and efficiency of the work-from-home setting seem inevitable. It may not be the same as showing up at the office, but in some respects, it may actually turn out better.