XRDS: Crossroads, The ACM Magazine for Students

The future of experiential computing

By Ken Perlin


Tags: Mixed / augmented reality, Virtual reality


New kinds of media can change how we interact with each other in unexpected and sometimes radical ways. The World Wide Web led to an unprecedented democratization of knowledge. Smartphones led to vast changes in everything—how we shop, how we socialize, and how we travel. And now smart glasses, and the networked infrastructure that will make them possible, are positioned to fundamentally alter how children learn, how work is conducted, and the meaning of shared public spaces.

This article is divided into three sections. First, we will describe what changes we can expect by 2033 in the enabling technology of everyday communication. After that, we will describe how those changes will affect our everyday lives. Finally, we will look at some long-term implications of these changes.

Communication Technology in 2033

Many advances in computer information science and engineering in the current decade will lead to a transformed world, as we fully enter the age of experiential computing. The massive bandwidth of 6G (up to 30 times greater than 5G) will alter consumer access to information as fundamentally as the smartphone did in 2008, bringing new opportunities for real-time analysis of places, objects, people, and their gestures; scene modeling and display; and new challenges in low-latency data compression and transmission. Because of this access to high bandwidth, by 2033 wearable smart glasses will essentially be input/output devices; cameras, displays, microphones and speakers will be on-board your glasses, but all significant computation will be done in the cloud.
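
As a greatly simplified illustration of that split, the sketch below (in Python) treats the glasses as a thin client: capture locally, send to the cloud, display what comes back. Every name and data structure here is hypothetical; it is a sketch of the architecture, not any particular system.

# A minimal sketch (not the author's design) of the "glasses as I/O, compute in
# the cloud" split described above. All names here are hypothetical.
import time

def capture_sensor_frame():
    """Stand-in for reading the on-board cameras and microphones."""
    return {"timestamp": time.time(), "camera": b"...", "audio": b"..."}

def send_to_cloud(frame):
    """Stand-in for compressing and uploading a frame over a 6G link."""
    return {"request_id": frame["timestamp"]}

def receive_rendered_view(request):
    """Stand-in for receiving the remotely rendered overlay to display."""
    return {"overlay": b"...", "latency_budget_ms": 20}

def glasses_main_loop(frames=3):
    # The wearable only captures, transmits, and displays; scene analysis,
    # modeling, and rendering all happen remotely.
    for _ in range(frames):
        frame = capture_sensor_frame()
        request = send_to_cloud(frame)
        view = receive_rendered_view(request)
        print("displaying overlay within", view["latency_budget_ms"], "ms")

glasses_main_loop()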

Communication over a distance will consequently become far more embodied, and more like in-person communication. For example, hand and finger gesture-based interaction will become widely used. Our hands and fingers provide a uniquely rich and powerful means of communication, but until now it has been hard to make full use of this power in computer-mediated communication because hands and fingers are difficult to track non-invasively with high accuracy and a high sample rate. Recent progress in machine learning (ML) will advance this capability and allow us to explore how wearers of smart glasses can combine hand and finger gestures with speech and gaze to create new and extremely rich forms of communication.
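
As a rough illustration of what such multimodal fusion might look like, the following Python sketch groups speech, gesture, and gaze events that arrive close together in time into a single command. The event format, the half-second window, and the "move that there" example are all assumptions made for illustration.

# A hedged sketch of combining speech, hand gesture, and eye gaze into one
# command, in the spirit of the multimodal interaction described above.
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str      # "speech", "gesture", or "gaze"
    payload: str       # recognized word, gesture name, or gazed-at object
    timestamp: float   # seconds

def fuse(events, window=0.5):
    """Group events arriving within `window` seconds of the first into one command."""
    events = sorted(events, key=lambda e: e.timestamp)
    command = {}
    start = events[0].timestamp if events else 0.0
    for e in events:
        if e.timestamp - start <= window:
            command[e.modality] = e.payload
    return command

# "Move that there": speech supplies the verb, gaze picks the object,
# and a pointing gesture picks the destination.
print(fuse([
    InputEvent("speech", "move", 0.00),
    InputEvent("gaze", "teapot", 0.05),
    InputEvent("gesture", "point:table", 0.20),
]))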

Meanwhile, continuous measurement and assessment of blood pressure, heart rate, facial expression, head movements, eye gaze, posture, and body language will become widely available. When combined with ML, such measurements will greatly increase the power of our computer interfaces to perceive variations in mood and intent, allowing them to recognize cognitive or emotional states such as confusion, lack of confidence, amusement, fear, distress, anger, or joy. The ability to better understand and respond to each user's state of mind will benefit education, physical therapy, and other fields that require understanding and empathy. Eye gaze in particular has a very rapid response time; unlike other input modalities, it allows people to react as rapidly as they can think.
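
The sketch below gives a toy sense of how such signals could be mapped to a cognitive or emotional state. It uses a simple nearest-centroid rule over made-up feature values; a real system would rely on trained ML models and far richer inputs.

# A toy sketch (not a validated model) of inferring an affective state from the
# kinds of signals mentioned above. The features and centroids are invented.
import math

# features: (normalized heart rate, gaze stability, brow furrow)
CENTROIDS = {
    "calm":      (0.3, 0.9, 0.1),
    "confusion": (0.5, 0.4, 0.8),
    "distress":  (0.9, 0.3, 0.7),
}

def classify(features):
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(CENTROIDS, key=lambda label: dist(features, CENTROIDS[label]))

print(classify((0.55, 0.35, 0.75)))  # -> "confusion"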

Audio will become more powerful as it becomes more embodied. Natural language processing (NLP) will be combined with gesture, facial expression, eye gaze, etc. to get the most out of embodied collaboration. Spatialized audio will become the norm. Humans respond to sound with extremely low latency, and incorporating spatial audio into multimodal, embodied communication will greatly increase our sense of shared spatial presence.
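
One reason spatialized audio is so effective is that the brain localizes sound partly from the tiny difference in arrival time between the two ears. The short Python sketch below computes that interaural time difference using the standard spherical-head (Woodworth) approximation; the head radius is an average value, and the example angle is arbitrary.

# The brain localizes sound partly from the interaural time difference (ITD).
# The formula below is the standard spherical-head approximation; the numbers
# are illustrative.
import math

HEAD_RADIUS_M = 0.0875   # average head radius
SPEED_OF_SOUND = 343.0   # m/s

def interaural_time_difference(azimuth_deg):
    """Woodworth's approximation: ITD for a source at a given azimuth."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A voice 45 degrees to your right arrives at the right ear ~0.4 ms earlier.
print(f"{interaural_time_difference(45) * 1000:.2f} ms")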

And we will be able to touch each other from a distance. Humans respond strongly to, and learn a great deal from, touch, yet technical limitations have constrained our ability to use haptics as an input modality. Creating haptic feedback with all of the richness and expressiveness of human touch means simulating human-to-human haptic interaction at a distance via robot proxies, while using ML to quickly determine what the robots need to do to interact with humans safely and in context, even in the presence of unavoidable network latency.
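
To make the latency challenge concrete, here is one common way a robot proxy could mask network delay: dead reckoning, in which the local robot extrapolates the remote hand's motion from its last reported position and velocity. This is a generic sketch, not the method of any specific system, and all of the numbers are invented.

# A hedged sketch of masking network latency in remote haptics via dead
# reckoning: the local robot proxy extrapolates the remote hand's motion
# from its last reported position and velocity.

def predict_remote_hand(last_pos, last_vel, latency_s):
    """Extrapolate where the remote hand probably is *now*, despite latency."""
    return tuple(p + v * latency_s for p, v in zip(last_pos, last_vel))

# Last packet (80 ms old) reported the hand at (0.10, 0.20, 0.30) meters,
# moving at (0.5, 0.0, -0.1) m/s.
print(predict_remote_hand((0.10, 0.20, 0.30), (0.5, 0.0, -0.1), 0.080))
# -> (0.14, 0.2, 0.292): the proxy renders contact there rather than 80 ms behind.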

Coordinated swarms of extremely small drones or other mobile robots will obtain and merge visual data from many angles for real-time volumetric capture. This will allow us to see "holographic" views of people who are in distant locations. Challenges include low-latency compression and transmission that maintains fidelity of critical visual features, such as eye gaze and facial expression, while coordinating robots to collectively capture optimal information at each moment in time. We will likely choose to use our smart glasses to visually filter out the robots that we use for haptic feedback and appearance-capture, effectively rendering them invisible, just as today we choose not to see the electrical wires and plumbing in our houses and office buildings.
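
The "maintain fidelity of critical visual features" idea can be illustrated with a toy bit-allocation rule: given a fixed per-frame budget, spend proportionally more bits on the eyes and face than on the background. The regions, weights, and budget below are assumptions for illustration only.

# A toy sketch of prioritized compression: split a fixed per-frame bit budget
# across regions in proportion to how perceptually critical they are.

def allocate_bits(total_bits, region_weights):
    """Split a per-frame bit budget across regions in proportion to weight."""
    total_weight = sum(region_weights.values())
    return {region: int(total_bits * w / total_weight)
            for region, w in region_weights.items()}

budget = allocate_bits(
    total_bits=2_000_000,   # 2 Mbit per frame, an invented figure
    region_weights={"eyes": 5, "face": 4, "hands": 3, "body": 2, "background": 1},
)
print(budget)   # the eyes get five times the bits of the background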


2D interfaces—which provide interactive tools such as buttons, sliders, and dynamic text labels—will become ubiquitously embedded into our shared 3D world, overlaid onto architecture and combined with hand gestures and pen-like tools. Collectively these will provide task-specific information with minimal cognitive load, as well as optimal choices of public, privately shared, or individually private information.
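
A minimal sketch of those information scopes might look like the following: each virtual overlay carries an access scope, and a viewer's glasses only render the overlays that scope permits. The data model and names are hypothetical.

# A hedged sketch of public / privately shared / individually private overlays.
from dataclasses import dataclass, field

@dataclass
class Overlay:
    label: str
    scope: str                                   # "public", "shared", or "private"
    audience: set = field(default_factory=set)   # user ids, for "shared"
    owner: str = ""                              # user id, for "private"

def visible_to(viewer, overlays):
    out = []
    for o in overlays:
        if (o.scope == "public"
                or (o.scope == "shared" and viewer in o.audience)
                or (o.scope == "private" and viewer == o.owner)):
            out.append(o.label)
    return out

overlays = [
    Overlay("street name", "public"),
    Overlay("walking directions", "shared", audience={"alice", "bob"}),
    Overlay("personal reminder", "private", owner="alice"),
]
print(visible_to("alice", overlays))  # sees all three
print(visible_to("carol", overlays))  # sees only the public street name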

Meanwhile, interactive 3D graphics will merge with the physical world around us, continuing to blur the line between real and virtual. This will lead to advances in geometric modeling, analysis and synthesis of texture and lighting, visual and audio integration of real and synthetic objects, lightfield compression, and photorealistic rendering. Computer graphics will be enhanced by ML analysis of speech and gesture for real-time shape modeling and animation creation, tasks that are important in many fields, including architectural planning, scientific simulation, and cinematic special effects, and that are traditionally both difficult and time consuming.

What Normal Reality Will Be Like in 2033

This is not the first time that evolving technology has changed our world. Other historical game-changing paradigm shifts include trains, paperback books, cars and roads, indoor plumbing, electric utilities, cinema, airplanes, radio, TV, air conditioning, the web, and smartphones. Each of these has changed our reality. Here are some things you may be able to do in 2033 that you can't do now.

You will have an effective feeling of co-presence with people who are located remotely. When you are talking to someone who is physically elsewhere, you will think of them as being "here" not as being "there." You might, for example, tell somebody "I had coffee with Fred today," and mean exactly that, even though you and Fred live in different cities.

You will be able to hold a private conversation with one or more other people in a conference room or in public. Those of you in conversation will be able to see and hear things together that nobody else in the same physical space will be able to see or hear.

The computer network will know when you and I have established eye contact with each other in or across a crowded room. Intuitive user interfaces will make use of this capability. For example, you and I will both be able to hear each other perfectly while speaking softly. You will no longer need to be sitting next to someone in a meeting to whisper something to them privately.
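
In the simplest terms, the underlying check could be something like the sketch below: if my gaze target is you and your gaze target is me, the system opens a private, low-volume audio channel between us. The data model here is an assumption made for illustration.

# A hedged sketch of mutual eye contact triggering a private "whisper" channel.

def mutual_eye_contact(gaze_targets, a, b):
    """gaze_targets maps each person to whoever they are currently looking at."""
    return gaze_targets.get(a) == b and gaze_targets.get(b) == a

def maybe_open_whisper_channel(gaze_targets, a, b):
    if mutual_eye_contact(gaze_targets, a, b):
        return f"private audio channel open: {a} <-> {b}"
    return "no channel"

print(maybe_open_whisper_channel({"you": "me", "me": "you"}, "you", "me"))
print(maybe_open_whisper_channel({"you": "me", "me": "barista"}, "you", "me"))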

We will all be living the real-life version of Harold and the Purple Crayon. You will be able to simply draw in the air between you and someone you are talking to, and those drawings will come to life in expressive ways. When they do, we will be able to interact with them.

If you and I are walking to a restaurant together, nobody but the two of us will be able to see the walking directions from Google, superimposed on the buildings around us—street signs that only the two of us can see. We will not be distracted by any need to look at smartphones, and therefore we will be able to focus on our conversation with each other. When we arrive at our table, the "menu" will allow us to see the food that we order as though it is already on our plates. The appearance of that virtual food will continue to evolve as we decide to change or refine our order.

When a friend or colleague is on their way to meet you, whether they are traveling by subway, bus, or car, or on foot, you will always be able to see where they are while they are still across town or some blocks away, and you will know how long it will be before they arrive. You won't need to stare at a dot moving on a smartphone screen—you will be able to see them.

You will be able to incorporate your "life stream" into your meetings. During any meeting, you will be able to call up, in an intuitive way, and refer to conversations that you had a week or a year earlier. You will be able to easily search your life stream for the particular conversation or topic that you are looking for. If you witness a crime, you will always be able to replay and report what you saw after the fact.
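
Conceptually, searching a life stream is just filtering timestamped records, for example transcripts produced by speech recognition, by keyword and date. The record format in the sketch below is hypothetical.

# A minimal sketch of searching a personal "life stream" of timestamped entries.
from datetime import datetime

life_stream = [
    {"time": datetime(2033, 3, 2, 14, 5), "text": "let's revisit the budget next week"},
    {"time": datetime(2033, 3, 9, 10, 30), "text": "budget approved, start hiring"},
    {"time": datetime(2033, 4, 1, 16, 0), "text": "lunch with Fred at the cafe"},
]

def search(stream, keyword, after=None):
    return [e for e in stream
            if keyword.lower() in e["text"].lower()
            and (after is None or e["time"] >= after)]

for entry in search(life_stream, "budget", after=datetime(2033, 3, 5)):
    print(entry["time"], "-", entry["text"])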

You will be able to accurately measure the color, proportion, time, velocity, length, weight, etc. of objects and spaces around you without conscious effort. When you go to IKEA, you will be able to clearly see whether the couch that you are looking at will fit between your end table and your piano back at home. You will also know whether those curtains you like will match your wall at home, even with different room lighting.
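
The IKEA example reduces to a simple comparison once the glasses can measure real dimensions: does the couch width, plus some clearance, fit within the gap measured at home? The numbers and the clearance margin below are invented.

# A small sketch of the "will it fit?" check from measured dimensions.

def fits(item_width_cm, gap_width_cm, clearance_cm=2.0):
    """True if the item fits in the gap with some clearance on each side."""
    return item_width_cm + 2 * clearance_cm <= gap_width_cm

couch_width = 212.0          # measured in the store, in cm
gap_at_home = 220.0          # measured earlier between end table and piano
print(fits(couch_width, gap_at_home))   # True: 212 + 4 <= 220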

You will be able to have virtual pets running around your house that feel like real pets. These will also be able to serve as companion pets and health monitors for older or infirm people.


You will wear virtual jewelry that you will think of the way you now think of physical jewelry.

You will be able to use your smart glasses to look up information quickly and unobtrusively without needing to stare into a smartphone. Unlike earlier attempts at smart glasses, such as Google Glass, your smart glasses will support the use of voice, eye gaze, and natural gestures together to enhance your information search.

Everyone will be able to modify their perception of the color of walls, the height of ceilings, and other aspects of their physical surroundings to be whatever feels most comfortable to or productive for them at the moment. Easy-to-use virtual buttons and gestures will allow you to turn various visual and auditory features on or off in your reality. You can tune out the passing subway train or turn up the sound of the virtual milk frother in the cafe as white noise.

If you are an animator, you will be able to mold a virtual character, as though out of magical clay, with your hands, and then see that character come to life and exhibit responsive behavior. You will then be able to direct the character using voice and gesture.


You will be able to make use of the collective public data gathered from everyone's networked smart glasses, because this data will be continually updating a searchable database of the world around us. For example, if there are already too many people in a coffee shop, or not enough open spaces in a parking lot, you will know these things beforehand and will be able to modify your plans accordingly.
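
Querying such a continually updated, crowd-sensed database could look roughly like the sketch below, which flags places whose measured occupancy is near capacity. The schema, the place names, and the threshold are assumptions for illustration.

# A hedged sketch of querying a crowd-sensed occupancy database.

WORLD_DB = {
    "cafe:3rd-street":   {"capacity": 30, "occupancy": 28},
    "parking:main-lot":  {"capacity": 120, "occupancy": 120},
    "cafe:riverside":    {"capacity": 24, "occupancy": 9},
}

def too_crowded(place_id, threshold=0.9):
    place = WORLD_DB[place_id]
    return place["occupancy"] / place["capacity"] >= threshold

for place in WORLD_DB:
    print(place, "-> skip" if too_crowded(place) else "-> go")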

Every person learns differently and has different learning needs. Once everyone has access to smart glasses, individualized customized learning will be supported for everyone, either in-person or via remote instruction. Asynchronous learning will also evolve in fundamental ways. We will be able to revisit—and then roam around within—in-person meetings after the fact, the way we now do with Google Docs, leaving notes for other people and otherwise iteratively annotating as needed. For example, if you have missed a meeting, you can see which ideas excited the greatest interest, giving you an important starting point as you jump into the project.

Currently creative ideation is better done in person than online since meeting in person is more effective at supporting the "happy accidents" of spontaneous conversation [1]. By 2033 remote conversations will start to approach a similarly high level of spontaneous creativity, allowing people to "beam in" together Jedi-council style and use non-verbal cues that can help us indicate to (or interpret from) each other whether to pursue a new thread of thought.

Two or more people will be able to read text that floats in the air between them. Within that text you will be able to see exactly where I and others are currently looking, as well as the history of where we have looked before. Translation of this text between languages will be automatic.

When you and I are in different locations, fidelity of volumetric capture and transmission will be sufficient for you to feel that I am present in the same room with you. Key human features, such as eye gaze, facial expressions, and fingers will be captured, transmitted, and reconstructed with high fidelity, whereas other features will be handled with lower fidelity to economize on available bandwidth and computational resources.

You and I will be able to hold hands at a distance, using a robot proxy. Your physical therapist will be able to reassuringly put an arm around you and steady you while you practice relearning how to walk. A loved one can put a loving hand on your shoulder when you have received bad news. You will be able to hand me a physical object, such as a model of an atom for constructing a proposed molecular structure, using a robot proxy. The robots that make this happen will be invisible to us, because they will be visually filtered out by our smart glasses.

Implications for Society

Just as Harry Potter's classmates took for granted that they could always use magic at Hogwarts, interactions that today seem like magic will by 2033 be taken for granted as ordinary reality. Of course the teapot pours itself when you ask it to. Why wouldn't it?

We can anticipate many positive economic shifts, including greater productivity; improved access to high-quality healthcare and education; increased ability for people to work and age at home, or to build communities together; and more fulfilling employment opportunities for older and differently abled people who cannot easily leave their homes but are otherwise fully able to participate meaningfully in work and intellectual pursuits they cannot currently access.

Yet there are also challenging implications for privacy, ethics, security, and accessibility. Who should be in control of your data? How much are you willing to give up your privacy for the sake of convenience? Such questions have always been important, but in the coming decade they are going to become even more critical.


Perhaps the most fundamental long-term change will be that human natural language will evolve to incorporate greater and richer use of physical gestures, because those gestures will be able to make things happen in our shared computer-mediated physical space. The greatest agents of this change will be small children, because children seven years of age and younger are actually the creators of natural language [2]. Once this technology is in their hands, they will evolve natural language in new and powerful ways that we can hardly imagine.

Of course, in the larger picture, any technology-enhanced powers can be used for either good or evil. In a bad scenario, they could be used to help enable a police state, and to take away values that we cherish, including privacy. Yet those same capabilities could be used to help promote empathy and understanding. Perhaps we will have an enhanced ability to see that a person is becoming visibly angry not because they feel belligerent but because they feel fear and are in need of reassurance.

As with any advance in communication technology, these innovations will provide an opportunity to revisit important conversations around equity and accessibility in computing. Who are the intended stakeholders, and how can we ensure that these technologies can be used by everyone? By designing systems with accessibility as a core value, we can develop new languages of interaction that ensure equity for people with disabilities, and guarantee inclusion regardless of language and culture. Indeed, the decision of how much to integrate these tools into any particular interaction must remain the choice of each user. As always, we will need to be mindful of Kranzberg's first law [3]: Technology is neither good nor bad; nor is it neutral.

Acknowledgements

I would like to thank my colleagues and collaborators both at NYU and elsewhere, for our many inspiring conversations about the future, and for all that I have learned from their collective brilliance and camaraderie.

References

[1] Brucks, M. S. and Levav, J. Virtual communication curbs creative idea generation. Nature 605 (2022), 108–112.

[2] Pinker, S. The Language Instinct: The New Science of Language and Mind. Allen Lane, The Penguin Press, London, 1994.

[3] Kranzberg, M. Technology and history: "Kranzberg's Laws." Technology and Culture 27, 3 (1986), 544–560.

Author

Ken Perlin, a professor in the Department of Computer Science at New York University, directs the Future Reality Lab and is a participating faculty member at NYU MAGNET. His research interests include future reality, computer graphics and animation, user interfaces, and education. He is chief scientist at Parallux and Tactonic Technologies, an advisor for High Fidelity, and a Fellow of the National Academy of Inventors. He received an Academy Award for Technical Achievement from the Academy of Motion Picture Arts and Sciences for his noise and turbulence procedural texturing techniques, which are widely used in feature films and television. His other honors include membership in the ACM/SIGGRAPH Academy, the 2020 New York Visual Effects Society Empire Award, the 2008 ACM/SIGGRAPH Computer Graphics Achievement Award, the TrapCode award for achievement in computer graphics research, the NYC Mayor's Award for Excellence in Science and Technology, the Sokol award for outstanding science faculty at NYU, and a Presidential Young Investigator Award from the National Science Foundation. He serves on the Advisory Board for the Centre for Digital Media at GNWC. Previously he served on the program committee of the AAAS, was external examiner for the Interactive Digital Media program at Trinity College, was general chair of the UIST 2010 conference, directed the NYU Center for Advanced Technology and the Games for Learning Institute, and has been a featured artist at the Whitney Museum of American Art. He received his Ph.D. in computer science from NYU and a B.A. in theoretical mathematics from Harvard. Before working at NYU he was head of software development at R/GREENBERG Associates in New York, NY. Prior to that he was the system architect for computer generated animation at MAGI, where he worked on TRON.


This work is licensed under a Creative Commons Attribution 4.0 International License.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2022 ACM, Inc.