Abstract
The metaverse has gained increasing attention with advances in artificial intelligence (AI), semiconductor devices and high-speed networks. Although the metaverse has potential across various industries and consumer markets, it remains in the early stages of development, with further progress in extended reality (XR) technologies anticipated. In this Review, we provide an overview of essential XR technologies for immersive metaverse experiences enabling human–digital interactions. Motion sensing, eye tracking, pose estimation and 3D mapping, scene understanding, digital humans, conversational AI for metaverse non-player characters (NPCs), motion-to-photon latency compensation and optical display systems are important for human–digital interaction in the metaverse, with AI accelerating the evolution of these technologies. Key challenges include the accuracy and robustness of sensing and recognition of users and surrounding environments, real-time content generation reflecting the users’ responses and environments, and high-performance XR head-mounted displays with compact form factors. Realizing this potential will enable people to interact more genuinely with each other and digital objects in healthcare, education, retail, manufacturing and everyday life.
Key points
- One of the key user values of the metaverse is a sense of immersion and presence.
- Extended reality (XR) technologies deliver this sense of immersion and presence by enhancing reality expressions and enabling natural interactions.
- An XR workflow consists of sensing and recognition, content generation and output, with a range of technologies underpinning each stage (a minimal pipeline sketch follows this list).
- Artificial intelligence (AI) technologies have crucial roles both in sensing and recognition and in XR content generation.
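The following minimal sketch illustrates the three-stage workflow named in the key points as a single per-frame loop. It is written purely for illustration and is not taken from the Review or from any XR runtime or SDK; all names, such as SensorFrame, recognize and generate_content, are hypothetical placeholders, and each stage stands in for the far richer processing (SLAM, machine-learning models, rendering and latency compensation) discussed in the article.

```python
"""Minimal, hypothetical sketch of an XR workflow:
sensing and recognition -> content generation -> output.
All class and function names are illustrative placeholders."""

from dataclasses import dataclass, field
import time


@dataclass
class SensorFrame:
    """Raw per-frame sensor data captured by an HMD (illustrative fields)."""
    timestamp: float
    imu_orientation: tuple[float, float, float]  # roll, pitch, yaw in degrees
    eye_gaze: tuple[float, float] = (0.0, 0.0)   # normalized gaze point


@dataclass
class WorldState:
    """Result of the recognition stage: user pose plus recognized scene labels."""
    head_pose: tuple[float, float, float]
    scene_objects: list[str] = field(default_factory=list)


def sense() -> SensorFrame:
    # Sensing stage: in a real system, IMUs, cameras and eye trackers are polled here.
    return SensorFrame(timestamp=time.monotonic(),
                       imu_orientation=(0.0, 12.5, -3.0),
                       eye_gaze=(0.48, 0.52))


def recognize(frame: SensorFrame) -> WorldState:
    # Recognition stage: pose estimation, 3D mapping and scene understanding
    # would run here; this placeholder simply passes the pose through.
    return WorldState(head_pose=frame.imu_orientation,
                      scene_objects=["table", "window"])


def generate_content(state: WorldState) -> str:
    # Content generation stage: virtual content is placed relative to the
    # recognized scene (here reduced to a descriptive string).
    anchors = ", ".join(state.scene_objects) or "free space"
    return f"virtual scene anchored to: {anchors}"


def output(content: str, frame: SensorFrame) -> None:
    # Output stage: rendering and display; the elapsed time since sensing
    # approximates the motion-to-photon budget consumed by the pipeline.
    latency_ms = (time.monotonic() - frame.timestamp) * 1000.0
    print(f"{content} | motion-to-photon budget used: {latency_ms:.2f} ms")


if __name__ == "__main__":
    frame = sense()
    state = recognize(frame)
    content = generate_content(state)
    output(content, frame)
```

In a real head-mounted display pipeline, the output stage would additionally apply late-stage reprojection against the newest head pose so that motion-to-photon latency stays within perceptual limits, in line with the latency compensation techniques covered in the Review.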
Acknowledgements
The authors thank J. Tanaka, Y. Fukumoto, T. Kitao and K. Akutsu for providing advice regarding digital humans, motion capture, pose estimation and 3D mapping, and eye tracking, respectively.
Author information
Contributions
H. Mukawa devised the overall structure of the manuscript; contributed to writing ‘Introduction’, ‘Overview of the metaverse’, ‘Extended reality workflow and technologies’, ‘Sensing and recognition technologies’, ‘Content generation technologies’, ‘Output technologies for optical displays’ and ‘Outlook’; and is also responsible for reviewing the entire article. Y.H. and H. Mizuno contributed to writing the ‘Digital replication of humans’ section. M.M. and F.H. contributed to writing ‘Conversational AI for metaverse NPCs’. K.M. contributed to writing the ‘Motion sensing’ section. H.A. and M.F. contributed to writing ‘Pose estimation and 3D mapping’. H.A. contributed to writing ‘Motion-to-photon latency compensation’. R.O. and Y.M. contributed to writing ‘Eye tracking’. J.Y. and D.S. contributed to writing ‘Scene understanding’.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Electrical Engineering thanks Frank Seto, Jeff Stafford and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Game Boy Advance Architecture: https://www.copetti.org/writings/consoles/game-boy-advance/
Game Graphics: Racing the Beam: https://hackaday.com/2023/10/24/game-graphics-racing-the-beam/
GPT-4o mini: advancing cost-efficient intelligence: https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/
HXR Technology: https://swave.io/nanopixel-holography/
Introducing ChatGPT: https://openai.com/blog/chatgpt
Introducing the Lightship Visual Positioning System and Niantic AR Map: https://nianticlabs.com/news/lightshipsummit?hl=en
Microsoft Mesh overview: https://learn.microsoft.com/en-us/mesh/overview
mocopi: https://electronics.sony.com/more/mocopi/all-mocopi/p/qmss1-uscx
Reducing latency in mobile VR by using single buffered strip rendering: https://blog.imaginationtech.com/reducing-latency-in-vr-by-using-single-buffered-strip-rendering/
Sony Interactive Entertainment Inc., PlayStation VR2: https://www.playstation.com/en-us/ps-vr2/
Time-of-Flight (ToF) Cameras vs. other 3D Depth Mapping Cameras: https://www.e-consystems.com/blog/camera/technology/how-time-of-flight-tof-compares-with-other-3d-depth-mapping-technologies/
About this article
Cite this article
Mukawa, H., Hirota, Y., Mizuno, H. et al. Extended reality technologies for applications in the metaverse. Nat Rev Electr Eng (2025). https://doi.org/10.1038/s44287-025-00211-4