
EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum


Frontiers of Information Technology & Electronic Engineering, 2022, Issue 1

Yunzhan ZHOU, Tian FENG, Shihui SHUAI, Xiangdong LI, Lingyun SUN, Henry Been-Lirn DUH

¹Department of Computer Science, Durham University, Durham DH1 3LE, UK

²Department of Computer Science and Information Technology, La Trobe University, VIC 3086, Australia

³Alibaba Group, Hangzhou 311121, China

⁴Department of Digital Media, Zhejiang University, Hangzhou 310027, China

⁵International Design Institute, Zhejiang University, Hangzhou 310058, China

Abstract: Predicting visual attention facilitates an adaptive virtual museum environment and provides a context-aware and interactive user experience. Efforts to develop a visual attention mechanism using eye-tracking data have so far been limited to 2D cases, and researchers have yet to approach this topic in a 3D virtual environment and from a spatiotemporal perspective. We present the first 3D Eye-tracking Dataset for Visual Attention modeling in a virtual Museum, known as the EDVAM. In addition, a deep learning model is devised and tested with the EDVAM to predict a user’s subsequent visual attention from previous eye movements. This work provides a reference for visual attention modeling and context-aware interaction in the context of virtual museums.

Key words: Visual attention; Virtual museums; Eye-tracking datasets; Gaze detection; Deep learning

1 Introduction

Supported by head-mounted displays (HMDs), virtual museums can represent real-world cultural and historical exhibits through virtual reality (VR) techniques and offer an immersive and satisfying user experience. Researchers have proposed various interaction methods that enhance the user experience through an improved sense of presence, using haptic feedback (Azmandian et al., 2016; Hirota and Tagawa, 2016; de Jesus Oliveira et al., 2016; Lopes et al., 2017), hand tracking (LaViola, 2015; Davis et al., 2016; Hirota and Tagawa, 2016), or motion tracking (Suma et al., 2015; Nielsen et al., 2016). By comparison, context-aware interaction can improve the user experience in virtual museums more substantially, but it requires acquiring user behaviors and adapting the environment to them.

Visual attention is a useful type of user behavior in VR, so research on its mechanism and prediction is meaningful for context-aware interaction. Current research on the visual attention mechanism focuses on visual saliency detection in images (Cerf et al., 2008; Judd et al., 2009; Jian et al., 2011; Lang et al., 2012; Mathe and Sminchisescu, 2012; Zhao and Koch, 2012; Xu et al., 2015; Zhu et al., 2015; Kruthiventi et al., 2017) and videos (Engelke et al., 2010; Riche et al., 2013; Fang et al., 2016; Fu et al., 2017), but is limited to 2D cases. In addition, how to predict “when” and “what” users will notice in a 3D virtual environment, such as a virtual museum, remains unclear. We have also observed that the related datasets are not labeled with timestamps, so they can hardly represent sequential behaviors or support context-aware interaction in a 3D virtual environment.

In this study, we present a 3D Eye-tracking Dataset for Visual Attention modeling in a virtual Museum, named the EDVAM. The EDVAM includes 9 604 480 visual attention records collected from users during navigation. We divide these records into two subsets: the raw subset holds the captured eye movement sequences, and the practical subset comprises the processed samples. To build the EDVAM, we use a novel approach to gaze-based 3D interaction, which enables users to interact with virtual objects and acquires visual attention records from real-time eye movements. To our knowledge, the EDVAM is the first 3D eye-tracking dataset in a virtual environment. To illustrate its potential contribution to research on visual attention, we devise a deep learning model that predicts a user’s visual attention in the next moment from previous records. Trained on our dataset, this model provides a benchmark and an approach to context-aware interaction (e.g., displaying interfaces in the region of space that a user is likely to view next).

We summarize the contributions of this study as follows:

1. Construct the first 3D eye-tracking dataset in a virtual museum, with a focus on visual attention modeling;

2. Design a deep learning model to predict user visual attention in the next moment.

This paper is an extension of the work originally presented in Zhou et al. (2019).

2 Related work

Our study relates to topics concerning virtual museums, user experiences in VR, visual attention,and eye-tracking datasets. Selected studies are discussed and compared to ours.

2.1 Virtual museums

Museums present artwork and exhibits to the public and serve as learning hubs that provide rich interactive experiences, which are now usually integrated with information technologies. For example, a virtual museum augmented by personal digital assistants (PDAs) provides an intuitive artwork information guide, with which participants retrieve knowledge related to geographic locations (Hou HT et al., 2014). Augmented reality (AR) enables a conventional museum to support direct interactions with exhibits and their augmented images, promoting engagement with cultural heritage content (Ciolfi et al., 2015). A projector-based virtual museum provides a large-scale environment with a 120° field of view (FoV) for an extraordinarily immersive experience (Carrozzino and Bergamasco, 2010; Koskenranta et al., 2013). Recent advances in HMDs allow real-world museums to be represented in VR with finely reproduced artworks and an enhanced sense of immersion (Beer, 2015; Barbieri et al., 2018).

2.2 User experiences in VR

A sense of presence can improve the user experience in VR by making users feel as if they were in the real world. Achieving it requires understanding the mechanisms of human perception and implementing them in VR.

Recent studies have employed haptics to address this problem. Azmandian et al. (2016) proposed a method that warps a virtual environment to match the location of a physical object in the user’s surroundings, providing haptic feedback. Lopes et al. (2017) used electrical muscle stimulation to provide haptic feedback. Hand tracking and motion tracking have also received attention in the field. Hirota and Tagawa (2016) implemented hand-tracking-based manipulation with a deformable hand model, and Davis et al. (2016) accomplished a similar task using 3D gesture recognition. Motion-tracking methods aim to track user movements more easily and accurately (Suma et al., 2015; Nielsen et al., 2016).

These studies enhanced the sense of presence in VR for a better user experience. However, they paid less attention to user needs, which are the focus of this study, aiming at a user-adapted interactive experience in a virtual museum.

2.3 Visual attention

Mechanisms of visual attention can be categorized into two types: bottom-up and top-down (Connor et al., 2004). The bottom-up mechanism depends on raw sensory input and involves rapid, involuntary shifts of attention to salient visual features of potential importance. For instance, salient stimuli popping up in the surroundings could attract users (Itti, 2000). Previous bottom-up models focused on saliency detection (Itti et al., 1998; Shokoufandeh et al., 1999; Kadir and Brady, 2001; Hou XD and Zhang, 2007). Itti et al. (1998) proposed a classical model whose architecture mimics the properties of primate early vision, combining multi-scale image features into a single topographical saliency map. Subsequent models detect salient regions at multiple scales (Shokoufandeh et al., 1999; Kadir and Brady, 2001). Later, Hou XD and Zhang (2007) proposed a fast method of constructing saliency maps based on the image’s spectral residual, outperforming the method of Itti et al. (1998).

In contrast, the top-down mechanism relies on long-term human cognitive strategies and biases attention toward particular objects in a specific situation (e.g., colored spots when hungry, sudden movements when afraid of predators) (Connor et al., 2004). Cerf et al. (2008) presented a notable model combining face detection with low-level saliency. Some later models detect salient regions using classifiers trained on eye-tracking datasets (Judd et al., 2009; Zhao and Koch, 2012).

These 2D-oriented methods do not fully suit 3D tasks, in which saliency detection involves the temporal dimension of the visual scene and becomes more complicated. Understanding temporal visual attention requires collecting sequential eye-tracking data. Specifically, context-aware interaction would become possible if the user’s visual attention in the next moment could be predicted from collected eye-tracking data.

2.4 Eye-tracking datasets

An eye-tracking dataset maps collected data (i.e., features) to targets (i.e., labels), forming feature-label pairs for research on saliency models. Research on such datasets has become active in recent years.

Existing eye-tracking datasets can be grouped into image datasets and video datasets. Most contain lightly compressed data, whereas only a few consist of uncompressed data (Winkler and Subramanian, 2013). No existing image dataset involves more than 40 participants (Bruce and Tsotsos, 2006; Ehinger et al., 2009; Liu and Heynderickx, 2009; Ramanathan et al., 2010; Kootstra et al., 2011), and no video dataset involves more than 55 participants (Itti, 2004; Carmi and Itti, 2006; Alers et al., 2012; Hadizadeh et al., 2012). By comparison, the EDVAM involves 63 participants, the largest group among these datasets.

Existing datasets, including recent ones with 360° content (Lo et al., 2017; Rai et al., 2017; David et al., 2018; Sitzmann et al., 2018), are still limited to 2D cases. Regardless of the number of 3D objects being viewed, the recorded eye-tracking data are mapped onto a 2D plane; they remain at the image level and are regarded as a 2D projection of 3D objects. In addition, participants are not allowed to move freely or to observe and gaze at objects from different angles and positions during the creation of these 2D datasets. To fill these gaps, we propose the first 3D eye-tracking dataset, which includes real-time visual attention records in a virtual museum.

3 EDVAM

To build the EDVAM, we collected data in a virtual museum supported by HMDs (Fig. 1). We assumed that the virtual museum was holding an exhibition featuring an antique ceramic bowl placed at the center, together with related exhibits near the surrounding walls. We designed several user interfaces (UIs) with text, images, and video clips describing the ceramic bowl. A user wearing a VR headset navigated inside the virtual museum and freely viewed exhibits during the visit. Meanwhile, we recorded the user’s real-time eye movements and the exhibits viewed. For details about the virtual museum, interested readers may refer to Sun et al. (2018).

We divided the collected data into two subsets. The raw subset included the captured eye movement sequences, with 44 attributes as their features. The practical subset comprised 145 370 items sampled from the raw subset. Each item was derived from a fixed-length eye movement sequence and carried an additional label that raw-subset items lack. We formatted both subsets as CSV files to ensure data accessibility and compatibility. The dataset is publicly available (https://github.com/YunzhanZHOU/EDVAM).

3.1 Participants and devices

Sixty-three participants studying at a university in East China were recruited for data collection: 26 females and 37 males (age: mean = 23.44 years, standard deviation = 1.81 years). We paid each participant $8 and asked them to report any eye-related disabilities before the task. We also informed them that the task involved no violent or sexual content and that they could exit whenever they felt nauseated.

The devices for data collection included an Oculus Rift DK2 VR headset through which the participant accessed the virtual museum, a joystick for navigation, and a monitor displaying a real-time video stream of the participant’s view. To enable eye tracking on the VR headset, we attached an eye-tracking gadget (i.e., a Pupil Labs monocular add-on cup with an infrared (IR) mirror, IR LEDs, and an HD camera) to its left-hand display. The HD camera tracked the participant’s point of gaze (PoG) with an accuracy better than 1°.

During the task, the VR headset and the eye-tracking gadget recorded the participant’s eye movements and activities in the virtual museum. The navigation stage lasted 3–5 min, without any imposed time limit. The entire task, including the preparation stage, was completed within 10 min.

Fig. 1 Virtual museum used in this study (from top to bottom: overview, local-view, and top-view)

3.2 Gaze-based 3D interaction in VR

To enable gaze-based 3D interaction with virtual objects, we introduced a novel approach that maps 2D PoG positions to the corresponding 3D positions in VR, which made it possible to obtain real-time eye movements for recording visual attention. Fig. 2 illustrates the approach.

Our approach took as input an image of the user’s eye captured by the eye-tracking gadget at a sampling rate of 30 Hz. Each image yielded a set of eye-movement parameters $[t, C, P]$, where $t$ denotes the elapsed time since the system’s last restart, $C \in [0, 1]$ denotes the confidence of a comprehensive analysis at the image-processing level (1 means 100% confidence), and $P$ denotes the PoG data at time $t$. These parameters were calibrated and matched to the VR plane $\alpha$ to recognize the corresponding 2D PoG position $[t, C, (x_\alpha, y_\alpha)_{t,C}]$ on $\alpha$ with confidence $C$ at time $t$. However, this was insufficient for gaze-based 3D interaction in VR, so we conducted spatial mapping to obtain the exact 3D PoG position $[t, C, (x_s, y_s, z_s)_{t,C}]$ being observed in the VR space $s$ with confidence $C$ at time $t$.

In our approach, the spatial mapping step was based on the ray-object intersection test of the ray casting algorithm (Roth, 1982), a methodological basis of 3D modeling and 2D image rendering, as shown in Fig. 3. Given the camera’s position $O$, we first obtained the 3D coordinates of the 2D PoG position $A(x_\alpha, y_\alpha)$ in the VR world space $s$, denoted by $A'(x_s, y_s, z_s)$, using Unity’s method Camera.ViewportToWorldPoint (Unity Technologies, 2019). This step was necessary because the viewport space and the VR world space use different coordinate systems, and the clipping volume captured by the camera changes in real time. A ray was then cast from $O$ through $A'$ to find its first intersection with an object in the VR world space, which was regarded as the 3D PoG position $B(x_s, y_s, z_s)$.
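The two-step mapping described above (viewport point to world point, then the first ray-object intersection) can be illustrated with a small NumPy sketch. It is not the authors' Unity implementation: `viewport_to_world` is a hypothetical pinhole-camera stand-in for Unity's Camera.ViewportToWorldPoint, assuming Unity-style left-handed axes and a vertical field of view `fov_y_deg`, and scene objects are approximated as axis-aligned boxes tested with the standard slab method rather than with a game engine's colliders. Object names and coordinates are made up for the example.

```python
import numpy as np


def viewport_to_world(vp_xy, cam_pos, forward, up, fov_y_deg, aspect, depth=1.0):
    """Hypothetical stand-in for a viewport-to-world conversion: map a viewport
    point (x, y in [0, 1], origin bottom-left) to the 3D point `depth` units in
    front of a pinhole camera. Axes are left-handed: right = up x forward."""
    forward = np.asarray(forward, float) / np.linalg.norm(forward)
    right = np.cross(np.asarray(up, float), forward)
    right /= np.linalg.norm(right)
    up = np.cross(forward, right)  # re-orthogonalized up vector
    half_h = depth * np.tan(np.radians(fov_y_deg) / 2.0)
    half_w = half_h * aspect
    u = 2.0 * vp_xy[0] - 1.0  # -1 at the left edge, +1 at the right edge
    v = 2.0 * vp_xy[1] - 1.0  # -1 at the bottom edge, +1 at the top edge
    return np.asarray(cam_pos, float) + depth * forward + u * half_w * right + v * half_h * up


def ray_aabb(origin, direction, box_min, box_max):
    """Slab test: distance t >= 0 at which the ray enters the box, or None."""
    t_near, t_far = -np.inf, np.inf
    for axis in range(3):
        if abs(direction[axis]) < 1e-12:
            # Ray parallel to this pair of planes: miss unless origin lies between them.
            if origin[axis] < box_min[axis] or origin[axis] > box_max[axis]:
                return None
            continue
        t1 = (box_min[axis] - origin[axis]) / direction[axis]
        t2 = (box_max[axis] - origin[axis]) / direction[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    if t_far < max(t_near, 0.0):
        return None
    return max(t_near, 0.0)


def cast_gaze_ray(cam_pos, a_prime, boxes):
    """Cast a ray from camera position O through A' and return (name, B) for
    the nearest box hit, or None if the ray hits nothing."""
    cam_pos = np.asarray(cam_pos, float)
    direction = np.asarray(a_prime, float) - cam_pos
    direction /= np.linalg.norm(direction)
    best = None
    for name, (bmin, bmax) in boxes.items():
        t = ray_aabb(cam_pos, direction, bmin, bmax)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    if best is None:
        return None
    t, name = best
    return name, cam_pos + t * direction  # 3D PoG position B


cam = np.array([0.0, 1.6, 0.0])
fwd = np.array([0.0, 0.0, 1.0])
up = np.array([0.0, 1.0, 0.0])
a_prime = viewport_to_world((0.5, 0.5), cam, fwd, up, fov_y_deg=90.0, aspect=1.0)
boxes = {
    "central_exhibit": ((-0.5, 1.0, 4.0), (0.5, 2.0, 5.0)),
    "back_wall": ((-10.0, 0.0, 10.0), (10.0, 5.0, 10.5)),
}
hit = cast_gaze_ray(cam, a_prime, boxes)  # gaze at the viewport center
```

Keeping the nearest hit, rather than any hit, is what makes B the first intersection along the ray, so an exhibit standing in front of a wall takes precedence over the wall.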

Fig. 2 Pipeline of gaze-based 3D interaction

The approach’s last step triggered a gaze input $[t_0, I_{t_0}]$ from a series of 3D PoG positions:

Fig. 3 Mapping a 2D PoG position to the corresponding 3D PoG position in the VR space

Fig. 4 Gaze cursor presented as a circle at the center

3.3 Task procedure

At the preparation stage, we asked the participants to watch a 3-min introductory video clip about the virtual museum’s activities. They then entered a demo scene and spent 3 min familiarizing themselves with the VR headset and the joystick. Each participant put on and secured the VR headset, which fixed the relative positions of the head and the headset, and looked at nine white dots displayed sequentially on the VR screen to ensure mapping accuracy.
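The paper does not specify how the nine-dot procedure maps pupil positions to screen positions; that mapping is handled by the eye-tracking software. As an assumption-labeled illustration, one common approach fits a low-order polynomial from normalized pupil coordinates to the known dot positions by least squares. The sketch below uses a second-order polynomial with NumPy's `lstsq`; the function names and the synthetic pupil readings are hypothetical.

```python
import numpy as np


def design_matrix(p):
    """Second-order polynomial terms of normalized pupil positions p (k x 2)."""
    x, y = p[:, 0], p[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])


def fit_calibration(pupil_xy, target_xy):
    """Least-squares fit of a polynomial map: pupil position -> screen position."""
    A = design_matrix(np.asarray(pupil_xy, float))
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(target_xy, float), rcond=None)
    return coeffs  # shape (6, 2): one column per screen coordinate


def apply_calibration(coeffs, pupil_xy):
    """Map new pupil positions to screen (viewport) positions."""
    return design_matrix(np.asarray(pupil_xy, float)) @ coeffs


# Nine calibration dots on a 3 x 3 grid in normalized screen coordinates
grid = np.array([[x, y] for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)])
pupil = 0.4 * grid + 0.3  # synthetic pupil readings while fixating each dot
coeffs = fit_calibration(pupil, grid)
```

With nine dots and six coefficients per output coordinate, the system is overdetermined, so least squares also averages out some measurement noise.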

At the navigation stage, a participant could interact with virtual objects in several ways. For example, she/he could walk to any corner of the virtual museum using the joystick while controlling the speed, and could use gaze and head movements to support the joystick-based navigation, as in a real museum. It was also possible to interact with the UIs using the gaze cursor, as shown in Fig. 4. When the participant gazed at a UI, a circular progress bar appeared, and an input operation was triggered after the participant had viewed the interaction area for 2 s.
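The 2-s gaze selection can be sketched as a small per-frame state machine. The 30-Hz rate and the 2-s threshold come from the text; the class name, its `update` interface, and the choice to restart the progress bar whenever the gaze leaves the target are assumptions made for illustration.

```python
class DwellSelector:
    """Hypothetical dwell-time trigger: fire once the gaze cursor has stayed on
    the same UI target for `dwell_s` seconds of consecutive frames."""

    def __init__(self, dwell_s=2.0, rate_hz=30):
        self.frames_needed = round(dwell_s * rate_hz)  # 60 frames at 30 Hz
        self.target = None
        self.count = 0

    def update(self, target):
        """Feed the UI target under the gaze cursor for one frame (None if none).
        Returns (progress in [0, 1], fired target or None)."""
        if target != self.target:
            # Gaze moved to another target (or away): restart the progress bar.
            self.target, self.count = target, 0
        if target is None:
            return 0.0, None
        self.count += 1
        progress = min(self.count / self.frames_needed, 1.0)
        # Fire exactly once per dwell; lingering longer does not re-trigger.
        fired = target if self.count == self.frames_needed else None
        return progress, fired


sel = DwellSelector()
events = [sel.update("central_interface")[1] for _ in range(60)]
```

Counting frames rather than reading a wall clock keeps the trigger deterministic for a fixed sampling rate; a real system might instead compare timestamps $t$ to tolerate dropped frames.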

We allowed participants to choose their own navigation paths and interact freely with the exhibits, and we encouraged them to make unique choices to diversify the collected data. During the task, we recorded eye movements at 30 Hz. In addition, we interviewed participants about their experience and recorded their feedback.

3.4 Data collection

As shown in Table 1, we collected two types of data in the task: (1) the VR gaze data, recorded via both the VR device and the eye-tracking gadget, comprising 3D eye movement sequences with 11 features; (2) the pupil data, containing 2D gaze information with 33 features.

The VR gaze data recorded participants’ spatiotemporal activities in the 3D VR space. As shown in Table 2, the timestamp feature represents the temporal dimension with a precision of 0.001 s. The 3D PoG position features indicate the location the participant is observing in the virtual museum. The camera’s position features give the participant’s position, and its orientation features describe the participant’s head orientation.

Table 1 Types of collected data

The pupil data were recorded in a normalized coordinate system that is independent of the virtual environment. As shown in Table 3, we used 30 feature channels concerning gaze and the pupil from the eye-tracking gadget (Pupil Labs, 2020). The pupil detector’s measurement confidence was also used as a feature.

3.5 Raw subset

Because the two types of data carried different timestamps, we merged data items with close timestamps and combined both types into the raw subset, which includes the eye-tracking data of all 63 participants. Given the 30-Hz sampling frequency, 30 data items were produced per second. Each data item has 44 features and no label, and corresponds to a unique timestamp.
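Merging the two data types by nearest timestamp can be sketched in pure Python with `bisect`. The record layout (dicts with a `timestamp` key in seconds), the field names, the function name, and the 1/60-s tolerance (half a frame at 30 Hz) are assumptions; the paper does not state its exact matching rule. In pandas, `pandas.merge_asof(..., direction="nearest", tolerance=...)` performs the same kind of join.

```python
import bisect


def merge_nearest(gaze, pupil, tol=1 / 60):
    """Attach to each gaze record the pupil record with the closest timestamp.
    Both inputs are lists of dicts sorted by 'timestamp' (seconds). Gaze records
    with no pupil record within `tol` seconds are dropped."""
    times = [p["timestamp"] for p in pupil]
    merged = []
    for g in gaze:
        i = bisect.bisect_left(times, g["timestamp"])
        best = None
        # Only the neighbours on either side of the insertion point can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(times):
                d = abs(times[j] - g["timestamp"])
                if best is None or d < best[0]:
                    best = (d, j)
        if best is not None and best[0] <= tol:
            row = dict(pupil[best[1]])
            row.update(g)  # gaze fields (including its timestamp) take precedence
            merged.append(row)
    return merged


gaze = [{"timestamp": 0.000, "pog_x": 1.0},
        {"timestamp": 0.033, "pog_x": 1.1},
        {"timestamp": 0.100, "pog_x": 1.2}]
pupil = [{"timestamp": 0.001, "confidence": 0.95},
         {"timestamp": 0.034, "confidence": 0.90}]
rows = merge_nearest(gaze, pupil)  # the 0.100-s gaze item has no close partner
```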

3.6 Practical subset

Context-aware interaction requires learning from user behaviors and predicting visual attention in the next moment, which enables the system to adapt to users accordingly and synchronously. For example, a UI can be displayed in real time near the objects that the user is likely to be interested in next. To achieve context-aware interaction, we need not only previous eye movements but also subsequent visual attention. Therefore, the raw subset was further processed into the practical subset, which includes both pieces of information.

Table 2 Gaze data details

Table 3 Pupil data details

For the previous eye movements, we sampled the raw subset using a sliding time window of 10 s, with adjacent windows offset by 1 frame (1/30 s at 30 Hz). We regarded each time window as an input instance and constructed the instance matrix $\boldsymbol{s} \in \mathbb{R}^{n \times f}$ as

$$\boldsymbol{s} = [\boldsymbol{I}_1, \boldsymbol{I}_2, \ldots, \boldsymbol{I}_n]^{\mathrm{T}},$$

where $\boldsymbol{I}_i$ ($i = 1, 2, \ldots, n$) denotes an eye movement item with $f$ features, $n$ represents the number of eye movements in 10 s (i.e., $n = 300$), and $f$ refers to the number of features (i.e., $f = 40$). An instance matrix thus represents the previous eye movements over a fixed duration.
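The windowing step can be sketched with NumPy. With $n = 300$ frames per window and a stride of one frame, a recording of $T$ frames yields $T - n + 1$ instance matrices of shape $n \times f$. The function name and the random input are illustrative only; for long recordings, `numpy.lib.stride_tricks.sliding_window_view` can produce equivalent windows as a view (with the window axis last) instead of copying.

```python
import numpy as np


def make_instances(frames, n=300, stride=1):
    """Stack overlapping windows of `n` consecutive frames.
    frames: array of shape (T, f), one row per eye movement item I_i.
    Returns an array of shape (num_windows, n, f)."""
    frames = np.asarray(frames)
    T = frames.shape[0]
    if T < n:
        return np.empty((0, n, frames.shape[1]), dtype=frames.dtype)
    return np.stack([frames[s:s + n] for s in range(0, T - n + 1, stride)])


# 20 s of synthetic data at 30 Hz with f = 40 features per frame
recording = np.random.default_rng(0).normal(size=(600, 40))
instances = make_instances(recording)
print(instances.shape)  # (301, 300, 40)
```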

As for subsequent visual attention, we divided the VR space into 12 areas: upper interface, central interface, lower interface, south open space, north pillar, east pillar, southwest exhibit, northwest exhibit, southeast exhibit, piano area, central floor,and central ceiling. The following analysis was conducted to map user fixations in these areas to obtain the position of visual attention.

There are two reasons for dividing the VR space: (1) Because the EDVAM is the first 3D eye-tracking dataset in a virtual museum, no previous study had addressed visual attention prediction in a 3D VR space. Hence, we drew on the most closely related work on 360° videos, in which Fan et al. (2017) proposed a fixation prediction network that predicts the future viewing probability of each video tile. We extended the concept of a tile and divided the VR space into 3D tiles. (2) Each area represents a candidate region and defines the spatial scope of an adaptation in a context-aware environment.

The next question is how to determine the area in which a fixation lies, especially when the fixation falls at the junction of two areas. Because a fixation comprises the PoGs recorded from its start time over its duration, we counted the PoGs in each area and assigned the fixation to the area it most likely belongs to:

$$\hat{a} = \operatorname*{arg\,max}_{i \in \{1, 2, \ldots, n\}} N_i,$$

where $N_i$ ($i = 1, 2, \ldots, n$) denotes the number of PoGs in the $i$th area. The area with the maximum number of PoGs was regarded as the one containing the fixation.
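Assuming the area label of each PoG in a fixation has already been determined (for example, by testing each 3D PoG against the 12 area volumes), the assignment rule reduces to a majority count. In this sketch the function name is hypothetical, ties go to the area encountered first, and the returned fraction of PoGs outside the chosen area quantifies boundary straddling; the paper does not say how ties were handled.

```python
from collections import Counter


def assign_fixation_area(pog_areas):
    """Return (area with the most PoGs, fraction of PoGs outside that area).
    pog_areas must be non-empty; ties go to the area encountered first."""
    counts = Counter(pog_areas)
    area, n_max = counts.most_common(1)[0]
    return area, 1.0 - n_max / len(pog_areas)


fixation = ["piano area"] * 9 + ["central floor"]
area, outside = assign_fixation_area(fixation)  # "piano area", about 0.1
```

An `outside` value of 0 corresponds to the common case in which every PoG of the fixation falls in a single area.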

In most cases, we observed that areas other than the assigned one contained no PoGs of the fixation, i.e., fixations rarely straddled area boundaries. This observation supports the reliability of using fixations to determine visual attention in the next moment.

We built the practical subset by matching each time window, according to its timestamp, to the visual attention area that followed it, and divided the subset into a training set and a test set, ensuring that the two sets were drawn from different participants.
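A participant-level split can be sketched as follows: participants, not individual windows, are shuffled and partitioned, so no participant contributes samples to both sets and the test accuracy reflects unseen users. The 80/20 ratio, the seed, and the `participant` field name are hypothetical; the paper does not report its split ratio.

```python
import random


def split_by_participant(samples, test_frac=0.2, seed=42):
    """Split samples into train/test sets so each participant appears in only one."""
    ids = sorted({s["participant"] for s in samples})
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_frac))
    test_ids = set(ids[:n_test])
    train = [s for s in samples if s["participant"] not in test_ids]
    test = [s for s in samples if s["participant"] in test_ids]
    return train, test


samples = [{"participant": p, "window": w} for p in range(63) for w in range(5)]
train, test = split_by_participant(samples)  # 13 test participants, 50 training
```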

4 Predictive deep learning model

To predict visual attention and validate the collected dataset, we devised a deep learning model based on a three-layer long short-term memory (LSTM) network, because LSTM has shown satisfactory performance in the classification, processing, and prediction of temporal data (Gers et al., 2000; Eck and Schmidhuber, 2002; Chen et al., 2015).

4.1 Feature extraction

Each input instance from the practical subset was a time window with 40 features. To reduce the computational workload and increase accuracy, we selected the 10 most relevant features through pre-experimental analysis: 3D PoG positions (3 feature channels), the camera’s position (3 feature channels), and the camera’s orientation (3 feature channels) from the VR gaze data, and the confidence (1 feature channel) from the pupil data.

To model the influence of previous eye movements on subsequent visual attention, we resampled the last 10-s time window with weights increasing toward the present, i.e., more recent sub-windows were sampled more densely. We assigned the highest weight to the latest sub-window, assuming that, as with human memory, the more recent an experience is, the more it may influence the current trial of selecting the target virtual object again in the UI task (Li et al., 2018). As Table 4 shows, one frame was sampled per six frames during the first 5 s, and the sampling was terminated during the latest 0.33 s.
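The recency-weighted sampling can be sketched as taking frames with a smaller stride in later parts of the window. Only the first segment (one frame in six over the first 5 s, i.e., 150 frames at 30 Hz) is stated in the text; the rest of the schedule is in Table 4, which is not reproduced here, so the second segment in the example is purely hypothetical, as is the function name.

```python
import numpy as np


def recency_sample(window, schedule):
    """Subsample a window along time.
    window: array (n, f); schedule: list of (num_frames, stride) segments,
    oldest first, whose num_frames must sum to n. Smaller strides in later
    segments represent recent eye movements more densely."""
    if sum(k for k, _ in schedule) != len(window):
        raise ValueError("schedule must cover the whole window")
    picked, start = [], 0
    for num_frames, stride in schedule:
        picked.extend(range(start, start + num_frames, stride))
        start += num_frames
    return window[picked]


window = np.arange(300 * 10).reshape(300, 10)  # one 10-s window, 10 features
# (150, 6): first 5 s, one frame in six (from the text)
# (150, 2): hypothetical stride for the last 5 s, not taken from Table 4
steps = recency_sample(window, [(150, 6), (150, 2)])
print(steps.shape)  # (100, 10)
```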

4.2 Model design

Previous research in deep learning suggests that adding layers to a neural network (NN) increases its capacity to model complex behaviors (LeCun et al., 2015). Accordingly, we added two hidden layers to an LSTM network to model high-dimensional eye movements. At each step, the network took as input an eye movement item $I_n$ with 10 features, as shown in Fig. 5. We used 3 × 78 LSTM cells in the recurrent layers. The hidden state of each cell propagated to the next cell in the same layer, and its output propagated to the cell in the deeper layer, because past eye movements may be reflected both in the user’s current behavior input and in the hidden states of previous steps. We set the hidden dimension to 20. The model predicted the area to be viewed from the linear transformation $L$ of the output of the last LSTM cell.
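To make the tensor shapes concrete, the forward pass of a stacked LSTM with the stated sizes (10 input features, three layers, hidden dimension 20, and a linear map $L$ to the 12 areas) can be written in plain NumPy. This is an illustrative sketch, not the authors' implementation: the weights are random and untrained, the gates follow the standard LSTM formulation without peephole connections, and the sequence length of 78 assumes that each of the 78 cells per layer corresponds to one time step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(rng, input_dim=10, hidden=20, layers=3, classes=12):
    """Random (untrained) weights for a stacked LSTM plus a linear output map L."""
    params, in_dim = [], input_dim
    for _ in range(layers):
        # Rows stacked as [input gate, forget gate, cell candidate, output gate].
        W = rng.normal(0.0, 0.1, size=(4 * hidden, in_dim + hidden))
        b = np.zeros(4 * hidden)
        params.append((W, b))
        in_dim = hidden  # deeper layers consume the previous layer's outputs
    L = rng.normal(0.0, 0.1, size=(classes, hidden))
    return params, L


def forward(x, params, L):
    """x: (T, input_dim) sequence of sampled eye movement items.
    Returns class scores (classes,) from the top layer's last time step."""
    seq = x
    for W, b in params:
        H = W.shape[0] // 4
        h, c = np.zeros(H), np.zeros(H)
        outputs = []
        for x_t in seq:
            i, f, g, o = np.split(W @ np.concatenate([x_t, h]) + b, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
            h = sigmoid(o) * np.tanh(c)  # hidden state, passed to the next step
            outputs.append(h)
        seq = np.stack(outputs)  # this layer's outputs feed the deeper layer
    return L @ seq[-1]  # linear transformation L of the last cell's output


rng = np.random.default_rng(0)
params, L = init_params(rng)
scores = forward(rng.normal(size=(78, 10)), params, L)
e = np.exp(scores - scores.max())
probs = e / e.sum()  # softmax over the 12 areas
```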

4.3 Experiments

We evaluated our model on the practical subset of the EDVAM, performing training and testing on a workstation with a 3.7-GHz CPU, 64 GB of RAM, and an Nvidia GeForce GTX 1080 graphics card with 8 GB of memory. The model was trained using back-propagation on the training set and optimized via mini-batch gradient descent with a batch size of 128 for 40 epochs.

Table 4 Temporal dimension sampling

Fig. 5 Architecture of the three-layer LSTM network

We computed a cross-entropy loss to measure the prediction performance of the trained model:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\mathrm{e}^{f_{y_i}}}{\sum_{j}\mathrm{e}^{f_{y_j}}},$$

where $f_{y_j}$ denotes the $j$th element of the label score vector, $f_{y_i}$ refers to the score of the correct label of the $i$th sample, and $N$ represents the batch size. Training minimized this loss.
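The cross-entropy loss described above can be computed in a numerically stable way by subtracting each row's maximum score before exponentiating, which leaves the softmax unchanged. The NumPy sketch assumes a score matrix of shape $N \times K$ and integer area labels; it is an illustration, not the authors' training code.

```python
import numpy as np


def cross_entropy(scores, labels):
    """Mean softmax cross-entropy.
    scores: (N, K) label score vectors; labels: (N,) indices y_i of the correct areas."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # avoids overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# With all-equal scores over 12 areas, every class has probability 1/12,
# so the loss equals log(12) regardless of the labels.
loss = cross_entropy(np.zeros((4, 12)), np.array([0, 3, 7, 11]))
```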

We validated the trained model on the test set. Each test instance comprised 10 s of eye movements and the corresponding ground truth of visual attention. Fig. 6 shows that the cross-entropy loss declined over the 40 epochs and converged rapidly in both training and validation. After the first five epochs, the loss fluctuated from the 6th to the 15th epoch, which suggests that the optimizer was moving among local minima. After 15 epochs, both the training loss and the validation loss converged. The prediction accuracy reached 78.94%, and predicting each sample took less than 0.02 ms, satisfying the real-time requirement.

In the example shown in Fig. 7, our model took as input the 3D PoG position sequences with confidences, the camera position sequences, and the camera orientation sequences, and predicted that the next visual attention would be the southeast exhibit, based on the user’s past behaviors around the piano area.

Fig. 6 Per-epoch loss when training and validating the LSTM network

4.4 Analysis of gender effects

Although a previous study indicated that men and women devote approximately the same amount of attention to a virtual environment (Felnhofer et al., 2012), we further explored whether a user’s gender would affect the prediction accuracy of our model. Specifically, in addition to the model instance discussed in Section 4.3, we trained the model separately on data from male users only and from female users only. Table 5 shows the performance of our model in the three cases. We observed no significant difference in prediction accuracy with respect to user gender in any instance. One interpretation is that female and male users have similar visual attention patterns when navigating a relatively gender-neutral virtual environment such as a museum.

5 Applications

This study, including the eye-tracking dataset and the predictive model, is expected to support a context-aware virtual museum environment. Such an interactive system adapts itself based on what it learns from user behaviors. In particular, the adaptation discussed in this study refers to the real-time, intelligent display of UIs near objects that may interest users. Therefore, a potential research direction is to define rules for adaptive UIs and to improve the model’s prediction accuracy for context-aware interaction.

Saliency-aware rendering can also benefit from this study. An improved sense of immersion in VR demands higher HMD screen resolutions, which make low-latency rendering difficult. Previous studies implemented foveated rendering techniques that synthesize images with progressively less detail outside the fixation region. With our predictive model of visual attention, it becomes possible to render the likely salient area in full detail and blur other areas in advance, which could notably reduce the rendering workload of HMDs.

Table 5 Prediction accuracy of the model regarding the user’s gender

Fig. 7 An example of predicting the next visual area: (a) 3D PoG position sequences (white trajectories) and the camera’s position sequences (black trajectory) visualized inside the virtual museum; (b) the camera’s orientation sequences represented by the user’s head motions; (c) the next visual area predicted by the model; (d) the corresponding ground truth

Researchers may use our dataset to explore the mechanism of 3D visual attention (e.g., identifying and classifying eye movements in a virtual environment, or examining the effects of eye movement features on prediction). Building on knowledge learned about eye movements, we expect progress in context modeling, personalized interaction, and virtual museum design.

6 Conclusions and discussion

Previous visual attention studies and datasets are limited to 2D cases, give users inadequate freedom, and neglect temporal aspects, all of which differ significantly from the real world. We therefore doubt that these studies can fully enable context-aware modeling in a 3D virtual environment.

In this paper, we introduced the EDVAM, the first 3D eye-tracking dataset in a virtual museum, to fill this gap, and proposed a model that predicts visual attention from previous eye movements. Our LSTM-based model supports fundamental context-aware interactions in a 3D virtual museum. Overall, this study contributes to making virtual museums adaptive for a context-aware user experience, helping users interact with virtual objects and adaptive UIs during a personalized virtual museum tour.

A significant limitation of this study lies in the devices used in the task. According to the participants’ feedback, the Oculus Rift DK2 HMD leaves room for improvement in precision and resolution; for example, some users complained about coarse detail in the virtual museum caused by the HMD’s low resolution. Although the participants interacted with virtual objects using a joystick and 3D PoGs, a stronger sense of presence would require other interaction methods (e.g., haptics and hand tracking). VR HMDs with improved hardware and the introduction of multiple interaction methods could improve both the 3D virtual environment and the quality of the collected data.

Currently, our model predicts only a limited number of visual areas. In future work, each visual area could be divided into finer-grained subareas for training, extending the model’s capability.

Although the analysis showed no significant gender effect in the trained model instances, it is still worth investigating the generality of our approach (e.g., by collecting eye-tracking data in different virtual museums) and the ability of our model to capture potential individual differences in visual attention (e.g., through extended analysis with other user information such as age, education, and cultural background).

We expect this study to serve as a reference for visual attention modeling and context-aware interaction in 3D virtual environments other than museums.

Contributors

Yunzhan ZHOU, Tian FENG, and Xiangdong LI designed the research. Yunzhan ZHOU and Shihui SHUAI processed the data. Yunzhan ZHOU and Tian FENG drafted the paper. Xiangdong LI, Lingyun SUN, and Henry Been-Lirn DUH helped organize the paper. Yunzhan ZHOU and Tian FENG revised and finalized the paper.

Compliance with ethics guidelines

Yunzhan ZHOU, Tian FENG, Shihui SHUAI, Xiangdong LI, Lingyun SUN, and Henry Been-Lirn DUH declare that they have no conflict of interest.

Data availability

The dataset that supports the findings of this study is publicly available at https://github.com/YunzhanZHOU/EDVAM.
