Xiaolei Chen | Di Wu | Ishfaq Ahmad
1 College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, China
2 Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
Abstract: Because of the rapid growth of head-mounted displays and 5G network deployment, 360-degree video has become increasingly popular. To generate the real experience of a virtual environment, 360-degree videos require an ultrahigh resolution and frame rate to cover an omnidirectional view. These two prerequisites pose challenges for the transmission bandwidth and storage capacity of 360-degree video streaming. To reduce bandwidth and storage waste while providing a good immersive experience, we propose an optimized viewport-adaptive 360-degree video streaming method using high-efficiency video coding tiling, motion-constrained tile sets and the MPEG dynamic adaptive streaming over HTTP spatial relationship description. The paper describes the rigorous design of the optimized system, which can assign different bitrates to different tiles in the viewport. The experimental results show that the proposed streaming system compares favourably with existing methods in terms of bitrate savings and storage capacity reduction.
The video technology field is evolving toward providing immersive and interactive virtual reality (VR) experiences for users by using 360-degree video, also known as panoramic, spherical or omnidirectional video (ODV). Typically, the video is captured by multiple cameras that cover 360 degrees of the scene, and the individual camera views are projected onto a sphere that covers the whole 360 × 180-degree viewing range, which allows viewers to look around freely during playback. This results in a more immersive experience than traditional two-dimensional (2D) video, which covers only a limited plane. Different devices can be used for 360-degree video; they range from traditional desktop computers to smartphones and tablets and to professional head-mounted displays (HMDs) such as HTC Vive, Oculus Rift, Samsung Gear VR and Google Cardboard. When desktop computers are used, the mouse or keyboard can be employed to interact with the 360-degree video. On smartphones and tablets, the viewing direction can be changed by touch interaction or by moving the device around. On an HMD, the viewing direction can be changed with the viewer's head movements.
For an immersive visual experience, a 360-degree video should have high resolution (4K is widely viewed as a functional minimum, and 8K or higher is desired) to achieve high fidelity and a high frame rate (e.g. 90 frames per second [fps]) to avoid motion sickness for the viewers. This leads to high video bitrates, causing severe consumption of storage and bandwidth resources that hinders service providers such as YouTube from effectively delivering 360-degree video over bandwidth-constrained networks. A conventional approach for delivering 360-degree video is to encode the content as a single-layer bitstream and transmit it to the receiver. The receiver decodes the full 360-degree bitstream, and the region that corresponds to the user's viewing orientation is rendered and displayed using an HMD. Because users can view only a portion of the 360-degree video at each time instant, streaming the whole 360-degree content in the highest resolution and quality consumes an unnecessarily large share of the network bandwidth. To tackle this problem, viewport-adaptive 360-degree video streaming methods have been proposed. The underlying idea of these methods is that, based on the viewer's current viewing direction, the server streams the viewport of the 360-degree video at a higher bitrate or resolution and the non-viewport parts at a lower bitrate or resolution. Compared with viewport-agnostic streaming methods, viewport-adaptive methods provide bandwidth-efficient 360-degree video streaming while maintaining a good quality of experience. However, these improvements still do not fully solve the bandwidth and storage problem, and 360-degree video streaming still requires a considerably large bandwidth. This is because current viewport-adaptive 360-degree video streaming methods cannot distinguish the quality of different regions within the viewport. In reality, even within the viewport, the gaze part at the centre of a viewer's viewport should have better quality than the non-gaze parts [1]. We address these neglected problems by proposing a method that exploits high-efficiency video coding (HEVC) tiling, motion-constrained tile sets (MCTSs) and the MPEG dynamic adaptive streaming over HTTP (MPEG-DASH) spatial relationship description (SRD) to assign different bitrates to different tiles in the viewport. Our approach spatially divides 360-degree videos into multiple tiles during encoding and packaging, then uses MPEG-DASH SRD to design a media presentation description (MPD) file that can allocate different bitrates to different tiles in the viewport. The experimental results indicate that, compared with existing methods, our method saves bitrate and reduces the MPD files transferred from server to client without affecting the quality of the streamed 360-degree video. The main contributions of this paper are as follows:
• We propose basic relationships between tiles in the viewport and divide the tiles in a viewport into three categories: a gaze tile, which has the highest navigation likelihood; the four horizontal and vertical neighbour tiles, which have medium navigation likelihood; and the four diagonal neighbour tiles, which have the least navigation likelihood.
• We define three priority levels for the different tiles in the viewport according to their navigation likelihoods. During streaming, different tiles are assigned different bitrates according to their priority.
• We propose an optimized viewport-adaptive 360-degree video streaming system with MCTSs. Based on the user's navigation likelihoods, the gaze tile of the viewport is chosen from a track encoded at a high bitrate, the four horizontal and vertical neighbour tiles are selected from tracks encoded at a medium bitrate, and the four diagonal neighbour tiles are selected from tracks encoded at a low bitrate.
• We implement the adaptive 360-degree video streaming system and demonstrate significant reductions in bitrate and in the MPD files transferred from server to client.
The rest of the paper is organized as follows: Section 2 briefly presents the background and related work. In Section 3, the proposed optimized viewport-adaptive 360-degree video streaming method is described. The implementation details of the proposed method are described in Section 4. The experiments and evaluation results are presented in Section 5. Finally, conclusions are given in Section 6.
In this section, the basic concepts of HEVC tiling and MPEG-DASH SRD are reviewed. Then, related work on viewport-adaptive 360-degree video streaming is briefly introduced.
In the HEVC standard, a video frame is partitioned into square regions called coding tree units (CTUs), which represent the basis of the coding process. Several rectangular sets of CTUs are aggregated into units called tiles [2] to form a CTU-aligned frame partitioning. A video frame can be horizontally and vertically divided into columns and rows using tiling. Tiles provide more flexibility in partitioning and incur less compression penalty because tiles do not contain a header. Intraprediction does not cross tile boundaries, and the entropy coding state is reinitialized at the beginning of every tile. Hence, each constructed tile is independently decodable from the other tiles within the same frame. This characteristic can be used to address and process individual portions of a frame separately. In terms of interprediction, by using the concept of MCTSs, motion vectors are allowed to cross tile boundaries within a set of tiles but not across the tile set boundary, which enables support for decoding a freely selected set of tiles. Tiling has two advantages: first, tiles have better rate-distortion performance in the case of high-level parallelization; second, tiling can be used for additional region-of-interest (ROI) functionality, ensuring that the ROI tiles are independently decodable from non-ROI tiles and that temporal and spatial predictions within the ROI do not refer to pixels outside the ROI. These two advantages make tiling suitable for viewport-adaptive 360-degree video streaming, because the visible region of the video is only a small part of the whole video content.
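As an illustrative sketch (ours, not from the paper), the following Python snippet computes a CTU-aligned uniform tile grid, assuming the HEVC maximum CTU size of 64 × 64 pixels and the 3840 × 2160 resolution used later in the experiments; because tile boundaries must fall on CTU edges, rows and columns cannot always be split exactly evenly.

```python
# Sketch: CTU-aligned uniform tiling (assumes 64x64 CTUs).
CTU = 64

def tile_grid(width, height, cols, rows):
    """Return (x, y, w, h) pixel rectangles for a cols x rows tile grid
    whose boundaries are aligned to CTU edges, as HEVC tiling requires."""
    ctu_cols = -(-width // CTU)    # ceiling division: CTUs per frame row
    ctu_rows = -(-height // CTU)
    # Distribute CTU columns/rows as evenly as possible across the tiles.
    x_edges = [round(i * ctu_cols / cols) * CTU for i in range(cols + 1)]
    y_edges = [round(j * ctu_rows / rows) * CTU for j in range(rows + 1)]
    return [(x_edges[i], y_edges[j],
             min(x_edges[i + 1], width) - x_edges[i],
             min(y_edges[j + 1], height) - y_edges[j])
            for j in range(rows) for i in range(cols)]

print(tile_grid(3840, 2160, 3, 3))  # the nine tiles of a 3x3 partitioning
```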
A 360-degree video frame in the equirectangular projection (ERP) [3] format is divided into tiles. When the user turns his or her head to another direction from the current view angle, the viewport is switched accordingly. Because different tile partitions result in a different panoramic experience, we compare the influence of different tile partitioning methods on the performance of the proposed algorithm.
MPEG-DASH [4] is an ISO standard (ISO/IEC 23009-1) that enables effective adaptive video streaming over HTTP. In MPEG-DASH, each given video has a set of DASH representations that contain its different bitrate levels. Each DASH representation, which has its own bitrate level, consists of multiple self-decodable time segments (chunks), which can be requested individually and decoded by DASH players. In this context, to reduce both the bitrate consumption of the end user and the visual distortion of the viewport, as well as to improve the bitstream decoding performance using parallel decoding features, the 360-degree video frames in each chunk can also be divided into self-decodable spatial regions. The quality (bitrate, resolution, etc.) of the delivered video can be widely adapted to the viewer's viewport, the client's capabilities and the network conditions. The main concept of MPEG-DASH is shown in Figure 1.
FIGURE 1 Illustration of the MPEG-DASH streaming system [5]
Multimedia content is stored on an HTTP server and is accompanied by an MPD file, an XML file that describes how a video is composed of the available video adaptation sets and the different characteristics of these sets, such as bitrates, resolutions and HTTP uniform resource locator addresses. This structure binds the segments to the bitrate (resolution, etc.) and temporal information (e.g. start time and duration of the segments). The MPD file is fed to the client at the beginning of video playback. On the client side, the video player first requests the MPD, which contains the temporal and spatial information for the media content. Based on that information, it requests the individual segments that best fit its requirements. Then, the selected video segments are sent from the video server to the client. The main aim of MPEG-DASH is to provide a high-quality streaming experience matched to the client's bandwidth.
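As a minimal client-side sketch (our illustration, not reference code from the standard), the selection step can be expressed in Python; the element and attribute names follow the DASH schema, but a real player such as GPAC handles far more cases:

```python
# Sketch: pick the highest-bandwidth representation that fits the throughput.
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

def pick_representation(mpd_xml: str, available_bps: int):
    """Return the Representation element with the largest 'bandwidth'
    attribute that does not exceed the currently available throughput."""
    root = ET.fromstring(mpd_xml)
    reps = root.findall(".//dash:Representation", NS)
    fitting = [r for r in reps if int(r.get("bandwidth", "0")) <= available_bps]
    return max(fitting, key=lambda r: int(r.get("bandwidth", "0")), default=None)
```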
As part of ISO/IEC 23009-1:2015, a new amendment called SRD [6] has been added to the MPEG-DASH standard. SRD defines spatial relationships among partitioned video tiles. In particular, the SRD is fully integrated into the MPD of MPEG-DASH and is used to describe a grid of rectangular tiles, which allows a client implementation to request only a given ROI, typically associated with a contiguous set of tiles. This feature of SRD can be used to provide the signaling necessary to transmit each tile of a given content and reconstruct the full 360 degrees of the scene. We use MPEG-DASH SRD to prioritize the tiles in the viewport by assigning different bitrates to different tiles based on the navigation likelihood of the viewer.
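For illustration, an SRD descriptor is carried in a SupplementalProperty (or EssentialProperty) element whose value lists, in order, the source identifier, the tile's position and size, and the total frame size. The following Python sketch (our own; the example tile is an assumption) parses such a value:

```python
# Sketch: parse an SRD value string of the form
# "source_id, object_x, object_y, object_width, object_height,
#  total_width, total_height" (ISO/IEC 23009-1 SRD amendment).
from typing import NamedTuple

class SRD(NamedTuple):
    source_id: int
    x: int
    y: int
    w: int
    h: int
    total_w: int
    total_h: int

def parse_srd(value: str) -> SRD:
    fields = [int(v) for v in value.split(",")[:7]]
    return SRD(*fields)

# Example: the top-left tile of a 3x3 grid on a 3840x2160 ERP frame.
print(parse_srd("0,0,0,1280,720,3840,2160"))
```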
Recently, various viewport-adaptive 360-degree video streaming methods have been proposed. According to Nguyen et al. [7], these methods can be classified into two groups: estimation-based and history-based. In estimation-based methods, tiles closer to the estimated viewport position are assigned higher versions. As a result, the tiles' versions gradually decrease with distance from the estimated viewport position. In history-based methods, the version of a tile is decided based on how that tile has been viewed by past users. If a tile has been watched by most users, that tile is assigned a high version. In contrast, tiles that have rarely been watched have lower versions.
2.3.1 | Estimation‐based methods
Xie et al. [8] developed a perceptual model considering the adaptation quality for viewport-adaptive 360-degree video streaming, maximizing the adaptation quality by refining the quality from a low-quality scale to a high-quality one. The high-quality scale is applied to the content in the current viewport and a low-quality scale elsewhere. They further enhanced the adaptation quality by applying hidden Markov model-based viewport prediction and high-quality content prefetching. Ozcinar et al. [9, 10] proposed an adaptive 360-degree video streaming system that computes optimal DASH representations by using the characterization provided by visual attention maps. To achieve this goal, a visual-attention spherically weighted quality measurement was proposed; then, tiling schemes with variable-sized and non-overlapping tiles were considered to represent the 360-degree video content. The proposed system is able to determine optimal pairs of tiling scheme and non-uniform bitrate allocation within tiles for each chunk of every representation. Huang et al. [11] proposed a tile-based 360-degree VR video transmission scheme and a corresponding buffer strategy for heterogeneous networks with multiuser access. To improve the experience of users, they jointly considered saliency in videos, field of view and the channel quality states of users. The proposed scheme adaptively chooses the most appropriate Wi-Fi access point connection and, at the same time, allocates heterogeneous long-term evolution and wireless local area network resources for each tile of each user. Moreover, they proposed a highly effective heuristic search algorithm to solve an NP-hard mixed-integer problem with low complexity, as well as a novel buffer updating strategy to tackle the buffering problem of 360-degree video streaming. Ozcinar et al. [12] introduced a new adaptive 360-degree video streaming system using visual attention maps. The developed system aimed at enhanced quality of ODV streaming viewed in HMDs. For this, the proposed method used dynamic-sized tiles per chunk and a varying bitrate allocation per tile by consulting the estimated visual attention maps. Ozcinar et al. [13] introduced a novel end-to-end streaming system for VR that delivers enhanced viewport quality under varying bandwidths and different viewport trajectories. The proposed system includes tiling, a novel MPD for DASH, and viewport-aware bitrate level selection methods. The integration of these methods enables the proposed system to deliver very high-resolution 360-degree video at good visual quality.
Sanchez et al. [14] described an optimization problem for tiled streaming of 360-degree video that targets finding, for each sequence, the random access point period that leads to the minimum transmitted bitrate while ensuring that, most of the time, users watch high-resolution content. Zare et al. [15] proposed storing two versions of the same video content at different resolutions, each divided into multiple tiles using the HEVC standard. According to the user's current viewport, a set of tiles is transmitted in the highest captured resolution, while the remaining parts are transmitted from the low-resolution version of the same content. To enable different combinations to be chosen selectively, the tile sets are encoded to be independently decodable. They further studied the trade-off in the choice of tiling scheme and its impact on compression and streaming bitrate performance. Corbillon et al. [16] proposed a viewport-adaptive 360-degree video streaming system in which the server prepares multiple video representations that differ not only in their bitrate but also in the qualities of different scene regions. The client chooses a representation for the next segment such that its bitrate fits the available throughput and a full-quality region matches its viewing direction. Graf et al. [17] explored various options enabling bandwidth-efficient adaptive streaming of 360-degree video over HTTP. They presented a system architecture and implemented basic tools to facilitate the evaluation of different encoding and streaming options using tiles within HEVC/H.265. Hosseini et al. [18] demonstrated an adaptive bandwidth-efficient 360-degree video streaming system based on MPEG-DASH SRD. They extended MPEG-DASH SRD to the 3D space of 360-degree videos and showcased a dynamic viewpoint-aware adaptation technique to tackle the high bandwidth demands of streaming 360-degree videos to wireless VR headsets. They spatially partitioned the underlying 3D mesh into multiple 3D submeshes and constructed an efficient 3D geometry mesh called a hexaface sphere to represent tiled 360-degree videos optimally in 3D space. They used MPEG-DASH SRD to describe the spatial relationship of tiles in the 3D space and prioritized the tiles in the field of view for viewpoint-aware adaptation. Le Feuvre et al. [19] described how spatial access can be performed in an adaptive HTTP streaming context using tiling of the source content, MPEG-DASH and its SRD extensions.
FIGURE 2 Overview of the proposed adaptive 360-degree video streaming system
2.3.2 | History‐based methods
Chakareski et al. [20] proposed a framework for viewport-driven rate-distortion-optimized 360-degree video streaming that integrates the user's view navigation pattern and the spatiotemporal rate-distortion characteristics of the 360-degree video content to maximize the delivered quality of experience for the given network or system resources. The framework is composed of a method for constructing heat maps that capture the likelihood of a user navigating different spatial segments of a 360-degree video over time, an analysis and characterization of the content's spatiotemporal rate-distortion characteristics that leverages preprocessed spatial tiling of the 360-degree view sphere, and an optimization problem formulation that characterizes the delivered quality of experience given the user navigation patterns, the 360-degree video encoding decisions and the available system/network resources. Nguyen et al. [21] proposed a new adaptation approach for viewport-adaptive streaming of 360-degree videos over the Internet. The proposed approach systematically decides the versions of tiles according to user head movements and network conditions by considering not only viewport estimation errors but also users' head movements within each segment duration.
We present our optimized approach to streaming HEVC-tiled 360-degree video using MPEG-DASH SRD, as depicted in Figure 2.
First, on the server side, the raw ERP 360-degree video is encoded using the H.265 standard and divided into 3 × 3, 4 × 4 or 5 × 5 spatial tiles (Section 3.1). Second, the encoded motion-constrained tiled videos are packaged into MP4 containers. Third, the packaged video is dashed to split the tiles of each frame into different tracks and to generate the corresponding MPD files; these tracks and the MPD files are stored on an HTTP server for streaming (Section 3.2). On the client side, according to the viewer's current viewport, the client first downloads the MPD file from the server using the HTTP protocol. Then, it parses the MPD file, and a tile set request is signaled to the HTTP server. Based on the request, the tiles used to form the viewport are adaptively chosen from different tracks encoded at different bitrates and sent to the client.
FIGURE 3 Tiling for the viewport of an input 360-degree video
To avoid high delay, the selected tiles of each frame are sent at the same time. The client receives and decodes the tiles' versions and then stitches the decoded tiles together to reconstruct the 360-degree video.
Tiling is an innovation introduced by the HEVC standard to divide the video spatially into regions called tiles. The tiles can be physically separated from each other and reconstructed in a common stream that can be decoded by a single decoder. We used fixed tiling to cut the video frames evenly into 3 × 3, 4 × 4 or 5 × 5 uniform tiles. Figure 3 shows the 3 × 3 tiling for the viewport of an input 360‐degree video at a given time. Then, the tiled 360‐degree video is encoded three times with different bitrates to generate three HEVC bitstreams.
Inspired by the basic relationships between pixels in the digital image processing field, we divide the nine tiles in a viewport into three categories, as shown in Figure 4: the gaze tile (red tile in Figure 4), denoted by t; the four horizontal and vertical neighbour tiles of t (yellow tiles in Figure 4), denoted by N4(t); and the four diagonal neighbour tiles of t (green tiles in Figure 4), denoted by ND(t).
FIGURE 4 Three categories of tiles in the viewport
FIGURE 5 Non-adaptive and adaptive MPD composition. (a) Non-adaptive MPD composition and (b) adaptive MPD composition
According to observations made by Chakareski et al. [20], in most instances, the gaze tile t (Tile 5) has the highest navigation likelihood, the four horizontal and vertical neighbour tiles (N4(t): Tile 2, Tile 4, Tile 6 and Tile 8) have medium likelihood, and the four diagonal neighbour tiles (ND(t): Tile 1, Tile 3, Tile 7 and Tile 9) have the least likelihood. Therefore, we define three priority levels for the different tiles in the viewport according to their navigation likelihoods. The priorities rank from L1 (highest priority) to L2 (medium priority) and L3 (lowest priority). During streaming, different tiles are assigned different bitrates according to their priority.
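As a minimal sketch (our illustration), the mapping from the tile numbering of Figure 4 to the three priority levels can be written as:

```python
# Sketch: priority levels for the nine viewport tiles of Figure 4
# (gaze tile t = Tile 5; N4(t) = Tiles 2, 4, 6, 8; ND(t) = Tiles 1, 3, 7, 9).
def tile_priority(tile_no: int) -> str:
    if tile_no == 5:                # gaze tile t
        return "L1"
    if tile_no in (2, 4, 6, 8):     # N4(t): horizontal/vertical neighbours
        return "L2"
    if tile_no in (1, 3, 7, 9):     # ND(t): diagonal neighbours
        return "L3"
    raise ValueError("tile number must be in 1..9")

print({n: tile_priority(n) for n in range(1, 10)})
```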
However, in the default MPD file, all tile tracks are selected from only the L1, L2 or L3 encoded video. The drawbacks of this non-adaptive method are wasted network bandwidth, larger MPD files transferred from server to client and decreased visual quality of the viewport. We use the MPEG-DASH SRD to modify the non-adaptive MPD file into an adaptive MPD file, as illustrated in Figure 5. Owing to space limitations, we give only the L1 case for the non-adaptive MPD composition.
In the adaptive MPD file, the gaze tile of the viewport is chosen from the input video encoded at the L1 bitrate. The four horizontal and vertical neighbour tiles of the gaze tile are selected from the input video encoded at the L2 bitrate. The four diagonal neighbour tiles of the gaze tile are selected from the input video encoded at the L3 bitrate. The L1, L2 and L3 bitrates are calculated as follows:

$$L_1 = R_{\text{current}}, \qquad L_2 = \frac{R_{\text{current}}}{\Delta_1}, \qquad L_3 = \frac{R_{\text{current}}}{\Delta_2} \tag{1}$$

in which $R_{\text{current}}$ is the currently available bandwidth. $\Delta_1$ and $\Delta_2$ are two parameters determined by experiments on the training dataset; in this work, $\Delta_1$ is set to 2 and $\Delta_2$ is set to 20.
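As a small sketch of Equation (1) (our illustration), the following reproduces the worked example in Section 4.1, where a 20 Mbps available bandwidth yields levels of 20, 10 and 1 Mbps:

```python
# Sketch of Equation (1): L1 = R_current, L2 = R_current / Delta1,
# L3 = R_current / Delta2, with Delta1 = 2 and Delta2 = 20.
def bitrate_levels(r_current_mbps, delta1=2.0, delta2=20.0):
    return (r_current_mbps,
            r_current_mbps / delta1,
            r_current_mbps / delta2)

print(bitrate_levels(20))  # (20, 10.0, 1.0) Mbps, matching Section 4.1
```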
An example of an adaptive MPD file is depicted in Listing 1; because of limited space, some content is omitted.
In this section, we provide a detailed description of the procedure to enable reproducibility of the experiments performed in this work.
On the server side, a 360-degree video downloaded from YouTube is processed into MPD files and media segments with multiple quality representations in four steps. The server performs these steps using the following key components:
TABLE 1 Encoding time using different bitrates for Mega coaster
(1) Raw 360-degree video extractor: The 360-degree video downloaded from YouTube is in the MP4 format. To generate a tile-based 360-degree video, it is necessary to extract the raw 360-degree video from the MP4 video using the command: ffmpeg -i "C:\test.mp4" -pix_fmt yuv420p "C:\test.yuv"
(2) HEVC tile-based 360-degree video generator: The raw 360-degree video is re-encoded into a motion-constrained tiling-based HEVC 360-degree video using the command: kvazaar --input test.yuv --input-res 3840x2160 --input-fps 30 --bitrate <L1 | L2 | L3> --tiles <3x3 | 4x4 | 5x5> --slices tiles --mv-constraint frametilemargin --output test.hevc
The desired bitrate and tiling scheme can be adjusted by modifying the '--bitrate' and '--tiles' parameters in the command. This command needs to be executed three times to generate three HEVC 360-degree videos with different bitrates. Assuming the currently available network bandwidth is 20 Mbps, according to Equation (1), the bitrates for L1, L2 and L3 are 20, 10 and 1 Mbps, respectively.
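A minimal automation sketch (ours, assuming kvazaar is on the PATH and that --bitrate takes bits per second, as in kvazaar's command-line interface) runs the three encodings of step 2 in a loop:

```python
# Sketch: run the step-2 encoding once per bitrate level (20/10/1 Mbps).
import subprocess

BITRATES_BPS = {"L1": 20_000_000, "L2": 10_000_000, "L3": 1_000_000}

for level, bps in BITRATES_BPS.items():
    subprocess.run(
        ["kvazaar", "--input", "test.yuv",
         "--input-res", "3840x2160", "--input-fps", "30",
         "--bitrate", str(bps), "--tiles", "3x3", "--slices", "tiles",
         "--mv-constraint", "frametilemargin",
         "--output", f"test_{level}.hevc"],
        check=True,  # raise if an encoding run fails
    )
```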
(3) MP4 tile-based 360-degree video packer: The re-encoded HEVC 360-degree video is packaged into MP4 containers to split the tiles of each frame into different tracks using the command: mp4box -add test.hevc:split_tiles -fps 30 -new testpackage.mp4
This command needs to be executed three times to pack each of the HEVC 360‐degree videos obtained in the second step into MP4 containers.
(4) MPD generator: The MPD files that support tile-based 360-degree video dashing are obtained using the command: mp4box -dash 1000 -profile live -out test.mpd -segment-name segment/%s testpackage.mp4
This command generates two files named xxx.mpd and xxx_set1_init.mp4, and a folder named 'segment', which contains multiple xxxpackage_track_y_z.m4s files and multiple xxxpackage_track_y_init.mp4 files. This command needs to be executed three times to generate three MPD files with multiple quality representations.
(5) HTTP server: The HTTP server stores the MPD files and media segments.
TABLE 2 Specifications of 360-degree videos in the test dataset
TABLE 3 Average bitrate and quality of dashed video for different methods using 3 × 3 tiling
TABLE 4 Bitrate and transferred MPD file savings for different methods with the same quality using 3 × 3 tiling
We implemented the 360-degree video player based on the multimedia open-source project GPAC (https://gpac.wp.imt.fr/). The player can request m4s files from different segment folders stored on the server side, according to the type of tile in the viewport (the gaze tile, the four horizontal and vertical neighbour tiles of the gaze tile, or the four diagonal neighbour tiles of the gaze tile), to generate a new MPD file for playing. It is also responsible for measuring and recording the 360-degree video playout performance.
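As an illustrative sketch (our own; the folder layout, track numbering and names are assumptions based on the MP4Box output naming in Section 4.1, not GPAC's actual API), the player's per-tile request logic can be outlined as:

```python
# Sketch: build the segment URL for a viewport tile, assuming one segment
# folder per bitrate level and MP4Box's "<name>package_track_<y>_<z>.m4s"
# naming; track 1 is assumed to be the HEVC base track, so tile n maps to
# track n + 1. All names here are illustrative.
PRIORITY = {5: "L1", 2: "L2", 4: "L2", 6: "L2", 8: "L2",
            1: "L3", 3: "L3", 7: "L3", 9: "L3"}   # Figure 4 numbering
SEGMENT_DIR = {"L1": "segment_L1", "L2": "segment_L2", "L3": "segment_L3"}

def segment_url(server, name, tile_no, seg_no):
    level = PRIORITY[tile_no]
    track = tile_no + 1
    return (f"{server}/{SEGMENT_DIR[level]}/"
            f"{name}package_track_{track}_{seg_no}.m4s")

print(segment_url("http://example.com", "test", 5, 1))
```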
FIGURE 6 Visual comparison of a specific frame within Kangaroo island for different methods using 3 × 3 tiling. (a) Original method and (b) proposed method
FIGURE 7 Visual comparison of a specific frame within Drone shot for different methods using 3 × 3 tiling. (a) Original method and (b) proposed method
FIGURE 8 Visual comparison of a specific frame within Blue angels for different methods using 3 × 3 tiling. (a) Original method and (b) proposed method
FIGURE 9 Visual comparison of a specific frame within Help for different methods using 3 × 3 tiling. (a) Original method and (b) proposed method
FIGURE 10 Visual comparison of a specific frame within Mega coaster for different methods using 3 × 3 tiling. (a) Original method, (b) proposed method, (c) Sanchez et al. [14], (d) Xiao et al. [28], (e) details of (a), (f) details of (b), (g) details of (c) and (h) details of (d)
FIGURE 11 Comparison of average bitrate and MPD file storage saving rate under different tiling schemes
We measured the execution time of the implementation steps described in Sections 4.1 and 4.2. Among these steps, the execution time of the second step (HEVC tile-based 360-degree video generation) accounts for the vast majority, and the execution time of the other steps is negligible by comparison, so we use the execution time of the second step to characterize the complexity of the proposed method. Table 1 shows the encoding time for the test sequence Mega coaster, one of the 360-degree videos in our test dataset. For the 20-s segment from Mega coaster, when the bitrate is 1, 10 and 20 Mbps, the encoding time is 13.083, 36.025 and 47.908 s, respectively. Therefore, the total encoding time is 97.016 s. For the test sequences Drone shot, Kangaroo island, Blue angels and Help, the total encoding times are 96.413, 94.218, 93.368 and 105.361 s, respectively.
We used a laptop with an Intel Core i5 CPU at 1.6 GHz, 8 GB of DDR4 memory and a 256-GB solid-state drive as the HTTP-based 360-degree video streaming server. We used the 360-degree video player to collect statistics on the average bitrate and quality of dashed videos encoded by different methods. We also compared our results with the no-adaptation approach and two representative viewport-adaptive 360-degree video streaming methods.
We chose eight 360-degree videos from the original dataset in Lo et al. [22] as our training dataset. These 360-degree videos are divided into three categories: (1) computer-generated fast-paced, (2) natural image fast-paced and (3) natural image slow-paced. They were encoded in H.264 and stored in the MP4 format.
We downloaded five different 360-degree videos [23–27] from YouTube to form our test dataset. They were divided into the same three categories as the training dataset and were also in the MP4 format. Table 2 summarizes information about the 360-degree videos in our test dataset. All of the videos are in 4K or 720P resolution at 30 fps. The videos come in different lengths, so we extracted a 20-s segment from each for the experiments.
As Table 3 shows, using 3 × 3 tiling, on the one hand, when all of the tiles were encoded at the same bitrate, the average bitrate and quality (or rather, objective quality) of the dashed video increased as the bitrate increased. (In a GPAC-based 360-degree video player, the quality state is the sum of the entire bandwidth of the selected object, including dependent objects.) On the other hand, compared with the method in which all of the tiles were encoded at the highest bitrate level and the method of Sanchez et al. [14], the proposed 360-degree video streaming method obtained the same quality at a lower bitrate. For example, for the video named Kangaroo island, when the quality was 18,710 Kbps, the average bitrates achieved by the three methods were 231,040, 174,736 and 109,690 Kbps, respectively.
We calculated the bitrate and file size savings for the different methods, as shown in Table 4. In the table, Savings 1 represents the bitrate or file size saving rate achieved by the proposed method compared with the original method, and Savings 2 represents the saving rate achieved by the proposed method compared with other state-of-the-art methods.
The table shows that, with the same quality, compared with the original method, the average bitrate savings achieved by the proposed method for the five 360-degree videos in the test dataset are 52.52%, 52.98%, 53.54%, 54.98% and 56.86%, respectively, and the file size savings are 57.97%, 51.70%, 55.17%, 56.56% and 51.60%, respectively. Compared with the method of Sanchez et al. [14], the average bitrate savings achieved by the proposed method are 37.23%, 37.49%, 37.74%, 39.05% and 40.18%, respectively. Compared with the method of Xiao et al. [28], the file size savings achieved by the proposed method are 17.53%, 16.83%, 15.81%, 17.13% and 19.10%, respectively.
In addition to the objective quality, we evaluated the subjective quality under the different methods, as shown in Figures 6–10. The screenshots indicate that, compared with the no-adaptation method, there are only minor visual changes from a user's perspective, which are sometimes imperceptible, when using our adaptation method. Compared with the methods of Sanchez et al. [14] and Xiao et al. [28], our method achieves better subjective quality, as shown in Figure 10.
Furthermore, we performed experiments using 4 × 4 and 5 × 5 tiling. We recorded the average bitrate saving rate and transferred MPD file saving rate under the different tiling schemes. The experimental results are shown in Figure 11. Compared with the 4 × 4 and 5 × 5 tiling schemes, the 3 × 3 scheme achieved a better average bitrate saving rate and transferred MPD file saving rate for the five test videos. For example, for Kangaroo island, the average bitrate savings for 3 × 3, 4 × 4 and 5 × 5 tiling were 52.52%, 34.42% and 49.72%, respectively. The MPD file storage saving rates for 3 × 3, 4 × 4 and 5 × 5 tiling were 57.95%, 41.47% and 52%, respectively.
We proposed an optimized viewport-adaptive 360-degree video streaming method with the aim of reducing the high bandwidth and storage capacity requirements of current 360-degree video streaming solutions. The proposed method can assign different bitrates to different tiles in the viewport. The experimental results demonstrate that our method further improves on existing viewport-adaptive 360-degree video streaming methods in terms of bitrate savings and storage capacity reduction. In future work, we will use deep learning to cope better with 360-degree video streaming.
ACKNOWLEDGEMENTS
The authors would like to thank the National Natural Science Foundation of China for funding this work (Nos. 61967012, 61866022 and 61861027).
ORCID
Xiaolei Chen https://orcid.org/0000-0001-9060-5369
Appendices
TABLE A5 Average bitrate and quality of dashed video for different methods using 4 × 4 tiling
TABLE A6 Bitrate and transferred MPD file savings for different methods with the same quality using 4 × 4 tiling
TABLE A7 Average bitrate and quality of dashed video for different methods using 5 × 5 tiling
TABLE A8 Bitrate and transferred MPD file savings for different methods with the same quality using 5 × 5 tiling