Ping Wu and Ming Li
(R&D Center, ZTE Corporation, Nanjing 210012, China)
Abstract The high-efficiency video coding (HEVC) standard is the newest video coding standard, currently under joint development by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). HEVC is the next-generation video coding standard after H.264/AVC. The goal of the HEVC standardization effort is to double the coding efficiency of the existing H.264/AVC while supporting all recognized potential applications, such as video telephony, storage, broadcast, and streaming, especially for large picture sizes (4k×2k). The HEVC standard is planned to be completed as an ISO/IEC and ITU-T standard in January 2013. In February 2012, the HEVC standardization process reached its committee draft (CD) stage. The ever-improving HEVC design has demonstrated a significant gain in rate-distortion efficiency relative to the existing H.264/AVC. This paper provides an overview of the technical features of HEVC close to its CD stage, covering high-level structure, coding units, prediction units, transform units, spatial signal transformation and PCM representation, intra-picture prediction, inter-picture prediction, entropy coding, and in-loop filtering. Coding efficiency comparisons between HEVC and H.264/AVC are also provided.
Keywords HEVC; JCT-VC; AVC; H.264; MPEG-2; MPEG-4; standards; video
The high-efficiency video coding (HEVC) standard is the newest video coding standard and is under development in the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T SG16 WP3 Video Coding Experts Group (VCEG) and ISO/IEC JTC1/SC29/WG11 (also known as the Moving Picture Experts Group (MPEG)) [1]. HEVC aims to be the next-generation video coding standard after the existing H.264/AVC (Advanced Video Coding, ISO/IEC 14496-10) [2]. H.264/AVC is well known to be about 50% more efficient than MPEG-2, i.e. it delivers similar quality at half the bit rate. The target of HEVC is to double the coding efficiency of H.264/AVC [3]. The quality assessment is mainly subjective, although some peak signal-to-noise ratio (PSNR) based objective quality assessment methods are also employed during the standardisation process. The HEVC standardisation process reached the committee draft (CD) stage in February 2012 and is expected to become an International Standard (ISO/IEC) by July 2013. Usually, once the CD stage is reached, the general form of a video coding standard is stable.
Video coding standardisation for telecommunications has evolved through ITU-T H.261 [4], H.262 (MPEG-2) [5], H.263 (with its later enhancements H.263+ and H.263++) [6], and H.264/AVC [2]. These video coding standards, e.g. MPEG-2 and H.264/AVC, are now widely used for the transmission of standard-definition (SD) and high-definition (HD) TV signals over satellite, cable, and terrestrial channels, and for the storage of high-quality video signals on common media such as DVDs, Blu-ray discs, and hard disks. The ever-growing demand for 4K×2K video with higher resolution and wider colour gamut, as well as flexible streaming in various application scenarios, continually raises the requirements for higher coding efficiency with a flexible coding structure, which have been well covered in the HEVC standardisation process.
The Call for Proposals on HEVC [1] was issued in January 2010. The test model under consideration (TMuC) was established three months later [7]. In October 2010, the first HEVC test model (HM) was created [8]. Over five versions of working drafts (WD) [9]-[13] and HM reference software, the HEVC codec has been improved continuously, ensuring a mature design at the HEVC CD stage.
In this paper, the main structure and the video coding tools of HEVC are described in detail, accurate at least up to the HEVC CD stage. The remainder of this paper is organized as follows. In section 2, some applications and the HEVC high-level structure are briefly presented. In section 3, the major features of the HEVC design are described. Performance comparisons between the draft HEVC standard and the H.264/AVC High Profile are reported in section 4. Finally, section 5 concludes this paper.
As with previously successful video coding standards,the HEVC standard is designed to provide technical solutions for at least the following application areas:
·cable TV (CATV) over optical and copper networks
·direct broadcast satellite (DBS) video services
·digital subscriber line (DSL) video services
·digital terrestrial television broadcasting (DTTB)
·interactive storage media (ISM), for example, optical disks
·multimedia mailing (MMM)
·multimedia services over packet networks (MSPN)
·real-time conversational (RTC) services, for example, video conferencing and video phone
·remote video surveillance (RVS)
·serial storage media (SSM), for example, digital VTR.
All these applications may be deployed in existing and future networks, which raises the question of how to handle a variety of applications and networks. To address this requirement for flexibility, the HEVC design comprises a video coding layer (VCL) as well as a network abstraction layer (NAL), the same layer structure as in H.264/AVC. In the high-level structure of an HEVC encoder, the NAL sits below the VCL to provide "network friendliness", supporting simple and effective customization of the use of the VCL for a broad variety of systems; concepts similar to their H.264/AVC counterparts, such as the NAL unit and access unit, are used. In the VCL, the sequence parameter set (SPS) and picture parameter set (PPS) are adopted by JCT-VC into HEVC with the same concepts and applications as in H.264/AVC, to convey information that rarely changes and is referred to in decoding a large number of VCL NAL units.
To achieve high coding efficiency, HEVC introduces several picture-level coding tools, including the scaling list [14], sample adaptive offset (SAO) [15], and adaptive loop filter (ALF) [16], whose parameters may stay the same across the slices in a picture but change between pictures. To share such information among slices effectively, supporting parallel slice processing while facilitating updating and referring to the tool parameters, the adaptation parameter set (APS) [17] was designed and introduced into HEVC. As a new parameter set used for picture-adaptive data (especially ALF data), the APS is a major feature of the HEVC parameter set structure.
In an HEVC codec, each coded picture is represented in block-shaped units of associated luma and chroma samples called coding units (CUs) [18], [19]. The sizes of the largest CU (LCU) and the smallest CU (SCU) can be flexibly set in the SPS, which differs from the macroblock (MB) concept of a fixed 16×16 square of pixels in prior standards. A quad-tree based recursive splitting approach [18], [19] is used to partition the LCU until the partitions reach the SCU size. The basic source coding algorithm in HEVC is a hybrid of intra- and inter-picture prediction, exploiting spatial and temporal statistical dependencies, and transform coding of the prediction residual, removing the spatial statistical redundancies left in the residuals after prediction. A unified entropy coding method, context-based adaptive binary arithmetic coding (CABAC) similar to that in H.264/AVC [20], is employed to generate the coded bit-streams.
To deal with the maximum transmission unit (MTU) constraints of the network, slices are introduced into HEVC. In certain configurations, a slice can be used as an independent decoding unit to provide error resilience and parallel processing, with the help of parameter sets for information sharing. In HEVC, a slice usually contains a sequence of LCUs in raster scanning order, but it can also be non-LCU-aligned. A leaf-CU-aligned slice with a leaf-CU granularity of no less than 16×16 can be configured in the SPS, which makes the encoder much more flexible in implementing slices for transmission and error resilience [21].
Besides slices, tiles have been developed so that HEVC can support MTU matching, error resilience, and parallel processing [22]. Intersecting column and row boundaries divide a picture into rectangular regions called tiles, each containing an integer number of LCUs. Raster scanning order is applied when coding tiles within a picture and then LCUs within a tile. Similar to slice boundaries, tile boundaries break prediction mechanisms (e.g. intra prediction and motion vector (MV) prediction), depending on the tile configuration, unless indicated otherwise. Therefore, tiles can support sub-picture based coding when processing high-resolution video (e.g. ultra-high-definition (UHD) video).
Although slices and tiles with proper configurations can provide parallel processing, they always cause performance loss because of the boundary restrictions on the prediction mechanisms. To preserve coding efficiency while still obtaining some parallel-processing capability, entropy slices [23] and wavefront parallel processing (WPP) [24] are integrated in HEVC. These two methods both target parallel entropy coding while utilizing the available information from adjacent blocks for high-efficiency prediction. The boundaries of an entropy slice only force flushing and re-initializing the entropy engine, which is also one of the functionalities of an ordinary slice. WPP is designed for multi-core architectures, which are expected to become ever more widely used on a variety of devices, including mobile terminals. WPP initializes the CABAC probabilities of the first LCU of each LCU line with the probabilities updated after the second LCU of the line above has been processed, which enables parallel processing of LCU lines with a two-LCU delay between neighbouring lines.
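The two-LCU inter-row delay of WPP can be sketched as a simple dependency schedule. In this illustrative model (not the normative process), each LCU takes one time unit, and an LCU can start only after its left neighbour and the LCU one position to the right in the row above have finished:

```python
# Hedged sketch of the WPP dependency schedule: the first LCU of row r may
# start only after the second LCU of row r-1 has finished, giving a two-LCU
# delay between neighbouring rows. Each LCU is assumed to take one time unit.

def wpp_finish_times(rows, cols):
    """Earliest finish time of each LCU under wavefront parallel processing."""
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(finish[r][c - 1])                      # left neighbour
            if r > 0:
                deps.append(finish[r - 1][min(c + 1, cols - 1)])   # two-LCU lead of upper row
            finish[r][c] = (max(deps) if deps else 0) + 1
    return finish

times = wpp_finish_times(3, 6)
```

For a 3×6 LCU grid the wavefront finishes in 10 time units instead of the 18 a serial scan would need, showing the parallelism gained at the cost of the two-LCU skew.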
The HEVC standard adopts the well-known block-based hybrid coding scheme that relies on motion-compensated prediction and transform coding with a high-performance entropy coder. However, a series of newly developed techniques and modules are applied to this traditional structure in HEVC, accumulating into a substantial improvement in coding efficiency.
▲Figure 1. CU splitting.
In this section, the major features of HEVC are briefly described from an encoding perspective. Most of the decoding algorithms are simply the reverse of their encoding counterparts and are elaborated in the latest working draft document for HEVC [13].
The CU is the basic unit of region splitting used for inter/intra coding. It is always square and may take a size from 8×8 luma samples up to the size of the LCU, which can be set to 64×64, 32×32, or 16×16. A quad-tree based recursive splitting approach partitions an LCU into four equally sized blocks, down to the SCU with its minimum allowable size of 8×8. A maximum splitting depth parameter gives the quad-tree depth from LCU to SCU. With this mechanism, a picture can be flexibly partitioned to match the characteristics of the input video, as illustrated in Fig. 1.
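The recursive quad-tree splitting can be sketched as follows. This is an illustration only, not the normative algorithm; the `want_split` predicate is a placeholder standing in for the encoder's rate-distortion decision:

```python
# Illustrative sketch of quad-tree CU splitting: an LCU is recursively
# divided into four equal squares until the SCU size is reached or the
# (hypothetical) split predicate declines to split further.

def split_cu(x, y, size, scu_size, want_split):
    """Yield (x, y, size) leaf CUs of the quad-tree rooted at an LCU."""
    if size > scu_size and want_split(x, y, size):
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                yield from split_cu(x + dx, y + dy, half, scu_size, want_split)
    else:
        yield (x, y, size)

# Example: split only the top-left quadrant of a 64x64 LCU down to 16x16.
leaves = list(split_cu(0, 0, 64, 8, lambda x, y, s: s > 16 and x == 0 and y == 0))
```

The example yields seven leaf CUs: four 16×16 CUs in the top-left quadrant and three untouched 32×32 CUs, mirroring the kind of content-adaptive partitioning Fig. 1 depicts.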
The prediction unit (PU) is the basic unit used to carry information related to the prediction processes. Each CU may contain one or more PUs. In general, a PU is not restricted to being square, so that partitioning can match the boundaries of real objects in the picture. All kinds of PU partitions can be employed in inter-picture prediction, while only square partitions are used in intra prediction. Besides the symmetric PU partitions dividing a CU into two or four blocks of equal size, four asymmetric PU partitions are also adopted by JCT-VC into HEVC to further improve inter-picture prediction performance. The PU partitions in HEVC are presented in Figs. 2 and 3. Note that the N×N PU partition is only applied to the SCU.
The transform unit (TU) is the basic unit used for the transform and quantization processes. The TU shape depends on the PU partitioning mode. When the PU is square, the TU is also square, with sizes ranging from 4×4 to 32×32 luma samples. When the PU is not square, the TU may be non-square, with sizes of 32×8, 8×32, 16×4, or 4×16 luma samples. Each CU may contain one or more TUs, and multiple TUs may be arranged in a quad-tree structure. The recursive quad-tree transform (RQT) [16] and non-square quad-tree transform (NSQT) [25] can be used in TU splitting, as depicted in Figs. 4 and 5.
Similar to the 4×4 and 8×8 transforms in H.264/AVC, the core transforms in HEVC [26] are also derived from the discrete cosine transform (DCT), with integer precision and sizes from 4×4 to 32×32. The core transform designs in HEVC have many properties that favour software and hardware implementation. For instance, data are represented in 16 bits (independent of the internal bit depth) before and after each transform stage, and 16-bit multipliers suffice for all internal multiplications. No correction for different norms of the basis vectors is needed during quantization/dequantization. The design allows arithmetic operations to be reused for smaller transform sizes, and implementations may use either pure matrix multiplication or a combination of matrix multiplication and butterfly structures.
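As a concrete illustration, the 4-point HEVC core transform matrix is an integer approximation of the DCT-II whose rows have nearly equal norms, which is what makes the norm correction unnecessary. The sketch below applies the 1-D forward transform to a residual vector; the normative bit-exact shifts and clipping between stages are omitted:

```python
# 4-point HEVC core transform matrix (integer DCT-II approximation).
# This sketch shows the 1-D forward stage only, with pure integer arithmetic;
# the standard's intermediate right-shifts and clipping are left out.

M4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def forward_1d(residual):
    """1-D 4-point forward transform: y = M4 @ x."""
    return [sum(M4[i][j] * residual[j] for j in range(4)) for i in range(4)]

coeffs = forward_1d([1, 2, 3, 4])
```

A full 2-D transform applies this stage to the rows and then to the columns of the residual block, with a normalizing shift after each stage to keep intermediate values within 16 bits.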
▲Figure 3. Asymmetric PU partitions.
▲Figure 2. Symmetric PU partitions.
As in H.264/AVC, scalar quantization is employed in HEVC after the transform and can be implemented together with the integer transform in an integrated module.
▲Figure 4. Recursive quad-tree transform.
The PCM representation in HEVC transmits the sample values of the associated CU without prediction, transform coding, or entropy coding. The PCM representation thus allows an encoder to bound the number of bits used to represent a CU without complicated computation, similar to the PCM mode in H.264/AVC. However, taking the CU structure into account, a more sophisticated PCM-mode coding method is designed and implemented in the HEVC codec, restricting the signalling of the PCM mode flag in the bit-stream based on the CU splitting information [27].
The unified intra-prediction coding tool provides up to 35 prediction modes, comprising 33 directional modes plus DC and planar modes, for the luma component of each PU [28]. The 33 possible intra prediction directions are illustrated in Fig. 6. The derivation of the prediction when using planar mode [29] is given in Fig. 7. To further improve the intra prediction efficiency of the chroma components, besides the increased number of prediction directions, HEVC introduces a new intra chroma prediction mode that utilizes the correlation between chroma and luma samples [30]. Chroma samples are predicted from the reconstructed luma samples associated with the prediction block by modelling the chroma samples as a linear function of the luma samples. The model parameters are determined by linear regression over the neighbouring reconstructed pixels of the current luma and chroma blocks.
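The linear-model chroma prediction described above can be sketched as an ordinary least-squares fit of pred_chroma = alpha · recon_luma + beta over the neighbouring reconstructed samples. The normative derivation uses integer arithmetic; this uses floats purely for illustration:

```python
# Sketch of the chroma-from-luma linear model: fit alpha and beta by least
# squares over neighbouring reconstructed luma/chroma sample pairs, then
# predict the chroma block from the reconstructed luma block.

def fit_linear_model(luma_neighbours, chroma_neighbours):
    n = len(luma_neighbours)
    sx = sum(luma_neighbours)
    sy = sum(chroma_neighbours)
    sxx = sum(x * x for x in luma_neighbours)
    sxy = sum(x * y for x, y in zip(luma_neighbours, chroma_neighbours))
    denom = n * sxx - sx * sx
    alpha = (n * sxy - sx * sy) / denom if denom else 0.0
    beta = (sy - alpha * sx) / n
    return alpha, beta

def predict_chroma(recon_luma_block, alpha, beta):
    return [[alpha * l + beta for l in row] for row in recon_luma_block]

# Neighbours that follow chroma = 0.5 * luma + 2 exactly:
alpha, beta = fit_linear_model([100, 120, 140, 160], [52, 62, 72, 82])
```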
▲Figure 5. Non-square quad-tree transform: (a) 2N×N partition, (b) N×N partition.
Combining quad-tree CU splitting with the designed PU partitioning methods equips HEVC to cope with various local features of the video texture. Meanwhile, the spatial statistical redundancy of the prediction residuals is reduced by employing the RQT with varied transforms. Besides the above-mentioned features, which bring much of the improvement in inter-picture prediction performance in HEVC, several well-developed coding tools also contribute substantially to the final coding efficiency from different aspects, such as motion information representation, de-aliasing filtering, and fractional-pixel motion prediction and compensation.
▲Figure 6. Intra prediction directions in HEVC.
To code MV information effectively, advanced MV prediction (AMVP) [13] performs adaptive MV prediction by exploiting the spatial-temporal correlation of MVs from neighbouring PUs. AMVP is used to derive the predictor for the current MV. AMVP first scans the MVs of spatial neighbouring PUs and then of temporal neighbouring PUs at specified positions and in a specified order to construct an MV predictor candidate list. The encoder then selects the best predictor from the candidate list for the current MV and codes the index indicating the chosen candidate, as well as the MV difference, into the bit-stream.
Besides AMVP, merge and skip modes [13] are also adopted into HEVC for implicitly signalling the motion information (including inter-picture prediction direction, MV, and reference index) of a PU. In merge mode, the encoder gathers the motion information of spatial and temporal PUs neighbouring the current PU in a pre-defined pattern to construct a motion information candidate list. The encoder then selects the best candidate, directly employs its motion information in the current PU's motion-compensated prediction (MCP) process, and codes the candidate index instead of the motion information into the bit-stream. The derivation of the motion information for skip mode is the same as for merge mode, the only difference being that the prediction residual is not encoded but assumed to be zero.
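The candidate-list construction shared by AMVP and merge mode can be sketched as follows. The positions, pruning rules, and list size here are simplified illustrations, not the normative derivation:

```python
# Hedged sketch of merge-list construction: gather motion information from
# spatial and then temporal neighbours in a fixed order, drop unavailable
# entries and duplicates, and let the encoder signal only the chosen index.

def build_merge_list(spatial, temporal, max_candidates=5):
    """spatial/temporal: lists of (mv_x, mv_y, ref_idx) tuples, or None if unavailable."""
    merge_list = []
    for cand in spatial + temporal:
        if cand is not None and cand not in merge_list:
            merge_list.append(cand)
        if len(merge_list) == max_candidates:
            break
    return merge_list

spatial = [(4, 0, 0), (4, 0, 0), None, (0, -4, 1)]   # duplicate and missing neighbours pruned
temporal = [(2, 2, 0)]
candidates = build_merge_list(spatial, temporal)
```

In merge mode the decoder copies the full motion information of the signalled candidate; in skip mode it does the same and additionally assumes a zero residual.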
To suppress aliasing artefacts in the reference pictures, interpolation filters (IFs) are employed to generate fractional samples for high-precision inter-picture prediction. In HEVC, separable filters with fixed coefficients derived from fractional-position DCT basis functions, namely DCT-IF [31], are used. The prediction precision is 1/4-pel for luma samples and 1/8-pel for chroma samples. Unlike the cascading interpolation filters in H.264/AVC, where 1/4-pel luma reference pixels are obtained by first performing 1/2-pel interpolation and then deriving 1/4-pel samples with bilinear filters, HEVC directly calculates both 1/2- and 1/4-pel reference samples with the designed set of filters, which helps to achieve superior performance.
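As an example, the 8-tap half-pel luma filter of the DCT-IF has coefficients [-1, 4, -11, 40, 40, -11, 4, -1] with gain 64. The sketch below shows the 1-D case only; sample clipping and the two-stage 2-D process are omitted:

```python
# Sketch of half-pel luma interpolation with HEVC's 8-tap DCT-IF filter.
# 1-D only; clipping to the sample bit depth is omitted for brevity.

HALF_PEL = [-1, 4, -11, 40, 40, -11, 4, -1]

def interpolate_half_pel(samples, pos):
    """Half-pel value between samples[pos] and samples[pos + 1]."""
    taps = samples[pos - 3 : pos + 5]              # the 8 integer-pel neighbours
    acc = sum(c * s for c, s in zip(HALF_PEL, taps))
    return (acc + 32) >> 6                         # normalize by the filter gain 64

flat = interpolate_half_pel([10] * 16, 7)          # flat signal is preserved
ramp = interpolate_half_pel(list(range(16)), 7)    # linear ramp: 7.5 rounds to 8
```

Because quarter-pel samples are produced by their own filter rather than by cascading bilinear interpolation after this stage, no extra rounding error accumulates, which is the advantage over the H.264/AVC scheme noted above.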
To overcome the shortcomings of the relative decoded picture buffer (DPB) management in H.264/AVC (through memory management control operations (MMCO) and the sliding window), such as vulnerability to losses of pictures containing MMCO commands and restrictions on the encoder's choice of coding structure and reference picture usage when temporal scalability is used, the reference picture set (RPS) [32] was developed and integrated into HEVC. The RPS describes the reference pictures in the DPB in an absolute manner in each slice header of a picture. It contains a list of delta picture order count (deltaPOC) values for all reference pictures that the decoder shall keep. Each deltaPOC is used to calculate the picture order count (POC) of a reference picture as POC_reference = POC_current + deltaPOC. POC is therefore used not only by the decoder to deliver pictures in the correct display order but also to identify reference pictures during reference picture list construction and decoded reference picture marking.
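The absolute nature of the RPS can be sketched as follows: each slice header lists deltaPOCs relative to the current picture, and any DPB picture whose POC is not in the resulting set is no longer needed for reference. This is an illustration of the idea only, not the normative marking process:

```python
# Sketch of RPS resolution: deltaPOCs from the slice header are added to the
# current POC to identify which DPB pictures must be kept for reference.

def resolve_rps(current_poc, delta_pocs):
    return {current_poc + d for d in delta_pocs}

def pictures_to_keep(dpb_pocs, current_poc, delta_pocs):
    keep = resolve_rps(current_poc, delta_pocs)
    return [p for p in dpb_pocs if p in keep]

kept = pictures_to_keep([0, 4, 8, 12], current_poc=16, delta_pocs=[-4, -8, -16])
```

Because the set is re-signalled in every slice header, the loss of one picture does not corrupt the DPB state of subsequent pictures, unlike incremental MMCO commands.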
▲Figure 7. Derivation of the prediction using planar mode.
CABAC is the single entropy coding method in HEVC. CABAC combines an adaptive binary arithmetic coding engine with context modelling and achieves a high degree of adaptation and redundancy reduction. In general, CABAC codes the value of a syntax element in four stages. First, a binarization algorithm selected according to the syntax element converts its value into bins suitable for entropy coding. Second, a statistical model, namely a context model, is chosen for the current bin or bins by referencing the available information from adjacent coded blocks and bins. Third, arithmetic entropy coding with the selected context model is performed to generate the coded bits. Finally, the context model is updated according to the information collected during the actual coding process.
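The four stages can be illustrated with a toy model. This is purely a didactic sketch of the data flow: the real engine uses table-driven probability states and a range-based arithmetic coder, whereas here a simple running frequency estimate and the ideal code length -log2(p) stand in for the context state and the arithmetic coding step:

```python
# Toy illustration of the four CABAC stages: (1) binarization, (2) context
# selection, (3) arithmetic coding (simulated by the ideal bit cost), and
# (4) context update. Not the normative table-driven engine.

from math import log2

def binarize_unary(value):
    return [1] * value + [0]                  # stage 1: value -> bins

class Context:
    def __init__(self):
        self.ones, self.total = 1, 2          # stage 2: per-context statistics

    def p_one(self):
        return self.ones / self.total

    def update(self, bin_val):                # stage 4: adapt to the coded bin
        self.ones += bin_val
        self.total += 1

def code_value(value, ctx):
    bits = 0.0
    for b in binarize_unary(value):
        p = ctx.p_one() if b == 1 else 1 - ctx.p_one()
        bits += -log2(p)                      # stage 3: ideal arithmetic-coding cost
        ctx.update(b)
    return bits

ctx = Context()
cost = code_value(3, ctx)
```

As the context adapts toward the true bin statistics, the per-bin cost approaches the source entropy, which is the redundancy-reduction property noted above.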
An in-loop filtering module is introduced in the HEVC codec to suppress the noise and artifacts caused by lossy compression. The filtered pictures can be used as references in the MCP loop for encoding and decoding subsequent pictures. The deblocking filter, SAO, and ALF are the in-loop filters, applied in cascade, in HEVC. Each filter has several candidate operating modes to cope with the different kinds of compression noise in pictures with diverse features, giving the encoder the flexibility to achieve the desired coding quality under the requirements of the application.
Blocking artifacts are often observed at TU boundaries, caused by the discontinuity of quantization errors between adjacent TUs. The deblocking filter in HEVC [13] is based on that of H.264/AVC, with modifications and improvements to fit the HEVC coding features.
SAO and ALF are two filters aimed at removing quantization noise. SAO is applied to the reconstructed signal after the deblocking filter. SAO classifies reconstructed pixels into categories and reduces distortion by adding a corresponding offset to the pixels of each category in different picture regions. There are two kinds of offsets in SAO: band offset (BO) and edge offset (EO). BO classifies all pixels of a region into multiple bands, where each band contains pixels in the same intensity interval. The intensity range is equally divided into a pre-defined number of intervals from zero to the maximum intensity value (e.g. 255 for 8-bit pixels), and each interval has an offset. The bands are then divided into two groups: one group consists of the central half of the bands, and the other consists of the remaining half. Only the offsets of one group are transmitted. EO uses four 1-D three-pixel patterns representing typical edge directions for pixel classification (Fig. 8). The encoder selects one pattern for each region of a picture, classifies pixels into categories by comparing each pixel with its two neighbouring pixels, and sends the selection in the bit-stream as side information.
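The EO classification along one 1-D pattern can be sketched as follows. The category numbering follows the common valley/corner/peak convention; the pattern used here is the horizontal (0-degree) one, and the offsets are hypothetical values an encoder might signal:

```python
# Sketch of SAO edge-offset classification: compare each pixel with its two
# neighbours along the chosen 1-D pattern, assign an edge category, and add
# the offset signalled for that category.

def eo_category(a, c, b):
    """a, b: the two neighbours along the pattern; c: the current pixel."""
    if c < a and c < b:
        return 1                               # local valley
    if (c < a and c == b) or (c == a and c < b):
        return 2                               # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3                               # convex corner
    if c > a and c > b:
        return 4                               # local peak
    return 0                                   # monotone: no offset applied

def apply_eo(row, offsets):
    """offsets: dict category -> offset; horizontal (0-degree) pattern."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = row[i] + offsets.get(eo_category(row[i - 1], row[i], row[i + 1]), 0)
    return out

# Hypothetical offsets: raise valleys by 2, lower peaks by 2.
filtered = apply_eo([10, 8, 10, 10, 12, 10], {1: 2, 4: -2})
```

With offsets of opposite sign at valleys and peaks, the filter smooths ringing around edges, which is exactly the quantization noise SAO targets.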
ALF is a Wiener-filter-based algorithm applied to the reconstructed signal after SAO and/or the deblocking filter. The filtering process uses 2-D filters for luma and chroma samples. The encoder calculates the filter coefficients using Wiener-based adaptive filtering algorithms, decides whether ALF is applied via a rate-distortion optimization process, and finally codes both the filter coefficients and the control flags into the bit-stream. At the receiver, the decoder first obtains the filter coefficients by parsing the APS and the CU control information from the slice header, and then applies the ALF filtering process to the reconstructed pictures.
▲Figure 8. Four 1-D three-pixel patterns for the pixel classification in EO: (a) 0 degrees, (b) 90 degrees, (c) 135 degrees, (d) 45 degrees.
Performance comparisons between the HEVC draft standard and an H.264/AVC High Profile anchor are reported in [33]. The conditions used for the comparison tests were reportedly designed to reflect relevant application scenarios while enabling a fair comparison to the maximum extent feasible, i.e. using comparable quantization settings, reference frame buffering, etc. Several of the encoder optimizations found in the HEVC software were tested and reportedly shown to improve the H.264/AVC anchor performance as well; the testing was thus generally configured in favour of a relatively strong H.264/AVC anchor. Compared to the improved anchor encoder configurations, the HEVC draft standard reportedly provides bit-rate savings, at equal coding quality measured in PSNR, of about 39% for random-access applications, 44% for low-delay applications, and 25% for all-intra use cases.
The coming HEVC video coding standard is being jointly developed by the ITU-T VCEG and ISO/IEC MPEG organizations. HEVC represents a number of advances in standard video coding technology, in terms of both coding efficiency and flexibility for effective applications. Its VCL design is based on conventional block-based motion-compensated hybrid video coding concepts, but with some important differences relative to prior standards, summarized below:
·APS containing common information for picture-adaptive tools
·quad-tree based CU splitting
·asymmetric PU partitioning
·quad-tree based TU partitioning
·unified intra prediction, including planar mode and chroma linear prediction from luma samples
·AMVP,merge and skip modes for inter-picture prediction