Gerard Fernando
(ZTE USA Inc., 2425 N. Central Expressway, TX 75080, USA)
Abstract In this paper, we discuss the development of MPEG media transport (MMT), which is a next-generation media transport standard effort by ISO/MPEG. The architecture and functional areas of MMT are described. The functionality of existing media transport is analyzed to determine whether there is a need for this new media standard. From this analysis, potential areas for standardization in MMT have been identified.
Keywords MPEG; RTP; media transport
The International Organization for Standardization/Moving Picture Experts Group (ISO/MPEG) has started developing a new media transport standard called MPEG media transport (MMT). ZTE Corporation and other companies are involved in this work. In this paper, we report on the progress of MMT standardization. We also analyze existing standards and protocols to determine whether there is indeed a need for this new standard.
It has been nearly 20 years since MPEG developed the MPEG-2 transport stream (TS) standard [1]. This standard is used widely in media-delivery solutions, such as cable, satellite, and terrestrial delivery of entertainment video. It is also used in some stored-media solutions. However, some aspects of MPEG-2 TS require updating to accommodate changed conditions. In the following, we discuss the reasons for updating MPEG-2 TS.
As with MPEG-2 TS, the real-time transport protocol (RTP) of the Internet Engineering Task Force (IETF) [2] was developed in the mid-1990s. Since then, RTP has been regularly updated with adaptation formats for recently developed media compression standards. However, there are some key functionalities missing from both RTP and MPEG-2 TS, and this is arguably a good reason for developing MMT.
In section 2, we describe the features of existing media-delivery standards and highlight key changes in MPEG-2 TS and RTP. In section 3, we describe the architecture of MMT. In section 4, we list the functional areas of MMT. In section 5, we describe some key features that would be included in MMT. In section 6, we make some concluding comments on the viability of MMT.
The MPEG-2 TS standard supports the combining of one or more elementary audio, video, and data streams into single or multiple streams that are suitable for storage or transmission. This standard provides details about decoder buffer management to ensure media can be played back without buffer overflow or underflow. MPEG-2 TS provides the delivery clock as an in-band data channel that can be used by the receiving client to determine delay and jitter in the network.
MPEG-2 TS generation involves two stages. In the first stage, the media data is put into packetized elementary stream (PES) packets. This is analogous to encapsulation in MMT (section 4). Elementary-stream media data is packetized into PES streams according to access-unit boundaries, and presentation and decoding timestamps are inserted into the PES packet header. The second stage is delivery packetization. PES packets corresponding to audio, video, and other data formats are further packetized into smaller packets of 188 bytes, and delivery timestamps are added.
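The following is a minimal sketch of the second stage: splitting one PES packet into fixed 188-byte transport packets. The header layout is simplified for illustration only; real TS packets also carry adaptation fields, a program clock reference, scrambling flags, and other details.

# Minimal sketch of the second MPEG-2 TS stage: splitting a PES packet into
# fixed 188-byte transport packets. The header layout is simplified.

TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes

def packetize_pes(pes: bytes, pid: int) -> list[bytes]:
    """Split one PES packet into 188-byte TS packets on the given PID."""
    packets = []
    continuity = 0
    for offset in range(0, len(pes), TS_PAYLOAD_SIZE):
        chunk = pes[offset:offset + TS_PAYLOAD_SIZE]
        first = offset == 0  # payload_unit_start_indicator
        header = bytes([
            0x47,                                    # sync byte
            (0x40 if first else 0x00) | (pid >> 8),  # PUSI flag + PID high bits
            pid & 0xFF,                              # PID low bits
            0x10 | (continuity & 0x0F),              # payload only + continuity counter
        ])
        # Real implementations pad short chunks with an adaptation field;
        # stuffing with 0xFF here is a simplification for illustration.
        packets.append(header + chunk.ljust(TS_PAYLOAD_SIZE, b"\xff"))
        continuity = (continuity + 1) % 16
    return packets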
MPEG-2 TS has withstood the test of time and is still highly relevant to the media-delivery industry. The main media-delivery services at the time of MPEG-2 TS standardization were terrestrial broadcasting and cable and satellite delivery. Since then, IPTV over multicast and, very recently, media delivery over Hypertext Transfer Protocol (HTTP) have become important. Both HTTP live streaming (HLS) from Apple and the DASH specification from ISO/MPEG [3] use MPEG-2 TS as a media format that is segmented for delivery over HTTP. Additionally, there are emerging hybrid services that use combinations of traditional delivery methods, and these require complex control and related operations on the part of the delivery standard. There are two areas where MPEG-2 TS is lacking: error control and support for network quality of service (QoS).
▲Figure 1. MMT architecture.
MPEG-2 TS has been updated several times (updates are referred to as amendments by the ISO) to include media types beyond those in the initial MPEG-2 TS. For some complex media formats, such as multiview coding (MVC) and scalable video coding (SVC), the original buffer model is inadequate. Furthermore, when MPEG-2 TS was developed, the required bitrates were on the order of 2 Mbit/s to approximately 20 Mbit/s, and this was for video with resolutions up to 1920 × 1080. With ultrahigh definition (UHD), the maximum bitrate to be supported is now near to 100 Mbit/s, and for these bitrates, the TS and PES packet sizes are not suitable.
All these reasons point towards the need for an updated MPEG-2 TS, and this is the justification for MMT.
The real-time transport protocol (RTP) provides end-to-end delivery services for real-time data such as interactive audio and video. These services include payload type identification, sequence numbering, delivery timestamp insertion, and delivery monitoring. Initially, the developers of RTP did not want to support multiplexing of media in RTP; however, because there was a demand, several proposals were put forward that included multiplexing in RTP.
A key feature missing from RTP is a quality of service (QoS) guarantee, and this is a feature that will be included in the new MMT standard.
The architecture for MMT is conventional in terms of networking and media transport. Fig. 1 shows the architecture that has been agreed upon by experts participating in MMT standardization.
The architecture is divided into three functional areas: encapsulation, delivery, and signaling.
The encapsulation function defines the format for the encapsulation of encoded media data to be stored or to be carried as the payload of delivery protocols and networks. The delivery function provides formats and functionalities for transferring encapsulated media data from one network entity to another. The signaling function signals and controls delivery and consumption of the media.
There is a clear separation of the functions of media encapsulation, media delivery, and signaling within MMT.
Encapsulation involves the following operations:
·Media packetization
·Media fragmentation
·Media synchronization
·Media multiplexing
·Insertion of timestamps to enable media synchronization, such as lip synchronization
·Insertion of composition information. This includes the spatial and temporal location of media objects in a given scene.
·Content protection. This includes conditional access and digital rights management.
·Container format that can be stored or packetized for delivery.
Encapsulation defines a media container that is not itself a physical storage format. Instead, the container may be stored, or, with further processing in the delivery layers of MMT, it may be made ready for delivery. The main functions of the encapsulation layer are similar to PES encapsulation in MPEG-2 TS. The output of encapsulation is an MMT package.
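As a rough, purely illustrative sketch, the kind of container the encapsulation layer produces might be modeled as follows; the field names are hypothetical and do not reflect the normative MMT package format.

# Illustrative-only model of an encapsulated package; all names are
# hypothetical and are not the normative MMT package format.
from dataclasses import dataclass, field

@dataclass
class MediaFragment:
    stream_id: int       # which elementary stream this fragment belongs to
    decode_ts: int       # decoding timestamp (e.g. in 90 kHz ticks)
    present_ts: int      # presentation timestamp used for lip sync etc.
    random_access: bool  # True if decoding can start at this fragment
    payload: bytes       # compressed media data for one access unit or fragment

@dataclass
class Package:
    composition_info: str                 # spatial/temporal layout of media objects
    protection_info: bytes = b""          # conditional-access / DRM metadata
    fragments: list[MediaFragment] = field(default_factory=list)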
The MMT group has done much valuable work in the area of scene composition. The group has taken existing standards such as SMIL 2.0 [4] and LaSeR [5] as the basis for MMT composition and has only made additions when there is broad consensus that such functional additions would enhance the MMT solution.
In delivery, the encapsulated MMT package is taken as input, and the following operations are performed:
·Network packetization
·Network flow multiplexing
·Insertion of delivery timestamps for use by the client device to determine jitter, etc.
·QoS operations
·Error handling. This includes application-layer forward error correction (AL-FEC) and retransmission-based error handling, which is often referred to as automatic repeat request (ARQ).
▲Figure 2. Layers of the delivery functional area.
The delivery functional area is subdivided into the D.1, D.2, and D.3 layers (Fig. 2).
Each delivery operation is performed in one of these three layers. The D.3 layer is not the same as the other two layers because its main function is to deliver messages between the other two delivery layers to enable cross-layer optimization.
4.2.1 D.1 Layer
The main data fields inserted at the D.1 layer are payload identification, fragmentation and aggregation information for transport packets, information to enable content protection and AL-FEC, and information to enable random-access operations. The D.1 layer generates the MMT payload.
Payload identification is required to determine the type of payload, including whether it is a media or signaling payload. Fragmentation and aggregation information allows the encapsulated media packets to be suitably structured according to the specific transport environment. Content protection and AL-FEC are required for media delivery; hence, the D.1 layer header indicates these functions. For random-access operations to be performed efficiently on media data, information about the random-access capability of a given media packet needs to be available at the transport level; this obviates the need to inspect the media payload to determine random-access capability.
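The sketch below shows how fields of this kind could be packed into a payload header. The bit layout is hypothetical and is intended only to illustrate the information the D.1 layer carries; it is not the layout defined by the standard.

# Hypothetical D.1 payload header: payload type, fragmentation state,
# FEC protection flag, and random-access flag packed into three bytes.
import struct

PAYLOAD_TYPE_MEDIA = 0
PAYLOAD_TYPE_SIGNALING = 1

def build_d1_header(payload_type: int, frag_flag: int,
                    fec_protected: bool, random_access: bool,
                    fragment_no: int) -> bytes:
    """Pack payload identification, fragmentation, FEC, and random-access flags."""
    flags = (payload_type & 0x0F) << 4
    flags |= (frag_flag & 0x03) << 2          # 0=complete, 1=first, 2=middle, 3=last
    flags |= (1 << 1) if fec_protected else 0
    flags |= 1 if random_access else 0
    return struct.pack("!BH", flags, fragment_no)  # 1-byte flags + 16-bit fragment counter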
4.2.2 D.2 Layer
The D.2 layer header provides the delivery timestamp and QoS parameters. The D.2 layer function generates the MMT transport packet.
MMT requires a delivery timing model that provides a timestamp for synchronizing media streams and for calculating delay and jitter in networks. In the event that required timing tolerances are not satisfied, any element on the delivery path should be able to readjust the timing relationships. At the current stage of MMT development, it is assumed that each element on the delivery path has access to coordinated universal time (UTC) from a remote clock source that runs the network time protocol (NTP) [6]. An alternative approach is for the delivery clock to be transmitted in-band with the media data, which is similar to using a program clock reference to derive the delivery clock in MPEG-2 TS solutions. With in-band clock delivery, there is no requirement for each element in the delivery path to have access to UTC from a remote NTP server. Arguably, relaxing this requirement could make MMT more widely deployable.
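As a minimal sketch of how a receiver can use delivery timestamps, the estimator below smooths interarrival jitter in the manner of RTP (RFC 3550). It assumes the sender and receiver share a common timebase, for example UTC obtained via NTP as described above; the exact timing model of MMT is not specified here.

# Smoothed jitter estimate from delivery timestamps, in the spirit of the
# interarrival jitter estimator of RFC 3550. Clock units are arbitrary but
# must be the same on both sides.

def update_jitter(jitter: float, send_ts: float, recv_ts: float,
                  prev_send_ts: float, prev_recv_ts: float) -> float:
    """Return the updated smoothed jitter estimate."""
    # Difference in transit time between this packet and the previous one.
    transit_delta = abs((recv_ts - send_ts) - (prev_recv_ts - prev_send_ts))
    # Exponentially weighted moving average with gain 1/16, as in RFC 3550.
    return jitter + (transit_delta - jitter) / 16.0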
In the D.2 layer, QoS fields are available so that network filtering can be performed based on these fields. It is expected that the QoS fields from the D.2 layer will be mapped to the corresponding fields in the IPv4 or IPv6 protocols.
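For example, on platforms that expose the IP_TOS socket option (such as Linux), an application can mark its outgoing UDP packets with a DSCP value. The DSCP constants below are common conventions, and the mapping from MMT QoS fields to DSCP values is an assumption for illustration.

# Mark outgoing UDP traffic with a DSCP value via the IPv4 TOS byte (Linux).
import socket

DSCP_EF = 46    # Expedited Forwarding, commonly used for low-latency media
DSCP_AF41 = 34  # Assured Forwarding class often used for video

def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # DSCP occupies the upper six bits of the TOS/Traffic Class byte.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock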
4.2.3 D.3 Layer
The D.3 layer is also referred to as the cross-layer function because it provides the means of supporting cross-layer optimization. This requires exchanging QoS-related information between the application layer and underlying network layers. QoS-related information could be used for QoS management and adaptation, such as flow control, session management, session monitoring, and error control.
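A purely illustrative sketch of such an exchange is given below; the message fields and the adaptation rule are hypothetical and only indicate how cross-layer QoS feedback might drive flow control.

# Hypothetical cross-layer exchange: the network layer reports QoS
# measurements, and the application adapts its media bitrate accordingly.
from dataclasses import dataclass

@dataclass
class QosReport:
    available_kbps: int   # bandwidth estimate from the underlying network
    loss_rate: float      # observed packet-loss ratio
    rtt_ms: float         # round-trip time estimate

def adapt_flow(report: QosReport, current_kbps: int) -> int:
    """Return a new media bitrate based on cross-layer QoS feedback."""
    if report.loss_rate > 0.05 or current_kbps > report.available_kbps:
        return max(current_kbps // 2, 200)                    # back off aggressively
    return min(current_kbps + 100, report.available_kbps)     # probe upward gently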
The signaling function is divided between the S.1 layer and the S.2 layer (Fig. 3).
4.3.1 S.1 Layer
This layer is used for presentation session management. Signaling messages are exchanged between applications in the client device for media presentation, session management, and provision of information for media consumption.
4.3.2 S.2 Layer
This layer manages delivery sessions, which includes managing signaling messages that are exchanged between delivery end-points. These signaling messages are used for flow control, delivery session management, delivery session monitoring, error control, and hybrid network synchronization control. This is an important function for media delivery over hybrid networks.
▲Figure 3. Signaling functions associated with the encapsulation and delivery functions.
Media delivery services need to work effectively in error-prone networks. However, where media delivery occurs in a pull data model, with TCP as the underlying transport protocol, there is no need for explicit error control because this function is inherent in the packet-loss detection and retransmission of the TCP protocol.
For data delivery using the push model, where error control is not built into the underlying transport protocol, there is a need for an error-control function, and this is an area where the MMT standard could provide valuable added functionality. Packet loss that occurs in the delivery stage is detected at the client device, and there are several methods that could then be followed to mitigate it. The following is a list of solutions for addressing such packet loss:
·AL-FEC
·ARQ
·Built-in error resilience at the codec level. This includes data partitioning and redundant data generation. Scalable video coding could also be structured to enable error resilience.
·Error concealment at the client device
AL-FEC and ARQ require explicit signaling as well as extra media or supporting data to be delivered for such functions to be effective. Using ARQ for error control is very effective but is limited to services that do not need real-time response because this method requires retransmission of packets.
Error concealment and error recovery, which involve introducing error resilience at the codec level, are commonly used in today's media-delivery solutions. These techniques are reviewed in [7]. Because these approaches do not require any explicit signaling or extra media data transport, there is no need to go into further detail here.
With AL-FEC, the server first adds some redundant data to the transmitted packets using a predetermined FEC algorithm. There are several contenders for such FEC algorithms. At the receiving client device, once packet loss has been detected, the missing information may be reconstructed. There are two methods of signaling the client device about the specific FEC algorithm being used: the signaling information can be carried out-of-band, or it can be carried in-band. The use case determines which of the two methods is applicable for a given media-delivery solution. Server deployment for in-band signaling is simpler than for out-of-band FEC signaling. However, the bandwidth requirement for in-band signaling increases because the FEC parameters need to be signaled frequently in order to enable functions such as channel switching.
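As a toy illustration of the AL-FEC principle, a single XOR parity packet over a block of equal-length source packets allows any one lost packet in the block to be rebuilt. Deployed systems typically use stronger codes, such as Reed-Solomon or Raptor codes; this sketch only shows the idea.

# Toy AL-FEC: one XOR parity packet protects a block of equal-length source
# packets, so any single lost packet in the block can be reconstructed.

def xor_parity(packets: list[bytes]) -> bytes:
    """Compute a repair packet as the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover(received: dict[int, bytes], parity: bytes, block_size: int) -> bytes:
    """Rebuild the single missing packet in a block from the parity packet."""
    missing = [i for i in range(block_size) if i not in received]
    assert len(missing) == 1, "XOR parity can repair at most one loss per block"
    # The missing packet is the XOR of all received packets and the parity packet.
    return xor_parity(list(received.values()) + [parity])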
These two approaches to FEC signaling are prime candidates for standardization in MMT. Signaling data would be delivered within the signaling layers. Whether this signaling data is delivered in the same physical channel as the media data or over a separate physical channel depends on the deployment scenario.
ARQ has particular benefits for media delivery services that do not require real-time response. In ARQ, the client device constantly sends acknowledgements to the server. If the server does not receive the acknowledgements in an expected time interval, it retransmits the particular media data packet. The parameters of the ARQ process, for example, timeout duration, need to be signaled to the client device from the server. In this respect, the process of ARQ parameter signaling is similar to that for FEC signaling. The same trade-offs as for FEC are applicable for ARQ.
Hence, the same principle for standardization of the signaling information can be adopted in MMT.
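A hedged sketch of a per-packet ARQ loop on the sending side is shown below; the transport callbacks and the parameter values are hypothetical and would, as noted above, be signaled between server and client.

# Hypothetical server-side ARQ loop: retransmit a packet until it is
# acknowledged or the retry budget is exhausted.
import time

def send_with_arq(send, wait_for_ack, packet: bytes, seq: int,
                  timeout_s: float = 0.2, max_retries: int = 3) -> bool:
    """Send one packet, retransmitting until acknowledged or retries run out.

    `send(seq, packet)` transmits the packet; `wait_for_ack(seq, timeout)` is
    assumed to block for up to `timeout` seconds and return True if the
    matching acknowledgement arrived. Both callbacks are hypothetical.
    """
    for _ in range(max_retries + 1):
        send(seq, packet)
        deadline = time.monotonic() + timeout_s
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                 # no ACK in time: retransmit
            if wait_for_ack(seq, remaining):
                return True           # acknowledged
    return False                      # give up; higher layers may conceal the loss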
In this paper, we have described the work that is currently being done within ISO/MPEG on the development of the MMT standard. The two main media-delivery standards in use today are MPEG-2 TS and RTP, but these have certain limitations. There is justification for MMT standardization focusing on the limitations that have been identified in this paper.
In addition to encapsulation and delivery functionalities, MMT also has signaling and composition functions. For these two areas, similar gap analyses with respect to existing standards and protocols need to be carried out in order to determine whether MMT should include such functionality or whether it is possible to rely on existing standards and protocols.
The MMT effort is a work in progress. The success of MMT depends to a large extent on whether the standards developed so far fill an actual gap in technology. ZTE Corporation and other companies are playing an active role in this standards effort, and this will help ensure that MMT gives rise to a useful technology.