Zhang Ruoying, Xu Zhifa, Lu Chuncong
(Telecommunication Planning Research Institute of MII, Beijing 100037, China)
Abstract: With more and more network services being developed, users now have higher requirements for Quality of Service (QoS). The Service Level Agreement (SLA) has therefore been proposed to manage telecom services with guaranteed QoS and to address QoS issues between Service Providers (SPs) and users. The SLA representation template, the violation process, and metrics evaluation are three key techniques for implementing SLA. A typical SLA management system includes SLA data management, SLA problem management, and SLA management. SLA research is still in its early stage, and SLA management is yet to be standardized. As general and special management approaches are defined in the future, a unified industry standard will finally take shape.
As Quality of Service (QoS) [1] becomes increasingly important, the Service Level Agreement (SLA) [2] has been proposed to manage telecom services with guaranteed QoS and to resolve service quality issues between Service Providers (SPs) and end users. An SLA is a formal contract (or part of an agreement) between an SP and a user that specifies, usually in measurable terms, the service quality level, priority, and obligations. It serves as a standard for evaluating telecom services.
The objective of the SLA is to build a healthy network operation environment in which users enjoy services guaranteed by regulations instead of verbal “promises”, so that their rights are well protected. With the SLA, established telecom carriers are able to attract VIP users who bring stable revenues, while start-up carriers can sharpen their competitive edge.
An SLA involves the definition and calculation of SLA parameters, the SLA representation method, and SLA management methods. A considerable amount of research has been done on SLA. TMF701 of the TeleManagement Forum (TMF) discusses two management methods: the SLA parameter framework and the SLA life cycle. GB917 [3] defines the service availability parameter and the contents of performance reports, but does not address representation. IETF drafts [4, 5] point out the necessity and importance of SLA representation. However, the key implementation technologies and system application of SLA (e.g., SLA parameter selection and measurement, SLA representation method and format, and the SLA violation process) are yet to be defined.
Defining the requirements of SLA management precisely calls for in-depth understanding and analysis. The SLA management requirements cover four aspects: service fulfillment, service assurance, user interface management, and other management needs. The first three come from the definition of the enhanced Telecom Operations Map (eTOM).
In the service fulfillment stage, SLA management focuses on SLA negotiation and subscription. The SLA should clearly define the following: measurable service performance metrics and parameters that are understandable to the user; user and SP obligations; the service performance measurement method, measurement cycle, and report cycle; SP operations triggered by an SLA violation; service-related report types (including report content, format, destination address, conditions, and transfer mode); and the definitions and expiration dates of the SLA services. For any service, a user should be able to select the parameters to be guaranteed and the value range of each parameter.
The service assurance stage begins once the services have been configured and the SP is providing its user with a service guarantee. In this stage, SLA management should monitor the service quality level and report information to the user. The SP must follow the SLA to monitor and measure the actual service performance within a range acceptable to the user or an entrusted third party. All user-oriented service information related to SLA parameters must be sent to the user in time, according to the SLA terms. The SP should set a soft threshold for each parameter and give a warning whenever the threshold is exceeded. According to the SLA terms, the user should also be told of the possibility of an SLA violation caused by service degradation.
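The soft-threshold rule described above can be illustrated with a minimal sketch. It assumes a higher-is-worse parameter (such as delay) and two levels per parameter: a soft threshold chosen by the SP for early warning and the limit agreed in the SLA. The names `ThresholdPair` and `classify`, and the numeric values, are hypothetical and are not taken from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPair:
    soft: float  # warning level set by the SP (assumed stricter than the SLA limit)
    hard: float  # limit agreed in the SLA (assumption: higher values are worse)


def classify(measured: float, t: ThresholdPair) -> str:
    """Return 'ok', 'warning', or 'violation' for a higher-is-worse parameter."""
    if measured > t.hard:
        return "violation"
    if measured > t.soft:
        return "warning"
    return "ok"


# Hypothetical delay thresholds in milliseconds.
delay_ms = ThresholdPair(soft=80.0, hard=100.0)
states = [classify(v, delay_ms) for v in (50.0, 90.0, 120.0)]
```

A "warning" state is the point at which the SP would tell the user that an SLA violation is possible, before the agreed limit is actually crossed.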
▲Figure 1. SLA template composition.
For user interface management, the SLA needs to address the interfaces between the user and the SP, as well as how the SP responds to user service and SLA queries. The SP should respond quickly to queries about the user's service quality level. Meanwhile, the user should be able to report problems or faults, request processing, and query service status by phone, fax, or email, and to receive answers and responses through various channels.
Other management requirements include:
·The SLA should define and uniquely identify each service module;
·The performance report should use the service identifiers defined in the SLA;
·The handling of exceptional service or performance conditions and the user's obligations should also be defined, such as how priority is selected when reporting a problem to the SP and how contact details are provided.
Performance is divided into service performance and network performance according to their different objects, attributes, functional scopes, and measurement scopes. Both performance levels are described by performance factors, each of which is a combination of different parameters.
The SLA is a formal, negotiated agreement between SPs and users that defines services, priority, and responsibility, with a focus on the service level. Figure 1 shows the three basic aspects of the SLA template: the service level objective, the violation process, and the force majeure statement. The service level objective is described with defined SLA parameters. The SLA parameters include QoS, service level priority, parameter weights, and other high-level parameters, such as service availability, that are computed from existing SLA parameters to evaluate the overall operation quality.
ITU-T E.860 [6] defines QoS as the degree to which service performance conforms to the terms negotiated in the SLA. The QoS level, obtained by comparing the target QoS with the measured QoS, is used to assess the overall service performance. For NGI services, QoS is the set of metrics of a specific service, negotiated and defined in the SLA. It is related to the service quality level and the network status, and it should be guaranteed. It comprises all or part of the performance factors that define service performance and network performance. QoS parameters can be used or ignored according to the actual circumstances. Therefore, SLA, QoS, and IP network performance collectively evaluate network quality and service quality in a systematic and measurable way.
The SLA representation template is a method used by both SPs and users to define the SLA content of a certain service in terms of service level, service quality, priority, and obligations. All of these aspects are described in the template.
The difficulty in SLA negotiation lies in the lack of a reference SLA representation template. Without a common template, the SP has to work out a new SLA whenever it negotiates SLA issues with a user, which adds to the workload and slows down service development. Worse still, as different SPs describe the SLA in their own terminology, the SLA metrics of the same service may mean quite different things to users. Understandably, the lack of a common representation template reduces user satisfaction and makes it hard to sign commonly acceptable contracts.
The SLA representation template presents the SLA contents in a fixed form. It simplifies the SLA negotiation process and standardizes the service flow, and thus better guarantees service for SPs and users. The SLA representation template defined in this article is a common one that is independent of any particular service or implementation technology, and it is therefore quite useful in the early stage of SLA negotiation [7]. A sample SLA representation template, shown in Figure 2, is given to standardize the SLA representation method and contents; it adopts an object-oriented design. Constructing it is the process of modeling the SLA representation template, which has four parts: the service part, the technology part, the business part, and the QoS report part.
·The service part breaks down into information identification, service scope, service grade, and service accounting. The information identification describes the basic information of the service, the SP, and the user. The service scope is the network scope of the service (between service access points or between network access points) defined by the SP when it specifies the service level to the user. According to the service scope, the proper service parameters are selected, and the related network devices, lines, and users are determined. The service grade, as selected by the user, corresponds to the relevant service quality and charges. The service accounting produces the user's bills for the service according to the applicable billing rates and service levels.
▲Figure 2. SLA representation template.
·The technology part consists of the QoS metrics, network topology, and performance monitoring. The QoS metrics summarize all quality-related metrics, including service metrics, traffic metrics, and network performance metrics. The network topology offers a visual impression of the network so that the general network is easy to understand. The performance monitoring monitors the service quality data and adjusts the service level accordingly.
·The business part includes the violation process and force majeure. The violation process describes the violation conditions and the steps to be taken in case of violation. The force majeure statement gives an additional description of “acts of God”, that is, exceptional violations that can exempt SPs from penalties.
·The QoS report is a classified service quality report offered to the users and SPs. It extracts data from the service part, technology part, and business part and gives the service quality data and statistics as specified in the SLA for further evaluation.
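The four-part, object-oriented structure described above can be sketched as a set of plain data classes. This is only an illustrative model: the class names, field names, and the `qos_report` method are assumptions made for this example and do not reproduce the actual template in Figure 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServicePart:
    service_id: str       # information identification
    provider: str
    user: str
    scope: str            # e.g. between two service access points
    grade: str            # service grade selected by the user
    billing_rate: float   # hypothetical flat rate for that grade


@dataclass
class TechnologyPart:
    qos_targets: Dict[str, float]  # metric name -> agreed target value
    topology: str                  # reference to a topology description
    monitoring_interval_s: int


@dataclass
class BusinessPart:
    violation_steps: List[str]
    force_majeure: List[str]


@dataclass
class SLATemplate:
    service: ServicePart
    technology: TechnologyPart
    business: BusinessPart

    def qos_report(self, measured: Dict[str, float]) -> Dict[str, Dict[str, Optional[float]]]:
        """QoS report part: pair each agreed target with its measured value."""
        return {
            name: {"target": target, "measured": measured.get(name)}
            for name, target in self.technology.qos_targets.items()
        }


# Hypothetical example values.
sla = SLATemplate(
    service=ServicePart("vpn-001", "SP-A", "user-42", "SAP1-SAP2", "gold", 120.0),
    technology=TechnologyPart({"delay_ms": 100.0, "loss_pct": 1.0}, "topology-ref", 300),
    business=BusinessPart(["notify user", "apply penalty"], ["natural disaster"]),
)
report = sla.qos_report({"delay_ms": 85.0})
```

Keeping the QoS report as a method that reads from the other parts mirrors the statement that the report extracts its data from the service, technology, and business parts rather than storing its own copy.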
The SLA violation process sets the content and means of the penalty paid by the SP to the user in case the SP fails to provide the expected service level. The violation process is meant to ensure the fairness, legality, and effectiveness of SLA negotiation and is very important in the negotiation process. It also helps to build a fair reward and punishment mechanism and protects the user's interests.
The violation process should describe the violation conditions and the steps taken in case of violation. When the agreed traffic mode or service quality is not met, the SLA is violated.
There are two kinds of SLA violation. In the first, the SP finds that the network status cannot meet the SLA terms, for example because of network traffic congestion or network disconnection. In the second, the SP or a user finds that the service quality has degraded below the level set in the SLA. Figure 3 shows the violation process flow. Three kinds of measured parameters are used here: the service parameter, the traffic parameter, and the performance parameter. Whether or not a violation has occurred depends on the parameter values compared with their thresholds, and steps should then be taken accordingly. Different penalty algorithms should apply to different violations. The violation severity determines whether warning information should be sent. The degree of deterioration of a parameter determines whether traffic shaping, service suspension, or service termination is necessary. The process ultimately produces a report (to the threshold module, for instance) that serves as the basis for adjusting the thresholds of some parameters.
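The decision logic of this flow can be sketched as follows. This is a minimal illustration, not the algorithm of Figure 3: it assumes higher-is-worse parameters, measures deterioration as the relative excess over the threshold, and uses hypothetical cut-off values (`warn_at`, `shape_at`, `suspend_at`) to choose the actions. Penalty calculation is left out because the article does not specify the penalty algorithms.

```python
from typing import Dict, List


def deterioration(measured: float, threshold: float) -> float:
    """Relative excess over the threshold (0.0 when the parameter is within it)."""
    return max(0.0, (measured - threshold) / threshold)


def process_violations(
    measured: Dict[str, float],
    thresholds: Dict[str, float],
    warn_at: float = 0.10,     # hypothetical severity that triggers a warning
    shape_at: float = 0.25,    # hypothetical degree that triggers traffic shaping
    suspend_at: float = 0.50,  # hypothetical degree that triggers suspension
) -> Dict[str, dict]:
    """Compare each parameter with its threshold and list the resulting actions."""
    report = {}
    for name, limit in thresholds.items():
        degree = deterioration(measured[name], limit)
        actions: List[str] = []
        if degree >= warn_at:
            actions.append("send_warning")
        if degree >= shape_at:
            actions.append("traffic_shaping")
        if degree >= suspend_at:
            actions.append("suspend_service")
        report[name] = {"violated": degree > 0.0, "degree": degree, "actions": actions}
    return report


# Hypothetical thresholds and measurements.
thresholds = {"delay_ms": 100.0, "loss_pct": 1.0, "jitter_ms": 20.0}
measured = {"delay_ms": 130.0, "loss_pct": 1.05, "jitter_ms": 10.0}
report = process_violations(measured, thresholds)
```

The returned dictionary plays the role of the output report: a threshold module could inspect the recorded degrees to decide whether any threshold needs adjusting.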
As SLA management requires managing all related network data, service data, and user data, it is necessary to collect large volumes of raw data from different layers and processes, and then to compute and manage them. Figure 4 shows the SLA metrics evaluation system. In the case of a single SP, the system is composed of three layers: resource management, service management, and customer management. Data from these layers are collected, analyzed, computed, and transferred among the layers.
·The resource management layer performs data measurement, filtering, and computing. End-to-end data that affect the SLA, including network performance, traffic, and service reliability, are generated in this layer. It sends the network performance notification, traffic notification, network fault notification, and network troubleshooting notification to the service management layer.
▲Figure 3. Violation process flow.
▲Figure 4. SLA metrics evaluation system.
·The service management layer performs data analysis and computing. It sends the network performance report, the network performance degradation notification and report, the traffic notification, the fault notification, and the troubleshooting notification to the customer management layer.
·The customer management layer performs data analysis and computing. It outputs various SLA quality evaluation reports to the SP and the user.
With the SLA system application, users can clearly state their requirements and check the SLA implementation. The application helps the SP understand user requirements and how the network is being used, so that the SP is able to plan the development of its service quality management and improve its services and competitiveness [8]. The general functional architecture of an SLA management system is composed of three parts: SLA management, SLA problem management, and SLA data management, as shown in Figure 5.
▲Figure 5. Functional architecture of SLA management system.
The SLA management module monitors and manages the service quality and outputs reports on it. It compares the quality information against the defined SLA parameters to generate a report on whether there is an SLA violation. This process is concerned with the SLA-related metrics of a service instance, including network performance parameters (delay, jitter) and service performance parameters (service availability, mean time to repair). If the service provided by the SP fails to meet the SLA requirements, a billing adjustment may result.
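As an illustration of the comparison step, the sketch below computes two of the service performance parameters named above, service availability and mean time to repair, from a list of outage durations and checks them against agreed targets. The formulas are the commonly used definitions (uptime divided by the reporting period, and total repair time divided by the number of outages); the article itself does not give formulas, and the targets and outage data here are hypothetical.

```python
from typing import List


def availability(period_h: float, outages_h: List[float]) -> float:
    """Fraction of the reporting period during which the service was up."""
    return (period_h - sum(outages_h)) / period_h


def mean_time_to_repair(outages_h: List[float]) -> float:
    """Average outage duration in hours (0.0 when there were no outages)."""
    return sum(outages_h) / len(outages_h) if outages_h else 0.0


# Hypothetical 30-day reporting period with three outages.
period_h = 720.0
outages_h = [1.5, 0.5, 2.0]

avail = availability(period_h, outages_h)
mttr_h = mean_time_to_repair(outages_h)

# Hypothetical SLA targets: at least 99.5% availability, at most 2 h MTTR.
violated = avail < 0.995 or mttr_h > 2.0
```

In this example the mean time to repair is within its target, but the availability falls below 99.5%, so the report would flag an SLA violation and a billing adjustment could follow.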
The functions of the SLA management module are:
(1) SLA Evaluation
Manage the service quality, ensuring that it meets the agreement signed between the SP and the user. Check the quality-related service data coming from other processes and raise an alarm to the relevant functional module if the data is not satisfactory.
(2) SLA Violation Management
Ensure that the user and the related modules are notified of service degradation and violation, and that steps are taken to solve the degradation and violation problems. When there is an SLA violation, analyze the violation information, handle the violation, and notify the user of the service quality and the violation process information.
(3) QoS Report
Report the service quality, generate and describe the reports on service level, and customize the quality report and report query based on user demands.
The SLA problem management module reacts promptly to service-affecting faults, and invokes the service configuration module or triggers the problem solving process.
The functions of the SLA problem management module are:
(1) Problem Diagnosis
Confirm an existing problem that is reported by the resource management layer and notify the user of the problem. Request the resource management layer to check a user complaint to verify whether the problem exists, and feed back the result.
(2) Problem Solving
Propose a solution to the problem based on the fault information and performance information, and trigger the related modules to carry out the solution.
(3) Problem Closure and Reporting
Perform the necessary tests to ensure that the services are restored to the normal quality level. Close the problem and send a fault clearance report to the user.
(4) Fault Information and User Complaint Query
Query the fault cause, the current fault that affects the service, the service configuration, the performance information related to the current fault, and the user complaint information.
The SLA data management module collects and processes information on network configuration, performance, faults, and billing, and transfers the information to the relevant processes. It also traces network traffic changes, monitors network faults, estimates network resource use, and sends performance data to the SLA management functional module and the SLA problem management functional module.
The software part of the SLA management system may adopt a three-layer structure (data collection, resource management, and user management). A distributed measurement and centralized management mode is recommended. With distributed measurement, data is obtained regularly for mapping, computing, and analysis. The results of the data analysis, together with the network operation features and the SLA requirements, collectively reflect the network status, the service level, and the SLA implementation status.
This article has discussed the research work done by international bodies and the existing problems regarding SLA. It covers the SLA management requirements and the relationship between SLA, QoS, and IP network performance. It has also discussed three technologies required to implement SLA management (the SLA representation template, the violation process, and metrics evaluation) as well as the SLA system application.
SLA research is still in its early stage, and SLA management is yet to be standardized. As general and special management approaches are defined in the future, a commonly used industry standard will finally take shape.