Xuehua Tang, Zhongyuan Wang*, Xiaojun Li, Zhen Han, Zheng He, Youming Fu
1 School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
2 Research Institute of Wuhan University in Shenzhen, Shenzhen 518000, China
3 School of Computer Science, Wuhan University, Wuhan 430072, China
4 State Grid Economic and Technological Research Institute CO., LTD, Beijing 102209, China
* Corresponding author, email: wzy_hope@163.com
Abstract: Software performance evaluation in multimedia communication systems is typically formulated as a multi-layered client-server queuing network (MLCSQN) problem. However, the existing analytical methods for the MLCSQN model cannot provide satisfactory solutions in terms of accuracy, convergence and the consideration of interlocking effects. To this end, this paper proposes a heuristic solving method for the MLCSQN model to boost the performance prediction of distributed multimedia software systems. The core concept of this method is referred to as the basic model, which can be decomposed into two sub-models: a client sub-model and a server sub-model. The client sub-model calculates the think time for the server sub-model, and the server sub-model calculates the waiting time for the client sub-model. Using a breadth-first traversal from the leaf nodes to the root node and vice versa, the basic model is then adapted to the MLCSQN, with the resulting sub-models iteratively resolved. Similarly, the interlocking problem is effectively addressed with the help of the basic model. This analytical solver enjoys the advantages of fast convergence, independence from any specific mean value analysis (MVA) method, and the elimination of interlocking effects. Numerical experimental results on accuracy and computational efficiency verify its superiority over the baseline methods.
Keywords: multimedia communication system; queuing network; performance evaluation
Recent years have witnessed the popularity of large-scale multimedia services, such as city-level video surveillance, social networks, video sharing sites and large-scale multimedia archive inquiry systems. Confronted with massive service requests and voluminous video/audio data, the efficiency of system resource scheduling is increasingly being challenged. In a distributed multimedia system, a task can either act as a client receiving services provided by other tasks, or act as a server providing services to other tasks. The architecture of a system often determines its performance, including the number of customer requests to a server within a certain time, the number of concurrent tasks in the server, and the task distribution in a multi-processor system or network. Besides, key metrics such as scheduling delay, throughput and bandwidth utilization are of particular concern for delivering real-time multimedia content. Because multi-layered client-server interactions often happen in a system, for instance, tasks queuing in the middle layers of the software and in the hardware layer, it is difficult to evaluate system performance intuitively [1]. A quantitative method for evaluating the impact of these factors on system performance is therefore essential.
Performance evaluation of concurrent or distributed systems is a widely studied topic. Performance evaluation methods mainly fall into two categories: deterministic and stochastic approaches. The queuing network model is one of the representative stochastic methods, which uses the mean value analysis (MVA) method [2] to handle the resource scheduling problem. Traditional queuing network models can be used to estimate the conflicts among independent tasks (or hardware devices), but they fail to consider the conflicts among interdependent tasks. Nevertheless, most distributed systems involve synchronous resource consumption interactions, in which the client and server tasks are occupied by another task simultaneously. To overcome these shortcomings, agent-based approaches [3, 4] have been proposed and successively refined. Such methods conceptually divide queuing latency into the delay of major resources and the delay of secondary resources, and solve them by establishing a "major resource conflict model" and a "secondary resource conflict model", respectively. Results are iterated between the two models until a convergent solution is ultimately achieved. Agent-based analysis methods are suitable for three-layered client-server queuing networks, but are difficult to extend to systems with more layers. The multi-layered client-server queuing network (MLCSQN) model, such as the stochastic rendezvous network (SRVN) model [5], offers different services for different request portals and thus removes the restriction that a software task provides only a single service in a model. The SRVN model also takes into account request priorities as well as the interlocking effects emerging in multi-layered queuing networks, and divides the execution of software tasks into two or more stages to enhance its descriptive power. Mathematically, the SRVN model adopts an MVA-like approximation algorithm [6] to analyze queuing delay, whose key issue is to estimate the probabilities of the various situations encountered at the instant a request arrives. In the method of layers (MOL) [7, 8], an MLCSQN is divided into a "software conflict model" and a "device conflict model", and a linear algorithm [9] is employed to obtain a convergent solution by iteratively solving these two models. Different from the SRVN model, MOL allows multiple concurrent instances of a software task, but requires that the top task call only its immediately adjacent server tasks. On the basis of the SRVN and MOL models, Franks et al. developed the layered queuing network (LQN) model [10] and its performance enhancements [11]. The LQN model decomposes a client-server queuing network model into a series of sub-models which are iteratively solved with MVA. Following their pioneering work, a technique similar to workload concealment was further introduced to handle asynchronous and synchronization messages in double-layered [12] and multi-layered [13] client-server queuing networks. In practice, LQN models have been widely used for the performance evaluation of distributed parallel software and web systems [14-16]. For example, Shoaib et al. [16] applied the LQN performance model to an Apache-PHP web application and reported results comparable with load-test measurements in terms of average error and response time. Wang et al.
[17] studied the performance of opportunistic scheduling in wireless multimedia and data networks based on the popular stochastic network calculus, which establishes a stochastic arrival curve by bounding the traffic arrival process with an exponentially bounded burstiness traffic model.
In very recent works [18, 19], Zhou introduced the multi-layered model into the delay estimation and announcement problem in a media cloud environment. Ref. [18] particularly exploits data-driven delay estimation for heterogeneous multimedia services in the cloud, with an updated data-driven method to balance computational complexity and estimation accuracy. Ref. [19] further analytically studies the characteristics of delay announcement by analyzing the components of the user response. Different from them, this paper focuses on the overall performance evaluation and prediction of distributed multimedia software systems with client-server queuing behavior, thus involving a broader range of factors beyond delay, such as throughput, resource utilization, queuing time, access delay, and residing time.
The aforementioned methods are committed to solving the MLCSQN problem in software performance evaluation from their own points of view. However, they often fail to provide a satisfactory solution in terms of accuracy and convergence rate under the constraint of acceptable running time. This is especially true in the presence of interlocking effects, since most existing methods pay little attention to the interlocking issue. In this paper, we focus on the modeling problem of distributed systems and propose a new analytical solver for the MLCSQN model. We adopt an approach similar to LQN, but differ in the solving strategy and the iterative decomposition algorithm. The advantages of our approach are threefold: 1) improved convergence rate and accuracy; 2) an effective solution to the interlocking problem; 3) better generality, since it does not depend on any specific underlying analysis method used by the sub-models. The method is suitable for analyzing and predicting the performance of distributed multimedia software systems running on multiple processors or networks.
The remainder of this paper is organized as follows. Section II briefly introduces the MLCSQN model and its description notations. Section III presents the analytical solving method along with the interlocking issue. Section IV shows the comparison results. In Section V, we conclude the paper.
The resource scheduling problem in high-performance multimedia communication systems can be gracefully modeled via MLCSQN. In an MLCSQN, each node represents a concurrently executing software or hardware task that has no internal concurrency control but can send or receive service requests. Roughly, tasks fall into three categories: "pure client tasks", which only issue service requests; "active server tasks", which both issue and accept service requests to/from other tasks; and "pure server tasks", which only accept service requests. Tasks describe software processes, threads, or hardware devices; for example, a processor is usually represented by a pure server task. Each task contains one or more entries with their own execution times, which stand for the different services provided by the task. Tasks communicate with each other in a send-receive-reply mode [1], in which a task, after issuing a request message, waits until it receives a response message before continuing execution. Messages are sent from an entry of one task to an entry of another task under a first-in-first-out (FIFO) queuing rule. Thus, an MLCSQN model constitutes a directed acyclic graph.
Fig. 1. An instance of MLCSQN model.
Figure 1 shows an instance of the MLCSQN model, where two pure client tasks interact with one active server task and two pure server tasks. The two pure server tasks each have two entries, while each of the other tasks has only one entry. The numbers in square brackets of the entries stand for the average service time, which is assumed to be exponentially distributed. Arc lines with arrows indicate requests between tasks, with the digits in parentheses denoting the average number of requests during an execution. The number of requests can be geometrically distributed or a deterministic constant. The principal parameters of the MLCSQN model are tabulated in table 1.
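To make the notation concrete, the structure of such a model can be captured by a small task/entry data structure. The following sketch (in Python; all names and numeric values are illustrative placeholders, not the actual parameters printed in figure 1) mirrors the shape of the example: each entry carries a mean service time and the mean number of requests it issues to entries of other tasks.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Entry:
    name: str
    service_time: float                                          # mean execution demand of this entry
    requests: Dict[str, float] = field(default_factory=dict)     # target entry -> mean number of requests

@dataclass
class Task:
    name: str
    entries: List[Entry]

# Illustrative model with the same shape as figure 1: two pure client tasks,
# one active server task, and two pure server tasks with two entries each.
clientA = Task("clientA", [Entry("cA", 1.0, {"a1": 1.0})])
clientB = Task("clientB", [Entry("cB", 1.0, {"a1": 2.0})])
active  = Task("activeServer", [Entry("a1", 0.5, {"s1": 1.0, "t1": 0.5})])
serverS = Task("pureServerS", [Entry("s1", 0.3), Entry("s2", 0.4)])
serverT = Task("pureServerT", [Entry("t1", 0.2), Entry("t2", 0.6)])
model   = [clientA, clientB, active, serverS, serverT]   # the entry-to-entry arcs form a DAG
```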
In this section, we first establish the basic model and derive its solution using a method similar to that for synchronous resource consumption [2]. Different from the agent-based approach [3], the solving approach for the basic model lends itself to extension to the multi-layered client-server queuing network. By extending the solving algorithm of the basic model to an MLCSQN model, we ultimately obtain the solution of the overall model.
Fig. 2. Basic model.
First of all, we use a simple client-server example to illustrate the basic model as well as the idea behind its solution. Figure 2 shows a basic model consisting of a three-layered client-server queuing network. Typically, the basic model involves synchronous resource consumption, in which customers accessing the client "clientA" need to access the server "server" at the same time. In the preliminary MLCSQN model (as shown in figure 1), the service at a center depends on other nodes, and forking generates customers while joining eliminates them. In terminology, these situations are called simultaneous resource possession and fork-join behavior. Previous studies [10] have shown that queuing network systems with simultaneous resource possession or fork-join behavior cannot be modeled directly by MVA due to the service dependence and routing diversity. To convert our problem into an analytically tractable one for MVA, we take a closer look at the key factor, delay. Conceptually, the average delay of a customer accessing the client consists of several parts: the customer's queuing time at the client, the client's execution time, the client's queuing time when accessing the server, and the server's execution time. Among them, the customer's queuing time at the client refers to the time spent by the customer queuing up at the client, for example, customers lining up for banking services. The client's execution time includes the client's own think time plus the time to access the server. For the case without any service dependence, the server's execution time is simply its think time. The time spent only at the client or server node itself is called think time.
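Written out explicitly (the symbols here are shorthand introduced only for this illustration, with $w$ denoting queuing time and $s$ execution time), the average delay experienced by a customer is

$$t_{\text{customer}} = \underbrace{w_{\text{client}}}_{\text{queuing at client}} + \underbrace{s_{\text{client}}}_{\text{client execution}} + \underbrace{w_{\text{server}}}_{\text{queuing at server}} + \underbrace{s_{\text{server}}}_{\text{server execution}}.$$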
However, we cannot observe or detect each fraction separately in an actual system. Therefore, we decompose the basic model into two sub-models: a client sub-model and a server sub-model (as shown in figure 3). When a task acts as the client of a sub-model, it is treated as a server without queuing delay. In the client sub-model, the MVA method can be used directly to resolve the client's queuing delay (note that the queuing conflict of the client accessing the server is ignored). Similarly, in the server sub-model, the MVA method can be used directly to obtain the client's queuing delay at the server (here ignoring the think time of the client). When calculating the server sub-model, since the customers' queuing time at the client is unknown, the client's think time in the server sub-model is also unknown.
To address this problem, the MVA method is used to solve the delay $t_{as}$ of each client access to the server. The average residing time of accessing entry $e$ is given by

$$x_e = s_e + \sum_{d \in E} y_{ed}\, t_{ed}, \qquad (1)$$
where $s_e$ denotes the entry's own execution time, $y_{ed}$ the average number of requests issued from entry $e$ to entry $d$, and the average queuing time $t_{ed}$ reads

$$t_{ed} = x_d\,(1 + L_d), \qquad (2)$$

with $L_d$ the average queue length seen by a request arriving at entry $d$ (the arrival theorem underlying MVA).
The client's residing time $x_a$ is calculated with Eq. (1) while the client's think time in the server sub-model is ignored. Because this think time is ignored, the obtained $t_{as}$ will be higher than its actual value. In the client sub-model, the same MVA method is employed to compute the client's utilization $U_a$. According to the definition of utilization, the client's think time $Z_{as}$ can be obtained as follows:

$$Z_{as} = \frac{x_a}{U_a} - y_{as}\, t_{as}. \qquad (3)$$
Because the obtained $x_a$ is higher than its actual value in the server sub-model, the $Z_{as}$ computed in the client sub-model will inevitably be smaller than its actual value. This $Z_{as}$ is then substituted into the server sub-model to recalculate the delay $t_{as}$ of each client access to the server. Thus, iterative optimization is carried out between the two sub-models until the change in the client's utilization $U_a$ falls below a preset threshold. Since the $Z_{as}$ obtained in the client sub-model is smaller than its actual value, the $t_{as}$ in the server sub-model is higher than its actual value but lower than the outcome of the previous iteration. In this way, $Z_{as}$ keeps growing while remaining below its actual value, and hence convergence is reached after a number of iterations.
Table 2 shows the process of solving the basic model in figure 2, which carries out the calculation in each row from left to right, with the input of each sub-model produced by the other sub-model. For example, to obtain the client's service time in the client sub-model, we take the delay of the client accessing the server from the server sub-model on the left (4.123 time units) and add the client's own execution time (8 time units) using Eq. (1). To acquire the client's think time in the server sub-model, we take the client's utilization (0.993) from the client sub-model above and compute it using Eq. (3).
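The iteration summarized in table 2 can be mimicked in a few lines of code. The sketch below (Python; all parameter values and helper names are illustrative, and Eq. (3) is applied in the reconstructed form $Z_{as} = x_a/U_a - t_{as}$) assumes several identical single-threaded client tasks, each serving its own pool of customers, so that both sub-models reduce to a single-station closed MVA:

```python
def mva_station(n_customers: int, service: float, think: float):
    """Exact single-class MVA for one queueing station plus a think (delay) stage.
    Returns (residence time at the station, throughput)."""
    queue, resid, tput = 0.0, service, 0.0
    for n in range(1, n_customers + 1):
        resid = service * (1.0 + queue)      # arrival theorem
        tput = n / (think + resid)
        queue = tput * resid                 # Little's law
    return resid, tput

def solve_basic_model(n_cust, z_cust, s_client, s_server, n_clients=2, tol=1e-6):
    """Fixed-point iteration between the server and client sub-models of figure 3."""
    z_client, u_prev = 0.0, None             # client think time unknown at first
    while True:
        # server sub-model: the client instances contend for the server
        t_as, _ = mva_station(n_clients, s_server, z_client)
        x_client = s_client + t_as           # Eq. (1): own execution plus server delay
        # client sub-model: customers contend for one client (server delay folded in)
        _, lam = mva_station(n_cust, x_client, z_cust)
        u_client = lam * x_client            # client utilization U_a
        z_client = x_client / u_client - t_as   # Eq. (3), reconstructed form
        if u_prev is not None and abs(u_client - u_prev) < tol:
            return lam, u_client, t_as
        u_prev = u_client

# Example: 3 customers per client, customer think time 5, client execution 8, server demand 2
print(solve_basic_model(n_cust=3, z_cust=5.0, s_client=8.0, s_server=2.0))
```

In this sketch the first pass, with zero client think time, overestimates the server delay, and the computed think time then grows over the iterations, mirroring the convergence argument given above.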
Table I. Notations on MLCSQN.
Table II. Solving client and server sub-models.
Fig. 3. Decomposition of basic model.
The solving method for the basic model can be readily extended to the MLCSQN model. As mentioned before, an MLCSQN model can be seen as a directed acyclic graph, where the leaf nodes are pure server tasks and the root nodes are pure client tasks.
The solving steps for an entire MLCSQN model are summarized as follows:
1) Initialization. Starting from the leaf nodes in a bottom-up manner, Eq. (1) is used to initialize the service time $x_e$ of each node entry, with the queuing delay of the leaf nodes ignored. The average delay $t_{ed}$ from this node entry to each of the other node entries, $d \in E$, is recorded. The think time of all node entries other than the root node is initialized to zero.
2) Solving the server sub-models. Starting from each leaf node in a bottom-up manner, all the direct parent nodes of all its entries are located to form a server sub-model. In the server sub-model, a parent node is viewed as a server without queuing delay. The think time of a parent node equals the last obtained service time $x_e$ minus the queuing delay $t_{ed}$, namely,

$$Z_e = x_e - \sum_{d} y_{ed}\, t_{ed}, \qquad (4)$$

with the sum taken over the entries $d$ of the current server sub-model,
where $t_{ed}$ is computed by the MVA method and the service time $x_e$ of each parent node entry is updated in each iteration. Starting from the leaf nodes, the entire directed graph is traversed with a breadth-first search strategy, and the server sub-model formed at each node is solved in turn.
3) Solving the client sub-models. Starting from each root node in a top-down manner, we find its first child node and then collect all of that node's parent nodes to form a client sub-model. Based on the service time $x_d$ generated by the server sub-model, the utilization of node entry $d$ is calculated with the MVA method and the think time $Z_d$ of $d$ is then given by Eq. (3). Starting from the root node, the entire directed graph is traversed with a breadth-first search strategy, and the client sub-model formed at each node is solved in turn.
4) Loop. Steps 2) and 3) are repeated until the change in node utilization between successive iterations falls below a pre-determined threshold.
We view the MLCSQN model as a whole directed graph and iteratively resolve it in order from the leaf nodes to the root node and vice versa. The sub-model selection criterion is based on the server nodes at which queuing conflicts arise, i.e., a node and all its direct parent nodes are solved together as one sub-model. In this way, if the client's think time is accurate enough, the client's queuing delay at the server can be directly calculated with traditional queuing network methods.
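The traversal order itself is straightforward to sketch. The snippet below (Python; the task names and the graph are illustrative, not those of figure 1) lists, for a small task DAG, the server sub-models formed during the bottom-up breadth-first pass and the node order of the top-down pass:

```python
from collections import deque

# parent -> children arcs of an illustrative task graph
children = {
    "customerA": ["clientA"], "customerB": ["clientA"],
    "clientA": ["disk", "cpu"],
    "disk": ["cpu"],
    "cpu": [],
}
parents = {task: [] for task in children}
for parent, kids in children.items():
    for kid in kids:
        parents[kid].append(parent)

def bottom_up_submodels(children, parents):
    """Breadth-first from the leaf nodes: each visited node together with its
    direct parents forms a server sub-model (step 2 of the algorithm)."""
    order, seen = [], set()
    queue = deque(t for t, kids in children.items() if not kids)   # leaves = pure servers
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if parents[node]:
            order.append((node, list(parents[node])))
        queue.extend(parents[node])
    return order

def top_down_clients(children, parents):
    """Breadth-first from the root nodes: client sub-models are solved in this
    order (step 3 of the algorithm)."""
    order, seen = [], set()
    queue = deque(t for t, ps in parents.items() if not ps)        # roots = pure clients
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(children[node])
    return order

print(bottom_up_submodels(children, parents))
print(top_down_clients(children, parents))
```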
Throughout the solving process, there is a certain overlap between the sub-models. Because the service time of the parent node entries is updated each time a server sub-model is solved, the think time of the client node relative to the other server sub-models can be recalculated immediately from the updated service time. This accelerates the approximation to the true value and ultimately speeds up the entire iterative process. In addition, our algorithm does not depend on a specific MVA method, although its accuracy is closely related to the MVA method employed.
Interlocking occurs when a client task and its server tasks share a common server task. The request streams reaching the common server task are usually assumed to be independent. However, they are usually correlated with each other, thus leading to analytical errors. As shown in figure 4, customers' requests arrive at the server via client A while the other path is cut off. Similarly, the same phenomenon happens when the requests arrive at the server via client B. Such request paths from the same source to the same destination are called interlocked. Ignoring interlocking effects results in pessimistic analytical results [8, 10, 11]. The method in [11] discards the request streams affected by the interlocking effect, but it has at least two drawbacks. First, a complex graph search algorithm is required to identify the interlocked streams. Second, how to remove an interlocked stream is an experience-based practice.
In fact, this problem is no longer difficult to resolve with the established concept of the basic model. As illustrated in figure 4, we suggest a sub-model based approach to deal with the interlocking problem. To facilitate the statement, we illustrate the process flow in table 3, proceeding from left to right in each row. During the iteration, the input of each sub-model comes from another sub-model. For example, to obtain the service time of client A, we calculate the delay of its accessing the server (0.257 time units) and then add its own execution time (0.2 time units) according to Eq. (1). To get the think time of client B, we obtain the delay of customers accessing client A (0.457 time units) and then add the customers' own think time (1 time unit). To obtain the service time of client B, we calculate the delay of client B accessing the server (0.301 time units) and then add the client's own execution time (0.5 time units).
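Written out explicitly (the subscripts are shorthand for this illustration only; $x_A$ and $x_B$ follow Eq. (1), while the middle quantity is the customers' think time relative to client B), the three computations quoted above read

$$x_A = 0.2 + 0.257 = 0.457, \qquad Z_{c \to B} = 1 + 0.457 = 1.457, \qquad x_B = 0.5 + 0.301 = 0.801.$$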
In the basic client-server model, we have already analyzed the iterative convergence. To examine convergence in the interlocking case, we again focus on the client sub-models A and B in figure 4. Since the output $t_{ca}$ of client sub-model A forms part of the customers' think time in sub-model B, an overestimation of $t_{ca}$ by client A leads to reduced $t_{cb}$ and $U_b$, and vice versa. Hence, the solution eventually converges after a few iterations.
Fig. 4. Interlocking.
In this section, we conducted numerical experiments to verify the performance of the proposed method, covering the cases without and with interlocking. The comparison mainly concerns analysis accuracy and convergence speed. Current performance prediction tools rely on either analytical or simulation models. In the experiments, the simulation results, produced by the well-known LQN simulator lqsim [20], are used as the benchmark for analysis accuracy. Note that simulation models generally enjoy higher accuracy than analytical solvers, since the latter are usually based on simplified assumptions and a mathematical formula cannot cover all the complexities of a real system. Thus, if the results of our method are experimentally verified to be close enough to those of the simulator lqsim, it can be regarded as competitive and applicable.
First, the case without interlocking is tested, representing a system comprising one server task, one client task and one customer task. Table 4 shows the comparison of our analytical results with the simulation results. The 95% confidence interval of the simulation method is ±0.001 and the error is calculated by

$$\text{error} = \frac{|\text{analytical value} - \text{simulation value}|}{\text{simulation value}} \times 100\%.$$
Table III. Solving client-server queuing network with interlocking.
Recalling the model parameters in table 1, four significant output parameters, namely the throughput ($\lambda_c$), the utilizations ($U_a$ and $U_s$) and the delay ($t_{ca}$), are provided for comparison. It is clearly seen that the overall error is far less than 1%.
Then, an additional experiment is conducted to evaluate the solution to the MLCSQN with interlocking. This experimental system involves four tasks: one server task, two client tasks and one customer task. Similarly, five significant output parameters, namely the throughput ($\lambda_c$), the utilizations ($U_a$ and $U_s$) and the residing times ($x_a$ and $x_b$), are considered. Table 5 shows a comparison of our results with the simulation results, where the error is less than 6%, slightly larger than in the case without interlocking. In contrast, the analytical errors of the similar cases in [11] total up to 30%~50%, partially due to the inappropriate discarding of interlocked streams.
The complexity of the algorithm primarily depends on the number of sub-models, their individual complexity, and the number of iterations; specifically, the computational complexity can be expressed as the product of these three factors. Among them, the complexity of a sub-model is determined by the specific underlying MVA method. For example, as shown in [10], the operation cost of the exact MVA technique is $O\big(MK\prod_{k=1}^{K}(N_k+1)\big)$ for $M$ tasks, $K$ routing chains and $N_k$ customers in chain $k$. In the experiments, our algorithm finishes within seconds, while the simulation program takes from a few minutes to tens of minutes. As for convergence, on similar examples, Ref. [3] typically requires 12 iterations, while our algorithm (both without and with interlocking) requires only 4 iterations. Note that the computational workload of a single iteration is identical in our method and the baseline.
Table IV. Comparison of our method with simulation without interlocking.
Table V. Comparison of our method with simulation with interlocking.
In addition, the utilization of the server tasks falls between 0.2 and 0.9, in which case the maximum error is less than 10%. The method discussed in this paper can be applied to the analysis of closed-loop networks and of multi-server concurrency and synchronization queuing networks. In fact, though not further presented, with the workload concealment technique [2], our method can be readily extended to open-loop streams and asynchronous queuing networks.
In this paper, we have proposed a novel method to resolve the MLCSQN. The basic model helps us understand the nature of the MLCSQN model. By decomposing the basic model into client and server sub-models, we develop an iterative solving algorithm based on the MVA method. The basic model approach is then extended to the MLCSQN model, which can be viewed as a directed acyclic graph in which each server task and all its direct parent nodes constitute a sub-model. By exploiting the partial overlap among sub-models, the iterative approximation process can be sped up. The proposed method converges quickly and, at the same time, effectively solves the interlocking problem in multi-layered queuing networks. A variety of test examples show that the maximum difference between the simulation results and the analytical results is below 10%.
ACKNOWLEDGMENTS
The research was supported by the Application Research of the Remote Sensing Technology on Global Energy Internet (JYYKJXM(2017)011), the National Natural Science Foundation of China (61671332, 41701518, 41771452, 41771454, U1736206), the National Key R&D Project (2016YFE0202300), the Hubei Province Technological Innovation Major Project (2017AAA123), the Applied Basic Research Program of Wuhan City (2016010101010025), the Basic Research Program of Shenzhen (JCYJ20170306171431656), and the Fundamental Research Funds for the Central Universities (2042016gf0033).