
Calculate Joint Probability Distribution of Steady Directed Cyclic Graph with Local Data and Domain Casual Knowledge

2018-07-24 00:46:50 Qin Zhang, Kun Qiu, Zhan Zhang

China Communications, 2018, Issue 7

Qin Zhang*, Kun Qiu, Zhan Zhang

1 School of Computer Science and Engineering, Beihang University, Beijing 100191, China

2 Tsingrui Intelligence Technology, Ltd., Beijing 100191, China

Abstract: It is desirable to obtain the joint probability distribution (JPD) over a set of random variables from local data, so as to avoid the hard work of collecting statistical data at the scale of all variables. A lot of work has been done for the case in which all variables are in a known directed acyclic graph (DAG). However, steady directed cyclic graphs (DCGs) may be involved when we simply combine modules containing local data, where a module is composed of a child variable and its parent variables. So far, the physical and statistical meaning of steady DCGs has remained unclear and unsolved. This paper illustrates the physical and statistical meaning of steady DCGs, and presents a method to calculate the JPD with local data, given that all variables are in a known single-valued Dynamic Uncertain Causality Graph (S-DUCG), and thus defines a new Bayesian Network with steady DCGs. The so-called single-valued means that only the causes of the true state of a variable are specified, while the false state is the complement of the true state.

Keywords: directed cyclic graph; probabilistic reasoning; parameter learning; causality; complex network

I. INTRODUCTION

A Bayesian network (BN) is a well-known model for representing uncertain causalities and performing probabilistic reasoning [1]-[15]. By definition, a BN is based on a DAG and represents a JPD over a set of random variables X1, X2, …, Xn. According to the DAG, the JPD is factorized as the product of a group of conditional probabilities contained in the corresponding conditional probability tables (CPTs), as shown in (1), in which xi is an instance of Xi, pa(xi) denotes the parents of xi, and P(xi|pa(xi)) denotes the conditional probability of xi.
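Equation (1) is not reproduced in this text. Assuming it is the standard BN factorization that the paragraph describes, it would read as follows, using the same symbols:

```latex
P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P\bigl(x_i \mid pa(x_i)\bigr) \tag{1}
```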

A CPT corresponds to a module composed of a child variable and its parent variables, and can be obtained from the local statistical data within the module. A DAG is composed of a set of modules. Each module corresponds to a CPT that is equivalent to a local JPD over the variables included in the module. In [16], a module is called a family, which implies that the modules are ordered and cannot form any steady/static DCG when they are combined by fusing the same variables in different modules, i.e. families can only be used in DAGs. When steady DCGs are involved in the module combination, family is not a proper word. For a known DAG, once we collect the local statistical data for each module, we can easily obtain the local CPTs or JPDs, and thus get the final JPD over all variables according to (1). It is not necessary to collect statistical data at the scale of all variables, because according to (1) we only need to know the local CPTs or JPDs of the modules. This makes data collection much easier, because collecting data over all variables jointly increases the amount and difficulty of data collection exponentially. For example, suppose 1) there are 100 variables contained in 20 modules, 2) each module has 10 variables (some variables overlap among modules), 3) every variable has two states, and 4) each state combination appears 10 times in the statistical data. The number of samples needed to get the final JPD over the 100 variables directly is 2^100×10 > 1.267×10^31. If we can calculate the final JPD from the local JPDs of the modules, the number of samples is reduced to 20×2^10×10 = 2.048×10^5.
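The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the counting argument; the variable names are illustrative and do not come from the paper.

```python
# Counting argument from the example: binary variables, and every state
# combination observed `repeats` times.
n_vars = 100           # variables in the whole graph
n_modules = 20         # modules (child plus parents)
vars_per_module = 10   # variables in each module
repeats = 10           # observations per state combination

# Samples needed when the JPD over all variables is estimated directly.
direct_samples = 2 ** n_vars * repeats

# Samples needed when only the local JPDs of the modules are estimated.
modular_samples = n_modules * 2 ** vars_per_module * repeats

print(f"direct:  {direct_samples:.4e}")
print(f"modular: {modular_samples:.4e}")
```

The gap between the two counts grows exponentially with the number of variables, which is the motivation for working with local data only.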

A lot of work has been done in this area [17]-[22]. These models are all DAGs, or are extended as DAGs with time tags in dynamic cases. However, steady DCGs may appear in some cases [23]-[24]. An example is shown in figure 1, which is an SOS DNA repair network in which nodes represent genes and directed arcs represent the regulatory relationships [24].

One reason for steady DCGs is that measurements cannot be accurate enough to distinguish the earlier cause from the later consequence when feedback loops exist [25]. Another reason is that the cause and consequence may be immediate (static), e.g. action force and reaction force, while the cause and consequence are reversed in different instances. Some work related to steady DCGs has been done. Ref. [26] shows that a DCG is a family of probability distributions rather than a single distribution, and gives only consistency checking without calculations. Ref. [27] proposes to use conditional independence and non-recursive linear structural equations to handle DCGs, without showing in detail how to calculate the JPD. Ref. [28] defines a dependence network (DN) allowing DCGs, in which Gibbs sampling is used to draw samples from the DN, and the samples are then used to represent a JPD over the random variables. However, a DN is viewed as a group of conditional probability distributions rather than a unique joint distribution. Ref. [29] regards a DCG as a re-parameterization of an undirected model and represents the joint distribution as a normalized product of non-negative interventional potential functions. Ref. [30] uses Gibbs sampling to generate data and then calculates the frequency of samples (JPD); a sampling order has to be determined, because different sampling sequences may result in different distributions and may even fail to reach an equilibrium distribution. In general, the physical and statistical meanings of DCGs remain unclear and unsolved.

This paper demonstrates the physical and statistical meaning of steady DCGs, and presents a method to calculate the JPD with only local statistical data in modules composing steady DCGs.

Fig. 1. Regulatory interactions of a repair network.

Fig. 2. Illustration to S-DUCG model.

Fig. 3. S-DUCG with DCG.

Fig. 4. Three modules contained in figure 3.

In [31]-[38], a model called Dynamic Uncertain Causality Graph (DUCG) is presented to represent uncertain causalities explicitly; it is composed of the single-valued DUCG (S-DUCG) and the multiple-valued DUCG (M-DUCG). The so-called S-DUCG means that only the causes of the true state of a variable can be specified, while the false state is the complement of the true state and the causes of the false state cannot be specified separately. In the so-called M-DUCG, a variable may have more than two states and the causes of all states can be specified separately. Usually, a DUCG corresponds to a BN. But as shown in [34], DUCG allows steady/static DCGs, which are not allowed in BN. Therefore, DUCG is more flexible than BN. However, so far the parameters in a DUCG have been assumed to be known or given by domain experts. How to use local data to obtain the JPD over all variables is still unsolved.

The purpose of this paper is to model the local data in terms of S-DUCG and calculate the JPD over all variables in steady DCG cases, given that the S-DUCG is the causal structure based on domain expert knowledge. The realistic/physical and statistical meanings of steady DCGs are also illustrated. Note that the S-DUCG with steady DCGs corresponds to a BN with DCGs, which is not allowed in the definition of BN. This means that the S-DUCG with steady DCGs defines a new type of BN allowing DCGs and provides an approach to solve this newly defined BN.

Section II introduces the S-DUCG model briefly. Section III provides an exact example with a steady DCG, which illustrates the reality and statistical meaning of steady DCGs. Section IV presents how to learn the parameters in an S-DUCG given by domain experts, and then calculate the JPD over all variables. Section V briefly summarizes this paper and outlines future work.

II. BRIEF INTRODUCTION TO S-DUCG

The simplified S-DUCG model is introduced as follows (see [32] for the whole S-DUCG model). Denote Xn as the child event of module n and Vi as parent event i of Xn, where V∈{B, X}. In general, B denotes a basic/root event that does not have any input/cause; X denotes an event that has at least one input and may or may not have an output/consequence. It is assumed that there must be at least one B-type event as the ancestor of any X-type event. The simplified S-DUCG model is illustrated in figure 2 and (2) in terms of module n.

The corresponding equation in M-DUCG is shown in (2’); its explanation is omitted because it is outside the scope of this paper (see [32]-[38]).

Figure 2(b) illustrates figure 2(a): event Xn is divided into a set of events Xn;i caused by Vi respectively; the uncertain causality between Xn;i and Vi is modeled as a virtual random event Pn;i, i.e. Xn;i = Pn;iVi; and the Xn;i are in an OR relationship, which is actually the Noisy-OR case in [1].
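As a minimal sketch of the OR relationship just described, the probability that Xn is true can be computed in the usual Noisy-OR way. The sketch assumes that the events Pn;i and the parents Vi are mutually independent, which holds for a single module with root parents but not in general inside a cyclic graph; the function and argument names are illustrative only.

```python
def noisy_or(link_probs, parent_probs):
    """Pr{Xn true} when Xn is the OR of the events Pn;i Vi.

    link_probs[i]   -- Pr{Pn;i}, the strength of the causal link from Vi
    parent_probs[i] -- Pr{Vi}, the probability that parent Vi is true
    Assumes all Pn;i and Vi are mutually independent.
    """
    prob_all_fail = 1.0
    for p, v in zip(link_probs, parent_probs):
        # Branch i fails to cause Xn with probability 1 - Pr{Pn;i} Pr{Vi}.
        prob_all_fail *= 1.0 - p * v
    return 1.0 - prob_all_fail


# Two parents that are both known to be true, with link strengths 0.8 and 0.5.
print(noisy_or([0.8, 0.5], [1.0, 1.0]))
```

With both parents true, Xn fails to occur only if both links fail, so the result is 1 - 0.2×0.5.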

III. REALITY AND STATISTICAL MEANING OF S-DUCG WITH STEADY DCGS

Suppose the local statistical data of the three modules in figure 3 or figure 4 are as shown in table 1, in which the first column indicates the state combination, the second column indicates the number of samples, and the third column indicates the corresponding local JPD. The three groups of data are collected by three domain experts concerning three subjects respectively: X1 and its causes/parents B5 and X2, X2 and its causes/parents B4 and X3, and X3 and its causes/parents X1 and X2. According to (2), we have

We can further expand (3) as (4)-(6) by substituting the equations in (3) into each other, while applying Assumption 4 [34] to discard logic cycles. (Footnote 1: Assumption 4: Any state of a variable cannot be the cause of any state of the same variable at the same time, where a state of a variable is equivalent to an event. For simplicity, the assumptions of DUCG are indexed across a series of papers.)

In (4)-(6), “+” means XOR, where the disjoint cut-set techniques, expressed as two equations in [32], are applied. Meanwhile, according to the definition of S-DUCG, we have (7)-(9). (Footnote 2: Where … is the second subscript of variable Vi, and C, usually called a cut-set, is an event product.)
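Equations (3)-(6) are not reproduced in this text. The expansion they describe can be sketched as a recursive substitution over the module structure stated above, in which a parent that already lies on the current causal path is dropped, mimicking Assumption 4. The dictionary `PARENTS`, the function `expand` and the string names of the events are assumptions made for illustration; the result is a plain OR of cut-sets, before the disjoint cut-set step that turns “+” into XOR.

```python
# Module structure stated in the text: child -> parents.
PARENTS = {"X1": ["X2", "B5"], "X2": ["X3", "B4"], "X3": ["X1", "X2"]}


def expand(node, path=()):
    """Return the cut-sets (sets of event names) whose OR makes `node` true."""
    if node.startswith("B"):
        return [frozenset([node])]      # root event: nothing to expand
    path = path + (node,)
    cut_sets = []
    for parent in PARENTS[node]:
        if parent in path:
            continue                    # logic cycle: discard this branch
        link = f"P{node[1:]};{parent[1:]}"   # e.g. "P1;2" for X2 -> X1
        for cut_set in expand(parent, path):
            cut_sets.append(cut_set | {link})
    return cut_sets


for child in ("X1", "X2", "X3"):
    terms = ["·".join(sorted(cs)) for cs in expand(child)]
    print(child, "=", " + ".join(terms))
```

Under these assumptions X1 reduces to two cut-sets, one through B4 and one through B5, and every cut-set ends in a B-type event, as the model requires.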

Table I. Local samples and JPDs of the three modules.

On the left and right sides of (4)-(9), by replacing the upper-case letters with lower-case letters, we get the corresponding probability expressions, in which each lower-case letter denotes the probability of the corresponding upper-case event. It is seen that only eight parameters p1;2, p1;5, p2;3, p2;4, p3;1, p3;2, b4 and b5 need to be obtained from the local statistical data in table 1.

To find the eight parameters, we can set the local JPD expressions equal to the corresponding local JPDs in table 1. For example, by applying (4) and (5), we have

Similarly, we have an additional 23 equations, shown as (11)-(33) in the supplementary material. It is verified that when p1;2=0.8, p1;5=0.5, p2;3=0.5, p2;4=0.9, p3;1=0.5, p3;2=0.6, b4=0.5 and b5=0.5, (10)-(33) are exactly satisfied. We will discuss how to obtain the eight parameters from (10)-(33) in the next section.

Based on the obtained eight parameters and (4)-(9), or equivalently, based on either figure 3 or figure 4 with the obtained eight parameters, we can calculate the JPD over all five variables exactly, as shown in table 2.

For example, from (4)-(6), we have

The other similar equations (35)-(65) are given in the supplementary material of this paper.

It is easy to see that the local JPDs shown in table 1 are just the marginal distributions of table 2. For example, from table 2, we can easily calculate the marginal distribution, which equals the local JPD in table 1(a) exactly.

This example exhibits the realistic and statistical meanings of steady DCGs. Note that the local data or JPDs observed in table 1 are actually the results affected by the other two modules through the steady DCGs. Therefore, the expression corresponding to a local JPD must involve the whole S-DUCG structure and its parameters, as illustrated in (10).

The corresponding BN with steady DCGs newly defined by the S-DUCG in figure 3 is shown in figure 3’, in which the CPTs can be obtained either from table 1 directly or calculated from the S-DUCG with parameters learned from table 1. The results are shown in table 3. Meanwhile, the probability distributions of B4 and B5 remain the same, i.e. {0.5, 0.5} and {0.5, 0.5} respectively.
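Tables 1 and 2 are not reproduced in this text, but the relation between them can be sketched numerically. The code below enumerates the eight basic events, evaluates X1, X2 and X3 from cut-set expressions reconstructed from the module structure under Assumption 4 (an assumption of this sketch, not a copy of (4)-(6)), and accumulates a JPD over (B4, B5, X1, X2, X3). Marginalizing it over the variables of the first module gives about 0.237 for the all-true state, which agrees with the value y1=0.237 quoted in Section IV.

```python
from itertools import product

# Parameter values quoted in Section III.
PARAMS = {"p12": 0.8, "p15": 0.5, "p23": 0.5, "p24": 0.9,
          "p31": 0.5, "p32": 0.6, "b4": 0.5, "b5": 0.5}


def joint_distribution(params):
    """Return {(b4, b5, x1, x2, x3): probability} by brute-force enumeration."""
    names = list(params)
    jpd = {}
    for bits in product((0, 1), repeat=len(names)):
        e = dict(zip(names, bits))
        weight = 1.0
        for name in names:
            weight *= params[name] if e[name] else 1.0 - params[name]
        via_b4 = e["p24"] & e["b4"]      # B4 reaches X2 directly
        via_b5 = e["p15"] & e["b5"]      # B5 reaches X1 directly
        # Reconstructed expansions (assumed form, cycles discarded).
        x1 = (e["p12"] & via_b4) | via_b5
        x2 = (e["p23"] & e["p31"] & via_b5) | via_b4
        x3 = (e["p31"] & e["p12"] & via_b4) | (e["p31"] & via_b5) | (e["p32"] & via_b4)
        key = (e["b4"], e["b5"], x1, x2, x3)
        jpd[key] = jpd.get(key, 0.0) + weight
    return jpd


def marginal(jpd, keep):
    """Sum the JPD over every variable whose index is not in `keep`."""
    out = {}
    for state, prob in jpd.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


jpd = joint_distribution(PARAMS)
module_1 = marginal(jpd, keep=(2, 3, 1))   # order (X1, X2, B5)
print("total probability:", round(sum(jpd.values()), 12))
print("Pr{X1, X2, B5 all true}:", round(module_1[(1, 1, 1)], 6))
```

Brute-force enumeration is only practical for small examples like this one; it is used here because it makes the dependence of each local JPD on the whole cyclic structure explicit.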

For example, according to table 1(a), we have

According to the S-DUCG shown in figure 3, with (5), (10) and the eight learned parameters, we have

They are the same. However, the BN shown in figure 3’ was not defined before, because it includes DCGs. In other words, (1) is not satisfied.
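The two expressions compared above are not reproduced in this text. Both routes rest on the usual definition of a conditional probability. As a sketch, assuming table 1(a) is the module of X1 with parents B5 and X2, a CPT entry is obtained from the local JPD as:

```latex
\Pr\{x_1 \mid x_2, b_5\} = \frac{\Pr\{x_1, x_2, b_5\}}{\sum_{x_1'} \Pr\{x_1', x_2, b_5\}}
```

The numerator and denominator can be read from table 1(a), or computed from the S-DUCG expressions with the learned parameters.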

IV. PARAMETER LEARNING FROM LOCAL DATA

Given an S-DUCG specified by domain experts, the remaining task is to learn the parameters of the S-DUCG from the local data collected in the modules composing the S-DUCG. There are many methods, such as those in [39]-[41]. As an example, we may use the least-squares method. For our example in Section III, the goal function g(p, b) to be minimized in the learning process is defined as in (66).

Table II. JPD over all the five variables in fig. 3 or fig. 4.

Table III. CPTs of the newly defined BN with steady DCGs in fig. 3.

Fig. 3’. The newly defined BN with steady DCGs corresponding to Fig. 3.

Here yi is the local JPD in table 1 indexed by i∈{1,…,24}, corresponding to the 24 equations (10)-(33) respectively, e.g. y1=0.237 in (10), while the left side of (10) is f1(p, b) calculated from the eight parameters denoted as (p, b), where p≡{p1;2, p1;5, p2;3, p2;4, p3;1, p3;2} and b≡{b4, b5}. Our purpose is to find a set of (p, b) that minimizes g(p, b). When min{g(p, b)}=0, we find the exact (p, b) satisfying both the 24 local JPDs and the 24 equations. When min{g(p, b)}≠0, the (p, b) is an approximation. The error can come from either the inexactness of the local statistical data or the S-DUCG given by domain experts. In this paper, we assume that the S-DUCG shown in figure 3 is exact.
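Equation (66) is not reproduced in this text. Assuming the usual sum-of-squared-residuals form implied by the least-squares method and by the definitions of yi and fi above, the goal function would read:

```latex
g(\mathbf{p}, \mathbf{b}) = \sum_{i=1}^{24} \bigl( f_i(\mathbf{p}, \mathbf{b}) - y_i \bigr)^2 \tag{66}
```

Any weighting or normalization used in the original (66) is not recoverable here, so this form should be read as an assumption.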

Table IV. Local samples and JPDs of three modules respectively with 10% disturbance to table 1.

Fig. 5. Comparison between table 2 and table 5.

We choose MATLAB as the tool for learning (p, b), LM (Levenberg-Marquardt) [42] as the iterative optimization algorithm for finding min{g(p, b)}, and 10^-5 as the convergence error of g(p, b). To start the learning process, we take p={0.5, 0.5, 0.5, 0.5, 0.5, 0.5} and b={0.5, 0.5} as the initial (p, b).

Two experiments are performed: 1) with the local data shown in table 1, and 2) with the local data shown in table 4, which is the result of randomly disturbing the data in table 1 at a ratio of 10%.

In experiment 1, we get p={0.8, 0.5, 0.5, 0.9, 0.5, 0.6} and b={0.5, 0.5}, which are the exact parameters mentioned in Section III. In experiment 2, we get p={0.7871, 0.4891, 0.4899, 0.8728, 0.5008, 0.6004} and b={0.4944, 0.4584} with min{g(p, b)}=0.0016.
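Experiment 1 can be imitated with a small pure-Python sketch. It is not the MATLAB routine used in the paper: the loop below is a plain damped Gauss-Newton (Levenberg-Marquardt-style) iteration with a forward-difference Jacobian, and since table 1 is not reproduced in this text, the 24 target values are generated from the quoted parameters using cut-set expressions reconstructed under Assumption 4. All function names are illustrative.

```python
from itertools import product

TRUE = [0.8, 0.5, 0.5, 0.9, 0.5, 0.6, 0.5, 0.5]  # p12 p15 p23 p24 p31 p32 b4 b5


def local_jpds(theta):
    """Return the 24 local JPD values f_i(p, b), eight per module."""
    f = [0.0] * 24
    for e in product((0, 1), repeat=8):
        w = 1.0
        for bit, t in zip(e, theta):
            w *= t if bit else 1.0 - t
        p12, p15, p23, p24, p31, p32, b4, b5 = e
        a, c = p24 & b4, p15 & b5
        x1 = (p12 & a) | c                          # assumed expansions
        x2 = (p23 & p31 & c) | a
        x3 = (p31 & p12 & a) | (p31 & c) | (p32 & a)
        modules = ((x1, b5, x2), (x2, b4, x3), (x3, x1, x2))
        for m, (u, v, z) in enumerate(modules):
            f[8 * m + 4 * u + 2 * v + z] += w
    return f


def goal(theta, y):
    """Sum of squared residuals between model and target local JPDs."""
    return sum((fi - yi) ** 2 for fi, yi in zip(local_jpds(theta), y))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            k = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= k * m[col][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit(y, theta, lam=1e-3, tol=1e-5, max_iter=200, h=1e-6):
    """Damped Gauss-Newton loop; stops once goal(theta) drops below tol."""
    g = goal(theta, y)
    for _ in range(max_iter):
        if g < tol:
            break
        f0 = local_jpds(theta)
        res = [fi - yi for fi, yi in zip(f0, y)]
        jac = []                                    # jac[k][i] = d f_i / d theta_k
        for k in range(8):
            shifted = theta[:]
            shifted[k] += h
            jac.append([(fs - fi) / h for fs, fi in zip(local_jpds(shifted), f0)])
        jtj = [[sum(u * v for u, v in zip(jac[r], jac[c])) for c in range(8)]
               for r in range(8)]
        rhs = [-sum(u * v for u, v in zip(jac[r], res)) for r in range(8)]
        for r in range(8):
            jtj[r][r] += lam                        # damping term
        step = solve(jtj, rhs)
        trial = [min(1.0, max(0.0, t + s)) for t, s in zip(theta, step)]
        g_trial = goal(trial, y)
        if g_trial < g:                             # accept and relax damping
            theta, g, lam = trial, g_trial, lam / 10
        else:                                       # reject and damp harder
            lam *= 10
    return theta, g


targets = local_jpds(TRUE)
start = [0.5] * 8
g_start = goal(start, targets)
estimate, g_final = fit(targets, start)
print("goal at start:", g_start)
print("goal at stop: ", g_final)
print("estimate:", [round(t, 3) for t in estimate])
```

Because the loop stops as soon as the goal function falls below the 10^-5 tolerance, the estimate is only as close to the generating parameters as that tolerance implies; a tighter `tol` would trade run time for accuracy.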

According to (34)-(65) and the parameters in experiment 2, we calculate the JPD over all the five variables as shown in table 5. The comparison between table 2 and table 5 is as shown in figure 5.

V. CONCLUSION

This paper demonstrates the reality and statistical meaning of steady DCGs, in the case that the causal structure is an S-DUCG model. By combining statistical data and domain causal knowledge, this paper also presents, through an example, a method to calculate the JPD over all variables with only local statistical data in modules composing steady DCGs, which greatly simplifies data collection. The BN with steady DCGs is defined in this paper as the corresponding S-DUCG. This methodology may help to solve feedback loop problems in such areas as “biological systems involving multiple cell populations” [25], and is therefore of significant importance. Note that the S-DUCG model introduced in this paper is simplified. The original is more complex, additionally including logic gates and conditional directed arcs [32]. We believe that the original S-DUCG is also applicable to similar problems.

Unfortunately, we do not have the domain knowledge to provide a more realistic example of applying this methodology. We expect that knowledgeable readers may find it useful for solving problems in their own domains.

It should be mentioned that the M-DUCG without DCGs (i.e. a DAG) is also applicable to modeling the JPD over all variables with only local statistical data, although we do not discuss this case due to the limited length of the paper. It should also be mentioned that we have not succeeded in validating the M-DUCG with DCGs, which may be future work.

ACKNOWLEDGEMENT

This work was supported by the National Natural Science Foundation of China under Grant 71671103.

Supporting Information

The supplementary materials are available online at ieeexplore.ieee.org. The supplementary materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
