Abstract: Current methods of ascertaining attribute weights based on rough sets have several shortcomings. In this paper, attribute significance as represented by rough sets is studied in depth. Addressing the existing problems, the information presentation of rough sets is shown to be more comprehensive than its algebra presentation, and a new method of ascertaining attribute weights based on rough set conditional entropy is put forward. Finally, an example shows that the new method is more reasonable than the existing one.
Key words: Weight; Rough Sets; Attribute Significance Degree; Decision Table
doi:10.3969/j.issn.1673-0194.2009.15.034
CLC number: TP224.0    Document code: A    Article ID: 1673-0194(2009)15-0112-03
1 INTRODUCTION
Weight, which reflects the position and function of each element in the process of judging and decision-making, is crucial: its accuracy directly affects the final results. Common methods for ascertaining attribute weights include expert scoring, fuzzy statistics and pairwise comparison. In these methods, however, the weights depend excessively on the experts' experience and knowledge, so they sometimes fail to reflect the actual situation objectively. Rough set theory, by contrast, fully reflects the objectivity of the data, requiring no prior information beyond the data set to be processed. Therefore, several researchers have studied methods of ascertaining attribute weights based on rough set theory.
Rough set theory [1] is a method for expressing, studying and generalizing incomplete and uncertain knowledge and data. It was first put forward by Professor Pawlak of the Warsaw University of Technology, Poland, in the early 1980s. Because it needs no prior information, rough set theory has been successfully applied in many areas such as expert systems, machine learning and pattern recognition [2-6]. A method of ascertaining the weights of conditional attributes in a decision table was introduced in the literature [7] and has been cited by many scholars in different areas. This paper analyzes the shortcomings of that method, gives a new method of ascertaining attribute weights based on rough sets, and shows that the new method is more reasonable.
2 ANALYSIS ON THE ORIGINAL METHOD OF ASCERTAINING ATTRIBUTE WEIGHT BASED ON ROUGH SETS THEORY
The importance of the various attributes (indicators) needs to be ascertained because they are not equally important. In rough set theory, we remove an attribute and then consider how the classification changes without it. If the classification changes greatly after the attribute is removed, the attribute has high significance; otherwise its significance is low [4-7]. Based on this idea, the literature [7] defines the significance of attributes (indicators) as follows.
Definition 1 [1]: In the decision table S = (U, C, D, V, f), the dependence g_B(D) of the decision attribute set D on a conditional attribute (indicator) set B ⊆ C is defined as:
g_B(D) = |POS_B(D)| / |U|.
Definition 2 [7]: (Algebra presentation of attribute significance) In the decision table S = (U, C, D, V, f), the significance degree of a conditional attribute (indicator) c ∈ C is defined as:
Sig(c) = g_C(D) − g_{C−{c}}(D).
The weight W0(c) of the conditional attribute (indicator) c is defined as:
W0(c) = Sig(c) / ∑_{a∈C} Sig(a).
Note: This definition shows that the greater Sig(c) is, the more important the conditional attribute (indicator) c is, and hence the greater its weight.
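Definitions 1 and 2 translate directly into code. The following Python sketch is illustrative only: the four-object decision table, the object ids and the function names are hypothetical, and exact rational arithmetic is used so the dependence degrees match hand calculation.

```python
from fractions import Fraction

def partition(universe, table, attrs):
    """Equivalence classes U/attrs: group objects by their values on attrs."""
    blocks = {}
    for x in universe:
        blocks.setdefault(tuple(table[x][a] for a in attrs), set()).add(x)
    return list(blocks.values())

def dependence(universe, table, cond, dec):
    """g_B(D) = |POS_B(D)| / |U|, where POS_B(D) is the union of
    B-blocks wholly contained in a single decision class."""
    pos = set()
    for block in partition(universe, table, cond):
        if len({table[x][dec] for x in block}) == 1:
            pos |= block
    return Fraction(len(pos), len(universe))

def algebra_weights(universe, table, cond, dec):
    """Sig(c) = g_C(D) - g_{C-{c}}(D); W0(c) = Sig(c) / sum of all Sig."""
    g_full = dependence(universe, table, cond, dec)
    sig = {c: g_full - dependence(universe, table,
                                  [a for a in cond if a != c], dec)
           for c in cond}
    total = sum(sig.values())
    return {c: s / total for c, s in sig.items()} if total else None

# Hypothetical decision table: conditions a, b and decision d.
U = [1, 2, 3, 4]
T = {1: {'a': 0, 'b': 0, 'd': 0},
     2: {'a': 0, 'b': 1, 'd': 1},
     3: {'a': 1, 'b': 0, 'd': 1},
     4: {'a': 1, 'b': 1, 'd': 1}}
print(algebra_weights(U, T, ['a', 'b'], 'd'))  # both weights come out to 1/2
```

Here g_{a,b}(D) = 1 (every block is pure), while dropping either attribute leaves the dependence at 1/2, so Sig(a) = Sig(b) = 1/2 and both weights are 1/2.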
To illustrate the shortcoming of this definition, consider the following example.
Consider a decision table S = (U, C, D, V, f) (Table 1) with U = {x1, x2, …, x11}, conditional attribute set C = {a, b, c} and decision attribute D, whose partitions are:
U/{a,b,c}={{x1,x2},{x3},{x4},{x5},{x6,x7},{x8,x9}, {x10,x11}}.
U/D={{x1,x5,x6,x8,x11},{x2,x3,x4,x7,x9,x10}}.
Therefore: POS_{a,b,c}(D) = POS_{a,b}(D) = POS_{b,c}(D) = POS_{a,c}(D) = {x3, x4, x5}. As a result:
g_{a,b,c}(D) = |POS_{a,b,c}(D)| / |U| = 3/11,
and similarly g_{a,b}(D) = g_{a,c}(D) = g_{b,c}(D) = 3/11.
Then: Sig(a) = g_{a,b,c}(D) − g_{b,c}(D) = 0; Sig(b) = g_{a,b,c}(D) − g_{a,c}(D) = 0; Sig(c) = g_{a,b,c}(D) − g_{a,b}(D) = 0.
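The positive region and dependence degree above can be checked mechanically from the quoted partitions. A small Python sketch (the equivalence classes are taken exactly as stated, since the underlying attribute values of Table 1 are not reproduced here):

```python
from fractions import Fraction

# Partitions quoted above: U/{a,b,c} and U/D.
U_cond = [{1, 2}, {3}, {4}, {5}, {6, 7}, {8, 9}, {10, 11}]
U_dec = [{1, 5, 6, 8, 11}, {2, 3, 4, 7, 9, 10}]

# POS_{a,b,c}(D): union of condition blocks lying inside one decision class.
pos = set().union(*(b for b in U_cond if any(b <= d for d in U_dec)))
g = Fraction(len(pos), sum(len(b) for b in U_cond))
print(sorted(pos), g)  # [3, 4, 5] 3/11
```

Only the singleton blocks {x3}, {x4}, {x5} fall inside a single decision class, confirming g_{a,b,c}(D) = 3/11.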
From the example above, we can see that the weights of the conditional attributes cannot be calculated in this case. Each of the conditional attributes a, b, c is individually dispensable in Decision Table 1. For this problem, the literature [8] first reduces the decision table and then computes attribute significance on the reduct, so that at least one attribute has nonzero significance, and finally applies the weight formula of the literature [7]. Although this guarantees that at least one weight is nonzero, it ignores the practical significance of the attributes with zero weight and of those removed by the reduction. To solve this problem, we offer a new method of ascertaining weights.
3 A METHOD TO ASCERTAIN WEIGHT BASED ON CONDITIONAL INFORMATION ENTROPY
3.1 Conditional information entropy
The following definitions establish the relationship between the knowledge of rough set theory and information entropy, so that the main concepts and operations of rough set theory can be expressed from the information point of view; this is usually called the information presentation of rough set theory.
Definition 3: In the decision table S = (U, C, D, V, f), any attribute set S ⊆ C∪D determines a random variable on the algebra of subsets of U. Let U/S = {S1, S2, …, St}; the probability distribution of S is:
[S : p] = (S1, S2, …, St; p(S1), p(S2), …, p(St)),
where p(Sj) = |Sj| / |U|, j = 1, 2, …, t.
Definition 4: In the decision table S = (U, C, D, V, f), the conditional entropy H(D|C) of the decision attribute set D (U/D = {D1, D2, …, Dk}) with respect to the conditional attribute set C (U/C = {C1, C2, …, Cm}) is defined as:
H(D|C) = −∑_{i=1}^{m} p(Ci) ∑_{j=1}^{k} p(Dj|Ci) log p(Dj|Ci),
where p(Dj|Ci) = |Dj ∩ Ci| / |Ci| (i = 1, 2, …, m; j = 1, 2, …, k).
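Definition 4 can be computed directly from the two partitions. A minimal sketch (the logarithm base is not fixed in the paper; base 2 is assumed here, and the block lists are the ones quoted in Section 2):

```python
import math

def conditional_entropy(cond_blocks, dec_blocks, n):
    """H(D|C) = -sum_i p(Ci) sum_j p(Dj|Ci) log2 p(Dj|Ci),
    with p(Ci) = |Ci|/|U| and p(Dj|Ci) = |Dj & Ci|/|Ci|."""
    h = 0.0
    for ci in cond_blocks:
        p_ci = len(ci) / n
        for dj in dec_blocks:
            p = len(ci & dj) / len(ci)
            if p > 0:  # the term 0 * log 0 is taken as 0
                h -= p_ci * p * math.log2(p)
    return h

# Partitions from the example in Section 2: the four mixed blocks of size 2
# each contribute (2/11) * 1 bit, so H(D|{a,b,c}) = 8/11.
U_cond = [{1, 2}, {3}, {4}, {5}, {6, 7}, {8, 9}, {10, 11}]
U_dec = [{1, 5, 6, 8, 11}, {2, 3, 4, 7, 9, 10}]
print(conditional_entropy(U_cond, U_dec, 11))  # ≈0.7273 (= 8/11)
```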
3.2 Attribute significance based on conditional information entropy
Definition 5: (Information presentation of attribute significance) In the decision table S = (U, C, D, V, f), the significance degree of a conditional attribute (indicator) c is defined as:
Sig(c) = H(D|(C−{c})) − H(D|C).
The greater the value of Sig(c), the more important c is to the decision D given the remaining conditions in C. Compared with the algebra presentation of attribute significance, which considers only the attribute's effect on the certainly classified subsets of the universe (the positive region), the information presentation also takes the uncertainly classified subsets into account. This means that although the significance of an attribute may be 0 under the algebra definition, it is not necessarily 0 under the information definition; but if it is 0 under the information definition, it is definitely 0 under the algebra definition [9].
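This asymmetry can be illustrated with a small synthetic decision table (the table, attribute names and helper functions below are illustrative, not from the paper): attribute c does not enlarge the positive region, so its algebra significance is 0, yet it reduces the conditional entropy, so its information significance is positive. A sketch assuming a base-2 logarithm:

```python
import math

def blocks(universe, table, attrs):
    """Equivalence classes U/attrs."""
    out = {}
    for x in universe:
        out.setdefault(tuple(table[x][a] for a in attrs), set()).add(x)
    return list(out.values())

def g(universe, table, cond, dec):
    """Algebra dependence |POS_B(D)| / |U|."""
    pos = [b for b in blocks(universe, table, cond)
           if len({table[x][dec] for x in b}) == 1]
    return sum(len(b) for b in pos) / len(universe)

def H(universe, table, cond, dec):
    """Conditional entropy H(D|B) in bits."""
    h = 0.0
    for b in blocks(universe, table, cond):
        for v in {table[x][dec] for x in b}:
            p = sum(1 for x in b if table[x][dec] == v) / len(b)
            h -= (len(b) / len(universe)) * p * math.log2(p)
    return h

# Synthetic table: a is constant, c splits U into two skewed halves.
U = [1, 2, 3, 4, 5, 6]
T = {1: {'a': 0, 'c': 0, 'd': 0}, 2: {'a': 0, 'c': 0, 'd': 0},
     3: {'a': 0, 'c': 0, 'd': 1}, 4: {'a': 0, 'c': 1, 'd': 1},
     5: {'a': 0, 'c': 1, 'd': 1}, 6: {'a': 0, 'c': 1, 'd': 0}}

sig_algebra = g(U, T, ['a', 'c'], 'd') - g(U, T, ['a'], 'd')
sig_info = H(U, T, ['a'], 'd') - H(U, T, ['a', 'c'], 'd')
print(sig_algebra, sig_info)  # 0.0 and ≈0.0817
```

Both positive regions are empty (every block mixes decision values), giving algebra significance 0; but splitting on c turns one 50/50 block into two 1/3-vs-2/3 blocks, lowering H(D|·) from 1 bit to about 0.918 bits.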
3.3 The method of ascertaining weights based on conditional information entropy
Definition 6: In the decision table S = (U, C, D, V, f), based on Definition 5, the significance degree NewSig(c) of a conditional attribute (indicator) c and its weight w(c) are defined as:
NewSig(c) = H(D|(C−{c})) − H(D|C),
w(c) = NewSig(c) / ∑_{a∈C} NewSig(a).
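Definition 6 yields a complete weighting procedure: compute H(D|C), recompute it with each attribute removed, and normalize the differences. A sketch under stated assumptions (the five-object table and all names below are hypothetical; base-2 logarithm assumed):

```python
import math

def blocks(universe, table, attrs):
    """Equivalence classes U/attrs."""
    out = {}
    for x in universe:
        out.setdefault(tuple(table[x][a] for a in attrs), set()).add(x)
    return list(out.values())

def H(universe, table, cond, dec):
    """Conditional entropy H(D|B) in bits."""
    h = 0.0
    for b in blocks(universe, table, cond):
        for v in {table[x][dec] for x in b}:
            p = sum(1 for x in b if table[x][dec] == v) / len(b)
            h -= (len(b) / len(universe)) * p * math.log2(p)
    return h

def entropy_weights(universe, table, cond, dec):
    """NewSig(c) = H(D|C-{c}) - H(D|C); w(c) = NewSig(c) / sum of NewSig."""
    h_full = H(universe, table, cond, dec)
    sig = {c: H(universe, table, [a for a in cond if a != c], dec) - h_full
           for c in cond}
    total = sum(sig.values())
    return {c: s / total for c, s in sig.items()}

# Hypothetical table in which removing b loses more information than removing a.
U = [1, 2, 3, 4, 5]
T = {1: {'a': 0, 'b': 0, 'd': 0}, 2: {'a': 0, 'b': 1, 'd': 1},
     3: {'a': 1, 'b': 0, 'd': 0}, 4: {'a': 1, 'b': 1, 'd': 0},
     5: {'a': 0, 'b': 0, 'd': 0}}
w = entropy_weights(U, T, ['a', 'b'], 'd')
print(w)  # b receives the larger weight
```

Here H(D|C) = 0, while dropping b raises the entropy to about 0.551 bits and dropping a raises it to 0.4 bits, so w(b) > w(a) and the weights sum to 1.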
4 AN EXAMPLE
Now the new method of ascertaining weights is used to calculate the weights of the conditional attributes in Decision Table 1.
5 CONCLUSION
We analyze the existing method of ascertaining weights based on rough set theory and, addressing its shortcoming, work out a new method based on conditional information entropy, exploiting the fact that the information presentation of rough sets is more comprehensive than the algebra presentation, and verify it with an example. The new method of ascertaining weights is more comprehensive, rational and universal than the existing one.
References
[1] Pawlak Z. Rough Sets: Probability Versus Deterministic Approach[J]. International Journal of Man-Machine Studies, 1988, 29(1): 81-95.
[2] Hu X H, Cercone N. Learning in Relational Databases: A Rough Set Approach[J]. Computational Intelligence, 1995, 11(2): 323-338.
[3] Swiniarski S, Hargis L. Rough Set as a Front End of Neural-Networks Texture Classifiers[J]. Neurocomputing, 2001, 36(1): 85-102.
[4] Zhang W X, Wu W Z, Liang J Y. The Theory and Method of Rough Sets[M]. Beijing: Science Press, 2000.
[5] Zhang W X, Chou G F. Uncertain Decision Based on Rough Sets[M]. Beijing: Tsinghua University Press, 2005.
[6] Miao D Q, Fan S D. The Calculation of Knowledge Granulation and Its Application[J]. Systems Engineering: Theory Practice, 2002(1):48-56.
[7] Cao X Y, Liang J G. The Method of Ascertaining Weight Based on Rough Sets Theory[J]. Chinese Journal of Management Science, 2002,10(5): 98-100.
[8] Zhou A F, Chen Z Y. How to Choose the SC Partner Based on Rough Set[J]. Logistics Technology, 2007, 26(8): 178-181.
[9] Wang G Y, Yu H, Yang D C. Decision Table Reduction based on Conditional Information Entropy[J]. Chinese Journal of Computers, 2002, 25(7):759-766.