Cheng DAI, Ling ZHUO, Hui-qian GONGLI, Yong LIAO, Xue-wu DAI
(1. Chongqing Power Information & Communication Branch Company, Chongqing 401120, China; 2. Center of Communication and TT&C, Chongqing University, Chongqing 400444, China; 3. Faculty of Engineering and Environment, Northumbria University, Newcastle upon Tyne NE1 8ST, United Kingdom)
Abstract: The Information and Communication Branch of State Grid Chongqing Electric Power Company (hereafter "the company") is responsible for customer service of the Chongqing electric power information and communication system. As requirements on service quality and customer satisfaction rise, customer service should shift from passively answering customers' business inquiries to actively providing business services. At present, however, the company does not manage customers hierarchically: all customers are served to a unified standard, which hinders the effective allocation of customer service resources. To give customers a better experience and achieve accurate service, this paper proposes an algorithm that combines spectral clustering with fuzzy clustering (fuzzy spectral clustering algorithm for short). It clusters the customers of power companies on the basis of feature vector extraction and mines latent information from large volumes of customer data. The case analysis shows that, measured by typical clustering evaluation indexes, the proposed algorithm clusters the customers of power companies more effectively than other typical clustering algorithms.
Key words: Power company, Customer group, Fuzzy spectral clustering, Accurate service
Power customer behavior analysis examines the relevance and similarity relationships in power consumption data in order to discover customers' latent behavior habits and to segment customers in detail, which is important for guiding customers' electricity consumption behavior and energy-saving transformation[1]. Early methods of customer division were coarse: customers were generally divided by a single feature attribute. For example, early power companies usually divided customers into diamond, platinum and gold customers according to the amount they consumed. Segmenting by amount is simple and easy to implement, but as customers, their needs, and products and services diversify, this simple division shows many shortcomings. For example, the needs of platinum customers are not all the same: some care more about the stability of the electricity supply, others about electricity consumption. Customers' requirements for products and services are also becoming more specialized and demanding. Traditional data analysis and processing methods suffer from problems such as insufficient computing power and low processing efficiency, and cannot fully meet the need for rapid analysis of power company customer data in a big data environment.
As a key branch of data mining[2], clustering analysis has been applied to power customer behavior analysis because it analyzes the available data globally to obtain its distribution characteristics. Reference [3] uses a clustering algorithm to classify users on the basis of the traditional industry division, but it analyzes only the load characteristics of power customers and does not consider their electricity usage habits. In [4], the fuzzy C-means clustering algorithm is proposed to classify substation loads into industrial, agricultural, municipal and other categories, and the method is reported to be significantly better than other clustering methods based on equivalence relations. In [5], the fuzzy clustering method is applied to power sales: load curve characteristics are used to classify power users, which provides a reference for power supply enterprises to set reasonable electricity prices and improve the effectiveness of load management. These traditional clustering methods do not address the challenges that intelligent power behavior analysis faces in the reliable storage, efficient management and rapid analysis of massive data in a big data environment.
In addition, many clustering algorithms have been applied to customer data analysis in other fields. In [6], a clustering method based on swarm intelligence is used to analyze customer behavior: clustering is performed with different group similarity coefficients, and a recursive algorithm collects the clustering results to obtain customer groups with different consumption characteristics. Reference [7] analyzes customer data with density-based clustering to extract the characteristics of customers with high-end consumption patterns, and then uses these characteristics to understand customer consumption patterns and provide matching products and services to improve satisfaction. In [8], the fuzzy C-means clustering algorithm is used for customer clustering, the characteristics of each customer group are analyzed quantitatively, and satisfactory clustering results are obtained. Reference [9] proposes an algorithm based on the minimum clustering unit, which combines the K-Means clustering algorithm with a grid clustering algorithm to reduce the large impact that improperly chosen initial points have on K-Means results, and applies it to business customer relationship management. However, the results of the above clustering methods still need improvement in cluster compactness and separation.
To further improve the experience of power company customers and to address the weak optimization ability and poor compactness of existing clustering algorithms, this paper proposes an algorithm that combines spectral clustering with fuzzy clustering, called the fuzzy spectral clustering algorithm. It has two steps: (1) the feature vector selection process of the NJW (Ng-Jordan-Weiss) algorithm[10] is optimized according to the structural characteristics of the data to which the algorithm is applied; (2) in the resulting feature vector space, a fuzzy clustering optimization algorithm replaces the final K-Means clustering step of the traditional NJW algorithm, which improves the global optimization ability of the traditional NJW algorithm and yields a reasonable clustering analysis of the data. The method is applied to power company data to obtain a clustering of the company's customers, and it is compared with other algorithms through internal, external and global evaluation indicators, which verifies the feasibility of the proposed algorithm.
After the clustering results are obtained, they must be compared and comprehensively evaluated using appropriate indicators. There are many kinds of clustering quality evaluation indicators; this paper uses three commonly used kinds. We selected the Compactness Index (CI) as an internal evaluation indicator, Fowlkes-Mallows (FM) and Adjusted-Rand (AR) as external evaluation indicators, and Degree of Separation (DS) as a global evaluation indicator.
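To make the two external indicators concrete, the sketch below computes them by pair counting. It assumes the textbook definitions of the Fowlkes-Mallows index and the adjusted Rand index, which may differ in notation from the formulas used in this paper; the function names `pair_counts`, `fowlkes_mallows` and `adjusted_rand` are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Classify every pair of samples by whether the two labelings agree."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            tp += 1  # together in both partitions
        elif same_pred:
            fp += 1  # together only in the clustering result
        elif same_truth:
            fn += 1  # together only in the reference partition
        else:
            tn += 1  # separated in both partitions
    return tp, fp, fn, tn


def fowlkes_mallows(truth, pred):
    """Geometric mean of pairwise precision and recall (assumed standard form)."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


def adjusted_rand(truth, pred):
    """Rand agreement corrected for chance (assumed standard form)."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    total = tp + fp + fn + tn
    expected = (tp + fp) * (tp + fn) / total
    max_index = ((tp + fp) + (tp + fn)) / 2
    if max_index == expected:
        return 1.0
    return (tp - expected) / (max_index - expected)
```

Both functions depend only on which samples are grouped together, so relabeling the clusters (for example swapping cluster 0 and cluster 1) leaves the scores unchanged. The pairwise loop is quadratic in the number of samples and is meant for illustration rather than for large data sets.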
A.Spectral clustering algorithm
The spectral clustering algorithm makes no assumption about the distribution of the data in the data set. It can identify clusters of any type and achieves good clustering results on high-dimensional data sets.
The core of the spectral clustering algorithm is the partition of the data set. The actual partition criteria differ, but the different spectral clustering algorithms all include the following three steps:
(1) Construct a matrix C that represents the relationship between the samples in the data set;
(2) Calculate the first m eigenvalues of C and the corresponding feature vectors, which are used to construct the feature vector space S of the data set;
(3) Cluster the data points in the feature vector space S with another clustering algorithm, and map the results back to the original data space.
In [11], an improved spectral clustering algorithm, the NJW algorithm, is proposed. The final number of clusters m is given before the algorithm is run. The NJW algorithm calculates the m smallest eigenvalues of the standard Laplacian matrix, forms the feature vector space S from the corresponding feature vectors, clusters the data points in this space with the K-Means algorithm, and maps the final results back to the original data.
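The NJW procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: it assumes a Gaussian similarity on squared Euclidean distances with a user-chosen `sigma`, and it takes the m smallest eigenvalues of I - A^{-1/2} W A^{-1/2}, which correspond to the m largest eigenvalues of A^{-1/2} W A^{-1/2}. The helper names `njw_embedding` and `kmeans`, and the farthest-point initialization, are illustrative choices.

```python
import numpy as np


def njw_embedding(Y, m, sigma=1.0):
    """Map n samples to rows of an (n, m) spectral feature space."""
    Y = np.asarray(Y, dtype=float)
    # Gaussian similarity between all pairs of samples (assumed kernel).
    sq_dist = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Row sums of W form the diagonal matrix A.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(Y)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L)
    S = eigvecs[:, :m]
    # NJW normalizes every row of the embedding to unit length.
    return S / np.linalg.norm(S, axis=1, keepdims=True)


def kmeans(S, m, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [S[0]]
    for _ in range(1, m):
        dist = np.min([((S - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(S[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            S[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(m)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Row j of the embedding corresponds to sample j, so the label assigned to row j is also the cluster of the original sample. The sketch does not guard against isolated samples whose similarity to every other sample underflows to zero.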
B.Obtaining feature space
In practical applications, the structural characteristics of different types of data sets generally differ, but the NJW algorithm always clusters in the vector space formed by the first m feature vectors, and the resulting clustering is often not ideal. In [12], it is proved that the NJW algorithm obtains better clustering results when the difference between the m-th and (m+1)-th eigenvalues of the standard Laplacian matrix is sufficiently large, and [11] proposes a hypothesis: a parameter M (M > m) should be set according to the actual situation, and the first M feature vectors are then used to form a new feature vector space. The simplified selection method for the parameter M is:
Given the final number of clusters m and the data set $Y=\{Y_1,Y_2,\ldots,Y_i,\ldots,Y_n\}$, where $Y_i=\{Y_{i1},Y_{i2},\ldots,Y_{ik}\}$, i.e. each sample in the data set has k attributes, the feature vector space S is generated as follows.
(1) Construct a similarity matrix W between the samples in the data set, with elements $W_{ij}=\exp[-L_2(Y_i,Y_j)/(2\sigma^2)]$, where $L_2(Y_i,Y_j)$ is the Euclidean distance between the i-th and j-th sample points, and σ is the standard deviation of all sample data in the data set.
(2) Construct the standard Laplacian matrix $L_{std}$. The diagonal elements of the identity matrix E are replaced by the sums of the elements of the corresponding rows of the similarity matrix W to form a new matrix A. The standard Laplacian matrix is given by equation (1):
$$L_{std}=A^{-0.5}\,W\,A^{-0.5} \qquad (1)$$
(3) Determine the range of the parameter M:
1) First, compute all eigenvalues of $L_{std}$ and arrange them in ascending order, that is, $\pi=\{\gamma_1,\gamma_2,\ldots,\gamma_n\}$ with $\gamma_1\le\gamma_2\le\ldots\le\gamma_n$;
2) Next, compute the differences between adjacent eigenvalues in π to obtain the sequence $O=\{o_1,o_2,\ldots,o_{n-1}\}$, where $o_i=\gamma_{i+1}-\gamma_i$ $(1\le i\le n-1)$;
3) Determine whether $o_m$ satisfies $o_{m-1}$
4) Let i increase from m until $i=l_1$, where the sequence O reaches its first maximum; continue to increase i until the second maximum is found, at which point $i=l_2$. The value range of the parameter M is then $[l_1+1,\,l_2+1]$, where m
(4) After the parameter M has been determined for the actual data set by the above steps, the eigenvectors corresponding to the first M eigenvalues are obtained, and they form the feature vector space S.
Let X be the matrix consisting of the first M eigenvectors obtained in the above steps. The traditional NJW algorithm finally clusters X with the K-Means algorithm: when the j-th row of X is assigned to class i, the sample $Y_j$ in the data set is assigned to cluster i. However, the choice of the initial centers of the K-Means algorithm has a great influence on the final clustering result, and its hill-climbing search for the optimum often fails to reach the global optimal solution, which leads to unsatisfactory results for the NJW algorithm. Therefore, we introduce a fuzzy clustering algorithm into the NJW algorithm to replace the K-Means clustering of the last step and complete the clustering in the feature vector space.
A.Data normalization
The feature matrix X obtained in the previous step contains M objects, each with k features, namely $x_i=(x_{i1},x_{i2},\ldots,x_{ik})$ $(i=1,2,\ldots,M)$, where $x_{ij}$ represents the j-th feature of the i-th object. All the feature indicators of the M objects form a matrix, denoted $X^*=(x_{ij})_{M\times k}$, and $X^*$ is the feature index matrix of X. Since the dimensions and magnitudes of the k feature indicators are not necessarily the same, features with large magnitudes may dominate the classification during computation, while the effect of features with small magnitudes may be reduced or even excluded. Data normalization brings every indicator value into a common numerical range.
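As one concrete instance of such a transformation, the sketch below rescales every column of a feature index matrix by its own mean and standard deviation. It is an illustration under assumptions rather than the exact procedure of this paper: the function name `standardize_columns` is hypothetical, the population standard deviation is assumed, and constant columns are simply mapped to zero.

```python
import numpy as np


def standardize_columns(X):
    """Center each column at 0 and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (assumption)
    std[std == 0] = 1.0  # a constant column becomes all zeros
    return (X - mean) / std
```

After this step a feature measured in thousands and a feature measured in fractions contribute on the same scale to any distance or similarity computed between objects.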
The specific normalization methods are:
(1) Standard normalization: calculate the mean $\bar{x}_j$ and the standard deviation $s_j$ of the j-th column of the feature index matrix $X^*$, and then transform,
$$x'_{ij}=\frac{x_{ij}-\bar{x}_j}{s_j} \qquad (2)$$
(2) Mean normalization: calculate the standard deviation $\sigma_j$ of the j-th column of the feature index matrix $X^*$, and then transform,
(3)
(4)
(4) Normalization of maximum value: for the j-th column of the feature index matrix $X^*$, the maximum value $M_j=\max\{x_{1j},x_{2j},\ldots,x_{Mj}\}$, $j=1,2,\ldots,k$, is calculated and then the data are transformed:
$$x'_{ij}=\frac{x_{ij}}{M_j} \qquad (5)$$
The normalization method adopted in this paper is standard normalization: the element values in the feature index matrix are normalized by the two parameters mean and variance.
B.Constructing fuzzy similar matrix
The similarity between the elements of the feature index matrix is evaluated, and similar elements are classified into one class. The degree of similarity between $x_i=(x_{i1},x_{i2},\ldots,x_{ik})$ and $x_j=(x_{j1},x_{j2},\ldots,x_{jk})$ in the normalized feature index matrix $X^*$ is represented by a number $r_{ij}\in[0,1]$, which gives the fuzzy similar matrix $R=(r_{ij})_{n\times n}$ between the objects. The similarity coefficient is determined by the closeness method. When the characteristic index vector $x_i=(x_{i1},x_{i2},\ldots,x_{ik})$ of the object $x_i$ is a fuzzy vector, that is,
$x_{if}\in[0,1]$ $(i=1,2,\ldots,M;\ f=1,2,\ldots,k)$
the similarity degree $r_{ij}$ of $x_i$ and $x_j$ can be regarded as the closeness of the fuzzy subsets $x_i$ and $x_j$, calculated as in formula (6):
(6)
The similarity $r_{ij}$ can be determined from the distance between $x_i$ and $x_j$, because the larger $d(x_i,x_j)$ is, the lower the similarity of $x_i$ and $x_j$. In general, we let
$r_{ij}=1-c\,(d(x_i,x_j))^{\alpha}$
where c and α are positive numbers such that $r_{ij}\in[0,1]$. The distance between the two subsets is calculated as shown in equation (7):
(7)
C.Fuzzy clustering
We use the fuzzy transitive closure method, which clusters on the transitive closure t(R) of the fuzzy similar matrix R.
The steps are as follows:
(1) Use the successive squaring (self-composition) method to find the transitive closure t(R) of the fuzzy similar matrix R;
(2) Select a confidence level γ∈[0,1] and find the γ-cut (truncation) matrix of t(R), which is an equivalent Boolean matrix on X.
This gives the fuzzy clustering result on the feature vector space, and the result is mapped back to the original data set. The flow chart of the above process is shown in Fig.1.
Fig.1 Fuzzy spectral clustering algorithm flow chart
In this paper, we selected the Compactness Index (CI) as an internal evaluation indicator, Fowlkes-Mallows (FM) and Adjusted-Rand (AR) as external evaluation indicators, and Degree of Separation (DS) as a global evaluation indicator.
CI: The CI indicator describes how concentrated the data within a cluster are. The more similar the data in a cluster, the higher the compactness and the better the clustering effect. The overall compactness is calculated as shown in formula (8), where n is the number of samples, each sample is a p-dimensional row vector, $X=\{x_1,x_2,\ldots,x_n\}$, $X\subset R^p$, and c is the number of clusters. $(u_{ij})$ is the $c\times n$ matrix of the degrees of membership of the samples in the fuzzy subsets; $u_{ij}$ is the support of datum j for the i-th class, and the larger its value, the larger the amount of information; V is the $c\times p$ matrix of clustering prototypes. The formula shows that the CI value consists of two parts. The first part is defined by the ratio of the minimum and maximum values of the membership degree and the membership degree complement. The smaller this part, the clearer the clusters and the better the compactness. The second part uses the average distance between pairs of data points within a sub-cluster as the criterion.
If the distances between pairs of data points in a sub-cluster are smaller, the compactness is better; n(i) is the number of data points in the i-th sub-cluster. This term increases the sensitivity of the indicator, so that the validity indicator can correctly detect small classes and then determine the optimal number of clusters. The overall compactness is obtained by summing the compactness of each sub-cluster, and the smaller its value, the more compact the clusters.
(8)
The external evaluation index is obtained by first applying the clustering algorithm to a standard test data set with known categories and then using the relevant indicators to calculate the accuracy of the algorithm on that data set. Typical external clustering evaluation indicators include the FM and AR indicators, which are classic indicators for testing the accuracy of clustering results.
(1) FM indicator
The FM indicator is calculated as shown in formula (9).
(9)
The value of the FM indicator lies between 0 and 1. The larger the value, the closer the clusters obtained are to the standard clusters. $I_{FM}=1$ if and only if the clustering result is completely consistent with the standard clusters.
(2) AR indicator
First, we set the relevant parameters as follows:
(10)
(11)
(12)
(13)
(14)
The value of the AR indicator also lies between 0 and 1; the closer it is to 1, the better the partition.
Degree of Separation (DS): The degree of separation indicates how well the clusters are separated from each other. The clearer the separation between the clusters, the higher the degree of separation. The separation between clusters is given by equations (15) and (16):
(15)
(16)
The separations of all sub-clusters are added to obtain the overall separation, which is calculated as shown in equation (17):
(17)
where $F_{ij}$ denotes the fuzzy deviation and α denotes the penalty factor.
The default value of the penalty factor is 0.5, and the fuzzy deviation is used to amplify the characteristics of the membership matrix. The product of the fuzzy deviations of two clusters indicates the degree of separation of the two fuzzy sets. The clearer the separation between the clusters, the higher the degree of separation and the smaller the value.
Data preprocessing is an indispensable step before clustering. It unifies and standardizes data that arrive in various forms or in non-standard form, so that the data can be used directly during clustering.
A.Data preparation
The main work of data preparation is to obtain customer information through various channels, select appropriate features as the main object of cluster analysis according to the influence of each variable on the clustering results, and lay the data foundation for cluster analysis.
(1) Selection of feature attributes
The data selected during data preparation should meet the requirements of the module. Feature data cannot be selected arbitrarily: first, the selected features must be relevant and meaningful for the study; second, the selected data should be as concise as possible, to reduce the work of later data processing. After investigating the service content and customer behavior attributes of the power ICT business, this paper developed indicators that reflect customer behavior patterns and support customer behavior analysis and customer segmentation, so that managers can optimize business organization and improve the service mode.
As shown in Table 1, the index data for clustering-based customer segmentation fall into four parts. The first is the customer's consumption level, including the form of payment, monthly average consumption and monthly average business expansion. The second is the customer's sensitivity to power failure, including the type of electricity use, the number of complaints after power failure and the contract capacity; these indicators reflect how urgently the customer needs electricity. The third is the customer's arrears risk, including the average number of days in arrears, the number of arrears and the monthly arrears, which reflects the customer's credibility in payment. The fourth is the customer's equipment risk, including the number of failed safety inspections, the number of cases of electricity theft and illegal electricity use, and the matching degree between actual and contract electricity consumption, which reflects the customer's awareness of safe electricity use. Analyzing the indicators of these four parts gives a more comprehensive picture of the behavior of the power company's ICT customers.
Table 1 Customer cluster evaluation factors
(2) Data collection
Based on the customer cluster evaluation factors compiled above, the corresponding data of 2 000 customers were collected. Table 2 shows the consumption level information of some customers. The power failure sensitivity information mainly includes the type of power consumption, the number of complaints after power failure and the contract capacity; some details are shown in Table 3. The arrears risk data are shown in Table 4; the indicators listed there reflect the timeliness of customer payment more accurately.
Table 2 Customer consumption level information
Table 3 Customer power loss sensitivity information
Table 4 Customer arrears risk information
Table 5 shows the data of the customer equipment risk indicators, including electricity theft. The matching degree between actual electricity consumption and contract electricity consumption is calculated as shown in formula (18):
(18)
where λ is the matching degree, s is the actual electricity consumption, and the other quantity in the formula is the contract electricity consumption.
Table 5 Customer equipment risk information
B.Data preprocessing
Data preprocessing is a series of operations on the data collected earlier: missing or wrong information is repaired by deletion, estimation and similar means, data formats are unified, and complex data that are difficult for a machine algorithm to process are turned into data that are easy to process. It mainly includes data cleaning and data conversion.
(1) Data cleaning
When the data are first obtained, their completeness and form are not what is ultimately needed. Some information is missing, some is out of range, and some is completely inconsistent with the selected content. Such data have a great impact on the clustering results. To obtain a better clustering result, these irregular data must be handled so that they become standardized.
1) Processing of vacancy value
In the data collected from 20 000 customers, we found a considerable amount of incomplete information: for example, the monthly average consumption of some customers is not recorded, and the electricity consumption type of others is not recorded. To ensure the accuracy of the clustering results, we adopted the method of mean value interpolation to fill in the vacant data. The specific method is to use the mode of the attribute to impute the missing values and fill in the incomplete data.
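The mode-based filling described above can be sketched as follows. This is an illustration only: the record layout (a list of dictionaries with `None` marking a vacancy), the field names used in the usage note and the function name `impute_with_mode` are all hypothetical and are not taken from the company's data.

```python
from collections import Counter


def impute_with_mode(records, missing=None):
    """Return copies of the records with vacancies filled by each attribute's mode."""
    keys = list(records[0].keys())
    modes = {}
    for key in keys:
        observed = [r[key] for r in records if r[key] is not missing]
        # most_common(1) yields the most frequent observed value of the attribute.
        modes[key] = Counter(observed).most_common(1)[0][0] if observed else missing
    return [
        {key: (modes[key] if r[key] is missing else r[key]) for key in keys}
        for r in records
    ]
```

For example, a record whose hypothetical `usage_type` field is empty would receive the usage type that occurs most often among the other customers, while its recorded fields stay unchanged. The input list itself is not modified.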
For records in which more than two attributes are missing, we delete the whole record.
2) Noise data processing
Among the large amount of data we collect, there are abnormal data that are inconsistent with the facts and beyond the normal range. We use box plot analysis to identify the abnormal data automatically. The specific method is as follows. First, we define the upper quartile U and the lower quartile L: only one quarter of all samples of the attribute are greater than U, and only one quarter of all samples are smaller than L. Second, we define the difference between the upper and lower quartiles as IQR, i.e. IQR = U - L. Then the upper bound is $U_k=U+1.5\,IQR$ and the lower bound is $L_k=L-1.5\,IQR$. Finally, all the data of this attribute that exceed the upper or lower bound are judged to be abnormal, and the entire records containing these abnormal data are deleted.
3) Inconsistent data processing
A small amount of the data is inconsistent with the expected data format. For example, the type of electricity consumption is filled in the payment form field, or the contract capacity is filled in the field for the number of complaints after power failure. Such inconsistent data are relatively rare. In this situation we use median interpolation: the exceptional value is replaced with the median of all values of that attribute.
(2) Data conversion
Although cleaning ensures the integrity and accuracy of the information, the data of different forms must still be transformed into a unified form and standardized before our algorithm can process them. In the collected data, different indicators have different data formats and different ranges of values.
We obtain derived fields through different mathematical statistical methods to unify the form of the data; then we divide the difference between each variable and its mean by the standard deviation of the variable to obtain the standardized data; finally, we study how strongly each characteristic variable influences the demand in the practical application and, according to the influence factors $u_i$ $(i=1,2,3,\ldots,n)$, assign a different weight to each input, where n is the number of attributes. Specifically, each attribute corresponds to one influence factor $u_i$ and to one weight $P_i$. The calculation formula is shown in (19):
(19)
Through this series of data processing and transformation steps, we obtain the data needed for the final clustering.
In this paper, the proposed method is tested on the information and communication customer data provided by power companies, and its clustering results are comprehensively evaluated by comparison with the results of K-Means and hierarchical clustering.
A.Clustering results
To obtain a better clustering effect, after selecting the characteristic variables that affect the clustering results, this paper collects the information of 20 000 customers, carries out data cleaning and data conversion on these data, and then analyzes them with the fuzzy spectral clustering algorithm of this paper. The simulation results are shown in Fig.2, from which we can see that the customers are divided into three categories.
Fig.2 Clustering results of fuzzy spectrum
The clustering results are visualized, and their details are given in Table 6, which shows the clustering of the power company's ICT customers obtained with the fuzzy spectral clustering algorithm of this paper.
Table 6 Details of clustering results
The table gives the average value of each characteristic variable for the three types of customers. Analysis of the clustering results can provide a sound scientific basis for decisions in the power company's customer service business.
B.Result analysis
Summarizing the simulation results gives Table 7.
Table 7 Sorting of clustering results
As Table 7 shows, customers are divided into three categories, each of which is analyzed below.
Customer group 1: These customers account for 28%, less than the other two types but not by much. Their average monthly consumption is 87 500 yuan, a very high consumption level; their contract capacity of 420 kVA is also very high, and most of their electricity use is large industrial power consumption. Their arrears amount to 39 800 yuan, indicating that their electricity bill arrears are relatively serious; the electricity matching degree is 0.94, indicating that their overall awareness of safe electricity use is relatively high. In conclusion, the overall value of such customers is at a high level.
Customer group 2: These customers account for 38%, a relatively high proportion. Their average monthly consumption is 9 800 yuan, a lower consumption level; their contract capacity is 110 kVA, a moderate demand for capacity, and their electricity use is mainly industrial and commercial.
Their arrears amount to 5 600 yuan, indicating that their electricity bill arrears are relatively slight and that their integrity is very high; the electricity matching degree is 0.77, indicating that their overall awareness of safe electricity use is relatively poor. In conclusion, the overall value of such customers is at a medium level.
Customer group 3: These customers account for 32%, the largest number of customers. Their average monthly consumption is 35 600 yuan, a middle consumption level; their contract capacity of 270 kVA is relatively high, and their main type of electricity use is agricultural production. Their arrears amount to 17 400 yuan, indicating that their arrears are somewhat serious and their integrity is not very high; the electricity matching degree is 0.64, indicating that their overall awareness of safe electricity use is relatively poor. Overall, the value of such customers is at a low level.
According to the analysis of the clustering results, different marketing strategies can be formulated for these three customer groups to achieve precision marketing: providing better services, improving customer satisfaction, improving the overall efficiency of the company's customer service business, and reducing service costs.
C.Clustering quality evaluation
To verify the effect of this method on clustering power company ICT customer data, this section compares the proposed algorithm with the K-Means algorithm and the hierarchical clustering algorithm on these data. To avoid chance results, each algorithm is tested repeatedly, and the average of the results is taken as the final value of each evaluation index.
The clustering quality of the algorithm used in this paper is compared and evaluated through five indicators: the FM index, AR index, CI index, DS index and running time. The test results are shown in Table 8.
Table 8 Cluster quality test results
As can be seen from Table 8, when the number of data samples is 20 000, the algorithm proposed in this paper is superior to the hierarchical clustering algorithm and the K-Means algorithm in the FM, AR, CI and DS indexes. In addition, the FM and AR values of this algorithm are very close to 1, which shows that the clusters it produces are very close to the standard clusters and that the clustering effect is very good. In the CI index, this algorithm is also significantly better than the other two algorithms, indicating that its clusters have good compactness. In the DS index, the value of this algorithm is close to that of the hierarchical clustering algorithm but still slightly smaller, which shows that its cluster separation is greater than that of the other two algorithms and that its clustering effect is better. In terms of running time, the K-Means algorithm is the fastest, followed by the algorithm proposed in this paper, and the hierarchical clustering algorithm is the slowest. However, the difference in running time between this algorithm and the K-Means algorithm is not particularly large, indicating that the complexity of the two algorithms differs only slightly. Overall, the clustering results of this algorithm on power company ICT customer data are better than those of the K-Means and hierarchical clustering algorithms, at the cost of a slightly longer running time than K-Means.
In this paper, a method that combines spectral clustering with fuzzy clustering is proposed to cluster power customer data.
In this method, a fuzzy clustering optimization algorithm replaces the K-Means clustering step of the traditional NJW algorithm in order to improve the global optimization ability of the traditional NJW algorithm. The experimental results, based on the clustering quality evaluation indexes FM, AR, CI and DS and on the running time, show that the proposed algorithm has a better classification effect, clustering quality and global optimization performance than the existing K-Means and hierarchical clustering algorithms, and that it also has good robustness. The method is suitable for application in the field of customer classification. Because the scale of the data set was limited by the experimental environment of this paper, future work will apply parallel computing to larger data sets and extend the method to other fields of power company data analysis.
2.2 Fuzzy clustering optimization algorithm
2.3 Algorithm flow
3 Cluster validity evaluation
3.1 Internal evaluation indicators
3.2 External evaluation indicators
3.3 Global evaluation indicators
4 Case study
4.1 Data processing
4.2 Result analysis and comparative evaluation
5 Conclusion