Jay Sompura, Amit Joshi, Babji Srinivasan*, Rajagopalan Srinivasan*
1 Department of Electrical Engineering, Indian Institute of Technology Gandhinagar, India
2 Department of Chemical Engineering, Indian Institute of Technology Gandhinagar, India
3 Department of Chemical Engineering, Indian Institute of Technology Madras, India
Keywords: Chattering alarm; Correspondence analysis; Density based clustering; Duplicate alarms; Combined-cycle power plant; Association rule mining
ABSTRACT Process safety in chemical industries is considered one of the important goals towards sustainable development, because major accidents still occur and continue to exert significant reputational and financial impacts on process industries. Alarm systems constitute an indispensable component of automation as they draw the attention of process operators to any abnormal condition in the plant. Therefore, if deployed properly, alarm systems can play a critical role in helping plant operators ensure process safety and profitability. However, in practice, many process plants suffer from poor alarm system configuration, which leads to nuisance alarms and alarm floods that compromise safety. A vast amount of research has primarily focused on developing sophisticated alarm management algorithms to address specific issues. In this article, we provide a simple, practical, systematic approach that can be applied by plant engineers (i.e., non-experts) to improve industrial alarm system performance. The proposed approach is demonstrated using an industrial power plant case study.
Sustainable manufacturing as defined by the US Department of Commerce is “the creation of manufactured products that use processes that minimize negative environmental impacts, conserve energy and natural resources, are safe for employees, communities, and consumers and are economically sound” [1]. Process industries typically manufacture value added products by processing raw materials using a series of chemical processing steps. For developing sustainable processes, it is important not only to maximize the economic benefits but also to develop plant management strategies that balance the operating profit with sustainable Environmental, Health and Safety (EH&S) performance. Process safety is of prime importance in the chemical industry to ensure sustainable development and requires an effective system that enables the identification, elimination, reduction and mitigation of risks resulting from operations. To ensure safe operations, industries have developed abnormal situation management systems to mitigate various hazards [2,3]. These include sophisticated automatic control techniques and strategies, advanced alarm management systems, support and guidance systems, better equipment maintenance and ergonomic design of human machine interfaces. Of the various technological advancements related to instrumentation and automation in the process industry, the primary focus of this article is on alarm management. Alarm systems help maintain the efficiency of plant operation by alerting operators to deviations of the process from normal operation, which can have negative impacts on performance [4]. They also serve as the most important tool that plant operators use to improve plant performance and to monitor plant safety [5,6]. Alarm management has been identified as a crucial component of industrial automation in abnormal situation management [7].
With the widespread use of the DCS (Distributed Control System), which offers ease of configuring alarms, the number of alarms configured per operator in process plants has grown significantly [8]. For instance, Nimmo [9] reported that the number of alarms increased from 150 to around 14000 when a DCS based system was deployed. Due to this ease of adding alarms, poor practices arose in industry, leading to a large number of alarms even when the process is operating normally at a steady state. Moreover, modern plants consist of highly interconnected units wherein a disturbance affecting a unit can propagate to other units [10]. This leads to a large number of alarms being raised within a short time, which is referred to as an alarm flood. Alarm floods can overwhelm the operator, affecting their ability to investigate the alarms properly, and pose a serious threat to industrial process monitoring [11]. Rather than serving the operator, improperly configured alarm systems become a nuisance and distract the operator from performing vital actions, especially during plant abnormalities. Poorly performing alarm systems have been cited as one of the contributing factors to major accidents and losses. For instance, in the Texaco Milford Haven refinery accident (1994), the operator had to recognize, acknowledge and act on 275 alarms in the last 11 min before an explosion [12]. Similarly, post-accident analysis of the Esso Longford gas explosion (1998), BP Texas City refinery explosion (2005) and Buncefield oil storage facility fire (2005) indicates poor alarm management as a key contributor to catastrophic failures [13-15].
Given the importance of alarm systems and their role in maintaining process safety, a key component of sustainable development, several guidelines have been proposed to properly design and rationalize alarm systems [16,17]. Typically, these guidelines suggest an upper limit on the number of alarms an operator should receive per hour. The Engineering Equipment and Materials Users Association [18] guidelines suggest that an average operator can handle about 6 alarms per hour. The performance of the alarm system can be evaluated using several KPIs (Key Performance Indicators) such as number of alarms per hour, number of alarms per minute, number of alarm floods, number of alarms that are likely to be missed by the operator per day, and number of frequent alarms per day [19]. Subsequently, if the existing alarm system is found to have issues, the cause(s) of poor performance have to be identified and corrective actions taken. Studies indicate the following causes for alarm floods:
1. Alarms occurring due to minor variations (disturbances, noise),
2. Incorrect alarm generation,
3. Incorrect configuration, and
4. Multiple alarm occurrences due to abnormality propagation.
Significant research efforts have been made to identify the causes of poor performance of alarm systems and develop sophisticated remedies. In this work, we focus on practical deployment of alarm management strategies by plant personnel. Specifically, we demonstrate that a step-by-step procedure using simple techniques and easy-to-use, readily available tools can lead to significant improvement of alarm system performance. We use data from a combined-cycle gas turbine power plant for this purpose. A brief overview of the process along with the details of the data collected from the alarm system is given in Section 2. Section 3 provides details of the methods for assessing the performance of alarm systems, while Section 4 demonstrates the approaches for diagnosing the causes of poor alarm system performance and their application to the turbine plant. A discussion of the proposed solutions for improving the alarm system along with their utility in the turbine plant is provided in Section 5.
Combined Cycle Power Plants (CCPPs) are popular in power generation industries as they have higher efficiencies compared to gas turbines and steam turbines working alone. In a typical CCPP, a gas turbine and a steam turbine are arranged sequentially to provide high efficiency, i.e., the percentage of the total energy content of a power plant's fuel that is converted into electricity. Typically, the exhaust gas from the gas turbine is utilized by a Heat Recovery Steam Generator (HRSG) to convert water into steam for the operation of the steam turbine. The HRSG can be operated at multiple pressure levels such as high-pressure (HP), intermediate-pressure (IP) and low-pressure (LP).
The considered CCPP has a capacity of 112 MW and is a multi-shaft system, i.e., the gas and steam turbines are mounted on different shafts connected to separate electricity generators. As shown in Fig. 1, a stream of air enters the compressor of the gas turbine after passing through a filter, which removes solid and liquid contaminants. The compressed air is mixed with fuel and ignited. The ignited fuel-air mixture expands through the gas turbine, thus rotating its blades. A generator is attached to this turbine to produce electricity. The exhaust gas from the gas turbine is at a temperature of 500-600 °C. This hot exhaust gas is utilized to convert water to steam in the HRSG. The triple-pressure HRSG is a series of 13 heat exchangers such as superheater, evaporator, and economizer maintained at pressure levels of 103, 29, and 5 kg·cm⁻², respectively. The generated steam is fed to the steam turbine at temperatures of 535 °C, 260 °C, and 200 °C and flow-rates of 113, 8.7 and 7.5 t·h⁻¹. A generator is connected to the steam turbine to produce electricity.
At full load, the gas turbine generates 72 MW and the steam turbine 40 MW. Although the plant usually operates near full capacity during the day, it is shut down every night when electricity demand is low and started up the following morning as demand ramps up. Variables such as temperatures, pressures and flow-rates at different locations inside the gas turbine, HRSG and steam turbine are monitored and controlled using a Distributed Control System (DCS). The CCPP is divided into two major monitoring sections: the gas section and the steam section. The steam section of the Dhuvaran plant, comprising the HRSG, steam turbine, and steam generator, has 666 process variables configured with alarms. Further, over 11000 alarms are flagged on a typical day during normal operation. In this paper, we demonstrate the applicability of the proposed alarm strategies to improve the existing alarm system in this power plant.
Fig. 1. Schematic of Dhuvaran power plant.
As shown in Fig. 2, we follow a systematic three-step procedure to address the alarm management problem:
1. Benchmark the performance of the existing alarm system—to establish a baseline.
2. Diagnose the various causes of poor performance—to identify the bad actors.
3. Develop specific solution strategies—to remedy the problems.
The foremost step towards improvement of an alarm system lies in benchmarking the existing system against recommended industrial standards. This requires acquiring event log data from the plant. The event log contains information about the status of the alarms, tag names corresponding to alarms, priority of alarms, nature of the alarms and actions taken by the operator, along with the time stamp. In general, depending on the plant automation software, the event log may be stored in various formats. However, for ease of data analysis, it may be imported into a common spreadsheet such as Microsoft® Excel. Care must be taken to ensure that representative data is obtained from all the operating regimes of the plant. Subsequently, various KPIs can be computed using this event log data and compared against recommended values to benchmark the baseline performance of the alarm system.
We collected event log data for a total of 10 days from the Dhuvaran plant. To ensure representativeness, data was obtained for six consecutive days from September 26 to October 1, 2016 and four days spread over the subsequent fortnight (7, 8, 15 and 17 October 2016). For ease of reference we label these as Days 1 to 10. It was confirmed that the obtained data included the various operating regimes: startup, dynamic operation, steady state operation, and shut-down.
With this event log data, we first identified the prevailing issues in the alarm system. A statistical analysis of the event log data revealed that the number of alarms on any day ranged from a minimum of 4893 to a maximum of 16518 (Fig. 3a). As can be seen from Table 1, with an average of 11520 alarms per day, i.e., approximately 8 alarms every minute, the alarm count in the Dhuvaran plant is significantly higher than that suggested by the EEMUA (Engineering Equipment and Materials Users' Association) guidelines. Out of the 666 total configured alarm variables, a maximum of 209 and a minimum of 89 unique alarms were registered on any day, with an average of 178 (Fig. 3b). It is clear that the KPIs are exceeded every day, indicating the prevalence of various issues in the alarm system.
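The daily benchmark figures can be computed with very little code once the event log is exported. The following is a minimal sketch, assuming each log record has been reduced to a (day, tag) pair; the function name `daily_kpis`, the tag names, and the use of a 24-hour day to convert counts into an hourly rate are illustrative assumptions, not part of the plant's tooling.

```python
from collections import defaultdict

def daily_kpis(events, limit_per_hour=6):
    """Count alarms and unique alarm tags per day from (day, tag) records.

    limit_per_hour=6 follows the EEMUA figure quoted in the text;
    dividing by 24 hours is an assumption made for this sketch.
    """
    counts = defaultdict(int)
    tags = defaultdict(set)
    for day, tag in events:
        counts[day] += 1
        tags[day].add(tag)
    return {
        day: {
            "alarms": counts[day],
            "unique_tags": len(tags[day]),
            "per_hour": counts[day] / 24,
            "exceeds_limit": counts[day] / 24 > limit_per_hour,
        }
        for day in counts
    }

# Hypothetical log: Day 1 has 11520 alarms on two tags, Day 2 has 120 on one tag.
events = [(1, "TAG_A")] * 11000 + [(1, "TAG_B")] * 520 + [(2, "TAG_A")] * 120
kpis = daily_kpis(events)
print(kpis[1])  # 11520 alarms, 480.0 per hour, exceeds the 6 per hour guideline
```

With the average count reported above (11520 alarms per day), the hourly rate is 480, i.e., 8 per minute, which is how the comparison against the guideline value is obtained.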
Fig. 4 shows the hourly distribution of alarms on a typical day. It is evident that a large number of alarms are concentrated in specific periods; these periods corresponded to start-up, dynamic operation (time period before reaching full operating capacity) and shutdown of the plant. Various KPIs can be computed to quantify the distribution of alarms over a period of time and their impact on safety. Some significant KPIs are the number of alarm floods, the number of missed alarms, and the percentage of time the alarm system is in flooded condition. An alarm flood is the period that begins when the alarm rate reaches ten or more alarms in a 10-minute window; the flood is considered to be terminated when the rate drops below five alarms in 10 min. The percentage of time that the system is in a flood situation can then be computed. During flooding, the number of alarms in excess of the threshold of 10 alarms in 10 minutes (the rate considered manageable as per ISA (International Society of Automation) 18.2 and EEMUA guidelines) is the number of alarms likely to be missed by the operator. We computed these alarm flood related KPIs for the Dhuvaran plant. From Table 2, it can be observed that on average, the system is in flooded condition for 55% of the time, with a minimum of 32% and a maximum of 88%. Out of the 5753 alarms that occurred in total on Day 4, 5245 (i.e., 91%) would have been missed by the operator during an intense alarm flood. On average, during 70% of its operation, the plant had more than 30 alarms per hour. This indicates that most of the alarms occur during the flooded condition and the operator is likely to miss important alarms. Table 3 shows the number of alarms under each priority for the ten days. It can be seen that the total number of high priority alarms (priority 1 and priority 2) is also very high throughout. Therefore, it is important to diagnose the cause of poor performance of the alarm system and remedy it.
Fig. 2. Flow diagram of proposed alarm management scheme.
Fig. 3. Statistical analysis of event log: (a) Number of alarms per day, and (b) Number of unique alarm tags per day.
Table 1 Comparison of Dhuvaran plant's overall alarm performance with EEMUA guidelines
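The flood KPIs described above can be sketched as a small state machine. This is an illustrative sketch only: it assumes alarm times are given in minutes from the start of the log and evaluates the rate over fixed, non-overlapping 10-minute bins rather than a sliding window, which is a simplification. The function name `flood_kpis` and the example data are hypothetical.

```python
def flood_kpis(alarm_times_min, horizon_min, window=10, start=10, end=5):
    """Flood KPIs from alarm times (minutes), using fixed 10-minute bins.

    A flood starts in a bin with >= `start` alarms and ends at the first
    later bin with < `end` alarms. Alarms above `start` in a flooded bin
    are counted as likely to be missed by the operator.
    """
    n_bins = horizon_min // window
    counts = [0] * n_bins
    for t in alarm_times_min:
        b = int(t // window)
        if 0 <= b < n_bins:
            counts[b] += 1

    in_flood = False
    floods = flooded_bins = missed = 0
    for c in counts:
        if not in_flood and c >= start:
            in_flood = True
            floods += 1
        elif in_flood and c < end:
            in_flood = False
        if in_flood:
            flooded_bins += 1
            missed += max(0, c - start)
    return {
        "floods": floods,
        "percent_time_flooded": 100.0 * flooded_bins / n_bins,
        "missed_alarms": missed,
    }

# Hypothetical hour of data: alarms per 10-minute bin = 2, 15, 7, 3, 12, 0
per_bin = [2, 15, 7, 3, 12, 0]
times = [b * 10 + i * 0.5 for b, c in enumerate(per_bin) for i in range(c)]
result = flood_kpis(times, horizon_min=60)
print(result)
```

In this example the second bin starts a flood that persists through the third bin (7 alarms is not below 5), and the fifth bin starts a second flood, so half of the hour is spent in flood.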
Nuisance alarms usually occur due to three major causes: (1) chattering, (2) incorrect configuration of alarm variables, and (3) alarm floods due to causal relationships among process variables. In the following, we discuss a step-by-step approach to identify each of these issues.
Fig. 4. Hourly distribution of alarms on Day 7.
Table 2 Alarm flood KPIs in the Dhuvaran plant
Table 3 Priority-based distribution of alarms
A chattering alarm is one which repeatedly transitions between the alarm state and the normal state in a short period of time [3]. Chattering usually occurs when an alarmed process variable operates near an alarm threshold value and oscillates around it due to external disturbances/noise, repeated on-off action of a control loop [20], or poor control loop tuning. It is the most frequent type of nuisance alarm encountered. For the Dhuvaran plant, we analyzed the number of occurrences of each alarm on a daily basis. Fig. 5 shows the 50 most frequently occurring alarms on Day 1. It is evident that a small number of tags contribute a large number of alarms. For instance, the five most frequent alarms (shown in Fig. 6) contribute over 87% of the total alarms on Day 1. This indicates the possibility of chattering.
Fig. 6. Relative contributions of five most frequent alarms on Day 1.
The first step in dealing with chattering alarms is to detect them. Various methods have been proposed to detect the presence of chattering alarms, for instance the balance between the actions taken by the operator and occurrences of alarms [21], and alarm run length [20,22,23]. The run length based approach to quantify the extent of chatter in an alarm tag is summarized in Fig. 7. Here, the value of the chatter index (CI) lies between 0 and 1, with a value of 1 indicating severe chattering. A CI value greater than 0.05 has been recommended as a criterion for identifying chattering alarms [22].
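A run length based chatter index can be sketched as follows. The formula used here is an assumption about the computation summarized in Fig. 7: the run length r is taken as the time in seconds between consecutive activations of the same tag, P_r is the fraction of runs with length r, and the index is the sum of P_r / r. Under this definition the index is 1 when the alarm repeats every second and approaches 0 for widely spaced alarms. The function name `chatter_index` is hypothetical.

```python
from collections import Counter

def chatter_index(alarm_times_s):
    """Chatter index of one tag from its alarm activation times (seconds).

    Assumed definition: sum over run lengths r of P_r / r, where r is the
    gap between consecutive activations, rounded to whole seconds and
    floored at 1 s so the index stays within [0, 1].
    """
    times = sorted(alarm_times_s)
    runs = [max(1, int(round(b - a))) for a, b in zip(times, times[1:])]
    if not runs:
        return 0.0  # fewer than two activations: no chatter can be measured
    counts = Counter(runs)
    total = len(runs)
    return sum((n / total) / r for r, n in counts.items())

fast = chatter_index(range(0, 22, 2))   # alarm re-activates every 2 s
slow = chatter_index([0, 600, 1200])    # alarm re-activates every 10 min
print(fast, slow)
```

With the 0.05 criterion quoted above, the first tag (index 0.5) would be flagged as chattering and the second (index 1/600) would not.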
For the Dhuvaran plant, we calculated the chatter index of each alarm tag for the ten days of data. With the recommended threshold of 0.05, we identified 54 chattering alarms with a total count of 11252. Since the number of alarms to be modified is large, we increased the threshold value to 0.2, which resulted in only 27 chattering alarms with a total count of 10729. Table 4 provides the contribution of chattering alarms along with the total number of alarms on a daily basis. From Table 4, it is clear that chattering is one of the major causes of alarm floods in this plant.
Fig. 5. Histogram of 50 most frequent alarms on Day 1.
Fig. 7. Identification of chattering alarms based on run length.
Duplicate alarms are the second biggest contributors to nuisance alarms. Because configuring alarms is relatively easy, multiple alarms that all flag the same underlying condition may be configured. For example, when multiple sensors are used to measure the same underlying critical quantity, instead of configuring alarms on the average (or another statistical derivative) of these values, alarms may be configured on all the sensors, leading to multiple alarms for the same issue. Further, in a large scale plant composed of multiple interconnected units, it is highly likely that an abnormal situation in one unit may propagate to other units, thus triggering numerous alarms for the same underlying root cause [24].
Researchers have used clustering of frequently occurring subsequences to identify duplicate alarms [25,26]. Schleburg et al. [27] used a model of the plant topology to discover related alarms. Noda et al. and Hu et al. [28,29] proposed schemes that detect statistical similarities among discrete occurrences of alarms. Geng et al., Zhu et al., and Yang et al. [30-32] grouped alarm variables into different clusters. CA (Correspondence Analysis) is a multivariate statistical technique to identify relationships between categorical variables, similar to principal component analysis for continuous data [33]. As shown in Fig. 8, the steps for performing CA to project alarm data into a lower dimension are:
Table 4 Daily statistics of chattering alarms
1. Convert the alarm sequence of Tag 1 into a binary sequence vector where 0 indicates the flag clear state (Clr) and 1 indicates the flag alarm state (Alm).
2. Repeat the same procedure for all alarm tags and obtain the binary sequence matrix.
3. Append the binary sequence matrix and its negation to obtain the indicator matrix.
4. Convert the indicator matrix into a Burt table, such that each entry indicates the number of times the corresponding row alarm tag and column alarm tag occurred together.
5. Using the concepts of profile, mass and inertia [34], convert the frequency information contained in the Burt table into an equivalent Euclidean distance metric.
6. Project the obtained Euclidean metric into a lower dimensional space by minimizing the loss of inertia.
7. After projecting into the lower dimensional space, use DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to identify groups of alarms.
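Steps 1 to 6 can be sketched with NumPy as below. This is a minimal illustration, not the implementation used in the study: it assumes the standard correspondence analysis formulation (standardized residuals of the Burt table followed by a singular value decomposition), and the function name `ca_coordinates`, the 95% inertia default, and the four toy tags are assumptions made for the example.

```python
import numpy as np

def ca_coordinates(binary, inertia_kept=0.95):
    """Correspondence analysis of a (samples x tags) 0/1 alarm matrix.

    Returns principal coordinates of the 'Alm' category of each tag and
    the cumulative share of inertia explained by the retained dimensions.
    Every tag must show both states, otherwise a category has zero mass.
    """
    X = np.asarray(binary, dtype=float)      # steps 1-2: binary sequence matrix
    Z = np.hstack([X, 1.0 - X])              # step 3: indicator matrix [Alm | Clr]
    burt = Z.T @ Z                           # step 4: Burt table (co-occurrence counts)
    P = burt / burt.sum()
    mass = P.sum(axis=1)                     # step 5: masses (row = column, B is symmetric)
    expected = np.outer(mass, mass)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, _ = np.linalg.svd(S)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(len(s), int(np.searchsorted(explained, inertia_kept)) + 1)
    coords = (U[:, :k] * s[:k]) / np.sqrt(mass)[:, None]   # step 6: projection
    return coords[: X.shape[1]], explained[:k]

# Toy data: tags 0 and 1 are exact duplicates, tags 2 and 3 differ.
sequences = [
    [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
]
alm_coords, explained = ca_coordinates(np.array(sequences).T)
print(alm_coords.shape, explained[-1])
```

Tags with identical alarm sequences have identical profiles in the Burt table and therefore land on the same point after projection, which is what makes the subsequent density based clustering (step 7) able to group them.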
We applied CA to the alarm occurrence and clearance data from the Dhuvaran plant. A total of 15 projections explain 95% of the total variance of all the 666 alarms. After projection onto this lower dimensional space, DBSCAN, a well-known clustering technique, was used to identify 30 clusters of alarms [35]. Out of the 30 groups identified, 4 groups indicated the presence of redundant sensors. For instance, as shown in Table 5, group number 1 contained tags 9 and 10, which had exactly the same number of alarm occurrences. Further, their tag names (not shown) also confirmed the redundancy in the sensors. Once clusters containing redundant sensors were identified, clusters containing multiple alarms configured for the same underlying process variable were identified manually. The three most frequently occurring clusters are shown in Table 6. It can be seen that these contribute significantly to the overall number of alarms/day.
Fig. 8. Clustering of alarms using CA and DBSCAN.
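For readers unfamiliar with DBSCAN, the core of the algorithm fits in a few lines. The sketch below is a plain Python illustration with brute-force neighbour search; the `eps` and `min_pts` values and the 2-D points are made up for the example (the parameter values used for the plant data are not stated here), and a library implementation would normally be preferred in practice.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps`; clusters grow outward from core points.
    """
    NOISE = -1
    labels = [None] * len(points)            # None = not yet visited

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE                # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:    # j is itself a core point: keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical projected alarm tags: two tight groups and one isolated tag.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 0)]
labels = dbscan(pts, eps=0.5, min_pts=2)
print(labels)
```

Because the number of clusters is not specified in advance and isolated tags are labelled as noise, the method suits alarm data where only some tags are expected to be duplicates.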
Table 5 Identified redundant alarms after CA and DBSCAN
Subsequent to identification of redundant alarms configured on a process variable, we used association rule mining on the clusters to discover alarms that are configured within a unit for the same underlying abnormality, also called unit level alarms. Association rule mining [36] is a well-known technique to discover relationships among categorical variables [37-39]. Out of the 30 groups, results from association rule mining indicated that two groups had the same underlying cause, as discussed below.
Once the key problems in the alarm system have been diagnosed, specific solution strategies can be deployed to address each one. In this work, we discuss basic approaches that can be easily deployed to overcome chattering and redundant alarms.
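The idea behind association rule mining can be illustrated for pairs of alarms using the usual support and confidence measures. This is a simplified sketch restricted to two-item rules; it assumes the event log has been cut into time windows ("transactions"), each holding the set of tags that alarmed in that window. The tag names, thresholds, and the function name `pair_rules` are hypothetical.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (antecedent, consequent, support, confidence) for tag pairs.

    support    = fraction of windows containing both tags
    confidence = windows containing both / windows containing the antecedent
    """
    n = len(transactions)
    item_count, pair_count = {}, {}
    for t in transactions:
        items = sorted(set(t))
        for a in items:
            item_count[a] = item_count.get(a, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = c / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Hypothetical windows: a drum level alarm and a feed flow alarm fire together.
windows = [
    {"LT101_HI", "FT102_LO"},
    {"LT101_HI", "FT102_LO"},
    {"LT101_HI", "FT102_LO", "PT7_HI"},
    {"PT7_HI"},
]
rules = pair_rules(windows)
for rule in rules:
    print(rule)
```

Pairs that satisfy both thresholds in both directions are candidates for alarms that flag the same underlying abnormality and can then be reviewed by plant engineers.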
Table 6 Management of redundant alarms configured for a process variable
Fig. 9. Alarm count before and after chattering alarm rationalization.
Numerous methods have been proposed to rationalize chattering alarms [20,22,31,40]. Most of these approaches delay the occurrence/clearance of alarms by using a delay timer. For instance, Srinivasan et al. and Cecilio et al. [40,41] proposed a method to dynamically modify alarm thresholds based on statistical process control techniques. Once chattering alarms are detected, the process alarm high (low) limit is automatically changed to X̄ − 2σ (X̄ + 2σ) to hold the process variable in alarm status (standing state). This approach can be used for continuous process variables. Wang and Chen [20] suggested the use of the m-sample delay timer to reduce the number of chattering alarms for digital variables. The m-sample delay timer clears an alarm if and only if m consecutive samples of the alarm signal (xa(t)) are 0. Fig. 9 shows an example of how the m-sample delay timer (m = 20) eliminates the chattering in an alarm tag. After such rationalization, using dynamic alteration of the alarm limit and the on-delay timer for the entire plant, there is an average reduction of 98.85% of chattering alarms (Table 7). This should be viewed in light of the fact that, on average, 93.44% of alarms per day are due to chattering alarms.
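The clearing rule of the m-sample delay timer is simple enough to sketch directly. The code below is an illustrative implementation of the rule as stated above (an active alarm clears only after m consecutive zero samples); the function names and the short example signal with m = 3 are assumptions chosen to keep the trace readable.

```python
def m_sample_delay(raw, m):
    """Filter a 0/1 alarm signal: raise immediately, clear after m zeros in a row."""
    filtered, state, zeros = [], 0, 0
    for x in raw:
        if x == 1:
            state, zeros = 1, 0
        else:
            zeros += 1
            if zeros >= m:
                state = 0
        filtered.append(state)
    return filtered

def count_activations(signal):
    """Number of 0 -> 1 transitions, i.e. alarms annunciated to the operator."""
    return sum(1 for prev, cur in zip([0] + signal[:-1], signal) if prev == 0 and cur == 1)

raw = [0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0]   # chattering digital alarm
held = m_sample_delay(raw, m=3)
print(held)
print(count_activations(raw), "->", count_activations(held))
```

In this toy signal the four raw activations collapse to two, because the brief returns to normal between the first three activations are shorter than m samples. The trade-off is that a genuine return to normal is reported m samples late.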
Table 7 Management of chattering alarms
Table 8 Management of unit level alarms
Once duplicate alarms are identified, it is usual to follow a majority voting logic to reduce their count. The Dhuvaran plant has 21 thermocouples to measure the exhaust gas temperature of the gas turbine. It has 2 redundant sensors to measure differential pressure at the drum. Several other redundant sensors are installed at different locations in the plant.
As mentioned earlier, a total of 30 duplicate alarm groups were identified. As shown in Table 5, we focused on the 3 groups which contribute significantly to the total number of redundant alarms configured for the same process variable. By applying the majority voting logic, as shown in Table 6, we could significantly reduce the number of alarms (99.7%, 98.2%, and 66.6%) for each group. Table 8 shows the details of alarm groups due to the relationship among process variables and the reduction in alarm counts after rationalization at the unit level.
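Majority voting over redundant sensors can be sketched as follows. The example assumes the alarm states of the redundant sensors are available as equally long, time-aligned 0/1 sequences and that a strict majority is required; the three toy sensors and the function names are hypothetical.

```python
def majority_vote(sensor_alarms):
    """Combine time-aligned 0/1 alarm sequences: alarm only if most sensors agree."""
    n = len(sensor_alarms)
    return [1 if 2 * sum(sample) > n else 0 for sample in zip(*sensor_alarms)]

def count_activations(signal):
    """Number of 0 -> 1 transitions in a 0/1 alarm sequence."""
    return sum(1 for prev, cur in zip([0] + signal[:-1], signal) if prev == 0 and cur == 1)

# Three hypothetical redundant sensors measuring the same quantity.
sensors = [
    [0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0],
]
voted = majority_vote(sensors)
before = sum(count_activations(s) for s in sensors)
after = count_activations(voted)
print(voted, before, "->", after)
```

Here six individual activations reduce to two voted alarms, and single-sensor excursions (such as the first sample of the third sensor) no longer reach the operator.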
A practical step-by-step approach to measure the performance of an existing alarm system, diagnose any issues, and implement simple solution strategies for improvement is presented in this work. All the steps are demonstrated using alarm data from a combined-cycle power plant at Dhuvaran, Gujarat, India. We computed various KPIs using this data and, by comparing them against ISA 18.2 and EEMUA guidelines, identified the existence of several issues in the alarm system. Subsequently, we used various easy-to-use techniques to identify two significant issues in the alarm system: chattering alarms and duplicate alarms. A simple delay-timer based strategy was used to handle chattering alarms, while majority voting logic and grouped alarms were used to address duplicate alarms. Analysis indicates that implementation of even these simple approaches would result in a reduction of over 93% of alarms, a very significant outcome. Further, the entire procedure can be iterated to achieve further reduction until the alarm system performance is deemed acceptable. Apart from the approaches discussed in this work, many other sophisticated strategies have been proposed in the literature, which could also be explored.
Acknowledgements
The authors gratefully acknowledge the unflinching support provided by the management and engineers of the Dhuvaran plant during the course of this project.
Chinese Journal of Chemical Engineering, 2019, Issue 5