
Reliable Medical Recommendation Based on Privacy-Preserving Collaborative Filtering

2018-08-15 10:38:32 Mengwei Hou, Rong Wei, Tiangang Wang, Yu Cheng and Buyue Qian

Computers, Materials & Continua, 2018, Issue 7

Mengwei Hou, Rong Wei, Tiangang Wang, Yu Cheng and Buyue Qian

Abstract: Collaborative filtering (CF) methods are widely adopted by existing medical recommendation systems, which help clinicians perform their work by seeking and recommending appropriate medical advice. However, a privacy issue arises in this process because sensitive patient data are collected by the recommendation server. Recently proposed privacy-preserving collaborative filtering methods, which use computation-intensive cryptography techniques or data perturbation techniques, are not appropriate for online medical services. The aim of this study is to address the privacy issues in the context of neighborhood-based CF methods by proposing a Privacy Preserving Medical Recommendation (PPMR) algorithm, which protects patients’ treatment information and demographic information during the online recommendation process without compromising recommendation accuracy and efficiency. The proposed algorithm includes two privacy-preserving operations: Private Neighbor Selection and Neighborhood-based Differential Privacy Recommendation. Private Neighbor Selection is based on the notion of k-anonymity, meaning that neighbors are privately selected for the target user according to his/her similarities with others. In Neighborhood-based Differential Privacy Recommendation, a differential privacy mechanism is introduced to enhance the performance of recommendation. Our algorithm is evaluated on a real-world hospital EMR dataset. Experimental results demonstrate that the proposed method achieves stable recommendation accuracy while providing comprehensive privacy for individual patients.

Keywords: Medical recommendation, privacy preserving, neighborhood-based collaborative filtering, differential privacy.

1 Introduction

As Electronic Medical Records (EMRs) and wearable sensors become more widespread, medical datasets tend to grow larger, and specific methods of exploration are needed to extract meaningful information. However, even experienced clinicians sometimes find it difficult to deal with the large amount of medical knowledge available to help them complete a particular goal. Thus, clinical organizations must exploit effective methods of discovering and recommending valuable knowledge to assist clinicians’ work.

Recommender systems are becoming more and more important due to the increasing “information overload” problem [Davidson, Liebald, Liu et al. (2010); Das, Datar, Garg et al. (2007)]. In particular, it is difficult for users to identify the information that best supports their decision making. In this context, recommender systems provide selected, useful information to the target user, which can make a large number of decisions more effective. Some recent studies have started to use recommender systems to automatically recommend useful knowledge in clinical practice [Sun, Liu, Guo et al. (2016); Zhang, Chen, Tang et al. (2017)].

Collaborative filtering (CF) is one of the most popular recommendation techniques because it is insensitive to product details, and it is adopted by many online service providers. In this paper, we address the privacy-preserving issue in the context of neighborhood-based CF methods in medical recommendation. In a clinical environment, CF can be applied to provide clinicians with more relevant information, so as to improve the quality of medical service. During this process, patients’ sensitive information is collected by the recommendation system, which raises privacy concerns. Several studies [Enck, Gilbert, Chun et al. (2012); Wondracek, Holz, Kirda et al. (2010); Li, Lv, Xia et al. (2011)] have shown that users’ private data could be exploited by service providers or malicious users to gain profits.

In this paper, we propose a privacy-preserving medical recommendation (PPMR) algorithm based on neighborhood-based collaborative filtering. The contributions of this work are summarized as follows:

· We provide Private Neighbor Selection to prevent an adversary from maliciously obtaining patients’ treatment information and demographic information. Specifically, a new k-anonymity de-identification method, Optimal Lattice Anonymization (OLA), is adopted to produce a globally optimal de-identification solution suitable for EMR datasets. After de-identification, the most similar neighbors are selected privately based on the de-identified dataset. Therefore, an adversary is unlikely to be able to use a combination of quasi-identifiers to identify an individual patient.

· We propose a Neighborhood-based Differential Privacy Recommendation Algorithm, with the aim of providing comprehensive privacy for the individual patient while maximizing the accuracy of recommendations. Our algorithm consists of several steps, measuring (with noise) progressively more challenging aspects of the data before feeding the measurements to appropriately parameterized variants of current algorithms. We first describe the approach at a high level, before describing the sequence of precise calculations more concretely.

· Finally, we conduct a security analysis and a performance evaluation of the proposed scheme. The experiments carried out on the real-world EMR dataset verify that the proposed medical recommendation scheme is effective and scalable.

2 Related work

In this section, we review previous work related to ours and explain the differences between our method and earlier ones.

2.1 Medical recommendation system

In terms of applications, much recent work has mined various kinds of EMR data for actionable insights to improve the quality of healthcare delivery. For example, Zhou et al. [Zhou, Wang, Hu et al. (2014)] proposed a method to infer phenotypic patterns from EMRs. Lakkaraju et al. [Lakkaraju and Rudin (2016)] proposed to use a Markov Decision Process (MDP) to provide cost-effective recommendations based on a healthcare institution’s financial restrictions. Hirano et al. [Hirano and Tsumoto (2014)] used occurrence and transition frequency to discover typical order sequences. Liu et al. [Liu, Wang, Hu et al. (2015)] developed a method to identify the most significant and interpretable graphical features from longitudinal EMRs. However, this work mainly focuses on discovering effective recommendation algorithms for medical datasets and does not consider privacy in the medical recommendation process.

2.2 Privacy preserving recommendation systems

A number of studies have addressed privacy violations in modern big data systems, using cryptographic, perturbation and obfuscation techniques. Zhan et al. [Zhan, Hsieh, Wang et al. (2010)] solved a similar problem by applying homomorphic encryption and scalar product approaches. Han et al. [Han, Qian, Yang et al. (2016)] proposed a novel physical-layer identification system that utilizes unique features of wireless devices to provide authenticity and security guarantees. Cryptographic methods preserve high performance but face serious scalability issues. Perturbation changes a user’s rating by adding noise before it is submitted to the recommender system. Polatidis et al. [Polatidis, Georgiadis, Pimenidis et al. (2017)] proposed a multi-level privacy-preserving method for collaborative filtering systems that perturbs each rating before it is submitted to the server. Obfuscation replaces a certain percentage of a user’s ratings with random values. Berkovsky et al. [Berkovsky, Eytani, Kuflik et al. (2007)] decentralized rating profiles among multiple repositories and replaced some ratings with their mean.

In order to address these problems, differential privacy, a more rigorous notion, has been proposed [Dwork (2006)]. Differential privacy provides a strong and provable privacy definition that can quantify the privacy risk to individuals. McSherry et al. [McSherry and Mironov (2009)] were the first to introduce differential privacy into recommender systems, using Laplace noise. Hardt et al. [Hardt and Roth (2011)] converted the recommendation problem into a Matrix Completion problem. Zhu et al. [Zhu, Ren, Zhou et al. (2014)] proposed a truncated similarity function in private neighbor selection so as to achieve differential privacy for neighbor-based collaborative filtering. However, the above methods focused only on online commercial recommendation systems.

In this paper, we provide a privacy-preserving medical recommendation (PPMR) algorithm based on privacy-preserving collaborative filtering. The proposed algorithm ensures that both treatment information and demographic information are considered and protected.

3 Proposed method

In this section, we propose a privacy-preserving medical recommendation (PPMR) algorithm to address the privacy-preserving issue in the medical recommendation process.

First, we present an overview of the algorithm, followed by a detailed discussion. Then we provide a theoretical analysis of how PPMR achieves differential privacy while retaining utility for recommendation.

3.1 The privacy-preserving medical recommendation algorithm

For the privacy-preserving issue in the context of neighborhood-based CF methods, the targets to be protected differ between user-based and item-based methods because the two define similarity from different perspectives. Traditional non-private user-based CF methods work as follows: the first stage collects users’ historical behaviors and basic information to identify each user’s k nearest neighbors, and the second stage predicts the rating by aggregating the ratings on the items that the identified neighbors have rated. We propose the PPMR algorithm to address this problem. Details of the first operation are presented in Section 3.2; the second operation and the theoretical analysis of privacy preservation are provided in Section 3.3.

3.2 Private neighbor selection

Private neighbor selection aims to privately select k neighbors from a list of candidates. Prior to any anonymization process, direct identifiers (name, ID number, etc.) must of course be suppressed from the dataset. However, some of the attributes that remain in the anonymized dataset may be quasi-identifiers, which may facilitate indirect re-identification of respondents through external data sources (available as attackers’ background knowledge) that combine those attributes with direct identifiers.

3.2.1 De-identifying patients’ health data

In EMR datasets, patients’ attributes are recorded in the patients’ treatment dataset and demographic information. In this paper, we use patients’ gender, age, admission date, diagnosis name and treatment outcome to measure similarity. We choose gender, age and admission date as the three quasi-identifiers, because these three attributes have been shown to lead to user re-identification.

Our method derives from a recent globally optimal k-anonymity method [Emam, Dankar, Issa et al. (2009)] called Optimal Lattice Anonymization (OLA). The advantage of OLA is that it results in less information loss and runs faster on medical datasets than other current de-identification algorithms.

A common way to satisfy the k-anonymity criterion is to generalize the values of the quasi-identifiers by reducing their precision. Examples of generalization hierarchies are shown in Fig. 1. The precision of a variable is reduced as one moves up the hierarchy.

Figure 1: Examples of value generalization hierarchies for three common quasi-identifiers: (a) Admission date (b) Gender (c) Age

The generalization hierarchies for the three quasi-identifiers in Fig. 1 can be represented as a lattice. Each node in the lattice represents a possible instance of the dataset. One of these nodes is the globally optimal solution, and the objective of a k-anonymity algorithm is to find it efficiently.
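The lattice described above can be sketched in a few lines of Python. This is only an illustration: the number of levels per quasi-identifier is a placeholder rather than a value read from Fig. 1, and `build_lattice` and `node_height` are hypothetical helper names.

```python
from itertools import product

# ASSUMPTION: number of generalization levels per quasi-identifier
# (level 0 = original values). Illustrative only, not taken from Fig. 1.
HIERARCHY_LEVELS = {"admission_date": 3, "gender": 2, "age": 5}


def build_lattice(levels):
    """Return every node of the generalization lattice.

    A node is a tuple holding one generalization level per
    quasi-identifier, in the order of the keys of `levels`.
    """
    ranges = [range(n) for n in levels.values()]
    return list(product(*ranges))


def node_height(node):
    """Height of a node: total number of generalization steps applied."""
    return sum(node)


lattice = build_lattice(HIERARCHY_LEVELS)
bottom = min(lattice, key=node_height)  # no generalization at all
top = max(lattice, key=node_height)     # every attribute fully generalized
```

Each tuple corresponds to one candidate version of the dataset; a k-anonymity search then only has to decide which of these nodes are k-anonymous and which of those loses the least information.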

The information loss metric taken into account is the suppressed rate, which is defined as (1):

All equivalence classes in the dataset that are smaller than k are suppressed. 85% of records were suppressed in the dataset represented by nodebecause these records were in small equivalence classes. As more generalization is applied, the extent of suppression goes down.

Suppression is preferable to generalization because the former affects single records whereas generalization affects all the records in the dataset. However, because missing data reduces the ability to perform meaningful data analysis, a limit on the amount of suppression needs to be imposed. We denote this limit as MaxSup. In this paper, we set MaxSup to 5%, and all nodes that satisfy the “suppressed rate < MaxSup” criterion are k-anonymous nodes.
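As a rough sketch of this criterion, the snippet below assumes that the suppressed rate is the fraction of records falling in equivalence classes smaller than k, which is how the surrounding text describes it. The function names and the toy records are hypothetical.

```python
from collections import Counter


def suppressed_rate(records, k):
    """Fraction of records that sit in equivalence classes smaller than k.

    `records` is a list of tuples of (already generalized) quasi-identifier
    values; records with identical tuples form one equivalence class.
    """
    if not records:
        return 0.0
    class_sizes = Counter(records)
    suppressed = sum(size for size in class_sizes.values() if size < k)
    return suppressed / len(records)


def is_k_anonymous_node(records, k, max_sup=0.05):
    """Apply the "suppressed rate < MaxSup" criterion (MaxSup = 5%)."""
    return suppressed_rate(records, k) < max_sup


# Toy data: (gender, age interval, admission year)
toy = [("male", "5-9", "2016")] * 3 + [("female", "0-4", "2017")]
rate = suppressed_rate(toy, k=2)
```

In the toy data one of the four records is alone in its equivalence class, so with k = 2 that record would be suppressed and the node fails the 5% limit.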

Once we have identified all the k-anonymous nodes, we need to select the one with the least information loss. The suppressed rate is not a good information loss metric for this purpose because it does not account for the depths of the quasi-identifiers’ generalization hierarchies. For example, generalizing “Male” to “Person” would be given the same weight as generalizing age from one-year to five-year intervals. In the former case, no information is left in the gender variable, whereas the age interval still conveys a considerable amount of information and three more generalizations remain possible in the age hierarchy.

Hence the gender information plays an important role for clinician to diagnose and prescribe, in this paper we choosethe gender variable. In this case the kanonymous nodes are:We maintain the list of the four anonymous nodes and select the nodewith the lowest height within their generalization strategies as the globally optimal solution.

3.2.2 Similarity measure for patients

In this section, we define the similarity measure between patients. As mentioned before, we use patients’ gender, age, admission date, diagnosis name and treatment outcome to measure similarity. A patient can be formalized as (2):

In Section 3.2.1, we generalized the three quasi-identifiers (Gender, Age and Admission Date) and selected the globally optimal solution. Thus the gender of a patient, PG, can be “male” or “female”; the age, PA, can be “0-4”, “5-9”, “10-14”, etc.; and the admission date can be “2017”, “2016”, “2015”, etc.

For ease of understanding, Tab. 1 presents a toy example of the quintuple P.

Table 1: A toy example of P

In order to define the similarity between different patients, we need a method that can compute the similarity between two such quintuples. Let Pi and Pj denote two randomly selected patients; the similarity between Pi and Pj is defined as follows:

First, the similarity between Pi and Pj is gated by the diagnosis names: if the diagnosis names of the two patients are the same, then gender, age and admission date are considered in a further step; otherwise, the similarity of Pi and Pj is set to 0. Therefore, the similarity of Pi and Pj contains a multiplicative term that equals 1 if the two diagnosis names are the same and 0 otherwise.

Secondly, gender is taken into account. The similarity between two PG values is described by a term that equals 1 if the two genders are the same and 0.5 otherwise.

Thirdly, age is considered. In this paper, PA is expressed as an interval, such as <5-9>. There are twenty intervals in all, from <0-4> to <95-99>. We number these intervals from 0 to 20. The similarity between two PA values is defined as (3),

Lastly, the admission date also has large impact on the similarity determination. In this paper, admission date is generalized into year and is from 2008 to 2017, so the similarity between twois defined by, which is 1 if, and equals 0.5 ifand equals 0.25 otherwise.

To sum up, the similarity between Pi and Pj is finally defined as (4),

The denominator 3 ensures that the similarity value falls in [0, 1].
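The structure of this similarity (a diagnosis gate multiplied by the average of three attribute terms) can be sketched as follows. The gender term and the division by 3 follow the text. The age term and the conditions for the admission-date levels are not recoverable from this copy, so the forms used below are explicit assumptions, and the `Patient` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    gender: str          # "male" or "female"
    age_index: int       # index of the five-year age interval, 0 = "0-4"
    admission_year: int  # generalized admission date, 2008..2017
    diagnosis: str
    outcome: str


def gender_sim(a, b):
    # From the text: 1 if the genders match, 0.5 otherwise.
    return 1.0 if a.gender == b.gender else 0.5


def age_sim(a, b, max_gap=20):
    # ASSUMPTION: linear decay with the distance between interval indices.
    # The paper's own age formula is not reproduced here.
    return max(0.0, 1.0 - abs(a.age_index - b.age_index) / max_gap)


def date_sim(a, b):
    # ASSUMPTION: the levels 1 / 0.5 / 0.25 come from the text, but the
    # conditions (same year / one year apart / otherwise) are a guess.
    gap = abs(a.admission_year - b.admission_year)
    if gap == 0:
        return 1.0
    return 0.5 if gap == 1 else 0.25


def similarity(a, b):
    # The diagnosis acts as a 0/1 gate; dividing the sum of the three
    # attribute terms by 3 keeps the result in [0, 1].
    if a.diagnosis != b.diagnosis:
        return 0.0
    return (gender_sim(a, b) + age_sim(a, b) + date_sim(a, b)) / 3.0
```

The treatment outcome is carried in the record but deliberately not used in the score, since the text applies it as a filter during neighbor selection rather than as a similarity term.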

3.2.3 Nearest neighbor selection

The goal of this section is to extract the most similar neighbors for the target patient. We are given an active patient pa and his/her candidate neighbor list P. First, we take into account the outcomes of the neighbors in P. To guarantee that the treatment is effective, only candidate neighbors with a “cured” or “improved” outcome are retained. Second, we compute the similarity between pa and every other positive-outcome patient, using the similarity formula defined in Section 3.2.2. This yields a similarity list consisting of the similarities between pa and the other m positive-outcome patients. Finally, we choose the k most similar neighbors from the candidate list, based on the similarity list, to form the KNN list.
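The three steps above (filter by outcome, score, keep the top k) can be sketched as below. The similarity function is passed in as a parameter so the sketch does not depend on any particular formula; `select_neighbors` and the dictionary layout of a patient are hypothetical.

```python
import heapq

POSITIVE_OUTCOMES = {"cured", "improved"}


def select_neighbors(target, candidates, k, sim):
    """Return the k positive-outcome candidates most similar to `target`.

    `candidates` is an iterable of dicts with at least an "outcome" key;
    `sim(a, b)` is any patient-similarity function returning a float.
    """
    # Step 1: keep only candidates whose treatment outcome was positive.
    positive = [c for c in candidates if c["outcome"] in POSITIVE_OUTCOMES]
    # Step 2: build the similarity list (the index keeps tuples comparable).
    scored = [(sim(target, c), i) for i, c in enumerate(positive)]
    # Step 3: keep the k most similar candidates, most similar first.
    top = heapq.nlargest(k, scored)
    return [positive[i] for _, i in top]
```

Using `heapq.nlargest` avoids sorting the full similarity list, which matters when k is in the hundreds and the candidate list is much larger.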

3.3 Differential privacy recommendation

In Section 3.2, we proposed an effective and secure similarity measure between patients to find the most similar neighbors. In this section, we propose a neighborhood-based differential privacy recommendation algorithm, with the aim of providing comprehensive privacy for the individual patient while maximizing the accuracy of recommendations.

3.3.1 Neighborhood-based recommend inference attack

3.3.2 User-based differential privacy recommendation algorithm

Differential privacy aims to maximize the accuracy of the output of the system while minimizing the chance of identifying the input to the system.

Differential privacy provides a bound on the ability to infer from any output S, whether the input to the computation wasBecause:

Definition 2. (Exponential Mechanism) Given a quality function q:, an input matrix M,is the sensitivity of the quality function. The exponential mechanismwith probability:

In Definition 1, the parameter ε refers to the privacy budget, which controls the level of privacy guarantee. A smaller ε represents a stronger privacy level.

Based on the above notions, we design a neighborhood-based Differential Privacy Recommendation algorithm. It consists of three steps:

· Given the target patient pa and his/her k nearest neighbors, we sample each patient’s medication administration record in the patient treatment dataset T and form a clinical medicine score function that counts the dosage of each drug mi used across the neighbor candidate list.

· Then we design a differential privacy algorithm by adding an exponential noise mechanism to the score function, which provides ε-differential privacy.

· We recommend the top n scoring medicines to patient pa according to the privacy-preserving medicine score function.
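The three steps above can be sketched as follows, under stated assumptions. The selection rule is the textbook exponential mechanism (probability proportional to exp(ε·q / (2Δq))), which may differ in detail from the paper's own formula; drawing the n medicines one at a time without replacement is also an assumption. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def medicine_scores(neighbor_records):
    """Sum the dosage of every medicine over the neighbors' records.

    `neighbor_records` is an iterable of (medicine, dosage) pairs.
    """
    scores = Counter()
    for medicine, dosage in neighbor_records:
        scores[medicine] += dosage
    return scores


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Pick one key with probability proportional to exp(eps*q / (2*sens))."""
    items = list(scores)
    best = max(scores.values())
    # Subtracting the maximum score avoids overflow and leaves the
    # normalized probabilities unchanged.
    weights = [math.exp(epsilon * (scores[m] - best) / (2 * sensitivity))
               for m in items]
    return rng.choices(items, weights=weights, k=1)[0]


def private_top_n(scores, n, epsilon, sensitivity, seed=None):
    """Draw up to n distinct medicines by repeated noisy selection.

    ASSUMPTION: each draw uses the full epsilon; splitting the budget
    across the n draws is left out of this sketch.
    """
    rng = random.Random(seed)
    remaining = dict(scores)
    picked = []
    while remaining and len(picked) < n:
        choice = exponential_mechanism(remaining, epsilon, sensitivity, rng)
        picked.append(choice)
        del remaining[choice]
    return picked
```

Because every draw consumes privacy budget, a careful implementation would account for composition over the n selections; the sketch leaves that accounting to the caller.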

4 Experiment

The details of the dataset and the experimental setup used to evaluate the proposed privacy-preserving medical recommendation algorithm are presented in the following subsections.

4.1 Real datasets

The dataset used in our experiments was collected from the Hospital Information System (HIS) of the First Affiliated Hospital of Xi’an Jiaotong University, which is the largest Grade Three Class A hospital in northwest China. The dataset contains 115,585,623 medical treatment records covering 3,944 medicines and 946,429 patients from 2008 to 2017. About 51% of the patients are male and the rest are female, and patient ages range from 0 to 104 years.

The 946,429 patients are divided such that 662,500 (70%) patients are part of training set and 283,929 (30%) patients are part of the testing set. The predictions by the PPMR are compared with the actual treatment labelled by the medical expert to check the accuracy of the results.

4.2 Accuracy measures

To measure the quality of recommendation, three performance metrics are used to evaluate our proposed algorithm: Recall, Precision and Mean Absolute Error (MAE) [Adomavicius and Tuzhilin (2005)]. These metrics are widely accepted for evaluating recommender systems. Recall is the ratio of the products in the recommended list that the user actually uses to all the products that the user actually uses. Precision is the ratio of the products in the recommended list that the user ends up using to all recommended products. MAE measures the deviation between the predicted and the real ratings; lower MAE values mean better predictions. The three metrics are shown in (10), (11) and (12).
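A minimal sketch of the three metrics as described in words above, using the usual set-based definitions of Recall and Precision and the usual mean of absolute differences for MAE. The function names are hypothetical, and the sketch is not a transcription of equations (10) to (12).

```python
def recall(recommended, used):
    """Share of the medicines actually used that were recommended."""
    used = set(used)
    if not used:
        return 0.0
    return len(set(recommended) & used) / len(used)


def precision(recommended, used):
    """Share of the recommended medicines that were actually used."""
    recommended = set(recommended)
    if not recommended:
        return 0.0
    return len(recommended & set(used)) / len(recommended)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and real ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

For instance, if three medicines are recommended and one of them is among the two the patient actually used, Recall is 1/2 and Precision is 1/3.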

4.3 Performance of PPMR

In this section, we examine the performance of PPMR from the perspective of preserving the privacy of patients’ information. Specifically, we use the traditional neighborhood-based CF as the non-private baseline. The top-score parameter n is set to 20 because most patients take about twenty kinds of medicine during their treatment. Moreover, the privacy budget ε controls the level of privacy, so it needs to be set to balance the privacy level against recommendation accuracy. The algorithm proposed in this paper can be used to recommend medicine for various diseases. To illustrate and test our algorithm, we focus on patients with coronary heart disease and pneumonia, which are among the most common diseases in China today.

4.3.1 Impact of parameter ε

Tab. 2 illustrates the effects of the privacy budget ε on coronary heart disease and pneumonia. From Tab. 2, we can see that when ε takes a larger value (for example ε=0.001), the probability of the best available medicine is amplified. On the other hand, when ε is small (for example ε=0.00001), the differences in usability between medicines are suppressed and the output probabilities of the medicines tend to be equal. In this paper, the privacy budget parameter ε is fixed to 0.0001 so that the PPMR algorithm satisfies 0.0001-differential privacy, which balances the privacy level and data accuracy.

Table 2: Effects of privacy budget ε on coronary heart disease and pneumonia
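The qualitative effect of the privacy budget can be reproduced with a small experiment. The sketch assumes the textbook exponential-mechanism probabilities (proportional to exp(ε·q / (2Δq))) and uses made-up dosage scores; it does not reproduce the values in Tab. 2.

```python
import math


def selection_probabilities(scores, epsilon, sensitivity=1.0):
    """Exponential-mechanism selection probability of each score."""
    best = max(scores)
    # Shifting by the maximum keeps exp() from overflowing.
    weights = [math.exp(epsilon * (s - best) / (2 * sensitivity))
               for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical dosage counts for three medicines.
scores = [3000.0, 2000.0, 1000.0]
for eps in (0.001, 0.0001, 0.00001):
    probs = selection_probabilities(scores, eps)
    print(eps, [round(p, 3) for p in probs])
```

With these made-up scores, the probability of the top-scoring medicine shrinks toward the uniform value of 1/3 as the privacy budget decreases, which is the trade-off between privacy and accuracy described above.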

4.3.2 Experiment results and analysis

Tab. 3 shows the Recall, Precision and MAE of recommendations for coronary heart disease and pneumonia patients under the two algorithms, i.e. the proposed PPMR algorithm and the traditional neighborhood-based CF, as k changes. Parameter k denotes the number of nearest neighbors and takes integer values from 100 to 1600. From Tab. 3, we can see that as k increases, Recall and Precision increase and MAE decreases at first; once k surpasses a certain threshold, Recall and Precision decrease and MAE increases with further increases in k. Precision, Recall and MAE achieve their best values when k is around 800, while smaller values such as k=400 or larger values such as k=1600 can degrade the performance.

In addition, we compared the recommendation quality of the PPMR algorithm and the traditional neighborhood-based CF in deriving the predicted ratings on the medicines. On both coronary heart disease and pneumonia, the performance of PPMR is very close to that of the non-private baseline, with no more than 5% accuracy loss. This indicates that PPMR can retain the accuracy of recommendation while providing comprehensive privacy for individuals.

Table 3: Comparison on EMRs datasets

5 Conclusion

Privacy preservation is one of the most essential aspects of collaborative filtering, as it protects the sensitive information of users in recommendation systems. In a clinical environment, the privacy-preserving problem is even more important because patients’ healthcare data are highly personal and sensitive.

This paper proposes an effective privacy preserving method for neighborhood-based collaborative filtering and makes the following contributions:

· A Private Neighbor Selection algorithm is provided to prevent patients’ healthcare information from being attacked. In addition, a new k-anonymity de-identification method is adopted to produce a globally optimal de-identification solution suitable for EMR datasets.

· A novel Neighborhood-based Differential Privacy Recommendation Algorithm is proposed to provide privacy protection for patients and maximize the accuracy of recommendation at the same time.

· A security analysis and a performance evaluation are carried out. Experimental results show the effectiveness and robustness of the proposed PPMR algorithm on various metrics.

Most notably, to the best of our knowledge, this is the first study to investigate privacy-preserving collaborative filtering in medical recommendation. The experiments show that our algorithm keeps recommendation accuracy close to that of the non-private baseline. However, the current study concentrates only on the privacy of neighborhood-based CF. Other recommendation techniques, such as Matrix Factorization, still suffer from privacy problems. Therefore, future work should consider the privacy issue for other recommendation techniques.

Acknowledgement: We thank our reviewers and editors for their valuable comments. This work is supported by the Fundamental Research Funds of the First Affiliated Hospital of Xi’an Jiaotong University (No. 2017RKX-06).

