Manos Katsomallos 2019-03-05 19:56:18 +01:00
parent 67e975abbe
commit ee2ecd734d
2 changed files with 6 additions and 6 deletions


@@ -115,7 +115,7 @@ There are three levels of protection that the data publisher can consider: \emph
An \emph{event} is a (user, sensitive value) pair, e.g.,~the user $a$ is at location $l$.
\emph{User-}, and \emph{event-} privacy~\cite{dwork2010differential} are the main privacy levels; the former guarantees that all the events of any user \emph{for all timestamps} are protected, while the latter ensures that any single event \emph{at a specific timestamp} is protected.
Moreover, \emph{w-event}~\cite{kellaris2014differentially} attempts to bridge the gap between event and user level privacy in streaming settings, by protecting any event sequence of any user within a window of $w$ timestamps.
-$w-$event is narrower than user level privacy, since it does not hide multiple event sequences from the same user, but when $w$ is set to infinity, $w$-event and user level notions converge.
+$w$-event is narrower than user level privacy, since it does not hide multiple event sequences from the same user, but when $w$ is set to infinity, $w$-event and user level notions converge.
Note that the described levels have been coined in the context of \emph{differential privacy}~\cite{dwork2006calibrating}, nevertheless, they may apply at other privacy protection techniques as well.
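For reference, the window-based notion touched by this hunk is usually formalized as follows; this is a minimal LaTeX sketch paraphrasing Kellaris et al.~\cite{kellaris2014differentially}, and the symbols $\mathcal{M}$, $S_t$, $S_t'$, $O$ are our own shorthand rather than notation from the changed files. A mechanism $\mathcal{M}$ satisfies $w$-event $\varepsilon$-differential privacy if, for every pair of stream prefixes $S_t$ and $S_t'$ that differ only in the events of a single user within some window of at most $w$ consecutive timestamps, and for every set of possible outputs $O$,
\[
  \Pr[\mathcal{M}(S_t) \in O] \le e^{\varepsilon} \, \Pr[\mathcal{M}(S_t') \in O] .
\]
Setting $w = 1$ recovers event-level privacy, while letting $w \to \infty$ recovers user-level privacy, which matches the convergence remark in the paragraph above.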
@@ -159,7 +159,7 @@ A data set is said to have $\theta$-closeness when all of its groups have $\thet
\subsubsection{Statistical data}
-While methods based on $k-$anonymity have been mainly employed when releasing microdata, \emph{differential privacy}~\cite{dwork2006calibrating} has been proposed for `privately' releasing high utility aggregates over microdata.
+While methods based on $k$-anonymity have been mainly employed when releasing microdata, \emph{differential privacy}~\cite{dwork2006calibrating} has been proposed for `privately' releasing high utility aggregates over microdata.
More precisely, differential privacy ensures that the removal or addition of a single data item, i.e.,~the record of an individual, in a released data set does not (substantially) affect the outcome of any analysis.
It is a statistical property of the privacy mechanism and is irrelevant to the computational power and auxiliary information available to the adversary.
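Since this hunk introduces differential privacy, a compact sketch of the guarantee may help; the notation $\mathcal{A}$, $D$, $D'$, $O$ is ours and follows the standard formulation of~\cite{dwork2006calibrating}, not the changed files. A randomized mechanism $\mathcal{A}$ is $\varepsilon$-differentially private if, for every pair of data sets $D$ and $D'$ differing in the record of a single individual, and for every set of outputs $O$,
\[
  \Pr[\mathcal{A}(D) \in O] \le e^{\varepsilon} \, \Pr[\mathcal{A}(D') \in O] .
\]
The bound holds regardless of the adversary's computational power or auxiliary knowledge, which is the `statistical property' mentioned above.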
@@ -292,7 +292,7 @@ Its content varies according to the service's functionality and is transmitted/r
\label{fig:scenario-micro}
\end{figure}
-First, we anonymize the data set of Figure~\ref{fig:scenario} using $k-$anonymity, with $k=3$.
+First, we anonymize the data set of Figure~\ref{fig:scenario} using $k$-anonymity, with $k=3$.
This means that any user should not be distinguished from at least 2 others.
We start by suppressing the values of the Name attribute, which is the identifier.
The Age and Location attributes are the quasi-identifiers, so we proceed to adequately generalize them.
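To make the $k=3$ step concrete, here is a hypothetical 3-anonymous equivalence class written as a LaTeX tabular; the values are invented for illustration and are not taken from Figure~\ref{fig:scenario}.
\begin{tabular}{lll}
  Name & Age       & Location \\ \hline
  *    & $[20,30)$ & Paris    \\
  *    & $[20,30)$ & Paris    \\
  *    & $[20,30)$ & Paris    \\
\end{tabular}
The Name identifier is suppressed (*), while Age and Location are generalized until at least three records share the same quasi-identifier values, so no user can be distinguished from at least 2 others.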


@@ -1,7 +1,7 @@
\section{Microdata}
\label{sec:microdata}
-As observed in Table~\ref{tab:related}, privacy preserving algorithms for microdata rely on $k-$anonymity, or derivatives of it. Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to \emph{composition attacks}. Consequently, these attacks drew the attention of researchers, who proposed various algorithms based on $k-$anonymity, each introducing a different dimension on the problem, for instance that previous releases are known to the publisher, or that the quasi-identifiers can be formed by combining attributes in different releases. Note, however, that only one (Li et al.~\cite{li2016hybrid}) of the following works assumes \emph{independently} anonymized data sets that may not be known to the publisher in the attack model, making it more general than the rest of the works.
+As observed in Table~\ref{tab:related}, privacy preserving algorithms for microdata rely on $k$-anonymity, or derivatives of it. Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to \emph{composition attacks}. Consequently, these attacks drew the attention of researchers, who proposed various algorithms based on $k$-anonymity, each introducing a different dimension on the problem, for instance that previous releases are known to the publisher, or that the quasi-identifiers can be formed by combining attributes in different releases. Note, however, that only one (Li et al.~\cite{li2016hybrid}) of the following works assumes \emph{independently} anonymized data sets that may not be known to the publisher in the attack model, making it more general than the rest of the works.
% \subsection{Continual data}
@@ -28,7 +28,7 @@ In the same update setting (insert/delete), \hypertarget{he2011preventing}{He et
% Continuous privacy preserving publishing of data streams
-\hypertarget{zhou2009continuous}{Zhou et al.}~\cite{zhou2009continuous} introduce the problem of continuous private data publication in \emph{streams}, and propose a randomized solution based on $k-$anonymity. In their definition, they state that a private stream consists in publishing equivalence classes of size larger than or equal to $k$ containing generalized tuples from distinct persons (or identifiers in general). To create the equivalence classes they set several desiderata. Except for the size of a class, which should be larger or equal to $k$, the information loss occurred by the generalization should be low, whereas the delay in forming and publishing the class should be low as well. To achieve these they built a randomized model using the popular structure of $R-$trees, extended to accommodate data density distribution information. In this way, they achieve a better quality for the released private data: On the one hand, formed classes contain data items that are close to each other (in dense areas), while on the other hand classes with tuples of sparse areas are released as soon as possible so that the delay will remain low. This work has a special focus on publishing good quality private data. Still, it does not consider attacks where background knowledge exists, nor does it measure the privacy level achieved (other than requiring the size of the released class to be larger or equal to $k$ as in $k-$anonymity), as $\varepsilon$-differential privacy.
+\hypertarget{zhou2009continuous}{Zhou et al.}~\cite{zhou2009continuous} introduce the problem of continuous private data publication in \emph{streams}, and propose a randomized solution based on $k$-anonymity. In their definition, they state that a private stream consists in publishing equivalence classes of size larger than or equal to $k$ containing generalized tuples from distinct persons (or identifiers in general). To create the equivalence classes they set several desiderata. Except for the size of a class, which should be larger or equal to $k$, the information loss occurred by the generalization should be low, whereas the delay in forming and publishing the class should be low as well. To achieve these they built a randomized model using the popular structure of $R-$trees, extended to accommodate data density distribution information. In this way, they achieve a better quality for the released private data: On the one hand, formed classes contain data items that are close to each other (in dense areas), while on the other hand classes with tuples of sparse areas are released as soon as possible so that the delay will remain low. This work has a special focus on publishing good quality private data. Still, it does not consider attacks where background knowledge exists, nor does it measure the privacy level achieved (other than requiring the size of the released class to be larger or equal to $k$ as in $k$-anonymity), as $\varepsilon$-differential privacy.
% Maskit: Privately releasing user context streams for personalized mobile applications
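The desiderata described for~\cite{zhou2009continuous} in the hunk above can be summarized with a short LaTeX sketch; the symbols $C$, $\mathit{IL}(\cdot)$ and $\mathit{delay}(\cdot)$ are our own shorthand, not notation from the paper or the changed files. For every released equivalence class $C$ of generalized tuples from distinct identifiers:
\begin{itemize}
  \item $|C| \ge k$, as in $k$-anonymity;
  \item the information loss $\mathit{IL}(C)$ incurred by generalization should be low;
  \item the delay $\mathit{delay}(C)$ between receiving the tuples and publishing $C$ should be low.
\end{itemize}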
@@ -65,7 +65,7 @@ In the same update setting (insert/delete), \hypertarget{he2011preventing}{He et
% Anonymity for continuous data publishing
-\hypertarget{fung2008anonymity}{Fung et al.}~\cite{fung2008anonymity} introduce the problem of privately releasing continuous \emph{incremental} data sets. The invariant of this kind of releases is that in every timestamp $T_i$, the records previously released in a timestamp $T_j$, where $j<i$, are released again together with a set of new records. The authors first focus in two consecutive releases and describe three classes of possible attacks. They name these attacks \emph{correspondence} attacks because they rely on the principle that all tuples from data set $D1$ correspond to a tuple in the subsequent data set $D2$. Naturally, the opposite does not hold, as tuples with a timestamp $T_2$ do not exist in $D1$. Assuming that the attacker knows the quasi-identifiers and the timestamp of the record of a person, they define the \emph{backward}, \emph{cross} and \emph{forward} (\emph{BCF}) attacks. They show that combining two individually $k-$anonymized subsequent releases using one of the aforementioned attacks can lead to `cracking' some of the records in the set of $k$ candidate tuples rendering the privacy level lower than $k$. Except for the detection of cases of compromising $BCF$ anonymity between two releases, the authors also provide an anonymization algorithm for a release $R2$ in the presence of a private release $R1$. The algorithm starts from the most possible generalized state for the quasi-identifiers of the records in $D2$. Step by step, it checks which combinations of specializations on the attributes do not violate the $BCF$ anonymity and outputs the most possible specialized version of the data set. The authors discuss how the framework extends to multiple releases and to different kinds of privacy methods (other than $k-$anonymization). It is worth noting that in order to maintain a certain quality for a release, it is essential that the delta among subsequent releases is large enough; otherwise the needed generalization level may destroy the utility of the data set.
+\hypertarget{fung2008anonymity}{Fung et al.}~\cite{fung2008anonymity} introduce the problem of privately releasing continuous \emph{incremental} data sets. The invariant of this kind of releases is that in every timestamp $T_i$, the records previously released in a timestamp $T_j$, where $j<i$, are released again together with a set of new records. The authors first focus in two consecutive releases and describe three classes of possible attacks. They name these attacks \emph{correspondence} attacks because they rely on the principle that all tuples from data set $D1$ correspond to a tuple in the subsequent data set $D2$. Naturally, the opposite does not hold, as tuples with a timestamp $T_2$ do not exist in $D1$. Assuming that the attacker knows the quasi-identifiers and the timestamp of the record of a person, they define the \emph{backward}, \emph{cross} and \emph{forward} (\emph{BCF}) attacks. They show that combining two individually $k$-anonymized subsequent releases using one of the aforementioned attacks can lead to `cracking' some of the records in the set of $k$ candidate tuples rendering the privacy level lower than $k$. Except for the detection of cases of compromising $BCF$ anonymity between two releases, the authors also provide an anonymization algorithm for a release $R2$ in the presence of a private release $R1$. The algorithm starts from the most possible generalized state for the quasi-identifiers of the records in $D2$. Step by step, it checks which combinations of specializations on the attributes do not violate the $BCF$ anonymity and outputs the most possible specialized version of the data set. The authors discuss how the framework extends to multiple releases and to different kinds of privacy methods (other than $k$-anonymization). It is worth noting that in order to maintain a certain quality for a release, it is essential that the delta among subsequent releases is large enough; otherwise the needed generalization level may destroy the utility of the data set.
% Protecting Locations with Differential Privacy under Temporal Correlations
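A brief LaTeX sketch of the correspondence principle behind the $BCF$ attacks of~\cite{fung2008anonymity} described above; the sets $\mathit{cand}_1(p)$ and $\mathit{cand}_2(p)$ are our own shorthand, not notation from the paper or the changed files. Because the releases are incremental, every tuple of $D1$ corresponds to a tuple of $D2$, while tuples added at $T_2$ have no counterpart in $D1$. Roughly, an attacker who knows the quasi-identifier and timestamp of a person $p$ can intersect the candidate sets of the two individually $k$-anonymized releases,
\[
  \mathit{cand}(p) = \mathit{cand}_1(p) \cap \mathit{cand}_2(p) ,
\]
and whenever $|\mathit{cand}(p)| < k$ the effective privacy level drops below $k$, which is what the backward, cross and forward attacks exploit.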