diff --git a/background.tex b/background.tex
index 633f9c2..836f88b 100644
--- a/background.tex
+++ b/background.tex
@@ -115,7 +115,7 @@ There are three levels of protection that the data publisher can consider: \emph
 An \emph{event} is a (user, sensitive value) pair, e.g.,~the user $a$ is at location $l$.
 \emph{User-} and \emph{event-}privacy~\cite{dwork2010differential} are the main privacy levels; the former guarantees that all the events of any user \emph{for all timestamps} are protected, while the latter ensures that any single event \emph{at a specific timestamp} is protected.
 Moreover, \emph{$w$-event} privacy~\cite{kellaris2014differentially} attempts to bridge the gap between event- and user-level privacy in streaming settings, by protecting any event sequence of any user within a window of $w$ timestamps.
-$w-$event is narrower than user level privacy, since it does not hide multiple event sequences from the same user, but when $w$ is set to infinity, $w$-event and user level notions converge.
+$w$-event privacy is narrower than user-level privacy, since it does not hide multiple event sequences of the same user, but when $w$ is set to infinity, the $w$-event and user-level notions converge.
 Note that the described levels have been coined in the context of \emph{differential privacy}~\cite{dwork2006calibrating}; nevertheless, they may apply to other privacy protection techniques as well.
@@ -159,7 +159,7 @@ A data set is said to have $\theta$-closeness when all of its groups have $\thet
 \subsubsection{Statistical data}
 
-While methods based on $k-$anonymity have been mainly employed when releasing microdata, \emph{differential privacy}~\cite{dwork2006calibrating} has been proposed for `privately' releasing high utility aggregates over microdata.
+While methods based on $k$-anonymity have been mainly employed when releasing microdata, \emph{differential privacy}~\cite{dwork2006calibrating} has been proposed for `privately' releasing high-utility aggregates over microdata.
 More precisely, differential privacy ensures that the removal or addition of a single data item, i.e.,~the record of an individual, in a released data set does not (substantially) affect the outcome of any analysis.
 It is a statistical property of the privacy mechanism, independent of the computational power and auxiliary information available to the adversary.
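+Formally, following~\cite{dwork2006calibrating}, a randomized mechanism $\mathcal{M}$ satisfies $\varepsilon$-differential privacy if, for every pair of data sets $D$ and $D'$ differing in a single record, and for every set $S$ of possible outputs, it holds that:
+\[
+  \Pr[\mathcal{M}(D) \in S] \leq e^{\varepsilon} \Pr[\mathcal{M}(D') \in S].
+\]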
@@ -292,7 +292,7 @@ Its content varies according to the service's functionality and is transmitted/r
 \label{fig:scenario-micro}
 \end{figure}
 
-First, we anonymize the data set of Figure~\ref{fig:scenario} using $k-$anonymity, with $k=3$.
+First, we anonymize the data set of Figure~\ref{fig:scenario} using $k$-anonymity, with $k=3$.
 This means that every user must be indistinguishable from at least two others.
 We start by suppressing the values of the Name attribute, which is the identifier.
 The Age and Location attributes are the quasi-identifiers, so we proceed to generalize them adequately.
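+Recall that a data set satisfies $k$-anonymity when every record is indistinguishable, with respect to the quasi-identifiers, from at least $k-1$ other records; that is, for every record $r$ in the released data set $D^*$:
+\[
+  |\{r' \in D^* : r'[QI] = r[QI]\}| \geq k,
+\]
+where $r[QI]$ denotes the projection of $r$ on the quasi-identifier attributes.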
diff --git a/microdata.tex b/microdata.tex
index 1f5436b..2a8b8c1 100644
--- a/microdata.tex
+++ b/microdata.tex
@@ -1,7 +1,7 @@
 \section{Microdata}
 \label{sec:microdata}
 
-As observed in Table~\ref{tab:related}, privacy preserving algorithms for microdata rely on $k-$anonymity, or derivatives of it. Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to \emph{composition attacks}. Consequently, these attacks drew the attention of researchers, who proposed various algorithms based on $k-$anonymity, each introducing a different dimension on the problem, for instance that previous releases are known to the publisher, or that the quasi-identifiers can be formed by combining attributes in different releases. Note, however, that only one (Li et al.~\cite{li2016hybrid}) of the following works assumes \emph{independently} anonymized data sets that may not be known to the publisher in the attack model, making it more general than the rest of the works.
+As observed in Table~\ref{tab:related}, privacy-preserving algorithms for microdata rely on $k$-anonymity or derivatives of it. Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to \emph{composition attacks}. Consequently, these attacks drew the attention of researchers, who proposed various algorithms based on $k$-anonymity, each introducing a different dimension to the problem, for instance that previous releases are known to the publisher, or that the quasi-identifiers can be formed by combining attributes across different releases. Note, however, that only one of the following works (Li et al.~\cite{li2016hybrid}) assumes \emph{independently} anonymized data sets that may not be known to the publisher in the attack model, making it more general than the rest.
 %
 \subsection{Continual data}
@@ -28,7 +28,7 @@ In the same update setting (insert/delete), \hypertarget{he2011preventing}{He et
 
 % Continuous privacy preserving publishing of data streams
-\hypertarget{zhou2009continuous}{Zhou et al.}~\cite{zhou2009continuous} introduce the problem of continuous private data publication in \emph{streams}, and propose a randomized solution based on $k-$anonymity. In their definition, they state that a private stream consists in publishing equivalence classes of size larger than or equal to $k$ containing generalized tuples from distinct persons (or identifiers in general). To create the equivalence classes they set several desiderata. Except for the size of a class, which should be larger or equal to $k$, the information loss occurred by the generalization should be low, whereas the delay in forming and publishing the class should be low as well. To achieve these they built a randomized model using the popular structure of $R-$trees, extended to accommodate data density distribution information. In this way, they achieve a better quality for the released private data: On the one hand, formed classes contain data items that are close to each other (in dense areas), while on the other hand classes with tuples of sparse areas are released as soon as possible so that the delay will remain low. This work has a special focus on publishing good quality private data. Still, it does not consider attacks where background knowledge exists, nor does it measure the privacy level achieved (other than requiring the size of the released class to be larger or equal to $k$ as in $k-$anonymity), as $\varepsilon$-differential privacy.
+\hypertarget{zhou2009continuous}{Zhou et al.}~\cite{zhou2009continuous} introduce the problem of continuous private data publication in \emph{streams}, and propose a randomized solution based on $k$-anonymity. In their definition, publishing a private stream consists of releasing equivalence classes of size at least $k$ that contain generalized tuples from distinct persons (or identifiers in general). To create the equivalence classes, they set several desiderata: besides the size of a class, which should be at least $k$, the information loss incurred by the generalization should be low, and the delay in forming and publishing a class should be low as well. To achieve these goals, they build a randomized model on top of the popular structure of $R$-trees, extended to accommodate information about the data density distribution. In this way, they achieve better quality for the released private data: on the one hand, formed classes contain data items that are close to each other (in dense areas), while on the other hand, classes with tuples from sparse areas are released as soon as possible so that the delay remains low. This work has a special focus on publishing private data of good quality. Still, it does not consider attacks where background knowledge exists, nor does it measure the privacy level achieved (other than requiring the size of a released class to be at least $k$, as in $k$-anonymity) in the way that $\varepsilon$-differential privacy does.
 
 % Maskit: Privately releasing user context streams for personalized mobile applications
@@ -65,7 +65,7 @@ In the same update setting (insert/delete), \hypertarget{he2011preventing}{He et
 
 % Anonymity for continuous data publishing
-\hypertarget{fung2008anonymity}{Fung et al.}~\cite{fung2008anonymity} introduce the problem of privately releasing continuous \emph{incremental} data sets. The invariant of this kind of releases is that in every timestamp $T_i$, the records previously released in a timestamp $T_j$, where $j