|
|
|
|
|
|
|
|
\subsection{Levels of privacy protection}
|
|
|
|
|
\label{subsec:prv-levels}
|
|
|
|
|
|
|
|
|
|
In continuous data publishing we consider the privacy protection level with respect to not only the users, but also to the \emph{events} occurring in the data.
|
|
|
|
|
An event is a pair of an identifying attribute of an individual and the sensitive data (including contextual information) and we can see it as a correspondence to a record in a database, where each individual may participate once.
|
|
|
|
|
Data publishers typically release events in the form of sequences of data items, usually indexed in time order (time series) and geotagged, e.g.,~(`Dewey', `at home at Montmartre at $t_1$'), \dots, (`Quackmore', `dining at Opera at $t_1$').
|
|
|
|
|
We use the term `users' to refer to the \emph{individuals}, also known as \emph{participants}, who are the source of the processed and published data.
|
|
|
|
Users are subject to privacy attacks, and thus are the main point of interest of privacy-preserving algorithms.
|
|
|
|
|
In more detail, the privacy protection levels are:
|
|
|
|
|
|
|
|
|
|
\begin{enumerate}[(a)]
\item \emph{Event}~\cite{dwork2010differential, dwork2010pan}---limits the privacy protection to \emph{any single event} in a time series, providing high data utility.
\item \emph{$w$-event}~\cite{kellaris2014differentially}---provides privacy protection to \emph{any sequence of $w$ events} in a time series.
\item \emph{User}~\cite{dwork2010differential, dwork2010pan}---protects \emph{all the events} in a time series, providing high privacy protection.
\end{enumerate}
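To make the scope of each level concrete, the following sketch (with hypothetical timestamps for a single user) enumerates which sets of events each level protects jointly:

```python
# A hypothetical series of one user's events, indexed by timestamp.
timestamps = ["t1", "t2", "t3", "t4", "t5"]

# Event-level: each single event is protected independently.
event_level = [[t] for t in timestamps]

# w-event-level (here w = 2): every window of w consecutive events
# is protected as a whole.
w = 2
w_event_level = [timestamps[i:i + w] for i in range(len(timestamps) - w + 1)]

# User-level: the user's entire series is protected as a whole.
user_level = [timestamps]

print(event_level[0])    # ['t1']
print(w_event_level[0])  # ['t1', 't2']
print(len(user_level))   # 1
```

Intuitively, the larger the protected unit, the stronger the privacy guarantee and the lower the utility of the release.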
|
|
|
|
|
|
|
|
|
|
Figure~\ref{fig:prv-levels} demonstrates the application of the possible protection levels on the statistical data of Example~\ref{ex:continuous}.
|
|
|
|
For instance, in event-level (Figure~\ref{fig:level-event}) it is hard to determine whether Quackmore was included in the released series of events at a specific timestamp.
|
|
|
|
|
Moreover, in user-level (Figure~\ref{fig:level-user}) it is hard to determine whether Quackmore was ever included in the released series of events at all.
|
|
|
|
|
Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to determine whether Quackmore was ever included in the released series of events between the timestamps $t_1$ and $t_2$, $t_2$ and $t_3$, etc. (i.e.,~for a window $w = 2$).
|
|
|
|
|
Note that, even in the original counts, Quackmore's presence is not evident to an observer without background knowledge; the protection levels determine what remains protected against adversaries that do possess such knowledge.
|
|
|
|
|
|
|
|
|
|
\begin{figure}[htp]
|
|
|
|
|
\centering
|
|
|
|
\subcaptionbox{Event-level\label{fig:level-event}}{%
\includegraphics[width=.32\linewidth]{level-event}%
}\hspace{\fill}
\subcaptionbox{User-level\label{fig:level-user}}{%
\includegraphics[width=.32\linewidth]{level-user}%
}\hspace{\fill}
|
|
|
|
|
\subcaptionbox{$2$-event-level\label{fig:level-w-event}}{%
|
|
|
|
|
\includegraphics[width=.32\linewidth]{level-w-event}%
|
|
|
|
|
}\hspace{\fill}
|
|
|
|
|
\caption{Protecting the data of Table~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly.}
|
|
|
|
|
\label{fig:prv-levels}
|
|
|
|
|
\end{figure}
|
|
|
|
|
|
|
|
|
Although the described levels have been coined in the context of \emph{differential privacy}, they are applicable to other privacy protection techniques as well.
|
|
|
|
|
\subsection{Privacy-preserving operations}
|
|
|
|
|
\label{subsec:prv-operations}
|
|
|
|
|
|
|
|
|
|
Protecting private information is achieved by using one of several basic privacy protection operations.
Depending on the technique that we choose to perform on the original data, we identify the following operations:
|
|
|
|
|
|
|
|
|
|
\begin{itemize}
|
|
|
|
|
\item \emph{Aggregation}---combine multiple rows of a data set to form a single value that will replace these rows.
|
|
|
|
|
\item \emph{Generalization}---replace an attribute value with a parent value in the attribute taxonomy (when applicable).
|
|
|
|
|
|
|
|
|
|
\item \emph{Suppression}---delete completely certain sensitive values or entire records.
|
|
|
|
|
\item \emph{Perturbation}---disturb the initial attribute value in a deterministic or probabilistic way.
|
|
|
|
|
The probabilistic data distortion is referred to as \emph{randomization}.
|
|
|
|
\end{itemize}

For instance, if we want to protect the \emph{Age} of a user by aggregation, we may replace it with a value computed over multiple records, e.g.,~their average.
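A toy illustration of the four operations (the records and the location taxonomy below are hypothetical):

```python
import random

# Hypothetical records; names, ages, and locations are illustrative only.
records = [
    {"name": "Dewey",     "age": 25, "location": "Montmartre"},
    {"name": "Quackmore", "age": 52, "location": "Opera"},
    {"name": "Donald",    "age": 34, "location": "Montmartre"},
]

# Aggregation: combine the rows into a single value that replaces them.
avg_age = sum(r["age"] for r in records) / len(records)

# Generalization: replace each location with its parent in a taxonomy.
taxonomy = {"Montmartre": "Paris", "Opera": "Paris"}
generalized = [{**r, "location": taxonomy[r["location"]]} for r in records]

# Suppression: delete the identifying attribute entirely.
suppressed = [{k: v for k, v in r.items() if k != "name"} for r in records]

# Perturbation: distort each age probabilistically (randomization).
perturbed = [r["age"] + random.gauss(0, 2.0) for r in records]

print(avg_age)  # 37.0
```

Each operation trades utility for privacy at a different point: aggregation and suppression discard rows or attributes outright, whereas generalization and perturbation retain a coarsened or noisy version of every value.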
|
|
|
|
|
|
|
|
|
|
It is worth mentioning that there is a series of algorithms (e.g.,~\cite{benaloh2009patient, kamara2010cryptographic, cao2014privacy}) based on the \emph{cryptography} operation.
|
|
|
|
|
However, the majority of these methods, among other assumptions that they make, have minimum or even no trust to the entities that handle the personal information.
|
|
|
|
|
Furthermore, the amount and the way of data processing of these techniques usually burden the overall procedure, deteriorate the utility of the resulting data sets to the point of rendering them unusable, and restrict their applicability.
|
|
|
|
|
Our focus is limited to techniques that achieve a satisfying balance between both participants' privacy and data utility.
|
|
|
|
|
Hence, we do not discuss cryptography-based techniques further in this thesis.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsection{Basic notions for privacy protection}
|
|
|
|
|
\label{subsec:prv-seminal}
|
|
|
|
|
|
|
|
|
|
For completeness, in this section we present the seminal works for privacy-preserving data publishing which, even though originally designed for the snapshot publishing scenario, have paved the way for privacy-preserving continuous publishing, since many works in the latter are based on or extend them.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsubsection{Microdata}
|
|
|
|
|
|
|
|
|
Sweeney coined \emph{$k$-anonymity}~\cite{sweeney2002k}, one of the first established works on data privacy.
|
|
|
|
|
A released data set features $k$-anonymity protection when the sequence of values for a set of identifying attributes, called the \emph{quasi-identifiers}, is the same for at least $k$ records in the data set.
|
|
|
|
|
Computing the quasi-identifiers in a set of attributes is still a hard problem on its own~\cite{motwani2007efficient}.
|
|
|
|
|
|
|
|
|
|
In a follow-up work~\cite{sweeney2002achieving}, the author describes a way to achieve $k$-anonymity for a data set by the suppression or generalization of certain values of the quasi-identifiers.
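A minimal sketch of checking the $k$-anonymity property on a table whose quasi-identifier values have already been generalized (the records, attribute names, and value ranges below are hypothetical):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    # Every combination of quasi-identifier values must appear
    # in at least k records for the table to be k-anonymous.
    combos = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())

# Ages generalized to ranges and ZIP codes truncated, as generalization
# of the quasi-identifiers would produce.
table = [
    {"age": "20-30", "zip": "750**", "disease": "flu"},
    {"age": "20-30", "zip": "750**", "disease": "cold"},
    {"age": "50-60", "zip": "750**", "disease": "flu"},
    {"age": "50-60", "zip": "750**", "disease": "flu"},
]

print(is_k_anonymous(table, ["age", "zip"], 2))  # True
print(is_k_anonymous(table, ["age", "zip"], 3))  # False
```

The check itself is easy; as noted above, the hard parts are identifying the quasi-identifiers in the first place and choosing generalizations that preserve utility.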
|
|
|
|
|
|
|
|
|
|
Several works identified and addressed privacy concerns on $k$-anonymity. Machanavajjhala et al.~\cite{machanavajjhala2006diversity} pointed out that $k$-anonymity is vulnerable to homogeneity and background knowledge attacks.
|
|
|
|
A data set features $\theta$-closeness when all of its groups satisfy $\theta$-closeness.
|
|
|
|
|
The main drawback of $k$-anonymity (and its derivatives) is that it is not tolerant to external attacks of re-identification on the released data set.
|
|
|
|
|
The problems identified in~\cite{sweeney2002k} appear when attempting to apply $k$-anonymity on continuous data publishing (as we will also see next in Section~\ref{sec:micro}).
|
|
|
|
|
These attacks include multiple $k$-anonymous data set releases with the same record order, subsequent releases of a data set without taking into account previous $k$-anonymous releases, and tuple updates.
|
|
|
|
|
Proposed solutions include rearranging the attributes, setting the whole attribute set of previously released data sets as quasi-identifiers, or releasing data based on previous $k$-anonymous releases~\cite{simi2017extensive}.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsubsection{Statistical data}
|
|
|
|
|
\label{subsec:prv-statistical}
|
|
|
|
|
|
|
|
|
|
While methods based on $k$-anonymity have been mainly employed for releasing microdata, \emph{differential privacy}~\cite{dwork2006calibrating} has been proposed for releasing high-utility aggregates over microdata while providing semantic privacy guarantees that characterize the output data.
Differential privacy is algorithmic: it characterizes the data publishing process, which passes its privacy guarantee on to the resulting data.
It ensures that any adversary observing a privacy-protected output, no matter their computational power or auxiliary information, cannot conclude with absolute certainty whether an individual is included in the input data set.
Moreover, it quantifies and bounds the impact that the addition/removal of an individual to/from a data set has on the derived privacy-protected aggregates thereof.
|
|
|
|
|
More precisely, differential privacy quantifies the impact that the addition/removal of a single tuple in a data set $D$, drawn from the set of possible data sets $\mathcal{D}$, has on the output $\pmb{o}$ of a privacy mechanism $\mathcal{M}$.
|
|
|
|
|
The distribution of all $\pmb{o}$, in some range $\mathcal{O}$, is not affected \emph{substantially}, i.e.,~it changes only slightly due to the modification of any one tuple in all possible $D \in \mathcal{D}$.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\begin{definition}
|
|
|
|
|
[Neighboring data sets]
|
|
|
|
|
|
|
|
|
Two data sets are neighboring (or adjacent) when they differ by at most one tuple, i.e.,~one can be obtained by adding/removing the data of an individual to/from the other.
|
|
|
|
|
\end{definition}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\begin{definition}
|
|
|
|
|
[Differential privacy]
|
|
|
|
A privacy mechanism $\mathcal{M}$, with range $\mathcal{O}$, satisfies $\varepsilon$-differential privacy, for a privacy budget $\varepsilon > 0$, if for all pairs of neighboring data sets $D, D' \in \mathcal{D}$ and all $O \subseteq \mathcal{O}$ it holds that:
|
|
|
|
|
$$\Pr[\mathcal{M}(D) \in O] \leq e^\varepsilon \Pr[\mathcal{M}(D') \in O]$$
|
|
|
|
|
\end{definition}
|
|
|
|
|
|
|
|
|
|
\noindent $\Pr[\cdot]$ denotes the probability of $\mathcal{M}$ generating an output in $O \subseteq \mathcal{O}$ when given $D$ as input.
|
|
|
|
|
The \emph{privacy budget} $\varepsilon$ is a positive real number that represents the user-defined privacy goal~\cite{mcsherry2009privacy}.
|
|
|
|
|
As the definition implies, $\mathcal{M}$ achieves stronger privacy protection for lower values of $\varepsilon$, since the probabilities of $D$ and $D'$ being the true world are similar; however, the utility of the output is reduced, since more randomness is introduced by $\mathcal{M}$.
|
|
|
|
|
The privacy budget $\varepsilon$ is usually set to $0.01$, $0.1$, or, in some cases, $\ln2$ or $\ln3$~\cite{lee2011much}.
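As an illustration of the definition, the following is a minimal sketch of a Laplace-style mechanism in the spirit of~\cite{dwork2006calibrating}, applied to a count query (the data set, the predicate, and the helper names are hypothetical; a count query changes by at most $1$ between neighboring data sets, so the noise scale is $1/\varepsilon$):

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponential variables with rate
    # 1/scale follows a Laplace(0, scale) distribution.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(dataset, predicate, epsilon: float) -> float:
    # A count query has sensitivity 1: adding or removing one tuple
    # changes the true count by at most 1, so the noise scale is 1/epsilon.
    true_count = sum(1 for x in dataset if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical data: how many individuals dined at the Opera?
locations = ["Opera", "Montmartre", "Opera", "Opera"]
noisy = dp_count(locations, lambda loc: loc == "Opera", epsilon=0.1)
```

Lower values of $\varepsilon$ yield a larger noise scale, making the released count less accurate but harder to attribute to any single individual.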
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We refer the interested reader to~\cite{desfontaines2020sok} for a systematic taxonomy of the different variants and extensions of differential privacy.
|
|
|
|
|
The applicability of a differential privacy mechanism $\mathcal{M}$ is inseparable from the sensitivity of the query function that it evaluates.
The presence/absence of a single record should only change the result slightly, and therefore differential privacy methods are best suited for low-sensitivity queries such as counts.
However, sum, max, and in some cases average queries can be problematic, since a single (but outlier) value could change the output noticeably, making it necessary to add a lot of noise to the query's answer.
|
|
|
|
|
|
|
|
|
|
More formally, the sensitivity of a query function $f$ is defined as follows.
|
|
|
|
|
|
|
|
|
|
\begin{definition}
|
|
|
|
|
[Query function sensitivity]
|
|
|
|
The sensitivity $\Delta f$ of a query function $f$ is the maximum impact that the addition/removal of a single tuple can have on its result, over all pairs of neighboring data sets $D, D' \in \mathcal{D}$:
|
|
|
|
|
$$\Delta f = \max_{D, D' \in \mathcal{D}} \lVert {f(D) - f(D')} \rVert_{1}$$
|
|
|
|
|
\end{definition}
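For example, assuming a data set of ages bounded in $[0, 120]$ (the bound and the sample values are hypothetical), the sensitivities of a count and a bounded sum query can be worked out directly:

```python
# Sensitivity of a count query: one tuple changes the count by at most 1.
count_sensitivity = 1

# Sensitivity of a sum over ages bounded in [0, 120]: one tuple can
# shift the sum by at most the domain maximum.
AGE_MAX = 120
sum_sensitivity = AGE_MAX

# Check on a pair of neighboring data sets differing in one tuple.
D = [25, 52, 34]
D_prime = D + [AGE_MAX]  # worst-case added individual

assert abs(len(D_prime) - len(D)) <= count_sensitivity
assert abs(sum(D_prime) - sum(D)) <= sum_sensitivity
```

This is why counts need only a small amount of noise, while sums over wide domains require noise proportional to the domain bound.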
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\paragraph{Privacy mechanisms}
|
|
|
|
|
\label{subsec:prv-mech}
|
|
|
|
Generally, when we apply a series of independent (i.e.,~in the way that they inject noise) privacy mechanisms on the same data set, the overall privacy guarantee degrades according to the \emph{sequential} composition property.

\begin{theorem}
[Sequential composition~\cite{mcsherry2009privacy}]
|
|
|
|
|
The privacy guarantee of $m \in \mathbb{Z}^+$ independent privacy mechanisms, satisfying $\varepsilon_1$-, $\varepsilon_2$-, \dots, $\varepsilon_m$-differential privacy respectively, when applied over the same data set, equals $\sum_{i = 1}^m \varepsilon_i$.
|
|
|
|
|
\end{theorem}
|
|
|
|
|
|
|
|
|
|
Asking a series of queries may allow the disambiguation between possible data sets, making it necessary to add even more noise to the outputs.
Put differently, keeping the original guarantee across multiple queries requires the injection of noise proportional to the number of executed queries, thus destroying the utility of the output.
For this reason, after a series of queries exhausts the available privacy budget, the data set has to be discarded.
|
|
|
|
|
|
|
|
|
|
Notice that the sequential composition corresponds to the worst case scenario where each time we use a mechanism we have to invest some (or all) of the available privacy budget.
|
|
|
|
|
In the special case that we query disjoint data sets, we can take advantage of the \emph{parallel} composition property~\cite{mcsherry2009privacy, soria2016big}, and thus spare some of the available privacy budget.
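A small sketch of budget accounting under the two composition properties (the budget values and function names are hypothetical):

```python
def sequential_budget(epsilons):
    # Mechanisms applied over the SAME data set: budgets add up.
    return sum(epsilons)

def parallel_budget(epsilons):
    # Mechanisms applied over DISJOINT data sets: the overall guarantee
    # is dominated by the largest individual budget.
    return max(epsilons)

# Three queries with budgets 0.1, 0.1, and 0.3:
spent_same_data = sequential_budget([0.1, 0.1, 0.3])  # 0.5 in total
spent_disjoint = parallel_budget([0.1, 0.1, 0.3])     # only 0.3
```

This is why partitioning a data set and querying each part separately can stretch a fixed privacy budget much further than repeatedly querying the whole data set.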
|
|
|
|
|
|
|
|
|
|