privacy: Minor corrections

2021-08-02 23:17:19 +03:00
parent 521ba30c5f
commit 2b4d7674ba
1 changed files with 32 additions and 28 deletions
--- a/text/preliminaries/privacy.tex
+++ b/text/preliminaries/privacy.tex
@@ -1,8 +1,11 @@
 \section{Data privacy}
 \label{sec:privacy}
 \subsection{Information disclosure} 
 \label{subsec:prv-info-dscl}
 When personal data are publicly released, either as microdata or statistical data, individuals' privacy can be compromised, i.e.,~an adversary becomes certain about an individual's personal information with a probability higher than a desired threshold.
-In the literature, this compromise is know as \emph{information disclosure} and is usually categorized as~\cite{li2007t, wang2010privacy, narayanan2008robust}:
+In the literature, this compromise is known as \emph{information disclosure} and is usually categorized as~\cite{li2007t, wang2010privacy, narayanan2008robust}:
 \begin{itemize}
  \item \emph{Presence disclosure}---the participation (or absence) of an individual in a data set is revealed.
@@ -19,6 +22,34 @@ Identity disclosure appears when we can guess that the sixth record of (a privac
 Attribute disclosure appears when it is revealed from (a privacy-protected version of) the microdata of Table~\ref{tab:snapshot-micro} that Quackmore is $62$ years old.
 \subsection{Attacks to privacy}
 \label{subsec:prv-attacks}
 Information disclosure is typically achieved by combining supplementary (background) knowledge with the released data or by setting unrealistic assumptions while designing the privacy-preserving algorithms.
 In its general form, this is known as \emph{adversarial} or \emph{linkage} attack.
 Even though many works directly refer to the general category of linkage attacks, we distinguish also the following sub-categories, addressed in the literature:
 \paragraph{Sensitive attribute domain} knowledge.
 Here we can identify \emph{homogeneity and skewness} attacks~\cite{machanavajjhala2006diversity,li2007t}, when statistics of the sensitive attribute values are available, and \emph{similarity attack}, when semantics of the sensitive attribute values are available.
 \paragraph{Complementary release} attacks~\cite{sweeney2002k} with regard to previous releases of different versions of the same and/or related data sets.
 In this category, we also identify the \emph{unsorted matching} attack~\cite{sweeney2002k}, which is achieved when two privacy-protected versions of an original data set are published in the same tuple ordering.
 Other instances include: (i)~the \emph{join} attack~\cite{wang2006anonymizing}, when tuples can be identified by joining (on the (quasi-)identifiers) several releases, (ii)~the \emph{tuple correspondence} attack~\cite{fung2008anonymity}, when in case of incremental data certain tuples correspond to certain tuples in other releases, in an injective way, (iii)~the \emph{tuple equivalence} attack~\cite{he2011preventing}, when tuples among different releases are found to be equivalent with respect to the sensitive attribute, and (iv)~the \emph{unknown releases} attack~\cite{shmueli2015privacy}, when the privacy preservation is performed without knowing the previously privacy-protected data sets.
 \paragraph{Data dependence} either within one data set or among one data set and previous data releases, and/or other external sources~\cite{kifer2011no, chen2014correlated, liu2016dependence, zhao2017dependent}.
 We will look into this category in more detail later in Section~\ref{sec:correlation}.
 The first sub-category of attacks has been mainly addressed in works on snapshot microdata publishing, and is still present in continuous publishing; however, algorithms for continuous publishing  typically accept the proposed solutions for the snapshot publishing scheme (see discussion over $k$-anonymity and $l$-diversity in Section~\ref{subsec:prv-seminal}).
 This kind of attacks is tightly coupled with publishing the (privacy-protected) sensitive attribute value.
 An example is the lack of diversity in the sensitive attribute domain, e.g.,~if all users in the data set of Table~\ref{tab:snapshot-micro} shared the same \emph{running} Status  (the sensitive attribute).
 The second and third subcategory are attacks emerging (mostly) in continuous publishing scenarios.
 Consider again the data set in Table~\ref{tab:snapshot-micro}.
 The complementary release attack means that an adversary can learn more things about the individuals (e.g.,~that there are high chances that Donald was at work) if he/she combines the information of two privacy-protected versions of this data set.
 By the data dependence attack, the status of Donald could be more certainly inferred, by taking into account the status of Dewey at the same moment and the dependencies between Donald's and Dewey's status, e.g.,~when Dewey is at home, then most probably Donald is at work.
 In order to better protect the privacy of Donald in case of attacks, the data should be privacy-protected in a more adequate way (than without the attacks).
 \subsection{Levels of privacy protection}
 \label{subsec:prv-levels}
@@ -64,33 +95,6 @@ In the extreme cases where $w$ is equal to either $1$ or to the size of the enti
 Although the described levels have been coined in the context of \emph{differential privacy}~\cite{dwork2006calibrating}, a seminal privacy method that we will discuss in more detail in Section~\ref{subsec:prv-statistical}, it is possible to apply their definitions to other privacy protection techniques as well.
 \subsection{Attacks to privacy}
 \label{subsec:prv-attacks}
 Information disclosure is typically achieved by combining supplementary (background) knowledge with the released data or by setting unrealistic assumptions while designing the privacy-preserving algorithms.
 In its general form, this is known as \emph{adversarial} or \emph{linkage} attack.
 Even though many works directly refer to the general category of linkage attacks, we distinguish also the following sub-categories, addressed in the literature:
 \paragraph{Sensitive attribute domain} knowledge.
 Here we can identify \emph{homogeneity and skewness} attacks~\cite{machanavajjhala2006diversity,li2007t}, when statistics of the sensitive attribute values are available, and \emph{similarity attack}, when semantics of the sensitive attribute values are available.
 \paragraph{Complementary release} attacks~\cite{sweeney2002k} with regard to previous releases of different versions of the same and/or related data sets.
 In this category, we also identify the \emph{unsorted matching} attack~\cite{sweeney2002k}, which is achieved when two privacy-protected versions of an original data set are published in the same tuple ordering.
 Other instances include: (i)~the \emph{join} attack~\cite{wang2006anonymizing}, when tuples can be identified by joining (on the (quasi-)identifiers) several releases, (ii)~the \emph{tuple correspondence} attack~\cite{fung2008anonymity}, when in case of incremental data certain tuples correspond to certain tuples in other releases, in an injective way, (iii)~the \emph{tuple equivalence} attack~\cite{he2011preventing}, when tuples among different releases are found to be equivalent with respect to the sensitive attribute, and (iv)~the \emph{unknown releases} attack~\cite{shmueli2015privacy}, when the privacy preservation is performed without knowing the previously privacy-protected data sets.
 \paragraph{Data dependence} either within one data set or among one data set and previous data releases, and/or other external sources~\cite{kifer2011no, chen2014correlated, liu2016dependence, zhao2017dependent}.
 We will look into this category in more detail later in Section~\ref{sec:correlation}.
 The first sub-category of attacks has been mainly addressed in works on snapshot microdata publishing, and is still present in continuous publishing; however, algorithms for continuous publishing  typically accept the proposed solutions for the snapshot publishing scheme (see discussion over $k$-anonymity and $l$-diversity in Section~\ref{subsec:prv-seminal}).
 This kind of attacks is tightly coupled with publishing the (privacy-protected) sensitive attribute value.
 An example is the lack of diversity in the sensitive attribute domain, e.g.,~if all users in the data set of Table~\ref{tab:snapshot-micro} shared the same \emph{running} Status  (the sensitive attribute).
 The second and third subcategory are attacks emerging (mostly) in continuous publishing scenarios.
 Consider again the data set in Table~\ref{tab:snapshot-micro}.
 The complementary release attack means that an adversary can learn more things about the individuals (e.g.,~that there are high chances that Donald was at work) if he/she combines the information of two privacy-protected versions of this data set.
 By the data dependence attack, the status of Donald could be more certainly inferred, by taking into account the status of Dewey at the same moment and the dependencies between Donald's and Dewey's status, e.g.,~when Dewey is at home, then most probably Donald is at work.
 In order to better protect the privacy of Donald in case of attacks, the data should be privacy-protected in a more adequate way (than without the attacks).
 \subsection{Privacy-preserving operations}
 \label{subsec:prv-operations}