privacy: Intro and minor corrections

Manos Katsomallos 2021-10-24 13:23:19 +02:00
parent c58a5a99bd
commit e3a742d95c

@ -1,20 +1,24 @@
\section{Data privacy} \section{Data privacy}
\label{sec:privacy} \label{sec:privacy}
In this section we first study the notion of information disclosure and focus on the privacy attacks that can lead to it.
Furthermore, we investigate the possible privacy protection levels in continuous data publishing.
Finally, we identify the most common privacy operations and the seminal works for privacy-preserving data publishing.
\subsection{Information disclosure} \subsection{Information disclosure}
\label{subsec:prv-info-dscl} \label{subsec:prv-info-dscl}
When personal data are publicly released, either as microdata or statistical data, individuals' privacy can be compromised, i.e.,~an adversary becomes certain about an individual's \emph{sensitive attribute}, i.e.,~personal information, with a probability higher than a desired threshold. When personal data are publicly released, either as microdata or statistical data, individuals' privacy can be compromised, i.e.,~an adversary becomes certain about an individual's \emph{sensitive attribute}, i.e.,~personal information, with a probability higher than a desired threshold.
In the literature, this incident In the literature, this incident
% compromise % compromise
% \kat{do you want to say 'peril', 'risk' instead of compromise ?} % \kat{do you want to say 'peril', 'risk' instead of compromise ?}
% \mk{No, it's more about the result, i.e., compromising privacy, rather than the risk.} % \mk{No, it's more about the result, i.e., compromising privacy, rather than the risk.}
is known as \emph{information disclosure} and is usually categorized as (\cite{li2007t, wang2010privacy, narayanan2008robust}): is known as \emph{information disclosure} and is usually categorized~\cite{li2007t, wang2010privacy, narayanan2008robust} as:
% \emph{presence}, \emph{identity}, or \emph{attribute} disclosure.
\begin{itemize} \begin{itemize}
\item \emph{Presence disclosure}---the participation or absence of an individual in a data set is revealed. \item \emph{Presence disclosure} takes place when the participation or absence of an individual in a data set is revealed.
\item \emph{Identity disclosure}---an individual is linked to a particular record. \item \emph{Identity disclosure} links an individual to a particular record.
\item \emph{Attribute disclosure}---new information (attribute value) about an individual is revealed. \item \emph{Attribute disclosure} reveals information (attribute value) about an individual.
\end{itemize} \end{itemize}
In the literature, identity disclosure is also referred to as \emph{record linkage}, and presence disclosure as \emph{table linkage}. In the literature, identity disclosure is also referred to as \emph{record linkage}, and presence disclosure as \emph{table linkage}.
@ -28,7 +32,6 @@ Attribute disclosure appears when it is revealed from (a privacy-protected versi
\subsection{Attacks to privacy} \subsection{Attacks to privacy}
\label{subsec:prv-attacks} \label{subsec:prv-attacks}
Information disclosure is typically achieved by combining supplementary (background) knowledge with the released data or by setting unrealistic assumptions while designing the privacy-preserving algorithms. Information disclosure is typically achieved by combining supplementary (background) knowledge with the released data or by setting unrealistic assumptions while designing the privacy-preserving algorithms.
In its general form, this is known as an \emph{adversarial} or \emph{linkage} attack. In its general form, this is known as an \emph{adversarial} or \emph{linkage} attack.
Even though many works directly refer to the general category of linkage attacks, we also distinguish the following sub-categories: Even though many works directly refer to the general category of linkage attacks, we also distinguish the following sub-categories:
@ -76,7 +79,8 @@ Data publishers typically release events in the form of sequences of data items,
We use the term `users' to refer to the \emph{individuals}, also known as \emph{participants}, who are the source of the processed and published data. We use the term `users' to refer to the \emph{individuals}, also known as \emph{participants}, who are the source of the processed and published data.
Therefore, they should not be confused with the consumers of the released data sets. Therefore, they should not be confused with the consumers of the released data sets.
Users are subject to privacy attacks, and thus are the main point of interest of privacy protection mechanisms. Users are subject to privacy attacks, and thus are the main point of interest of privacy protection mechanisms.
The possible privacy protection levels are the \emph{event}~\cite{dwork2010differential, dwork2010pan}, \emph{user}~\cite{dwork2010differential, dwork2010pan}, and \emph{$w$-event}~\cite{kellaris2014differentially}. The possible privacy protection levels are:
% the \emph{event}~\cite{dwork2010differential, dwork2010pan}, \emph{user}~\cite{dwork2010differential, dwork2010pan}, and \emph{$w$-event}~\cite{kellaris2014differentially}.
\begin{enumerate}[(a)] \begin{enumerate}[(a)]
\item \emph{Event-level}~\cite{dwork2010differential, dwork2010pan} limits the privacy protection to \emph{any single event} in a time series, providing high \item \emph{Event-level}~\cite{dwork2010differential, dwork2010pan} limits the privacy protection to \emph{any single event} in a time series, providing high