From 1da7e99e84eee974ca1b982adf6553b7e829e04c Mon Sep 17 00:00:00 2001
From: Manos Katsomallos
Date: Fri, 7 Jan 2022 04:08:20 +0100
Subject: [PATCH] reports: Chatzikokolakis; comments

---
 tables/preliminaries/scenario-statistical.tex |  2 +-
 text/preliminaries/data.tex                   | 12 +++++------
 text/preliminaries/privacy.tex                | 20 +++++++++----------
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/tables/preliminaries/scenario-statistical.tex b/tables/preliminaries/scenario-statistical.tex
index ce5ac0e..291cb44 100644
--- a/tables/preliminaries/scenario-statistical.tex
+++ b/tables/preliminaries/scenario-statistical.tex
@@ -33,6 +33,6 @@
 \bottomrule
 \end{tabular}%
 }%
- \caption{(a)~The original version of the data of Table~\ref{tab:continuous-statistical}, and (b)~their $1$-differentially event-level private version.}
+ \caption{(a)~The original version of the data of Figure~\ref{tab:continuous-statistical}, and (b)~their $1$-differentially event-level private version.}
 \label{fig:scenario-statistical}
 \end{figure}
diff --git a/text/preliminaries/data.tex b/text/preliminaries/data.tex
index 6f1fcbf..c577f67 100644
--- a/text/preliminaries/data.tex
+++ b/text/preliminaries/data.tex
@@ -17,12 +17,12 @@ form in:
 % \kat{Use full sentences, even in the bullets. }
 % \mk{OK}
 \begin{itemize}
- \item \emph{Microdata} (Table~\ref{tab:snapshot-micro}) are the data items
+ \item \emph{Microdata} (Figure~\ref{tab:snapshot-micro}) are the data items
 % \kat{define data item}
 % \mk{OK}
 in their raw, usually tabular, form pertaining to individuals.
 % or objects \kat{objects?}.
- \item \emph{Statistical data} (Table~\ref{tab:snapshot-statistical}) are the outcome of statistical processes on microdata, e.g.,~average, count, sum, etc.
+ \item \emph{Statistical data} (Figure~\ref{tab:snapshot-statistical}) are the outcome of statistical processes on microdata, e.g.,~average, count, sum, etc.
 \end{itemize}
 
 To accompany and facilitate the descriptions in this chapter, we provide Example~\ref{ex:snapshot} as a running example.
@@ -30,9 +30,9 @@ To accompany and facilitate the descriptions in this chapter, we provide Example
 \begin{example}
 \label{ex:snapshot}
 Users interact with an LBS by making queries in order to retrieve some useful location-based information or just reporting user-state at various locations.
- This user--LBS interaction generates user-related data, organized in a schema with the following attributes: \emph{Name} (the unique identifier of the table), \emph{Age}, \emph{Location}, and \emph{Status} (Table~\ref{tab:snapshot-micro}).
+ This user--LBS interaction generates user-related data, organized in a schema with the following attributes: \emph{Name} (the unique identifier of the table), \emph{Age}, \emph{Location}, and \emph{Status} (Figure~\ref{tab:snapshot-micro}).
 The `Status' attribute includes information that characterizes the user state or the query itself, and its value varies according to the service functionality.
- Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Table~\ref{tab:snapshot-statistical}).
+ Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Figure~\ref{tab:snapshot-statistical}).
 
 \includetable{preliminaries/snapshot}
 
@@ -40,7 +40,7 @@ To accompany and facilitate the descriptions in this chapter, we provide Example
 % \kat{I miss the definition of data. You speak of data items, data values, what is the difference to data?}
 % \mk{Done above}
 
-An example of microdata is displayed in Table~\ref{tab:snapshot-micro}, while an example of statistical data in Table~\ref{tab:snapshot-statistical}.
+An example of microdata is displayed in Figure~\ref{tab:snapshot-micro}, while an example of statistical data in Figure~\ref{tab:snapshot-statistical}.
 Data, in either of these two forms, may have a special property called~\emph{continuity}, i.e.,~their values change and can be observed through time.
 % \kat{The way that you define it here reminds temporal data. What is the difference?}
 % \mk{It's the same, we talk about time in data, i.e., temporal data. No?}
@@ -60,7 +60,7 @@ Depending on the span of the observation, we categorize data in:
 Extending Example~\ref{ex:snapshot},
 % \kat{Maybe put these three tables in a Figure instead of a table?}
 % \mk{OK}
- Table~\ref{fig:continuous} shows an example of continuous data,
+ Figure~\ref{fig:continuous} shows an example of continuous data,
 % observation
 % \kat{maybe mention explicitly before what is data observation and continuous data observation }
 % \mk{Did it above}
diff --git a/text/preliminaries/privacy.tex b/text/preliminaries/privacy.tex
index db9921e..cd641bb 100644
--- a/text/preliminaries/privacy.tex
+++ b/text/preliminaries/privacy.tex
@@ -25,9 +25,9 @@ In the literature, identity disclosure is also referred to as \emph{record linka
 Notice that identity disclosure can result in attribute disclosure, and vice versa.
 
 To better illustrate these definitions, we provide some examples based on Figure~\ref{fig:snapshot}.
-Presence disclosure appears when by looking at the (privacy-protected) counts of Table~\ref{tab:snapshot-statistical}, we can guess if Quackmore has participated in Table~\ref{tab:snapshot-micro}.
-Identity disclosure appears when we can guess that the sixth record of (a privacy-protected version of) the microdata of Table~\ref{tab:snapshot-micro} belongs to Quackmore.
-Attribute disclosure appears when it is revealed from (a privacy-protected version of) the microdata of Table~\ref{tab:snapshot-micro} that Quackmore is $62$ years old.
+Presence disclosure appears when by looking at the (privacy-protected) counts of Figure~\ref{tab:snapshot-statistical}, we can guess if Quackmore has participated in Figure~\ref{tab:snapshot-micro}.
+Identity disclosure appears when we can guess that the sixth record of (a privacy-protected version of) the microdata of Figure~\ref{tab:snapshot-micro} belongs to Quackmore.
+Attribute disclosure appears when it is revealed from (a privacy-protected version of) the microdata of Figure~\ref{tab:snapshot-micro} that Quackmore is $62$ years old.
 
 \subsection{Attacks to privacy}
 
@@ -58,9 +58,9 @@ Even though many works directly refer to the general category of linkage attacks
 
 The first sub-category of attacks has been mainly addressed in works on snapshot microdata publishing, but is also present in continuous publishing; however, algorithms for continuous publishing typically accept the proposed solutions for the snapshot publishing scheme (see discussion over $k$-anonymity and $l$-diversity in Section~\ref{subsec:prv-seminal}).
 This kind of attacks is tightly coupled with publishing the (privacy-protected) sensitive attribute value.
-An example is the lack of diversity in the sensitive attribute domain, e.g.,~if all users in the data set of Table~\ref{tab:snapshot-micro} had \emph{running} as their Status (the sensitive attribute).
+An example is the lack of diversity in the sensitive attribute domain, e.g.,~if all users in the data set of Figure~\ref{tab:snapshot-micro} had \emph{running} as their Status (the sensitive attribute).
 The second and third subcategories are attacks emerging (mostly) in continuous publishing scenarios.
-Consider again the data set in Table~\ref{tab:snapshot-micro}.
+Consider again the data set in Figure~\ref{tab:snapshot-micro}.
 The complementary release attack means that an adversary can learn more things about the individuals (e.g.,~that there are high chances that Donald was at work) if he/she combines the information of two privacy-protected versions of this data set.
 By the data dependence attack, the status of Donald could be more certainly inferred, by taking into account the status of Dewey at the same moment and the dependencies between Donald's and Dewey's status, e.g.,~when Dewey is at home, then most probably Donald is at work.
 In order to better protect the privacy of Donald in case of attacks, the data should be privacy-protected in a more adequate way (than without the attacks).
@@ -86,7 +86,7 @@ The possible privacy protection levels are:
 \item \emph{Event-level}~\cite{dwork2010differential, dwork2010pan} limits the privacy protection to \emph{any single event} in a time series, providing high
 % \kat{maximum? better say high}
 data utility.
- \item \emph{User-level}~\cite{dwork2010differential, dwork2010pan} protects \emph{all the events} in a time series, providing high
+ \item \emph{User-level}~\cite{dwork2010differential, dwork2010pan} protects \emph{all the events} in a time series, providing high user privacy.
 \item \emph{$w$-event-level}~\cite{kellaris2014differentially} provides privacy protection to \emph{any sequence of $w$ events} in a time series.
 % \kat{maximum? better say high}
 privacy protection.
@@ -112,7 +112,7 @@ Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to deter
 \subcaptionbox{$2$-event-level\label{fig:level-w-event}}{%
 \includegraphics[width=.32\linewidth]{preliminaries/level-w-event}%
 }\hspace{\fill}
- \caption{Protecting the data of Table~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly.
+ \caption{Protecting the data of Figure~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly.
 % \kat{Why don't you distort the results already in this table?}
 % \mk{Because we've not discussed yet about these operations.}
 }
@@ -462,7 +462,7 @@ That is, we add up the privacy budgets attributed to the outputs from previous m
 
 \includetable{preliminaries/scenario-micro}
 
- First, we anonymize the data set of Table~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$.
+ First, we anonymize the data set of Figure~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$.
 This means that any user should not be distinguished from at least $2$ others.
 Status is the sensitive attribute, thus the attribute that we wish to protect.
 We start by suppressing the values of the Name attribute, which is the identifier.
@@ -474,9 +474,9 @@ That is, we add up the privacy budgets attributed to the outputs from previous m
 \includetable{preliminaries/scenario-statistical}
 
 Next, we demonstrate differential privacy.
- We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Table~\ref{tab:continuous-statistical}.
+ We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Figure~\ref{tab:continuous-statistical}.
 The sensitivity of a count query is $1$ since the addition/removal of a tuple from the data set can change the final result of the query by maximum $1$ (tuple).
 Figure~\ref{fig:laplace} shows how the Laplace distribution for the true count in Montmartre at $t_1$ looks like.
- Table~\ref{tab:statistical-noisy} shows all the perturbed counts that are going to be released.
+ Figure~\ref{tab:statistical-noisy} shows all the perturbed counts that are going to be released.
 \end{example}
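
Reviewer aside on the last hunk of privacy.tex: the differential privacy step of the example fixes $\varepsilon = 1$ and a count-query sensitivity of $1$, and the noise scale of the Laplace mechanism follows directly from those two values. The worked equation below is only a sketch of that arithmetic. The symbols $\Delta f$ (sensitivity), $b$ (scale), $c$ (a true count), $\tilde{c}$ (its released version), $\eta$ (the noise) and $p$ (its density) are notation assumed here for illustration; they are not taken from the patched files.

```latex
% Sketch only: \Delta f, b, c, \tilde{c}, \eta and p are assumed notation,
% not symbols defined in the patched sources.
\begin{align*}
  b &= \frac{\Delta f}{\varepsilon} = \frac{1}{1} = 1
      && \text{(scale of the Laplace noise)} \\
  \tilde{c} &= c + \eta, \quad \eta \sim \mathrm{Lap}(b)
      && \text{(released count for a true count } c\text{)} \\
  p(\eta = x) &= \frac{1}{2b}\, e^{-|x|/b} = \frac{1}{2}\, e^{-|x|}
      && \text{(density of the added noise)} \\
  \mathrm{Var}[\eta] &= 2b^{2} = 2
      && \text{(spread of each perturbed count)}
\end{align*}
```

Under these assumptions, each perturbed count is the true count plus zero-centred noise with standard deviation $\sqrt{2}$, independent of how large the true count is.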