reports: Chatzikokolakis; comments

Manos Katsomallos 2022-01-07 04:08:20 +01:00
parent a3cf9bf94e
commit 1da7e99e84
3 changed files with 17 additions and 17 deletions

@@ -33,6 +33,6 @@
\bottomrule \bottomrule
\end{tabular}% \end{tabular}%
}% }%
\caption{(a)~The original version of the data of Table~\ref{tab:continuous-statistical}, and (b)~their $1$-differentially event-level private version.} \caption{(a)~The original version of the data of Figure~\ref{tab:continuous-statistical}, and (b)~their $1$-differentially event-level private version.}
\label{fig:scenario-statistical} \label{fig:scenario-statistical}
\end{figure} \end{figure}

@@ -17,12 +17,12 @@ form in:
% \kat{Use full sentences, even in the bullets. } % \kat{Use full sentences, even in the bullets. }
% \mk{OK} % \mk{OK}
\begin{itemize} \begin{itemize}
\item \emph{Microdata} (Table~\ref{tab:snapshot-micro}) are the data items \item \emph{Microdata} (Figure~\ref{tab:snapshot-micro}) are the data items
% \kat{define data item} % \kat{define data item}
% \mk{OK} % \mk{OK}
in their raw, usually tabular, form pertaining to individuals. in their raw, usually tabular, form pertaining to individuals.
% or objects \kat{objects?}. % or objects \kat{objects?}.
\item \emph{Statistical data} (Table~\ref{tab:snapshot-statistical}) are the outcome of statistical processes on microdata, e.g.,~average, count, sum, etc. \item \emph{Statistical data} (Figure~\ref{tab:snapshot-statistical}) are the outcome of statistical processes on microdata, e.g.,~average, count, sum, etc.
\end{itemize} \end{itemize}
To accompany and facilitate the descriptions in this chapter, we provide Example~\ref{ex:snapshot} as a running example. To accompany and facilitate the descriptions in this chapter, we provide Example~\ref{ex:snapshot} as a running example.
@@ -30,9 +30,9 @@ To accompany and facilitate the descriptions in this chapter, we provide Example
\begin{example} \begin{example}
\label{ex:snapshot} \label{ex:snapshot}
Users interact with an LBS by making queries in order to retrieve some useful location-based information or just reporting user-state at various locations. Users interact with an LBS by making queries in order to retrieve some useful location-based information or just reporting user-state at various locations.
This user--LBS interaction generates user-related data, organized in a schema with the following attributes: \emph{Name} (the unique identifier of the table), \emph{Age}, \emph{Location}, and \emph{Status} (Table~\ref{tab:snapshot-micro}). This user--LBS interaction generates user-related data, organized in a schema with the following attributes: \emph{Name} (the unique identifier of the table), \emph{Age}, \emph{Location}, and \emph{Status} (Figure~\ref{tab:snapshot-micro}).
The `Status' attribute includes information that characterizes the user state or the query itself, and its value varies according to the service functionality. The `Status' attribute includes information that characterizes the user state or the query itself, and its value varies according to the service functionality.
Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Table~\ref{tab:snapshot-statistical}). Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Figure~\ref{tab:snapshot-statistical}).
\includetable{preliminaries/snapshot} \includetable{preliminaries/snapshot}
@@ -40,7 +40,7 @@ To accompany and facilitate the descriptions in this chapter, we provide Example
% \kat{I miss the definition of data. You speak of data items, data values, what is the difference to data?} % \kat{I miss the definition of data. You speak of data items, data values, what is the difference to data?}
% \mk{Done above} % \mk{Done above}
An example of microdata is displayed in Table~\ref{tab:snapshot-micro}, while an example of statistical data in Table~\ref{tab:snapshot-statistical}. An example of microdata is displayed in Figure~\ref{tab:snapshot-micro}, while an example of statistical data in Figure~\ref{tab:snapshot-statistical}.
Data, in either of these two forms, may have a special property called~\emph{continuity}, i.e.,~their values change and can be observed through time. Data, in either of these two forms, may have a special property called~\emph{continuity}, i.e.,~their values change and can be observed through time.
% \kat{The way that you define it here reminds temporal data. What is the difference?} % \kat{The way that you define it here reminds temporal data. What is the difference?}
% \mk{It's the same, we talk about time in data, i.e., temporal data. No?} % \mk{It's the same, we talk about time in data, i.e., temporal data. No?}
@@ -60,7 +60,7 @@ Depending on the span of the observation, we categorize data in:
Extending Example~\ref{ex:snapshot}, Extending Example~\ref{ex:snapshot},
% \kat{Maybe put these three tables in a Figure instead of a table?} % \kat{Maybe put these three tables in a Figure instead of a table?}
% \mk{OK} % \mk{OK}
Table~\ref{fig:continuous} shows an example of continuous data, Figure~\ref{fig:continuous} shows an example of continuous data,
% observation % observation
% \kat{maybe mention explicitly before what is data observation and continuous data observation } % \kat{maybe mention explicitly before what is data observation and continuous data observation }
% \mk{Did it above} % \mk{Did it above}

@@ -25,9 +25,9 @@ In the literature, identity disclosure is also referred to as \emph{record linka
Notice that identity disclosure can result in attribute disclosure, and vice versa. Notice that identity disclosure can result in attribute disclosure, and vice versa.
To better illustrate these definitions, we provide some examples based on Figure~\ref{fig:snapshot}. To better illustrate these definitions, we provide some examples based on Figure~\ref{fig:snapshot}.
Presence disclosure appears when by looking at the (privacy-protected) counts of Table~\ref{tab:snapshot-statistical}, we can guess if Quackmore has participated in Table~\ref{tab:snapshot-micro}. Presence disclosure appears when by looking at the (privacy-protected) counts of Figure~\ref{tab:snapshot-statistical}, we can guess if Quackmore has participated in Figure~\ref{tab:snapshot-micro}.
Identity disclosure appears when we can guess that the sixth record of (a privacy-protected version of) the microdata of Table~\ref{tab:snapshot-micro} belongs to Quackmore. Identity disclosure appears when we can guess that the sixth record of (a privacy-protected version of) the microdata of Figure~\ref{tab:snapshot-micro} belongs to Quackmore.
Attribute disclosure appears when it is revealed from (a privacy-protected version of) the microdata of Table~\ref{tab:snapshot-micro} that Quackmore is $62$ years old. Attribute disclosure appears when it is revealed from (a privacy-protected version of) the microdata of Figure~\ref{tab:snapshot-micro} that Quackmore is $62$ years old.
\subsection{Attacks to privacy} \subsection{Attacks to privacy}
@@ -58,9 +58,9 @@ Even though many works directly refer to the general category of linkage attacks
The first sub-category of attacks has been mainly addressed in works on snapshot microdata publishing, but is also present in continuous publishing; however, algorithms for continuous publishing typically accept the proposed solutions for the snapshot publishing scheme (see discussion over $k$-anonymity and $l$-diversity in Section~\ref{subsec:prv-seminal}). The first sub-category of attacks has been mainly addressed in works on snapshot microdata publishing, but is also present in continuous publishing; however, algorithms for continuous publishing typically accept the proposed solutions for the snapshot publishing scheme (see discussion over $k$-anonymity and $l$-diversity in Section~\ref{subsec:prv-seminal}).
This kind of attacks is tightly coupled with publishing the (privacy-protected) sensitive attribute value. This kind of attacks is tightly coupled with publishing the (privacy-protected) sensitive attribute value.
An example is the lack of diversity in the sensitive attribute domain, e.g.,~if all users in the data set of Table~\ref{tab:snapshot-micro} had \emph{running} as their Status (the sensitive attribute). An example is the lack of diversity in the sensitive attribute domain, e.g.,~if all users in the data set of Figure~\ref{tab:snapshot-micro} had \emph{running} as their Status (the sensitive attribute).
The second and third subcategories are attacks emerging (mostly) in continuous publishing scenarios. The second and third subcategories are attacks emerging (mostly) in continuous publishing scenarios.
Consider again the data set in Table~\ref{tab:snapshot-micro}. Consider again the data set in Figure~\ref{tab:snapshot-micro}.
The complementary release attack means that an adversary can learn more things about the individuals (e.g.,~that there are high chances that Donald was at work) if he/she combines the information of two privacy-protected versions of this data set. The complementary release attack means that an adversary can learn more things about the individuals (e.g.,~that there are high chances that Donald was at work) if he/she combines the information of two privacy-protected versions of this data set.
By the data dependence attack, the status of Donald could be more certainly inferred, by taking into account the status of Dewey at the same moment and the dependencies between Donald's and Dewey's status, e.g.,~when Dewey is at home, then most probably Donald is at work. By the data dependence attack, the status of Donald could be more certainly inferred, by taking into account the status of Dewey at the same moment and the dependencies between Donald's and Dewey's status, e.g.,~when Dewey is at home, then most probably Donald is at work.
In order to better protect the privacy of Donald in case of attacks, the data should be privacy-protected in a more adequate way (than without the attacks). In order to better protect the privacy of Donald in case of attacks, the data should be privacy-protected in a more adequate way (than without the attacks).
@@ -86,7 +86,7 @@ The possible privacy protection levels are:
\item \emph{Event-level}~\cite{dwork2010differential, dwork2010pan} limits the privacy protection to \emph{any single event} in a time series, providing high \item \emph{Event-level}~\cite{dwork2010differential, dwork2010pan} limits the privacy protection to \emph{any single event} in a time series, providing high
% \kat{maximum? better say high} % \kat{maximum? better say high}
data utility. data utility.
\item \emph{User-level}~\cite{dwork2010differential, dwork2010pan} protects \emph{all the events} in a time series, providing high \item \emph{User-level}~\cite{dwork2010differential, dwork2010pan} protects \emph{all the events} in a time series, providing high user privacy.
\item \emph{$w$-event-level}~\cite{kellaris2014differentially} provides privacy protection to \emph{any sequence of $w$ events} in a time series. \item \emph{$w$-event-level}~\cite{kellaris2014differentially} provides privacy protection to \emph{any sequence of $w$ events} in a time series.
% \kat{maximum? better say high} % \kat{maximum? better say high}
privacy protection. privacy protection.
@@ -112,7 +112,7 @@ Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to deter
\subcaptionbox{$2$-event-level\label{fig:level-w-event}}{% \subcaptionbox{$2$-event-level\label{fig:level-w-event}}{%
\includegraphics[width=.32\linewidth]{preliminaries/level-w-event}% \includegraphics[width=.32\linewidth]{preliminaries/level-w-event}%
}\hspace{\fill} }\hspace{\fill}
\caption{Protecting the data of Table~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly. \caption{Protecting the data of Figure~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly.
% \kat{Why don't you distort the results already in this table?} % \kat{Why don't you distort the results already in this table?}
% \mk{Because we've not discussed yet about these operations.} % \mk{Because we've not discussed yet about these operations.}
} }
@@ -462,7 +462,7 @@ That is, we add up the privacy budgets attributed to the outputs from previous m
\includetable{preliminaries/scenario-micro} \includetable{preliminaries/scenario-micro}
First, we anonymize the data set of Table~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$. First, we anonymize the data set of Figure~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$.
This means that any user should not be distinguished from at least $2$ others. This means that any user should not be distinguished from at least $2$ others.
Status is the sensitive attribute, thus the attribute that we wish to protect. Status is the sensitive attribute, thus the attribute that we wish to protect.
We start by suppressing the values of the Name attribute, which is the identifier. We start by suppressing the values of the Name attribute, which is the identifier.
@@ -474,9 +474,9 @@ That is, we add up the privacy budgets attributed to the outputs from previous m
\includetable{preliminaries/scenario-statistical} \includetable{preliminaries/scenario-statistical}
Next, we demonstrate differential privacy. Next, we demonstrate differential privacy.
We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Table~\ref{tab:continuous-statistical}. We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Figure~\ref{tab:continuous-statistical}.
The sensitivity of a count query is $1$ since the addition/removal of a tuple from the data set can change the final result of the query by maximum $1$ (tuple). The sensitivity of a count query is $1$ since the addition/removal of a tuple from the data set can change the final result of the query by maximum $1$ (tuple).
Figure~\ref{fig:laplace} shows how the Laplace distribution for the true count in Montmartre at $t_1$ looks like. Figure~\ref{fig:laplace} shows how the Laplace distribution for the true count in Montmartre at $t_1$ looks like.
Table~\ref{tab:statistical-noisy} shows all the perturbed counts that are going to be released. Figure~\ref{tab:statistical-noisy} shows all the perturbed counts that are going to be released.
\end{example} \end{example}
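
The Laplace step described in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the perturbed counts: the function names `laplace_noise` and `laplace_mechanism` are hypothetical, and the true count of 10 is a made-up placeholder, since the actual table values are not part of this diff. The sketch uses a sensitivity of $1$ and $\varepsilon = 1$ as in the text, which gives a Laplace scale of $1/1 = 1$.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution (inverse CDF method)."""
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Clamp the argument of log to avoid log(0) in the u = -0.5 edge case.
    magnitude = max(1.0 - 2.0 * abs(u), 1e-300)
    return -scale * math.copysign(1.0, u) * math.log(magnitude)


def laplace_mechanism(true_count, sensitivity, epsilon, rng):
    """Perturb a query result with Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_count + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(2022)  # fixed seed only to make the demo repeatable
    true_count = 10  # hypothetical count, not taken from the thesis tables
    noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=1, rng=rng)
    print(f"true count: {true_count}, released count: {noisy_count:.2f}")
```

Each released count is drawn independently, so repeated releases of the same count consume additional privacy budget, in line with the composition discussion preceding the example.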