Organizing stuff

Manos Katsomallos 2021-10-10 06:05:41 +02:00
parent 2cfa5331fe
commit 3baee4e0f5
22 changed files with 26 additions and 21 deletions

@@ -46,7 +46,7 @@ Selecting the wrong privacy algorithm or configuring it poorly may put at risk t
\begin{figure}[htp]
\centering
-\includegraphics[width=.5\linewidth]{data-value}
+\includegraphics[width=.5\linewidth]{introduction/data-value}
\caption{Value of data for decision-making over time from less than seconds to more than months~\cite{gualtieri2016perishable}.}
\label{fig:data-value}
\end{figure}

@@ -31,6 +31,10 @@
\usepackage[normalem]{ulem}
\usepackage[table]{xcolor}
\usepackage{arydshln}
+% http://mirror.ox.ac.uk/sites/ctan.org/macros/latex/contrib/algorithm2e/doc/algorithm2e.pdf
+% {\fontfamily{texttt}\selectfont function}
+% {\fontfamily{textsf}\selectfont data}
+\usepackage[ruled,lined,noend,linesnumbered]{algorithm2e}
\newcommand\blankpage{%
\null

@@ -22,7 +22,7 @@ To accompany and facilitate the descriptions in this chapter, we provide the fol
The `Status' attribute includes information that characterizes the user's state or the query itself, and its value varies according to the service functionality.
Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Table~\ref{tab:snapshot-statistical}).
-\includetable{snapshot}
+\includetable{preliminaries/snapshot}
\end{example}
@@ -43,7 +43,7 @@ Depending on the span of the observation, we distinguish the following categorie
The two data tables over the time-span $[t_1, t_2]$ are an example of finite data.
Infinite data are the whole series of data obtained over the period~$[t_1, \infty)$ (infinity is denoted by `\dots').
-\includetable{continuous}
+\includetable{preliminaries/continuous}
\end{example}
@@ -69,10 +69,10 @@ We categorize data processing and publishing based on the implemented scheme \ka
\begin{figure}[htp]
\centering
\subcaptionbox{Global scheme\label{fig:scheme-global}}{%
-\includegraphics[width=\linewidth]{scheme-global}%
+\includegraphics[width=\linewidth]{preliminaries/scheme-global}%
} \\ \bigskip
\subcaptionbox{Local scheme\label{fig:scheme-local}}{%
-\includegraphics[width=\linewidth]{scheme-local}%
+\includegraphics[width=\linewidth]{preliminaries/scheme-local}%
}
\caption{The usual flow of user-generated data, optionally harvested by data publishers, privacy-protected, and released to data consumers, according to the (a)~global, and (b)~local privacy schemes.}
\label{fig:privacy-schemes}
@@ -116,13 +116,13 @@ We identify two main data processing and publishing modes: \kat{but so far you h
\begin{figure}[htp]
\centering
\subcaptionbox{Snapshot mode\label{fig:mode-snapshot}}{%
-\includegraphics[width=.4\linewidth]{mode-snapshot}%
+\includegraphics[width=.4\linewidth]{preliminaries/mode-snapshot}%
} \\ \bigskip\hspace{\fill}
\subcaptionbox{Batch mode\label{fig:mode-batch}}{%
-\includegraphics[width=.4\linewidth]{mode-batch}%
+\includegraphics[width=.4\linewidth]{preliminaries/mode-batch}%
}\hspace{\fill}
\subcaptionbox{Streaming mode\label{fig:mode-streaming}}{%
-\includegraphics[width=.4\linewidth]{mode-streaming}%
+\includegraphics[width=.4\linewidth]{preliminaries/mode-streaming}%
}\hspace{\fill}
\caption{The different data processing and publishing modes of continuously generated data sets.
(a)~Snapshot publishing, (b)~continuous publishing--batch mode, and (c)~continuous publishing--streaming mode.

@@ -99,13 +99,13 @@ Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to deter
\begin{figure}[htp]
\centering
\hspace{\fill}\subcaptionbox{Event-level\label{fig:level-event}}{%
-\includegraphics[width=.32\linewidth]{level-event}%
+\includegraphics[width=.32\linewidth]{preliminaries/level-event}%
}\hspace{\fill}
\subcaptionbox{User-level\label{fig:level-user}}{%
-\includegraphics[width=.32\linewidth]{level-user}%
+\includegraphics[width=.32\linewidth]{preliminaries/level-user}%
}\hspace{\fill}
\subcaptionbox{$2$-event-level\label{fig:level-w-event}}{%
-\includegraphics[width=.32\linewidth]{level-w-event}%
+\includegraphics[width=.32\linewidth]{preliminaries/level-w-event}%
}\hspace{\fill}
\caption{Protecting the data of Table~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly.
% \kat{Why don't you distort the results already in this table?}
@@ -301,7 +301,7 @@ A specialization of this mechanism for location data is the \emph{Planar Laplace
\begin{figure}[htp]
\centering
-\includegraphics[width=.7\linewidth]{laplace}
+\includegraphics[width=.7\linewidth]{preliminaries/laplace}
\caption{A Laplace distribution for location $\mu = 2$ and scale $b = 1$.}
\label{fig:laplace}
\end{figure}
@@ -436,7 +436,7 @@ Naturally, using the same (or different) privacy mechanism(s) multiple times to
Then, the reported data are collected by the central service, in order to be protected and then published, either as a whole, or as statistics thereof.
Notice that in order to showcase the straightforward application of $k$-anonymity and differential privacy, we apply the two methods on each timestamp independently from the previous one, and do not take into account any additional threats imposed by continuity.
-\includetable{scenario-micro}
+\includetable{preliminaries/scenario-micro}
First, we anonymize the data set of Table~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$.
This means that any user should not be distinguished from at least $2$ others.
@@ -447,7 +447,7 @@ Naturally, using the same (or different) privacy mechanism(s) multiple times to
Finally, we achieve $3$-anonymity by putting the entries in groups of three, according to the quasi-identifiers.
Table~\ref{tab:scenario-micro} depicts the results at each timestamp.
-\includetable{scenario-statistical}
+\includetable{preliminaries/scenario-statistical}
Next, we demonstrate differential privacy.
We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Table~\ref{tab:continuous-statistical}.

@@ -1,9 +1,8 @@
\section{Microdata}
\label{sec:micro}
Table~\ref{tab:micro} summarizes the literature for the Microdata category.
-Each reviewed work is abstractly described in this table, by its category (finite or infintite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, w-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
+Each reviewed work is abstractly described in this table, by its category (finite or infinite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, $w$-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
We observe that privacy-preserving algorithms for microdata rely mostly on $k$-anonymity or derivatives of it.
Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to complementary release attacks (or \emph{composition attacks} in the original publication).
Consequently, the research community proposed solutions based on $k$-anonymity, focusing on different threats linked to continuous publication, as we review later on.
@@ -16,7 +15,8 @@ to account for the extra privacy loss entailed by them.
\bigskip
We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:micro-finite}), to continue with the infinite observations setting (Section~\ref{subsec:micro-infinite}).
-\includetable{micro}
+\includetable{related/micro}
\subsection{Finite observation}

@@ -2,14 +2,15 @@
\label{sec:statistical}
As in Section~\ref{sec:micro}, we summarize the literature for the Statistical Data category in Table~\ref{tab:statistical}, which we structure identically as Table~\ref{tab:micro}.
-For a reminder, each reviewed work is abstractly described in this table, by its category (finite or infintite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, w-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
+For a reminder, each reviewed work is abstractly described in this table, by its category (finite or infinite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, $w$-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
As witnessed in Table~\ref{tab:statistical}, when continuously publishing statistical data, usually in the form of counts, the most widely used privacy method is differential privacy, or derivatives of it.
In theory differential privacy makes no assumptions about the background knowledge available to the adversary.
In practice, data dependencies (e.g.,~correlations) arising in the continuous publication setting are frequently (but without it being the rule) considered as attacks in the proposed algorithms.
We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:statistical-finite}), to continue with the infinite observations setting (Section~\ref{subsec:statistical-infinite}).
-\includetable{statistical}
+\includetable{related/statistical}
\subsection{Finite observation}
@@ -368,7 +369,7 @@ The combination of the Perturber and the Grouper follows the sequential composit
% - w-event
% - differential privacy
% - perturbation (randomized response, Laplace)
-\hypertarget{errounda2018continuous}{Errounda et al.}~\cite{errounda2018continuous} proposed a algorithm for sharing w-event local differentially private statistics over infinite streams of location data.
+\hypertarget{errounda2018continuous}{Errounda et al.}~\cite{errounda2018continuous} proposed a algorithm for sharing $w$-event local differentially private statistics over infinite streams of location data.
The decision mechanism determines the similarity between the current data of every individual and the most recent release, with respect to a predefined threshold.
Using the randomized response mechanism, it perturbs the result of this comparison and decides whether to perform an approximation based on the most recent release or calculate and release the current statistics after injecting to them Laplacian noise.
Within the sliding window of size $w$, the privacy budget allocation mechanism estimates the overall privacy budget that the algorithm has allocated at any timestamp and decides how to optimally allocate the remaining budget in the future timestamps.