diff --git a/graphics/data-value.pdf b/graphics/introduction/data-value.pdf similarity index 100% rename from graphics/data-value.pdf rename to graphics/introduction/data-value.pdf diff --git a/graphics/laplace.pdf b/graphics/preliminaries/laplace.pdf similarity index 100% rename from graphics/laplace.pdf rename to graphics/preliminaries/laplace.pdf diff --git a/graphics/level-event.pdf b/graphics/preliminaries/level-event.pdf similarity index 100% rename from graphics/level-event.pdf rename to graphics/preliminaries/level-event.pdf diff --git a/graphics/level-user.pdf b/graphics/preliminaries/level-user.pdf similarity index 100% rename from graphics/level-user.pdf rename to graphics/preliminaries/level-user.pdf diff --git a/graphics/level-w-event.pdf b/graphics/preliminaries/level-w-event.pdf similarity index 100% rename from graphics/level-w-event.pdf rename to graphics/preliminaries/level-w-event.pdf diff --git a/graphics/mode-batch.pdf b/graphics/preliminaries/mode-batch.pdf similarity index 100% rename from graphics/mode-batch.pdf rename to graphics/preliminaries/mode-batch.pdf diff --git a/graphics/mode-snapshot.pdf b/graphics/preliminaries/mode-snapshot.pdf similarity index 100% rename from graphics/mode-snapshot.pdf rename to graphics/preliminaries/mode-snapshot.pdf diff --git a/graphics/mode-streaming.pdf b/graphics/preliminaries/mode-streaming.pdf similarity index 100% rename from graphics/mode-streaming.pdf rename to graphics/preliminaries/mode-streaming.pdf diff --git a/graphics/scheme-global.pdf b/graphics/preliminaries/scheme-global.pdf similarity index 100% rename from graphics/scheme-global.pdf rename to graphics/preliminaries/scheme-global.pdf diff --git a/graphics/scheme-local.pdf b/graphics/preliminaries/scheme-local.pdf similarity index 100% rename from graphics/scheme-local.pdf rename to graphics/preliminaries/scheme-local.pdf diff --git a/tables/continuous.tex b/tables/preliminaries/continuous.tex similarity index 100% rename from 
tables/continuous.tex rename to tables/preliminaries/continuous.tex diff --git a/tables/scenario-micro.tex b/tables/preliminaries/scenario-micro.tex similarity index 100% rename from tables/scenario-micro.tex rename to tables/preliminaries/scenario-micro.tex diff --git a/tables/scenario-statistical.tex b/tables/preliminaries/scenario-statistical.tex similarity index 100% rename from tables/scenario-statistical.tex rename to tables/preliminaries/scenario-statistical.tex diff --git a/tables/snapshot.tex b/tables/preliminaries/snapshot.tex similarity index 100% rename from tables/snapshot.tex rename to tables/preliminaries/snapshot.tex diff --git a/tables/micro.tex b/tables/related/micro.tex similarity index 100% rename from tables/micro.tex rename to tables/related/micro.tex diff --git a/tables/statistical.tex b/tables/related/statistical.tex similarity index 100% rename from tables/statistical.tex rename to tables/related/statistical.tex diff --git a/text/introduction/main.tex b/text/introduction/main.tex index ec84715..5971fb0 100644 --- a/text/introduction/main.tex +++ b/text/introduction/main.tex @@ -46,7 +46,7 @@ Selecting the wrong privacy algorithm or configuring it poorly may put at risk t \begin{figure}[htp] \centering - \includegraphics[width=.5\linewidth]{data-value} + \includegraphics[width=.5\linewidth]{introduction/data-value} \caption{Value of data for decision-making over time from less than seconds to more than months~\cite{gualtieri2016perishable}.} \label{fig:data-value} \end{figure} diff --git a/text/main.tex b/text/main.tex index 6d761f1..020e3fd 100644 --- a/text/main.tex +++ b/text/main.tex @@ -31,6 +31,10 @@ \usepackage[normalem]{ulem} \usepackage[table]{xcolor} \usepackage{arydshln} +% http://mirror.ox.ac.uk/sites/ctan.org/macros/latex/contrib/algorithm2e/doc/algorithm2e.pdf +% {\fontfamily{texttt}\selectfont function} +% {\fontfamily{textsf}\selectfont data} +\usepackage[ruled,lined,noend,linesnumbered]{algorithm2e} \newcommand\blankpage{% 
\null diff --git a/text/preliminaries/data.tex b/text/preliminaries/data.tex index c3f2add..6a5b49d 100644 --- a/text/preliminaries/data.tex +++ b/text/preliminaries/data.tex @@ -22,7 +22,7 @@ To accompany and facilitate the descriptions in this chapter, we provide the fol The `Status' attribute includes information that characterizes the user's state or the query itself, and its value varies according to the service functionality. Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Table~\ref{tab:snapshot-statistical}). - \includetable{snapshot} + \includetable{preliminaries/snapshot} \end{example} @@ -43,7 +43,7 @@ Depending on the span of the observation, we distinguish the following categorie The two data tables over the time-span $[t_1, t_2]$ are an example of finite data. Infinite data are the whole series of data obtained over the period~$[t_1, \infty)$ (infinity is denoted by `\dots'). 
- \includetable{continuous} + \includetable{preliminaries/continuous} \end{example} @@ -69,10 +69,10 @@ We categorize data processing and publishing based on the implemented scheme \ka \begin{figure}[htp] \centering \subcaptionbox{Global scheme\label{fig:scheme-global}}{% - \includegraphics[width=\linewidth]{scheme-global}% + \includegraphics[width=\linewidth]{preliminaries/scheme-global}% } \\ \bigskip \subcaptionbox{Local scheme\label{fig:scheme-local}}{% - \includegraphics[width=\linewidth]{scheme-local}% + \includegraphics[width=\linewidth]{preliminaries/scheme-local}% } \caption{The usual flow of user-generated data, optionally harvested by data publishers, privacy-protected, and released to data consumers, according to the (a)~global, and (b)~local privacy schemes.} \label{fig:privacy-schemes} @@ -116,13 +116,13 @@ We identify two main data processing and publishing modes: \kat{but so far you h \begin{figure}[htp] \centering \subcaptionbox{Snapshot mode\label{fig:mode-snapshot}}{% - \includegraphics[width=.4\linewidth]{mode-snapshot}% + \includegraphics[width=.4\linewidth]{preliminaries/mode-snapshot}% } \\ \bigskip\hspace{\fill} \subcaptionbox{Batch mode\label{fig:mode-batch}}{% - \includegraphics[width=.4\linewidth]{mode-batch}% + \includegraphics[width=.4\linewidth]{preliminaries/mode-batch}% }\hspace{\fill} \subcaptionbox{Streaming mode\label{fig:mode-streaming}}{% - \includegraphics[width=.4\linewidth]{mode-streaming}% + \includegraphics[width=.4\linewidth]{preliminaries/mode-streaming}% }\hspace{\fill} \caption{The different data processing and publishing modes of continuously generated data sets. (a)~Snapshot publishing, (b)~continuous publishing--batch mode, and (c)~continuous publishing--streaming mode. 
diff --git a/text/preliminaries/privacy.tex b/text/preliminaries/privacy.tex index 9259948..5255920 100644 --- a/text/preliminaries/privacy.tex +++ b/text/preliminaries/privacy.tex @@ -99,13 +99,13 @@ Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to deter \begin{figure}[htp] \centering \hspace{\fill}\subcaptionbox{Event-level\label{fig:level-event}}{% - \includegraphics[width=.32\linewidth]{level-event}% + \includegraphics[width=.32\linewidth]{preliminaries/level-event}% }\hspace{\fill} \subcaptionbox{User-level\label{fig:level-user}}{% - \includegraphics[width=.32\linewidth]{level-user}% + \includegraphics[width=.32\linewidth]{preliminaries/level-user}% }\hspace{\fill} \subcaptionbox{$2$-event-level\label{fig:level-w-event}}{% - \includegraphics[width=.32\linewidth]{level-w-event}% + \includegraphics[width=.32\linewidth]{preliminaries/level-w-event}% }\hspace{\fill} \caption{Protecting the data of Table~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly. % \kat{Why don't you distort the results already in this table?} @@ -301,7 +301,7 @@ A specialization of this mechanism for location data is the \emph{Planar Laplace \begin{figure}[htp] \centering - \includegraphics[width=.7\linewidth]{laplace} + \includegraphics[width=.7\linewidth]{preliminaries/laplace} \caption{A Laplace distribution for location $\mu = 2$ and scale $b = 1$.} \label{fig:laplace} \end{figure} @@ -436,7 +436,7 @@ Naturally, using the same (or different) privacy mechanism(s) multiple times to Then, the reported data are collected by the central service, in order to be protected and then published, either as a whole, or as statistics thereof. 
Notice that in order to showcase the straightforward application of $k$-anonymity and differential privacy, we apply the two methods on each timestamp independently from the previous one, and do not take into account any additional threats imposed by continuity. - \includetable{scenario-micro} + \includetable{preliminaries/scenario-micro} First, we anonymize the data set of Table~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$. This means that any user should not be distinguished from at least $2$ others. @@ -447,7 +447,7 @@ Naturally, using the same (or different) privacy mechanism(s) multiple times to Finally, we achieve $3$-anonymity by putting the entries in groups of three, according to the quasi-identifiers. Table~\ref{tab:scenario-micro} depicts the results at each timestamp. - \includetable{scenario-statistical} + \includetable{preliminaries/scenario-statistical} Next, we demonstrate differential privacy. We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Table~\ref{tab:continuous-statistical}. diff --git a/text/related/micro.tex b/text/related/micro.tex index 7ffe666..b654c02 100644 --- a/text/related/micro.tex +++ b/text/related/micro.tex @@ -1,9 +1,8 @@ \section{Microdata} \label{sec:micro} - Table~\ref{tab:micro} summarizes the literature for the Microdata category. -Each reviewed work is abstractly described in this table, by its category (finite or infintite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, w-event), the attacks addressed, the privacy operation applied, and the base method it is built upon. 
+Each reviewed work is abstractly described in this table, by its category (finite or infinite), its publishing mode (batch or streaming) and scheme (global or local), the level of privacy achieved (user, event, $w$-event), the attacks addressed, the privacy operation applied, and the base method it is built upon. We observe that privacy-preserving algorithms for microdata rely mostly on $k$-anonymity or derivatives of it. Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to complementary release attacks (or \emph{composition attacks} in the original publication). Consequently, the research community proposed solutions based on $k$-anonymity, focusing on different threats linked to continuous publication, as we review later on. @@ -16,7 +15,8 @@ to account for the extra privacy loss entailed by them. \bigskip We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:micro-finite}), to continue with the infinite observations setting (Section~\ref{subsec:micro-infinite}). -\includetable{micro} + +\includetable{related/micro} \subsection{Finite observation} @@ -460,4 +460,4 @@ Setting privacy to ON, the user obfuscates their original query by randomly send Although this randomization step makes the original query indistinguishable while making sure that the users always get the information that they need, there is no clear quantification of the privacy guarantee that the scheme offers over time. 
\bigskip -\kat{Add here the comparison/contrast paragraph of microdata techniques shown previously, and your work} \ No newline at end of file +\kat{Add here the comparison/contrast paragraph of microdata techniques shown previously, and your work} diff --git a/text/related/statistical.tex b/text/related/statistical.tex index 4fd13bb..cfe2856 100644 --- a/text/related/statistical.tex +++ b/text/related/statistical.tex @@ -2,14 +2,15 @@ \label{sec:statistical} As in Section~\ref{sec:micro}, we summarize the literature for the Statistical Data category in Table~\ref{tab:statistical}, which we structure identically as Table~\ref{tab:micro}. -For a reminder, each reviewed work is abstractly described in this table, by its category (finite or infintite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, w-event), the attacks addressed, the privacy operation applied, and the base method it is built upon. +As a reminder, each reviewed work is abstractly described in this table, by its category (finite or infinite), its publishing mode (batch or streaming) and scheme (global or local), the level of privacy achieved (user, event, $w$-event), the attacks addressed, the privacy operation applied, and the base method it is built upon. As witnessed in Table~\ref{tab:statistical}, when continuously publishing statistical data, usually in the form of counts, the most widely used privacy method is differential privacy, or derivatives of it. In theory differential privacy makes no assumptions about the background knowledge available to the adversary. In practice, data dependencies (e.g.,~correlations) arising in the continuous publication setting are frequently (but without it being the rule) considered as attacks in the proposed algorithms. 
We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:statistical-finite}), to continue with the infinite observations setting (Section~\ref{subsec:statistical-infinite}). -\includetable{statistical} + +\includetable{related/statistical} \subsection{Finite observation} @@ -368,7 +369,7 @@ The combination of the Perturber and the Grouper follows the sequential composit % - w-event % - differential privacy % - perturbation (randomized response, Laplace) -\hypertarget{errounda2018continuous}{Errounda et al.}~\cite{errounda2018continuous} proposed a algorithm for sharing w-event local differentially private statistics over infinite streams of location data. +\hypertarget{errounda2018continuous}{Errounda et al.}~\cite{errounda2018continuous} proposed an algorithm for sharing $w$-event local differentially private statistics over infinite streams of location data. The decision mechanism determines the similarity between the current data of every individual and the most recent release, with respect to a predefined threshold. Using the randomized response mechanism, it perturbs the result of this comparison and decides whether to perform an approximation based on the most recent release or calculate and release the current statistics after injecting to them Laplacian noise. Within the sliding window of size $w$, the privacy budget allocation mechanism estimates the overall privacy budget that the algorithm has allocated at any timestamp and decides how to optimally allocate the remaining budget in the future timestamps.