Organizing stuff

2021-10-10 06:05:41 +02:00
parent 2cfa5331fe
commit 3baee4e0f5
22 changed files with 26 additions and 21 deletions


@@ -22,7 +22,7 @@ To accompany and facilitate the descriptions in this chapter, we provide the fol
 The `Status' attribute includes information that characterizes the user's state or the query itself, and its value varies according to the service functionality.
 Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Table~\ref{tab:snapshot-statistical}).
-\includetable{snapshot}
+\includetable{preliminaries/snapshot}
 \end{example}
@@ -43,7 +43,7 @@ Depending on the span of the observation, we distinguish the following categorie
 The two data tables over the time-span $[t_1, t_2]$ are an example of finite data.
 Infinite data are the whole series of data obtained over the period~$[t_1, \infty)$ (infinity is denoted by `\dots').
-\includetable{continuous}
+\includetable{preliminaries/continuous}
 \end{example}
@@ -69,10 +69,10 @@ We categorize data processing and publishing based on the implemented scheme \ka
 \begin{figure}[htp]
 \centering
 \subcaptionbox{Global scheme\label{fig:scheme-global}}{%
-\includegraphics[width=\linewidth]{scheme-global}%
+\includegraphics[width=\linewidth]{preliminaries/scheme-global}%
 } \\ \bigskip
 \subcaptionbox{Local scheme\label{fig:scheme-local}}{%
-\includegraphics[width=\linewidth]{scheme-local}%
+\includegraphics[width=\linewidth]{preliminaries/scheme-local}%
 }
 \caption{The usual flow of user-generated data, optionally harvested by data publishers, privacy-protected, and released to data consumers, according to the (a)~global, and (b)~local privacy schemes.}
 \label{fig:privacy-schemes}
@@ -116,13 +116,13 @@ We identify two main data processing and publishing modes: \kat{but so far you h
 \begin{figure}[htp]
 \centering
 \subcaptionbox{Snapshot mode\label{fig:mode-snapshot}}{%
-\includegraphics[width=.4\linewidth]{mode-snapshot}%
+\includegraphics[width=.4\linewidth]{preliminaries/mode-snapshot}%
 } \\ \bigskip\hspace{\fill}
 \subcaptionbox{Batch mode\label{fig:mode-batch}}{%
-\includegraphics[width=.4\linewidth]{mode-batch}%
+\includegraphics[width=.4\linewidth]{preliminaries/mode-batch}%
 }\hspace{\fill}
 \subcaptionbox{Streaming mode\label{fig:mode-streaming}}{%
-\includegraphics[width=.4\linewidth]{mode-streaming}%
+\includegraphics[width=.4\linewidth]{preliminaries/mode-streaming}%
 }\hspace{\fill}
 \caption{The different data processing and publishing modes of continuously generated data sets.
 (a)~Snapshot publishing, (b)~continuous publishing--batch mode, and (c)~continuous publishing--streaming mode.


@@ -99,13 +99,13 @@ Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to deter
 \begin{figure}[htp]
 \centering
 \hspace{\fill}\subcaptionbox{Event-level\label{fig:level-event}}{%
-\includegraphics[width=.32\linewidth]{level-event}%
+\includegraphics[width=.32\linewidth]{preliminaries/level-event}%
 }\hspace{\fill}
 \subcaptionbox{User-level\label{fig:level-user}}{%
-\includegraphics[width=.32\linewidth]{level-user}%
+\includegraphics[width=.32\linewidth]{preliminaries/level-user}%
 }\hspace{\fill}
 \subcaptionbox{$2$-event-level\label{fig:level-w-event}}{%
-\includegraphics[width=.32\linewidth]{level-w-event}%
+\includegraphics[width=.32\linewidth]{preliminaries/level-w-event}%
 }\hspace{\fill}
 \caption{Protecting the data of Table~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly.
 % \kat{Why don't you distort the results already in this table?}
@@ -301,7 +301,7 @@ A specialization of this mechanism for location data is the \emph{Planar Laplace
 \begin{figure}[htp]
 \centering
-\includegraphics[width=.7\linewidth]{laplace}
+\includegraphics[width=.7\linewidth]{preliminaries/laplace}
 \caption{A Laplace distribution for location $\mu = 2$ and scale $b = 1$.}
 \label{fig:laplace}
 \end{figure}
@@ -436,7 +436,7 @@ Naturally, using the same (or different) privacy mechanism(s) multiple times to
 Then, the reported data are collected by the central service, in order to be protected and then published, either as a whole, or as statistics thereof.
 Notice that in order to showcase the straightforward application of $k$-anonymity and differential privacy, we apply the two methods on each timestamp independently from the previous one, and do not take into account any additional threats imposed by continuity.
-\includetable{scenario-micro}
+\includetable{preliminaries/scenario-micro}
 First, we anonymize the data set of Table~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$.
 This means that any user should not be distinguished from at least $2$ others.
@@ -447,7 +447,7 @@ Naturally, using the same (or different) privacy mechanism(s) multiple times to
 Finally, we achieve $3$-anonymity by putting the entries in groups of three, according to the quasi-identifiers.
 Table~\ref{tab:scenario-micro} depicts the results at each timestamp.
-\includetable{scenario-statistical}
+\includetable{preliminaries/scenario-statistical}
 Next, we demonstrate differential privacy.
 We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Table~\ref{tab:continuous-statistical}.
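
The noise calibration implied by the last context line of this hunk can be sketched as follows. This is only an illustration under one stated assumption: the count query has sensitivity $\Delta f = 1$, i.e., each user contributes to at most one count per timestamp. The symbols $\Delta f$, $c$, $\eta$, and $\tilde{c}$ are introduced here for the sketch and do not appear in the committed text; $b$ is the scale of the Laplace distribution, as in the caption of Figure~\ref{fig:laplace}.

```latex
% Sketch only: assumes a count query with sensitivity \Delta f = 1.
\begin{align*}
  b &= \frac{\Delta f}{\varepsilon} = \frac{1}{1} = 1
    && \text{(scale of the noise)} \\
  p(x) &= \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)
    && \text{(density of the zero-centered Laplace noise)} \\
  \tilde{c} &= c + \eta, \quad \eta \sim \mathrm{Lap}(0, b)
    && \text{(released count for a true count } c\text{)}
\end{align*}
```

Under this assumption the noise added to each count has standard deviation $\sqrt{2}\,b \approx 1.41$, and, since each timestamp is treated independently here, a fresh sample of $\eta$ would be drawn for every count.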