diff --git a/tables/continuous.tex b/tables/continuous.tex new file mode 100644 index 0000000..eaa9f47 --- /dev/null +++ b/tables/continuous.tex @@ -0,0 +1,52 @@ +\begin{table} + \centering + \subcaptionbox{Microdata\label{tab:continuous-micro}}{% + \adjustbox{max width=\linewidth}{% + \begin{tabular}{@{}ccc@{}} + \begin{tabular}{@{}lrll@{}} + \toprule + \textit{Name} & \multicolumn{1}{c}{Age} & Location & Status \\ + \midrule + Donald & $27$ & Le Marais & at work \\ + Daisy & $25$ & Belleville & driving \\ + Huey & $12$ & Montmartre & running \\ + Dewey & $11$ & Montmartre & at home \\ + Louie & $10$ & Latin Quarter & walking \\ + Quackmore & $62$ & Opera & dining \\ + \bottomrule + \multicolumn{4}{c}{$t_1$} \\ + \end{tabular} & + \begin{tabular}{@{}lrll@{}} + \toprule + \textit{Name} & \multicolumn{1}{c}{Age} & Location & Status \\ + \midrule + Donald & $27$ & Montmartre & driving \\ + Daisy & $25$ & Montmartre & at the mall \\ + Huey & $12$ & Latin Quarter & sightseeing \\ + Dewey & $11$ & Opera & walking \\ + Louie & $10$ & Latin Quarter & at home \\ + Quackmore & $62$ & Montmartre & biking \\ + \bottomrule + \multicolumn{4}{c}{$t_2$} \\ + \end{tabular} & + \dots + \end{tabular}% + }% + } \\ \bigskip + \subcaptionbox{Statistical data\label{tab:continuous-statistical}}{% + \begin{tabular}{@{}lrrr@{}} + \toprule + \multirow{2}{*}{Location} & \multicolumn{3}{c@{}}{Count}\\ + & \multicolumn{1}{c}{$t_1$} & \multicolumn{1}{c}{$t_2$} & \dots \\ + \midrule + Belleville & $1$ & $0$ & \dots \\ + Latin Quarter & $1$ & $2$ & \dots \\ + Le Marais & $1$ & $0$ & \dots \\ + Montmartre & $2$ & $3$ & \dots \\ + Opera & $1$ & $1$ & \dots \\ + \bottomrule + \end{tabular}% + }% + \caption{Continuous data observation of (a)~microdata, and corresponding (b)~statistics at multiple timestamps.} + \label{tab:continuous} +\end{table} diff --git a/text/preliminaries/data.tex b/text/preliminaries/data.tex index 45d4a05..06feaf2 100644 --- a/text/preliminaries/data.tex +++ 
b/text/preliminaries/data.tex @@ -5,7 +5,7 @@ \subsection{Categories} \label{subsec:data-categories} -As this survey is about privacy, the data that we are interested in, contain information about individuals and their actions. +The data that we are interested in contain information about individuals and their actions. We firstly classify the data based on their content: \begin{itemize} @@ -28,58 +28,8 @@ Depending on the span of observation, we distinguish the following categories: The two data tables, over the time-span $[t_1, t_2]$ are an example of finite data. Infinite data are the whole series of data obtained over the period~$[t_1, \infty)$ (infinity is denoted by `\dots'). - \begin{table} - \centering - \subcaptionbox{Microdata\label{tab:continuous-micro}}{% - \adjustbox{max width=\linewidth}{% - \begin{tabular}{@{}ccc@{}} - \begin{tabular}{@{}lrll@{}} - \toprule - \textit{Name} & \multicolumn{1}{c}{Age} & Location & Status \\ - \midrule - Donald & $27$ & Le Marais & at work \\ - Daisy & $25$ & Belleville & driving \\ - Huey & $12$ & Montmartre & running \\ - Dewey & $11$ & Montmartre & at home \\ - Louie & $10$ & Latin Quarter & walking \\ - Quackmore & $62$ & Opera & dining \\ - \bottomrule - \multicolumn{4}{c}{$t_1$} \\ - \end{tabular} & - \begin{tabular}{@{}lrll@{}} - \toprule - \textit{Name} & \multicolumn{1}{c}{Age} & Location & Status \\ - \midrule - Donald & $27$ & Montmartre & driving \\ - Daisy & $25$ & Montmartre & at the mall \\ - Huey & $12$ & Latin Quarter & sightseeing \\ - Dewey & $11$ & Opera & walking \\ - Louie & $10$ & Latin Quarter & at home \\ - Quackmore & $62$ & Montmartre & biking \\ - \bottomrule - \multicolumn{4}{c}{$t_2$} \\ - \end{tabular} & - \dots - \end{tabular}% - }% - } \\ \bigskip - \subcaptionbox{Statistical data\label{tab:continuous-statistical}}{% - \begin{tabular}{@{}lrrr@{}} - \toprule - \multirow{2}{*}{Location} & \multicolumn{3}{c@{}}{Count}\\ - & \multicolumn{1}{c}{$t_1$} & \multicolumn{1}{c}{$t_2$} & \dots \\ - 
\midrule - Belleville & $1$ & $0$ & \dots \\ - Latin Quarter & $1$ & $2$ & \dots \\ - Le Marais & $1$ & $0$ & \dots \\ - Montmartre & $2$ & $3$ & \dots \\ - Opera & $1$ & $1$ & \dots \\ - \bottomrule - \end{tabular}% - }% - \caption{Continuous data observation of (a)~microdata, and corresponding (b)~statistics at multiple timestamps.} - \label{tab:continuous} - \end{table} + \includetable{continuous} + \end{example} We further define two sub-categories applicable to both finite and infinite data: \emph{sequential} and \emph{incremental} data; these two subcategories are not exhaustive, i.e.,~not all data sets belong to the one or the other category. @@ -125,7 +75,7 @@ Nonetheless, data distortion at an early stage might prove detrimental to the ov The so far consensus is that there is no overall optimal solution among the two designs. Most service-providing companies prefer the global scheme, mainly for reasons of better management and control over the data, while several privacy advocates support the local privacy scheme that offers users full control over what and how data are published. Although there have been attempts to bridge the gap between them, e.g.,~\cite{bittau2017prochlo}, the global scheme is considerably better explored and implemented~\cite{satyanarayanan2017emergence}. -For this reason, most of the works in this survey span this context. +For this reason, most of the works that we review span this context. We distinguish between two publishing modes for private data: \emph{snapshot} and \emph{continuous}. In snapshot publishing (also appearing as \emph{one-shot} or \emph{one-off} publishing), the system processes and releases a data set at a specific point in time and thereafter is not concerned anymore with the specific data set. 
@@ -133,7 +83,7 @@ For example, in Figure~\ref{fig:mode-snapshot} (ignore the privacy-preserving st In continuous data publishing the system computes, and publishes augmented or updated versions of one data set in different time points, and without a predefined duration. In the context of privacy-preserving data publishing, privacy preservation is tightly coupled with the data processing and publishing stages. -As already discussed in Section~\ref{ch:intro}, in this survey we are studying the continuous data publishing mode, and thus we do not include works considering the snapshot paradigm. +As already discussed in Section~\ref{ch:intro}, in this work we are studying the continuous data publishing mode, and thus we do not include works considering the snapshot paradigm. We make this deliberate choice as privacy-preserving continuous data publishing is a more complex problem, receiving more and more attention from the scientific community in the recent years, as shown by the increasing number of publications in this area. Moreover, the use cases of continuous data publishing abound, with the proliferation of the Internet, sensors, and connected devices, which produce and send to servers huge amounts of continuous personal data in astounding speed.