Organizing stuff
parent 2cfa5331fe
commit 3baee4e0f5
@ -46,7 +46,7 @@ Selecting the wrong privacy algorithm or configuring it poorly may put at risk t
|
|||||||
|
|
||||||
\begin{figure}[htp]
|
\begin{figure}[htp]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=.5\linewidth]{data-value}
|
\includegraphics[width=.5\linewidth]{introduction/data-value}
|
||||||
\caption{Value of data for decision-making over time from less than seconds to more than months~\cite{gualtieri2016perishable}.}
|
\caption{Value of data for decision-making over time from less than seconds to more than months~\cite{gualtieri2016perishable}.}
|
||||||
\label{fig:data-value}
|
\label{fig:data-value}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
@ -31,6 +31,10 @@
|
|||||||
\usepackage[normalem]{ulem}
|
\usepackage[normalem]{ulem}
|
||||||
\usepackage[table]{xcolor}
|
\usepackage[table]{xcolor}
|
||||||
\usepackage{arydshln}
|
\usepackage{arydshln}
|
||||||
|
% http://mirror.ox.ac.uk/sites/ctan.org/macros/latex/contrib/algorithm2e/doc/algorithm2e.pdf
|
||||||
|
% {\fontfamily{texttt}\selectfont function}
|
||||||
|
% {\fontfamily{textsf}\selectfont data}
|
||||||
|
\usepackage[ruled,lined,noend,linesnumbered]{algorithm2e}
|
||||||
|
|
||||||
\newcommand\blankpage{%
|
\newcommand\blankpage{%
|
||||||
\null
|
\null
|
||||||
|
@ -22,7 +22,7 @@ To accompany and facilitate the descriptions in this chapter, we provide the fol
|
|||||||
The `Status' attribute includes information that characterizes the user's state or the query itself, and its value varies according to the service functionality.
|
The `Status' attribute includes information that characterizes the user's state or the query itself, and its value varies according to the service functionality.
|
||||||
Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Table~\ref{tab:snapshot-statistical}).
|
Subsequently, the generated data are aggregated (by issuing count queries over them) in order to derive useful information about the popularity of the venues during the day (Table~\ref{tab:snapshot-statistical}).
|
||||||
|
|
||||||
\includetable{snapshot}
|
\includetable{preliminaries/snapshot}
|
||||||
|
|
||||||
\end{example}
|
\end{example}
|
||||||
|
|
||||||
@ -43,7 +43,7 @@ Depending on the span of the observation, we distinguish the following categorie
|
|||||||
The two data tables over the time-span $[t_1, t_2]$ are an example of finite data.
|
The two data tables over the time-span $[t_1, t_2]$ are an example of finite data.
|
||||||
Infinite data are the whole series of data obtained over the period~$[t_1, \infty)$ (infinity is denoted by `\dots').
|
Infinite data are the whole series of data obtained over the period~$[t_1, \infty)$ (infinity is denoted by `\dots').
|
||||||
|
|
||||||
\includetable{continuous}
|
\includetable{preliminaries/continuous}
|
||||||
|
|
||||||
\end{example}
|
\end{example}
|
||||||
|
|
||||||
@ -69,10 +69,10 @@ We categorize data processing and publishing based on the implemented scheme \ka
|
|||||||
\begin{figure}[htp]
|
\begin{figure}[htp]
|
||||||
\centering
|
\centering
|
||||||
\subcaptionbox{Global scheme\label{fig:scheme-global}}{%
|
\subcaptionbox{Global scheme\label{fig:scheme-global}}{%
|
||||||
\includegraphics[width=\linewidth]{scheme-global}%
|
\includegraphics[width=\linewidth]{preliminaries/scheme-global}%
|
||||||
} \\ \bigskip
|
} \\ \bigskip
|
||||||
\subcaptionbox{Local scheme\label{fig:scheme-local}}{%
|
\subcaptionbox{Local scheme\label{fig:scheme-local}}{%
|
||||||
\includegraphics[width=\linewidth]{scheme-local}%
|
\includegraphics[width=\linewidth]{preliminaries/scheme-local}%
|
||||||
}
|
}
|
||||||
\caption{The usual flow of user-generated data, optionally harvested by data publishers, privacy-protected, and released to data consumers, according to the (a)~global, and (b)~local privacy schemes.}
|
\caption{The usual flow of user-generated data, optionally harvested by data publishers, privacy-protected, and released to data consumers, according to the (a)~global, and (b)~local privacy schemes.}
|
||||||
\label{fig:privacy-schemes}
|
\label{fig:privacy-schemes}
|
||||||
@ -116,13 +116,13 @@ We identify two main data processing and publishing modes: \kat{but so far you h
|
|||||||
\begin{figure}[htp]
|
\begin{figure}[htp]
|
||||||
\centering
|
\centering
|
||||||
\subcaptionbox{Snapshot mode\label{fig:mode-snapshot}}{%
|
\subcaptionbox{Snapshot mode\label{fig:mode-snapshot}}{%
|
||||||
\includegraphics[width=.4\linewidth]{mode-snapshot}%
|
\includegraphics[width=.4\linewidth]{preliminaries/mode-snapshot}%
|
||||||
} \\ \bigskip\hspace{\fill}
|
} \\ \bigskip\hspace{\fill}
|
||||||
\subcaptionbox{Batch mode\label{fig:mode-batch}}{%
|
\subcaptionbox{Batch mode\label{fig:mode-batch}}{%
|
||||||
\includegraphics[width=.4\linewidth]{mode-batch}%
|
\includegraphics[width=.4\linewidth]{preliminaries/mode-batch}%
|
||||||
}\hspace{\fill}
|
}\hspace{\fill}
|
||||||
\subcaptionbox{Streaming mode\label{fig:mode-streaming}}{%
|
\subcaptionbox{Streaming mode\label{fig:mode-streaming}}{%
|
||||||
\includegraphics[width=.4\linewidth]{mode-streaming}%
|
\includegraphics[width=.4\linewidth]{preliminaries/mode-streaming}%
|
||||||
}\hspace{\fill}
|
}\hspace{\fill}
|
||||||
\caption{The different data processing and publishing modes of continuously generated data sets.
|
\caption{The different data processing and publishing modes of continuously generated data sets.
|
||||||
(a)~Snapshot publishing, (b)~continuous publishing--batch mode, and (c)~continuous publishing--streaming mode.
|
(a)~Snapshot publishing, (b)~continuous publishing--batch mode, and (c)~continuous publishing--streaming mode.
|
||||||
|
@ -99,13 +99,13 @@ Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to deter
|
|||||||
\begin{figure}[htp]
|
\begin{figure}[htp]
|
||||||
\centering
|
\centering
|
||||||
\hspace{\fill}\subcaptionbox{Event-level\label{fig:level-event}}{%
|
\hspace{\fill}\subcaptionbox{Event-level\label{fig:level-event}}{%
|
||||||
\includegraphics[width=.32\linewidth]{level-event}%
|
\includegraphics[width=.32\linewidth]{preliminaries/level-event}%
|
||||||
}\hspace{\fill}
|
}\hspace{\fill}
|
||||||
\subcaptionbox{User-level\label{fig:level-user}}{%
|
\subcaptionbox{User-level\label{fig:level-user}}{%
|
||||||
\includegraphics[width=.32\linewidth]{level-user}%
|
\includegraphics[width=.32\linewidth]{preliminaries/level-user}%
|
||||||
}\hspace{\fill}
|
}\hspace{\fill}
|
||||||
\subcaptionbox{$2$-event-level\label{fig:level-w-event}}{%
|
\subcaptionbox{$2$-event-level\label{fig:level-w-event}}{%
|
||||||
\includegraphics[width=.32\linewidth]{level-w-event}%
|
\includegraphics[width=.32\linewidth]{preliminaries/level-w-event}%
|
||||||
}\hspace{\fill}
|
}\hspace{\fill}
|
||||||
\caption{Protecting the data of Table~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly.
|
\caption{Protecting the data of Table~\ref{tab:continuous-statistical} on (a)~event-, (b)~user-, and (c)~$2$-event-level. A suitable distortion method can be applied accordingly.
|
||||||
% \kat{Why don't you distort the results already in this table?}
|
% \kat{Why don't you distort the results already in this table?}
|
||||||
@ -301,7 +301,7 @@ A specialization of this mechanism for location data is the \emph{Planar Laplace
|
|||||||
|
|
||||||
\begin{figure}[htp]
|
\begin{figure}[htp]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=.7\linewidth]{laplace}
|
\includegraphics[width=.7\linewidth]{preliminaries/laplace}
|
||||||
\caption{A Laplace distribution for location $\mu = 2$ and scale $b = 1$.}
|
\caption{A Laplace distribution for location $\mu = 2$ and scale $b = 1$.}
|
||||||
\label{fig:laplace}
|
\label{fig:laplace}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
@ -436,7 +436,7 @@ Naturally, using the same (or different) privacy mechanism(s) multiple times to
|
|||||||
Then, the reported data are collected by the central service, in order to be protected and then published, either as a whole, or as statistics thereof.
|
Then, the reported data are collected by the central service, in order to be protected and then published, either as a whole, or as statistics thereof.
|
||||||
Notice that in order to showcase the straightforward application of $k$-anonymity and differential privacy, we apply the two methods on each timestamp independently from the previous one, and do not take into account any additional threats imposed by continuity.
|
Notice that in order to showcase the straightforward application of $k$-anonymity and differential privacy, we apply the two methods on each timestamp independently from the previous one, and do not take into account any additional threats imposed by continuity.
|
||||||
|
|
||||||
\includetable{scenario-micro}
|
\includetable{preliminaries/scenario-micro}
|
||||||
|
|
||||||
First, we anonymize the data set of Table~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$.
|
First, we anonymize the data set of Table~\ref{tab:continuous-micro} using $k$-anonymity, with $k = 3$.
|
||||||
This means that any user should not be distinguished from at least $2$ others.
|
This means that any user should not be distinguished from at least $2$ others.
|
||||||
@ -447,7 +447,7 @@ Naturally, using the same (or different) privacy mechanism(s) multiple times to
|
|||||||
Finally, we achieve $3$-anonymity by putting the entries in groups of three, according to the quasi-identifiers.
|
Finally, we achieve $3$-anonymity by putting the entries in groups of three, according to the quasi-identifiers.
|
||||||
Table~\ref{tab:scenario-micro} depicts the results at each timestamp.
|
Table~\ref{tab:scenario-micro} depicts the results at each timestamp.
|
||||||
|
|
||||||
\includetable{scenario-statistical}
|
\includetable{preliminaries/scenario-statistical}
|
||||||
|
|
||||||
Next, we demonstrate differential privacy.
|
Next, we demonstrate differential privacy.
|
||||||
We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Table~\ref{tab:continuous-statistical}.
|
We apply an $\varepsilon$-differentially private Laplace mechanism, with $\varepsilon = 1$, taking into account the count query that generated the true counts of Table~\ref{tab:continuous-statistical}.
|
||||||
|
@ -1,9 +1,8 @@
|
|||||||
\section{Microdata}
|
\section{Microdata}
|
||||||
\label{sec:micro}
|
\label{sec:micro}
|
||||||
|
|
||||||
|
|
||||||
Table~\ref{tab:micro} summarizes the literature for the Microdata category.
|
Table~\ref{tab:micro} summarizes the literature for the Microdata category.
|
||||||
Each reviewed work is abstractly described in this table, by its category (finite or infintite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, w-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
|
Each reviewed work is abstractly described in this table, by its category (finite or infinite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, $w$-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
|
||||||
We observe that privacy-preserving algorithms for microdata rely mostly on $k$-anonymity or derivatives of it.
|
We observe that privacy-preserving algorithms for microdata rely mostly on $k$-anonymity or derivatives of it.
|
||||||
Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to complementary release attacks (or \emph{composition attacks} in the original publication).
|
Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to complementary release attacks (or \emph{composition attacks} in the original publication).
|
||||||
Consequently, the research community proposed solutions based on $k$-anonymity, focusing on different threats linked to continuous publication, as we review later on.
|
Consequently, the research community proposed solutions based on $k$-anonymity, focusing on different threats linked to continuous publication, as we review later on.
|
||||||
@ -16,7 +15,8 @@ to account for the extra privacy loss entailed by them.
|
|||||||
\bigskip
|
\bigskip
|
||||||
|
|
||||||
We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:micro-finite}), to continue with the infinite observations setting (Section~\ref{subsec:micro-infinite}).
|
We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:micro-finite}), to continue with the infinite observations setting (Section~\ref{subsec:micro-infinite}).
|
||||||
\includetable{micro}
|
|
||||||
|
\includetable{related/micro}
|
||||||
|
|
||||||
|
|
||||||
\subsection{Finite observation}
|
\subsection{Finite observation}
|
||||||
@ -460,4 +460,4 @@ Setting privacy to ON, the user obfuscates their original query by randomly send
|
|||||||
Although this randomization step makes the original query indistinguishable while making sure that the users always get the information that they need, there is no clear quantification of the privacy guarantee that the scheme offers over time.
|
Although this randomization step makes the original query indistinguishable while making sure that the users always get the information that they need, there is no clear quantification of the privacy guarantee that the scheme offers over time.
|
||||||
\bigskip
|
\bigskip
|
||||||
|
|
||||||
\kat{Add here the comparison/contrast paragraph of microdata techniques shown previously, and your work}
|
\kat{Add here the comparison/contrast paragraph of microdata techniques shown previously, and your work}
|
||||||
|
@ -2,14 +2,15 @@
|
|||||||
\label{sec:statistical}
|
\label{sec:statistical}
|
||||||
|
|
||||||
As in Section~\ref{sec:micro}, we summarize the literature for the Statistical Data category in Table~\ref{tab:statistical}, which we structure identically as Table~\ref{tab:micro}.
|
As in Section~\ref{sec:micro}, we summarize the literature for the Statistical Data category in Table~\ref{tab:statistical}, which we structure identically as Table~\ref{tab:micro}.
|
||||||
For a reminder, each reviewed work is abstractly described in this table, by its category (finite or infintite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, w-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
|
For a reminder, each reviewed work is abstractly described in this table, by its category (finite or infinite), its publishing mode (batch or streaming) and scheme(global or local), the level of privacy achieved (user, event, $w$-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
|
||||||
|
|
||||||
As witnessed in Table~\ref{tab:statistical}, when continuously publishing statistical data, usually in the form of counts, the most widely used privacy method is differential privacy, or derivatives of it.
|
As witnessed in Table~\ref{tab:statistical}, when continuously publishing statistical data, usually in the form of counts, the most widely used privacy method is differential privacy, or derivatives of it.
|
||||||
In theory differential privacy makes no assumptions about the background knowledge available to the adversary.
|
In theory differential privacy makes no assumptions about the background knowledge available to the adversary.
|
||||||
In practice, data dependencies (e.g.,~correlations) arising in the continuous publication setting are frequently (but without it being the rule) considered as attacks in the proposed algorithms.
|
In practice, data dependencies (e.g.,~correlations) arising in the continuous publication setting are frequently (but without it being the rule) considered as attacks in the proposed algorithms.
|
||||||
|
|
||||||
We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:statistical-finite}), to continue with the infinite observations setting (Section~\ref{subsec:statistical-infinite}).
|
We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:statistical-finite}), to continue with the infinite observations setting (Section~\ref{subsec:statistical-infinite}).
|
||||||
\includetable{statistical}
|
|
||||||
|
\includetable{related/statistical}
|
||||||
|
|
||||||
|
|
||||||
\subsection{Finite observation}
|
\subsection{Finite observation}
|
||||||
@ -368,7 +369,7 @@ The combination of the Perturber and the Grouper follows the sequential composit
|
|||||||
% - w-event
|
% - w-event
|
||||||
% - differential privacy
|
% - differential privacy
|
||||||
% - perturbation (randomized response, Laplace)
|
% - perturbation (randomized response, Laplace)
|
||||||
\hypertarget{errounda2018continuous}{Errounda et al.}~\cite{errounda2018continuous} proposed a algorithm for sharing w-event local differentially private statistics over infinite streams of location data.
|
\hypertarget{errounda2018continuous}{Errounda et al.}~\cite{errounda2018continuous} proposed a algorithm for sharing $w$-event local differentially private statistics over infinite streams of location data.
|
||||||
The decision mechanism determines the similarity between the current data of every individual and the most recent release, with respect to a predefined threshold.
|
The decision mechanism determines the similarity between the current data of every individual and the most recent release, with respect to a predefined threshold.
|
||||||
Using the randomized response mechanism, it perturbs the result of this comparison and decides whether to perform an approximation based on the most recent release or calculate and release the current statistics after injecting to them Laplacian noise.
|
Using the randomized response mechanism, it perturbs the result of this comparison and decides whether to perform an approximation based on the most recent release or calculate and release the current statistics after injecting to them Laplacian noise.
|
||||||
Within the sliding window of size $w$, the privacy budget allocation mechanism estimates the overall privacy budget that the algorithm has allocated at any timestamp and decides how to optimally allocate the remaining budget in the future timestamps.
|
Within the sliding window of size $w$, the privacy budget allocation mechanism estimates the overall privacy budget that the algorithm has allocated at any timestamp and decides how to optimally allocate the remaining budget in the future timestamps.
|
||||||
|