contribution: Done

Manos Katsomallos 2021-10-25 06:36:37 +02:00
parent 869924d7af
commit 09970607ff

@ -1,4 +1,39 @@
\section{Contribution}
\label{sec:contr}
Our objective is to study the tension between two contrasting desiderata in user-generated Big Data: quality and privacy.
We consider the scenario where data are continuously processed and published, and envision a configurable privacy technique that we can tune according to the requirements of each use case.
\mk{WIP}
The main challenge in this setting is maintaining a balance between privacy and utility.
Furthermore, we consider the presence of temporal correlation, which is inherent in continuous data publishing schemes and can lead to additional privacy loss.
\paragraph{Privacy, space and time}
The first contribution of this thesis is a survey of the existing literature on methods for privacy-preserving continuous data publishing.
We study works that were published over the past two decades and provide a guide that navigates readers through the available methodology and helps them select the algorithm(s) that best fit(s) their needs.
We categorize the works that we review according to whether they deal with \emph{microdata} or \emph{statistical data}.
Then, we group them based on the duration of the processing/publishing that they aim for.
Furthermore, we document in detail the privacy protection characteristics of each reviewed method.
\paragraph{{\Thething} privacy}
Our second contribution is the proposal and formal definition of a novel privacy notion, \emph{{\thething} privacy}.
In contrast to the existing privacy protection levels, our notion distinguishes regular events from events that a user might consider more privacy-sensitive, i.e.,~\emph{{\thethings}}.
The introduction of {\thethings} allows for configurable privacy protection.
First, we design and implement three {\thething} privacy schemes, accounting for {\thethings} spanning a finite time series.
Thereafter, we investigate {\thething} privacy under temporal correlation, which is inherent in time series publishing, and discuss how {\thethings} can affect the propagation of temporal privacy loss.
\paragraph{Dummy {\thething} selection}
The third contribution of this thesis is the design of a module that extends our {\thething} privacy schemes and provides additional protection to {\thethings}.
In other words, we answer the question \emph{`How can we protect the fact that we care more about certain events?'}.
We design an additional differential privacy mechanism, based on the exponential mechanism, that we can easily plug into the proposed {\thething} privacy schemes.
We provide an optimal solution to this problem, which we improve by adopting a heuristic approach, and then implement a more efficient module that relies on partitioning.
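As a reminder of the building block involved, the standard exponential mechanism can be stated as follows; the symbols $\mathcal{M}$, $x$, $r$, $\mathcal{R}$, $u$, $\Delta u$, and $\varepsilon$ are generic notation assumed here for illustration and are not necessarily the notation of our schemes.
Given an input $x$, a set of candidate outputs $\mathcal{R}$, and a utility function $u$ with sensitivity $\Delta u$, the mechanism selects an output $r \in \mathcal{R}$ with probability
\[
  \Pr[\mathcal{M}(x) = r] \propto \exp\left(\frac{\varepsilon \, u(x, r)}{2 \Delta u}\right),
\]
which satisfies $\varepsilon$-differential privacy.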
\bigskip
We extensively evaluate the methods that we propose by conducting experiments on real and synthetic data sets.
We compare {\thething} privacy with event- and user-level privacy protection, and investigate the behavior of the overall privacy loss under temporal correlation for different distributions of {\thethings}.
Furthermore, we estimate the impact of the privacy-preserving dummy {\thething} selection module on the utility of our privacy scheme.