diff --git a/text/introduction/contribution.tex b/text/introduction/contribution.tex
index 61a1e58..d767187 100644
--- a/text/introduction/contribution.tex
+++ b/text/introduction/contribution.tex
@@ -1,4 +1,39 @@
 \section{Contribution}
 \label{sec:contr}
+Our objective is to study the problems that arise from two contrasting desiderata in user-generated Big Data: quality and privacy.
+We consider the scenario where data are processed/published in a continuous manner and envision a configurable privacy technique that we can tune depending on the use case requirements.
 
-\mk{WIP}
+The main challenge in this setting is maintaining a balance between privacy and utility.
+Furthermore, we consider the presence of temporal correlation, which is inherent in continuous data publishing schemes and can lead to additional privacy loss.
+
+
+\paragraph{Privacy, space and time}
+The first contribution of this thesis is a survey of the existing literature on methods for privacy-preserving continuous data publishing.
+We study works that were published over the past two decades and provide a guide that will navigate its users through the available methodology and help them select the algorithm(s) that best fit(s) their needs.
+
+We categorize the works that we review depending on whether they deal with \emph{microdata} or \emph{statistical data}.
+Then, we group them based on the duration of the processing/publishing that they aim for.
+Furthermore, we document in detail the privacy protection characteristics of each reviewed method.
+
+
+\paragraph{{\Thething} privacy}
+Our second contribution is the proposal and formal definition of a novel privacy notion, \emph{{\thething} privacy}.
+Contrary to the existing privacy protection levels, our notion differentiates between regular events and events that a user might consider more privacy-sensitive, i.e.~\emph{{\thethings}}.
+The introduction of {\thethings} allows for configurable privacy protection.
+
+First, we design and implement three {\thething} privacy schemes, accounting for {\thethings} spanning a finite time series.
+Thereafter, we investigate {\thething} privacy under temporal correlation, which is inherent in time series publishing, and discuss how {\thethings} can affect the propagation of temporal privacy loss.
+
+
+\paragraph{Dummy {\thething} selection}
+The third contribution of this thesis is the design of a module that extends our {\thething} privacy schemes and provides additional protection to {\thethings}.
+In other words, we answer the question \emph{`How can we protect the fact that we care more about certain events?'}.
+
+We design an additional differential privacy mechanism, based on the exponential mechanism, that we can easily plug into the proposed {\thething} privacy schemes.
+We provide an optimal solution to this problem, which we improve by adopting a heuristic approach, and then implement a more efficient module that relies on partitioning.
+
+
+\bigskip
+We extensively evaluate the methods that we propose by conducting experiments on real and synthetic data sets.
+We compare {\thething} privacy with event- and user-level privacy protection, and investigate the behavior of the overall privacy loss under temporal correlation for different distributions of {\thethings}.
+Furthermore, we estimate the impact of the privacy-preserving dummy {\thething} selection module on the utility of our privacy scheme.