\section{Contribution}
\label{sec:contr}
Our objective is to study the problems that arise from two contrasting desiderata in user-generated Big Data: quality and privacy.
We consider the scenario where data are processed/published in a continuous manner and envision a configurable privacy technique that we can tune depending on the use case requirements.
The main challenge in this setting is maintaining a balance between privacy and utility.
Furthermore, we consider the presence of temporal correlation, which is inherent in continuous data publishing schemes and can lead to additional privacy loss.
\paragraph{Privacy, space and time}
The first contribution of this thesis is the survey~\cite{katsomallos2019privacy}
% \kat{cite it here}
of the existing literature on methods for privacy-preserving continuous data publishing, which appeared in
% \kat{name the journal and the special issue}
the special feature on Geospatial Privacy and Security in the $19$th issue of the Journal of Spatial Information Science.
We study works published over the past two decades and provide a guide that navigates readers through the available methodology and helps them select the algorithms that best fit their needs.
We categorize the works that we review depending on whether they deal with \emph{microdata} or \emph{statistical data}.
Then, we group them based on the duration of the processing/publishing that they aim for.
Furthermore, we document in detail the privacy protection characteristics of each reviewed method.
\paragraph{{\Thething} privacy}
Our second contribution is the proposal and formal definition of a novel privacy notion, \emph{{\thething} privacy}.
In contrast to the existing privacy protection levels, our notion distinguishes between regular events and events that a user might consider more privacy-sensitive, i.e.,~\emph{{\thethings}}.
The introduction of {\thethings} allows for configurable privacy protection.
First, we design and implement three {\thething} privacy schemes, accounting for {\thethings} spanning a finite time series.
Thereafter, we investigate {\thething} privacy under temporal correlation, which is inherent in time series publishing, and study how {\thethings} can affect the propagation of temporal privacy loss.
\paragraph{Dummy {\thething} selection}
The third contribution of this thesis is the design of a module that extends our {\thething} privacy schemes and provides additional protection to {\thethings}.
In other words, we answer the question \emph{`How can we protect the fact that we care more about certain events?'}.
We design an additional differential privacy mechanism, based on the exponential mechanism, that we can easily plug into the proposed {\thething} privacy schemes.
We provide an optimal solution to this problem, which we improve by adopting a heuristic approach, and then implement a more efficient module that relies on partitioning.
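For reference, the exponential mechanism can be sketched in its standard form as follows, using generic notation that is assumed here for illustration and is not necessarily the notation of the later chapters.
Given an input $x$, a set of candidate outputs $\mathcal{R}$, a utility function $u$ with sensitivity $\Delta u$, and a privacy budget $\varepsilon$, the mechanism $\mathcal{M}$ returns a candidate $r \in \mathcal{R}$ with probability
\[
  \Pr[\mathcal{M}(x) = r] =
  \frac{\exp\left(\frac{\varepsilon \, u(x, r)}{2 \Delta u}\right)}
       {\sum_{r' \in \mathcal{R}} \exp\left(\frac{\varepsilon \, u(x, r')}{2 \Delta u}\right)},
\]
so that candidates with higher utility are exponentially more likely to be selected, while every candidate retains a nonzero probability of being returned.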
\bigskip
We extensively evaluate the methods that we propose by conducting experiments on real and synthetic data sets.
We compare {\thething} privacy with event- and user-level privacy protection, and investigate the behavior of the overall privacy loss under temporal correlation for different distributions of {\thethings}.
Furthermore, we estimate the impact of the privacy-preserving dummy {\thething} selection module on the utility of our privacy scheme.
The second and the third contributions are described in the article~\cite{katsomallos2022landmark},
% \kat{cite the technical report}
which has been submitted to the research papers track
% \kat{name the conference}
of the $12$th ACM Conference on Data and Application Security and Privacy.