\section{Contribution}
\label{sec:contr}
Our objective is to study the problems arising from two contrasting desiderata in user-generated Big Data: quality and privacy.
We consider the scenario where data are processed/published in a continuous manner and envision a configurable privacy technique that we can tune depending on the use case requirements.
The main challenge in this setting is maintaining a balance between privacy and utility.
Furthermore, we consider the presence of temporal correlation, which is inherent in continuous data publishing schemes and can lead to additional privacy loss.
\paragraph{Privacy, space and time}
The first contribution of this thesis is the survey~\cite{katsomallos2019privacy}
% \kat{cite it here}
of the existing literature on methods for privacy-preserving continuous data publishing, which appeared in
% \kat{name the journal and the special issue}
the special feature on Geospatial Privacy and Security in the $19$th issue of the Journal of Spatial Information Science.
We study works that were published over the past two decades and provide a guide that navigates its users through the available methodology and helps them select the algorithms that best fit their needs.
We categorize the works that we review depending on whether they deal with \emph{microdata} or \emph{statistical data}.
Then, we group them based on the duration of the processing/publishing that they aim for.
Furthermore, we document in detail the privacy protection characteristics of each reviewed method.
\paragraph{{\Thething} privacy}
Our second contribution is the proposal and formal definition of a novel privacy notion, \emph{{\thething} privacy}.
Contrary to the existing privacy protection levels, our notion differentiates between regular events and events that a user might consider more privacy-sensitive, i.e.~\emph{{\thethings}}.
The introduction of {\thethings} allows for configurable privacy protection.
First, we design and implement three {\thething} privacy schemes, accounting for {\thethings} spanning a finite time series.
Thereafter, we investigate {\thething} privacy under temporal correlation, which is inherent in time series publishing, and study how {\thethings} can affect the propagation of temporal privacy loss.
\paragraph{Dummy {\thething} selection}
The third contribution of this thesis is the design of a module that extends our {\thething} privacy schemes and provides additional protection to {\thethings}.
In other words, we answer the question \emph{`How can we protect the fact that we care more about certain events?'}.
We design an additional differential privacy mechanism, based on the exponential mechanism, that we can easily plug into the proposed {\thething} privacy schemes.
We provide an optimal solution to this problem, which we improve by adopting a heuristic approach, and then implement a more efficient module that relies on partitioning.
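As a reminder, and only as a sketch in generic notation that is not necessarily the one adopted in the rest of this thesis, the exponential mechanism $\mathcal{M}$ selects an output $r$ from a set of candidates $\mathcal{R}$, given an input $x$, with probability
\[
  \Pr[\mathcal{M}(x) = r] \propto \exp\left(\frac{\varepsilon \, u(x, r)}{2 \Delta u}\right),
\]
where $u$ is a score function that rates each candidate, $\Delta u$ is the sensitivity of $u$, and $\varepsilon$ is the privacy budget.
For illustration, we may think of the candidates in $\mathcal{R}$ as alternative sets of dummy {\thethings}; this reading of $\mathcal{R}$ and $u$ is an assumption made here for intuition and not a formal definition.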
\bigskip
We extensively evaluate the methods that we propose by conducting experiments on real and synthetic data sets.
We compare {\thething} privacy with event- and user-level privacy protection, and investigate the behavior of the overall privacy loss under temporal correlation for different distributions of {\thethings}.
Furthermore, we estimate the impact of the privacy-preserving dummy {\thething} selection module on the utility of our privacy scheme.
The second and the third contributions are described in the article~\cite{katsomallos2022landmark},
% \kat{cite the technical report}
which has been submitted to the research papers track
% \kat{name the conference}
of the $12$th ACM Conference on Data and Application Security and Privacy.