\section{Contribution}
\label{sec:contr}
Our objective is to study the tension between two contrasting desiderata in user-generated Big Data: quality and privacy.
We consider the scenario where data are processed/published continuously, and envision a configurable privacy technique that can be tuned to the requirements of each use case.

The main challenge in this setting is maintaining a balance between privacy and utility.
Furthermore, we consider the presence of temporal correlation, which is inherent in continuous data publishing schemes and can lead to additional privacy loss.


\paragraph{Privacy, space and time}
The first contribution of this thesis is the survey~\cite{katsomallos2019privacy}
% \kat{cite it here}
of the existing literature on methods for privacy-preserving continuous data publishing, which appeared in
% \kat{name the journal and the special issue}
the special feature on Geospatial Privacy and Security in the $19$th issue of the Journal of Spatial Information Science.
We study works published over the past two decades and provide a guide that helps readers navigate the available methodology and select the algorithms that best fit their needs.

We categorize the works that we review depending on whether they deal with \emph{microdata} or \emph{statistical data}.
Then, we group them based on the duration of the processing/publishing that they aim for.
Furthermore, we document in detail the privacy protection characteristics of each reviewed method.


\paragraph{{\Thething} privacy}
Our second contribution is the proposal and formal definition of a novel privacy notion, \emph{{\thething} privacy}.
Contrary to the existing privacy protection levels, our notion distinguishes between regular events and events that a user might consider more privacy-sensitive, i.e.~\emph{{\thethings}}.
The introduction of {\thethings} allows for configurable privacy protection.

First, we design and implement three {\thething} privacy schemes, accounting for {\thethings} spanning a finite time series.
Thereafter, we investigate {\thething} privacy under temporal correlation, which is inherent in time series publishing, and study how {\thethings} can affect the propagation of temporal privacy loss.


\paragraph{Dummy {\thething} selection}
The third contribution of this thesis is the design of a module that extends our {\thething} privacy schemes and provides additional protection to {\thethings}.
In other words, we answer the question \emph{`How can we protect the fact that we care more about certain events?'}.

We design an additional differential privacy mechanism, based on the exponential mechanism, that we can easily plug into the proposed {\thething} privacy schemes.
We provide an optimal solution to this problem, improve on it by adopting a heuristic approach, and then implement a more efficient module that relies on partitioning.
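
For reference, the exponential mechanism mentioned above can be sketched in its standard textbook form; the notation here is illustrative and is not necessarily the one adopted in the rest of this thesis.
Assuming a data set $D$, a set of candidate outputs $\mathcal{R}$, a score function $u$ with sensitivity $\Delta u$, and a privacy budget $\varepsilon$, the mechanism selects an output $r \in \mathcal{R}$ with probability
\[
  \Pr[r] = \frac{\exp\left(\frac{\varepsilon \, u(D, r)}{2 \Delta u}\right)}{\sum_{r' \in \mathcal{R}} \exp\left(\frac{\varepsilon \, u(D, r')}{2 \Delta u}\right)},
  \qquad
  \Delta u = \max_{r \in \mathcal{R}} \, \max_{D, D'} \left| u(D, r) - u(D', r) \right|,
\]
where $D$ and $D'$ range over neighboring data sets.
In this form, the selection satisfies $\varepsilon$-differential privacy; how $\mathcal{R}$ and $u$ are instantiated for dummy {\thething} selection is not fixed by this sketch.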
%

\bigskip
We extensively evaluate the methods that we propose by conducting experiments on real and synthetic data sets.
We compare {\thething} privacy with event- and user-level privacy protection, and investigate the behavior of the overall privacy loss under temporal correlation for different distributions of {\thethings}.
Furthermore, we estimate the impact of the privacy-preserving dummy {\thething} selection module on the utility of our privacy scheme.

The second and the third contributions are described in the article~\cite{katsomallos2022landmark},
% \kat{cite the technical report}
which will appear in the research papers track
% \kat{name the conference}
of the $12$th ACM Conference on Data and Application Security and Privacy.