\chapter{Abstract}
\label{ch:abs}
% \kat{It is also needed in French :) }
% \mk{Okay :( }
Sensors, portable devices, and crowdsensing applications
% , e.g.,~trajectory monitoring, smart metering, contact tracing, etc.,
generate massive amounts of user-related, and usually geo-tagged, data on a daily basis.
The manipulation of such data is useful in numerous application domains, including traffic monitoring, intelligent buildings, and healthcare.
A high percentage of these data carry information about user activities and other personal details, and thus their manipulation and sharing raise concerns about the privacy of the individuals involved.
To enable data sharing that is secure from the user privacy perspective, researchers have already proposed various seminal techniques that protect user privacy while accounting for data utility and quality.
However, the continuous fashion in which data are generated nowadays, and the high availability of external sources of information, pose more threats and add extra challenges to the problem.
% \kat{Mention here the extra challenges posed by the specific problem that you address : the Landmark privacy}
It is therefore essential to design solutions that not only guarantee a balance between user privacy protection and data utility, but also provide configurability and consider the preferences of the users.

% Survey
Initially, we investigate the literature regarding data privacy in continuous data publishing, and report on the proposed solutions, with a special focus on solutions concerning location or geo-referenced data.
As a matter of fact, a wealth of algorithms has been proposed for privacy-preserving data publishing, either for microdata or statistical data.
In this context, we seek to offer a guide that allows readers to choose the proper algorithm(s) for their specific use case.
We provide insight into the time-related properties of the algorithms, e.g.,~whether they work on finite or infinite data, or whether they take into consideration any underlying type of data correlation.

% Landmarks
Thereafter, we propose a novel type of data privacy, called \emph{{\thething} privacy}.
We observe that in continuous data publishing, events are not equally significant in terms of privacy, and hence they should affect the privacy-preserving processing differently.
Differential privacy is a well-established paradigm in privacy-preserving time series publishing.
The existing differential privacy protection levels protect either a single timestamp, or all the data per user or per window in the time series; however, they consider all timestamps as equally significant.
The novel notion that we propose, {\thething} privacy, is based on differential privacy and allocates the available privacy budget while taking into account significant events (\emph{\thethings}) in the time series.
We design three privacy schemes that guarantee {\thething} privacy, and we further extend them with a dummy {\thething} selection module that provides more robust privacy protection to the {\thething} set.

% Evaluation
Finally, we evaluate the proposed {\thething} privacy schemes and dummy {\thething} selection module on real and synthetic data sets.
We assess the impact on data utility for several possible {\thething} distributions, with emphasis on settings where temporal correlation is present.
% \kat{add selection, and a small comment on the conclusions driven by the experiments.}
Overall, the results of the experimental evaluation and comparative analysis of {\thething} privacy validate its applicability to several use case scenarios and showcase the improvement, in terms of data utility, over the existing privacy protection levels.
In particular, the dummy {\thething} selection module introduces a reasonable decline in data utility across all of the {\thething} privacy schemes.
In terms of temporal correlation, we observe that under moderate and strong correlation, a greater average regular–{\thething} event distance causes greater overall privacy loss.

\paragraph{Keywords:}
data quality, data privacy, continuous data publishing, crowdsensing, privacy-preserving data processing