thething: Minor corrections
parent d6c2d5955a
commit 93ecd84144
@ -1,41 +1,6 @@
\section{Significant events}
\label{sec:thething}

% Crowdsensing applications
The plethora of sensors currently embedded in personal devices and other infrastructures has paved the way for the development of numerous \emph{crowdsensing services} (e.g.,~Ring~\cite{ring}, TousAntiCovid~\cite{tousanticovid}, and Waze~\cite{waze}) based on the collected personal data, which are usually geotagged and timestamped.
% Continuously user-generated data
User--service interactions gather personal event-like data, i.e.,~data items that pair an identifying attribute of an individual with (possibly sensitive) information at a timestamp, including contextual information, e.g.,~(\emph{`Bob', `dining', `Canal Saint-Martin', $17{:}00$}).
When the interactions are performed in a continuous manner, we obtain \emph{time series} of events.
% Observation/interaction duration
Depending on its duration, we classify the interaction/observation as \emph{finite}, when it takes place during a predefined time interval, or \emph{infinite}, when it takes place in an uninterrupted fashion.
Example~\ref{ex:scenario} shows the result of user--LBS (location-based service) interaction while retrieving location-based information or reporting user-state at various locations.

\begin{example}
\label{ex:scenario}

Consider a finite sequence of spatiotemporal data generated by Bob during an interval of $8$ timestamps, as shown in Figure~\ref{fig:scenario}.
Shaded events correspond to privacy-sensitive events that Bob has defined beforehand. For instance, his home is around {\'E}lys{\'e}e, his workplace is around the Louvre, and his hangout is around Canal Saint-Martin.

\begin{figure}[htp]
\centering
\includegraphics[width=\linewidth]{problem/lmdk-scenario}
\caption{A time series with {\thethings} (highlighted in gray).
}
\label{fig:scenario}
\end{figure}

\end{example}

% Privacy-preserving data processing
The services collect and further process the time series in order to give useful feedback to the involved users or to provide valuable insight to various internal/external analytical services.
The regulation regarding the processing of user-generated data sets~\cite{tankard2016gdpr} requires the provision of privacy guarantees to the users.
At the same time, it is essential to provide utility metrics to the final consumers of the privacy-preserving process output.
To accomplish this, various privacy techniques perturb the original data or the processing output at the expense of the overall utility of the final output.
A widely recognized tool that introduces probabilistic randomness to the original data, while quantifying the privacy/utility trade-off with a parameter $\varepsilon$ (`privacy budget'~\cite{mcsherry2009privacy}), is \emph{$\varepsilon$-differential privacy}~\cite{dwork2006calibrating}.
Due to its \emph{composition} property, i.e.,~the combination of differentially private outputs satisfies differential privacy as well, differential privacy is suitable for privacy-preserving time series publishing.
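As a brief illustration of the composition property, consider the standard sequential composition result (the symbols $\mathcal{M}_i$, $\varepsilon_i$, $\varepsilon'$, and $n$ are introduced here only for this sketch): if each mechanism $\mathcal{M}_i$, for $i = 1, \dots, n$, satisfies $\varepsilon_i$-differential privacy, then releasing all of their outputs on the same data satisfies $\varepsilon'$-differential privacy with
\[
  \varepsilon' = \sum_{i = 1}^{n} \varepsilon_i .
\]
For instance, publishing at $4$ timestamps with $\varepsilon_i = 0.25$ each consumes a total budget of $\varepsilon' = 1$.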
\emph{Event}, \emph{user}~\cite{dwork2010differential, dwork2010pan}, and \emph{$w$-event}~\cite{kellaris2014differentially} comprise the possible levels of privacy protection.
Event-level protection limits the privacy guarantee to \emph{any single event}, user-level protects \emph{all the events} of any user, and $w$-event provides privacy protection to \emph{any sequence of $w$ events}.

The privacy mechanisms for the aforementioned levels assume that in a time series any single event, any sequence of events, or the entire series of events is equally privacy-significant for the users.
In reality, this is a simplistic assumption.
The significance of an event is related to certain user-defined privacy criteria, or to its adjacent events, as well as to the entire time series.

@ -3,7 +3,7 @@

\subsubsection{Setting}
\label{subsec:lmdk-set}
Our problem setting consists of three entities: (i)~data generators (users), (ii)~data publishers (trusted non-adversarial entities), and (iii)~data consumers (possibly adversarial entities).
Users generate sensitive data, which are processed in a secure and private way by a trusted curator and are later published in order to be consumed by potentially adversarial data analysts.
%The data unit produced by the users is an \emph{event}, i.e., a piece of timestamped user-related information.\kat{should we say geo-stamped?}.
Data are produced as a series of events, which we call a time series.

@ -7,7 +7,7 @@

\paragraph{Uniform}
%\kat{isn't the uniform distribution a method? there is a section for the methods. }
Figure~\ref{fig:lmdk-uniform} shows the simplest model that implements Theorem~\ref{theor:thething-prv}, the \emph{Uniform} distribution of privacy budget $\varepsilon$ for {\thething} privacy.
% \mk{We capitalize the first letter because it's the name of the method.}
% in comparison with user-level protection.
In this case, it suffices to allocate at each timestamp the total privacy budget divided by the number of timestamps that correspond to {\thethings}, incremented by one when the timestamp being released is a regular one.
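To make the Uniform allocation concrete, the budget spent at a timestamp $t$ can be sketched as follows, assuming (as notation for this illustration only) that $L$ denotes the set of timestamps corresponding to {\thethings} and $\varepsilon_t$ the budget allocated at $t$:
\[
  \varepsilon_t = \frac{\varepsilon}{|L \cup \{t\}|} =
  \left\{
  \begin{array}{ll}
    \varepsilon / |L| & \mbox{if } t \in L, \\
    \varepsilon / (|L| + 1) & \mbox{if } t \notin L.
  \end{array}
  \right.
\]
For instance, with $\varepsilon = 1$ and $|L| = 3$, a {\thething} timestamp receives a budget of $1/3$, whereas a regular timestamp receives $1/4$.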
@ -24,7 +24,7 @@ Consequently, at each timestamp we protect every {\thething}, while reserving a

\paragraph{Skip}
% Why skipping publications is problematic?
One might argue that we could \emph{Skip} the \thething\ data releases.
% and limit the number of {\thethings}.
This would preserve all of the available privacy budget for regular events (because the set $L \cup \{t\}$ becomes $\{t\}$), which is equivalent to event-level protection.
In practice, however, this approach can eventually pose arbitrary privacy risks, especially when dealing with geotagged data.
@ -39,7 +39,7 @@ Particularly, sporadic location data publishing~\cite{gambs2010show, russell2018


\paragraph{Adaptive}
Next, we propose an \emph{Adaptive} privacy mechanism that takes into account changes in the input data and exploits the post-processing property of differential privacy.
Initially, it uniformly reserves the available privacy budget for each future release.
At each timestamp, based on a sampling rate, the mechanism either publishes the original data with noise or releases an approximation based on previous releases.
When it publishes the original data with noise, it also calculates the difference between the current and the previous release and compares it with the scale of the perturbation ($\frac{\Delta f}{\varepsilon}$).