problem: Structure
		@ -1,9 +1,11 @@
 | 
				
			|||||||
<<<<<<< HEAD
 | 
					 | 
				
			||||||
\chapter{Landmark privacy}
 | 
					 | 
				
			||||||
\label{ch:thething-prv}
 | 
					 | 
				
			||||||
=======
 | 
					 | 
				
			||||||
\chapter{Landmark Privacy}
 | 
					\chapter{Landmark Privacy}
 | 
				
			||||||
>>>>>>> b334e056b320357ce4f4eaa89a1be7f3576350cf
 | 
					\label{ch:lmdk-prv}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					In this chapter, we propose a novel configurable privacy scheme, \emph{\thething} privacy, which takes into account significant events (\emph{\thethings}) in the time series and allocates the available privacy budget accordingly.
 | 
				
			||||||
 | 
					We propose two privacy models that guarantee {\thething} privacy.
 | 
				
			||||||
 | 
					To further enhance our privacy method and protect the positions of the {\thethings} in the time series, we propose techniques to perturb the initial set of {\thethings} (Section~\ref{sec:theotherthing}).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
\input{problem/thething/main}
 | 
					\input{problem/thething/main}
 | 
				
			||||||
 | 
					\input{problem/theotherthing/main}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					\input{problem/summary}
 | 
				
			||||||
 | 
				
			|||||||
@ -1,5 +1,6 @@
 | 
				
			|||||||
\section{Summary}
 | 
					\section{Summary}
 | 
				
			||||||
\label{sec:lmdk-sum}
 | 
					\label{sec:lmdk-sum}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
In this chapter, we presented \emph{{\thething} privacy} for privacy-preserving time series publishing, which allows for the protection of significant events, while improving the utility of the final result w.r.t. the traditional user-level differential privacy.
 | 
					In this chapter, we presented \emph{{\thething} privacy} for privacy-preserving time series publishing, which allows for the protection of significant events, while improving the utility of the final result w.r.t. the traditional user-level differential privacy.
 | 
				
			||||||
We also proposed three models for  {\thething} privacy, and quantified the privacy loss under temporal correlation.
 | 
					We also proposed three models for  {\thething} privacy, and quantified the privacy loss under temporal correlation.
 | 
				
			||||||
%Our experiments on real and synthetic data sets validate our proposal. 
 | 
					%Our experiments on real and synthetic data sets validate our proposal. 
 | 
				
			||||||
@ -1,2 +1,2 @@
 | 
				
			|||||||
\subsection{Selection of events}
 | 
					\section{Selection of events}
 | 
				
			||||||
\label{subsec:theotherthing}
 | 
					\label{sec:theotherthing}
 | 
				
			||||||
 | 
				
			|||||||
@ -1,7 +1,6 @@
 | 
				
			|||||||
\section{Contribution}
 | 
					\subsection{Contribution}
 | 
				
			||||||
\label{sec:lmdk-contrib}
 | 
					\label{subsec:lmdk-contrib}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
In this chapter, we formally define a novel privacy notion that we call \emph{{\thething} privacy}.
 | 
					In this section, we formally define a novel privacy notion that we call \emph{{\thething} privacy}.
 | 
				
			||||||
We apply this privacy notion to time series consisting of \emph{{\thethings}} and regular events, and we design and implement three {\thething} privacy mechanisms.
 | 
					We apply this privacy notion to time series consisting of \emph{{\thethings}} and regular events, and we design and implement three {\thething} privacy mechanisms.
 | 
				
			||||||
We further study {\thething} privacy under temporal correlation that is inherent in time series publishing.
 | 
					We further study {\thething} privacy under temporal correlation that is inherent in time series publishing.
 | 
				
			||||||
Finally, we evaluate {\thething} privacy with real and synthetic data sets, in settings with or without temporal correlation, showcasing the validity of our model.
 | 
					 | 
				
			||||||
 | 
				
			|||||||
@ -1,24 +1,85 @@
 | 
				
			|||||||
%\section{Significant events}
 | 
					\section{Significant events}
 | 
				
			||||||
%\label{sec:thething}
 | 
					\label{sec:thething}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					% Crowdsensing applications
 | 
				
			||||||
 | 
					The plethora of sensors currently embedded in personal devices and other infrastructures have paved the way for the development of numerous \emph{crowdsensing services} (e.g.,~Ring~\cite{ring}, TousAntiCovid~\cite{tousanticovid}, Waze~\cite{waze}, etc.) based on the collected personal, and usually geotagged and timestamped data.
 | 
				
			||||||
 | 
					% Continuously user-generated data
 | 
				
			||||||
 | 
					User--service interactions gather personal event-like data, i.e.,~data items that pair an identifying attribute of an individual with the---possibly sensitive---information at a timestamp (including contextual information), e.g.,~(\emph{`Bob', `dining', `Canal Saint-Martin', $17{:}00$}).
 | 
				
			||||||
 | 
					When the interactions are performed in a continuous manner, we obtain \emph{time series} of events.
 | 
				
			||||||
 | 
					% Observation/interaction duration
 | 
				
			||||||
 | 
					Depending on the duration, we distinguish the interaction/observation into \emph{finite}, when taking place during a predefined time interval, and \emph{infinite}, when taking place in an uninterrupted fashion.
 | 
				
			||||||
 | 
					Example~\ref{ex:scenario} shows the result of user--LBS interaction while retrieving location-based information or reporting user-state at various locations.
 | 
				
			||||||
 | 
					 
 | 
				
			||||||
 | 
					\begin{example}
 | 
				
			||||||
 | 
					  \label{ex:scenario}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  Consider a finite sequence of spatiotemporal data generated by Bob during an interval of $8$ timestamps, as shown in Figure~\ref{fig:scenario}.
 | 
				
			||||||
 | 
					  Shaded events correspond to privacy-sensitive events that Bob has defined beforehand. For instance, his home is around {\'E}lys{\'e}e, his workplace is around the Louvre, and his hangout is around Canal Saint-Martin.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  \begin{figure}[htp]
 | 
				
			||||||
 | 
					    \centering
 | 
				
			||||||
 | 
					    \includegraphics[width=\linewidth]{lmdk-scenario}
 | 
				
			||||||
 | 
					    \caption{A time series with {\thethings} (highlighted in gray).
 | 
				
			||||||
 | 
					    }
 | 
				
			||||||
 | 
					    \label{fig:scenario}
 | 
				
			||||||
 | 
					  \end{figure}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					\end{example}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					% Privacy-preserving data processing
 | 
				
			||||||
 | 
					The services collect and further process the time series in order to give useful feedback to the involved users or to provide valuable insight to various internal/external analytical services.
 | 
				
			||||||
 | 
					The regulation regarding the processing of user-generated data sets~\cite{tankard2016gdpr} requires the provision of privacy guarantees to the users. 
 | 
				
			||||||
 | 
					At the same time, it is essential to provide utility metrics to the final consumers of the privacy-preserving process output. 
 | 
				
			||||||
 | 
					To accomplish this, various privacy techniques perturb the original data or the processing output at the expense of the overall utility of the final output.
 | 
				
			||||||
 | 
					A widely recognized tool that introduces probabilistic randomness to the original data, while quantifying with a parameter $\varepsilon$ (`privacy budget'~\cite{mcsherry2009privacy}) the privacy/utility ratio is \emph{$\varepsilon$-differential privacy}~\cite{dwork2006calibrating}.
 | 
				
			||||||
 | 
					Due to its \emph{composition} property, i.e.,~the combination of differentially private outputs satisfies differential privacy as well, differential privacy is suitable for privacy-preserving time series publishing.
 | 
				
			||||||
 | 
					\emph{Event}, \emph{user}~\cite{dwork2010differential, dwork2010pan}, and \emph{$w$-event}~\cite{kellaris2014differentially} comprise the possible levels of privacy protection.
 | 
				
			||||||
 | 
					Event-level limits the privacy protection to \emph{any single event}, user-level protects \emph{all the events} of any user, and $w$-event provides privacy protection to \emph{any sequence of $w$ events}.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The privacy mechanisms for the aforementioned levels assume that in a time series any single event, or any sequence of events, or the entire series of events is equally privacy-significant for the users.
 | 
				
			||||||
 | 
					In reality, this is a simplistic assumption.
 | 
				
			||||||
 | 
					The significance of an event is related to certain user-defined privacy criteria, or to its adjacent events, as well as to the entire time series.
 | 
				
			||||||
 | 
					We term significant events as \emph{{\thething} events} or simply \emph{\thethings}. 
 | 
				
			||||||
 | 
					Identifying {\thethings} can be done in an automatic or manual way (but is out of scope for this work).
 | 
				
			||||||
 | 
					For example, in spatiotemporal data, \emph{places where an individual spent some time} denote \emph{points of interest} (POIs), also called stay points~\cite{zheng2015trajectory}.
 | 
				
			||||||
 | 
					Such events, and more particularly their spatial attribute values, can be less privacy-sensitive~\cite{primault2018long}, e.g.,~parks, theaters, etc. or, if individuals frequent them, they can reveal supplementary information, e.g.,~residences (home addresses)~\cite{gambs2010show}, places of worship (religious beliefs)~\cite{franceschi-bicchierairussell2015redditor}, etc.
 | 
				
			||||||
 | 
					POIs can be an example of how we can choose {\thethings}, but the idea is not limited to these.
 | 
				
			||||||
 | 
					Another example is the detection of privacy-sensitive user interactions by \emph{contact tracing} applications.
 | 
				
			||||||
 | 
					This can be practical in disease control~\cite{eames2003contact}, as in the recent outbreak of the Coronavirus disease 2019 (COVID-19) epidemic~\cite{ahmed2020survey}.
 | 
				
			||||||
 | 
					Last but not least, {\thethings} in \emph{smart grid} electricity usage patterns could not only reveal the energy consumption of a user but also information regarding activities, e.g.,~`at work', `sleeping', etc. and types of appliances already installed or recently purchased~\cite{khurana2010smart}.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					\begin{example}
 | 
				
			||||||
 | 
					  \label{ex:st-cont}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  Figure~\ref{fig:st-cont} shows the case when we want to protect all of Bob's significant events ($p_1$, $p_3$, $p_5$, $p_8$) in his trajectory shown in Figure~\ref{fig:scenario}.
 | 
				
			||||||
 | 
					  % That is, we have to allocate privacy budget $\varepsilon$ such that at any timestamp $t$ it holds that $\varepsilon_t + \varepsilon_1 + \varepsilon_3 + \varepsilon_5 + \varepsilon_8 \leq \varepsilon$.
 | 
				
			||||||
 | 
					  In this scenario, event-level protection is not suitable since it can only protect one event at a time.
 | 
				
			||||||
 | 
					  Hence, we have to apply user-level privacy protection by distributing equal portions of $\varepsilon$ to all the events, i.e.,~$\frac{\varepsilon}{8}$ to each one (the equivalent of applying $8$-event privacy).
 | 
				
			||||||
 | 
					  In this way, we have protected the {\thething} points; we have allocated a total of $\frac{\varepsilon}{2}<\varepsilon$ to the {\thethings}. 
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  \begin{figure}[htp]
 | 
				
			||||||
 | 
					    \centering
 | 
				
			||||||
 | 
					    \includegraphics[width=\linewidth]{st-cont}
 | 
				
			||||||
 | 
					    \caption{User-level and {\thething} $\varepsilon$-differential privacy protection for the time series of Figure~\ref{fig:scenario}.}
 | 
				
			||||||
 | 
					    \label{fig:st-cont}
 | 
				
			||||||
 | 
					  \end{figure}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  However, allocating only $\frac{\varepsilon}{8}$ to each regular point deteriorates the data utility unnecessarily.
 | 
				
			||||||
 | 
					  Notice that the overall privacy budget that we ended up allocating to the user-defined significant events is equal to $\frac{\varepsilon}{2}$ and leaves an equal amount of budget to distribute to any current event.
 | 
				
			||||||
 | 
					  In other words, uniformly allocating $\frac{\varepsilon}{5}$ to every event would still achieve Bob's privacy goal, i.e.,~protect every significant event, while achieving better utility overall.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					\end{example}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					We argue that protecting only {\thething} events along with any regular event release is sufficient for the user's protection, while it improves data utility.
 | 
				
			||||||
 | 
					Considering {\thething} events can prevent over-perturbing the data, to the benefit of their final quality.
 | 
				
			||||||
 | 
					Take for example the scenario in Figure~\ref{fig:st-cont}, where {\thethings} are highlighted in gray.
 | 
				
			||||||
 | 
					If we want to protect the {\thething} points, we have to allocate at most a budget of $\varepsilon$ to the {\thethings}, while saving some for the release of regular events.
 | 
				
			||||||
 | 
					Essentially, the more budget we allocate to an event the less we protect it, but the better we preserve its utility.
 | 
				
			||||||
 | 
					With {\thething} privacy we propose to distribute the budget taking into account only the existence of the {\thethings} when we release an event of the time series, i.e.,~allocating $\frac{\varepsilon}{5}$ ($4\ \text{\thethings} + 1\ \text{regular point}$) to each event (see  Figure~\ref{fig:st-cont}).
 | 
				
			||||||
 | 
					This way, we still guarantee that the {\thethings} are  adequately protected, as they receive a total budget of $\frac{4\varepsilon}{5}<\varepsilon$. 
 | 
				
			||||||
 | 
					At the same time, we avoid over-perturbing the regular events, as we allocate to them a higher total budget ($\frac{4\varepsilon}{5}$) than under user-level privacy ($\frac{\varepsilon}{2}$), and thus add less noise.
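					
					As a rough worked check of this budget arithmetic, assume the $8$-event series of Figure~\ref{fig:scenario} with $4$ {\thethings} and $4$ regular events. The symbols $\varepsilon_{\mathrm{u}}$ and $\varepsilon_{\ell}$ are illustrative names for the per-event budget under user-level and {\thething} privacy, respectively; they are not notation defined elsewhere in this chapter.
					\[
					  \varepsilon_{\mathrm{u}} = \frac{\varepsilon}{8},
					  \qquad
					  \varepsilon_{\ell} = \frac{\varepsilon}{4 + 1} = \frac{\varepsilon}{5}
					\]
					\[
					  \underbrace{4\varepsilon_{\ell}}_{\text{\thethings}} + \underbrace{\varepsilon_{\ell}}_{\text{one regular event}} = \frac{4\varepsilon}{5} + \frac{\varepsilon}{5} = \varepsilon,
					  \qquad
					  4\varepsilon_{\mathrm{u}} = \frac{\varepsilon}{2} < 4\varepsilon_{\ell} = \frac{4\varepsilon}{5}
					\]
					Under these assumptions, the release of any single regular event together with all four {\thethings} stays within $\varepsilon$, while each event receives $\frac{8}{5}$ times the budget that it would get under user-level privacy.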
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
<<<<<<< HEAD
 | 
					 | 
				
			||||||
In this chapter, we propose a novel configurable privacy scheme, \emph{{\thething} privacy}, which takes into account significant events (\emph{\thethings}) in the time series and allocates the available privacy budget accordingly.
 | 
					 | 
				
			||||||
We propose three privacy models that guarantee {\thething} privacy and validate our proposal on real and synthetic data sets.
 | 
					 | 
				
			||||||
\kat{Now, you have space so you need to be more detailed in the discussions, the motivation, the examples etc.}
 | 
					 | 
				
			||||||
\input{problem/thething/motivation}
 | 
					 | 
				
			||||||
\input{problem/thething/contribution}
 | 
					\input{problem/thething/contribution}
 | 
				
			||||||
\input{problem/thething/problem}
 | 
					\input{problem/thething/problem}
 | 
				
			||||||
\input{problem/thething/solution}
 | 
					\input{problem/thething/solution}
 | 
				
			||||||
=======
 | 
					 | 
				
			||||||
In this chapter, we propose a novel configurable privacy scheme, \emph{\thething} privacy, which takes into account significant events (\emph{\thethings}) in the time series and allocates the available privacy budget accordingly.
 | 
					 | 
				
			||||||
We propose two privacy models that guarantee {\thething} privacy.
 | 
					 | 
				
			||||||
To further enhance our privacy method and protect the positions of the landmarks in the time series, we propose techniques to perturb the initial set of landmarks (Section~\ref{sec:theotherthing}).
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
% and validate our proposal on real and synthetic data sets. \kat{this will go in the experiments section}
 | 
					 | 
				
			||||||
 
 | 
					 | 
				
			||||||
\input{problem/thething/motivation}
 | 
					 | 
				
			||||||
\input{problem/thething/contribution}
 | 
					 | 
				
			||||||
\input{problem/thething/problem}
 | 
					 | 
				
			||||||
\input{problem/theotherthing/main}
 | 
					 | 
				
			||||||
>>>>>>> b334e056b320357ce4f4eaa89a1be7f3576350cf
 | 
					 | 
				
			||||||
\input{problem/thething/summary}
 | 
					 | 
				
			||||||
 | 
				
			|||||||
@ -1,80 +0,0 @@
 | 
				
			|||||||
\section{Motivation}
 | 
					 | 
				
			||||||
\label{sec:lmdk-motiv}
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
% Crowdsensing applications
 | 
					 | 
				
			||||||
The plethora of sensors currently embedded in personal devices and other infrastructures have paved the way for the development of numerous \emph{crowdsensing services} (e.g.,~Ring~\cite{ring}, TousAntiCovid~\cite{tousanticovid}, Waze~\cite{waze}, etc.) based on the collected personal, and usually geotagged and timestamped data.
 | 
					 | 
				
			||||||
% Continuously user-generated data
 | 
					 | 
				
			||||||
User--service interactions gather personal event-like data, i.e.,~data items that pair an identifying attribute of an individual with the---possibly sensitive---information at a timestamp (including contextual information), e.g.,~(\emph{`Bob', `dining', `Canal Saint-Martin', $17{:}00$}).
 | 
					 | 
				
			||||||
When the interactions are performed in a continuous manner, we obtain \emph{time series} of events.
 | 
					 | 
				
			||||||
% Observation/interaction duration
 | 
					 | 
				
			||||||
Depending on the duration, we distinguish the interaction/observation into \emph{finite}, when taking place during a predefined time interval, and \emph{infinite}, when taking place in an uninterrupted fashion.
 | 
					 | 
				
			||||||
Example~\ref{ex:scenario} shows the result of user--LBS interaction while retrieving location-based information or reporting user-state at various locations.
 | 
					 | 
				
			||||||
 
 | 
					 | 
				
			||||||
\begin{example}
 | 
					 | 
				
			||||||
  \label{ex:scenario}
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  Consider a finite sequence of spatiotemporal data generated by Bob during an interval of $8$ timestamps, as shown in Figure~\ref{fig:scenario}.
 | 
					 | 
				
			||||||
  Shaded events correspond to privacy-sensitive events that Bob has defined beforehand. For instance, his home is around {\'E}lys{\'e}e, his workplace is around the Louvre, and his hangout is around Canal Saint-Martin.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  \begin{figure}[htp]
 | 
					 | 
				
			||||||
    \centering
 | 
					 | 
				
			||||||
    \includegraphics[width=\linewidth]{lmdk-scenario}
 | 
					 | 
				
			||||||
    \caption{A time series with {\thethings} (highlighted in gray).
 | 
					 | 
				
			||||||
    }
 | 
					 | 
				
			||||||
    \label{fig:scenario}
 | 
					 | 
				
			||||||
  \end{figure}
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
\end{example}
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
% Privacy-preserving data processing
 | 
					 | 
				
			||||||
The services collect and further process the time series in order to give useful feedback to the involved users or to provide valuable insight to various internal/external analytical services.
 | 
					 | 
				
			||||||
The regulation regarding the processing of user-generated data sets~\cite{tankard2016gdpr} requires the provision of privacy guarantees to the users. 
 | 
					 | 
				
			||||||
At the same time, it is essential to provide utility metrics to the final consumers of the privacy-preserving process output. 
 | 
					 | 
				
			||||||
To accomplish this, various privacy techniques perturb the original data or the processing output at the expense of the overall utility of the final output.
 | 
					 | 
				
			||||||
A widely recognized tool that introduces probabilistic randomness to the original data, while quantifying with a parameter $\varepsilon$ (`privacy budget'~\cite{mcsherry2009privacy}) the privacy/utility ratio is \emph{$\varepsilon$-differential privacy}~\cite{dwork2006calibrating}.
 | 
					 | 
				
			||||||
Due to its \emph{composition} property, i.e.,~the combination of differentially private outputs satisfies differential privacy as well, differential privacy is suitable for privacy-preserving time series publishing.
 | 
					 | 
				
			||||||
\emph{Event}, \emph{user}~\cite{dwork2010differential, dwork2010pan}, and \emph{$w$-event}~\cite{kellaris2014differentially} comprise the possible levels of privacy protection.
 | 
					 | 
				
			||||||
Event-level limits the privacy protection to \emph{any single event}, user-level protects \emph{all the events} of any user, and $w$-event provides privacy protection to \emph{any sequence of $w$ events}.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
The privacy mechanisms for the aforementioned levels assume that in a time series any single event, or any sequence of events, or the entire series of events is equally privacy-significant for the users.
 | 
					 | 
				
			||||||
In reality, this is a simplistic assumption.
 | 
					 | 
				
			||||||
The significance of an event is related to certain user-defined privacy criteria, or to its adjacent events, as well as to the entire time series.
 | 
					 | 
				
			||||||
We term significant events as \emph{{\thething} events} or simply \emph{\thethings}. 
 | 
					 | 
				
			||||||
Identifying {\thethings} can be done in an automatic or manual way (but is out of scope for this work).
 | 
					 | 
				
			||||||
For example, in spatiotemporal data, \emph{places where an individual spent some time} denote \emph{points of interest} (POIs), also called stay points~\cite{zheng2015trajectory}.
 | 
					 | 
				
			||||||
Such events, and more particularly their spatial attribute values, can be less privacy-sensitive~\cite{primault2018long}, e.g.,~parks, theaters, etc. or, if individuals frequent them, they can reveal supplementary information, e.g.,~residences (home addresses)~\cite{gambs2010show}, places of worship (religious beliefs)~\cite{franceschi-bicchierairussell2015redditor}, etc.
 | 
					 | 
				
			||||||
POIs can be an example of how we can choose {\thethings}, but the idea is not limited to these.
 | 
					 | 
				
			||||||
Another example is the detection of privacy-sensitive user interactions by \emph{contact tracing} applications.
 | 
					 | 
				
			||||||
This can be practical in disease control~\cite{eames2003contact}, as in the recent outbreak of the Coronavirus disease 2019 (COVID-19) epidemic~\cite{ahmed2020survey}.
 | 
					 | 
				
			||||||
Last but not least, {\thethings} in \emph{smart grid} electricity usage patterns could not only reveal the energy consumption of a user but also information regarding activities, e.g.,~`at work', `sleeping', etc. and types of appliances already installed or recently purchased~\cite{khurana2010smart}.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
\begin{example}
 | 
					 | 
				
			||||||
  \label{ex:st-cont}
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  Figure~\ref{fig:st-cont} shows the case when we want to protect all of Bob's significant events ($p_1$, $p_3$, $p_5$, $p_8$) in his trajectory shown in Figure~\ref{fig:scenario}.
 | 
					 | 
				
			||||||
  % That is, we have to allocate privacy budget $\varepsilon$ such that at any timestamp $t$ it holds that $\varepsilon_t + \varepsilon_1 + \varepsilon_3 + \varepsilon_5 + \varepsilon_8 \leq \varepsilon$.
 | 
					 | 
				
			||||||
  In this scenario, event-level protection is not suitable since it can only protect one event at a time.
 | 
					 | 
				
			||||||
  Hence, we have to apply user-level privacy protection by distributing equal portions of $\varepsilon$ to all the events, i.e.,~$\frac{\varepsilon}{8}$ to each one (the equivalent of applying $8$-event privacy).
 | 
					 | 
				
			||||||
  In this way, we have protected the {\thething} points; we have allocated a total of $\frac{\varepsilon}{2}<\varepsilon$ to the {\thethings}. 
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  \begin{figure}[htp]
 | 
					 | 
				
			||||||
    \centering
 | 
					 | 
				
			||||||
    \includegraphics[width=\linewidth]{st-cont}
 | 
					 | 
				
			||||||
    \caption{User-level and {\thething} $\varepsilon$-differential privacy protection for the time series of Figure~\ref{fig:scenario}.}
 | 
					 | 
				
			||||||
    \label{fig:st-cont}
 | 
					 | 
				
			||||||
  \end{figure}
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  However, allocating only $\frac{\varepsilon}{8}$ to each regular point deteriorates the data utility unnecessarily.
 | 
					 | 
				
			||||||
  Notice that the overall privacy budget that we ended up allocating to the user-defined significant events is equal to $\frac{\varepsilon}{2}$ and leaves an equal amount of budget to distribute to any current event.
 | 
					 | 
				
			||||||
  In other words, uniformly allocating $\frac{\varepsilon}{5}$ to every event would still achieve Bob's privacy goal, i.e.,~protect every significant event, while achieving better utility overall.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
\end{example}
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
We argue that protecting only {\thething} events along with any regular event release is sufficient for the user's protection, while it improves data utility.
 | 
					 | 
				
			||||||
Considering {\thething} events can prevent over-perturbing the data, to the benefit of their final quality.
 | 
					 | 
				
			||||||
Take for example the scenario in Figure~\ref{fig:st-cont}, where {\thethings} are highlighted in gray.
 | 
					 | 
				
			||||||
If we want to protect the {\thething} points, we have to allocate at most a budget of $\varepsilon$ to the {\thethings}, while saving some for the release of regular events.
 | 
					 | 
				
			||||||
Essentially, the more budget we allocate to an event the less we protect it, but the better we preserve its utility.
 | 
					 | 
				
			||||||
With {\thething} privacy we propose to distribute the budget taking into account only the existence of the {\thethings} when we release an event of the time series, i.e.,~allocating $\frac{\varepsilon}{5}$ ($4\ \text{\thethings} + 1\ \text{regular point}$) to each event (see  Figure~\ref{fig:st-cont}).
 | 
					 | 
				
			||||||
This way, we still guarantee that the {\thethings} are  adequately protected, as they receive a total budget of $\frac{4\varepsilon}{5}<\varepsilon$. 
 | 
					 | 
				
			||||||
At the same time, we avoid over-perturbing the regular events, as we allocate to them a higher total budget ($\frac{4\varepsilon}{5}$) than under user-level privacy ($\frac{\varepsilon}{2}$), and thus add less noise.
 | 
					 | 
				
			||||||
@ -1,11 +1,8 @@
 | 
				
			|||||||
<<<<<<< HEAD
 | 
					\subsection{Problem definition}
 | 
				
			||||||
\subsection{Problem description and definition}
 | 
					 | 
				
			||||||
\label{subsec:lmdk-prob}
 | 
					\label{subsec:lmdk-prob}
 | 
				
			||||||
=======
 | 
					 | 
				
			||||||
\section{{\Thething} privacy}
 | 
					 | 
				
			||||||
\label{sec:lmdk-prob}
 | 
					 | 
				
			||||||
>>>>>>> b334e056b320357ce4f4eaa89a1be7f3576350cf
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					\subsubsection{Setting}
 | 
				
			||||||
 | 
					\label{subsec:lmdk-set}
 | 
				
			||||||
Our problem setting consists of three entities: (i) data generators (users), (ii) data publishers (trusted non-adversarial entities), and (iii) data consumers (possibly adversarial entities). 
 | 
					Our problem setting consists of three entities: (i) data generators (users), (ii) data publishers (trusted non-adversarial entities), and (iii) data consumers (possibly adversarial entities). 
 | 
				
			||||||
Users generate sensitive data, which are processed in a secure and private way by a trusted curator and are later published in order to be consumed by potentially adversarial data analysts. 
 | 
					Users generate sensitive data, which are processed in a secure and private way by a trusted curator and are later published in order to be consumed by potentially adversarial data analysts. 
 | 
				
			||||||
%The data unit produced by the users is an \emph{event}, i.e., a piece of timestamped user-related information.\kat{should we say geo-stamped?}. 
 | 
					%The data unit produced by the users is an \emph{event}, i.e., a piece of timestamped user-related information.\kat{should we say geo-stamped?}. 
 | 
				
			||||||
@ -33,7 +30,7 @@ Notice that, in a real life scenario, $E_g$ and $E_c$ might overlap with each ot
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
\subsubsection{Privacy goal}
 | 
					\subsubsection{Privacy goal}
 | 
				
			||||||
\label{subsec:prv-g}
 | 
					\label{subsec:lmdk-goal}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
We argue that in continuous user-generated data publishing, events are not equally `significant' in terms of privacy.
 | 
					We argue that in continuous user-generated data publishing, events are not equally `significant' in terms of privacy.
 | 
				
			||||||
% We term a significant event---according to user- or data-related criteria---as a \emph{\thething}~event.
 | 
					% We term a significant event---according to user- or data-related criteria---as a \emph{\thething}~event.
 | 
				
			||||||
 | 
				
			|||||||
@ -1,7 +1,6 @@
 | 
				
			|||||||
\subsection{Achieving {\thething} privacy}
 | 
					\subsection{Achieving {\thething} privacy}
 | 
				
			||||||
\label{subsec:lmdk-sol}
 | 
					\label{subsec:lmdk-sol}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					 | 
				
			||||||
\subsubsection{{\Thething} privacy mechanisms}
 | 
					\subsubsection{{\Thething} privacy mechanisms}
 | 
				
			||||||
\label{subsec:lmdk-mechs}
 | 
					\label{subsec:lmdk-mechs}
 | 
				
			||||||
% \kat{add the two models -- uniform and dynamic  and skip}
 | 
					% \kat{add the two models -- uniform and dynamic  and skip}
 | 
				
			||||||
@ -132,6 +131,7 @@ to the next timestamps.
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
\subsubsection{{\Thething} privacy under temporal correlation}
 | 
					\subsubsection{{\Thething} privacy under temporal correlation}
 | 
				
			||||||
\label{subsec:lmdk-cor}
 | 
					\label{subsec:lmdk-cor}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
From the discussion so far, it is evident that for the budget distribution it is not the positions but rather the number of the {\thethings} that matters.
 | 
					From the discussion so far, it is evident that for the budget distribution it is not the positions but rather the number of the {\thethings} that matters.
 | 
				
			||||||
However, this is not the case under the presence of temporal correlation, which is inherent in continuously generated data.
 | 
					However, this is not the case under the presence of temporal correlation, which is inherent in continuously generated data.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
				
			|||||||