\subsection{Problem definition}
\label{subsec:lmdk-prob}

\subsubsection{Setting}
\label{subsec:lmdk-set}
Our problem setting consists of three entities: (i)~data generators (users), (ii)~data publishers (trusted non-adversarial entities), and (iii)~data consumers (possibly adversarial entities).
Users generate sensitive data, which are processed in a secure and private way by a trusted curator and are later published in order to be consumed by potentially adversarial data analysts.
%The data unit produced by the users is an \emph{event}, i.e., a piece of timestamped user-related information.\kat{should we say geo-stamped?}.
Data are produced as a series of events, which we call a time series.
An \emph{event} is defined as a triple consisting of an identifying attribute of an individual, the possibly sensitive data, and a timestamp.
%This workflow is repeated in a continuous manner, producing series of events, which we call time series.
%, producing, processing, publishing, and consuming events in a private manner.
%\kat{keep only the terms with a small description.}
\begin{enumerate}[(i)]

 \item \textbf{Data generators} (users) entity $E_g$ interacts with a crowdsensing application and continuously produces privacy-sensitive data items at an arbitrary frequency during the application's usage period $T = (t)_{t \in \mathbb{N}}$.
 Thus, at each timestamp $t$, $E_g$ generates a data set $D_t \in \mathcal{D}$ where each of its members contributes a single data item.

 \item \textbf{Data publishers} (trusted non-adversarial) entity $E_p$ receives the data sent by $E_g$ in the form of a series of events in $T$.
 Following the \emph{global} processing and publishing scheme, $E_p$ collects at $t$ a data set $D_t$ and privacy-protects it by applying the respective privacy mechanism $\mathcal{M}_t$.
 $\mathcal{M}_t$ uses independent randomness such that it satisfies $\varepsilon_t$-differential privacy.

 \item \textbf{Data consumers} (possibly adversarial) entity $E_c$ receives the result $\mathbf{o}_t$ of the privacy-preserving processing of $D_t$ by $E_p$.
 According to Theorem~\ref{theor:compo-seq-ind}, the overall privacy guarantee of the outputs of $\mathcal{M}$ is equal to the sum of all the privacy budgets of the respective privacy mechanisms that compose $\mathcal{M}$, i.e.,~$\sum_{t \in T}\varepsilon_t$.

\end{enumerate}
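
As a minimal numeric illustration of this composition (the values are hypothetical and chosen only for the example), suppose that the usage period spans three timestamps and that $E_p$ applies mechanisms with budgets $\varepsilon_1 = 0.1$, $\varepsilon_2 = 0.2$, and $\varepsilon_3 = 0.3$.
The overall privacy guarantee of the released outputs is then
$$\sum_{t \in T} \varepsilon_t = 0.1 + 0.2 + 0.3 = 0.6$$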

We assume that all the interactions between $E_g$ and $E_p$ are secure and private, and thus $E_p$ is considered trusted and non-adversarial by $E_g$.
Notice that, in a real-life scenario, $E_g$ and $E_c$ might overlap, i.e.,~data generators might be data consumers as well.


\subsubsection{Privacy goal}
\label{subsec:lmdk-goal}

We argue that in continuous user-generated data publishing, events are not equally `significant' in terms of privacy.
% We term a significant event---according to user- or data-related criteria---as a \emph{\thething}~event.
The identification of {\thething} events can be performed manually or automatically~\cite{zhou2004discovering, hariharan2004project}, and is orthogonal to this work.
In this work, we consider the {\thething} timestamps to be non-sensitive and provided by the user as input, along with the privacy budget $\varepsilon$.
For example, events $p_1$, $p_3$, $p_5$, $p_8$ in Figure~\ref{fig:scenario} are {\thething} events.
We give the definition of {\thethings} below (Definition~\ref{def:thething-evnt}).

% A significant event or item signals its consequence to us, toward us.
% https://www.quora.com/What-is-the-difference-between-significant-and-important
\begin{definition}
  [{\Thething} event]
  \label{def:thething-evnt}
  A {\thething} event is a significant---according to user- or data-related criteria---user-generated data item.
\end{definition}

Definition~\ref{def:thething-nb} extends the notion of neighboring data sets to the context of {\thethings}.

\begin{definition}
  [{\Thething} neighboring time series]
  \label{def:thething-nb}
  Two time series of equal lengths are \emph{{\thething} neighboring} when they differ by a single {\thething} event.
\end{definition}

For example, the time series ($p_1$, \dots, $p_8$) with the {\thethings} set \{$p_1$, $p_3$, $p_5$\} is {\thething} neighboring to the time series of Figure~\ref{fig:scenario}.
Therefore, Corollary~\ref{cor:thething-nb} follows.

\begin{corollary}
  \label{cor:thething-nb}
  Two {\thething} neighboring time series are event neighboring as well.
\end{corollary}

We proceed to propose \emph{{\thething} privacy}, a configurable variation of differential privacy for time series (Definition~\ref{def:thething-prv}).

\begin{definition}
  [{\Thething} privacy]
  \label{def:thething-prv}
  Let $\mathcal{M}$ be a privacy mechanism with range $\mathcal{O}$ that takes as input a time series.
  We say that $\mathcal{M}$ satisfies {\thething} $\varepsilon$-differential privacy (or, simply, {\thething} privacy) if for all sets of possible outputs $O \subseteq \mathcal{O}$, and for every pair of {\thething}-neighboring time series $S_T$, $S_T'$,
  % and all $T = (t)_{t \in \mathbb{N}}$,
  it holds that
  $$Pr[\mathcal{M}(S_T) \in O] \leq e^\varepsilon Pr[\mathcal{M}(S_T') \in O]$$
\end{definition}

User-level privacy can achieve {\thething} privacy, but it over-perturbs the final data because it does not distinguish between {\thething} and regular events.
Theorem~\ref{theor:thething-prv} states how to achieve the desired privacy for the {\thethings} (i.e.,~a total budget of at most $\varepsilon$), and at the same time provide better overall data quality.

\begin{theorem}
  [{\Thething} privacy]
  \label{theor:thething-prv}
  Let $\mathcal{M}$ be a mechanism with input a time series $S_T$, where $T$ is the set of the involved timestamps, and $L \subseteq T$ be the set of {\thething} timestamps.
  $\mathcal{M}$ is decomposed into $\varepsilon_t$-differentially private sub-mechanisms $\mathcal{M}_t$, for every $t \in T$, that apply independent randomness to the data item at $t$.
  Then, given a privacy budget $\varepsilon$, $\mathcal{M}$ satisfies {\thething} privacy if for every $t \in T$ it holds that
  $$ \sum_{i\in L \cup \{t\}} \varepsilon_i \leq \varepsilon$$
\end{theorem}

\begin{proof}
  \label{pf:thething-prv}
  All mechanisms use independent randomness, and therefore for a time series $S_T = \{D_1, \dots, D_T\}$ and outputs $(\pmb{o}_1, \dots, \pmb{o}_T) \in O \subseteq \mathcal{O}$ it holds that

  $$Pr[\mathcal{M}(S_T) = (\pmb{o}_1, \dots, \pmb{o}_T)] = \prod_{i \in [1, T]} Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]$$

  Likewise, for any {\thething}-neighboring time series $S'_T$ of $S_T$ with the same outputs $(\pmb{o}_1, \dots, \pmb{o}_T) \in O \subseteq \mathcal{O}$ it holds that

  $$Pr[\mathcal{M}(S'_T) = (\pmb{o}_1, \dots, \pmb{o}_T)] = \prod_{i \in [1, T]} Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]$$

  Since $S_T$ and $S'_T$ are {\thething}-neighboring, there exists $t \in T$ such that $D_i = D'_i$ for every $i \notin L \cup \{t\}$, where $L$ is the set of {\thething} timestamps; the corresponding factors cancel in the ratio of the two products.
  Thus, we get

  $$\frac{Pr[\mathcal{M}(S_T) = (\pmb{o}_1, \dots, \pmb{o}_T)]}{Pr[\mathcal{M}(S'_T) = (\pmb{o}_1, \dots, \pmb{o}_T)]} = \prod_{i \in L \cup \{t\}} \frac{Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]}{Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]}$$

  $D_i$ and $D'_i$ are neighboring for $i \in L \cup \{t\}$.
  $\mathcal{M}_i$ is $\varepsilon_i$-differentially private and from Definition~\ref{def:dp} we get that $\frac{Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]}{Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]} \leq e^{\varepsilon_i}$.
  Hence, we can write

  $$\frac{Pr[\mathcal{M}(S_T) = (\pmb{o}_1, \dots, \pmb{o}_T)]}{Pr[\mathcal{M}(S'_T) = (\pmb{o}_1, \dots, \pmb{o}_T)]} \leq \prod_{i \in L \cup \{t\}} e^{\varepsilon_i} = e^{\sum_{i \in L \cup \{t\}} \varepsilon_i}$$

  For any $O \subseteq \mathcal{O}$ we get $\frac{Pr[\mathcal{M}(S_T) \in O]}{Pr[\mathcal{M}(S'_T) \in O]} \leq e^{\sum_{i \in L \cup \{t\}} \varepsilon_i}$.
  If the formula of Theorem~\ref{theor:thething-prv} holds, then we get $\frac{Pr[\mathcal{M}(S_T) \in O]}{Pr[\mathcal{M}(S'_T) \in O]} \leq e^\varepsilon$.
  Due to Definition~\ref{def:thething-prv} this concludes our proof.
\end{proof}
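
As an illustration of the condition in Theorem~\ref{theor:thething-prv}, consider the following sketch; the uniform allocation, the value $\varepsilon = 1$, and the mapping of events to timestamps are assumptions made only for this example and are not prescribed by the theorem.
Assume eight timestamps, $T = \{1, \dots, 8\}$, with the {\thething} events $p_1$, $p_3$, $p_5$, $p_8$ occurring at the timestamps $L = \{1, 3, 5, 8\}$, and assign the same budget $\varepsilon_i = \varepsilon / (|L| + 1) = 0.2$ to every $i \in T$.
If $t \in L$, then $L \cup \{t\} = L$ and the sum has $|L| = 4$ terms; otherwise it has $|L| + 1 = 5$ terms.
In both cases the condition holds, since
$$\sum_{i \in L \cup \{t\}} \varepsilon_i \leq 5 \cdot 0.2 = 1 = \varepsilon$$
Under the same assumptions, distributing $\varepsilon$ uniformly over all the timestamps, e.g.,~to protect every event at user level, would leave only $\varepsilon / |T| = 0.125$ per timestamp.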