the-last-thing/text/problem/thething/problem.tex

\subsection{Problem definition}
\label{subsec:lmdk-prob}
In this section, we introduce a new privacy definition.


\subsubsection{Setting}
\label{subsec:lmdk-set}
Our problem setting consists of three entities: (i)~data generators (users), (ii)~data publishers (trusted non-adversarial entities), and (iii)~data consumers (possibly adversarial entities). 
Users generate a finite series of sensitive data over time, which are processed in batch mode in a secure and private way locally (or by a trusted curator) and are later published in order to be consumed by potentially adversarial data analysts. 
Data are produced as a series of events, which we call time series. 

% Users generate sensitive data, which are processed in a secure and private way by a trusted curator and are later published in order to be consumed by potentially adversarial data analysts. 
%The data unit produced by the users is an \emph{event}, i.e., a piece of timestamped user-related information.\kat{should we say geo-stamped?}. 
% Data are produced as a series of events, which we call time series. 
% An \emph{event} is defined as a triple of an identifying attribute of an individual and the possibly sensitive data at a timestamp.
%This workflow is repeated in a continuous manner, producing series of events, which we call time series. 
%, producing, processing, publishing, and consuming events in a private manner. 
\begin{enumerate}[(i)]

 \item \textbf{Data generators} (users) entity $E_g$ interacts with a crowdsensing application and produces continuously privacy-sensitive data items in an arbitrary frequency during the application's usage period $T = (i)_{i \in \mathbb{N}}$.
 Thus, at each timestamp $t$, $E_g$ generates a data set $D_i \in \mathcal{D}$ where each of its members contributes a single data item.

 \item \textbf{Data publishers} (trusted non-adversarial) entity $E_p$ receives the data sent by $E_g$ in the form of a series of events in $T$.
 Following the \emph{global} processing and publishing scheme, $E_p$ collects at $t$ a data set $D_i$ and privacy-protects it by applying the respective privacy mechanism $\mathcal{M}_i$.
 $\mathcal{M}_i$ uses independent randomness such that it satisfies $\varepsilon_i$-differential privacy.

 \item \textbf{Data consumers} (possibly adversarial) entity $E_c$ receives the result $\mathbf{o}_i$ of the privacy-preserving processing of $D_i$ by $E_p$.
 According to Theorem~\ref{theor:compo-seq-ind}, the overall privacy guarantee of the outputs of $\mathcal{M}$ is equal to the sum of all the privacy budgets of the respective privacy mechanisms that compose $\mathcal{M}$, i.e.,~$\sum_{i \in T}\varepsilon_i$.

\end{enumerate}

We assume that all the interactions between $E_g$ and $E_p$ are secure and private, and thus $E_p$ is considered trusted and non-adversarial by $E_g$.
Notice that, in a real life scenario, $E_g$ and $E_c$ might overlap with each other, i.e.,~data producers might be data consumers as well.


\subsubsection{Privacy goal}
\label{subsec:lmdk-goal}
We argue that in continuous user-generated data publishing, events are not equally significant in terms of privacy.
We term a significant event---according to user- or data-related criteria---as a \emph{\thething}~event.
The identification of {\thething} events can be performed manually or automatically, and is an orthogonal problem to ours.
% and we address it subsequently in Section~\ref{subsec:lmdk-sel-sol}.
First, we consider the {\thething} timestamps, i.e.,~their position in time, non-sensitive and provided by the user as input along with the privacy budget $\varepsilon$.
For example, events $p_1$, $p_3$, $p_5$, $p_8$ in Figure~\ref{fig:lmdk-scenario} are {\thething} events.
In Definition~\ref{def:thething-evnt}, we formally introduce {\thethings} in the context of privacy-preserving data publishing.

% A significant event or item signals its consequence to us, toward us.
% https://www.quora.com/What-is-the-difference-between-significant-and-important
\begin{definition}
  [{\Thething} event]
  \label{def:thething-evnt}
  A {\thething} event is a significant---according to user- or data-related criteria---user-generated data item.
\end{definition}

Definition~\ref{def:thething-nb} extends the notion of neighboring data sets (see Section~\ref{subsec:prv-statistical}) to the context of {\thethings}. 

\begin{definition}
  [{\Thething} neighboring time series]
  \label{def:thething-nb}
  Two time series of the same length, with common starting and ending timestamps, are \emph{{\thething} neighboring} when their elements are pairwise, i.e.,~at the same timestamps, equal or neighboring and their neighboring elements are on common {\thethings} and/or at most on one regular event.
\end{definition}

% For example, the time series ($p_1$, \dots, $p_8$) with {\thethings} set the \{$p_1$, $p_3$, $p_5$\} is {\thething} neighboring to the time series of Figure~\ref{fig:lmdk-scenario}.
% Therefore, Corollary~\ref{cor:thething-nb} follows.

% \begin{corollary}
%   \label{cor:thething-nb}
%   Two {\thething} neighboring time series are event neighboring as well.
% \end{corollary}

In Definition~\ref{def:thething-prv}, we proceed to propose \emph{{\thething} privacy}, a configurable variation of differential privacy for time series with  significant events.

\begin{definition}
  [{\Thething} privacy]
  \label{def:thething-prv}
  Let $\mathcal{M}$ be a privacy mechanism with range $\mathcal{O}$ and domain $\mathcal{S}_T$ being the set of all  time series with length $|T|$, where $T$ is a sequence of timestamps.
  $\mathcal{M}$ satisfies {\thething} $\varepsilon$-differential privacy (or, simply, {\thething} privacy) if for all sets $O \subseteq \mathcal{O}$, and for every pair of {\thething}-neighboring time series $S_T$, $S_T'$, it holds that
  $$Pr[\mathcal{M}(S_T) \in O] \leq e^\varepsilon Pr[\mathcal{M}(S_T') \in O]$$
\end{definition}

User-level privacy can achieve {\thething} privacy, but it over-perturbs the final data by not distinguishing between {\thething} and regular events.
Theorem~\ref{theor:thething-prv} states how to achieve the desired privacy goal for the {\thethings} and any event, i.e.,~a total budget less than $\varepsilon$, and at the same time provide better utility overall. 

\begin{theorem}
  [{\Thething} privacy]
  \label{theor:thething-prv}
  Let $\mathcal{M}$ be a mechanism with input a time series $S_T$, where $T$ is the set of the  involved timestamps, and $L \subseteq T$ be the set of {\thething} timestamps.
  $\mathcal{M}$ is decomposed to $\varepsilon$-differential private sub-mechanisms $\mathcal{M}_t$, for every $t \in T$, which apply independent randomness to the event at $t$.
  Then, given a privacy budget $\varepsilon$, $\mathcal{M}$ satisfies $(\varepsilon, L)$-\emph{{\thething} privacy} if for any $t$ it holds that
  $$ \sum_{i\in L \cup \{t\}} \varepsilon_i \leq \varepsilon$$
\end{theorem}

\begin{proof}
  \label{pf:thething-prv}
  All mechanisms use independent randomness, and therefore for a time series $S_T = (D_i)_{i \in T}$ and outputs $(\pmb{o}_i)_{i \in T} \in O \subseteq \mathcal{O}$ it holds that

  $$Pr[\mathcal{M}(S_T) = (\pmb{o}_i)_{i \in T}] = \prod_{i \in T} Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]$$

  Likewise, for any {\thething}-neighboring time series $S'_T$ of $S_T$ with the same outputs $(\pmb{o}_i)_{i \in T} \in O \subseteq \mathcal{O}$

  $$Pr[\mathcal{M}(S'_T) = (\pmb{o}_i)_{i \in T}] = \prod_{i \in T} Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]$$

  According to Definition~\ref{def:thething-nb}, there exists $L \cup \{t\} \subseteq T$ such that $D_i = D'_i$ for $i \in L \cup \{t\}$.
  Thus, we get

  $$\frac{Pr[\mathcal{M}(S_T) = (\pmb{o}_i)_{i \in T}]}{Pr[\mathcal{M}(S'_T) = (\pmb{o}_i)_{i \in T}]} = \prod_{i \in L \cup \{t\}} \frac{Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]}{Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]}$$

  $D_i$ and $D'_i$ are neighboring for $i \in L \cup \{t\}$.
  $\mathcal{M}_i$ is differential private and from Definition~\ref{def:dp} we get that $\frac{Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]}{Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]} \leq e^{\varepsilon_i}$.
  Hence, we can write

  $$\frac{Pr[\mathcal{M}(S_T) = (\pmb{o}_i)_{i \in T}]}{Pr[\mathcal{M}(S'_T) = (\pmb{o}_i)_{i \in T}]} \leq \prod_{i \in L \cup \{t\}} e^{\varepsilon_i} = e^{\sum_{i \in L \cup \{t\}} \varepsilon_i}$$

  For any $O \in \mathcal{O}$ we get $\frac{Pr[\mathcal{M}(S_T) \in O]}{Pr[\mathcal{M}(S'_T) \in O]} \leq e^{\sum_{i \in L \cup \{t\}} \varepsilon_i}$.
  If the formula of Theorem~\ref{theor:thething-prv} holds, then we get $\frac{Pr[\mathcal{M}(S_T) \in O]}{Pr[\mathcal{M}(S'_T) \in O]} \leq e^\varepsilon$.
  Due to Definition~\ref{def:thething-prv} this concludes our proof.
\end{proof}
problem: Structure 2021-10-08 21:32:06 +02:00			`\subsection{Problem definition}`
Chapters and sections 2021-09-07 16:06:42 +02:00			`\label{subsec:lmdk-prob}`
problem: Review thething 2021-10-25 01:27:48 +02:00			`In this section, we introduce a new privacy definition.`

the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00
problem: Structure 2021-10-08 21:32:06 +02:00			`\subsubsection{Setting}`
			`\label{subsec:lmdk-set}`
thething: Minor corrections 2021-10-10 19:49:38 +02:00			`Our problem setting consists of three entities: (i)~data generators (users), (ii)~data publishers (trusted non-adversarial entities), and (iii)~data consumers (possibly adversarial entities).`
problem: Minor corrections 2021-10-25 05:06:25 +02:00			`Users generate a finite series of sensitive data over time, which are processed in batch mode in a secure and private way locally (or by a trusted curator) and are later published in order to be consumed by potentially adversarial data analysts.`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00			`Data are produced as a series of events, which we call time series.`
problem: Minor corrections 2021-10-25 05:06:25 +02:00
			`% Users generate sensitive data, which are processed in a secure and private way by a trusted curator and are later published in order to be consumed by potentially adversarial data analysts.`
			`%The data unit produced by the users is an \emph{event}, i.e., a piece of timestamped user-related information.\kat{should we say geo-stamped?}.`
			`% Data are produced as a series of events, which we call time series.`
			`% An \emph{event} is defined as a triple of an identifying attribute of an individual and the possibly sensitive data at a timestamp.`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00			`%This workflow is repeated in a continuous manner, producing series of events, which we call time series.`
			`%, producing, processing, publishing, and consuming events in a private manner.`
thething: WIP 2021-10-08 20:12:55 +02:00			`\begin{enumerate}[(i)]`

problem: Review thething 2021-10-25 01:27:48 +02:00			`\item \textbf{Data generators} (users) entity $E_g$ interacts with a crowdsensing application and produces continuously privacy-sensitive data items in an arbitrary frequency during the application's usage period $T = (i)_{i \in \mathbb{N}}$.`
			`Thus, at each timestamp $t$, $E_g$ generates a data set $D_i \in \mathcal{D}$ where each of its members contributes a single data item.`
thething: WIP 2021-10-08 20:12:55 +02:00
			`\item \textbf{Data publishers} (trusted non-adversarial) entity $E_p$ receives the data sent by $E_g$ in the form of a series of events in $T$.`
problem: Review thething 2021-10-25 01:27:48 +02:00			`Following the \emph{global} processing and publishing scheme, $E_p$ collects at $t$ a data set $D_i$ and privacy-protects it by applying the respective privacy mechanism $\mathcal{M}_i$.`
			`$\mathcal{M}_i$ uses independent randomness such that it satisfies $\varepsilon_i$-differential privacy.`
thething: WIP 2021-10-08 20:12:55 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`\item \textbf{Data consumers} (possibly adversarial) entity $E_c$ receives the result $\mathbf{o}_i$ of the privacy-preserving processing of $D_i$ by $E_p$.`
			`According to Theorem~\ref{theor:compo-seq-ind}, the overall privacy guarantee of the outputs of $\mathcal{M}$ is equal to the sum of all the privacy budgets of the respective privacy mechanisms that compose $\mathcal{M}$, i.e.,~$\sum_{i \in T}\varepsilon_i$.`
thething: WIP 2021-10-08 20:12:55 +02:00
			`\end{enumerate}`

			`We assume that all the interactions between $E_g$ and $E_p$ are secure and private, and thus $E_p$ is considered trusted and non-adversarial by $E_g$.`
			`Notice that, in a real life scenario, $E_g$ and $E_c$ might overlap with each other, i.e.,~data producers might be data consumers as well.`


			`\subsubsection{Privacy goal}`
problem: Structure 2021-10-08 21:32:06 +02:00			`\label{subsec:lmdk-goal}`
problem: Review thething 2021-10-25 01:27:48 +02:00			`We argue that in continuous user-generated data publishing, events are not equally significant in terms of privacy.`
			`We term a significant event---according to user- or data-related criteria---as a \emph{\thething}~event.`
			`The identification of {\thething} events can be performed manually or automatically, and is an orthogonal problem to ours.`
			`% and we address it subsequently in Section~\ref{subsec:lmdk-sel-sol}.`
			`First, we consider the {\thething} timestamps, i.e.,~their position in time, non-sensitive and provided by the user as input along with the privacy budget $\varepsilon$.`
			`For example, events $p_1$, $p_3$, $p_5$, $p_8$ in Figure~\ref{fig:lmdk-scenario} are {\thething} events.`
			`In Definition~\ref{def:thething-evnt}, we formally introduce {\thethings} in the context of privacy-preserving data publishing.`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00
			`% A significant event or item signals its consequence to us, toward us.`
			`% https://www.quora.com/What-is-the-difference-between-significant-and-important`
thething: Added the landmark definition 2021-09-19 22:47:09 +02:00			`\begin{definition}`
thething: WIP 2021-10-08 20:12:55 +02:00			`[{\Thething} event]`
			`\label{def:thething-evnt}`
			`A {\thething} event is a significant---according to user- or data-related criteria---user-generated data item.`
thething: Added the landmark definition 2021-09-19 22:47:09 +02:00			`\end{definition}`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`Definition~\ref{def:thething-nb} extends the notion of neighboring data sets (see Section~\ref{subsec:prv-statistical}) to the context of {\thethings}.`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00
thething: WIP 2021-10-08 20:12:55 +02:00			`\begin{definition}`
			`[{\Thething} neighboring time series]`
			`\label{def:thething-nb}`
problem: Review thething 2021-10-25 01:27:48 +02:00			`Two time series of the same length, with common starting and ending timestamps, are \emph{{\thething} neighboring} when their elements are pairwise, i.e.,~at the same timestamps, equal or neighboring and their neighboring elements are on common {\thethings} and/or at most on one regular event.`
thething: WIP 2021-10-08 20:12:55 +02:00			`\end{definition}`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`% For example, the time series ($p_1$, \dots, $p_8$) with {\thethings} set the \{$p_1$, $p_3$, $p_5$\} is {\thething} neighboring to the time series of Figure~\ref{fig:lmdk-scenario}.`
			`% Therefore, Corollary~\ref{cor:thething-nb} follows.`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`% \begin{corollary}`
			`% \label{cor:thething-nb}`
			`% Two {\thething} neighboring time series are event neighboring as well.`
			`% \end{corollary}`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`In Definition~\ref{def:thething-prv}, we proceed to propose \emph{{\thething} privacy}, a configurable variation of differential privacy for time series with significant events.`
thething: Minor corrections for cor:thething-nb 2021-09-19 23:06:12 +02:00
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00			`\begin{definition}`
thething: WIP 2021-10-08 20:12:55 +02:00			`[{\Thething} privacy]`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00			`\label{def:thething-prv}`
problem: Review thething 2021-10-25 01:27:48 +02:00			`Let $\mathcal{M}$ be a privacy mechanism with range $\mathcal{O}$ and domain $\mathcal{S}_T$ being the set of all time series with length $\|T\|$, where $T$ is a sequence of timestamps.`
			`$\mathcal{M}$ satisfies {\thething} $\varepsilon$-differential privacy (or, simply, {\thething} privacy) if for all sets $O \subseteq \mathcal{O}$, and for every pair of {\thething}-neighboring time series $S_T$, $S_T'$, it holds that`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00			`$$Pr[\mathcal{M}(S_T) \in O] \leq e^\varepsilon Pr[\mathcal{M}(S_T') \in O]$$`
			`\end{definition}`

problem: Review thething 2021-10-25 01:27:48 +02:00			`User-level privacy can achieve {\thething} privacy, but it over-perturbs the final data by not distinguishing between {\thething} and regular events.`
			`Theorem~\ref{theor:thething-prv} states how to achieve the desired privacy goal for the {\thethings} and any event, i.e.,~a total budget less than $\varepsilon$, and at the same time provide better utility overall.`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00
			`\begin{theorem}`
thething: WIP 2021-10-08 20:12:55 +02:00			`[{\Thething} privacy]`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00			`\label{theor:thething-prv}`
problem: OCD 2021-10-25 01:47:21 +02:00			`Let $\mathcal{M}$ be a mechanism with input a time series $S_T$, where $T$ is the set of the involved timestamps, and $L \subseteq T$ be the set of {\thething} timestamps.`
problem: Review thething 2021-10-25 01:27:48 +02:00			`$\mathcal{M}$ is decomposed to $\varepsilon$-differential private sub-mechanisms $\mathcal{M}_t$, for every $t \in T$, which apply independent randomness to the event at $t$.`
problem: Minor corrections 2021-10-25 05:06:25 +02:00			`Then, given a privacy budget $\varepsilon$, $\mathcal{M}$ satisfies $(\varepsilon, L)$-\emph{{\thething} privacy} if for any $t$ it holds that`
the-thing: Added CIKM pre-print 2021-07-17 20:05:10 +02:00			`$$ \sum_{i\in L \cup \{t\}} \varepsilon_i \leq \varepsilon$$`
			`\end{theorem}`

thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00			`\begin{proof}`
			`\label{pf:thething-prv}`
problem: Review thething 2021-10-25 01:27:48 +02:00			`All mechanisms use independent randomness, and therefore for a time series $S_T = (D_i)_{i \in T}$ and outputs $(\pmb{o}_i)_{i \in T} \in O \subseteq \mathcal{O}$ it holds that`
thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`$$Pr[\mathcal{M}(S_T) = (\pmb{o}_i)_{i \in T}] = \prod_{i \in T} Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]$$`
thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`Likewise, for any {\thething}-neighboring time series $S'_T$ of $S_T$ with the same outputs $(\pmb{o}_i)_{i \in T} \in O \subseteq \mathcal{O}$`
thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`$$Pr[\mathcal{M}(S'_T) = (\pmb{o}_i)_{i \in T}] = \prod_{i \in T} Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]$$`
thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`According to Definition~\ref{def:thething-nb}, there exists $L \cup \{t\} \subseteq T$ such that $D_i = D'_i$ for $i \in L \cup \{t\}$.`
thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00			`Thus, we get`

problem: Review thething 2021-10-25 01:27:48 +02:00			`$$\frac{Pr[\mathcal{M}(S_T) = (\pmb{o}_i)_{i \in T}]}{Pr[\mathcal{M}(S'_T) = (\pmb{o}_i)_{i \in T}]} = \prod_{i \in L \cup \{t\}} \frac{Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]}{Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]}$$`
thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00
			`$D_i$ and $D'_i$ are neighboring for $i \in L \cup \{t\}$.`
			`$\mathcal{M}_i$ is differential private and from Definition~\ref{def:dp} we get that $\frac{Pr[\mathcal{M}_i(D_i) = \pmb{o}_i]}{Pr[\mathcal{M}_i(D'_i) = \pmb{o}_i]} \leq e^{\varepsilon_i}$.`
			`Hence, we can write`

problem: Review thething 2021-10-25 01:27:48 +02:00			`$$\frac{Pr[\mathcal{M}(S_T) = (\pmb{o}_i)_{i \in T}]}{Pr[\mathcal{M}(S'_T) = (\pmb{o}_i)_{i \in T}]} \leq \prod_{i \in L \cup \{t\}} e^{\varepsilon_i} = e^{\sum_{i \in L \cup \{t\}} \varepsilon_i}$$`
thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00
problem: Review thething 2021-10-25 01:27:48 +02:00			`For any $O \in \mathcal{O}$ we get $\frac{Pr[\mathcal{M}(S_T) \in O]}{Pr[\mathcal{M}(S'_T) \in O]} \leq e^{\sum_{i \in L \cup \{t\}} \varepsilon_i}$.`
			`If the formula of Theorem~\ref{theor:thething-prv} holds, then we get $\frac{Pr[\mathcal{M}(S_T) \in O]}{Pr[\mathcal{M}(S'_T) \in O]} \leq e^\varepsilon$.`
thething: Added pf:thething-prv 2021-09-19 23:02:00 +02:00			`Due to Definition~\ref{def:thething-prv} this concludes our proof.`
			`\end{proof}`