\subsection{Achieving {\thething} privacy}
\label{subsec:lmdk-sol}
In this section, we propose the methodology for achieving {\thething} privacy.
\subsubsection{{\Thething} privacy mechanisms}
\label{subsec:lmdk-mechs}
\paragraph{\texttt{Uniform}}
Figure~\ref{fig:lmdk-uniform} shows the implementation of the baseline {\thething} privacy scheme for Example~\ref{ex:st-cont}, which uniformly distributes the available privacy budget $\varepsilon$.
In this case, it suffices to allocate to each timestamp the total privacy budget divided by the number of {\thething} timestamps plus one, i.e.,~$\frac{\varepsilon}{|L| + 1}$.
Consequently, at each timestamp we protect every {\thething}, while reserving a part of $\varepsilon$ for the current timestamp.
\begin{figure}[htp]
\centering
\includegraphics[width=.75\linewidth]{problem/lmdk-uniform}
\caption{The \texttt{Uniform} application scenario of {\thething} privacy.}
\label{fig:lmdk-uniform}
\end{figure}
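As a minimal worked example, assume a total budget $\varepsilon = 1$ and $|L| = 3$ {\thethings}; these values are hypothetical and are not taken from Example~\ref{ex:st-cont}.
\texttt{Uniform} then spends at each timestamp
\[
  \frac{\varepsilon}{|L| + 1} = \frac{1}{3 + 1} = 0.25 .
\]
For a current timestamp $t \notin L$, sequential composition over the three {\thethings} and $t$ consumes $(|L| + 1) \cdot 0.25 = 1 = \varepsilon$, so the total budget is respected.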
\paragraph{\texttt{Skip}}
One might argue that we could skip the {\thething} data releases, as we demonstrate in Figure~\ref{fig:lmdk-skip}, by republishing previous, regular event releases.
This would preserve all of the available privacy budget for regular events, which is equivalent to event-level protection, i.e.,~$\varepsilon_i = \varepsilon$, $\forall i \in T \setminus L$.
\begin{figure}[htp]
\centering
\includegraphics[width=.75\linewidth]{problem/lmdk-skip}
\caption{Application scenario of the \texttt{Skip} {\thething} privacy scheme.}
\label{fig:lmdk-skip}
\end{figure}
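Written out, the \texttt{Skip} allocation can be sketched as follows. The zero case is our reading of the scheme rather than an explicit statement: it assumes that republishing a previous release is pure post-processing and therefore consumes no privacy budget.
\[
  \varepsilon_i =
  \begin{cases}
    \varepsilon, & i \in T \setminus L, \\
    0, & i \in L .
  \end{cases}
\]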
In practice, however, this approach can eventually pose arbitrary privacy risks, especially when dealing with geotagged data.
In particular, sporadic location data publishing or misapplied location cloaking could result in areas with sparse data points, which indicate privacy-sensitive locations~\cite{gambs2010show, russell2018fitness}.
We study this problem and investigate possible solutions in Section~\ref{subsec:lmdk-sel-sol}.
\paragraph{\texttt{Adaptive}}
Next, we propose an adaptive privacy scheme (Figure~\ref{fig:lmdk-adaptive}) that accounts for changes in the input data by exploiting the post-processing property of differential privacy (Theorem~\ref{theor:p-proc}).
\begin{figure}[htp]
\centering
\includegraphics[width=.75\linewidth]{problem/lmdk-adaptive}
\caption{Concept of \texttt{Adaptive} {\thething} privacy.}
\label{fig:lmdk-adaptive}
\end{figure}
Initially, its budget management component uniformly reserves the available privacy budget $\varepsilon$ for each future release $\pmb{o}$.
At each timestamp, the processing component decides either to sample the current input from the time series and publish it with noise, or to release an approximation based on previous releases.
When it publishes the original data with noise, the analysis component estimates the data trends by calculating the difference between the current and the previous releases, and compares this difference with the scale of the perturbation, i.e.,~$\frac{\Delta f}{\varepsilon}$~\cite{kellaris2014differentially}.
The outcome of this comparison determines how the sampling rate of the processing component adapts for the next events:
if the difference is greater than the scale, the data trends are evolving, and therefore the sampling rate must increase.
In the case when the mechanism approximates a {\thething} (but not a regular timestamp), the budget management component distributes the reserved privacy budget to the next timestamps.
Due to the post-processing property of differential privacy (Theorem~\ref{theor:p-proc}), the analysis component does not consume any privacy budget, which allows for better final data utility.
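To illustrate the comparison performed by the analysis component, consider a sketch with hypothetical values that are not taken from any evaluation. We assume a query with sensitivity $\Delta f = 1$, and we assume that $\varepsilon$ in the perturbation scale denotes the budget spent on the sampled release, here $0.25$. The scale is then
\[
  \frac{\Delta f}{\varepsilon} = \frac{1}{0.25} = 4 .
\]
If the current and the previous releases differ by $6 > 4$, the data trends are considered to be evolving and the sampling rate increases.
A difference of $2 \leq 4$ does not meet this condition.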
\subsubsection{{\Thething} privacy under temporal correlation}
\label{subsec:lmdk-tpl}
From the discussion so far, it is evident that the budget distribution depends on the number of the {\thethings} rather than on their positions.
However, this is not the case in the presence of temporal correlation.
The Hidden Markov Model scheme (as used in~\cite{cao2018quantifying}) stipulates two important independence properties: (i)~the future (or past) depends on the past (or future) via the present, and (ii)~the current observation is independent of the rest given the current state.
Hence, an observation at a specific timestamp is independent of the previous and next data sets given the current input data set.
Intuitively, knowing the data set at timestamp $t$ stops the propagation of the Markov chain towards the next or previous timestamps in the time series.
In Section~\ref{subsec:compo} we showed that the temporal privacy loss $\alpha_t$ at a timestamp $t$ is calculated as the sum of the backward and forward privacy loss, $\alpha^B_t$ and $\alpha^F_t$, minus the privacy budget $\varepsilon_t$, to account for the extra privacy loss due to previous and next releases $\pmb{o}$ of $\mathcal{M}$ under temporal correlation.
By Theorem~\ref{theor:thething-prv}, at every timestamp $t$ we consider the data at $t$ and at the {\thething} timestamps $L$.
When sequentially composing the data releases for each timestamp $i$ in $L \cup \{t\}$, we consider the previous releases in the whole time series back to the timestamp $i^{-}$ that is exactly before $i$ in the ordered $L \cup \{t\}$, and the next data releases in the whole time series until the timestamp $i^{+}$ that is exactly after $i$ in the ordered $L \cup \{t\}$.
Figure~\ref{fig:lmdk-tpl} illustrates $i^{-}$ and $i^{+}$ in Example~\ref{ex:scenario}.
\begin{figure}[htp]
\centering
\includegraphics[width=.75\linewidth]{problem/lmdk-tpl}
\caption{The timestamps exactly before ($-$) and after ($+$) every timestamp, where that is applicable, for the calculation of the temporal privacy loss.}
\label{fig:lmdk-tpl}
\end{figure}
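As a small illustration with hypothetical values (not those of Example~\ref{ex:scenario}), let $T = \{1, \dots, 8\}$, $L = \{2, 5, 7\}$, and $t = 8$, so that the ordered set $L \cup \{t\}$ is $\{2, 5, 7, 8\}$.
For $i = 5$ we get $i^{-} = 2$ and $i^{+} = 7$.
Following the intervals used in Equation~\ref{eq:lmdk-tpl}, the backward privacy loss then considers the releases at timestamps $[i^{-} + 1, i] = [3, 5]$, and the forward privacy loss those at $[i, i^{+} - 1] = [5, 6]$.
For the last element, $i = 8$, we have $i^{-} = 7$, while $i^{+}$ is not applicable.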
Therefore, in Definition~\ref{def:lmdk-tpl}, we formulate the {\thething} temporal privacy loss as follows.
\begin{definition}
[{\Thething} temporal privacy loss]
\label{def:lmdk-tpl}
Given a {\thething} set $L$ in a set of timestamps $T$, the potential overall temporal privacy loss of a privacy mechanism $\mathcal{M}$ at any timestamp in $L \cup \{t\}$ is
\[\sum_{i \in L \cup \{t\}} \alpha_i\]
where for $i^{-}, i^{+} \in L \cup \{t\}$ being the timestamps exactly before and after $i$, $\alpha_i$ is equal to
\begin{align}
\label{eq:lmdk-tpl}
\adjustbox{max width=0.9\linewidth}{
$\underbrace{\ln \frac{\Pr[(\pmb{o})_{j \in [i^{-} + 1, i]} | D_i]}{\Pr[(\pmb{o})_{j \in [i^{-} + 1, i]} | D'_i]}}_{\alpha^B_i} +
\underbrace{\ln \frac{\Pr[(\pmb{o})_{j \in [i, i^{+} - 1]} | D_i]}{\Pr[(\pmb{o})_{j \in [i, i^{+} - 1]} | D'_i]}}_{\alpha^F_i} -
\underbrace{\ln \frac{\Pr[\pmb{o}_i | D_i]}{\Pr[\pmb{o}_i | D'_i]}}_{\varepsilon_i}$
}
\end{align}
\end{definition}
As presented in~\cite{cao2018quantifying}, the temporal privacy loss of a time series (without {\thethings}) can be bounded by a given privacy budget $\varepsilon$.
Intuitively, by Equation~\ref{eq:lmdk-tpl}, the temporal privacy loss incurred when considering {\thethings} is less than the temporal privacy loss in the case without knowledge of the {\thethings}.
Thus, the temporal privacy loss in {\thething} privacy can also be bounded by $\varepsilon$.