thething: WIP

Manos Katsomallos 2021-10-08 20:15:13 +02:00
parent 8ef1ad4c5f
commit 69218c312a
3 changed files with 180 additions and 0 deletions

BIN
graphics/lmdk-skip.pdf Normal file

Binary file not shown.

BIN
graphics/lmdk-uniform.pdf Normal file

Binary file not shown.


@ -0,0 +1,180 @@
\subsection{Achieving {\thething} privacy}
\label{subsec:lmdk-sol}
\subsubsection{{\Thething} privacy mechanisms}
\label{subsec:lmdk-mechs}
% \kat{add the two models -- uniform and dynamic and skip}
\paragraph{Uniform}
%\kat{isn't the uniform distribution a method? there is a section for the methods. }
Figure~\ref{fig:lmdk-uniform} shows the simplest model that implements Theorem~\ref{theor:thething-prv}, the \textbf{Uniform} distribution of privacy budget $\varepsilon$ for {\thething} privacy.
% \mk{We capitalize the first letter because it's the name of the method.}
% in comparison with user-level protection.
In this case, it suffices to allocate to each timestamp the total privacy budget divided by the number of timestamps corresponding to {\thethings} (plus one when we are releasing a regular timestamp).
Consequently, at each timestamp we protect every {\thething}, while reserving a part of $\varepsilon$ for the current timestamp.
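As a hypothetical worked example, let $L$ denote the set of {\thething} timestamps and assume, purely for illustration, that $|L| = 4$ and $\varepsilon = 1$.
When releasing a regular timestamp $t$, we have $|L \cup \{t\}| = 5$, so every timestamp $i \in L \cup \{t\}$ receives
\[
  \varepsilon_i = \frac{\varepsilon}{|L| + 1} = \frac{1}{5} = 0.2,
  \qquad
  \sum_{i \in L \cup \{t\}} \varepsilon_i = 5 \cdot 0.2 = \varepsilon ,
\]
i.e.,~the individual budgets add up to exactly the total budget.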
%In this case, distributing $\frac{\varepsilon}{5}$ can guarantee {\thething} privacy.
\begin{figure}[htp]
\centering
\includegraphics[width=0.9\linewidth]{lmdk-uniform}
\caption{Uniform application scenario of {\thething} privacy.}
\label{fig:lmdk-uniform}
\end{figure}
\paragraph{Skip}
% Why skipping publications is problematic?
One might argue that we could \textbf{Skip} the \thething\ data releases.
% and limit the number of {\thethings}.
This would preserve all of the available privacy budget for regular events (because the set $L \cup \{t\}$ becomes $\{t\}$), which is equivalent to event-level protection.
In practice, however, this approach can eventually pose arbitrary privacy risks, especially when dealing with geotagged data.
In particular, sporadic location data publishing~\cite{gambs2010show, russell2018fitness} or misapplying location cloaking~\cite{xssfopes2020tweet} could result in areas with sparse data points, which in turn indicate privacy-sensitive locations.
\begin{figure}[htp]
\centering
\includegraphics[width=0.9\linewidth]{lmdk-skip}
\caption{Application scenario of the Skip model in {\thething} privacy.}
\label{fig:lmdk-skip}
\end{figure}
\paragraph{Adaptive}
Next, we propose an \textbf{Adaptive} privacy mechanism that takes into account changes in the input data and exploits the post-processing property of differential privacy.
Initially, it uniformly reserves the available privacy budget for each future release.
At each timestamp, based on a sampling rate, the mechanism either publishes the original data with noise or releases an approximation based on previous releases.
In the former case, it also calculates the difference between the current and the previous release and compares it with the scale of the perturbation ($\frac{\Delta f}{\varepsilon}$).
The outcome of this comparison determines how the sampling rate adapts for the next events:
if the scale is greater, the input has not changed much, and therefore the mechanism decreases the sampling rate.
When the mechanism approximates a {\thething} (but not a regular timestamp), it distributes the reserved privacy budget
% divided by the number of remaining {\thething} plus one
to the next timestamps.
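As a rough sketch of the adaptation rule, let $r$ denote the current sampling rate, $r'$ the rate used for the next events, and $\pmb{o}_{\mathrm{cur}}$, $\pmb{o}_{\mathrm{prev}}$ the current and the previous release (these four symbols are notation assumed here only for illustration).
The case described above then reads
\[
  \frac{\Delta f}{\varepsilon} > \left| \pmb{o}_{\mathrm{cur}} - \pmb{o}_{\mathrm{prev}} \right|
  \quad \Longrightarrow \quad
  r' < r .
\]
For instance, with the hypothetical values $\Delta f = 1$ and $\varepsilon = 0.2$ the scale of the perturbation is $5$; a difference of $3$ between the two releases is smaller than the scale, so the sampling rate decreases.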
% \mk{WIP}
% \kat{write in text and remove the algorithm}
% \begin{algorithm}
% \caption{Adaptive {\thething} privacy mechanism}
% \label{algo:adapt-lmdk-priv}
% \SetKwInput{KwData}{Input}
% \SetKwInput{KwResult}{Output}
% \SetKwData{diffCur}{diffCur}
% \SetKwData{diffMin}{diffMin}
% \SetKwData{evalCur}{evalCur}
% \SetKwData{evalOrig}{evalOrig}
% \SetKwData{evalSum}{evalSum}
% \SetKwData{metricCur}{metricCur}
% \SetKwData{metricOrig}{metricOrig}
% \SetKwData{opt}{opt}
% \SetKwData{opti}{opt$_i$}
% \SetKwData{optim}{optim}
% \SetKwData{optimi}{optim$_i$}
% \SetKwData{opts}{opts}
% \SetKwData{reg}{reg}
% \SetKwData{S}{$S_T$}
% \SetKwData{L}{$L$}
% \SetKwData{epsilon}{$\varepsilon$}
% \SetKwFunction{calcMetric}{calcMetric}
% \SetKwFunction{evalSeq}{evalSeq}
% \SetKwFunction{getCombs}{getCombs}
% \SetKwFunction{getOpts}{getOpts}
% \DontPrintSemicolon
% \KwData{\S, \L, \epsilon}
% \KwResult{\optim}
% \BlankLine
% % \If{abs($$)}
% % \If{$i \in L$}{
% % \lmdks $\leftarrow$ \lmdks + 1
% % \ForEach{$j \in [i + 1, T]$}{
% % $varepsilon_j \leftarrow varepsilon_j + \frac{\varepsilon_i}{|T| - \lmdks + 1}$
% % }
% % }
% % Evaluate the original
% \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\;
% \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\;
% % Get all possible option combinations
% \opts $\leftarrow$ \getOpts{$\{t_n\}, \{l_k\}$}\;
% % Track the minimum (best) evaluation
% \diffMin $\leftarrow$ $\infty$\;
% % Track the optimal sequence (the one with the best evaluation)
% \optim $\leftarrow$ $[]$\;
% \ForEach{\opt $\in$ \opts}{\label{algo:lmdk-sel-opt-for-each}
% \evalSum $\leftarrow 0$\;
% \ForEach{\opti $\in$ \opt}{
% \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \opti, \{l_k\}$}\;\label{algo:lmdk-sel-opt-comparison}
% \evalSum $\leftarrow$ \evalSum $+$ \evalSeq{\metricCur}\;
% % Compare with current optimal
% \diffCur $\leftarrow \left|\evalSum/\#\opt - \evalOrig\right|$\;
% \If{\diffCur $<$ \diffMin}{
% \diffMin $\leftarrow$ \diffCur\;
% \optim $\leftarrow$ \opt\;
% }
% }
% }\label{algo:lmdk-sel-opt-end}
% \Return{\optim}
% \end{algorithm}
\subsubsection{{\Thething} privacy under temporal correlation}
\label{subsec:lmdk-cor}
From the discussion so far, it is evident that for the budget distribution it is not the positions but rather the number of {\thethings} that matters.
However, this is not the case in the presence of temporal correlation, which is inherent in continuously generated data.
% HMMs have two important independence properties:
% Markov hidden process: future depends on past via the present.
% Current observation independent of all else given current state.
% Intuitively, D^t or D^{t+1} "cuts off" the propagation of the Markov chain.
The Hidden Markov Model~\cite{baum1966statistical} stipulates two important independence properties: (i)~the future (past) depends on the past (future) only via the present, and (ii)~the current observation is independent of everything else given the current state.
%Thus, the observation of a data release at a timestamp $t$ depends only on the respective input data set $D_t$, i.e.,~the current state.
Hence, given the current input data set, an observation at a specific timestamp is independent of the previous/next data sets.
Intuitively, knowing the data set at timestamp $t$ stops the propagation of the Markov chain towards the next or previous timestamps.
%\kat{do we see this in the formula 1 ?}
%when calculating the forward or backward privacy loss respectively.
Cao et al.~\cite{cao2017quantifying} propose a method for computing the total temporal privacy loss $\alpha_t$ at a timestamp $t$, which accounts for the extra privacy loss due to previous and next releases $\pmb{o}$ of $\mathcal{M}$ under temporal correlation.
It is the sum of the backward and forward privacy loss, $\alpha^B_t$ and $\alpha^F_t$, minus the privacy budget $\varepsilon_t$, which would otherwise be counted twice.
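Written out, this relation reads
\[
  \alpha_t = \alpha^B_t + \alpha^F_t - \varepsilon_t .
\]
As a purely illustrative example with hypothetical values, $\alpha^B_t = 0.3$, $\alpha^F_t = 0.25$, and $\varepsilon_t = 0.1$ give $\alpha_t = 0.3 + 0.25 - 0.1 = 0.45$.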
By Theorem~\ref{theor:thething-prv}, at every timestamp $t$ we consider the data at $t$ and at the {\thething} timestamps $L$.
%According to the Definitions~{\ref{def:bpl} and \ref{def:fpl}}, we calculate the backward and forward privacy loss by taking into account the privacy budget at previous and next data releases respectively.
When sequentially composing the data releases for each timestamp $i$ in $L \cup \{t\}$ we
%calculate the temporal privacy loss $\alpha_t$ at each timestamp $t \in L \cup \{i\}$ by
%consider the previous and next data releases at the timestamps $i^{-}, i^{+} \in L \cup \{t\} \setminus \{i\}$ respectively.
consider the previous data releases after the timestamp $i^{-}$ that immediately precedes $i$ in the ordered $L \cup \{t\}$, and the next data releases before the timestamp $i^{+}$ that immediately follows $i$ in the ordered $L \cup \{t\}$.
%\kat{not sure I understand}
%Thus, we calculate the backward/forward privacy loss by taking into account the data releases after/before the previous/next data item.
That is:
% \dk{do we keep looking at all Landmarks both for backward and forward? I would assume that for backward we are looking to the Landmarks until the i and for the forward to the Landmarks after the i - if we would like to be consistent with Cao. Otherwise the writing here is confusing.}
% \mk{We are discussing about the case where we calculate the tpl at each timestamp i in L+{t}. Therefore, bpl at i is calculated until i- and fpl at i until i+.}
\begin{align}
\adjustbox{max width=0.9\linewidth}{
$\alpha_i =
\underbrace{\ln \frac{\Pr[\pmb{o}_{i^{-} + 1}, \dots, \pmb{o}_i | D_i]}{\Pr[\pmb{o}_{i^{-} + 1}, \dots, \pmb{o}_i | D'_i]}}_{\alpha^B_i} +
\underbrace{\ln \frac{\Pr[\pmb{o}_i, \dots, \pmb{o}_{i^{+} - 1} | D_i]}{\Pr[\pmb{o}_i, \dots, \pmb{o}_{i^{+} - 1} | D'_i]}}_{\alpha^F_i} -
\underbrace{\ln \frac{\Pr[\pmb{o}_i | D_i]}{\Pr[\pmb{o}_i | D'_i]}}_{\varepsilon_i}$
}
\end{align}
Finally, $\alpha_t$ is equal to the sum of $\alpha_i$ over all $i \in L \cup \{t\}$.
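As a sanity check, sketched under the assumption that there is no temporal correlation (i.e.,~the releases at other timestamps do not depend on $D_i$), the probability ratios of all outputs other than $\pmb{o}_i$ cancel out, so that $\alpha^B_i = \alpha^F_i = \varepsilon_i$ and
\[
  \alpha_t
  = \sum_{i \in L \cup \{t\}} \left( \varepsilon_i + \varepsilon_i - \varepsilon_i \right)
  = \sum_{i \in L \cup \{t\}} \varepsilon_i ,
\]
which is the plain sequential composition of the budgets over $L \cup \{t\}$.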
%
% where $x_t$ (or $x'_t$) is the potential (neighboring) data item of an individual who is targeted by an adversary with knowledge $\mathbb{D}_t$.
%where $D_t$ and $D'_t$ are the neighboring input data sets (Definition~\ref{def:nb-d-s}) responsible for the output $\pmb{o}_t$.
%Notice that if $t$ is the first or last item in $L \cup \{i\}$ then we need to set $t_{\text{prv}} = 0$ or $t_{\text{nxt}} = \max(T) + 1$.
%In Section~\ref{sec:eval}, we experimentally show how the distribution of {\thethings} impacts the overall privacy loss of the user.