\SetKwInput{KwData}{Input}
\SetKwInput{KwResult}{Output}

\SetKwData{diffCur}{diffCur}
\SetKwData{diffMin}{diffMin}
\SetKwData{evalCur}{evalCur}
\SetKwData{evalOrig}{evalOrig}
\SetKwData{evalSum}{evalSum}
\SetKwData{metricCur}{metricCur}
\SetKwData{metricOrig}{metricOrig}
\SetKwData{opt}{opt}
\SetKwData{opti}{opt$_i$}
\SetKwData{optim}{optim}
\SetKwData{optimi}{optim$_i$}
\SetKwData{opts}{opts}
\SetKwData{reg}{reg}

\SetKwFunction{calcMetric}{calcMetric}
\SetKwFunction{evalSeq}{evalSeq}
\SetKwFunction{getCombs}{getCombs}
\SetKwFunction{getOpts}{getOpts}

\section{Selection of events}
\label{sec:theotherthing}

Given a set of {\thethings} at respective timestamps $\{l_k\}$ in a series of events at timestamps $\{t_n\}$, such that $\{l_k\} \subseteq \{t_n\}$, a data publisher can release this information in two steps:

\begin{enumerate}
  \item Selecting a set of options (Section~\ref{subsec:lmdk-set-opts}), consisting of different possible versions of $\{l_k\}$.

  \mk{`option' or `candidate'?}

  This set could consist of:

  \begin{itemize}
    \item either a random set of $k$ timestamps, i.e.,~of the same size as the set of actual {\thething} timestamps (Section~\ref{subsec:lmdk-rnd}),
    \item or a set including $\{l_k\}$ and $x \in [1, n - k]$ additional dummy timestamps (Section~\ref{subsec:lmdk-dum-gen}).
  \end{itemize}

  \item Releasing a privacy-preserving version of the {\thething} timestamps (Section~\ref{subsec:priv-opt-sel}).
  We use the exponential mechanism with a utility function that computes an indicator for each of the options in the set selected in the previous step.
  The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~their distance from the previous/next {\thething}, their distance from the start/end of the series, etc.
\end{enumerate}
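The second step can be sketched as follows, assuming the standard exponential mechanism, which selects an option $o$ with probability proportional to $\exp(\varepsilon\, u(o) / (2 \Delta u))$, where $u$ is the utility function and $\Delta u$ its sensitivity. The function \texttt{positional\_utility} is hypothetical: it scores an option by the negative standard deviation of the gaps between its consecutive timestamps, counting the start and end of the series as neighbours, so that evenly spread options score higher. The actual utility function, the privacy parameter $\varepsilon$, and the sensitivity remain assumptions of this sketch, not the definitive mechanism.

```python
import math
import random
import statistics


def exponential_mechanism(options, utility, epsilon, sensitivity, rng=random):
    """Select one option with probability proportional to
    exp(epsilon * utility(option) / (2 * sensitivity))."""
    scores = [utility(o) for o in options]
    best = max(scores)
    # Shifting every score by the maximum avoids overflow and
    # leaves the normalized selection probabilities unchanged.
    weights = [math.exp(epsilon * (s - best) / (2 * sensitivity)) for s in scores]
    return rng.choices(options, weights=weights, k=1)[0]


def positional_utility(option, series_start, series_end):
    """Hypothetical utility: negative spread of the gaps between consecutive
    timestamps of the option, with the series start/end acting as neighbours."""
    points = [series_start] + sorted(option) + [series_end]
    gaps = [b - a for a, b in zip(points, points[1:])]
    return -statistics.pstdev(gaps)


options = [(2,), (3,), (5,)]
utility = lambda o: positional_utility(o, 0, 10)
# Hypothetical parameters: epsilon = 1 and sensitivity = 1.
print(exponential_mechanism(options, utility, 1.0, 1.0, random.Random(7)))
```

A larger $\varepsilon$ concentrates the probability mass on the highest-utility option, while a smaller one brings the selection closer to uniform.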

Following this process allows the release, and subsequent processing, of {\thething} timestamps.
In this way, we provide an extra layer of privacy protection when we separate {\thethings} from regular events.


\subsubsection{{\Thething} set options}
\label{subsec:lmdk-set-opts}

This step selects a set of candidate {\thething} timestamp options, either by randomizing the actual timestamps (Section~\ref{subsec:lmdk-rnd}) or by adding dummy timestamps (Section~\ref{subsec:lmdk-dum-gen}) to the actual {\thething} timestamps.


\paragraph{{\Thething} randomization}
\label{subsec:lmdk-rnd}

A simple way to select a set of timestamps without disclosing the actual {\thethings} is to \emph{randomly} select an equally sized set of timestamps.
As we discuss in more detail in Section~\ref{subsec:priv-opt-sel}, the randomization depends on the positioning of the {\thethings} in the series of events.
More specifically, given a set of {\thething} timestamps $\{l_k\} \subseteq \{t_n\}$, where $\{t_n\}$ is an event sequence, we need to consider all possible sets of size $k$ from $\{t_n\}$.
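As a minimal sketch, assuming integer timestamps, the candidate options of this paragraph can be enumerated with Python's \texttt{itertools.combinations}; there are $n!/(k!\,(n-k)!)$ of them, so materializing the full list is only practical for short series. The helper names are hypothetical, and the uniform draw is only a naive baseline: in Section~\ref{subsec:priv-opt-sel} the selection depends on the positioning of the {\thethings}.

```python
import itertools
import math
import random


def candidate_options(timestamps, k):
    """All size-k subsets of the event timestamps {t_n}, as sorted tuples."""
    return list(itertools.combinations(sorted(timestamps), k))


def random_option(timestamps, landmarks, rng=random):
    """Naive baseline: draw one equally sized option uniformly at random."""
    return rng.choice(candidate_options(timestamps, len(landmarks)))


t = [1, 2, 3, 4, 5, 6]  # event timestamps {t_n}
l = [2, 5]              # landmark timestamps {l_k}
options = candidate_options(t, len(l))
print(len(options), math.comb(len(t), len(l)))  # 15 15
print(random_option(t, l, random.Random(0)))
```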

However, introducing randomization could arbitrarily affect the effectiveness of non-uniform privacy-protection methods.
This applies mainly to cases where we aim to achieve optimal privacy protection of {\thething} events while maximizing the utility of the data that correspond to the rest of the series of events.
As a consequence, we may end up providing {\thething} data with a lower level of protection than necessary, i.e.,~lower than the users' privacy-protection expectations.
The methodology that we present next (Section~\ref{subsec:lmdk-dum-gen}) attempts to address this shortcoming.


\paragraph{Dummy {\thething} generation}
\label{subsec:lmdk-dum-gen}

Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings} can render the actual ones indistinguishable.
Given a series of events at timestamps $\{t_n\}$ and a set of {\thethings} at $\{l_k\} \subseteq \{t_n\}$, the goal is to select a list of sets of additional timestamps from $\{t_n\} \setminus \{l_k\}$.
Algorithms~\ref{algo:lmdk-sel-opt} and~\ref{algo:lmdk-sel-heur} approach this problem with an optimal and a heuristic methodology, respectively.

Function \calcMetric computes an indicator for the union of $\{l_k\}$ and a combination of timestamps from $\{t_n\} \setminus \{l_k\}$.
Function \evalSeq evaluates the result of \calcMetric, e.g.,~by estimating the standard deviation of all the distances from the previous/next {\thething}.
Function \getOpts returns all possible \emph{valid} options \opt, i.e.,~sequences of sets such that $\{l_{k+i}\} \subset \{l_{k+j}\}$ for all $i, j \in [1, n - k]$ with $i < j$; in other words, larger sets must contain all of the timestamps that are present in smaller ones.
Each option thus consists of sets of timestamps with sizes $k + 1, k + 2, \dots, n$, each of which combines $\{l_k\}$ with $x \in [1, n - k]$ timestamps from $\{t_n\} \setminus \{l_k\}$.
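To make the three functions concrete, the following minimal sketch assumes one possible choice for each; the names mirror \calcMetric, \evalSeq, and \getOpts, but their bodies are our assumptions, not a definitive implementation. \texttt{calc\_metric} collects, for every regular event, its distance from the previous and to the next {\thething} in the union; \texttt{eval\_seq} takes the standard deviation of these distances, as suggested above; \texttt{get\_opts} builds one valid option per ordering of the regular timestamps, storing only the extra timestamps of each set (the {\thethings} $\{l_k\}$ are implicit). Under this construction there are $(n-k)!$ valid options.

```python
import bisect
import itertools
import statistics


def calc_metric(timestamps, extra, landmarks):
    """Hypothetical calcMetric: for every regular event, its distance from the
    previous and to the next landmark in landmarks plus extra, when these exist."""
    marks = sorted(set(landmarks) | set(extra))
    distances = []
    for t in sorted(set(timestamps) - set(marks)):
        i = bisect.bisect_left(marks, t)  # marks[i] is the first landmark after t
        if i > 0:
            distances.append(t - marks[i - 1])
        if i < len(marks):
            distances.append(marks[i] - t)
    return distances


def eval_seq(distances):
    """Hypothetical evalSeq: population standard deviation of the distances."""
    return statistics.pstdev(distances) if distances else 0.0


def get_opts(timestamps, landmarks):
    """Valid options: each ordering of the regular events yields a chain of nested
    sets of extra timestamps of sizes 1, ..., n - k (k + 1, ..., n with {l_k})."""
    regular = sorted(set(timestamps) - set(landmarks))
    return [[frozenset(order[:x]) for x in range(1, len(order) + 1)]
            for order in itertools.permutations(regular)]


t, l = [1, 2, 3, 4, 5, 6], [2, 5]
print(calc_metric(t, set(), l))  # [1, 1, 2, 2, 1, 1]
print(len(get_opts(t, l)))       # 24
```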

\begin{algorithm}
  \caption{Optimal dummy {\thething} set options selection}
  \label{algo:lmdk-sel-opt}

  \DontPrintSemicolon

  \KwData{$\{t_n\}, \{l_k\}$}
  \KwResult{\optim}
  \BlankLine

  % Evaluate the original
  \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\;
  \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\;

  % Get all possible option combinations
  \opts $\leftarrow$ \getOpts{$\{t_n\}, \{l_k\}$}\;

  % Track the minimum (best) evaluation
  \diffMin $\leftarrow$ $\infty$\;

  % Track the optimal sequence (the one with the best evaluation)
  \optim $\leftarrow$ $[]$\;

  \ForEach{\opt $\in$ \opts}{\label{algo:lmdk-sel-opt-for-each}
    \evalSum $\leftarrow 0$\;
    \ForEach{\opti $\in$ \opt}{
      \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \opti, \{l_k\}$}\;\label{algo:lmdk-sel-opt-comparison}
      \evalSum $\leftarrow$ \evalSum $+$ \evalSeq{\metricCur}\;
    }

    % Compare with current optimal, once the whole option has been evaluated
    \diffCur $\leftarrow \left|\evalSum/\#\opt - \evalOrig\right|$\;
    \If{\diffCur $<$ \diffMin}{
      \diffMin $\leftarrow$ \diffCur\;
      \optim $\leftarrow$ \opt\;
    }
  }\label{algo:lmdk-sel-opt-end}
  \Return{\optim}
\end{algorithm}

In Lines~\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}, Algorithm~\ref{algo:lmdk-sel-opt} evaluates each option in \opts.
It selects the option that is the most \emph{similar} to the original (Lines~\ref{algo:lmdk-sel-opt-comparison}--\ref{algo:lmdk-sel-opt-end}), i.e.,~the option whose average evaluation differs the least from that of the sequence $\{t_n\}$ with {\thethings} $\{l_k\}$.
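The following minimal sketch transcribes Algorithm~\ref{algo:lmdk-sel-opt} into Python, comparing each option's average evaluation only once all of its sets have been evaluated. To keep it self-contained, it assumes a hypothetical \calcMetric that returns the gaps between consecutive {\thethings} (including the start and end of the series) and an \evalSeq that takes their standard deviation; \getOpts enumerates the $(n-k)!$ orderings of the regular timestamps, so exhaustive search is feasible only for very short series.

```python
import itertools
import math
import statistics


def calc_metric(timestamps, extra, landmarks):
    """Hypothetical calcMetric: gaps between consecutive landmarks in
    landmarks plus extra, including the gaps from the start and to the end."""
    marks = sorted(set(landmarks) | set(extra))
    bounds = [min(timestamps)] + marks + [max(timestamps)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def eval_seq(metric):
    """Hypothetical evalSeq: population standard deviation of the metric."""
    return statistics.pstdev(metric) if metric else 0.0


def get_opts(timestamps, landmarks):
    """All valid options: chains of nested sets of extra timestamps."""
    regular = sorted(set(timestamps) - set(landmarks))
    return [[frozenset(p[:x]) for x in range(1, len(p) + 1)]
            for p in itertools.permutations(regular)]


def optimal_selection(timestamps, landmarks):
    """Exhaustive search in the spirit of Algorithm 1 over all valid options."""
    eval_orig = eval_seq(calc_metric(timestamps, set(), landmarks))
    diff_min, optim = math.inf, []
    for opt in get_opts(timestamps, landmarks):
        if not opt:  # no regular events left to add
            continue
        eval_sum = sum(eval_seq(calc_metric(timestamps, opt_i, landmarks))
                       for opt_i in opt)
        # Compare only after every set of the option has been evaluated.
        diff_cur = abs(eval_sum / len(opt) - eval_orig)
        if diff_cur < diff_min:
            diff_min, optim = diff_cur, opt
    return optim


print(optimal_selection([1, 2, 3, 4, 5, 6], [2, 5]))
```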

\begin{algorithm}
  \caption{Heuristic dummy {\thething} set options selection}
  \label{algo:lmdk-sel-heur}

  \DontPrintSemicolon

  \KwData{$\{t_n\}, \{l_k\}$}
  \KwResult{\optim}
  \BlankLine

  % Evaluate the original
  \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\;
  \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\;

  % Track the selected options
  \optim $\leftarrow$ $[]$\;

  $\{l_{k'}\} \leftarrow \{l_k\}$\;

  \While{$\{l_{k'}\} \neq \{t_n\}$}{\label{algo:lmdk-sel-heur-while}
    % Track the minimum (best) evaluation
    \diffMin $\leftarrow$ $\infty$\;

    \optimi $\leftarrow$ $0$\;
    % Find the combinations for one more point
    \ForEach{\reg $\in \{t_n\} \setminus \{l_{k'}\}$}{

      % Evaluate current
      \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \reg, \{l_{k'}\}$}\;\label{algo:lmdk-sel-heur-comparison}
      \evalCur $\leftarrow$ \evalSeq{\metricCur}\;

      % Compare evaluations
      \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\;

      \If{\diffCur $<$ \diffMin}{
        \diffMin $\leftarrow$ \diffCur\;
        \optimi $\leftarrow$ \reg\;
      }\label{algo:lmdk-sel-heur-comparison-end}
    }

    % Save new point to landmarks
    $k' \leftarrow k' + 1$\;
    $l_{k'} \leftarrow \optimi$\;

    % Add new option
    \optim.add($\{l_{k'}\} \setminus \{l_k\}$)\;
  }\label{algo:lmdk-sel-heur-end}

  \Return{\optim}
\end{algorithm}

Algorithm~\ref{algo:lmdk-sel-heur} follows an incremental methodology.
At each step, it selects a new timestamp that corresponds to a regular ({non-\thething}) event from $\{t_n\} \setminus \{l_{k'}\}$.
As in Algorithm~\ref{algo:lmdk-sel-opt}, the selection is based on a predefined metric (Lines~\ref{algo:lmdk-sel-heur-comparison}--\ref{algo:lmdk-sel-heur-comparison-end}).
This process (Lines~\ref{algo:lmdk-sel-heur-while}--\ref{algo:lmdk-sel-heur-end}) continues until the selected set covers the whole series of events, i.e.,~$\{l_{k'}\} = \{t_n\}$.

Note that the reverse heuristic approach, i.e.,~starting with all of $\{t_n\}$ as {\thethings} and removing timestamps until only $\{l_k\}$ remain, performs worse than, or occasionally the same as, Algorithm~\ref{algo:lmdk-sel-heur}.
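Algorithm~\ref{algo:lmdk-sel-heur} can be sketched in Python as follows, assuming a hypothetical \calcMetric that returns the gaps between consecutive {\thethings} (including the start and end of the series) and an \evalSeq that takes their standard deviation; both are illustrative choices made only to keep the block self-contained. Each round scans the remaining regular timestamps, so for $m = n - k$ the sketch performs $m(m+1)/2$ metric evaluations instead of examining $(n-k)!$ options; ties are broken in favor of the earliest timestamp.

```python
import math
import statistics


def calc_metric(timestamps, extra, landmarks):
    """Hypothetical calcMetric: gaps between consecutive landmarks in
    landmarks plus extra, including the gaps from the start and to the end."""
    marks = sorted(set(landmarks) | set(extra))
    bounds = [min(timestamps)] + marks + [max(timestamps)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def eval_seq(metric):
    """Hypothetical evalSeq: population standard deviation of the metric."""
    return statistics.pstdev(metric) if metric else 0.0


def heuristic_selection(timestamps, landmarks):
    """Greedy sketch of Algorithm 2: add one regular timestamp per round."""
    events, base = set(timestamps), set(landmarks)
    assert base <= events, "landmarks must be a subset of the events"
    eval_orig = eval_seq(calc_metric(timestamps, set(), base))
    optim, current = [], set(base)  # current plays the role of {l_k'}
    while current != events:
        diff_min, best = math.inf, None
        for reg in sorted(events - current):
            eval_cur = eval_seq(calc_metric(timestamps, {reg}, current))
            diff_cur = abs(eval_cur - eval_orig)
            if diff_cur < diff_min:
                diff_min, best = diff_cur, reg
        current.add(best)                        # k' <- k' + 1 and l_k' <- best
        optim.append(frozenset(current - base))  # keep only the dummy timestamps
    return optim


print(heuristic_selection([1, 2, 3, 4, 5, 6], [2, 5]))
```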


\subsubsection{Privacy-preserving option selection}
\label{subsec:priv-opt-sel}

% Nearby events
Events that occur at nearby timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
Thus, taking into account events that are close to the {\thethings} can result in less privacy loss and better privacy protection overall.
However, this can lead to worse data utility.

% Depending on the {\thething} discovery technique
The values of events near a {\thething} are usually similar to the value of the latter.
Therefore, privacy-preserving mechanisms can approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values, thus spending less privacy budget.
The saved privacy budget can then be used to release perturbed versions of actual event values, which can improve data utility.
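The approximation described above can be sketched as follows, assuming that the (already perturbed) values of the {\thethings} have been released and that each regular event simply reuses the value of its temporally nearest {\thething}. The function names are hypothetical, and the sketch covers only this post-processing step, not the privacy mechanism that perturbs the {\thething} values.

```python
import bisect


def nearest_landmark(t, marks):
    """Closest timestamp to t in the sorted, non-empty list marks
    (the earlier one wins ties)."""
    i = bisect.bisect_left(marks, t)
    neighbours = marks[max(i - 1, 0):i + 1]
    return min(neighbours, key=lambda m: (abs(m - t), m))


def approximate_events(timestamps, landmark_values):
    """Reuse the released value of the nearest landmark for every event."""
    marks = sorted(landmark_values)
    return {t: landmark_values[nearest_landmark(t, marks)] for t in timestamps}


released = {2: 10.0, 5: 20.0}  # hypothetical perturbed landmark values
print(approximate_events(range(1, 7), released))
# {1: 10.0, 2: 10.0, 3: 10.0, 4: 20.0, 5: 20.0, 6: 20.0}
```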

% Distant events
However, placing randomized/dummy {\thethings} near actual {\thethings} can increase the confidence of an adversary regarding the position of the latter within a series of events.
Hence, choosing randomized/dummy {\thethings} far from the actual {\thethings} (and thus less relevant to them) can limit the final privacy loss.
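As an illustration of this trade-off, and assuming that temporal distance is the only criterion (the utility considerations above are ignored), candidate dummy timestamps can be ranked so that those farthest from any actual {\thething} come first. The helper names are hypothetical.

```python
import bisect


def distance_to_nearest(t, marks):
    """Distance from t to the closest timestamp in the sorted, non-empty list marks."""
    i = bisect.bisect_left(marks, t)
    return min(abs(t - m) for m in marks[max(i - 1, 0):i + 1])


def rank_dummy_candidates(timestamps, landmarks):
    """Regular timestamps ordered from farthest to nearest actual landmark
    (ties broken by the earlier timestamp)."""
    marks = sorted(landmarks)
    regular = set(timestamps) - set(landmarks)
    return sorted(regular, key=lambda t: (-distance_to_nearest(t, marks), t))


print(rank_dummy_candidates(range(1, 11), [4, 5]))
# [10, 9, 1, 8, 2, 7, 3, 6]
```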