diff --git a/text/problem/theotherthing/solution.tex b/text/problem/theotherthing/solution.tex new file mode 100644 index 0000000..842ac87 --- /dev/null +++ b/text/problem/theotherthing/solution.tex @@ -0,0 +1,155 @@ +\subsection{Protecting {\thethings}} +\label{subsec:lmdk-sel-sol} + +\subsubsection{{\Thething} set options} +\label{subsec:lmdk-set-opts} + +This step aims to select a set of candidate {\thething} timestamp options, either by randomizing the actual timestamps (Section~\ref{subsec:lmdk-rnd}) or by adding dummy timestamps (Section~\ref{subsec:lmdk-dum-gen}) to the actual {\thething} timestamps. + + +\paragraph{{\Thething} randomization} +\label{subsec:lmdk-rnd} + +A simple way to select a set of timestamps without disclosing the actual {\thethings} is by \emph{randomly} selecting an equally sized set of timestamps. +The randomization of the process, as we will discuss in more detail in Section~\ref{subsec:priv-opt-sel}, will depend on the positioning of the {\thethings} in the series of events. +Specifically, given a set of {\thething} timestamps $\{l_k\} \subseteq \{t_n\}$, where $\{t_n\}$ is an event sequence, we need to select all possible sets of size $k$ from $\{t_n\}$. + +However, the introduction of randomization could arbitrarily impact the effectiveness of non-uniform privacy-protection methods. +This applies mainly in cases where we try to achieve optimal privacy protection of {\thething} events while maximizing the utility of the data that corresponds to the rest of the series of events. +As a consequence, it is possible to end up providing lower levels of protection to {\thething} data than necessary, i.e.,~worse than the users' privacy-protection expectations. +The methodology that we present next (Section~\ref{subsec:lmdk-dum-gen}) attempts to tackle the aforementioned shortcoming. 
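+
+As a rough size estimate for the randomization step above (a sketch that assumes every timestamp in $\{t_n\}$ is an eligible candidate), the number of candidate sets of size $k$ is
+\[
+  \binom{n}{k} = \frac{n!}{k!\,(n - k)!},
+\]
+e.g.,~$n = 10$ and $k = 3$ give $\binom{10}{3} = 120$ candidate sets, one of which is the actual set $\{l_k\}$.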
+ + +\paragraph{Dummy {\thething} generation} +\label{subsec:lmdk-dum-gen} + +Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings} can render the actual ones indistinguishable. +The goal is to select a list of sets with additional timestamps from a series of events at timestamps $\{t_n\}$ for a set of {\thethings} at $\{l_k\} \subseteq \{t_n\}$. +Algorithms~\ref{algo:lmdk-sel-opt} and~\ref{algo:lmdk-sel-heur} approach this problem with an optimal and a heuristic methodology, respectively. + +Function \calcMetric measures an indicator for the union of $\{l_k\}$ and a timestamp combination from $\{t_n\} \setminus \{l_k\}$. +Function \evalSeq evaluates the result of \calcMetric by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}. +Function \getOpts returns all possible \emph{valid} options \opt, i.e.,~lists of sets such that $\{l_{k+i}\} \subset \{l_{k+j}\}, \forall i, j \in [1, n - k] \mid i < j$; that is, larger sets must contain all of the timestamps that are present in smaller ones. +Each option contains sets of timestamps with sizes $k + 1, k + 2, \dots, n$, where each set is the union of $\{l_k\}$ with $x \in [1, n - k]$ timestamps from $\{t_n\} \setminus \{l_k\}$. 
+ +\begin{algorithm} + \caption{Optimal dummy {\thething} set options selection} + \label{algo:lmdk-sel-opt} + + \DontPrintSemicolon + + \SetKwInput{KwData}{Input} + + \KwData{$\{t_n\}, \{l_k\}$} + \KwResult{\optim} + \BlankLine + + % Evaluate the original + \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\; + \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\; + + % Get all possible option combinations + \opts $\leftarrow$ \getOpts{$\{t_n\}, \{l_k\}$}\; + + % Track the minimum (best) evaluation + \diffMin $\leftarrow$ $\infty$\; + + % Track the optimal sequence (the one with the best evaluation) + \optim $\leftarrow$ $[]$\; + + \ForEach{\opt $\in$ \opts}{\label{algo:lmdk-sel-opt-for-each} + \evalSum $\leftarrow 0$\; + + % Accumulate the evaluations of all sets in the option + \ForEach{\opti $\in$ \opt}{ + \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \opti, \{l_k\}$}\;\label{algo:lmdk-sel-opt-comparison} + \evalSum $\leftarrow$ \evalSum $+$ \evalSeq{\metricCur}\; + } + + % Compare the average evaluation with the current optimal + \diffCur $\leftarrow \left|\evalSum/\#\opt - \evalOrig\right|$\; + \If{\diffCur $<$ \diffMin}{ + \diffMin $\leftarrow$ \diffCur\; + \optim $\leftarrow$ \opt\; + } + }\label{algo:lmdk-sel-opt-end} + \Return{\optim} +\end{algorithm} + +Algorithm~\ref{algo:lmdk-sel-opt} evaluates each option in \opts (Lines~{\ref{algo:lmdk-sel-opt-for-each}-\ref{algo:lmdk-sel-opt-end}}). +It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}-\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option whose average evaluation differs the least from that of the sequence $\{t_n\}$ with {\thethings} $\{l_k\}$. 
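+
+To give a sense of the cost of the optimal approach, the following sketch counts its metric evaluations. It assumes that \getOpts enumerates every nested chain of sets of sizes $k + 1, \dots, n$, which is our reading of its definition and not a stated result; $N_{\mathrm{opt}}$ is a symbol introduced here only for illustration. Under this assumption, each option corresponds to one ordering of the $n - k$ regular timestamps and contains $n - k$ sets, so
+\[
+  \#\opts = (n - k)!, \qquad N_{\mathrm{opt}} = (n - k) \cdot (n - k)!,
+\]
+e.g.,~$n - k = 4$ regular timestamps yield $4! = 24$ options and $4 \cdot 24 = 96$ metric evaluations.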
+ +\begin{algorithm} + \caption{Heuristic dummy {\thething} set options selection} + \label{algo:lmdk-sel-heur} + + \DontPrintSemicolon + + \KwData{$\{t_n\}, \{l_k\}$} + \KwResult{\optim} + \BlankLine + + % Evaluate the original + \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\; + \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\; + + % Track the selected options + \optim $\leftarrow$ $[]$\; + + $\{l_{k'}\} \leftarrow \{l_k\}$\; + + \While{$\{l_{k'}\} \neq \{t_n\}$}{\label{algo:lmdk-sel-heur-while} + % Track the minimum (best) evaluation + \diffMin $\leftarrow$ $\infty$\; + + \optimi $\leftarrow$ $0$\; + % Find the combinations for one more point + \ForEach{\reg $\in \{t_n\} \setminus \{l_{k'}\}$}{ + + % Evaluate current + \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \reg, \{l_{k'}\}$}\;\label{algo:lmdk-sel-heur-comparison} + \evalCur $\leftarrow$ \evalSeq{\metricCur}\; + + % Compare evaluations + \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\; + + \If{\diffCur $<$ \diffMin}{ + \diffMin $\leftarrow$ \diffCur\; + \optimi $\leftarrow$ \reg\; + }\label{algo:lmdk-sel-heur-comparison-end} + } + + % Save new point to landmarks + $k' \leftarrow k' + 1$\; + $l_{k'} \leftarrow \optimi$\; + + % Add new option + \optim.add($\{l_{k'}\} \setminus \{l_k\}$)\; + }\label{algo:lmdk-sel-heur-end} + + \Return{\optim} +\end{algorithm} + +Algorithm~\ref{algo:lmdk-sel-heur} follows an incremental methodology. +At each step, it selects a new timestamp that corresponds to a regular ({non-\thething}) event from $\{t_n\} \setminus \{l_{k'}\}$. +Similar to Algorithm~\ref{algo:lmdk-sel-opt}, the selection is based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}-\ref{algo:lmdk-sel-heur-comparison-end}}). +This process (Lines~{\ref{algo:lmdk-sel-heur-while}-\ref{algo:lmdk-sel-heur-end}}) continues until the selected set covers the entire series of events, i.e.,~$\{l_{k'}\} = \{t_n\}$. 
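+
+As a back-of-the-envelope check of the cost of the heuristic (a sketch derived only from the loop structure of Algorithm~\ref{algo:lmdk-sel-heur}; $N_{\mathrm{heur}}$ is a symbol introduced here for illustration), the outer loop runs $n - k$ times and, when the current set has size $k'$, the inner loop evaluates $n - k'$ candidates:
+\[
+  N_{\mathrm{heur}} = \sum_{k' = k}^{n - 1} (n - k') = \frac{(n - k)(n - k + 1)}{2},
+\]
+e.g.,~$n - k = 4$ regular timestamps require $4 + 3 + 2 + 1 = 10$ metric evaluations.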
+ +Note that the reverse heuristic approach, i.e.,~starting with all of $\{t_n\}$ as {\thethings} and removing timestamps until only $\{l_k\}$ remains, performs worse than, and only occasionally the same as, Algorithm~\ref{algo:lmdk-sel-heur}. + + +\subsubsection{Privacy-preserving option selection} +\label{subsec:priv-opt-sel} + +% Nearby events +Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}. +Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall. +This, however, comes at the cost of worse data utility. + +% Depending on the {\thething} discovery technique +The values of events near a {\thething} are usually similar to the value of the latter. +Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values, thus spending less privacy budget. +Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility. + +% Distant events +However, indicating the existence of randomized/dummy {\thethings} near actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events. +Hence, choosing randomized/dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss.