
\subsection{Protecting {\thethings}}
\label{subsec:lmdk-sel-sol}

The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T \setminus L$ of the time series $S_T$ and add them to the original {\thething} set $L$.
Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$.
We first generate a set of dummy {\thething} set options by adding regular event timestamps from $T \setminus L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}).
Then, we utilize the exponential mechanism to randomly select one of these options (Section~\ref{subsec:lmdk-opt-sel}); its utility function calculates an indicator for each option based on how much the option differs from the original {\thething} set $L$.
This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps.

% We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step.
% The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc.


\subsubsection{{\Thething} set options}
\label{subsec:lmdk-set-opts}

This step aims to select a set of candidate {\thething} timestamp options, either by randomizing the actual timestamps (Section~\ref{subsec:lmdk-rnd}) or by inserting dummy timestamps into the actual {\thething} timestamps (Section~\ref{subsec:lmdk-dum-gen}).


\paragraph{Dummy {\thething} generation}
\label{subsec:lmdk-dum-gen}

Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings} can render actual ones indistinguishable.
The goal is to select a list of sets with additional timestamps from a series of events at timestamps $\{t_n\}$ for a set of {\thethings} at $\{l_k\} \subseteq \{t_n\}$.
Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and heuristic methodology, respectively.

Function \calcMetric measures an indicator for the union of $\{l_k\}$ and a timestamp combination from $\{t_n\} \setminus \{l_k\}$.
Function \evalSeq evaluates the result of \calcMetric by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}.
Function \getOpts returns all possible \emph{valid} sets of combinations \opt such that $\{l_{k+i}\} \subset \{l_{k+j}\}, \forall i, j \in [1, n - k] \mid i < j$, i.e.,~larger options must contain all of the timestamps that are present in smaller ones.
Each combination \opt contains sets of timestamps with sizes $k + 1, k + 2, \dots, n$, where each one of them is the union of $\{l_k\}$ with $x \in [1, n - k]$ timestamps from $\{t_n\} \setminus \{l_k\}$.
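
To illustrate the validity constraint with a hypothetical example (the concrete timestamps are assumptions made only for illustration), consider $\{t_n\} = \{t_1, t_2, t_3, t_4\}$ and $\{l_k\} = \{t_1\}$, i.e.,~$n = 4$ and $k = 1$.
The combination
\[
  \left( \{t_1, t_2\},\ \{t_1, t_2, t_3\},\ \{t_1, t_2, t_3, t_4\} \right)
\]
is valid, since every set contains the one that precedes it.
In contrast, the combination
\[
  \left( \{t_1, t_2\},\ \{t_1, t_3, t_4\},\ \{t_1, t_2, t_3, t_4\} \right)
\]
is not valid, because its second set drops $t_2$, which is present in the first one.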

\begin{algorithm}
  \caption{Optimal dummy {\thething} set options selection}
  \label{algo:lmdk-sel-opt}

  \DontPrintSemicolon

  \SetKwInput{KwData}{Input}

  \KwData{$\{t_n\}, \{l_k\}$}

  \KwResult{\optim}
  \BlankLine

  % Evaluate the original
  \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\;
  \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\;

  % Get all possible option combinations
  \opts $\leftarrow$ \getOpts{$\{t_n\}, \{l_k\}$}\;

  % Track the minimum (best) evaluation
  \diffMin $\leftarrow$ $\infty$\;

  % Track the optimal sequence (the one with the best evaluation)
  \optim $\leftarrow$ $[]$\;

  \ForEach{\opt $\in$ \opts}{\label{algo:lmdk-sel-opt-for-each}
    \evalSum $\leftarrow 0$\;
    \ForEach{\opti $\in$ \opt}{
      \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \opti, \{l_k\}$}\;\label{algo:lmdk-sel-opt-comparison}
      \evalSum $\leftarrow$ \evalSum $+$ \evalSeq{\metricCur}\;
    }

    % Compare the average evaluation of the option with the current optimal
    \diffCur $\leftarrow \left|\evalSum/\#\opt - \evalOrig\right|$\;
    \If{\diffCur $<$ \diffMin}{
      \diffMin $\leftarrow$ \diffCur\;
      \optim $\leftarrow$ \opt\;
    }
  }\label{algo:lmdk-sel-opt-end}
  \Return{\optim}
\end{algorithm}

Algorithm~\ref{algo:lmdk-sel-opt} evaluates each option in \opts (Lines~{\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}}).
It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}--\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option whose evaluation differs the least from that of the sequence $\{t_n\}$ with {\thethings} $\{l_k\}$.
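
The selection criterion can be sketched compactly as follows.
The shorthand $E(a)$ is introduced here only for illustration: it denotes the value that \evalSeq returns for the metric that \calcMetric computes when the timestamps in $a$ are added to $\{l_k\}$, so that $E(\emptyset)$ corresponds to the original sequence.
Assuming that the comparison uses the average evaluation over all of the sets of an option, the algorithm returns
\[
  o^{*} = \arg\min_{o \in O} \left| \frac{1}{\#o} \sum_{a \in o} E(a) - E(\emptyset) \right|,
\]
where $O$ stands for the options that \getOpts returns and $\#o$ for the number of sets in option $o$.
If, for example, \evalSeq is the (population) standard deviation of the distances between consecutive {\thethings}, then {\thethings} at timestamps $2$, $5$, and $9$ give the distances $3$ and $4$ with mean $3.5$, and thus
\[
  E = \sqrt{\frac{(3 - 3.5)^2 + (4 - 3.5)^2}{2}} = 0.5.
\]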

\begin{algorithm}
  \caption{Heuristic dummy {\thething} set options selection}
  \label{algo:lmdk-sel-heur}

  \DontPrintSemicolon

  \KwData{$\{t_n\}, \{l_k\}$}
  \KwResult{\optim}
  \BlankLine

  % Evaluate the original
  \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\;
  \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\;

  % Track the options selected so far
  \optim $\leftarrow$ $[]$\;

  $\{l_{k'}\} \leftarrow \{l_k\}$\;

  \While{$\{l_{k'}\} \neq \{t_n\}$}{\label{algo:lmdk-sel-heur-while}
    % Track the minimum (best) evaluation
    \diffMin $\leftarrow$ $\infty$\;

    \optimi $\leftarrow$ $0$\;
    % Find the combinations for one more point
    \ForEach{\reg $\in \{t_n\} \setminus \{l_{k'}\}$}{

      % Evaluate current
      \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \reg, \{l_{k'}\}$}\;\label{algo:lmdk-sel-heur-comparison}
      \evalCur $\leftarrow$ \evalSeq{\metricCur}\;

      % Compare evaluations
      \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\;

      \If{\diffCur $<$ \diffMin}{
        \diffMin $\leftarrow$ \diffCur\;
        \optimi $\leftarrow$ \reg\;
      }\label{algo:lmdk-sel-heur-comparison-end}
    }

    % Save new point to landmarks
    $k' \leftarrow k' + 1$\;
    $l_{k'} \leftarrow \optimi$\;

    % Add new option
    \optim.add($\{l_{k'}\} \setminus \{l_k\}$)\;
  }\label{algo:lmdk-sel-heur-end}

  \Return{\optim}
\end{algorithm}

Algorithm~\ref{algo:lmdk-sel-heur} follows an incremental methodology.
At each step, it selects a new timestamp that corresponds to a regular ({non-\thething}) event from $\{t_n\} \setminus \{l_{k'}\}$, i.e.,~one that it has not selected yet.
Similar to Algorithm~\ref{algo:lmdk-sel-opt}, the selection is based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}--\ref{algo:lmdk-sel-heur-comparison-end}}).
This process (Lines~{\ref{algo:lmdk-sel-heur-while}--\ref{algo:lmdk-sel-heur-end}}) goes on until the selected set contains every timestamp of the series of events, i.e.,~$\{l_{k'}\} = \{t_n\}$.
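
As a rough, illustrative count of the work that each approach requires, let $m = n - k$ be the number of regular timestamps, and assume (the text does not state this explicitly) that \getOpts enumerates every order in which the $m$ regular timestamps can be added, i.e.,~$m!$ options of $m$ sets each.
Under this assumption, the optimal approach calls \calcMetric
\[
  m \cdot m!
\]
times, whereas the heuristic approach evaluates one candidate per remaining regular timestamp at every step, i.e.,
\[
  \sum_{j = 1}^{m} j = \frac{m (m + 1)}{2}
\]
times.
For $m = 5$, for instance, this amounts to $600$ and $15$ calls, respectively.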

Note that the reverse heuristic approach, i.e.,~starting with all of $\{t_n\}$ as {\thethings} and removing timestamps until only $\{l_k\}$ remains, performs worse than, and only occasionally the same as, Algorithm~\ref{algo:lmdk-sel-heur}.


\subsubsection{Privacy-preserving option selection}
\label{subsec:lmdk-opt-sel}

% Nearby events
Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
Thus, taking into account events that are more recent with respect to {\thethings} can result in less privacy loss and better privacy protection overall.
This, however, comes at the cost of worse data utility.

% Depending on the {\thething} discovery technique
The values of events near a {\thething} are usually similar to the value of the latter.
Therefore, privacy-preserving mechanisms are likely to approximate these values based on the nearest {\thething} instead of investing extra privacy budget to perturb the actual values, thereby spending less privacy budget.
Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility.

% Distant events
However, indicating the existence of randomized/dummy {\thethings} near actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events.
Hence, choosing randomized/dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss.
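
The selection step itself can be sketched with the standard form of the exponential mechanism.
The symbols below are assumptions introduced only for illustration and are not fixed by the discussion above: $O$ denotes the set of options generated in Section~\ref{subsec:lmdk-set-opts}, $u(o)$ a utility score for option $o \in O$ (e.g.,~based on how much $o$ differs from the original {\thething} set), $\Delta u$ the sensitivity of $u$, and $\varepsilon$ the privacy budget allocated to the selection.
The mechanism then selects option $o$ with probability
\[
  \Pr[o] = \frac{\exp\left(\frac{\varepsilon \, u(o)}{2 \Delta u}\right)}{\sum_{o' \in O} \exp\left(\frac{\varepsilon \, u(o')}{2 \Delta u}\right)}.
\]
For instance, for three options with utilities $0$, $-1$, and $-2$, $\varepsilon = 1$, and $\Delta u = 1$, the unnormalized weights are $1$, $e^{-0.5} \approx 0.607$, and $e^{-1} \approx 0.368$, which yield selection probabilities of approximately $0.507$, $0.307$, and $0.186$, respectively.
Options with higher utility are thus more likely, but never certain, to be selected.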