the-last-thing/text/problem/theotherthing/solution.tex

\subsection{Protecting {\thethings}}
\label{subsec:lmdk-sel-sol}

The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T /\ L$ of the time series $S_T$ and add them to the original {\thething} set $L$.
Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings} can render actual ones indistinguishable.
The goal is to select a list of sets with additional timestamps from a series of events at timestamps $T$ for a set of {\thethings} at $L \subseteq T$.
Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$.

First, we generate a set of dummy {\thething} set options by adding regular event timestamps from $T /\ L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}).
Then, we utilize the exponential mechanism, with a utility function that calculates an indicator for each of the options in the set based on how much it differs from the original {\thething} set $L$, and randomly select one of the options (Section~\ref{subsec:lmdk-opt-sel}).
This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps.

% We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step.
% The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc.


\subsubsection{{\Thething} set options generation}
\label{subsec:lmdk-set-opts}

Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and heuristic methodology, respectively.
Function \evalSeq evaluates the result of the union of $L$ and a timestamp combination from $T \setminus L$ by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}.
\getOpts returns all the possible \emph{valid} sets of combinations \opt such that larger options contain all of the timestamps that are present in smaller ones.
Each combination contains a set of timestamps with sizes $\left|L\right| + 1, \left|L\right| + 2, \dots, \left|T\right|$, where each one of them is a combination of $L$ with $x \in [1, \left|T\right| - \left|L\right|]$ timestamps from $T$.

\paragraph{Optimal}
Algorithm~\ref{algo:lmdk-sel-opt}, between Lines~{\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}} evaluates each option in \opts.
It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}-\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option that has an evaluation that differs the least from that of the sequence $T$ with {\thethings} $L$.

\begin{algorithm}
  \caption{Optimal dummy {\thething} set options generation}
  \label{algo:lmdk-sel-opt}

  \DontPrintSemicolon

  \KwData{$T, L$}

  \SetKwInput{KwData}{Input}

  \KwResult{\optim}
  \BlankLine

  % Evaluate the original
  \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;

  % Track the minimum (best) evaluation
  \diffMin $\leftarrow$ $\infty$\;

  % Track the optimal sequence (the one with the best evaluation)
  \opts $\leftarrow$ $[]$\;

  \ForEach{\opt $\in$ \getOpts{$T, L$}}{ \label{algo:lmdk-sel-opt-for-each}
    \evalCur $\leftarrow 0$\;
    \ForEach{\opti $\in$ \opt}{
      \evalCur $\leftarrow$ \evalCur $+$ \evalSeq{$T, \opti, L$}/\#\opt\; \label{algo:lmdk-sel-opt-comparison}
    }
    % Compare with current optimal
    \diffCur $\leftarrow \left|\evalCur - \evalOrig\right|$\;
    \If{\diffCur $<$ \diffMin}{
      \diffMin $\leftarrow$ \diffCur\;
      \opts $\leftarrow$ \opt\;
    }
  } \label{algo:lmdk-sel-opt-end}
  \Return{\opts}
\end{algorithm}

Algorithm~\ref{algo:lmdk-sel-opt} guarantees to return the optimal set of dummy {\thethings} with regard to the original set $L$.
However, it is rather costly in terms of complexity: given $n$ regular events and a combination of size $r$, it requires $\mathcal{O}(C(n, r) + 2^C(n, r))$ time and $\mathcal{O}(r*C(n, r))$ space.
Next, we present a heuristic solution with improved time and space requirements.


\paragraph{Heuristic}
Algorithm~\ref{algo:lmdk-sel-heur}, follows an incremental methodology.
At each step it selects a new timestamp, that corresponds to a regular ({non-\thething}) event from $T \setminus L$, to create an option.

\begin{algorithm}
  \caption{Heuristic dummy {\thething} set options selection}
  \label{algo:lmdk-sel-heur}

  \DontPrintSemicolon

  \KwData{$T, L$}
  \KwResult{\opts}
  \BlankLine

  % Evaluate the original
  \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;

  % Get all possible option combinations
  \opts $\leftarrow$ $[]$\;

  $L' \leftarrow L$\;

  \While{$L' \neq T$}{\label{algo:lmdk-sel-heur-while}
    % Track the minimum (best) evaluation
    \diffMin $\leftarrow$ $\infty$\;

    \optimi $\leftarrow$ Null\;
    % Find the combinations for one more point
    \ForEach{\reg $\in T \setminus L'$}{

      % Evaluate current
      \evalCur $\leftarrow$ \evalSeq{$T, \reg, L'$}\; \label{algo:lmdk-sel-heur-comparison}

      % Compare evaluations
      \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\;

      \If{\diffCur $<$ \diffMin}{
        \diffMin $\leftarrow$ \diffCur\;
        \optimi $\leftarrow$ \reg\;
      }\label{algo:lmdk-sel-heur-cmp-end}
    }

    % Save new point to landmarks
    $L'$.add(\optimi)\;

    % Add new option
    \opts.append($L' \setminus L$)\;
  }\label{algo:lmdk-sel-heur-end}

  \Return{\opts}
\end{algorithm}

Similar to Algorithm~\ref{algo:lmdk-sel-opt}, it selects new options based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}-\ref{algo:lmdk-sel-heur-cmp-end}}).
This process (Lines~{\ref{algo:lmdk-sel-heur-while}-\ref{algo:lmdk-sel-heur-end}}) goes on until we select a set that is equal to the size of the series of events, i.e.,~$L' = T$.

In terms of complexity, given $n$ regular events it requires $\mathcal{O}(n^2)$ time and space.
Note that the reverse heuristic approach, i.e.,~starting with $T$ {\thethings} and removing until $L$, performs similarly with Algorithm~\ref{algo:lmdk-sel-heur}.


\paragraph{Partitioned}
We improve the complexity of Algorithm~\ref{algo:lmdk-sel-opt} by partitioning the {\thething} timestamp sequence $L$.
Algorithm~\ref{algo:lmdk-sel-hist}, \getHist generates a histogram from $L$ with bins of size \h.
We find \h by using the Freedman–Diaconis rule which is resilient to outliers and takes into account the data variability and data size~\cite{meshgi2015expanding}.
For every possible histogram version, the \getDiff function finds the difference between two histograms; for this operation we utilize the Euclidean distance~(see Section~\ref{subsec:sel-utl} for more details).

\begin{algorithm}
  \caption{Partitioned dummy {\thething} set options selection}
  \label{algo:lmdk-sel-hist}

  \DontPrintSemicolon

  \KwData{$T, L$}
  \KwResult{\opts}
  \BlankLine

  \hist, \h $\leftarrow$ \getHist{$T, L$}\;

  \histCur $\leftarrow$ hist\;

  \opts $\leftarrow$ $[]$\;

  \While{sum($L'$) $\neq$ len($T$)}{ \label{algo:lmdk-sel-hist-while}
    % Track the minimum (best) evaluation
    \diffMin $\leftarrow$ $\infty$\;

    % The candidate option
    \opt $\leftarrow$ \histCur\;

    % Check every possibility
    \ForEach{\hi \reg $L'$}{ \label{algo:lmdk-sel-hist-cmp-start}

      % Can we add one more point?
      \If{\hi $+$ $1$ $\leq$ \h}{
        \histTmp $\leftarrow$ \histCur\;
        \histTmp$[i]$ $\leftarrow$ \histTmp$[i]$ $+$ $1$\;
        % Find difference from original
        \diffCur $\leftarrow$ \getDiff{\hist, \histTmp}\;

        % Remember if it is the best that you've seen
        \If{\diffCur $<$ \diffMin}{ \label{algo:lmdk-sel-hist-cmp}
          \diffMin $\leftarrow$ \diffCur\;
          \opt $\leftarrow$ \histTmp\;
        }

      }

    } \label{algo:lmdk-sel-hist-cmp-end}

    % Update current histogram
    \histCur $\leftarrow$ \opt\;
    % Add current best to options
    \opts $\leftarrow$ \opt\;

  } \label{algo:lmdk-sel-hist-end}

  \Return{\opts}
\end{algorithm}

In Lines~{\ref{algo:lmdk-sel-hist-cmp-start}-\ref{algo:lmdk-sel-hist-cmp-end}} we check every possible histogram version by incrementing each bin by $1$ and comparing it to the original (Line~\ref{algo:lmdk-sel-hist-cmp}).
In the end of the process, we return \opts which contains all the versions of \hist that are closest to \hist for all possible sizes of \hist.


\subsubsection{Privacy-preserving option selection}
\label{subsec:lmdk-opt-sel}

The Algorithms of Section~\ref{subsec:lmdk-set-opts} return a set of possible versions of the original {\thething} set $L$ by adding extra timestamps in it from the series of events at timestamps $T \supseteq L$.
In the next step of the process, we randomly select a set by utilizing the exponential mechanism (Section~\ref{subsec:prv-mech}).
For this procedure, we allocate a small fraction of the available privacy budget, i.e.,~$1$\% or even less (see Section~\ref{subsec:sel-eps} for more details).


\paragraph{Score function}
Prior to selecting a set, the exponential mechanism evaluates each set using a score function.

One way evaluate each set is by taking into account the temporal position the events in the sequence.
% Nearby events
Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall.
This leads to worse data utility.
% Depending on the {\thething} discovery technique
The values of events near a {\thething} are usually similar to that of the latter.
Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values; thus, spending less privacy budget.
Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility. 
% Distant events
However, indicating the existence of dummy {\thethings} nearby actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events.
Hence, choosing dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss.

Another approach for the score function is to consider the number of events in each set.
On the one hand, sets with more dummy {\thethings} may render actual {\thethings} more indistinguishable probabilistically.
That is due to the fact that, it is harder for an adversary to pick a {\thething} when the ratio of {\thethings} to the size of the set gets lower.
On the other hand, more dummy {\thethings} lead to distributing the privacy budget to more events, and therefore investing less at each timestamp.
Thus, providing a better level of privacy protection.


\paragraph{Option release}
The options that Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} generate contain actual timestamps which can be utilized directly by the {\thething} privacy mechanisms that we presented in Section~\ref{subsec:lmdk-mechs}.
However, Algorithm~\ref{algo:lmdk-sel-hist} returns histograms instead of timestamps.
Therefore, we need to process the result of the exponential mechanism further by creating a sample from the true {\thethings} and populating it with the remaining amount of choices, i.e.,~$\left|L'\right| - \left|L\right|$ by performing sampling without replacement from the resulting option $L$.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								\subsection{Protecting {\thethings}}
 								\label{subsec:lmdk-sel-sol}
-												problem: Intro of lmdk-sel-sol

											
										
										
											2021-10-12 01:40:27 +02:00
+								The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T /\ L$ of the time series $S_T$ and add them to the original {\thething} set $L$.
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings} can render actual ones indistinguishable.
 								The goal is to select a list of sets with additional timestamps from a series of events at timestamps $T$ for a set of {\thethings} at $L \subseteq T$.
-												problem: Intro of lmdk-sel-sol

											
										
										
											2021-10-12 01:40:27 +02:00
+								Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$.
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
 								First, we generate a set of dummy {\thething} set options by adding regular event timestamps from $T /\ L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}).
 								Then, we utilize the exponential mechanism, with a utility function that calculates an indicator for each of the options in the set based on how much it differs from the original {\thething} set $L$, and randomly select one of the options (Section~\ref{subsec:lmdk-opt-sel}).
-												problem: Intro of lmdk-sel-sol

											
										
										
											2021-10-12 01:40:27 +02:00
+								This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
-												problem: Intro of lmdk-sel-sol

											
										
										
											2021-10-12 01:40:27 +02:00
+								% We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step.
 								% The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								\subsubsection{{\Thething} set options generation}
-												problem: Intro of lmdk-sel-sol

											
										
										
											2021-10-12 01:40:27 +02:00
+								\label{subsec:lmdk-set-opts}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and heuristic methodology, respectively.
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								Function \evalSeq evaluates the result of the union of $L$ and a timestamp combination from $T \setminus L$ by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}.
 								\getOpts returns all the possible \emph{valid} sets of combinations \opt such that larger options contain all of the timestamps that are present in smaller ones.
 								Each combination contains a set of timestamps with sizes $\left|L\right| + 1, \left|L\right| + 2, \dots, \left|T\right|$, where each one of them is a combination of $L$ with $x \in [1, \left|T\right| - \left|L\right|]$ timestamps from $T$.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								\paragraph{Optimal}
 								Algorithm~\ref{algo:lmdk-sel-opt}, between Lines~{\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}} evaluates each option in \opts.
 								It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}-\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option that has an evaluation that differs the least from that of the sequence $T$ with {\thethings} $L$.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								\begin{algorithm}
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								  \caption{Optimal dummy {\thething} set options generation}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								  \label{algo:lmdk-sel-opt}
 								  \DontPrintSemicolon
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								  \KwData{$T, L$}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								  \SetKwInput{KwData}{Input}
 								  \KwResult{\optim}
 								  \BlankLine
 								  % Evaluate the original
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								  \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								  % Track the minimum (best) evaluation
 								  \diffMin $\leftarrow$ $\infty$\;
 								  % Track the optimal sequence (the one with the best evaluation)
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								  \opts $\leftarrow$ $[]$\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								  \ForEach{\opt $\in$ \getOpts{$T, L$}}{ \label{algo:lmdk-sel-opt-for-each}
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								    \evalCur $\leftarrow 0$\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								    \ForEach{\opti $\in$ \opt}{
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								      \evalCur $\leftarrow$ \evalCur $+$ \evalSeq{$T, \opti, L$}/\#\opt\; \label{algo:lmdk-sel-opt-comparison}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								    }
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								    % Compare with current optimal
 								    \diffCur $\leftarrow \left|\evalCur - \evalOrig\right|$\;
 								    \If{\diffCur $<$ \diffMin}{
 								      \diffMin $\leftarrow$ \diffCur\;
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								      \opts $\leftarrow$ \opt\;
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								    }
 								  } \label{algo:lmdk-sel-opt-end}
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								  \Return{\opts}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								\end{algorithm}
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								Algorithm~\ref{algo:lmdk-sel-opt} guarantees to return the optimal set of dummy {\thethings} with regard to the original set $L$.
 								However, it is rather costly in terms of complexity: given $n$ regular events and a combination of size $r$, it requires $\mathcal{O}(C(n, r) + 2^C(n, r))$ time and $\mathcal{O}(r*C(n, r))$ space.
 								Next, we present a heuristic solution with improved time and space requirements.
 								\paragraph{Heuristic}
 								Algorithm~\ref{algo:lmdk-sel-heur}, follows an incremental methodology.
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								At each step it selects a new timestamp, that corresponds to a regular ({non-\thething}) event from $T \setminus L$, to create an option.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								\begin{algorithm}
 								  \caption{Heuristic dummy {\thething} set options selection}
 								  \label{algo:lmdk-sel-heur}
 								  \DontPrintSemicolon
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								  \KwData{$T, L$}
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								  \KwResult{\opts}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								  \BlankLine
 								  % Evaluate the original
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								  \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								  % Get all possible option combinations
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								  \opts $\leftarrow$ $[]$\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								  $L' \leftarrow L$\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								  \While{$L' \neq T$}{\label{algo:lmdk-sel-heur-while}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								    % Track the minimum (best) evaluation
 								    \diffMin $\leftarrow$ $\infty$\;
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								    \optimi $\leftarrow$ Null\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								    % Find the combinations for one more point
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								    \ForEach{\reg $\in T \setminus L'$}{
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								      % Evaluate current
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								      \evalCur $\leftarrow$ \evalSeq{$T, \reg, L'$}\; \label{algo:lmdk-sel-heur-comparison}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								      % Compare evaluations
 								      \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\;
 								      \If{\diffCur $<$ \diffMin}{
 								        \diffMin $\leftarrow$ \diffCur\;
 								        \optimi $\leftarrow$ \reg\;
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								      }\label{algo:lmdk-sel-heur-cmp-end}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								    }
 								    % Save new point to landmarks
-												OCD

											
										
										
											2021-10-12 04:31:54 +02:00
+								    $L'$.add(\optimi)\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								    % Add new option
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								    \opts.append($L' \setminus L$)\;
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								  }\label{algo:lmdk-sel-heur-end}
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								  \Return{\opts}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								\end{algorithm}
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								Similar to Algorithm~\ref{algo:lmdk-sel-opt}, it selects new options based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}-\ref{algo:lmdk-sel-heur-cmp-end}}).
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								This process (Lines~{\ref{algo:lmdk-sel-heur-while}-\ref{algo:lmdk-sel-heur-end}}) goes on until we select a set that is equal to the size of the series of events, i.e.,~$L' = T$.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								In terms of complexity, given $n$ regular events it requires $\mathcal{O}(n^2)$ time and space.
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
+								Note that the reverse heuristic approach, i.e.,~starting with $T$ {\thethings} and removing until $L$, performs similarly with Algorithm~\ref{algo:lmdk-sel-heur}.
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								\paragraph{Partitioned}
 								We improve the complexity of Algorithm~\ref{algo:lmdk-sel-opt} by partitioning the {\thething} timestamp sequence $L$.
 								Algorithm~\ref{algo:lmdk-sel-hist}, \getHist generates a histogram from $L$ with bins of size \h.
 								We find \h by using the Freedman–Diaconis rule which is resilient to outliers and takes into account the data variability and data size~\cite{meshgi2015expanding}.
 								For every possible histogram version, the \getDiff function finds the difference between two histograms; for this operation we utilize the Euclidean distance~(see Section~\ref{subsec:sel-utl} for more details).
 								\begin{algorithm}
 								  \caption{Partitioned dummy {\thething} set options selection}
 								  \label{algo:lmdk-sel-hist}
 								  \DontPrintSemicolon
 								  \KwData{$T, L$}
 								  \KwResult{\opts}
 								  \BlankLine
 								  \hist, \h $\leftarrow$ \getHist{$T, L$}\;
 								  \histCur $\leftarrow$ hist\;
 								  \opts $\leftarrow$ $[]$\;
 								  \While{sum($L'$) $\neq$ len($T$)}{ \label{algo:lmdk-sel-hist-while}
 								    % Track the minimum (best) evaluation
 								    \diffMin $\leftarrow$ $\infty$\;
 								    % The candidate option
 								    \opt $\leftarrow$ \histCur\;
 								    % Check every possibility
 								    \ForEach{\hi \reg $L'$}{ \label{algo:lmdk-sel-hist-cmp-start}
 								      % Can we add one more point?
 								      \If{\hi $+$ $1$ $\leq$ \h}{
 								        \histTmp $\leftarrow$ \histCur\;
 								        \histTmp$[i]$ $\leftarrow$ \histTmp$[i]$ $+$ $1$\;
 								        % Find difference from original
 								        \diffCur $\leftarrow$ \getDiff{\hist, \histTmp}\;
 								        % Remember if it is the best that you've seen
 								        \If{\diffCur $<$ \diffMin}{ \label{algo:lmdk-sel-hist-cmp}
 								          \diffMin $\leftarrow$ \diffCur\;
 								          \opt $\leftarrow$ \histTmp\;
 								        }
 								      }
 								    } \label{algo:lmdk-sel-hist-cmp-end}
 								    % Update current histogram
 								    \histCur $\leftarrow$ \opt\;
 								    % Add current best to options
 								    \opts $\leftarrow$ \opt\;
 								  } \label{algo:lmdk-sel-hist-end}
 								  \Return{\opts}
 								\end{algorithm}
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
-												problem: Additional remarks in lmdk-sel-sol

											
										
										
											2021-10-12 13:00:26 +02:00
+								In Lines~{\ref{algo:lmdk-sel-hist-cmp-start}-\ref{algo:lmdk-sel-hist-cmp-end}} we check every possible histogram version by incrementing each bin by $1$ and comparing it to the original (Line~\ref{algo:lmdk-sel-hist-cmp}).
-												problem: Reviewed lmdk-set-opts

											
										
										
											2021-10-12 11:00:50 +02:00
+								In the end of the process, we return \opts which contains all the versions of \hist that are closest to \hist for all possible sizes of \hist.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
 								\subsubsection{Privacy-preserving option selection}
-												problem: Intro of lmdk-sel-sol

											
										
										
											2021-10-12 01:40:27 +02:00
+								\label{subsec:lmdk-opt-sel}
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								The Algorithms of Section~\ref{subsec:lmdk-set-opts} return a set of possible versions of the original {\thething} set $L$ by adding extra timestamps in it from the series of events at timestamps $T \supseteq L$.
 								In the next step of the process, we randomly select a set by utilizing the exponential mechanism (Section~\ref{subsec:prv-mech}).
-												problem: Added the conclusion of sel-eps

											
										
										
											2021-10-13 09:01:07 +02:00
+								For this procedure, we allocate a small fraction of the available privacy budget, i.e.,~$1$\% or even less (see Section~\ref{subsec:sel-eps} for more details).
-												problem: Additional remarks in lmdk-sel-sol

											
										
										
											2021-10-12 13:00:26 +02:00
 								\paragraph{Score function}
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								Prior to selecting a set, the exponential mechanism evaluates each set using a score function.
-												evaluation: lmdk-sel-sol

											
										
										
											2021-10-12 04:21:46 +02:00
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								One way evaluate each set is by taking into account the temporal position the events in the sequence.
-												theotherthing: Solution

											
										
										
											2021-10-10 19:49:47 +02:00
+								% Nearby events
 								Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
 								Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall.
 								This leads to worse data utility.
 								% Depending on the {\thething} discovery technique
 								The values of events near a {\thething} are usually similar to that of the latter.
 								Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values; thus, spending less privacy budget.
 								Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility.
 								% Distant events
-												problem: Wrote lmdk-opt-sel

											
										
										
											2021-10-12 12:01:20 +02:00
+								However, indicating the existence of dummy {\thethings} nearby actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events.
 								Hence, choosing dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss.
 								Another approach for the score function is to consider the number of events in each set.
 								On the one hand, sets with more dummy {\thethings} may render actual {\thethings} more indistinguishable probabilistically.
 								That is due to the fact that, it is harder for an adversary to pick a {\thething} when the ratio of {\thethings} to the size of the set gets lower.
 								On the other hand, more dummy {\thethings} lead to distributing the privacy budget to more events, and therefore investing less at each timestamp.
 								Thus, providing a better level of privacy protection.
-												problem: Additional remarks in lmdk-sel-sol

											
										
										
											2021-10-12 13:00:26 +02:00
 								\paragraph{Option release}
 								The options that Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} generate contain actual timestamps which can be utilized directly by the {\thething} privacy mechanisms that we presented in Section~\ref{subsec:lmdk-mechs}.
 								However, Algorithm~\ref{algo:lmdk-sel-hist} returns histograms instead of timestamps.
 								Therefore, we need to process the result of the exponential mechanism further by creating a sample from the true {\thethings} and populating it with the remaining amount of choices, i.e.,~$\left|L'\right| - \left|L\right|$ by performing sampling without replacement from the resulting option $L$.