
\subsection{Protecting {\thethings}}
\label{subsec:lmdk-sel-sol}

The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T \setminus L$ of the time series $S_T$ and add them to the original {\thething} set $L$.
Selecting extra events as dummy {\thethings}, on top of the actual ones, can render the actual {\thethings} indistinguishable from the dummy ones.
Given a series of events at timestamps $T$ and a set of {\thethings} at timestamps $L \subseteq T$, the goal is to select a list of sets of additional timestamps from $T \setminus L$.
Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$.

First, we generate a set of dummy {\thething} set options by adding regular event timestamps from $T \setminus L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}).
Then, we utilize the exponential mechanism, with a utility function that calculates an indicator for each of the options in the set based on how much it differs from the original {\thething} set $L$, and randomly select one of the options (Section~\ref{subsec:lmdk-opt-sel}).
This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps.
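As a rough illustration of the second step, the Python sketch below applies the exponential mechanism to a list of already generated options. It assumes, for illustration only, that each option has been assigned a utility score where higher is better (e.g.,~the negated difference between its evaluation and that of $L$) and that the caller supplies the sensitivity of that utility; the functions \texttt{selection\_probabilities} and \texttt{exp\_mech\_select} are hypothetical names, not part of the original method.

```python
import math
import random
from typing import List, Optional, Sequence


def selection_probabilities(scores: Sequence[float], epsilon: float,
                            sensitivity: float) -> List[float]:
    """Exponential mechanism: P(i) proportional to exp(eps * u_i / (2 * du))."""
    if not scores:
        raise ValueError("need at least one score")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    top = max(scores)  # shift by the max score to avoid overflow
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity))
               for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exp_mech_select(options: Sequence, scores: Sequence[float],
                    epsilon: float, sensitivity: float,
                    rng: Optional[random.Random] = None):
    """Randomly pick one option according to the exponential mechanism.
    Assumption: scores are utilities where higher means better."""
    if len(options) != len(scores):
        raise ValueError("options and scores must be aligned")
    if rng is None:
        rng = random.Random()
    probs = selection_probabilities(scores, epsilon, sensitivity)
    return rng.choices(list(options), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical options (sets of extra timestamps) and their utilities.
    opts = [{3}, {3, 7}, {3, 7, 9}]
    utility = [-0.2, -0.5, -1.4]
    print(selection_probabilities(utility, epsilon=1.0, sensitivity=1.0))
    print(exp_mech_select(opts, utility, 1.0, 1.0, random.Random(42)))
```

Subtracting the maximum score before exponentiating only guards against overflow; it leaves the normalized selection probabilities unchanged.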

% We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step.
% The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc.


\subsubsection{{\Thething} set options generation}
\label{subsec:lmdk-set-opts}

Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and a heuristic methodology, respectively, while Algorithm~\ref{algo:lmdk-sel-hist} follows a partitioned one.
The function \evalSeq evaluates the union of $L$ and a timestamp combination from $T \setminus L$, e.g.,~by estimating the standard deviation of all the distances from the previous/next {\thething}.
\getOpts returns all the possible \emph{valid} sets of combinations \opt, i.e.,~sets in which larger options contain all of the timestamps that are present in smaller ones.
Each such set contains combinations of timestamps of sizes $\left|L\right| + 1, \left|L\right| + 2, \dots, \left|T\right|$, where each one of them is the union of $L$ with $x \in [1, \left|T\right| - \left|L\right|]$ timestamps from $T \setminus L$.
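To make the two helper functions more concrete, the following minimal Python sketch gives one possible reading of them. It assumes integer timestamps, takes \evalSeq to be the population standard deviation of the distances of every timestamp in $T$ to its previous and next {\thething} (in $L$ extended with the candidate timestamps), and builds each option returned by \getOpts as a chain of nested sets obtained by adding the timestamps of $T \setminus L$ one at a time in some order; \texttt{eval\_seq} and \texttt{get\_opts} are hypothetical stand-ins for \evalSeq and \getOpts.

```python
import bisect
import itertools
import statistics
from typing import Iterable, List, Set


def eval_seq(T: Iterable[int], extra: Iterable[int], L: Iterable[int]) -> float:
    """Hypothetical evalSeq: population standard deviation of the distances of
    each timestamp in T to its previous and next landmark in L | extra."""
    marks = sorted(set(L) | set(extra))
    dists = []
    for t in T:
        i = bisect.bisect_right(marks, t)  # marks[:i] are <= t
        if i > 0:
            dists.append(t - marks[i - 1])  # distance to previous landmark
        j = bisect.bisect_left(marks, t)  # marks[j:] are >= t
        if j < len(marks):
            dists.append(marks[j] - t)  # distance to next landmark
    return statistics.pstdev(dists) if dists else 0.0


def get_opts(T: Iterable[int], L: Iterable[int]) -> List[List[Set[int]]]:
    """Hypothetical getOpts: every ordering of the regular timestamps yields one
    option, i.e., a chain of nested sets of sizes 1, 2, ..., |T - L|.
    The number of options grows factorially, so this only suits tiny inputs."""
    regular = sorted(set(T) - set(L))
    return [[set(order[:k]) for k in range(1, len(order) + 1)]
            for order in itertools.permutations(regular)]


if __name__ == "__main__":
    T, L = range(6), [0, 5]
    print(eval_seq(T, [], L))   # evaluation of the original landmark set
    print(len(get_opts(T, L)))  # 4! = 24 chains for the 4 regular timestamps
```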

\paragraph{Optimal}
Between Lines~{\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}}, Algorithm~\ref{algo:lmdk-sel-opt} evaluates each option in \opts.
It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}--\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option whose average evaluation differs the least from that of the sequence $T$ with {\thethings} $L$.

\begin{algorithm}
  \caption{Optimal dummy {\thething} set options generation}
  \label{algo:lmdk-sel-opt}

  \DontPrintSemicolon

  \KwData{$T, L$}

  \SetKwInput{KwData}{Input}

  \KwResult{\optim}
  \BlankLine

  % Evaluate the original
  \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;

  % Get all possible option combinations
  \opts $\leftarrow$ \getOpts{$T, L$}\;

  % Track the minimum (best) evaluation
  \diffMin $\leftarrow$ $\infty$\;

  % Track the optimal sequence (the one with the best evaluation)
  \optim $\leftarrow$ $[]$\;

  \ForEach{\opt $\in$ \opts}{ \label{algo:lmdk-sel-opt-for-each}
    \evalCur $\leftarrow 0$\;
    \ForEach{\opti $\in$ \opt}{
      \evalCur $\leftarrow$ \evalCur $+$ \evalSeq{$T, \opti, L$}/\#\opt\; \label{algo:lmdk-sel-opt-comparison}
    }
    % Compare with current optimal
    \diffCur $\leftarrow \left|\evalCur - \evalOrig\right|$\;
    \If{\diffCur $<$ \diffMin}{
      \diffMin $\leftarrow$ \diffCur\;
      \optim $\leftarrow$ \opt\;
    }
  } \label{algo:lmdk-sel-opt-end}
  \Return{\optim}
\end{algorithm}
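For small inputs, Algorithm~\ref{algo:lmdk-sel-opt} can be reproduced with the following self-contained Python sketch. It assumes integer timestamps, an \evalSeq based on the standard deviation of the distances to the previous/next {\thething}, and options given as chains of nested sets of extra timestamps; \texttt{eval\_seq}, \texttt{get\_opts}, and \texttt{optimal\_options} are hypothetical names.

```python
import bisect
import itertools
import math
import statistics
from typing import Iterable, List, Set


def eval_seq(T: Iterable[int], extra: Iterable[int], L: Iterable[int]) -> float:
    """Hypothetical evalSeq: std. deviation of distances to prev/next landmark."""
    marks = sorted(set(L) | set(extra))
    dists = []
    for t in T:
        i = bisect.bisect_right(marks, t)
        if i > 0:
            dists.append(t - marks[i - 1])
        j = bisect.bisect_left(marks, t)
        if j < len(marks):
            dists.append(marks[j] - t)
    return statistics.pstdev(dists) if dists else 0.0


def get_opts(T: Iterable[int], L: Iterable[int]) -> List[List[Set[int]]]:
    """Hypothetical getOpts: one chain of nested sets per ordering of T - L."""
    regular = sorted(set(T) - set(L))
    return [[set(order[:k]) for k in range(1, len(order) + 1)]
            for order in itertools.permutations(regular)]


def optimal_options(T: Iterable[int], L: Iterable[int]) -> List[Set[int]]:
    """Exhaustive search mirroring Algorithm 1: return the option whose average
    evaluation is closest to the evaluation of the original landmarks."""
    T, L = sorted(T), list(L)
    eval_orig = eval_seq(T, [], L)
    diff_min, optim = math.inf, []
    for opt in get_opts(T, L):
        if not opt:  # T - L is empty: nothing to average over
            continue
        eval_cur = sum(eval_seq(T, o, L) for o in opt) / len(opt)
        diff_cur = abs(eval_cur - eval_orig)
        if diff_cur < diff_min:  # keep the first best option on ties
            diff_min, optim = diff_cur, opt
    return optim


if __name__ == "__main__":
    print(optimal_options(range(6), [0, 5]))
```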

Algorithm~\ref{algo:lmdk-sel-opt} is guaranteed to return the optimal set of dummy {\thethings} with regard to the original set $L$.
However, it is computationally costly: given $n$ regular events and a combination of size $r$, it requires $\mathcal{O}(C(n, r) + 2^{C(n, r)})$ time and $\mathcal{O}(r \cdot C(n, r))$ space.
Next, we present a heuristic solution with improved time and space requirements.


\paragraph{Heuristic}
Algorithm~\ref{algo:lmdk-sel-heur} follows an incremental methodology.
At each step, it selects a new timestamp that corresponds to a regular ({non-\thething}) event from $T \setminus L$ that has not been selected yet, and adds it to the previously selected ones to create a new option.

\begin{algorithm}
  \caption{Heuristic dummy {\thething} set options selection}
  \label{algo:lmdk-sel-heur}

  \DontPrintSemicolon

  \KwData{$T, L$}
  \KwResult{\opts}
  \BlankLine

  % Evaluate the original
  \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;

  % Initialize the list of options
  \opts $\leftarrow$ $[]$\;

  $L' \leftarrow L$\;

  \While{$L' \neq T$}{\label{algo:lmdk-sel-heur-while}
    % Track the minimum (best) evaluation
    \diffMin $\leftarrow$ $\infty$\;

    \optimi $\leftarrow$ Null\;
    % Find the combinations for one more point
    \ForEach{\reg $\in T \setminus L'$}{

      % Evaluate current
      \evalCur $\leftarrow$ \evalSeq{$T, \reg, L'$}\; \label{algo:lmdk-sel-heur-comparison}

      % Compare evaluations
      \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\;

      \If{\diffCur $<$ \diffMin}{
        \diffMin $\leftarrow$ \diffCur\;
        \optimi $\leftarrow$ \reg\;
      }\label{algo:lmdk-sel-heur-cmp-end}
    }

    % Save new point to landmarks
    $L'$.add(\optimi)\;

    % Add new option
    \opts.append($L' \setminus L$)\;
  }\label{algo:lmdk-sel-heur-end}

  \Return{\opts}
\end{algorithm}
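The greedy loop of Algorithm~\ref{algo:lmdk-sel-heur} translates almost line by line into the Python sketch below. The evaluation function is a hypothetical stand-in (the population standard deviation of the distances of every timestamp to its previous and next {\thething}), timestamps are assumed to be integers with $L \subseteq T$, and \texttt{heuristic\_options} is a hypothetical name.

```python
import bisect
import math
import statistics
from typing import Iterable, List, Set


def eval_seq(T: Iterable[int], extra: Iterable[int], L: Iterable[int]) -> float:
    """Hypothetical evalSeq: std. deviation of distances to prev/next landmark."""
    marks = sorted(set(L) | set(extra))
    dists = []
    for t in T:
        i = bisect.bisect_right(marks, t)
        if i > 0:
            dists.append(t - marks[i - 1])
        j = bisect.bisect_left(marks, t)
        if j < len(marks):
            dists.append(marks[j] - t)
    return statistics.pstdev(dists) if dists else 0.0


def heuristic_options(T: Iterable[int], L: Iterable[int]) -> List[Set[int]]:
    """Greedy variant mirroring Algorithm 2: add one regular timestamp per step,
    choosing the one that keeps the evaluation closest to the original."""
    T = sorted(T)
    L = set(L)
    if not L <= set(T):
        raise ValueError("L must be a subset of T")
    eval_orig = eval_seq(T, [], L)
    opts = []
    L_prime = set(L)
    while L_prime != set(T):
        diff_min, best = math.inf, None
        for reg in sorted(set(T) - L_prime):
            diff_cur = abs(eval_seq(T, [reg], L_prime) - eval_orig)
            if diff_cur < diff_min:
                diff_min, best = diff_cur, reg
        L_prime.add(best)         # save the new point as a landmark
        opts.append(L_prime - L)  # new option: extra timestamps selected so far
    return opts


if __name__ == "__main__":
    for option in heuristic_options(range(6), [0, 5]):
        print(sorted(option))
```

Ties are broken in favor of the smallest timestamp, since candidates are scanned in ascending order and only a strictly smaller difference replaces the current best.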

Similarly to Algorithm~\ref{algo:lmdk-sel-opt}, it selects new options based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}--\ref{algo:lmdk-sel-heur-cmp-end}}).
This process (Lines~{\ref{algo:lmdk-sel-heur-while}--\ref{algo:lmdk-sel-heur-end}}) continues until the selected set covers the whole series of events, i.e.,~$L' = T$.

In terms of complexity, given $n$ regular events, it requires $\mathcal{O}(n^2)$ time and space.
Note that the reverse heuristic approach, i.e.,~starting with all the timestamps in $T$ as {\thethings} and removing timestamps until only $L$ remains, performs similarly to Algorithm~\ref{algo:lmdk-sel-heur}.


\paragraph{Partitioned}
We improve the complexity of Algorithm~\ref{algo:lmdk-sel-opt} by partitioning the {\thething} timestamp sequence $L$.
In Algorithm~\ref{algo:lmdk-sel-hist}, \getHist generates a histogram of $L$ with bins of size \h.
We determine \h using the Freedman--Diaconis rule, which is resilient to outliers and takes into account both the variability and the size of the data~\cite{meshgi2015expanding}.
For every candidate histogram, the \getDiff function computes its difference from the original histogram; for this operation, we utilize the Euclidean distance~(see Section~\ref{subsec:sel-utl} for more details).
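The following Python sketch shows one way to realize \getHist and \getDiff. It assumes integer timestamps, computes the Freedman--Diaconis bin width $2 \cdot \mathrm{IQR}(L) / \left|L\right|^{1/3}$ on the {\thething} timestamps, rounds it up to an integer width of at least $1$, and lays the bins over the span of $T$; \texttt{freedman\_diaconis\_width}, \texttt{get\_hist}, and \texttt{get\_diff} are hypothetical names, and the quartiles come from Python's \texttt{statistics.quantiles} with its default method.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def freedman_diaconis_width(values: Sequence[float]) -> float:
    """Freedman-Diaconis rule: bin width = 2 * IQR / n ** (1/3)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return 2.0 * (q3 - q1) / len(data) ** (1.0 / 3.0)


def get_hist(T: Sequence[int], L: Sequence[int]) -> Tuple[List[int], int]:
    """Hypothetical getHist: count the landmarks of L per bin of integer width
    h, with bins laid over the span of T. Assumes L is a subset of T and has
    at least two elements."""
    h = max(1, math.ceil(freedman_diaconis_width(L)))
    start = min(T)
    hist = [0] * ((max(T) - start) // h + 1)
    for t in L:
        hist[(t - start) // h] += 1
    return hist, h


def get_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical getDiff: Euclidean distance between two histograms."""
    return math.dist(a, b)


if __name__ == "__main__":
    hist, h = get_hist(list(range(20)), [0, 1, 2, 3, 4, 5, 6, 7])
    print(hist, h)
    print(get_diff(hist, [5, 3, 1, 0]))
```

Rounding the width up keeps every bin at least one timestamp wide, even when the interquartile range is zero.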

\begin{algorithm}
  \caption{Partitioned dummy {\thething} set options selection}
  \label{algo:lmdk-sel-hist}

  \DontPrintSemicolon

  \KwData{$T, L$}
  \KwResult{\opts}
  \BlankLine

  \hist, \h $\leftarrow$ \getHist{$T, L$}\;

  \histCur $\leftarrow$ \hist\;

  \opts $\leftarrow$ $[]$\;

  \While{sum(\histCur) $\neq$ $\left|T\right|$}{ \label{algo:lmdk-sel-hist-while}
    % Track the minimum (best) evaluation
    \diffMin $\leftarrow$ $\infty$\;

    % The candidate option
    \opt $\leftarrow$ \histCur\;

    % Check every possibility
    \ForEach{\hi $\in$ \histCur}{ \label{algo:lmdk-sel-hist-cmp-start}

      % Can we add one more point?
      \If{\hi $+$ $1$ $\leq$ \h}{
        \histTmp $\leftarrow$ \histCur\;
        \histTmp$[i]$ $\leftarrow$ \histTmp$[i]$ $+$ $1$\;
        % Find difference from original
        \diffCur $\leftarrow$ \getDiff{\hist, \histTmp}\;

        % Remember if it is the best that you've seen
        \If{\diffCur $<$ \diffMin}{ \label{algo:lmdk-sel-hist-cmp}
          \diffMin $\leftarrow$ \diffCur\;
          \opt $\leftarrow$ \histTmp\;
        }

      }

    } \label{algo:lmdk-sel-hist-cmp-end}

    % Update current histogram
    \histCur $\leftarrow$ \opt\;
    % Add current best to options
    \opts.append(\opt)\;

  } \label{algo:lmdk-sel-hist-end}

  \Return{\opts}
\end{algorithm}

Between Lines~{\ref{algo:lmdk-sel-hist-cmp-start}--\ref{algo:lmdk-sel-hist-cmp-end}}, we check every possible histogram version by incrementing each bin that still has room by $1$ and comparing the result to the original (Line~\ref{algo:lmdk-sel-hist-cmp}).
At the end of the process, we return \opts, which contains, for every possible total count, the version of \hist that is closest to the original \hist.
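A minimal Python sketch of the loop in Algorithm~\ref{algo:lmdk-sel-hist} follows. It takes an already computed histogram as input and, as an assumption, bounds each bin by the number of timestamps of $T$ that fall into it (which equals \h for full bins of consecutive timestamps) rather than by \h directly; it uses \texttt{math.dist} for the Euclidean distance, and \texttt{partitioned\_options} is a hypothetical name.

```python
import math
from typing import List, Sequence


def partitioned_options(hist: Sequence[int],
                        caps: Sequence[int]) -> List[List[int]]:
    """Greedy loop of the partitioned approach: repeatedly add one point to
    the bin whose increment keeps the histogram closest (Euclidean distance)
    to the original `hist`, recording every intermediate histogram.

    Assumption: caps[i] is the number of timestamps of T that fall in bin i.
    """
    if len(hist) != len(caps) or any(c > k for c, k in zip(hist, caps)):
        raise ValueError("hist must be aligned with caps and within capacity")
    orig = list(hist)
    cur = list(hist)
    opts = []
    while sum(cur) != sum(caps):
        diff_min, best = math.inf, None
        for i, h_i in enumerate(cur):
            if h_i + 1 <= caps[i]:  # can this bin take one more point?
                tmp = list(cur)
                tmp[i] += 1
                diff_cur = math.dist(orig, tmp)  # getDiff as Euclidean distance
                if diff_cur < diff_min:
                    diff_min, best = diff_cur, tmp
        # best is never None here: sum(cur) < sum(caps) implies a free bin.
        cur = best
        opts.append(cur)
    return opts


if __name__ == "__main__":
    for option in partitioned_options([1, 0, 2], [3, 3, 2]):
        print(option)
```

Because every step adds exactly one point, the loop runs once per missing point and compares at most one candidate per bin in each iteration.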


\subsubsection{Privacy-preserving option selection}
\label{subsec:lmdk-opt-sel}

\mk{WIP}

% Nearby events
Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
Thus, taking into account events that are more recent with respect to the {\thethings} can result in less privacy loss and better privacy protection overall, albeit at the cost of worse data utility.

% Depending on the {\thething} discovery technique
The values of events near a {\thething} are usually similar to the value of the latter.
Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget in perturbing their actual values, thus spending less privacy budget.
The saved budget can then be used for releasing perturbed versions of actual event values, which can bring about better data utility.

% Distant events
However, placing randomized/dummy {\thethings} near actual {\thethings} can increase the adversary's confidence regarding the location of the latter within a series of events.
Hence, choosing randomized/dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss.