\subsection{Protecting {\thethings}}
\label{subsec:lmdk-sel-sol}

The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T \setminus L$ of the time series $S_T$ and add them to the original {\thething} set $L$.
Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings} can render the actual ones indistinguishable.
The goal is to select a list of sets with additional timestamps from a series of events at timestamps $T$ for a set of {\thethings} at $L \subseteq T$.
Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$.

First, we generate a set of dummy {\thething} set options by adding regular event timestamps from $T \setminus L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}).
Then, we utilize the exponential mechanism, with a utility function that scores each of the options in the set based on how much it differs from the original {\thething} set $L$, and randomly select one of the options (Section~\ref{subsec:lmdk-opt-sel}).
This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps.
% We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step.
% The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc.

\subsubsection{{\Thething} set options generation}
\label{subsec:lmdk-set-opts}

Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and a heuristic methodology, respectively.
Function \evalSeq evaluates the result of the union of $L$ and a timestamp combination from $T \setminus L$ by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}.
\getOpts returns all the possible \emph{valid} sets of combinations \opt such that larger options contain all of the timestamps that are present in smaller ones.
Each \opt contains sets of timestamps with sizes $\left|L\right| + 1, \left|L\right| + 2, \dots, \left|T\right|$, where each one of them is a combination of $L$ with $x \in [1, \left|T\right| - \left|L\right|]$ timestamps from $T \setminus L$.

\paragraph{Optimal}
Algorithm~\ref{algo:lmdk-sel-opt}, between Lines~{\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}}, evaluates each option in \opts.
It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}--\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option whose evaluation differs the least from that of the sequence $T$ with {\thethings} $L$.
\begin{algorithm}
  \caption{Optimal dummy {\thething} set options generation}
  \label{algo:lmdk-sel-opt}

  \DontPrintSemicolon

  \KwData{$T, L$}
  \SetKwInput{KwData}{Input}
  \KwResult{\optim}
  \BlankLine

  % Evaluate the original
  \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;

  % Get all possible option combinations
  \opts $\leftarrow$ \getOpts{$T, L$}\;

  % Track the minimum (best) evaluation
  \diffMin $\leftarrow$ $\infty$\;

  % Track the optimal sequence (the one with the best evaluation)
  \optim $\leftarrow$ $[]$\;

  \ForEach{\opt $\in$ \opts}{ \label{algo:lmdk-sel-opt-for-each}
    \evalCur $\leftarrow 0$\;
    % Average the evaluation over all the sets of the option
    \ForEach{\opti $\in$ \opt}{
      \evalCur $\leftarrow$ \evalCur $+$ \evalSeq{$T, \opti, L$}/\#\opt\; \label{algo:lmdk-sel-opt-comparison}
    }
    % Compare with current optimal
    \diffCur $\leftarrow \left|\evalCur - \evalOrig\right|$\;
    \If{\diffCur $<$ \diffMin}{
      \diffMin $\leftarrow$ \diffCur\;
      \optim $\leftarrow$ \opt\;
    }
  } \label{algo:lmdk-sel-opt-end}
  \Return{\optim}
\end{algorithm}

Algorithm~\ref{algo:lmdk-sel-opt} is guaranteed to return the optimal set of dummy {\thethings} with regard to the original set $L$.
However, it is rather costly in terms of complexity: given $n$ regular events and a combination of size $r$, it requires $\mathcal{O}(C(n, r) + 2^{C(n, r)})$ time and $\mathcal{O}(r \cdot C(n, r))$ space.
Next, we present a heuristic solution with improved time and space requirements.

\paragraph{Heuristic}
Algorithm~\ref{algo:lmdk-sel-heur} follows an incremental methodology.
At each step, it selects a new timestamp that corresponds to a regular ({non-\thething}) event from $T \setminus L$ to create an option.
\begin{algorithm}
  \caption{Heuristic dummy {\thething} set options selection}
  \label{algo:lmdk-sel-heur}

  \DontPrintSemicolon

  \KwData{$T, L$}
  \KwResult{\opts}
  \BlankLine

  % Evaluate the original
  \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;

  % Initialize the list of options
  \opts $\leftarrow$ $[]$\;

  $L' \leftarrow L$\;

  \While{$L' \neq T$}{\label{algo:lmdk-sel-heur-while}
    % Track the minimum (best) evaluation
    \diffMin $\leftarrow$ $\infty$\;
    \optimi $\leftarrow$ Null\;
    % Find the combinations for one more point
    \ForEach{\reg $\in T \setminus L'$}{
      % Evaluate current
      \evalCur $\leftarrow$ \evalSeq{$T, \reg, L'$}\; \label{algo:lmdk-sel-heur-comparison}
      % Compare evaluations
      \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\;
      \If{\diffCur $<$ \diffMin}{
        \diffMin $\leftarrow$ \diffCur\;
        \optimi $\leftarrow$ \reg\;
      }\label{algo:lmdk-sel-heur-cmp-end}
    }
    % Save new point to landmarks
    $L'$.add(\optimi)\;
    % Add new option
    \opts.append($L' \setminus L$)\;
  }\label{algo:lmdk-sel-heur-end}
  \Return{\opts}
\end{algorithm}

Similar to Algorithm~\ref{algo:lmdk-sel-opt}, it selects new options based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}--\ref{algo:lmdk-sel-heur-cmp-end}}).
This process (Lines~{\ref{algo:lmdk-sel-heur-while}--\ref{algo:lmdk-sel-heur-end}}) goes on until the selected set covers the whole series of event timestamps, i.e.,~$L' = T$.
In terms of complexity, given $n$ regular events, it requires $\mathcal{O}(n^2)$ time and space.
Note that the reverse heuristic approach, i.e.,~starting with $T$ {\thethings} and removing until $L$, performs similarly to Algorithm~\ref{algo:lmdk-sel-heur}.

\paragraph{Partitioned}
We improve the complexity of Algorithm~\ref{algo:lmdk-sel-opt} by partitioning the {\thething} timestamp sequence $L$.
In Algorithm~\ref{algo:lmdk-sel-hist}, \getHist generates a histogram from $L$ with bins of size \h.
We find \h by using the Freedman--Diaconis rule, which is resilient to outliers and takes into account the data variability and data size~\cite{meshgi2015expanding}.
For every candidate histogram version, the \getDiff function finds its difference from the original histogram; for this operation we utilize the Euclidean distance~(see Section~\ref{subsec:sel-utl} for more details).

\begin{algorithm}
  \caption{Partitioned dummy {\thething} set options selection}
  \label{algo:lmdk-sel-hist}

  \DontPrintSemicolon

  \KwData{$T, L$}
  \KwResult{\opts}
  \BlankLine

  \hist, \h $\leftarrow$ \getHist{$T, L$}\;

  \histCur $\leftarrow$ \hist\;

  \opts $\leftarrow$ $[]$\;

  \While{sum(\histCur) $\neq$ len($T$)}{ \label{algo:lmdk-sel-hist-while}
    % Track the minimum (best) evaluation
    \diffMin $\leftarrow$ $\infty$\;
    % The candidate option
    \opt $\leftarrow$ \histCur\;
    % Check every possibility
    \ForEach{\hi $\in$ \histCur}{ \label{algo:lmdk-sel-hist-cmp-start}
      % Can we add one more point?
      \If{\hi $+$ $1$ $\leq$ \h}{
        \histTmp $\leftarrow$ \histCur\;
        \histTmp$[i]$ $\leftarrow$ \histTmp$[i]$ $+$ $1$\;
        % Find difference from original
        \diffCur $\leftarrow$ \getDiff{\hist, \histTmp}\;
        % Remember if it is the best that you've seen
        \If{\diffCur $<$ \diffMin}{ \label{algo:lmdk-sel-hist-cmp}
          \diffMin $\leftarrow$ \diffCur\;
          \opt $\leftarrow$ \histTmp\;
        }
      }
    } \label{algo:lmdk-sel-hist-cmp-end}
    % Update current histogram
    \histCur $\leftarrow$ \opt\;
    % Add current best to options
    \opts.append(\opt)\;
  } \label{algo:lmdk-sel-hist-end}
  \Return{\opts}
\end{algorithm}

Between Lines~{\ref{algo:lmdk-sel-hist-cmp-start}--\ref{algo:lmdk-sel-hist-cmp-end}}, we check every possible histogram version by incrementing each bin by $1$ and comparing it to the original (Line~\ref{algo:lmdk-sel-hist-cmp}).
At the end of the process, we return \opts, which contains, for every possible total number of {\thethings}, the histogram version that is closest to the original \hist.
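
For reference, the two quantities that Algorithm~\ref{algo:lmdk-sel-hist} relies on can be written out as follows; this is a sketch that assumes the standard definitions, and the symbols $h$, $\mathrm{IQR}$, $m$, $a_i$, and $b_i$ are illustrative notation rather than part of the algorithm.
Writing $h$ for the bin size \h, the Freedman--Diaconis rule sets
\[
  h = 2 \, \frac{\mathrm{IQR}(L)}{\sqrt[3]{\left|L\right|}},
\]
where $\mathrm{IQR}(L)$ is the interquartile range of the {\thething} timestamps.
For two histograms $a$ and $b$ with $m$ bins each, the Euclidean distance that \getDiff would compute is
\[
  d(a, b) = \sqrt{\sum_{i = 1}^{m} \left(a_i - b_i\right)^2}.
\]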
\subsubsection{Privacy-preserving option selection}
\label{subsec:lmdk-opt-sel}

\mk{WIP}

% Nearby events
Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall.
This, however, leads to worse data utility.

% Depending on the {\thething} discovery technique
The values of events near a {\thething} are usually similar to those of the latter.
Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values, thus spending less privacy budget.
Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility.

% Distant events
However, indicating the existence of randomized/dummy {\thethings} near actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events.
Hence, choosing randomized/dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss.
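
For reference, the selection step outlined in Section~\ref{subsec:lmdk-sel-sol} can be sketched with the standard form of the exponential mechanism; the symbols $O$, $u$, $\Delta u$, and $\varepsilon$ below are assumed notation, and the concrete utility function is not fixed here.
Given the set of options $O$ (Section~\ref{subsec:lmdk-set-opts}), a utility function $u$ with sensitivity $\Delta u$, and a privacy budget $\varepsilon$, an option $o \in O$ would be selected with probability
\[
  \Pr[o] = \frac{\exp\left(\frac{\varepsilon \, u(L, o)}{2 \Delta u}\right)}{\sum_{o' \in O} \exp\left(\frac{\varepsilon \, u(L, o')}{2 \Delta u}\right)},
\]
so that options with higher utility are exponentially more likely to be selected.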