evaluation: lmdk-sel-sol
		| @ -2,40 +2,37 @@ | |||||||
| \label{subsec:lmdk-sel-sol} | \label{subsec:lmdk-sel-sol} | ||||||
|  |  | ||||||
| The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T \setminus L$ of the time series $S_T$ and add them to the original {\thething} set $L$. | The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T \setminus L$ of the time series $S_T$ and add them to the original {\thething} set $L$. | ||||||
|  | Selecting extra events on top of the actual {\thethings} as dummy {\thethings} can render the actual ones indistinguishable from the dummy ones. | ||||||
|  | Given a series of events at timestamps $T$ and a set of {\thethings} at $L \subseteq T$, the goal is to select a list of sets of additional timestamps from $T \setminus L$. | ||||||
| Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$. | Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$. | ||||||
| We generate a set of dummy {\thething} set options by adding regular event timestamps from $T /\ L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}). |  | ||||||
| Then (Section~\ref{subsec:lmdk-opt-sel}), we utilize the exponential mechanism, with a utility function that calculates an indicator for each of the options in the set based on how much it differs from the original {\thething} set $L$, and randomly select one ot the options that we created earlier. | First, we generate a set of dummy {\thething} set options by adding regular event timestamps from $T \setminus L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}). | ||||||
|  | Then, we utilize the exponential mechanism to randomly select one of the options, with a utility function that scores each option based on how much it differs from the original {\thething} set $L$ (Section~\ref{subsec:lmdk-opt-sel}). | ||||||
| This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps. | This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps. | ||||||
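The option selection step can be illustrated with the standard exponential mechanism. The following is a minimal Python sketch under stated assumptions. The utility of an option is taken to be the negative absolute difference between its evaluation and the evaluation of the original set $L$. The privacy parameter `epsilon` and the utility `sensitivity` are supplied by the caller. The section does not yet fix the utility or its sensitivity, and `exponential_mechanism` and `difference_utility` are hypothetical names.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

Option = TypeVar("Option")


def exponential_mechanism(options: Sequence[Option],
                          utility: Callable[[Option], float],
                          epsilon: float,
                          sensitivity: float,
                          rng: Optional[random.Random] = None) -> Option:
    """Sample one option with probability proportional to
    exp(epsilon * utility(option) / (2 * sensitivity))."""
    if not options:
        raise ValueError("no options to select from")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng if rng is not None else random.Random()
    scores = [epsilon * utility(o) / (2.0 * sensitivity) for o in options]
    # Shifting every score by the maximum avoids overflow in exp(); the shift
    # cancels out once the weights are normalized, so probabilities are unchanged.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(options, weights=weights, k=1)[0]


def difference_utility(eval_option: float, eval_original: float) -> float:
    """Hypothetical utility: the closer an option's evaluation is to that of
    the original landmark set, the higher its utility."""
    return -abs(eval_option - eval_original)


if __name__ == "__main__":
    # Hypothetical evaluations of three candidate options and of the original set.
    evals = {"opt_a": 1.9, "opt_b": 2.4, "opt_c": 5.0}
    eval_original = 2.0
    chosen = exponential_mechanism(
        list(evals),
        lambda o: difference_utility(evals[o], eval_original),
        epsilon=1.0,
        sensitivity=1.0,
    )
    print("selected:", chosen)
```

Options whose evaluation is close to that of $L$ are exponentially more likely to be chosen, but every option keeps a non-zero probability. That randomness is what the privacy guarantee relies on.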
|  |  | ||||||
| % We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step. | % We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step. | ||||||
| % The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc. | % The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc. | ||||||
|  |  | ||||||
|  |  | ||||||
| \subsubsection{{\Thething} set options} | \subsubsection{{\Thething} set options generation} | ||||||
| \label{subsec:lmdk-set-opts} | \label{subsec:lmdk-set-opts} | ||||||
|  |  | ||||||
| This step aims to select a set of candidate {\thething} timestamps options either by randomizing the actual timestamps (Section~\ref{subsec:lmdk-rnd}), or by inserting dummy timestamps (Section~\ref{subsec:lmdk-dum-gen}) to the actual {\thething} timestamps. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| \paragraph{Dummy {\thething} generation} |  | ||||||
| \label{subsec:lmdk-dum-gen} |  | ||||||
|  |  | ||||||
| Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings} can render actual ones indistinguishable. |  | ||||||
| The goal is to select a list of sets with additional timestamps from a series of events at timestamps $\{t_n\}$ for a set of {\thethings} at $\{l_k\} \subseteq \{t_n\}$. |  | ||||||
| Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and a heuristic methodology, respectively. | Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and a heuristic methodology, respectively. | ||||||
|  | Function \evalSeq evaluates the result of the union of $L$ and a timestamp combination from $T \setminus L$ by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}. | ||||||
|  | Function \getOpts returns all possible \emph{valid} sets of combinations \opt, i.e.,~sets in which larger options contain all of the timestamps that are present in smaller ones. | ||||||
|  | Each such set contains options of sizes $\left|L\right| + 1, \left|L\right| + 2, \dots, \left|T\right|$, where each option is the union of $L$ with $x \in [1, \left|T\right| - \left|L\right|]$ timestamps from $T \setminus L$. | ||||||
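The following is a minimal Python sketch of the two helper functions. The evaluation is one hypothetical reading of the example given for \evalSeq. For every event timestamp, it takes the distance to the closest {\thething} strictly before and strictly after it, and it returns the population standard deviation of all these distances. The enumeration produces one nested chain per ordering of the regular timestamps in $T \setminus L$. The names `eval_seq` and `get_opts` are illustrative, not the authors' implementation.

```python
import statistics
from itertools import permutations
from typing import FrozenSet, Iterable, Iterator, List, Sequence


def eval_seq(timestamps: Sequence[int], extra: Iterable[int],
             landmarks: Iterable[int]) -> float:
    """Hypothetical evalSeq: population standard deviation of the distances from
    each event timestamp to the closest landmark strictly before and strictly
    after it, once the dummy timestamps in `extra` are treated as landmarks."""
    marks = sorted(set(landmarks) | set(extra))
    distances: List[int] = []
    for t in timestamps:
        before = [m for m in marks if m < t]
        after = [m for m in marks if m > t]
        if before:
            distances.append(t - before[-1])  # distance from the previous landmark
        if after:
            distances.append(after[0] - t)  # distance to the next landmark
    return statistics.pstdev(distances) if distances else 0.0


def get_opts(timestamps: Sequence[int],
             landmarks: Iterable[int]) -> Iterator[List[FrozenSet[int]]]:
    """Yield every valid chain of dummy sets: the x-th set holds x regular
    timestamps and contains all the timestamps of the (x-1)-th set.
    Each ordering of the regular timestamps gives exactly one chain."""
    regular = sorted(set(timestamps) - set(landmarks))
    for order in permutations(regular):
        yield [frozenset(order[:x]) for x in range(1, len(order) + 1)]
```

In this sketch, the number of chains is $(|T| - |L|)!$, so exhaustive enumeration is only practical for very short series.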
|  |  | ||||||
| Function \calcMetric measures an indicator for the union of $\{l_k\}$ and a timestamp combination from $\{t_n\} \setminus \{l_k\}$. | \paragraph{Optimal} | ||||||
| Function \evalSeq evaluates the result of \calcMetric by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}. | In Lines~{\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}}, Algorithm~\ref{algo:lmdk-sel-opt} evaluates each option in \opts. | ||||||
| Function \getOpts returns all possible \emph{valid} sets of combinations \opt such that $\{l_{k+i}\} \subset \{l_{k+j}\}, \forall i, j \in [k, n] \mid i < j$, i.e.,~larger options must contain all of the timestamps that are present in smaller ones. | It finds the set of options that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}--\ref{algo:lmdk-sel-opt-end}}), i.e.,~the one whose average evaluation differs the least from the evaluation of the sequence $T$ with {\thethings} $L$. | ||||||
| Each combination contains a set of timestamps with sizes $k + 1, k + 2, \dots, n$, where each one of them is a combination of $\{l_k\}$ with $x \in [1, n - k]$ timestamps from $\{t_n\}$. |  | ||||||
|  |  | ||||||
| \begin{algorithm} | \begin{algorithm} | ||||||
|   \caption{Optimal dummy {\thething} set options selection} |   \caption{Optimal dummy {\thething} set options generation} | ||||||
|   \label{algo:lmdk-sel-opt} |   \label{algo:lmdk-sel-opt} | ||||||
|  |  | ||||||
|   \DontPrintSemicolon |   \DontPrintSemicolon | ||||||
|  |  | ||||||
|   \KwData{$\{t_n\}, \{l_k\}$} |   \KwData{$T, L$} | ||||||
|  |  | ||||||
|   \SetKwInput{KwData}{Input} |   \SetKwInput{KwData}{Input} | ||||||
|  |  | ||||||
| @ -43,11 +40,10 @@ Each combination contains a set of timestamps with sizes $k + 1, k + 2, \dots, n | |||||||
|   \BlankLine |   \BlankLine | ||||||
|  |  | ||||||
|   % Evaluate the original |   % Evaluate the original | ||||||
|   \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\; |   \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\; | ||||||
|   \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\; |  | ||||||
|  |  | ||||||
|   % Get all possible option combinations |   % Get all possible option combinations | ||||||
|   \opts $\leftarrow$ \getOpts{$\{t_n\}, \{l_k\}$}\; |   \opts $\leftarrow$ \getOpts{$T, L$}\; | ||||||
|  |  | ||||||
|   % Track the minimum (best) evaluation |   % Track the minimum (best) evaluation | ||||||
|   \diffMin $\leftarrow$ $\infty$\; |   \diffMin $\leftarrow$ $\infty$\; | ||||||
| @ -55,25 +51,29 @@ Each combination contains a set of timestamps with sizes $k + 1, k + 2, \dots, n | |||||||
|   % Track the optimal sequence (the one with the best evaluation) |   % Track the optimal sequence (the one with the best evaluation) | ||||||
|   \optim $\leftarrow$ $[]$\; |   \optim $\leftarrow$ $[]$\; | ||||||
|  |  | ||||||
|   \ForEach{\opt $\in$ \opts}{\label{algo:lmdk-sel-opt-for-each} |   \ForEach{\opt $\in$ \opts}{ \label{algo:lmdk-sel-opt-for-each} | ||||||
|     \evalSum $\leftarrow 0$\; |     \evalCur $\leftarrow 0$\; | ||||||
|     \ForEach{\opti $\in$ \opt}{ |     \ForEach{\opti $\in$ \opt}{ | ||||||
|       \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \opti, \{l_k\}$}\;\label{algo:lmdk-sel-opt-comparison} |       \evalCur $\leftarrow$ \evalCur $+$ \evalSeq{$T, \opti, L$}/\#\opt\; \label{algo:lmdk-sel-opt-comparison} | ||||||
|       \evalSum $\leftarrow$ \evalSum $+$ \evalSeq{\metricCur}\; |  | ||||||
|  |  | ||||||
|       % Compare with current optimal |  | ||||||
|       \diffCur $\leftarrow \left|\evalSum/\#\opt - \evalOrig\right|$\; |  | ||||||
|       \If{\diffCur $<$ \diffMin}{ |  | ||||||
|         \diffMin $\leftarrow$ \diffCur\; |  | ||||||
|         \optim $\leftarrow$ \opt\; |  | ||||||
|       } |  | ||||||
|     } |     } | ||||||
|   }\label{algo:lmdk-sel-opt-end} |     % Compare with current optimal | ||||||
|  |     \diffCur $\leftarrow \left|\evalCur - \evalOrig\right|$\; | ||||||
|  |     \If{\diffCur $<$ \diffMin}{ | ||||||
|  |       \diffMin $\leftarrow$ \diffCur\; | ||||||
|  |       \optim $\leftarrow$ \opt\; | ||||||
|  |     } | ||||||
|  |   } \label{algo:lmdk-sel-opt-end} | ||||||
|   \Return{\optim} |   \Return{\optim} | ||||||
| \end{algorithm} | \end{algorithm} | ||||||
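Algorithm~\ref{algo:lmdk-sel-opt} translates almost line by line into the following Python sketch. To keep the block self-contained, the evaluation and enumeration functions are passed in as parameters. Their assumed signatures, like the name `optimal_dummy_options`, are hypothetical. As in the pseudocode, each chain is scored by the mean evaluation of its options, and the chain whose mean is closest to the evaluation of $L$ is returned.

```python
import math
from typing import Callable, FrozenSet, Iterable, List, Sequence

# Assumed signatures for the evaluation and enumeration functions.
EvalSeq = Callable[[Sequence[int], FrozenSet[int], FrozenSet[int]], float]
GetOpts = Callable[[Sequence[int], FrozenSet[int]], Iterable[List[FrozenSet[int]]]]


def optimal_dummy_options(timestamps: Sequence[int], landmarks: Iterable[int],
                          eval_seq: EvalSeq, get_opts: GetOpts) -> List[FrozenSet[int]]:
    """Exhaustive search: return the chain of dummy sets whose mean evaluation
    differs the least from the evaluation of the original landmarks."""
    marks = frozenset(landmarks)
    eval_orig = eval_seq(timestamps, frozenset(), marks)  # evaluate the original
    diff_min = math.inf  # smallest difference found so far
    optim: List[FrozenSet[int]] = []  # chain with the best evaluation so far
    for opt in get_opts(timestamps, marks):
        if not opt:
            continue  # nothing to add, e.g. when every timestamp is already a landmark
        eval_cur = sum(eval_seq(timestamps, opt_i, marks) for opt_i in opt) / len(opt)
        diff_cur = abs(eval_cur - eval_orig)
        if diff_cur < diff_min:
            diff_min = diff_cur
            optim = opt
    return optim
```

The function only loops over what the enumerator yields. Its cost is therefore dominated by the number of chains, which is why a heuristic alternative is attractive.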
|  |  | ||||||
| Algorithm~\ref{algo:lmdk-sel-opt}, in particular, between Lines~{\ref{algo:lmdk-sel-opt-for-each}-\ref{algo:lmdk-sel-opt-end}} evaluates each option in \opts. | Algorithm~\ref{algo:lmdk-sel-opt} is guaranteed to return the optimal set of dummy {\thething} options with respect to the evaluation of the original set $L$. | ||||||
| It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}-\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option that has an evaluation that differs the least from that of the sequence $\{t_n\}$ with {\thethings} $\{l_k\}$. | However, it is rather costly in terms of complexity: given $n$ regular events and a combination of size $r$, it requires $\mathcal{O}(C(n, r) + 2^{C(n, r)})$ time and $\mathcal{O}(r \cdot C(n, r))$ space. | ||||||
|  | Next, we present a heuristic solution with improved time and space requirements. | ||||||
|  |  | ||||||
|  |  | ||||||
|  | \paragraph{Heuristic} | ||||||
|  | Algorithm~\ref{algo:lmdk-sel-heur} follows an incremental methodology. | ||||||
|  | At each step, it selects a new timestamp that corresponds to a regular ({non-\thething}) event from $T \setminus L'$, where $L'$ is the current set of {\thethings}. | ||||||
|  |  | ||||||
| \begin{algorithm} | \begin{algorithm} | ||||||
|   \caption{Heuristic dummy {\thething} set options selection} |   \caption{Heuristic dummy {\thething} set options generation} | ||||||
| @ -81,30 +81,28 @@ It finds the option that is the most \emph{similar} to the original (Lines~{\ref | |||||||
|  |  | ||||||
|   \DontPrintSemicolon |   \DontPrintSemicolon | ||||||
|  |  | ||||||
|   \KwData{$\{t_n\}, \{l_k\}$} |   \KwData{$T, L$} | ||||||
|   \KwResult{\optim} |   \KwResult{\optim} | ||||||
|   \BlankLine |   \BlankLine | ||||||
|  |  | ||||||
|   % Evaluate the original |   % Evaluate the original | ||||||
|   \metricOrig $\leftarrow$ \calcMetric{$\{t_n\}, \emptyset, \{l_k\}$}\; |   \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\; | ||||||
|   \evalOrig $\leftarrow$ \evalSeq{\metricOrig}\; |  | ||||||
|  |  | ||||||
|   % Initialize the list of options |   % Initialize the list of options | ||||||
|   \optim $\leftarrow$ $[]$\; |   \optim $\leftarrow$ $[]$\; | ||||||
|  |  | ||||||
|   $\{l_{k'}\} \leftarrow \{l_k\}$\; |   $L' \leftarrow L$\; | ||||||
|  |  | ||||||
|   \While{$\{l_{k'}\} \neq \{t_n\}$}{\label{algo:lmdk-sel-heur-while} |   \While{$L' \neq T$}{\label{algo:lmdk-sel-heur-while} | ||||||
|     % Track the minimum (best) evaluation |     % Track the minimum (best) evaluation | ||||||
|     \diffMin $\leftarrow$ $\infty$\; |     \diffMin $\leftarrow$ $\infty$\; | ||||||
|  |  | ||||||
|     \optimi $\leftarrow$ $0$\; |     \optimi $\leftarrow$ Null\; | ||||||
|     % Find the combinations for one more point |     % Find the combinations for one more point | ||||||
|     \ForEach{\reg $\in \{t_n\} \setminus \{l_{k'}\}$}{ |     \ForEach{\reg $\in T \setminus L'$}{ | ||||||
|  |  | ||||||
|       % Evaluate current |       % Evaluate current | ||||||
|       \metricCur $\leftarrow$ \calcMetric{$\{t_n\}, \reg, \{l_{k'}\}$}\;\label{algo:lmdk-sel-heur-comparison} |       \evalCur $\leftarrow$ \evalSeq{$T, \reg, L'$}\; \label{algo:lmdk-sel-heur-comparison} | ||||||
|       \evalCur $\leftarrow$ \evalSeq{\metricCur}\; |  | ||||||
|  |  | ||||||
|       % Compare evaluations |       % Compare evaluations | ||||||
|       \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\; |       \diffCur $\leftarrow$ $\left|\evalCur - \evalOrig\right|$\; | ||||||
| @ -116,27 +114,31 @@ It finds the option that is the most \emph{similar} to the original (Lines~{\ref | |||||||
|     } |     } | ||||||
|  |  | ||||||
|     % Save new point to landmarks |     % Save new point to landmarks | ||||||
|     $k' \leftarrow k' + 1$\; |     $L' \leftarrow L' \cup \{\optimi\}$\; | ||||||
|     $l_{k'} \leftarrow \optimi$\; |  | ||||||
|  |  | ||||||
|     % Add new option |     % Add new option | ||||||
|     \optim.add($\{l_{k'}\} \setminus \{l_k\}$)\; |     \optim.append($L' \setminus L$)\; | ||||||
|   }\label{algo:lmdk-sel-heur-end} |   }\label{algo:lmdk-sel-heur-end} | ||||||
|  |  | ||||||
|   \Return{\optim} |   \Return{\optim} | ||||||
| \end{algorithm} | \end{algorithm} | ||||||
|  |  | ||||||
| Algorithm~\ref{algo:lmdk-sel-heur}, follows an incremental methodology. |  | ||||||
| At each step it selects a new timestamp that corresponds to a regular ({non-\thething}) event from $\{t_n\} \setminus \{l_k\}$. |  | ||||||
| Similar to Algorithm~\ref{algo:lmdk-sel-opt}, the selection is done based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}-\ref{algo:lmdk-sel-heur-comparison-end}}). | As in Algorithm~\ref{algo:lmdk-sel-opt}, each selection is based on how much the evaluation of the resulting sequence differs from that of the original one (Lines~{\ref{algo:lmdk-sel-heur-comparison}--\ref{algo:lmdk-sel-heur-comparison-end}}). | ||||||
| This process (Lines~{\ref{algo:lmdk-sel-heur-while}-\ref{algo:lmdk-sel-heur-end}}) goes on until we select a set that is equal to the size of the series of events, i.e.,~$\{l_{k'}\} = \{t_n\}$. | This process (Lines~{\ref{algo:lmdk-sel-heur-while}--\ref{algo:lmdk-sel-heur-end}}) continues until the selected set covers the whole series of events, i.e.,~$L' = T$. | ||||||
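The heuristic can likewise be sketched in Python. This is a hedged transcription of Algorithm~\ref{algo:lmdk-sel-heur}. It assumes a caller-supplied evaluation function with the hypothetical signature `eval_seq(timestamps, extra, landmarks)`, and `heuristic_dummy_options` is an illustrative name. At every step, the regular timestamp whose addition keeps the evaluation closest to the original one is added to $L'$, and the current dummy set $L' \setminus L$ is recorded.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence

# Assumed signature: eval_seq(timestamps, extra, landmarks) -> float.
EvalSeq = Callable[[Sequence[int], FrozenSet[int], FrozenSet[int]], float]


def heuristic_dummy_options(timestamps: Sequence[int], landmarks: Iterable[int],
                            eval_seq: EvalSeq) -> List[FrozenSet[int]]:
    """Greedy sketch: repeatedly add the regular timestamp whose addition keeps
    the evaluation closest to the original one, recording the dummy set after
    every step, until every timestamp is a landmark."""
    original = frozenset(landmarks)
    everything = frozenset(timestamps)
    if not original <= everything:
        raise ValueError("landmarks must be a subset of the timestamps")
    eval_orig = eval_seq(timestamps, frozenset(), original)
    current = set(original)  # plays the role of L'
    optim: List[FrozenSet[int]] = []
    while current != everything:
        marks = frozenset(current)
        # min() keeps the first candidate on ties, like a strict "<" comparison.
        best = min(sorted(everything - current),
                   key=lambda reg: abs(eval_seq(timestamps, frozenset({reg}), marks) - eval_orig))
        current.add(best)
        optim.append(frozenset(current - original))  # dummies chosen so far
    return optim
```

With $n = |T \setminus L|$ regular events, the outer loop runs $n$ times and its $i$-th iteration scans $n - i + 1$ candidates. That gives $n(n+1)/2$ calls to the evaluation function in total.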
|  |  | ||||||
| Note that the reverse heuristic approach, i.e.,~starting with $\{t_n\}$ {\thethings} and removing until $\{l_k\}$, performs worse than and occasionally the same with Algorithm~\ref{algo:lmdk-sel-heur}. | In terms of complexity, given $n$ regular events, it requires $\mathcal{O}(n^2)$ time and space. | ||||||
|  | Note that the reverse heuristic approach, i.e.,~starting with all the timestamps in $T$ as {\thethings} and removing them until only $L$ remains, performs similarly to Algorithm~\ref{algo:lmdk-sel-heur}. | ||||||
|  |  | ||||||
|  |  | ||||||
|  |  | ||||||
|  | \mk{WIP: Histograms} | ||||||
|  |  | ||||||
|  |  | ||||||
| \subsubsection{Privacy-preserving option selection} | \subsubsection{Privacy-preserving option selection} | ||||||
| \label{subsec:lmdk-opt-sel} | \label{subsec:lmdk-opt-sel} | ||||||
|  |  | ||||||
|  | \mk{WIP} | ||||||
|  |  | ||||||
| % Nearby events | % Nearby events | ||||||
| Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}. | Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}. | ||||||
| Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall. | Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall. | ||||||