diff --git a/text/problem/theotherthing/solution.tex b/text/problem/theotherthing/solution.tex index dfa9185..fe37676 100644 --- a/text/problem/theotherthing/solution.tex +++ b/text/problem/theotherthing/solution.tex @@ -42,16 +42,13 @@ It finds the option that is the most \emph{similar} to the original (Lines~{\ref % Evaluate the original \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\; - % Get all possible option combinations - \opts $\leftarrow$ \getOpts{$T, L$}\; - % Track the minimum (best) evaluation \diffMin $\leftarrow$ $\infty$\; % Track the optimal sequence (the one with the best evaluation) - \optim $\leftarrow$ $[]$\; + \opts $\leftarrow$ $[]$\; - \ForEach{\opt $\in$ \opts}{ \label{algo:lmdk-sel-opt-for-each} + \ForEach{\opt $\in$ \getOpts{$T, L$}}{ \label{algo:lmdk-sel-opt-for-each} \evalCur $\leftarrow 0$\; \ForEach{\opti $\in$ \opt}{ \evalCur $\leftarrow$ \evalCur $+$ \evalSeq{$T, \opti, L$}/\#\opt\; \label{algo:lmdk-sel-opt-comparison} @@ -60,10 +57,10 @@ It finds the option that is the most \emph{similar} to the original (Lines~{\ref \diffCur $\leftarrow \left|\evalCur - \evalOrig\right|$\; \If{\diffCur $<$ \diffMin}{ \diffMin $\leftarrow$ \diffCur\; - \optim $\leftarrow$ \opt\; + \opts $\leftarrow$ \opt\; } } \label{algo:lmdk-sel-opt-end} - \Return{\optim} + \Return{\opts} \end{algorithm} Algorithm~\ref{algo:lmdk-sel-opt} guarantees to return the optimal set of dummy {\thethings} with regard to the original set $L$. 
@@ -82,7 +79,7 @@ At each step it selects a new timestamp, that corresponds to a regular ({non-\th \DontPrintSemicolon \KwData{$T, L$} - \KwResult{\optim} + \KwResult{\opts} \BlankLine % Evaluate the original @@ -196,18 +193,25 @@ In the end of the process, we return \opts which contains all the versions of \h \subsubsection{Privacy-preserving option selection} \label{subsec:lmdk-opt-sel} -\mk{WIP} +The algorithms of Section~\ref{subsec:lmdk-set-opts} return a set of possible versions of the original {\thething} set $L$, each obtained by adding to $L$ extra timestamps from the series of events at timestamps $T \supseteq L$. +In the next step of the process, we randomly select one of these sets by utilizing the exponential mechanism (Section~\ref{subsec:prv-mech}). +Prior to selecting a set, the exponential mechanism evaluates each set using a score function. +One way to evaluate each set is to take into account the temporal position of its events in the sequence. % Nearby events Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}. Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall. This leads to worse data utility. - % Depending on the {\thething} discovery technique The values of events near a {\thething} are usually similar to that of the latter. Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values; thus, spending less privacy budget. Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility. - % Distant events -However, indicating the existence of randomized/dummy {\thethings} nearby actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events. 
-Hence, choosing randomized/dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss. +However, indicating the existence of dummy {\thethings} near the actual {\thethings} can increase the adversary's confidence regarding the location of the latter within a series of events. +Hence, choosing dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss. + +Another approach for the score function is to consider the number of events in each set. +On the one hand, sets with more dummy {\thethings} may make the actual {\thethings} harder to distinguish probabilistically. +This is because it is harder for an adversary to identify an actual {\thething} when the ratio of actual {\thethings} to the size of the set is lower. +On the other hand, more dummy {\thethings} lead to distributing the privacy budget over more events, and therefore to investing less of it at each timestamp. +This, in turn, provides a better level of privacy protection, at the cost of data utility.
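The following Python snippet sketches the privacy-preserving option selection described in the new subsection. It is a hedged illustration, not the authors' implementation. It makes these assumptions:

- Each option is a set of integer timestamps that contains the original set $L$.
- A bound on the score sensitivity is known in advance.

It also uses two hypothetical score functions that mirror the two criteria discussed above:

- `distance_score`: the mean distance of each dummy {\thething} from its nearest actual one, which rewards distant dummies.
- `size_score`: the number of dummy {\thethings}.

None of these names (nor `selection_probabilities` or `select_option`) appear in the algorithms of this section.

```python
import math
import random
from typing import Callable, FrozenSet, List, Optional, Sequence

# Hypothetical representation: an option is a frozen set of integer timestamps.
Option = FrozenSet[int]


def distance_score(option: Option, landmarks: Option) -> float:
    """Hypothetical score: mean distance of each dummy timestamp to its nearest
    actual landmark (higher means dummies lie farther away).
    Assumes `landmarks` is non-empty."""
    dummies = option - landmarks
    if not dummies:
        return 0.0
    return sum(min(abs(t - l) for l in landmarks) for t in dummies) / len(dummies)


def size_score(option: Option, landmarks: Option) -> float:
    """Hypothetical score: number of dummy timestamps added to the landmarks."""
    return float(len(option - landmarks))


def selection_probabilities(options: Sequence[Option],
                            score: Callable[[Option], float],
                            epsilon: float,
                            sensitivity: float) -> List[float]:
    """Exponential mechanism: Pr[o] is proportional to
    exp(epsilon * score(o) / (2 * sensitivity))."""
    scores = [score(o) for o in options]
    top = max(scores)
    # Shifting all scores by the maximum avoids overflow in exp()
    # and leaves the normalized probabilities unchanged.
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    total = sum(weights)  # >= 1, since the best option has weight exp(0)
    return [w / total for w in weights]


def select_option(options: Sequence[Option],
                  score: Callable[[Option], float],
                  epsilon: float,
                  sensitivity: float,
                  rng: Optional[random.Random] = None) -> Option:
    """Randomly pick one option according to the exponential mechanism."""
    if rng is None:
        rng = random.Random()
    probs = selection_probabilities(options, score, epsilon, sensitivity)
    return rng.choices(list(options), weights=probs, k=1)[0]


if __name__ == "__main__":
    L = frozenset({3, 7})  # toy set of actual landmarks
    options = [frozenset({3, 4, 7}), frozenset({3, 7, 12}), frozenset({1, 3, 7, 12})]
    probs = selection_probabilities(options, lambda o: distance_score(o, L), 1.0, 1.0)
    print([round(p, 3) for p in probs])
    print(sorted(select_option(options, lambda o: size_score(o, L), 1.0, 1.0)))
```

The text leaves open whether distance from the actual {\thethings} should be rewarded (the "Distant events" argument) or penalized (the "Nearby events" argument). Negating `distance_score` would switch the sketch to the latter, so the sign is a design choice rather than part of the mechanism.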