problem: Wrote lmdk-opt-sel

Manos Katsomallos 2021-10-12 12:01:20 +02:00
parent dd1f5beec8
commit 0b1ce3cbbc

@@ -42,16 +42,13 @@ It finds the option that is the most \emph{similar} to the original (Lines~{\ref
% Evaluate the original % Evaluate the original
\evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\; \evalOrig $\leftarrow$ \evalSeq{$T, \emptyset, L$}\;
% Get all possible option combinations
\opts $\leftarrow$ \getOpts{$T, L$}\;
% Track the minimum (best) evaluation % Track the minimum (best) evaluation
\diffMin $\leftarrow$ $\infty$\; \diffMin $\leftarrow$ $\infty$\;
% Track the optimal sequence (the one with the best evaluation) % Track the optimal sequence (the one with the best evaluation)
\optim $\leftarrow$ $[]$\; \opts $\leftarrow$ $[]$\;
\ForEach{\opt $\in$ \opts}{ \label{algo:lmdk-sel-opt-for-each} \ForEach{\opt $\in$ \getOpts{$T, L$}}{ \label{algo:lmdk-sel-opt-for-each}
\evalCur $\leftarrow 0$\; \evalCur $\leftarrow 0$\;
\ForEach{\opti $\in$ \opt}{ \ForEach{\opti $\in$ \opt}{
\evalCur $\leftarrow$ \evalCur $+$ \evalSeq{$T, \opti, L$}/\#\opt\; \label{algo:lmdk-sel-opt-comparison} \evalCur $\leftarrow$ \evalCur $+$ \evalSeq{$T, \opti, L$}/\#\opt\; \label{algo:lmdk-sel-opt-comparison}
@@ -60,10 +57,10 @@ It finds the option that is the most \emph{similar} to the original (Lines~{\ref
\diffCur $\leftarrow \left|\evalCur - \evalOrig\right|$\; \diffCur $\leftarrow \left|\evalCur - \evalOrig\right|$\;
\If{\diffCur $<$ \diffMin}{ \If{\diffCur $<$ \diffMin}{
\diffMin $\leftarrow$ \diffCur\; \diffMin $\leftarrow$ \diffCur\;
\optim $\leftarrow$ \opt\; \opts $\leftarrow$ \opt\;
} }
} \label{algo:lmdk-sel-opt-end} } \label{algo:lmdk-sel-opt-end}
\Return{\optim} \Return{\opts}
\end{algorithm} \end{algorithm}
Algorithm~\ref{algo:lmdk-sel-opt} guarantees to return the optimal set of dummy {\thethings} with regard to the original set $L$. Algorithm~\ref{algo:lmdk-sel-opt} guarantees to return the optimal set of dummy {\thethings} with regard to the original set $L$.
@@ -82,7 +79,7 @@ At each step it selects a new timestamp, that corresponds to a regular ({non-\th
\DontPrintSemicolon \DontPrintSemicolon
\KwData{$T, L$} \KwData{$T, L$}
\KwResult{\optim} \KwResult{\opts}
\BlankLine \BlankLine
% Evaluate the original % Evaluate the original
@@ -196,18 +193,25 @@ In the end of the process, we return \opts which contains all the versions of \h
\subsubsection{Privacy-preserving option selection} \subsubsection{Privacy-preserving option selection}
\label{subsec:lmdk-opt-sel} \label{subsec:lmdk-opt-sel}
\mk{WIP} The algorithms of Section~\ref{subsec:lmdk-set-opts} return a set of possible versions of the original {\thething} set $L$, each obtained by adding to it extra timestamps from the series of events at timestamps $T \supseteq L$.
In the next step of the process, we randomly select a set by utilizing the exponential mechanism (Section~\ref{subsec:prv-mech}).
Prior to selecting a set, the exponential mechanism evaluates each set using a score function.
One way to evaluate each set is by taking into account the temporal position of the events in the sequence.
% Nearby events % Nearby events
Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}. Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall. Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall.
This leads to worse data utility. This leads to worse data utility.
% Depending on the {\thething} discovery technique % Depending on the {\thething} discovery technique
The values of events near a {\thething} are usually similar to that of the latter. The values of events near a {\thething} are usually similar to that of the latter.
Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values; thus, spending less privacy budget. Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values; thus, spending less privacy budget.
Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility. Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility.
% Distant events % Distant events
However, indicating the existence of randomized/dummy {\thethings} nearby actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events. However, indicating the existence of dummy {\thethings} nearby actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events.
Hence, choosing randomized/dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss. Hence, choosing dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss.
Another approach for the score function is to consider the number of events in each set.
On the one hand, sets with more dummy {\thethings} may render the actual {\thethings} probabilistically less distinguishable.
This is because it is harder for an adversary to pick an actual {\thething} when the ratio of actual {\thethings} to the size of the set gets lower.
On the other hand, more dummy {\thethings} lead to distributing the privacy budget to more events, and therefore to investing less at each timestamp, thus providing a better level of privacy protection.
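% Illustrative sketch only: the notation below is assumed and not fixed by the text above.
As a minimal sketch of the selection step, assume (hypothetically) that $\mathcal{O}$ denotes the set of options returned by the algorithms of Section~\ref{subsec:lmdk-set-opts}, $u$ the score function, $\Delta u$ its sensitivity, and $\varepsilon$ the privacy budget allocated to the selection.
Under the standard definition of the exponential mechanism, an option $L' \in \mathcal{O}$ would then be selected with probability
\[
  \Pr[L'] = \frac{\exp\left(\frac{\varepsilon \, u(L')}{2 \Delta u}\right)}{\sum_{L'' \in \mathcal{O}} \exp\left(\frac{\varepsilon \, u(L'')}{2 \Delta u}\right)}.
\]
For instance, one possible instantiation of the size-based approach would be $u(L') = |L'|$, which assigns a higher selection probability to options with more dummy {\thethings}.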