problem: Intro of lmdk-sel-sol

This commit is contained in:
Manos Katsomallos 2021-10-12 01:40:27 +02:00
parent 81b31fcc87
commit 10f4f417e9


@ -1,25 +1,22 @@
\subsection{Protecting {\thethings}}
\label{subsec:lmdk-sel-sol}
The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T \setminus L$ of the time series $S_T$ and add them to the original {\thething} set $L$.
Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$.
We first generate a set of candidate {\thething} sets, i.e.,~options, by adding regular event timestamps from $T \setminus L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}).
Then, we utilize the exponential mechanism to randomly select one of these options, with a utility function that scores each option based on how much it differs from the original {\thething} set $L$ (Section~\ref{subsec:lmdk-opt-sel}).
This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps.
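For illustration, a common formulation of the exponential mechanism can be sketched with notation that is assumed here and not fixed by the text above: $\mathcal{O}$ for the set of options, $u(L, o)$ for the utility of an option $o$ with respect to $L$, $\Delta u$ for the sensitivity of $u$, and $\varepsilon$ for the privacy budget.
Under these assumptions, an option $o \in \mathcal{O}$ is selected with probability
\[
  \Pr[o] = \frac{\exp\left(\frac{\varepsilon \, u(L, o)}{2 \Delta u}\right)}{\sum_{o' \in \mathcal{O}} \exp\left(\frac{\varepsilon \, u(L, o')}{2 \Delta u}\right)}.
\]
In this sketch, options with higher utility are exponentially more likely to be selected, while every option retains a nonzero probability.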
% We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step.
% The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc.
\subsubsection{{\Thething} set options}
\label{subsec:lmdk-set-opts}
This step aims to select a set of candidate {\thething} timestamp options either by randomizing the actual timestamps (Section~\ref{subsec:lmdk-rnd}), or by inserting dummy timestamps (Section~\ref{subsec:lmdk-dum-gen}) into the actual {\thething} timestamps.
\paragraph{{\Thething} randomization}
\label{subsec:lmdk-rnd}
A simple way to select a set of timestamps without disclosing the actual {\thethings} is by \emph{randomly} selecting an equally sized set of timestamps.
The randomization of the process, as we will discuss in more detail in Section~\ref{subsec:priv-opt-sel}, will depend on the positioning of the {\thethings} in the series of events.
In more detail, given a set of {\thething} timestamps $\{l_k\} \subseteq \{t_n\}$, where $\{t_n\}$ is an event sequence, we need to select all possible sets of size $k$ from $\{t_n\}$.
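As a rough illustration, assuming that $n$ denotes the number of timestamps in the series and $k$ the number of {\thethings}, there are $\frac{n!}{k! \, (n - k)!}$ such sets; e.g.,~a series of $n = 10$ timestamps with $k = 3$ {\thethings} yields $120$ candidate sets.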
However, the introduction of randomization could arbitrarily impact the effectiveness of non-uniform privacy-protection methods.
This applies mainly to cases where we try to achieve optimal privacy protection of {\thething} events while maximizing the utility of the data that corresponds to the rest of the series of events.
As a consequence, it is possible to end up providing a lower level of protection to {\thething} data than necessary, i.e.,~one that falls short of the users' privacy-protection expectations.
The methodology that we present next (Section~\ref{subsec:lmdk-dum-gen}) attempts to tackle the aforementioned shortcoming.
\paragraph{Dummy {\thething} generation}
\label{subsec:lmdk-dum-gen}
@ -138,7 +135,7 @@ Note that the reverse heuristic approach, i.e.,~starting with $\{t_n\}$ {\thethi
\subsubsection{Privacy-preserving option selection}
\label{subsec:priv-opt-sel} \label{subsec:lmdk-opt-sel}
% Nearby events
Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.