theotherthing: Finished review
parent
6ff77eab96
commit
8ad8456861
Binary file not shown.
@ -1,35 +1,32 @@
|
|||||||
\section{Selection of events}
|
\section{Selection of events}
|
||||||
\label{sec:theotherthing}
|
\label{sec:theotherthing}
|
||||||
|
|
||||||
In Section~\ref{sec:thething}, we introduced the notion of {\thething} events in privacy-preserving time series publishing.
|
In Section~\ref{sec:thething}, we introduced the notion of {\thething} events in privacy-preserving time series publishing.
|
||||||
The differentiation among regular and {\thething} events stipulates a privacy budget allocation that deviates from the application of existing differential privacy protection levels.
|
The differentiation among regular and {\thething} events stipulates a privacy budget allocation that deviates from the application of existing differential privacy protection levels.
|
||||||
Based on this novel event categorization, we designed three models (Section~\ref{subsec:lmdk-mechs}) that achieve {\thething} privacy.
|
Based on this novel event categorization, we designed three schemes (Section~\ref{subsec:lmdk-mechs}) that achieve {\thething} privacy.
|
||||||
For this, we assumed that the timestamps in the {\thething} set $L$ are not privacy-sensitive, and therefore we used them in our models as they were.
|
For this, we assumed that the timestamps in the {\thething} set $L$ are not privacy-sensitive, and therefore we used them in our models as they were.
|
||||||
|
|
||||||
This may pose a direct or indirect privacy threat to the data generators (users).
|
This may pose a direct or indirect privacy risk to the users.
|
||||||
For the former, we consider the case where we desire to publish $L$ as complementary information to the release of the event values.
|
For the former, we consider the case where we desire to publish $L$ as complementary information to the release of the event values.
|
||||||
For the latter, a potentially adversarial data consumer (analyst) may infer $L$ by observing the values of the privacy budget which is usually an inseparable attribute of the data release as an indicator of the privacy guarantee to the users and as an estimate of the data utility to the data analysts.
|
For the latter, a potentially adversarial data analyst may infer $L$ by observing the values of the privacy budget, which is usually an inseparable attribute of the data release as an indicator of the privacy guarantee to the users and as an estimate of the data utility to the analysts.
|
||||||
Hence, in both cases, a user-defined $L$ which is supposed to facilitate th configurable privacy protection of the user could end up posing a privacy threat to them.
|
Hence, in both cases, a user-defined $L$, which is supposed to facilitate the configurable privacy protection of the user, could end up posing a privacy risk to them.
|
||||||
|
|
||||||
In Example~\ref{ex:lmdk-risk}, we demonstrate the extreme case of the application of the Skip {\thething} privacy model from Figure~\ref{fig:lmdk-skip}, where we approximate {\thethings} and invest all of the available privacy budget to regular events, i.e.,~$\varepsilon_i = 0$, $\forall i \in L$.
|
In Example~\ref{ex:lmdk-risk}, we demonstrate the extreme case of the application of the \texttt{Skip} {\thething} privacy scheme from Figure~\ref{fig:lmdk-skip}, where we approximate {\thethings} with the latest data release and invest all of the available privacy budget in regular events.
|
||||||
|
|
||||||
\begin{example}
|
\begin{example}
|
||||||
\label{ex:lmdk-risk}
|
\label{ex:lmdk-risk}
|
||||||
|
Figure~\ref{fig:lmdk-risk} shows the privacy risk that the application of a {\thething} privacy scheme that nullifies or approximates outputs, similar to \texttt{Skip}, might cause.
|
||||||
Figure~\ref{fig:lmdk-risk} shows the privacy risks that the application of a {\thething} privacy model that nullifies or approximates outputs, similar to Skip, might cause.
|
We point out in red the details that might cause indirect information inference.
|
||||||
We point out (in light red shade) the details that might cause indirect information inference.
|
|
||||||
In this extreme case, the minimization of the privacy budget in combination with nullifying the output (either by not publishing or by adding a lot of noise) or approximating the current output with previously released outputs might hint to any adversary that the current event is a {\thething}.
|
In this extreme case, the minimization of the privacy budget in combination with nullifying the output (either by not publishing or by adding a lot of noise) or approximating the current output with previously released outputs might hint to any adversary that the current event is a {\thething}.
|
||||||
|
|
||||||
\begin{figure}[htp]
|
\begin{figure}[htp]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=\linewidth]{problem/lmdk-risk}
|
\includegraphics[width=.75\linewidth]{problem/lmdk-risk}
|
||||||
\caption{The privacy risks (in light red shade) that the application of the {\thething} privacy Skip model might pose.}
|
\caption{The privacy risk (highlighted in red) that the application of the {\thething} privacy \texttt{Skip} scheme might pose.}
|
||||||
\label{fig:lmdk-risk}
|
\label{fig:lmdk-risk}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
Apart from the privacy budget that we invested at {\thethings}, we can observe a pattern for the budgets at regular events as well.
|
Apart from the privacy budget that we invested at {\thethings}, we can observe a pattern for the budgets at regular events as well.
|
||||||
Therefore, an adversary who observes the values of the privacy budget can easily infer not only the number but also the exact temporal position of {\thethings}.
|
Therefore, an adversary who observes the values of the privacy budget can easily infer not only the number but also the exact temporal position of the {\thethings}.
|
||||||
|
|
||||||
\end{example}
|
\end{example}
|
||||||
|
|
||||||
\SetKwInput{KwResult}{Output}
|
\SetKwInput{KwResult}{Output}
|
||||||
@ -61,6 +58,8 @@ In Example~\ref{ex:lmdk-risk}, we demonstrate the extreme case of the applicatio
|
|||||||
\SetKwFunction{getHist}{getHist}
|
\SetKwFunction{getHist}{getHist}
|
||||||
\SetKwFunction{getOpts}{getOpts}
|
\SetKwFunction{getOpts}{getOpts}
|
||||||
\SetKwFunction{getNorm}{getNorm}
|
\SetKwFunction{getNorm}{getNorm}
|
||||||
|
\SetKwFunction{len}{len}
|
||||||
|
\SetKwFunction{sumHist}{sum}
|
||||||
|
|
||||||
\input{problem/theotherthing/contribution}
|
\input{problem/theotherthing/contribution}
|
||||||
\input{problem/theotherthing/problem}
|
\input{problem/theotherthing/problem}
|
||||||
|
@ -1,30 +1,29 @@
|
|||||||
\subsection{Protecting {\thethings}}
|
\subsection{Protecting {\thethings}}
|
||||||
\label{subsec:lmdk-sel-sol}
|
\label{subsec:lmdk-sel-sol}
|
||||||
|
The main idea of the privacy-preserving dummy {\thething} selection module is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T \setminus L$ of the time series $S_T$ and add them to the original {\thething} set $L$.
|
||||||
|
Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings}, can render the actual ones indistinguishable.
|
||||||
|
The goal is to create a new set $L'$ such that $L \subset L' \subseteq T$.
|
||||||
|
|
||||||
The main idea of the privacy-preserving {\thething} selection component is to privately select extra {\thething} event timestamps, i.e.,~dummy {\thethings}, from the set of timestamps $T /\ L$ of the time series $S_T$ and add them to the original {\thething} set $L$.
|
First, we generate a set of dummy {\thething} set options by adding regular event timestamps from $T \setminus L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}).
|
||||||
Selecting extra events, on top of the actual {\thethings}, as dummy {\thethings} can render actual ones indistinguishable.
|
Then, we utilize the exponential mechanism, with a utility function that calculates an indicator for each of the options in the set, based on how much it differs from the original {\thething} set $L$, and randomly select one of the options (Section~\ref{subsec:lmdk-opt-sel}).
|
||||||
The goal is to select a list of sets with additional timestamps from a series of events at timestamps $T$ for a set of {\thethings} at $L \subseteq T$.
|
This process provides an extra layer of privacy protection to {\thethings}, and thus allows the processing, and thereafter releasing, of {\thething} timestamps.
|
||||||
Thus, we create a new set $L'$ such that $L \subset L' \subseteq T$.
|
|
||||||
|
|
||||||
First, we generate a set of dummy {\thething} set options by adding regular event timestamps from $T /\ L$ to $L$ (Section~\ref{subsec:lmdk-set-opts}).
|
|
||||||
Then, we utilize the exponential mechanism, with a utility function that calculates an indicator for each of the options in the set based on how much it differs from the original {\thething} set $L$, and randomly select one of the options (Section~\ref{subsec:lmdk-opt-sel}).
|
|
||||||
This process provides an extra layer of privacy protection to {\thethings}, and thus allows the release, and thereafter processing, of {\thething} timestamps.
|
|
||||||
|
|
||||||
% We utilize the exponential mechanism with a utility function that calculates an indicator for each of the options in the set that we selected in the previous step.
|
|
||||||
% The utility depends on the positioning of the {\thething} timestamps of an option in the series, e.g.,~the distance from the previous/next {\thething}, the distance from the start/end of the series, etc.
|
|
||||||
|
|
||||||
|
|
||||||
\subsubsection{{\Thething} set options generation}
|
\subsubsection{Dummy {\thething} selection}
|
||||||
\label{subsec:lmdk-set-opts}
|
\label{subsec:lmdk-set-opts}
|
||||||
|
|
||||||
Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and heuristic methodology, respectively.
|
Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} approach this problem with an optimal and heuristic methodology, respectively.
|
||||||
Function \evalSeq evaluates the result of the union of $L$ and a timestamp combination from $T \setminus L$ by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}.
|
Function \evalSeq evaluates the result of the union of $L$ and a timestamp combination from $T \setminus L$ by, e.g.,~estimating the standard deviation of all the distances from the previous/next {\thething}.
|
||||||
\getOpts returns all the possible \emph{valid} sets of combinations \opt such that larger options contain all of the timestamps that are present in smaller ones.
|
\getOpts returns all the possible \emph{valid} sets of combinations \opt such that larger options contain all of the timestamps that are present in smaller ones.
|
||||||
Each option contains sets of timestamps with sizes $\left|L\right| + 1, \left|L\right| + 2, \dots, \left|T\right|$, where each set is a combination of $L$ with $x \in [1, \left|T\right| - \left|L\right|]$ timestamps from $T \setminus L$.
|
Each option contains sets of timestamps with sizes $\left|L\right| + 1, \left|L\right| + 2, \dots, \left|T\right|$, where each set is a combination of $L$ with $x \in [1, \left|T\right| - \left|L\right|]$ timestamps from $T \setminus L$.
|
||||||
|
|
||||||
\paragraph{Optimal}
|
|
||||||
Algorithm~\ref{algo:lmdk-sel-opt}, between Lines~{\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}} evaluates each option in \opts.
|
\paragraph{\texttt{Optimal}}
|
||||||
|
The \texttt{Optimal} algorithm (Algorithm~\ref{algo:lmdk-sel-opt}) generates every possible combination (option) of {\thething} sets $L'$, with each option containing one set of every possible size, i.e.,~$|L| + 1, |L| + 2, \dots, |T|$.
|
||||||
|
Each $L'$ contains the original {\thethings} along with timestamps of regular events from $T \setminus L$ (dummy {\thethings}).
|
||||||
|
Then, it evaluates each option by comparing each of its sets with the original {\thething} set $L$ and estimating an overall similarity score for each option (Lines~{\ref{algo:lmdk-sel-opt-for-each}--\ref{algo:lmdk-sel-opt-end}}).
|
||||||
|
We discuss possible utility score functions later on in Section~\ref{subsec:lmdk-opt-sel}.
|
||||||
It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}--\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option that has an evaluation that differs the least from that of the sequence $T$ with {\thethings} $L$.
|
It finds the option that is the most \emph{similar} to the original (Lines~{\ref{algo:lmdk-sel-opt-comparison}--\ref{algo:lmdk-sel-opt-end}}), i.e.,~the option that has an evaluation that differs the least from that of the sequence $T$ with {\thethings} $L$.
|
||||||
|
The goal of this process is to select the option that contains the combination of dummy {\thething} sets that achieve the best score.
|
||||||
|
|
||||||
\begin{algorithm}
|
\begin{algorithm}
|
||||||
\caption{Optimal dummy {\thething} set options generation}
|
\caption{Optimal dummy {\thething} set options generation}
|
||||||
@ -32,11 +31,8 @@ It finds the option that is the most \emph{similar} to the original (Lines~{\ref
|
|||||||
|
|
||||||
\DontPrintSemicolon
|
\DontPrintSemicolon
|
||||||
|
|
||||||
\KwData{$T, L$}
|
\KwData{the time series timestamps $T$, the {\thething} set $L$}
|
||||||
|
\KwResult{the selected {\thething} set options \opts}
|
||||||
\SetKwInput{KwData}{Input}
|
|
||||||
|
|
||||||
\KwResult{\optim}
|
|
||||||
\BlankLine
|
\BlankLine
|
||||||
|
|
||||||
% Evaluate the original
|
% Evaluate the original
|
||||||
@ -63,14 +59,16 @@ It finds the option that is the most \emph{similar} to the original (Lines~{\ref
|
|||||||
\Return{\opts}
|
\Return{\opts}
|
||||||
\end{algorithm}
|
\end{algorithm}
|
||||||
|
|
||||||
Algorithm~\ref{algo:lmdk-sel-opt} guarantees to return the optimal set of dummy {\thethings} with regard to the original set $L$.
|
Algorithm~\ref{algo:lmdk-sel-opt} guarantees to return the optimal option with regard to the original set $L$.
|
||||||
However, it is rather costly in terms of complexity: given $n$ regular events and a combination of size $r$, it requires $\mathcal{O}(C(n, r) + 2^C(n, r))$ time and $\mathcal{O}(r*C(n, r))$ space.
|
However, it is rather costly in terms of complexity.
|
||||||
Next, we present a heuristic solution with improved time and space requirements.
|
In more detail, given $|T \setminus L|$ regular events and a combination of size $r$, it requires $O(C(|T \setminus L|, r) + 2^{C(|T \setminus L|, r)})$ time and $O(r \cdot C(|T \setminus L|, r))$ space.
|
||||||
|
Next, we present a \texttt{Heuristic} solution with improved time and space requirements.
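To give a sense of scale for the bound of the \texttt{Optimal} algorithm, consider an illustrative setting (the numbers are chosen here for demonstration and do not come from an experiment) with $|T \setminus L| = 10$ regular events and combinations of size $r = 3$:
\[
  C(10, 3) = \frac{10!}{3! \, 7!} = 120,
  \qquad
  C(10, 3) + 2^{C(10, 3)} = 120 + 2^{120} \approx 1.3 \times 10^{36}.
\]
Even for such a short series, the exponential term dominates, which is what makes the exhaustive search impractical.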
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Heuristic}
|
\paragraph{\texttt{Heuristic}}
|
||||||
Algorithm~\ref{algo:lmdk-sel-heur}, follows an incremental methodology.
|
The \texttt{Heuristic} algorithm (Algorithm~\ref{algo:lmdk-sel-heur}) follows an incremental methodology and at each step it selects a new timestamp, corresponding to a regular event from $T \setminus L'$.
|
||||||
At each step it selects a new timestamp, that corresponds to a regular ({non-\thething}) event from $T \setminus L$, to create an option.
|
Thus, the set $L'$ at each step differs by one element from the set that the algorithm selected in the previous step.
|
||||||
|
Similar to \texttt{Optimal}, it selects each new set based on a predefined similarity metric, until it reaches a set that covers the whole series of events, i.e.,~$L' = T$.
|
||||||
|
|
||||||
\begin{algorithm}
|
\begin{algorithm}
|
||||||
\caption{Heuristic dummy {\thething} set options selection}
|
\caption{Heuristic dummy {\thething} set options selection}
|
||||||
@ -78,8 +76,8 @@ At each step it selects a new timestamp, that corresponds to a regular ({non-\th
|
|||||||
|
|
||||||
\DontPrintSemicolon
|
\DontPrintSemicolon
|
||||||
|
|
||||||
\KwData{$T, L$}
|
\KwData{the time series timestamps $T$, the {\thething} set $L$}
|
||||||
\KwResult{\opts}
|
\KwResult{the selected {\thething} set options \opts}
|
||||||
\BlankLine
|
\BlankLine
|
||||||
|
|
||||||
% Evaluate the original
|
% Evaluate the original
|
||||||
@ -122,106 +120,85 @@ At each step it selects a new timestamp, that corresponds to a regular ({non-\th
|
|||||||
|
|
||||||
Similar to Algorithm~\ref{algo:lmdk-sel-opt}, it selects new options based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}--\ref{algo:lmdk-sel-heur-cmp-end}}).
|
Similar to Algorithm~\ref{algo:lmdk-sel-opt}, it selects new options based on a predefined metric (Lines~{\ref{algo:lmdk-sel-heur-comparison}--\ref{algo:lmdk-sel-heur-cmp-end}}).
|
||||||
This process (Lines~{\ref{algo:lmdk-sel-heur-while}--\ref{algo:lmdk-sel-heur-end}}) goes on until we select a set that covers the whole series of events, i.e.,~$L' = T$.
|
This process (Lines~{\ref{algo:lmdk-sel-heur-while}--\ref{algo:lmdk-sel-heur-end}}) goes on until we select a set that covers the whole series of events, i.e.,~$L' = T$.
|
||||||
|
In terms of complexity, given $|T \setminus L|$ regular events, the \texttt{Heuristic} requires $O(|T \setminus L|^2)$ time and space.
|
||||||
In terms of complexity, given $n$ regular events it requires $\mathcal{O}(n^2)$ time and space.
|
Note that the reverse process, i.e.,~starting with all the timestamps in $T$ as {\thethings} and removing one at a time until $|L'| = |L| + 1$, performs similarly.
|
||||||
Note that the reverse heuristic approach, i.e.,~starting with $T$ {\thethings} and removing until $L$, performs similarly with Algorithm~\ref{algo:lmdk-sel-heur}.
|
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Partitioned}
|
\paragraph{\texttt{Partitioned}}
|
||||||
We improve the complexity of Algorithm~\ref{algo:lmdk-sel-opt} by partitioning the {\thething} timestamp sequence $L$.
|
We improve the complexity of the \texttt{Heuristic} algorithm by partitioning the {\thething} timestamp sequence $L$.
|
||||||
Algorithm~\ref{algo:lmdk-sel-hist}, \getHist generates a histogram from $L$ with bins of size \h.
|
The novelty of this algorithm lies in the fact that it treats the event series as a histogram, which allows it to take advantage of the relevant histogram features and methodology.
|
||||||
We find \h by using the Freedman–Diaconis rule which is resilient to outliers and takes into account the data variability and data size~\cite{meshgi2015expanding}.
|
Particularly, it finds the bin size \h by using the Freedman--Diaconis rule, which is resilient to outliers and takes into account the data variability and data size~\cite{meshgi2015expanding}, and generates a histogram from the {\thething} set $L$.
|
||||||
For every possible histogram version, the \getDiff function finds the difference between two histograms; for this operation we utilize the Euclidean distance~(see Section~\ref{subsec:sel-utl} for more details).
|
This way, compared to the \texttt{Heuristic}, it achieves an improved complexity that depends on the histogram's bin size.
|
||||||
|
Algorithm~\ref{algo:lmdk-sel-hist} demonstrates the overall process.
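As a reminder, the Freedman--Diaconis rule is commonly stated in terms of the interquartile range of the data and the number of data points.
Applying it to the timestamps in $L$ is our reading of the description above; the symbol $h$ for the bin size and the notation $\mathrm{IQR}(L)$ are introduced here for illustration:
\[
  h = 2 \, \frac{\mathrm{IQR}(L)}{\sqrt[3]{|L|}}.
\]
For instance, with the illustrative values $\mathrm{IQR}(L) = 6$ and $|L| = 8$, the rule gives $h = 2 \cdot 6 / 2 = 6$.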
|
||||||
|
|
||||||
\begin{algorithm}
|
\begin{algorithm}
|
||||||
\caption{Partitioned dummy {\thething} set options selection}
|
\caption{\texttt{Partitioned} {\thething} set options generation}
|
||||||
\label{algo:lmdk-sel-hist}
|
\label{algo:lmdk-sel-hist}
|
||||||
|
|
||||||
\DontPrintSemicolon
|
\DontPrintSemicolon
|
||||||
|
\KwData{the time series timestamps $T$, the {\thething} set $L$}
|
||||||
\KwData{$T, L$}
|
\KwResult{the selected {\thething} set options \opts}
|
||||||
\KwResult{\opts}
|
% \kat{verify description of variables}
|
||||||
|
% \mk{OK}
|
||||||
\BlankLine
|
\BlankLine
|
||||||
|
|
||||||
\hist, \h $\leftarrow$ \getHist{$T, L$}\;
|
\hist, \h $\leftarrow$ \getHist{$T, L$}\;
|
||||||
|
\histCur $\leftarrow$ \hist\;
|
||||||
\histCur $\leftarrow$ hist\;
|
|
||||||
|
|
||||||
\opts $\leftarrow$ $[]$\;
|
\opts $\leftarrow$ $[]$\;
|
||||||
|
% \kat{L' not defined..}
|
||||||
\While{sum($L'$) $\neq$ len($T$)}{ \label{algo:lmdk-sel-hist-while}
|
% \mk{It was histCur}
|
||||||
% Track the minimum (best) evaluation
|
\While{\sumHist{\histCur} $\neq$ \len{$T$}}{
|
||||||
\diffMin $\leftarrow$ $\infty$\;
|
\label{algo:lmdk-sel-hist-while}
|
||||||
|
\diffMin $\leftarrow$ $\infty$\; % \tcp*{Track the best evaluation}
|
||||||
% The candidate option
|
\opt $\leftarrow$ \histCur\; % \tcp*{The candidate option}
|
||||||
\opt $\leftarrow$ \histCur\;
|
\ForEach{\hi \textnormal{\textbf{in}} \histCur}{ % \tcp*{Repeat for every bin}
|
||||||
|
\label{algo:lmdk-sel-hist-cmp-start}
|
||||||
% Check every possibility
|
\If{\hi $+$ $1$ $\leq$ \h}{ % \tcp*{Can we add one more point?}
|
||||||
\ForEach{\hi \reg $L'$}{ \label{algo:lmdk-sel-hist-cmp-start}
|
|
||||||
|
|
||||||
% Can we add one more point?
|
|
||||||
\If{\hi $+$ $1$ $\leq$ \h}{
|
|
||||||
\histTmp $\leftarrow$ \histCur\;
|
\histTmp $\leftarrow$ \histCur\;
|
||||||
\histTmp$[i]$ $\leftarrow$ \histTmp$[i]$ $+$ $1$\;
|
{\histTmp}[$i$] $\leftarrow$ {\histTmp}[$i$] $+$ $1$\;
|
||||||
% Find difference from original
|
\diffCur $\leftarrow$ \getDiff{\hist, \histTmp}\; % \tcp*{Find difference from original}
|
||||||
\diffCur $\leftarrow$ \getDiff{\hist, \histTmp}\;
|
\label{algo:lmdk-sel-hist-getDiff}
|
||||||
|
\If{\diffCur $<$ \diffMin}{ % \tcp*{Remember if it is the best that you've seen}
|
||||||
% Remember if it is the best that you've seen
|
\label{algo:lmdk-sel-hist-cmp}
|
||||||
\If{\diffCur $<$ \diffMin}{ \label{algo:lmdk-sel-hist-cmp}
|
|
||||||
\diffMin $\leftarrow$ \diffCur\;
|
\diffMin $\leftarrow$ \diffCur\;
|
||||||
\opt $\leftarrow$ \histTmp\;
|
\opt $\leftarrow$ \histTmp\;
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
} \label{algo:lmdk-sel-hist-cmp-end}
|
} \label{algo:lmdk-sel-hist-cmp-end}
|
||||||
|
\histCur $\leftarrow$ \opt\; % \tcp*{Update current histogram}
|
||||||
% Update current histogram
|
\opts $\leftarrow$ \opts $+$ $[$\opt$]$\; % \tcp*{Add current best to options}
|
||||||
\histCur $\leftarrow$ \opt\;
|
|
||||||
% Add current best to options
|
|
||||||
\opts $\leftarrow$ \opt\;
|
|
||||||
|
|
||||||
} \label{algo:lmdk-sel-hist-end}
|
} \label{algo:lmdk-sel-hist-end}
|
||||||
|
|
||||||
\Return{\opts}
|
\Return{\opts}
|
||||||
\end{algorithm}
|
\end{algorithm}
|
||||||
|
|
||||||
In Lines~{\ref{algo:lmdk-sel-hist-cmp-start}-\ref{algo:lmdk-sel-hist-cmp-end}} we check every possible histogram version by incrementing each bin by $1$ and comparing it to the original (Line~\ref{algo:lmdk-sel-hist-cmp}).
|
Function \getHist generates a histogram with bins of size \h for the given time series timestamps $T$ and {\thething} set $L$.
|
||||||
In the end of the process, we return \opts which contains all the versions of \hist that are closest to \hist for all possible sizes of \hist.
|
For every new histogram version, the \getDiff function (Line~\ref{algo:lmdk-sel-hist-getDiff}) finds the difference from the original histogram; for this operation it utilizes the Euclidean distance~(see Section~\ref{subsec:sel-utl} for more details).
|
||||||
|
In Lines~{\ref{algo:lmdk-sel-hist-cmp-start}--\ref{algo:lmdk-sel-hist-cmp-end}}, the algorithm checks every histogram version by incrementing each bin by $1$ and comparing it to the original (Line~\ref{algo:lmdk-sel-hist-cmp}).
|
||||||
|
In the end, it returns \opts, which contains all the versions of \hist that are closest to the original \hist for all possible bin sizes of \hist.
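For concreteness, the Euclidean distance used for the histogram comparison can be written as follows, assuming that both histograms have the same number of bins $k$ (the symbols $a_i$, $b_i$, and $k$ are introduced here only for illustration).
For an original histogram with bin counts $a_1, \dots, a_k$ and a candidate with bin counts $b_1, \dots, b_k$:
\[
  d(a, b) = \sqrt{\sum_{i = 1}^{k} (a_i - b_i)^2}.
\]
For example, for $a = (2, 0, 1)$ and $b = (2, 1, 2)$, we get $d(a, b) = \sqrt{0 + 1 + 1} = \sqrt{2}$.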
|
||||||
|
|
||||||
|
|
||||||
\subsubsection{Privacy-preserving option selection}
|
\subsubsection{Privacy-preserving option selection}
|
||||||
\label{subsec:lmdk-opt-sel}
|
\label{subsec:lmdk-opt-sel}
|
||||||
|
The algorithms that we presented in Section~\ref{subsec:lmdk-set-opts} return a set of possible versions of the original {\thething} set $L$, each obtained by adding to it extra timestamps from $T \setminus L$.
|
||||||
The Algorithms of Section~\ref{subsec:lmdk-set-opts} return a set of possible versions of the original {\thething} set $L$ by adding extra timestamps in it from the series of events at timestamps $T \supseteq L$.
|
In the next step, we randomly select a set by utilizing the exponential mechanism (Section~\ref{subsec:prv-mech}).
|
||||||
In the next step of the process, we randomly select a set by utilizing the exponential mechanism (Section~\ref{subsec:prv-mech}).
|
For this procedure, we allocate a small fraction of the available privacy budget, i.e.,~$1$\% or even less (see Section~\ref{subsec:sel-eps} for more details), which adds up to that of the publishing scheme according to Theorem~\ref{theor:compo-seq-ind}.
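A common formulation of the exponential mechanism, which may differ in notation from that of Section~\ref{subsec:prv-mech}, selects an option $o$ from the set of generated options $O$ with probability
\[
  \Pr[o] = \frac{\exp\left(\frac{\varepsilon_s \, u(o)}{2 \Delta u}\right)}{\sum_{o' \in O} \exp\left(\frac{\varepsilon_s \, u(o')}{2 \Delta u}\right)},
\]
where $u$ is the utility score function, $\Delta u$ is its sensitivity, and $\varepsilon_s$ is the budget reserved for the selection (these symbols are assumptions made here for illustration).
For instance, with a total budget $\varepsilon = 1$ and a $1$\% share for the selection, $\varepsilon_s = 0.01$ and the remaining $0.99$ is available to the publishing scheme, so that the two add up to $\varepsilon$ under sequential composition.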
|
||||||
For this procedure, we allocate a small fraction of the available privacy budget, i.e.,~$1$\% or even less (see Section~\ref{subsec:sel-eps} for more details).
|
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Utility score function}
|
\paragraph{Utility score function}
|
||||||
Prior to selecting a set, the exponential mechanism evaluates each set using a utility score function.
|
Prior to selecting a {\thething} timestamp set, which includes the original along with the dummy {\thethings}, the exponential mechanism evaluates each set using a utility score function.
|
||||||
|
We present here two ways of doing so.
|
||||||
|
|
||||||
One way evaluate each set is by taking into account the temporal position the events in the sequence.
|
One way to evaluate each set is by taking into account the temporal position of the events in the sequence.
|
||||||
% Nearby events
|
|
||||||
Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
|
Events that occur at recent timestamps are more likely to reveal sensitive information regarding the users involved~\cite{kellaris2014differentially}.
|
||||||
Thus, taking into account more recent events with respect to {\thethings} can result in less privacy loss and better privacy protection overall.
|
Hence, indicating the existence of dummy {\thethings} near actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events.
|
||||||
This leads to worse data utility.
|
In other words, sets with dummy {\thethings} that have a smaller average temporal distance from actual {\thethings} achieve better utility scores.
|
||||||
% Depending on the {\thething} discovery technique
|
|
||||||
The values of events near a {\thething} are usually similar to that of the latter.
|
|
||||||
Therefore, privacy-preserving mechanisms are likely to approximate their values based on the nearest {\thething} instead of investing extra privacy budget to perturb their actual values; thus, spending less privacy budget.
|
|
||||||
Saving privacy budget for releasing perturbed versions of actual event values can bring about better data utility.
|
|
||||||
% Distant events
|
|
||||||
However, indicating the existence of dummy {\thethings} nearby actual {\thethings} can increase the adversarial confidence regarding the location of the latter within a series of events.
|
|
||||||
Hence, choosing dummy {\thethings} far from the actual {\thethings} (and thus less relevant) can limit the final privacy loss.
|
|
||||||
|
|
||||||
Another approach for the score function is to consider the number of events in each set.
|
Another approach for the utility score function is to consider the number of events in each set.
|
||||||
On the one hand, sets with more dummy {\thethings} may render actual {\thethings} more indistinguishable probabilistically.
|
Sets with more dummy {\thethings} may render actual {\thethings} more indistinguishable, and therefore provide less utility.
|
||||||
That is due to the fact that, it is harder for an adversary to pick a {\thething} when the ratio of {\thethings} to the size of the set gets lower.
|
Consequently, more dummy {\thethings} lead to distributing the privacy budget to more events, and therefore to more robust overall privacy protection.
|
||||||
On the other hand, more dummy {\thethings} lead to distributing the privacy budget to more events, and therefore investing less at each timestamp.
|
|
||||||
Thus, providing a better level of privacy protection.
|
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Option release}
|
\paragraph{Option release}
|
||||||
The options that Algorithms~\ref{algo:lmdk-sel-opt} and \ref{algo:lmdk-sel-heur} generate contain actual timestamps which can be utilized directly by the {\thething} privacy mechanisms that we presented in Section~\ref{subsec:lmdk-mechs}.
|
In the last step, the privacy-preserving dummy {\thething} selection module releases a new {\thething} set (including the original {\thethings} along with the dummy ones) from the options that were generated in the previous step, by utilizing the exponential mechanism.
|
||||||
However, Algorithm~\ref{algo:lmdk-sel-hist} returns histograms instead of timestamps.
|
|
||||||
Therefore, we need to process the result of the exponential mechanism further by creating a sample from the true {\thethings} and populating it with the remaining amount of choices, i.e.,~$\left|L'\right| - \left|L\right|$ by performing sampling without replacement from the resulting option $L$.
|
The options generated by the \texttt{Optimal} and \texttt{Heuristic} algorithms contain actual timestamps that can be utilized directly by the {\thething} privacy schemes that we presented in Section~\ref{subsec:lmdk-sol}.
|
||||||
|
However, the \texttt{Partitioned} algorithm returns histograms instead of timestamps.
|
||||||
|
Therefore, we need to process the result of the exponential mechanism further by sampling without replacement from the set $T \setminus L$ according to the selected histogram's probability density function.
|
||||||
|