\section{Selection of {\thethings}}
\label{sec:eval-lmdk-sel}

In this section, we present the experiments on the methodology for {\thething} selection that we introduced in Section~\ref{subsec:lmdk-sel-sol}, on both the real and the synthetic data sets.
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}), we show the normalized Euclidean and Wasserstein distance metrics (not to be confused with the temporal distances in Figure~\ref{fig:avg-dist}) of the time series histograms for various distributions and {\thething} percentages.
% \kat{is this distance the landmark distance that we saw just before? clarify}
This allows us to justify the design decisions for the concept that we showcased in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the utility performance of our three {\thething} schemes in combination with the privacy-preserving {\thething} selection module, which enhances the privacy protection that our concept provides.
% \kat{Mention whether it improves the original proposal or not.}

\subsection{{\Thething} selection utility metrics}
\label{subsec:sel-utl}

Figure~\ref{fig:sel-dist} demonstrates the normalized distance that we obtain when we utilize either (a)~the Euclidean or (b)~the Wasserstein distance metric to generate a set of {\thethings} that also includes regular events.

\begin{figure}[htp]
  \centering
  \subcaptionbox{Euclidean\label{fig:sel-dist-norm}}{%
    \includegraphics[width=.49\linewidth]{evaluation/sel-dist-norm}%
  }%
  \hfill
  \subcaptionbox{Wasserstein\label{fig:sel-dist-emd}}{%
    \includegraphics[width=.49\linewidth]{evaluation/sel-dist-emd}%
  }%
  \caption{The normalized (a)~Euclidean and (b)~Wasserstein distances of the generated {\thething} sets for different {\thething} percentages.}
  \label{fig:sel-dist}
\end{figure}

Comparing the results of the Euclidean distance in Figure~\ref{fig:sel-dist-norm} with those of the Wasserstein distance in Figure~\ref{fig:sel-dist-emd}, we conclude that the Euclidean distance provides more consistent results across all possible distributions.
% Euclidean: (1 + (0.25 + 0.25 + 0.45 + 0.45)/4 + (0.25 + 0.25 + 0.3 + 0.3)/4 + (0.2 + 0.2 + 0.2 + 0.2)/4 + (0.15 + 0.15 + 0.15 + 0.15)/4)/6
% Wasserstein: (1 + (0.1 + 0.1 + 0.25 + 0.25)/4 + (0.075 + 0.075 + .15 + 0.15)/4 + (0.075 + 0.075 + 0.1 + 0.1)/4 + (0.025 + 0.025 + 0.025 + 0.025)/4)/6
The maximum difference per {\thething} percentage between the bimodal and skewed {\thething} distributions is approximately $0.2$ for the former and $0.15$ for the latter.
Overall, the Euclidean distance achieves a mean normalized distance of $0.3$, while the Wasserstein distance achieves a mean normalized distance of $0.2$; hence, relative to its mean, the Euclidean distance varies less across distributions.
Indeed, by observing Figure~\ref{fig:sel-dist}, the Wasserstein distance demonstrates a less consistent performance and a less linear behavior among all possible {\thething} distributions.
Thus, we choose to utilize the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection module in Section~\ref{subsec:lmdk-sel-sol}.

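To make the comparison above concrete, the following sketch (not our actual implementation; the histogram binning and the normalization against the empty histogram are assumptions made for illustration) computes the normalized Euclidean and one-dimensional Wasserstein distances between the histogram of an actual {\thething} set and that of a generated one:

```python
def euclidean(h1, h2):
    """Euclidean distance between two equally sized histograms."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

def wasserstein_1d(h1, h2):
    """1-D Wasserstein (earth mover's) distance between two histograms,
    computed as the L1 distance of their cumulative sums."""
    dist, c1, c2 = 0.0, 0.0, 0.0
    for a, b in zip(h1, h2):
        c1 += a
        c2 += b
        dist += abs(c1 - c2)
    return dist

def normalized(dist_fn, actual, generated):
    """Normalize by the distance of the actual histogram to the empty
    one, so that values are comparable across landmark percentages
    (this normalization choice is an assumption)."""
    worst = dist_fn(actual, [0] * len(actual))
    return dist_fn(actual, generated) / worst if worst else 0.0

# Hypothetical landmark counts per time-series interval.
actual = [3, 0, 1, 2, 0]
generated = [2, 1, 1, 2, 0]  # candidate set including regular events
print(normalized(euclidean, actual, generated))
print(normalized(wasserstein_1d, actual, generated))
```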
\subsection{Privacy budget tuning}
\label{subsec:sel-eps}

In Figure~\ref{fig:sel-eps}, we test the Uniform scheme on the real data sets by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the {\thething} selection module and the remainder in perturbing the original data values, in order to determine the optimal ratio.
Uniform is our baseline implementation, and hence allows us to draw more accurate conclusions in this case.
In general, we expect greater ratios to result in more accurate, i.e.,~smaller, {\thething} sets but less accurate values in the released data.

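The budget split that this experiment varies can be sketched as follows (a minimal illustration, assuming Laplace perturbation of the data values and a uniform per-timestamp allocation; the function names are ours and not part of the implementation):

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def split_budget(epsilon, selection_ratio):
    """Invest a ratio of the total budget in landmark selection and
    keep the remainder for perturbing the data values."""
    eps_selection = epsilon * selection_ratio
    eps_publishing = epsilon - eps_selection
    return eps_selection, eps_publishing

def publish_uniform(values, eps_publishing, sensitivity=1.0):
    """Uniform scheme: spread the publishing budget evenly over all
    timestamps and add Laplace noise scaled accordingly."""
    eps_per_timestamp = eps_publishing / len(values)
    return [v + laplace_noise(sensitivity / eps_per_timestamp) for v in values]

# The ratios tested in the experiment: a smaller ratio leaves more
# budget for publishing, hence a smaller noise scale per released value.
for ratio in (0.01, 0.10, 0.25, 0.50):
    eps_sel, eps_pub = split_budget(1.0, ratio)
```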
\begin{figure}[htp]
  \centering
  \subcaptionbox{Copenhagen\label{fig:copenhagen-sel-eps}}{%
    \includegraphics[width=.49\linewidth]{evaluation/copenhagen-sel-eps}%
  }%
  \hspace{\fill}
  \\ \bigskip
  \subcaptionbox{HUE\label{fig:hue-sel-eps}}{%
    \includegraphics[width=.49\linewidth]{evaluation/hue-sel-eps}%
  }%
  \hfill
  \subcaptionbox{T-drive\label{fig:t-drive-sel-eps}}{%
    \includegraphics[width=.49\linewidth]{evaluation/t-drive-sel-eps}%
  }%
  \caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the Uniform {\thething} privacy scheme and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the {\thething} selection module.}
  \label{fig:sel-eps}
\end{figure}

The application of the randomized response mechanism to the Copenhagen data set (Figure~\ref{fig:copenhagen-sel-eps}) is tolerant to the fluctuations of the privacy budget and maintains a relatively constant performance in terms of data utility.
For HUE (Figure~\ref{fig:hue-sel-eps}) and T-drive (Figure~\ref{fig:t-drive-sel-eps}), we observe that our implementation performs better for lower ratios, e.g.,~$0.01$, i.e.,~when we allocate the majority of the available privacy budget to the data release process instead of the {\thething} selection module.
The results of this experiment indicate that we can safely allocate the majority of $\varepsilon$ to publishing the data values, and therefore achieve better data utility, while still providing robust privacy protection for the {\thething} set.

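For reference, a minimal sketch of two-value randomized response, the mechanism family applied to the Copenhagen data set (the exact variant and parameters of our implementation may differ):

```python
import math
import random

def randomized_response(bit, epsilon):
    """Two-value randomized response: report the true bit with
    probability e^epsilon / (e^epsilon + 1), otherwise flip it.
    This satisfies epsilon-differential privacy for one binary value."""
    p_truthful = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truthful else 1 - bit

# With a large budget the reported bit almost always equals the true
# one; as epsilon shrinks, p_truthful approaches 1/2 and the output
# becomes a fair coin, independent of the input.
```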
\subsection{Budget allocation and {\thething} selection}
\label{subsec:sel-prv}

Figure~\ref{fig:real-sel} exhibits the performance of the Skip, Uniform, and Adaptive schemes (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the {\thething} selection module (Section~\ref{subsec:lmdk-sel-sol}).

\begin{figure}[htp]
  \centering
  \subcaptionbox{Copenhagen\label{fig:copenhagen-sel}}{%
    \includegraphics[width=.49\linewidth]{evaluation/copenhagen-sel}%
  }%
  \hfill
  \\ \bigskip
  \subcaptionbox{HUE\label{fig:hue-sel}}{%
    \includegraphics[width=.49\linewidth]{evaluation/hue-sel}%
  }%
  \hfill
  \subcaptionbox{T-drive\label{fig:t-drive-sel}}{%
    \includegraphics[width=.49\linewidth]{evaluation/t-drive-sel}%
  }%
  \caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data, for the different {\thething} percentages of Figure~\ref{fig:real}. The markers indicate the corresponding measurements with the incorporation of the privacy-preserving {\thething} selection module.}
  \label{fig:real-sel}
\end{figure}

In comparison with the utility performance without the {\thething} selection module (solid bars), we notice a slight deterioration for all three schemes (markers).
This is expected, since we allocate part of the available privacy budget to the privacy-preserving {\thething} selection module, which in turn increases the number of {\thethings} in every case except that of $100$\% {\thethings}.
Therefore, less privacy budget remains available for data publishing throughout the time series.
% for $0$\% and $100$\% {\thethings}.
% \kat{why not for the other percentages?}
Skip performs best in our experiments with HUE (Figure~\ref{fig:hue-sel}), due to the low range of the energy consumption values and the high scale of the Laplace noise that it avoids thanks to the approximation that it employs.
However, for the Copenhagen (Figure~\ref{fig:copenhagen-sel}) and T-drive (Figure~\ref{fig:t-drive-sel}) data sets, Skip attains a high mean absolute error, which exposes no benefit with respect to user-level protection.
Overall, Adaptive performs consistently in terms of utility for all of the data sets that we experimented with, and almost always outperforms user-level privacy protection.
Thus, we select it as the best scheme to use in general.
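The contrast between Skip and Uniform discussed above can be sketched as follows (a simplified illustration based only on the descriptions in this section; we omit Adaptive, whose allocation logic is detailed in Section~\ref{subsec:lmdk-mechs}, and the last-released-value approximation for Skip is an assumption of this sketch):

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def uniform_release(series, landmarks, epsilon, sensitivity=1.0):
    """Uniform: spend epsilon evenly over all timestamps."""
    scale = sensitivity * len(series) / epsilon
    return [v + laplace_noise(scale) for v in series]

def skip_release(series, landmarks, epsilon, sensitivity=1.0):
    """Skip: perturb only the landmark timestamps, and approximate every
    regular event with the last released value, thereby avoiding the
    Laplace noise at regular events entirely."""
    scale = sensitivity * max(len(landmarks), 1) / epsilon
    released, last = [], 0.0
    for t, v in enumerate(series):
        if t in landmarks:
            last = v + laplace_noise(scale)
        released.append(last)
    return released
```

The sketch reflects why Skip suits low-range data such as HUE: the approximation error at regular events stays small, while the avoided noise scale is large.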