evaluation: Reviewed theotherthing

Manos Katsomallos 2021-10-11 09:52:06 +02:00
parent 0ec6637ea9
commit a549fe290f


@ -2,11 +2,9 @@
\label{sec:lmdk-sel-eval}
In this section we present the experiments that we performed on real and synthetic data sets to test the methodology that we presented in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}) we show the normalized Euclidean and Wasserstein distances on items for various distributions and {\thething} percentages.
This allows us to justify our design decisions during the process.
Privacy loss by our framework when tuning the size and statistical characteristics of the input {\thething} set $L$ with special emphasis on how the privacy loss under temporal correlation is affected by the number and distribution of the {\thethings}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the performance in terms of utility of our three {\thething} mechanisms in combination with privacy preserving {\thething} that can be possibly applied to humans.
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}) we show the normalized Euclidean and Wasserstein distances of the time series histogram for various distributions and {\thething} percentages.
This allows us to justify our design decisions for our concept that we showcased in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the performance in terms of utility of our three {\thething} mechanisms in combination with the privacy-preserving {\thething} selection component.
\subsection{{\Thething} selection utility metrics}
@ -27,8 +25,11 @@ Figure~\ref{fig:sel-dist} demonstrates the normalized distance that we obtain wh
\end{figure}
Comparing the results of the Euclidean distance in Figure~\ref{fig:sel-dist-norm} with those of the Wasserstein in Figure~\ref{fig:sel-dist-emd} we conclude that the Euclidean distance provides more consistent results for all possible distributions.
% (0 + (0.25 + 0.25 + 0.3 + 0.3)/4 + (0.45 + 0.45 + 0.45 + 0.5)/4 + (0.5 + 0.5 + 0.7 + 0.7)/4 + (0.6 + 0.6 + 1 + 1)/4 + (0.3 + 0.3 + 0.3 + 0.3)/4)/6
% (0 + (0.15 + 0.15 + 0.15 + 0.15)/4 + (0.2 + 0.2 + 0.3 + 0.4)/4 + (0.3 + 0.3 + 0.6 + 0.6)/4 + (0.3 + 0.3 + 1 + 1)/4 + (0.05 + 0.05 + 0.05 + 0.05)/4)
The maximum difference is approximately $0.4$ for the former and $0.7$ for the latter between the bimodal and skewed {\thething} distribution.
Therefore, we choose to utilize the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection.
While both methods share the same mean normalized distance of $0.4$, the Euclidean distance demonstrates a more consistent performance among all possible {\thething} distributions.
Therefore, we choose to utilize the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection in Section~\ref{subsec:lmdk-sel-sol}.
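To make the comparison of the two metrics concrete, the following is a minimal sketch of how the Euclidean and Wasserstein distances between two time series histograms could be computed. It is an illustration under stated assumptions and not the implementation used in our experiments: the function names are hypothetical, both histograms are assumed to share the same equally spaced unit-width bins, and each histogram is scaled to unit mass before comparison, which may differ from the normalization applied in Figure~\ref{fig:sel-dist}.

```python
import math


def to_distribution(hist):
    """Scale a histogram to unit mass (assumed normalization, hypothetical helper)."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def euclidean_distance(hist_a, hist_b):
    """Bin-wise Euclidean (L2) distance between two unit-mass histograms."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    p, q = to_distribution(hist_a), to_distribution(hist_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def wasserstein_distance(hist_a, hist_b):
    """1-Wasserstein distance for histograms on the same unit-width bins.

    In one dimension this equals the sum of absolute differences
    between the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    p, q = to_distribution(hist_a), to_distribution(hist_b)
    cumulative_gap = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        cumulative_gap += a - b
        distance += abs(cumulative_gap)
    return distance


if __name__ == "__main__":
    original = [1, 0, 0]
    shifted_near = [0, 1, 0]
    shifted_far = [0, 0, 1]
    print(euclidean_distance(original, shifted_near),
          euclidean_distance(original, shifted_far))
    print(wasserstein_distance(original, shifted_near),
          wasserstein_distance(original, shifted_far))
```

In this toy example the Euclidean distance is the same whether the mass moves by one bin or by two, whereas the Wasserstein distance grows with how far the mass moves. This sensitivity to where the mass lies is one plausible reason for the two metrics to behave differently across {\thething} distributions.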
\subsection{Budget allocation and {\thething} selection}
@ -53,8 +54,8 @@ Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptiv
\end{figure}
In comparison with the utility performance without the {\thething} selection component (Figure~\ref{fig:real}), we notice a slight deterioration for all three models.
This is natural since we allocated part of the available privacy budget to the {\thething} selection component which in turn increased the number of {\thethings}.
This is natural since we allocated part of the available privacy budget to the privacy-preserving {\thething} selection component which in turn increased the number of {\thethings}.
Therefore, there is less privacy budget available for data publishing throughout the time series for $0$\% and $100$\% {\thethings}.
Skip performs best in our experiments with HUE, due to the low range in the energy consumption and the high scale of the Laplace noise which it avoids due to its tendency to approximate.
However, for the Copenhagen data set and T-drive it attains greater mean absolute error than the user-level.
However, for the Copenhagen data set and T-drive it attains greater mean absolute error than the user-level protection scheme.
Overall, Adaptive has a consistent performance in terms of utility for all of the data sets that we experimented with.