evaluation: Updated shapes and minor corrections

Manos Katsomallos 2021-10-15 15:35:29 +02:00
parent 0607c316b2
commit ab8896ae56
9 changed files with 19 additions and 19 deletions

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@@ -2,8 +2,8 @@
\label{ch:eval}
In this chapter we present the experiments that we performed in order to evaluate {\thething} privacy (Chapter~\ref{ch:lmdk-prv}) on real and synthetic data sets.
Section~\ref{sec:eval-dtl} contains all the details regarding the data sets that we used for our experiments, along with the system configurations.
Section~\ref{sec:eval-lmdk} evaluates the data utility of the {\thething} privacy mechanisms that we designed in Section~\ref{sec:thething} and investigates the behavior of the privacy loss under temporal correlation for different distributions of {\thethings}.
Section~\ref{sec:eval-lmdk-sel} justifies our decisions while designing the privacy-preserving {\thething} selection mechanism in Section~\ref{sec:theotherthing} and the data utility impact of the latter.
Section~\ref{sec:eval-lmdk} evaluates the data utility of the {\thething} privacy schemes that we designed in Section~\ref{sec:thething} and investigates the behavior of the privacy loss under temporal correlation for different distributions of {\thethings}.
Section~\ref{sec:eval-lmdk-sel} justifies the decisions that we made while designing the privacy-preserving {\thething} selection module in Section~\ref{sec:theotherthing}, and assesses the data utility impact of the latter.
Finally, Section~\ref{sec:eval-sum} concludes this chapter by summarizing the main results derived from the experiments.
\input{evaluation/details}

View File

@@ -5,7 +5,7 @@ With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}) w
% \kat{is this distance the landmark distance that we saw just before ? clarify }
of the time series histograms for various distributions and {\thething} percentages.
This allows us to justify our design decisions for our concept that we showcased in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the performance in terms of utility of our three {\thething} mechanisms in combination with the privacy-preserving {\thething} selection mechanism, which enhances the privacy protection that our concept provides.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the performance in terms of utility of our three {\thething} schemes in combination with the privacy-preserving {\thething} selection module, which enhances the privacy protection that our concept provides.
% \kat{Mention whether it improves the original proposal or not.}
@@ -32,12 +32,12 @@ Comparing the results of the Euclidean distance in Figure~\ref{fig:sel-dist-norm
The maximum difference per {\thething} percentage is approximately $0.2$ for the former and $0.15$ for the latter between the bimodal and skewed {\thething} distributions.
Overall, the Euclidean distance achieves a mean normalized distance of $0.3$, while the Wasserstein distance achieves $0.2$.
Therefore, as Figure~\ref{fig:sel-dist} also shows, the Wasserstein distance demonstrates less consistent performance and less linear behavior across all possible {\thething} distributions.
Thus, we choose to utilize the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection mechanism in Section~\ref{subsec:lmdk-sel-sol}.
Thus, we choose to utilize the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection module in Section~\ref{subsec:lmdk-sel-sol}.
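To make the comparison concrete, a minimal sketch of how the two normalized distances between a pair of {\thething} histograms can be computed follows. The function names and the normalization choices are illustrative assumptions, not the exact implementation behind Figure~\ref{fig:sel-dist}.

    # Sketch: normalized Euclidean and 1-Wasserstein distances between two
    # histograms of a time series. Normalization choices are assumptions.
    import numpy as np
    from scipy.stats import wasserstein_distance

    def normalized_euclidean(h1, h2):
        # L2 distance between the histograms, scaled to [0, 1] by the
        # largest value it can attain for vectors of these norms.
        return np.linalg.norm(h1 - h2) / (np.linalg.norm(h1) + np.linalg.norm(h2))

    def normalized_wasserstein(h1, h2):
        # 1-Wasserstein (earth mover's) distance between the distributions
        # induced by the histograms, scaled by the width of the support.
        bins = np.arange(len(h1))
        return wasserstein_distance(bins, bins, u_weights=h1, v_weights=h2) / (len(h1) - 1)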
\subsection{Privacy budget tuning}
\label{subsec:sel-eps}
In Figure~\ref{fig:sel-eps} we test the Uniform mechanism in real data by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the {\thething} selection mechanism and the remaining to perturbing the original data values, in order to figure out the optimal ratio value.
In Figure~\ref{fig:sel-eps} we test the Uniform scheme on real data by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the {\thething} selection module and the remainder in perturbing the original data values, in order to determine the optimal ratio.
Uniform is our baseline implementation, and hence allows us to derive more accurate conclusions in this case.
In general, we expect to observe that greater ratios will result in more accurate, i.e.,~smaller, {\thething} sets and less accurate values in the released data.
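As a minimal sketch of the tuning knob in this experiment (the names and the uniform per-timestamp division of the publishing budget are illustrative assumptions):

    # Sketch: splitting the privacy budget between landmark selection and
    # data publishing for a given ratio; the uniform per-timestamp division
    # of the publishing budget is an assumption for illustration.
    def split_budget(epsilon, ratio, n_timestamps):
        eps_selection = ratio * epsilon           # invested in landmark selection
        eps_publishing = epsilon - eps_selection  # left for perturbing the values
        return eps_selection, eps_publishing / n_timestamps

    # With ratio 0.01, 99% of epsilon remains for the data release process.
    print(split_budget(1.0, 0.01, 100))  # (0.01, 0.0099)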
@@ -55,18 +55,18 @@ In general, we expect to observe that greater ratios will result in more
\subcaptionbox{T-drive\label{fig:t-drive-sel-eps}}{%
\includegraphics[width=.49\linewidth]{evaluation/t-drive-sel-eps}%
}%
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the Uniform {\thething} privacy mechanism and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the {\thething} selection mechanism.}
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the Uniform {\thething} privacy scheme and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the {\thething} selection module.}
\label{fig:sel-eps}
\end{figure}
The application of the randomized response mechanism to the Copenhagen data set (Figure~\ref{fig:copenhagen-sel-eps}) is tolerant to the fluctuations of the privacy budget and maintains a relatively constant performance in terms of data utility.
For HUE (Figure~\ref{fig:hue-sel-eps}) and T-drive (Figure~\ref{fig:t-drive-sel-eps}), we observe that our implementation performs better for lower ratios, e.g.,~$0.01$, where we end up allocating the majority of the available privacy budget to the data release process instead of the {\thething} selection mechanism.
For HUE (Figure~\ref{fig:hue-sel-eps}) and T-drive (Figure~\ref{fig:t-drive-sel-eps}), we observe that our implementation performs better for lower ratios, e.g.,~$0.01$, where we end up allocating the majority of the available privacy budget to the data release process instead of the {\thething} selection module.
The results of this experiment indicate that we can safely allocate the majority of $\varepsilon$ for publishing the data values, and therefore achieve better data utility, while providing more robust privacy protection to the {\thething} set.
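For reference, a textbook $\varepsilon$-differentially private randomized response for a single binary value, such as the presence or absence of a contact in the Copenhagen data set, can be sketched as follows; this is the standard mechanism, not necessarily the exact variant that we apply.

    # Sketch: standard epsilon-differentially private randomized response
    # for one binary value (e.g., contact present or not).
    import math, random

    def randomized_response(value, epsilon):
        # Report truthfully with probability e^eps / (e^eps + 1); flip otherwise.
        p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
        return value if random.random() < p_true else not value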
\subsection{Budget allocation and {\thething} selection}
\label{subsec:sel-prv}
Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptive mechanisms (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the {\thething} selection mechanism (Section~\ref{subsec:lmdk-sel-sol}).
Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptive schemes (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the {\thething} selection module (Section~\ref{subsec:lmdk-sel-sol}).
\begin{figure}[htp]
\centering
@@ -83,18 +83,18 @@ Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptiv
\includegraphics[width=.49\linewidth]{evaluation/t-drive-sel}%
}%
\caption{
The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data, for different {\thething} percentages, with the incorporation of the privacy-preserving {\thething} selection mechanism.
The light short horizontal lines indicate the corresponding measurements from Figure~\ref{fig:real} without the {\thething} selection mechanism.
The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data, for different {\thething} percentages from Figure~\ref{fig:real}.
The markers indicate the corresponding measurements with the incorporation of the privacy-preserving {\thething} selection module.
}
\label{fig:real-sel}
\end{figure}
In comparison with the utility performance without the {\thething} selection mechanism (light short horizontal lines), we notice a slight deterioration for all three mechanisms.
This is natural since we allocated part of the available privacy budget to the privacy-preserving {\thething} selection mechanism, which in turn increased the number of {\thethings}, except for the case of $100$\% {\thethings}.
In comparison with the utility performance without the {\thething} selection module (solid bars), we notice a slight deterioration for all three schemes (markers).
This is natural since we allocated part of the available privacy budget to the privacy-preserving {\thething} selection module, which in turn increased the number of {\thethings}, except for the case of $100$\% {\thethings}.
Therefore, there is less privacy budget available for data publishing throughout the time series.
% for $0$\% and $100$\% {\thethings}.
% \kat{why not for the other percentages?}
Skip performs best in our experiments with HUE (Figure~\ref{fig:hue-sel}), owing to the low range of the energy consumption values and the high scale of the Laplace noise, which it avoids thanks to the approximation that it employs.
However, for the Copenhagen data set (Figure~\ref{fig:copenhagen-sel}) and T-drive (Figure~\ref{fig:t-drive-sel}), Skip attains a high mean absolute error, which offers no benefit over user-level protection.
Overall, Adaptive performs consistently in terms of utility for all of the data sets that we experimented with, and almost always outperforms user-level privacy protection.
Thus, it is selected as the best mechanism to use in general.
Thus, it is selected as the best scheme to use in general.
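For orientation, a deliberately simplified caricature of how the three schemes differ in the privacy budget that they spend at a regular (non-{\thething}) event follows. The actual mechanisms are specified in Section~\ref{subsec:lmdk-mechs}; the uniform share and the adaptation factors below are our own assumptions.

    # Sketch: stylized per-event budget decision of the three schemes.
    # The uniform share and the adaptation factors are assumptions.
    def event_budget(t, scheme, landmarks, epsilon, is_similar=False):
        share = epsilon / (len(landmarks) + 1)  # assumed uniform share
        if t in landmarks:
            return share                        # landmarks always get budget
        if scheme == "Skip":
            return 0.0                          # approximate with the last release
        if scheme == "Uniform":
            return share                        # treat the event like a landmark
        if scheme == "Adaptive":
            # spend less when the current value resembles the last release
            return share * (0.5 if is_similar else 1.5)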

View File

@@ -8,11 +8,11 @@ With the experiments on the real data sets (Section~\ref{subsec:lmdk-expt-bgt}),
We define data utility as the mean absolute error introduced by the privacy mechanism.
We compare with the event- and user-level differential privacy protection levels, and show that, in the general case, {\thething} privacy allows for better data utility than user-level differential privacy while balancing between the two protection levels.
With the experiments on the synthetic data sets (Section~\ref{subsec:lmdk-expt-cor}) we show the overall privacy loss,
With the experiments on the synthetic data sets (Section~\ref{subsec:lmdk-expt-cor}) we show the temporal privacy loss,
% \kat{in the previous set of experiments we were measuring the MAE, now we are measuring the privacy loss... Why is that? Isn't it two sides of the same coin? }
i.e.,~the privacy budget $\varepsilon$ plus the extra privacy loss due to the temporal correlation, within our framework, when tuning the size and statistical characteristics of the input {\thething} set $L$.
% \kat{mention briefly what you observe}
We observe that a greater average {\thething}--regular event distance in a time series can result into greater overall privacy loss under moderate and strong temporal correlation.
We observe that a greater average {\thething}--regular event distance in a time series can result in greater temporal privacy loss under moderate and strong temporal correlation.
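The utility metric defined above, i.e.,~the mean absolute error between the original and the released series, amounts to a one-liner:

    # Sketch: mean absolute error between original and released series,
    # the data utility metric used throughout these experiments.
    import numpy as np

    def mean_absolute_error(original, released):
        return float(np.mean(np.abs(np.asarray(original) - np.asarray(released))))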
\subsection{Budget allocation schemes}
@@ -100,9 +100,9 @@ We observe that the uniform and bimodal distributions tend to limit the regular
This is due to the fact that the former scatters the {\thethings}, while the latter distributes them on both edges, leaving a shorter space uninterrupted by {\thethings}.
% and as a result they reduce the uninterrupted space by landmarks in the sequence.
On the contrary, distributing the {\thethings} at one part of the sequence, as in the skewed or symmetric distributions, creates a wider space without {\thethings}.
This study provides us with different distance settings that we are going to use in the subsequent overall privacy loss study.
This study provides us with different distance settings that we are going to use in the subsequent temporal privacy loss study.
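A hypothetical sketch of how such distance settings can be produced follows: it places {\thethings} on a sequence according to the four distributions and measures the average distance from each regular event to its nearest {\thething}. The sampling weights below are illustrative assumptions, not the exact generators that we used.

    # Sketch: placing k landmarks on n timestamps according to the four
    # distributions, then measuring the average distance from each regular
    # event to its nearest landmark. Sampling weights are assumptions.
    import numpy as np

    def place_landmarks(n, k, kind, rng):
        i = np.arange(n)
        if kind == "uniform":
            w = np.ones(n)
        elif kind == "bimodal":      # mass on both edges
            w = np.maximum(i, i[::-1]).astype(float)
        elif kind == "skewed":       # mass on one edge
            w = np.linspace(1.0, 0.01, n)
        elif kind == "symmetric":    # mass in the middle
            w = (n - np.abs(2 * i - n)).astype(float) + 1.0
        return rng.choice(n, size=k, replace=False, p=w / w.sum())

    def avg_landmark_distance(n, landmarks):
        regular = np.setdiff1d(np.arange(n), landmarks)
        return np.mean([np.abs(landmarks - t).min() for t in regular])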
Figure~\ref{fig:dist-cor} illustrates a comparison among the aforementioned distributions regarding the overall privacy loss under (a)~weak, (b)~moderate, and (c)~strong temporal correlation degrees.
Figure~\ref{fig:dist-cor} illustrates a comparison among the aforementioned distributions regarding the temporal privacy loss under (a)~weak, (b)~moderate, and (c)~strong temporal correlation degrees.
The line shows the overall privacy loss---for all cases of {\thething} distribution---without temporal correlation.
\begin{figure}[htp]
@@ -120,7 +120,7 @@ The line shows the overall privacy loss---for all cases of {\thething} distribut
\includegraphics[width=.49\linewidth]{evaluation/dist-cor-stg}%
}%
\caption{
The overall privacy loss (privacy budget $\varepsilon$)
The temporal privacy loss (privacy budget $\varepsilon$)
% \kat{what is the unit for privacy loss? I t should appear on the diagram}
% \mk{It's the privacy budget epsilon}
for different {\thething} percentages and distributions under (a)~weak, (b)~moderate, and (c)~strong degrees of temporal correlation.
@@ -132,7 +132,7 @@ The line shows the overall privacy loss---for all cases of {\thething} distribut
In combination with Figure~\ref{fig:avg-dist}, we conclude that a greater average {\thething}--regular event
% \kat{it was even, I changed it to event but do not know what youo want ot say}
% \mk{Fixed}
distance in a distribution can result into greater overall privacy loss under moderate and strong temporal correlation.
distance in a distribution can result in greater temporal privacy loss under moderate and strong temporal correlation.
This is due to the fact that the backward/forward privacy loss accumulates more over time in wider spaces without {\thethings} (see Section~\ref{sec:correlation}).
Furthermore, the behavior of the privacy loss is as expected regarding the temporal correlation degree: a stronger correlation degree generates higher privacy loss while widening the gap between the different distribution cases.
On the contrary, a weaker correlation degree makes it harder to differentiate among the {\thething} distributions.
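The accumulation effect can be caricatured with a toy recurrence, where each event in a {\thething}-free gap leaks its own budget plus a correlated fraction of the loss already accumulated; this is a stylized illustration only, the precise backward/forward quantification being the one of Section~\ref{sec:correlation}.

    # Sketch: toy recurrence illustrating how privacy loss can accumulate
    # in a gap without landmarks. 'corr' in [0, 1] caricatures the degree
    # of temporal correlation; see Section 'sec:correlation' for the
    # actual backward/forward temporal privacy loss.
    def gap_loss(eps_per_event, gap_length, corr):
        loss = 0.0
        for _ in range(gap_length):
            loss = eps_per_event + corr * loss
        return loss

    # Wider gaps and stronger correlation both push the loss higher.
    print(gap_loss(0.1, 5, 0.2))  # ~0.125
    print(gap_loss(0.1, 5, 0.8))  # ~0.336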