Merge branch 'master' of git.delkappa.com:manos/the-last-thing
commit
cc3c6d6c37
Binary file not shown.
@ -2,8 +2,8 @@
\label{ch:eval}
In this chapter we present the experiments that we performed in order to evaluate {\thething} privacy (Chapter~\ref{ch:lmdk-prv}) on real and synthetic data sets.
Section~\ref{sec:eval-dtl} contains all the details regarding the data sets that we used for our experiments along with the system configurations.
Section~\ref{sec:eval-lmdk} evaluates the data utility of the {\thething} privacy mechanisms that we designed in Section~\ref{sec:thething} and investigates the behavior of the privacy loss under temporal correlation for different distributions of {\thethings}.
Section~\ref{sec:eval-lmdk-sel} justifies the decisions that we made while designing the privacy-preserving {\thething} selection mechanism in Section~\ref{sec:theotherthing} and assesses the data utility impact of the latter.
Section~\ref{sec:eval-lmdk} evaluates the data utility of the {\thething} privacy schemes that we designed in Section~\ref{sec:thething} and investigates the behavior of the privacy loss under temporal correlation for different distributions of {\thethings}.
Section~\ref{sec:eval-lmdk-sel} justifies the decisions that we made while designing the privacy-preserving {\thething} selection module in Section~\ref{sec:theotherthing} and assesses the data utility impact of the latter.
Finally, Section~\ref{sec:eval-sum} concludes this chapter by summarizing the main results derived from the experiments.
\input{evaluation/details}
@ -5,7 +5,7 @@ With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}) w
% \kat{is this distance the landmark distance that we saw just before ? clarify }
of the time series histograms for various distributions and {\thething} percentages.
This allows us to justify our design decisions for our concept that we showcased in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the performance in terms of utility of our three {\thething} mechanisms in combination with the privacy-preserving {\thething} selection mechanism, which enhances the privacy protection that our concept provides.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the performance in terms of utility of our three {\thething} schemes in combination with the privacy-preserving {\thething} selection module, which enhances the privacy protection that our concept provides.
% \kat{Mention whether it improves the original proposal or not.}
@ -32,12 +32,12 @@ Comparing the results of the Euclidean distance in Figure~\ref{fig:sel-dist-norm
The maximum difference per {\thething} percentage is approximately $0.2$ for the former and $0.15$ for the latter between the bimodal and skewed {\thething} distributions.
Overall, the Euclidean distance achieves a mean normalized distance of $0.3$, while the Wasserstein distance achieves a mean normalized distance of $0.2$.
Therefore, and by observing Figure~\ref{fig:sel-dist}, Wasserstein demonstrates a less consistent performance and less linear behavior among all possible {\thething} distributions.
Thus, we choose to utilize the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection mechanism in Section~\ref{subsec:lmdk-sel-sol}.
Thus, we choose to utilize the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection module in Section~\ref{subsec:lmdk-sel-sol}.
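As an illustrative sketch of the two metrics compared above, assume two normalized histograms $p$ and $q$ over the same $n$ unit-width bins; this notation is hypothetical and is not taken from Section~\ref{subsec:lmdk-sel-sol}.
The Euclidean distance and, for the one-dimensional case, the first-order Wasserstein distance could then be written as:
% Assumption: p and q are normalized histograms over n unit-width bins (illustrative notation).
\[
d_{E}(p, q) = \sqrt{\sum_{i=1}^{n} \left( p_i - q_i \right)^2},
\qquad
d_{W}(p, q) = \sum_{i=1}^{n} \left| \sum_{j=1}^{i} \left( p_j - q_j \right) \right|
\]
The second expression accumulates the absolute difference between the two cumulative histograms; any normalization applied before plotting is not covered by this sketch.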
\subsection{Privacy budget tuning}
\label{subsec:sel-eps}
In Figure~\ref{fig:sel-eps} we test the Uniform mechanism on real data by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the {\thething} selection mechanism and the remainder in perturbing the original data values, in order to determine the optimal ratio.
In Figure~\ref{fig:sel-eps} we test the Uniform scheme on real data by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the {\thething} selection module and the remainder in perturbing the original data values, in order to determine the optimal ratio.
Uniform is our baseline implementation, and hence allows us to derive more accurate conclusions in this case.
In general, we are expecting to observe that greater ratios will result in more accurate, i.e.,~smaller, {\thething} sets and less accurate values in the released data.
@ -55,18 +55,18 @@ In general, we are expecting to observe that greater ratios will result in more
\subcaptionbox{T-drive\label{fig:t-drive-sel-eps}}{%
\includegraphics[width=.49\linewidth]{evaluation/t-drive-sel-eps}%
}%
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the Uniform {\thething} privacy mechanism and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the {\thething} selection mechanism.}
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the Uniform {\thething} privacy scheme and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the {\thething} selection module.}
\label{fig:sel-eps}
\end{figure}

The application of the randomized response mechanism in the Copenhagen data set (Figure~\ref{fig:copenhagen-sel-eps}) is tolerant to the fluctuations of the privacy budget and maintains a relatively constant performance in terms of data utility.
For HUE (Figure~\ref{fig:hue-sel-eps}) and T-drive (Figure~\ref{fig:t-drive-sel-eps}), we observe that our implementation performs better for lower ratios, e.g.,~$0.01$, where we end up allocating the majority of the available privacy budget to the data release process instead of the {\thething} selection mechanism.
For HUE (Figure~\ref{fig:hue-sel-eps}) and T-drive (Figure~\ref{fig:t-drive-sel-eps}), we observe that our implementation performs better for lower ratios, e.g.,~$0.01$, where we end up allocating the majority of the available privacy budget to the data release process instead of the {\thething} selection module.
The results of this experiment indicate that we can safely allocate the majority of $\varepsilon$ for publishing the data values, and therefore achieve better data utility, while providing more robust privacy protection to the {\thething} set.
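As a sketch of the budget split studied above, let $r \in (0, 1)$ denote the ratio of $\varepsilon$ invested in the {\thething} selection; the symbols $r$, $\varepsilon_{s}$, and $\varepsilon_{p}$ are illustrative assumptions and not part of the original notation.
% Assumption: r, \varepsilon_{s} (selection), and \varepsilon_{p} (publishing) are illustrative symbols.
\[
\varepsilon_{s} = r \, \varepsilon,
\qquad
\varepsilon_{p} = (1 - r) \, \varepsilon,
\qquad
\varepsilon_{s} + \varepsilon_{p} = \varepsilon
\]
By sequential composition, the two steps together consume the whole budget $\varepsilon$.
For instance, with $\varepsilon = 1$ and $r = 0.01$, the selection receives $0.01$ and the data release $0.99$; for values perturbed with the Laplace mechanism, a larger $\varepsilon_{p}$ implies a smaller noise scale, since the scale is inversely proportional to the budget spent on each release.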
\subsection{Budget allocation and {\thething} selection}
\label{subsec:sel-prv}
Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptive mechanisms (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the {\thething} selection mechanism (Section~\ref{subsec:lmdk-sel-sol}).
Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptive schemes (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the {\thething} selection module (Section~\ref{subsec:lmdk-sel-sol}).
\begin{figure}[htp]
\centering
@ -83,18 +83,18 @@ Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptiv
\includegraphics[width=.49\linewidth]{evaluation/t-drive-sel}%
}%
\caption{
The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data, for different {\thething} percentages, with the incorporation of the privacy-preserving {\thething} selection mechanism.
The light short horizontal lines indicate the corresponding measurements from Figure~\ref{fig:real} without the {\thething} selection mechanism.
The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data, for different {\thething} percentages from Figure~\ref{fig:real}.
The markers indicate the corresponding measurements with the incorporation of the privacy-preserving {\thething} selection module.
}
\label{fig:real-sel}
\end{figure}

In comparison with the utility performance without the {\thething} selection mechanism (light short horizontal lines), we notice a slight deterioration for all three mechanisms.
This is natural since we allocated part of the available privacy budget to the privacy-preserving {\thething} selection mechanism, which in turn increased the number of {\thethings}, except for the case of $100$\% {\thethings}.
In comparison with the utility performance without the {\thething} selection module (solid bars), we notice a slight deterioration for all three schemes (markers).
This is natural since we allocated part of the available privacy budget to the privacy-preserving {\thething} selection module, which in turn increased the number of {\thethings}, except for the case of $100$\% {\thethings}.
Therefore, there is less privacy budget available for data publishing throughout the time series.
% for $0$\% and $100$\% {\thethings}.
% \kat{why not for the other percentages?}
Skip performs best in our experiments with HUE (Figure~\ref{fig:hue-sel}), owing to the low range in the energy consumption and the high scale of the Laplace noise, which it avoids thanks to the employed approximation.
However, for the Copenhagen data set (Figure~\ref{fig:copenhagen-sel}) and T-drive (Figure~\ref{fig:t-drive-sel}), Skip attains a high mean absolute error, which shows no benefit with respect to user-level protection.
Overall, Adaptive has a consistent performance in terms of utility for all of the data sets that we experimented with, and almost always outperforms the user-level privacy protection.
Thus, it is selected as the best mechanism to use in general.
Thus, it is selected as the best scheme to use in general.
@ -8,11 +8,11 @@ With the experiments on the real data sets (Section~\ref{subsec:lmdk-expt-bgt}),
We define data utility as the mean absolute error introduced by the privacy mechanism.
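As a sketch, assuming a scalar time series with original values $x_t$ and released values $\hat{x}_t$ over $T$ timestamps (illustrative notation that does not appear in the preceding chapters), the mean absolute error reads:
% Assumption: x_t, \hat{x}_t, and T are illustrative symbols for the original values, the released values, and the series length.
\[
\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \left| x_t - \hat{x}_t \right|
\]
A lower value indicates better data utility.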
We compare with the event- and user-level differential privacy protection levels, and show that, in the general case, {\thething} privacy allows for better data utility than user-level differential privacy while balancing between the two protection levels.

With the experiments on the synthetic data sets (Section~\ref{subsec:lmdk-expt-cor}) we show the overall privacy loss,
With the experiments on the synthetic data sets (Section~\ref{subsec:lmdk-expt-cor}) we show the temporal privacy loss,
% \kat{in the previous set of experiments we were measuring the MAE, now we are measuring the privacy loss... Why is that? Isn't it two sides of the same coin? }
i.e.,~the privacy budget $\varepsilon$ plus the extra privacy loss due to temporal correlation, within our framework when tuning the size and statistical characteristics of the input {\thething} set $L$.
% \kat{mention briefly what you observe}
We observe that a greater average {\thething}--regular event distance in a time series can result in greater overall privacy loss under moderate and strong temporal correlation.
We observe that a greater average {\thething}--regular event distance in a time series can result in greater temporal privacy loss under moderate and strong temporal correlation.
\subsection{Budget allocation schemes}
@ -100,9 +100,9 @@ We observe that the uniform and bimodal distributions tend to limit the regular
This is due to the fact that the former scatters the {\thethings}, while the latter distributes them on both edges, leaving a shorter space uninterrupted by {\thethings}.
% and as a result they reduce the uninterrupted space by landmarks in the sequence.
On the contrary, distributing the {\thethings} at one part of the sequence, as in skewed or symmetric, creates a wider space without {\thethings}.
This study provides us with different distance settings that we are going to use in the subsequent overall privacy loss study.
This study provides us with different distance settings that we are going to use in the subsequent temporal privacy loss study.

Figure~\ref{fig:dist-cor} illustrates a comparison among the aforementioned distributions regarding the overall privacy loss under (a)~weak, (b)~moderate, and (c)~strong temporal correlation degrees.
Figure~\ref{fig:dist-cor} illustrates a comparison among the aforementioned distributions regarding the temporal privacy loss under (a)~weak, (b)~moderate, and (c)~strong temporal correlation degrees.
The line shows the overall privacy loss---for all cases of {\thething} distribution---without temporal correlation.
\begin{figure}[htp]
@ -120,9 +120,9 @@ The line shows the overall privacy loss---for all cases of {\thething} distribut
\includegraphics[width=.49\linewidth]{evaluation/dist-cor-stg}%
}%
\caption{
The overall privacy loss (privacy budget $\varepsilon$)
The temporal privacy loss
% \kat{what is the unit for privacy loss? It should appear on the diagram}
% \mk{It's the privacy budget epsilon}
% \mk{It's the privacy budget epsilon plus correlations}
for different {\thething} percentages and distributions under (a)~weak, (b)~moderate, and (c)~strong degrees of temporal correlation.
The line shows the overall privacy loss without temporal correlation.
}
@ -132,7 +132,7 @@ The line shows the overall privacy loss---for all cases of {\thething} distribut
In combination with Figure~\ref{fig:avg-dist}, we conclude that a greater average {\thething}--regular event
% \kat{it was even, I changed it to event but do not know what you want to say}
% \mk{Fixed}
distance in a distribution can result in greater overall privacy loss under moderate and strong temporal correlation.
distance in a distribution can result in greater temporal privacy loss under moderate and strong temporal correlation.
This is due to the fact that the backward/forward privacy loss accumulates more over time in wider spaces without {\thethings} (see Section~\ref{sec:correlation}).
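As a sketch of this accumulation, assume the usual recursive formulation of temporal privacy loss, where $\alpha^{B}_{t}$ and $\alpha^{F}_{t}$ are the backward and forward privacy loss at timestamp $t$, $\varepsilon_{t}$ is the budget spent at $t$, and $\mathcal{L}^{B}$ and $\mathcal{L}^{F}$ are the loss functions induced by the backward and forward correlation; this notation is an assumption and may differ from that of Section~\ref{sec:correlation}.
% Assumption: the symbols below are illustrative and may not match the notation of the referenced section.
\[
\alpha^{B}_{t} = \mathcal{L}^{B}\left( \alpha^{B}_{t-1} \right) + \varepsilon_{t},
\qquad
\alpha^{F}_{t} = \mathcal{L}^{F}\left( \alpha^{F}_{t+1} \right) + \varepsilon_{t},
\qquad
\alpha_{t} = \alpha^{B}_{t} + \alpha^{F}_{t} - \varepsilon_{t}
\]
Under this formulation, each timestamp inherits part of the loss of its neighbor, which is consistent with the loss building up over longer uninterrupted runs of timestamps.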
Furthermore, the behavior of the privacy loss is as expected regarding the temporal correlation degree: a stronger correlation degree generates higher privacy loss while widening the gap between the different distribution cases.
On the contrary, a weaker correlation degree makes it harder to differentiate among the {\thething} distributions.