Merge branch 'master' of git.delkappa.com:manos/the-last-thing

Manos Katsomallos 2021-10-12 23:26:41 +02:00
commit 26b9d5e197
2 changed files with 19 additions and 13 deletions

@@ -1,10 +1,11 @@
\section{Selection of events}
\section{Selection of landmarks}
\label{sec:eval-lmdk-sel}
In this section, we present the experiments that we performed, to test the methodology that we presented in Section~\ref{subsec:lmdk-sel-sol}, on real and synthetic data sets.
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}) we show the normalized Euclidean and Wasserstein distances of the time series histograms for various distributions and {\thething} percentages.
In this section, we present the experiments that evaluate the {\thething} selection methodology of Section~\ref{subsec:lmdk-sel-sol} on both the real and the synthetic data sets.
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}), we show the normalized Euclidean and Wasserstein distances \kat{is this distance the landmark distance that we saw just before? clarify} of the time series histograms for various distributions and {\thething} percentages.
This allows us to justify our design decisions for our concept that we showcased in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the performance in terms of utility of our three {\thething} mechanisms in combination with the privacy-preserving {\thething} selection component.
\kat{Mention whether it improves the original proposal or not.}
\subsection{{\Thething} selection utility metrics}
@@ -54,8 +55,10 @@ Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptiv
\end{figure}
In comparison with the utility performance without the {\thething} selection component (Figure~\ref{fig:real}), we notice a slight deterioration for all three models.
This is natural since we allocated part of the available privacy budget to the privacy-preserving {\thething} selection component which in turn increased the number of {\thethings}.
This is natural since we allocated part of the available privacy budget to the privacy-preserving {\thething} selection component, which in turn increased the number of {\thethings}.
Therefore, there is less privacy budget available for data publishing throughout the time series for $0$\% and $100$\% {\thethings}.
Skip performs best in our experiments with HUE, due to the low range in the energy consumption and the high scale of the Laplace noise which it avoids due to its tendency to approximate.
However, for the Copenhagen data set and T-drive it attains greater mean absolute error than the user-level protection scheme.
Overall, Adaptive has a consistent performance in terms of utility for all of the data sets that we experimented with.
\kat{why not for the other percentages?}
Skip performs best in our experiments with HUE, due to the low range in the energy consumption and the high scale of the Laplace noise, which it avoids due to the employed approximation.
However, for the Copenhagen data set and T-drive, Skip attains a greater mean absolute error than the user-level protection scheme, and thus offers no benefit over user-level protection.
Overall, Adaptive has a consistent performance in terms of utility for all of the data sets that we experimented with, and always outperforms user-level privacy.
Thus, it is selected as the best mechanism to use in general.

@@ -58,8 +58,11 @@ Moreover, designing a data-dependent sampling scheme \kat{what would be the main
\subsection{Temporal distance and correlation}
\label{subsec:lmdk-expt-cor}
Figure~\ref{fig:avg-dist} shows a comparison of the average temporal distance of the events from the previous/next {\thething} or the start/end of the time series for various distributions in synthetic data.
More particularly, we count for every event the total number of events between itself and the nearest {\thething} or the series edge.
As previously mentioned, temporal correlations are inherent in continuous publishing, and they cause supplementary privacy leakage in privacy-preserving data publication.
In this section, we study the effect that the distance of the {\thethings} from every event has on the leakage caused by temporal correlations.
Figure~\ref{fig:avg-dist} shows a comparison of the average temporal distance of the events from the previous/next {\thething} or the start/end of the time series for various distributions in our synthetic data.
More specifically, we model the distance of an event as the number of events between itself and the nearest {\thething} or the series edge.
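One possible formalization of this distance, given only as a sketch: assume (hypothetically) that the time series has timestamps $1,\dots,T$, that $L \subseteq \{1,\dots,T\}$ is the set of {\thething} timestamps, and that the series edges are treated as virtual positions $0$ and $T+1$.
These symbols, and the choice to average over regular events only, are illustrative assumptions that are not defined elsewhere in the text.
\[
  d(t) = \min_{p \in L \cup \{0,\, T+1\}} |t - p| - 1,
  \qquad t \in \{1,\dots,T\} \setminus L ,
\]
so that an event adjacent to a {\thething} or to a series edge has distance $0$.
Under the same assumptions, and for $|L| < T$, the average distance of a distribution would be
\[
  \bar{d} = \frac{1}{T - |L|} \sum_{t \in \{1,\dots,T\} \setminus L} d(t) .
\]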
\begin{figure}[htp]
\centering
@@ -72,6 +75,7 @@ We observe that the uniform and bimodal distributions tend to limit the regular
This is due to the fact that the former scatters the {\thethings}, while the latter distributes them on both edges, leaving a shorter space uninterrupted by {\thethings}.
% and as a result they reduce the uninterrupted space by landmarks in the sequence.
On the contrary, distributing the {\thethings} at one part of the sequence, as in the skewed or symmetric distributions, creates a wider space without {\thethings}.
This study provides us with different distance settings that we are going to use in the subsequent temporal leakage study.
Figure~\ref{fig:dist-cor} illustrates a comparison among the aforementioned distributions regarding the overall privacy loss under (a)~weak, (b)~moderate, and (c)~strong temporal correlation degrees.
The line shows the overall privacy loss---for all cases of {\thethings} distribution---without temporal correlation.
@@ -93,9 +97,8 @@ The line shows the overall privacy loss---for all cases of {\thethings} distribu
\label{fig:dist-cor}
\end{figure}
In combination with Figure~\ref{fig:avg-dist}, we conclude that a greater average event--{\thething} even distance in a distribution can result in greater overall privacy loss under moderate and strong temporal correlation.
In combination with Figure~\ref{fig:avg-dist}, we conclude that a greater average event--{\thething} event \kat{it was even, I changed it to event but do not know what you want to say} distance in a distribution can result in greater overall privacy loss under moderate and strong temporal correlation.
This is due to the fact that the backward/forward privacy loss accumulates more over time in wider spaces without {\thethings} (see Section~\ref{sec:correlation}).
Furthermore, the behavior of the privacy loss is as expected regarding the temporal correlation degree.
Predictably, a stronger correlation degree generates higher privacy loss while widening the gap between the different distribution cases.
Furthermore, the behavior of the privacy loss is as expected regarding the temporal correlation degree: a stronger correlation degree generates higher privacy loss while widening the gap between the different distribution cases.
On the contrary, a weaker correlation degree makes it harder to differentiate among the {\thethings} distributions.
The privacy loss under a weak correlation degree converges.
The privacy loss under a weak correlation degree converges \kat{with what?}.