
\section{Significant events}
\label{sec:lmdk-eval}

% \kat{After discussing with Dimitris, I thought you are keeping one chapter for the proposals of the thesis. In this case, it would be more clean to keep the theoretical contributions in one chapter and the evaluation in a separate chapter. }
% \mk{OK.}
In this section, we present the experiments that we performed on real and synthetic data sets.
With the experiments on the synthetic data sets, we show the privacy loss of our framework when tuning the size and statistical characteristics of the input {\thething} set $L$.
We also show how the number and distribution of the {\thethings} affect the privacy loss under temporal correlation.
With the experiments on the real data sets, we show the utility of our three {\thething} mechanisms.

Notice that, in our experiments, when $0\%$ and $100\%$ of the events are {\thethings}, we get the same behavior as in event- and user-level privacy, respectively.
This happens because, when there are no {\thethings}, at each timestamp we take into account only the data items at the current timestamp and ignore the rest of the time series (event-level).
In contrast, when every timestamp corresponds to a {\thething}, we consider and protect all the events throughout the entire series (user-level).


\subsection{Experiments}

\paragraph{Budget allocation schemes}

Figure~\ref{fig:real} exhibits the performance of the three mechanisms: Skip, Uniform, and Adaptive.

\begin{figure}[htp]
  \centering
  \subcaptionbox{Geolife\label{fig:geolife}}{%
    \includegraphics[width=.5\linewidth]{geolife}%
  }%
  \subcaptionbox{T-drive\label{fig:t-drive}}{%
    \includegraphics[width=.5\linewidth]{t-drive}%
  }%
  \caption{The mean absolute error (in meters) of the released data for different {\thethings} percentages.}
  \label{fig:real}
\end{figure}

For the Geolife data set (Figure~\ref{fig:geolife}), Skip has the best performance (measured in mean absolute error, in meters) because, by approximating the {\thething} data based on previous releases, it invests the most budget overall at every regular event.
Due to the data set's high density (one point every $1$--$5$ seconds or every $5$--$10$ meters), approximating constantly has a low impact on the data utility.
On the contrary, the lower density of the T-drive data set (Figure~\ref{fig:t-drive}) has a negative impact on the performance of Skip.
In the T-drive data set, the Adaptive mechanism outperforms the Uniform one by $10\%$--$20\%$ for all {\thethings} percentages greater than $0$, and the Skip one by more than $20\%$.
In general, we can claim that Adaptive is the best performing mechanism, if we take into consideration the drawbacks of the Skip mechanism mentioned in Section~\ref{subsec:lmdk-mechs}.
Moreover, designing a data-dependent sampling scheme would possibly yield better results for Adaptive.
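
For reference, and assuming the standard definition of the metric, the mean absolute error reported in Figure~\ref{fig:real} can be written as follows.
The symbols $n$, $x_i$, $\hat{x}_i$, and $d$ are illustrative notation that we introduce only for this sketch:
\[
  \mathit{MAE} = \frac{1}{n} \sum_{i = 1}^{n} d(x_i, \hat{x}_i)
\]
Here, $n$ is the number of released events, $x_i$ is the actual location of the $i$-th event, $\hat{x}_i$ is the corresponding released location, and $d(\cdot, \cdot)$ is a distance between two locations expressed in meters.
Lower values indicate that the released data stay closer to the original data, i.e.,~higher utility.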


\paragraph{Temporal distance and correlation}
Figure~\ref{fig:avg-dist} compares the average temporal distance of the events from the previous/next {\thething} or the start/end of the time series for various distributions in synthetic data.
More specifically, for every event we count the total number of events between itself and the nearest {\thething} or the series edge.
We observe that the uniform and bimodal distributions tend to limit the regular event--{\thething} distance.
This is because the former scatters the {\thethings}, while the latter distributes them on both edges, leaving a shorter space uninterrupted by {\thethings}.
% and as a result they reduce the uninterrupted space by landmarks in the sequence.
On the contrary, concentrating the {\thethings} at one part of the sequence, as in the skewed or symmetric distributions, creates a wider space without {\thethings}.
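
As a sketch of this measurement, assume a time series with timestamps $1, \dots, T$, a set $L$ of {\thething} timestamps, and two virtual boundaries at $0$ and $T + 1$ that stand for the series edges.
The symbols $T$, $d_t$, and $\bar{d}$, as well as the boundary convention, are assumptions that we make only for this illustration.
The distance of the event at timestamp $t$ and the average distance over the series can then be written as:
\[
  d_t = \min_{b \in L \cup \{0, T + 1\}} |t - b|,
  \qquad
  \bar{d} = \frac{1}{T} \sum_{t = 1}^{T} d_t
\]
Counting only the events strictly between $t$ and $b$, instead of the difference of their timestamps, shifts each $d_t$ by at most one and does not change the comparison among the distributions.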

\begin{figure}[htp]
  \centering
  \includegraphics[width=.5\linewidth]{avg-dist}%
  \caption{Average temporal distance of the events from the {\thethings} for different {\thethings} percentages and distributions within a time series.}
  \label{fig:avg-dist}
\end{figure}

Figure~\ref{fig:dist-cor} compares the aforementioned distributions with respect to the overall privacy loss under moderate (Figure~\ref{fig:dist-cor-mod}) and strong (Figure~\ref{fig:dist-cor-stg}) correlation degrees.
The line shows the overall privacy loss without temporal correlation, which is the same for all {\thethings} distributions.
We do not discuss the results under a weak correlation degree (Figure~\ref{fig:dist-cor-wk}), since the distributions converge in this case.
Combined with Figure~\ref{fig:avg-dist}, these results show that a greater average event--{\thething} distance in a distribution can result in greater overall privacy loss under moderate and strong temporal correlation.
This is because the backward/forward privacy loss accumulates more over time in wider spaces without {\thethings} (see Section~\ref{subsec:correlations}).
Furthermore, the privacy loss behaves as expected with respect to the temporal correlation degree: a stronger correlation degree generates higher privacy loss while widening the gap between the different distribution cases.
On the contrary, a weaker correlation degree makes it harder to differentiate among the {\thethings} distributions.

\begin{figure}[htp]
  \centering
  \subcaptionbox{Weak correlation\label{fig:dist-cor-wk}}{%
    \includegraphics[width=.5\linewidth]{dist-cor-wk}%
  }%
  \hspace{\fill}
  \subcaptionbox{Moderate correlation\label{fig:dist-cor-mod}}{%
    \includegraphics[width=.5\linewidth]{dist-cor-mod}%
  }%
  \subcaptionbox{Strong correlation\label{fig:dist-cor-stg}}{%
    \includegraphics[width=.5\linewidth]{dist-cor-stg}%
  }%
  \caption{Privacy loss for different {\thethings} percentages and distributions, under weak, moderate, and strong degrees of temporal correlation.
  The line shows the overall privacy loss without temporal correlation.}
  \label{fig:dist-cor}
\end{figure}