diff --git a/text/evaluation/details.tex b/text/evaluation/details.tex
index 817abfa..7ca5414 100644
--- a/text/evaluation/details.tex
+++ b/text/evaluation/details.tex
@@ -8,7 +8,7 @@ In this section we list all the relevant details regarding the evaluation settin
 \label{subsec:eval-setup}
 
 We implemented our experiments\footnote{Code available at \url{https://git.delkappa.com/manos/the-last-thing}} in Python $3$.$9$.$7$ and executed them on a machine with an Intel i$7$-$6700$HQ at $3$.$5$GHz CPU and $16$GB RAM, running Manjaro Linux $21$.$1$.$5$.
-We repeated each experiment $100$ times and we report the mean over these iterations. \kat{It could be interested to report also on the diagrams the std}
+We repeated each experiment $100$ times and we report the mean over these iterations. \kat{It could be interesting to report also on the diagrams the std}
 
 
 \subsection{Data sets}
diff --git a/text/evaluation/thething.tex b/text/evaluation/thething.tex
index 4cae89d..139e829 100644
--- a/text/evaluation/thething.tex
+++ b/text/evaluation/thething.tex
@@ -1,34 +1,23 @@
-\section{Significant events}
+\section{Landmark events}
 \label{sec:eval-lmdk}
 % \kat{After discussing with Dimitris, I thought you are keeping one chapter for the proposals of the thesis. In this case, it would be more clean to keep the theoretical contributions in one chapter and the evaluation in a separate chapter. }
 % \mk{OK.}
 
 In this section, we present the experiments that we performed, to test the methodology that we presented in Section~\ref{subsec:lmdk-sol}, on real and synthetic data sets.
-With the experiments on the real data sets (Section~\ref{subsec:lmdk-expt-bgt}), we show the performance in terms of utility of our three {\thething} mechanisms.
-With the experiments on the synthetic data sets (Section~\ref{subsec:lmdk-expt-cor}) we show the privacy loss by our framework when tuning the size and statistical characteristics of the input {\thething} set $L$ with special emphasis on how the privacy loss under temporal correlation is affected by the number and distribution of the {\thethings}.
+
+With the experiments on the real data sets (Section~\ref{subsec:lmdk-expt-bgt}), we show the performance in terms of data utility of our three {\thething} privacy budget allocation schemes: Skip, Uniform and Adaptive.
+We define data utility as the Mean Absolute Error introduced by the privacy mechanism.
+We compare with the event and user differential privacy, and show that in the general case, {\thething} privacy allows for better data utility than user differential privacy.
+
+With the experiments on the synthetic data sets (Section~\ref{subsec:lmdk-expt-cor}) we show the privacy loss \kat{in the previous set of experiments we were measuring the MAE, now we are measuring the privacy loss... Why is that? Isn't it two sides of the same coin? }by our framework when tuning the size and statistical characteristics of the input {\thething} set $L$ with special emphasis on how the privacy loss under temporal correlation is affected by the number and distribution of the {\thethings}.
+\kat{mention briefly what you observe}
+
 
 
 \subsection{Budget allocation schemes}
 \label{subsec:lmdk-expt-bgt}
 
-Figure~\ref{fig:real} exhibits the performance of the three mechanisms: Skip, Uniform, and Adaptive.
-
-\begin{figure}[htp]
-  \centering
-  \subcaptionbox{Copenhagen\label{fig:copenhagen}}{%
-    \includegraphics[width=.5\linewidth]{evaluation/copenhagen}%
-  }%
-  \hspace{\fill}
-  \subcaptionbox{HUE\label{fig:hue}}{%
-    \includegraphics[width=.5\linewidth]{evaluation/hue}%
-  }%
-  \subcaptionbox{T-drive\label{fig:t-drive}}{%
-    \includegraphics[width=.5\linewidth]{evaluation/t-drive}%
-  }%
-  \caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages.}
-  \label{fig:real}
-\end{figure}
-
+Figure~\ref{fig:real} exhibits the performance of the three mechanisms: Skip, Uniform, and Adaptive, for the three data sets that we study.
 % For the Geolife data set (Figure~\ref{fig:geolife}), Skip has the best performance (measured in Mean Absolute Error, in meters) because it invests the most budget overall at every regular event, by approximating the {\thething} data based on previous releases.
 % Due to the data set's high density (every $1$--$5$ seconds or every $5$--$10$ meters per point) approximating constantly has a low impact on the data utility.
 % On the contrary, the lower density of the T-drive data set (Figure~\ref{fig:t-drive}) has a negative impact on the performance of Skip.
@@ -40,6 +29,21 @@ In general, a scheme that favors approximation over noise injection would achiev
 However, the Adaptive model performs by far better than Uniform and strikes a nice balance between event- and user-level protection for all {\thething} percentages.
 In the T-drive data set (Figure~\ref{fig:t-drive}), the Adaptive mechanism outperforms Uniform by $10$\%--$20$\% for all {\thething} percentages greater than $40$\% and Skip by more than $20$\%.
 The lower density (average distance of $623$m) of the T-drive data set has a negative impact on the performance of Skip.
+\begin{figure}[htp]
+  \centering
+  \subcaptionbox{Copenhagen\label{fig:copenhagen}}{%
+    \includegraphics[width=.5\linewidth]{evaluation/copenhagen}%
+  }%
+  \hspace{\fill}
+  \subcaptionbox{HUE\label{fig:hue}}{%
+    \includegraphics[width=.5\linewidth]{evaluation/hue}%
+  }%
+  \subcaptionbox{T-drive\label{fig:t-drive}}{%
+    \includegraphics[width=.5\linewidth]{evaluation/t-drive}%
+  }%
+  \caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages.}
+  \label{fig:real}
+\end{figure}
 
 In general, we can claim that the Adaptive is the most reliable and best performing mechanism with minimal tuning, if we take into consideration the drawbacks of the Skip mechanism mentioned in Section~\ref{subsec:lmdk-mechs}.
 Moreover, designing a data-dependent sampling scheme would possibly result in better results for Adaptive.
@@ -78,7 +82,7 @@ The line shows the overall privacy loss---for all cases of {\thethings} distribu
   \subcaptionbox{Strong correlation\label{fig:dist-cor-stg}}{%
     \includegraphics[width=.5\linewidth]{evaluation/dist-cor-stg}%
   }%
-  \caption{Privacy loss \kat{what is the unit for privacy loss? I t should appear on the diagram} for different {\thethings} percentages and distributions under (a)~weak, (b)~moderate, and (c)~strong degrees of temporal correlation.
   The line shows the overall privacy loss without temporal correlation.}
   \label{fig:dist-cor}
 \end{figure}