\section{Selection of {\thethings}}
\label{sec:eval-lmdk-sel}
In this section, we evaluate the {\thething} selection methodology of Section~\ref{subsec:lmdk-sel-sol} on the real and synthetic data sets.
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}), we report the normalized Euclidean and Wasserstein distances (not to be confused with the temporal distances in Figure~\ref{fig:avg-dist}) of the time series histograms for various {\thething} distributions and percentages.
These results justify the design decisions behind the concept that we presented in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we measure the utility of our three {\thething} schemes in combination with the privacy-preserving {\thething} selection module, which enhances the privacy protection that our concept provides.
\subsection{{\Thething} selection utility metrics}
\label{subsec:sel-utl}
Figure~\ref{fig:sel-dist} shows the normalized distance that we obtain when we employ either (a)~the Euclidean or (b)~the Wasserstein distance metric to generate a {\thething} set that also includes regular events.
\begin{figure}[htp]
\centering
\subcaptionbox{Euclidean\label{fig:sel-dist-norm}}{%
\includegraphics[width=.49\linewidth]{evaluation/sel-dist-norm}%
}%
\hfill
\subcaptionbox{Wasserstein\label{fig:sel-dist-emd}}{%
\includegraphics[width=.49\linewidth]{evaluation/sel-dist-emd}%
}%
\caption{The normalized (a)~Euclidean, and (b)~Wasserstein distance of the generated {\thething} sets for different {\thething} percentages.}
\label{fig:sel-dist}
\end{figure}
Comparing the results of the Euclidean distance in Figure~\ref{fig:sel-dist-norm} with those of the Wasserstein distance in Figure~\ref{fig:sel-dist-emd}, we conclude that the former provides more consistent results for all possible distributions.
The maximum difference per {\thething} percentage, observed between the bimodal and the skewed {\thething} distributions, is approximately $0.2$ for the Euclidean and $0.15$ for the Wasserstein distance.
Overall, the Euclidean distance achieves a mean normalized distance of $0.3$, while the Wasserstein distance achieves a mean normalized distance of $0.2$.
Nonetheless, as Figure~\ref{fig:sel-dist} shows, the Wasserstein distance demonstrates a less consistent performance and a less linear behavior across the possible {\thething} distributions.
Thus, we choose the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection module in Section~\ref{subsec:lmdk-sel-sol}.
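
For illustration, the following minimal sketch (in Python, assuming NumPy and SciPy are available; the histogram construction, the normalization choices, and all variable names are ours for the purposes of this example and are not taken from our implementation) shows how the two normalized distances can be computed between the histogram of an original {\thething} set and that of a candidate set that also contains regular events.
\begin{verbatim}
import numpy as np
from scipy.stats import wasserstein_distance

def histogram(timestamps, n_bins, horizon):
    # Bin a set of (landmark) timestamps over the time series horizon.
    hist, _ = np.histogram(timestamps, bins=n_bins, range=(0, horizon))
    return hist.astype(float)

def normalized_euclidean(h1, h2):
    # Euclidean distance between two histograms, scaled into [0, 1]
    # by the triangle-inequality bound ||h1|| + ||h2||.
    bound = np.linalg.norm(h1) + np.linalg.norm(h2)
    return np.linalg.norm(h1 - h2) / bound if bound > 0 else 0.0

def normalized_wasserstein(h1, h2, n_bins):
    # 1-Wasserstein (earth mover's) distance between two histograms,
    # scaled by the maximum possible transport distance (n_bins - 1).
    bins = np.arange(n_bins)
    d = wasserstein_distance(bins, bins, u_weights=h1, v_weights=h2)
    return d / (n_bins - 1)

# Original landmarks vs. a candidate set that also contains regular events.
original = [10, 25, 40, 80]
candidate = [10, 25, 40, 55, 80, 95]
h_orig = histogram(original, n_bins=10, horizon=100)
h_cand = histogram(candidate, n_bins=10, horizon=100)
print(normalized_euclidean(h_orig, h_cand))
print(normalized_wasserstein(h_orig, h_cand, n_bins=10))
\end{verbatim}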
\subsection{Privacy budget tuning}
\label{subsec:sel-eps}
In Figure~\ref{fig:sel-eps}, we test the Uniform scheme on the real data sets by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the {\thething} selection module and the rest in perturbing the original data values, in order to determine the optimal ratio.
Uniform is our baseline implementation, and hence allows us to draw the most reliable conclusions in this case.
In general, we expect greater ratios to result in more accurate, i.e.,~smaller, {\thething} sets, and in less accurate values in the released data.
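
To make the setup of this experiment more concrete, we provide a minimal sketch below (illustrative Python; the function names, the stubbed selection module, and the simplified per-timestamp allocation of the publishing budget are assumptions made for this example rather than our actual implementation) of how a given ratio splits $\varepsilon$ between the {\thething} selection module and the perturbation of the data values.
\begin{verbatim}
import numpy as np

def select_landmarks(series, landmarks, eps_select, rng):
    # Hypothetical stand-in for the privacy-preserving landmark selection
    # module: the actual module spends eps_select to output a set that
    # also contains regular events; here we simply return the input set.
    return set(landmarks)

def publish_uniform(series, landmarks, epsilon, ratio, sensitivity, rng):
    # Split the total budget: a ratio share goes to landmark selection,
    # the remainder is spent on perturbing the original data values.
    eps_select = ratio * epsilon
    eps_publish = (1.0 - ratio) * epsilon
    protected = select_landmarks(series, landmarks, eps_select, rng)
    # Simplified uniform allocation of the publishing budget over the series.
    eps_t = eps_publish / len(series)
    released = [v + rng.laplace(scale=sensitivity / eps_t) for v in series]
    return protected, released

rng = np.random.default_rng(0)
protected, released = publish_uniform(series=[1.2, 0.8, 1.5, 2.0],
                                      landmarks=[0, 2], epsilon=1.0,
                                      ratio=0.01, sensitivity=1.0, rng=rng)
\end{verbatim}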
\begin{figure}[htp]
\centering
\subcaptionbox{Copenhagen\label{fig:copenhagen-sel-eps}}{%
\includegraphics[width=.49\linewidth]{evaluation/copenhagen-sel-eps}%
}%
\hfill
\\ \bigskip
\subcaptionbox{HUE\label{fig:hue-sel-eps}}{%
\includegraphics[width=.49\linewidth]{evaluation/hue-sel-eps}%
}%
\hfill
\subcaptionbox{T-drive\label{fig:t-drive-sel-eps}}{%
\includegraphics[width=.49\linewidth]{evaluation/t-drive-sel-eps}%
}%
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the Uniform {\thething} privacy scheme and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the {\thething} selection module.}
\label{fig:sel-eps}
\end{figure}
The application of the randomized response mechanism to the Copenhagen data set (Figure~\ref{fig:copenhagen-sel-eps}) is tolerant to the fluctuations of the privacy budget and maintains a relatively constant performance in terms of data utility.
For HUE (Figure~\ref{fig:hue-sel-eps}) and T-drive (Figure~\ref{fig:t-drive-sel-eps}), we observe that our implementation performs better for lower ratios, e.g.,~$0.01$, where we allocate the majority of the available privacy budget to the data release process instead of the {\thething} selection module.
The results of this experiment indicate that we can safely allocate the majority of $\varepsilon$ to publishing the data values, and therefore achieve better data utility, while still providing robust privacy protection to the {\thething} set.
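
One way to build intuition for the tolerance of Copenhagen is to inspect how mildly the reporting probability of a standard two-outcome randomized response reacts to the ratios that we consider; the sketch below is illustrative only and assumes the textbook mechanism with truth probability $e^{\varepsilon} / (e^{\varepsilon} + 1)$, not the exact configuration of our Copenhagen experiments.
\begin{verbatim}
import numpy as np

def rr_truth_probability(epsilon):
    # Probability of reporting the true answer under two-outcome
    # randomized response with privacy budget epsilon.
    return np.exp(epsilon) / (np.exp(epsilon) + 1.0)

total_epsilon = 1.0  # example value, not one of our experimental settings
for ratio in (0.01, 0.1, 0.25, 0.5):
    eps_publish = (1.0 - ratio) * total_epsilon
    print(ratio, round(rr_truth_probability(eps_publish), 3))
\end{verbatim}
For a total budget of $1$, for instance, the truth probability moves only from roughly $0.73$ to $0.62$ across the whole ratio range, which is consistent with the relatively flat behavior of Copenhagen in Figure~\ref{fig:copenhagen-sel-eps}.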
\subsection{Budget allocation and {\thething} selection}
\label{subsec:sel-prv}
Figure~\ref{fig:real-sel} exhibits the performance of the Skip, Uniform, and Adaptive schemes (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the {\thething} selection module (Section~\ref{subsec:lmdk-sel-sol}).
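
For reference, the utility metric behind these plots is the mean absolute error of the released series; a minimal, illustrative sketch follows (for T-drive, the per-timestamp error would be a geographical distance in meters rather than a plain numerical difference).
\begin{verbatim}
import numpy as np

def mean_absolute_error(original, released):
    # Mean absolute per-timestamp error between the original
    # and the privacy-protected, released time series.
    original = np.asarray(original, dtype=float)
    released = np.asarray(released, dtype=float)
    return float(np.mean(np.abs(original - released)))
\end{verbatim}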
\begin{figure}[htp]
\centering
\subcaptionbox{Copenhagen\label{fig:copenhagen-sel}}{%
\includegraphics[width=.49\linewidth]{evaluation/copenhagen-sel}%
}%
\hfill
\\ \bigskip
\subcaptionbox{HUE\label{fig:hue-sel}}{%
\includegraphics[width=.49\linewidth]{evaluation/hue-sel}%
}%
\hfill
\subcaptionbox{T-drive\label{fig:t-drive-sel}}{%
\includegraphics[width=.49\linewidth]{evaluation/t-drive-sel}%
}%
\caption{
The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data, for different {\thething} percentages from Figure~\ref{fig:real}.
The markers indicate the corresponding measurements with the incorporation of the privacy-preserving {\thething} selection module.
}
\label{fig:real-sel}
\end{figure}
Compared with the utility performance without the {\thething} selection module (solid bars), we notice a slight deterioration for all three schemes (markers).
This is expected, since we allocate part of the available privacy budget to the privacy-preserving {\thething} selection module, which in turn increases the number of {\thethings}, except in the case of $100$\% {\thethings}.
Therefore, less privacy budget remains available for data publishing throughout the time series.
Skip performs best in our experiments on HUE (Figure~\ref{fig:hue-sel}), due to the low range of the energy consumption values and the high scale of the Laplace noise that it avoids thanks to the approximation that it employs.
However, for the Copenhagen (Figure~\ref{fig:copenhagen-sel}) and T-drive (Figure~\ref{fig:t-drive-sel}) data sets, Skip attains a high mean absolute error and thus offers no real benefit with respect to user-level protection.
Overall, Adaptive performs consistently in terms of utility on all of the data sets that we experimented with, and almost always outperforms user-level privacy protection.
Thus, we select it as the best scheme to use in general.