\section{Selection of {\thethings}}
\label{sec:eval-lmdk-sel}
In this section, we present the experiments on the methodology for the dummy {\thething} selection presented in Section~\ref{subsec:lmdk-sel-sol}, on the real and synthetic data sets.
Due to the high complexity of the \texttt{Optimal} and \texttt{Heuristic} algorithms, we evaluate only \texttt{Partitioned}, which is the optimized solution that we designed.
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}), we show the normalized Euclidean and Wasserstein distance metrics (not to be confused with the temporal distances in Figure~\ref{fig:avg-dist})
% \kat{is this distance the landmark distance that we saw just before? clarify}
of the time series histograms for various distributions and {\thething} percentages.
This allows us to justify the design decisions for the concept that we showcased in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the utility of our three {\thething} schemes in combination with the privacy-preserving dummy {\thething} selection module, which enhances the privacy protection that our concept provides.
% \kat{Mention whether it improves the original proposal or not.}


\subsection{Dummy {\thething} selection utility metrics}
\label{subsec:sel-utl}
Figure~\ref{fig:sel-dist} shows the normalized distance that we obtain when we use either (a)~the Euclidean or (b)~the Wasserstein distance metric to obtain a set of {\thethings} that includes regular events.
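As a point of reference, the two metrics can be sketched as follows; the notation is assumed here for illustration and is not taken from Section~\ref{subsec:lmdk-sel-sol}.
Let $h$ and $h'$ denote the two time series histograms being compared, defined over the same $n$ equal-width bins and each scaled to unit total mass, and let $H_i = \sum_{j \le i} h_j$ and $H'_i = \sum_{j \le i} h'_j$ be their cumulative sums.
Then, with bin positions measured in units of the bin width,
\[
  d_{E}(h, h') = \sqrt{\sum_{i=1}^{n} (h_i - h'_i)^2}
  \qquad \mathrm{and} \qquad
  d_{W}(h, h') = \sum_{i=1}^{n} \left| H_i - H'_i \right| ,
\]
where $d_{W}$ is the one-dimensional first-order Wasserstein (earth mover's) distance.
We assume that each distance is subsequently rescaled to $[0, 1]$, e.g.,~by dividing by the largest value observed; the exact normalization may differ.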

\begin{figure}[htp]
  \centering
  \subcaptionbox{Euclidean\label{fig:sel-dist-norm}}{%
    \includegraphics[width=.495\linewidth]{evaluation/sel-dist-norm}%
  }%
  \hfill
  \subcaptionbox{Wasserstein\label{fig:sel-dist-emd}}{%
    \includegraphics[width=.495\linewidth]{evaluation/sel-dist-emd}%
  }%
  \caption{The normalized (a)~Euclidean, and (b)~Wasserstein distance of the generated {\thething} sets for different {\thething} percentages.}
  \label{fig:sel-dist}
\end{figure}

Comparing the results of the Euclidean distance in Figure~\ref{fig:sel-dist-norm} with those of the Wasserstein distance in Figure~\ref{fig:sel-dist-emd}, we conclude that the Euclidean distance provides more consistent results across the distributions that we tested.
The maximum difference per {\thething} percentage, which occurs between the bimodal and skewed {\thething} distributions, is approximately $0.2$ for the former and $0.15$ for the latter.
Overall, the Euclidean distance achieves a mean normalized distance of $0.3$, while the Wasserstein distance achieves a mean normalized distance of $0.2$.
By observing Figure~\ref{fig:sel-dist}, we see that the Wasserstein distance demonstrates a less consistent performance and a less linear behavior across the {\thething} distributions.
Thus, we use the Euclidean distance metric in the implementation of the privacy-preserving dummy {\thething} selection module (Section~\ref{subsec:lmdk-sel-sol}).


\subsection{Privacy budget tuning}
\label{subsec:sel-eps}
In Figure~\ref{fig:sel-eps}, we test the \texttt{Uniform} mechanism with real data by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the dummy {\thething} selection module and the remainder in perturbing the original data values, in order to identify the best ratio.
\texttt{Uniform} is our baseline implementation, and hence allows us to derive more accurate conclusions in this case.
In general, we expect that greater ratios will result in more accurate, i.e.,~smaller, {\thething} sets and in less accurate values in the released data.
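Concretely, assuming that the two stages compose sequentially, a ratio $r \in \{0.01, 0.1, 0.25, 0.5\}$ splits the budget as
\[
  \varepsilon_{\mathrm{sel}} = r \, \varepsilon
  \qquad \mathrm{and} \qquad
  \varepsilon_{\mathrm{pub}} = (1 - r) \, \varepsilon ,
  \qquad \mathrm{so\ that} \qquad
  \varepsilon_{\mathrm{sel}} + \varepsilon_{\mathrm{pub}} = \varepsilon ,
\]
where the symbols $r$, $\varepsilon_{\mathrm{sel}}$, and $\varepsilon_{\mathrm{pub}}$ are introduced here only for illustration.
For example, for $\varepsilon = 1$ and $r = 0.25$, the selection module receives $\varepsilon_{\mathrm{sel}} = 0.25$ and the data perturbation receives $\varepsilon_{\mathrm{pub}} = 0.75$.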

\begin{figure}[htp]
  \centering
  \subcaptionbox{Copenhagen\label{fig:copenhagen-sel-eps}}{%
    \includegraphics[width=.495\linewidth]{evaluation/copenhagen-sel-eps}%
  }%
  \hspace{\fill}
  \\ \bigskip
  \subcaptionbox{HUE\label{fig:hue-sel-eps}}{%
    \includegraphics[width=.495\linewidth]{evaluation/hue-sel-eps}%
  }%
  \hfill
  \subcaptionbox{T-drive\label{fig:t-drive-sel-eps}}{%
    \includegraphics[width=.495\linewidth]{evaluation/t-drive-sel-eps}%
  }%
  \caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the \texttt{Uniform} {\thething} privacy mechanism and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the dummy {\thething} selection module.}
  \label{fig:sel-eps}
\end{figure}

On the Copenhagen data set (Figure~\ref{fig:copenhagen-sel-eps}), the randomized response mechanism is tolerant to changes in the privacy budget and maintains a relatively constant performance in terms of data utility.
For HUE (Figure~\ref{fig:hue-sel-eps}) and T-drive (Figure~\ref{fig:t-drive-sel-eps}), we observe that our implementation performs better for lower ratios, e.g.,~$1$\%, where we allocate the majority of the available privacy budget to the data release process instead of the dummy {\thething} selection module.
The results of this experiment indicate that we can safely allocate the majority of $\varepsilon$ to the data publishing process, and therefore achieve better data utility, while still guaranteeing more robust privacy protection.
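The sensitivity to the ratio can be illustrated with the textbook forms of the two mechanisms; this is a sketch under assumptions, since the exact per-timestamp budget depends on the scheme, and $\varepsilon'$ below is a hypothetical symbol for the budget that reaches a single release.
Binary randomized response reports the true value with probability
\[
  p = \frac{e^{\varepsilon'}}{1 + e^{\varepsilon'}} ,
\]
whereas the Laplace mechanism for a query with sensitivity $\Delta$ adds noise with scale $b = \Delta / \varepsilon'$, whose expected absolute value is also $b$.
Moving from a ratio of $1$\% to a ratio of $50$\% reduces the publishing budget from $0.99 \, \varepsilon$ to $0.5 \, \varepsilon$; if $\varepsilon'$ shrinks proportionally, then the Laplace noise scale roughly doubles, while $p$ always remains between $0.5$ and $1$.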


\subsection{Privacy schemes and dummy {\thething} selection}
\label{subsec:sel-prv}
Figure~\ref{fig:real-sel} shows the performance of the \texttt{Skip}, \texttt{Uniform}, and \texttt{Adaptive} schemes (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the {\thething} selection module (Section~\ref{subsec:lmdk-sel-sol}).

\begin{figure}[htp]
  \centering
  \subcaptionbox{Copenhagen\label{fig:copenhagen-sel}}{%
    \includegraphics[width=.49\linewidth]{evaluation/copenhagen-sel}%
  }%
  \hfill
  \\ \bigskip
  \subcaptionbox{HUE\label{fig:hue-sel}}{%
    \includegraphics[width=.49\linewidth]{evaluation/hue-sel}%
  }%
  \hfill
  \subcaptionbox{T-drive\label{fig:t-drive-sel}}{%
    \includegraphics[width=.49\linewidth]{evaluation/t-drive-sel}%
  }%
  \caption{
    The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data, for different {\thething} percentages from Figure~\ref{fig:real}.
    The markers indicate the corresponding measurements with the incorporation of the privacy-preserving {\thething} selection module.
  }
  \label{fig:real-sel}
\end{figure}

In comparison with the utility performance without the dummy {\thething} selection module (solid bars), we notice a slight deterioration for all three schemes (markers).
This is natural, since we allocated part of the available privacy budget to the privacy-preserving dummy {\thething} selection module, which in turn increased the number of {\thethings}, except for the case of $100$\% {\thethings}.
Therefore, there is less privacy budget available for data publishing throughout the time series.
% for $0$\% and $100$\% {\thethings}.
% \kat{why not for the other percentages?}
\texttt{Skip} performs best in our experiments with HUE (Figure~\ref{fig:hue-sel}), owing to the low range of the energy consumption values and to the high-scale Laplace noise that it avoids through the employed approximation.
However, for the Copenhagen data set (Figure~\ref{fig:copenhagen-sel}) and T-drive (Figure~\ref{fig:t-drive-sel}), \texttt{Skip} attains a high mean absolute error, and thus offers no benefit over user-level protection.
Overall, \texttt{Adaptive} performs consistently in terms of utility for all of the data sets that we experimented with, and almost always outperforms user-level privacy protection.
Thus, we select \texttt{Adaptive} as the best scheme for general use.