evaluation: Reviewed

Manos Katsomallos 2021-10-25 03:12:10 +02:00
parent 271341603c
commit a4e4da0b24
4 changed files with 51 additions and 52 deletions


@ -2,10 +2,10 @@
\label{ch:eval}
\nnfootnote{This chapter is under review for publication in the proceedings of the $12$th ACM Conference on Data and Application Security and Privacy~\cite{katsomallos2022landmark}.\bigskip}
In this chapter, we present the experiments that we performed to evaluate {\thething} privacy (Chapter~\ref{ch:lmdk-prv}) on real and synthetic data sets.
Section~\ref{sec:eval-dtl} contains all the details regarding the data sets that we used for our experiments along with the system configurations.
Section~\ref{sec:eval-lmdk} evaluates the data utility of the {\thething} privacy schemes that we designed in Section~\ref{sec:thething} in comparison to user- and event-level privacy, and investigates the behavior of the privacy loss under temporal correlation for different distributions of {\thethings}.
Section~\ref{sec:eval-lmdk-sel} justifies our design decisions for the privacy-preserving dummy {\thething} selection module of Section~\ref{sec:theotherthing} and evaluates its impact on data utility.
Finally, Section~\ref{sec:eval-sum} concludes this chapter by summarizing the main results derived from the experiments.
\input{evaluation/details}


@ -1,10 +1,10 @@
\section{Summary}
\label{sec:eval-sum}
In this chapter, we presented the experimental evaluation, on real and synthetic data sets, of the {\thething} privacy schemes and the dummy {\thething} selection module that we developed in Chapter~\ref{ch:lmdk-prv}.
The \texttt{Adaptive} scheme is the most reliable and best-performing scheme in terms of overall data utility, with minimal tuning in most cases.
\texttt{Skip} performs best on data sets with a smaller range of target values, where approximation fits best.
The dummy {\thething} selection module introduces a reasonable data utility decline to all of our schemes; however, \texttt{Adaptive} handles it well and keeps the data utility at higher levels than user-level protection.
% \kat{it would be nice to see it clearly on Figure 5.5. (eg, by including another bar that shows adaptive without landmark selection)}
% \mk{Done.}
In terms of temporal correlation, we observe that, under moderate and strong temporal correlation, a greater average regular--{\thething} event distance in a {\thething} distribution causes greater temporal privacy loss.
Finally, the contribution of {\thething} privacy to enhancing data utility, while preserving $\varepsilon$-differential privacy, is demonstrated by the fact that the selected \texttt{Adaptive} scheme provides better data utility than user-level privacy protection.


@ -1,70 +1,69 @@
\section{Selection of {\thethings}}
\label{sec:eval-lmdk-sel}
In this section, we present the experiments, on real and synthetic data sets, on the dummy {\thething} selection methodology presented in Section~\ref{subsec:lmdk-sel-sol}.
Due to the high complexity of the \texttt{Optimal} and \texttt{Heuristic} algorithms, we evaluate only \texttt{Partitioned}, the optimized solution that we designed.
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}), we show the normalized Euclidean and Wasserstein distances (not to be confused with the temporal distances in Figure~\ref{fig:avg-dist})
% \kat{is this distance the landmark distance that we saw just before ? clarify }
of the time series histograms for various distributions and {\thething} percentages.
This allows us to justify the design decisions behind the concept that we presented in Section~\ref{subsec:lmdk-sel-sol}.
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the data utility of our three {\thething} privacy schemes in combination with the privacy-preserving dummy {\thething} selection module, which enhances the privacy protection that our concept provides.
% \kat{Mention whether it improves the original proposal or not.}
\subsection{Dummy {\thething} selection utility metrics}
\label{subsec:sel-utl}
Figure~\ref{fig:sel-dist} shows the normalized distance that we obtain when we use either (a)~the Euclidean or (b)~the Wasserstein distance metric to obtain a set of {\thethings} that also includes regular events.
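The following Python sketch makes the difference between the two metrics concrete. It computes both of them for two histograms defined over the same equally spaced time bins. It is a minimal sketch under stated assumptions. Each {\thething} set is summarized as a histogram of event counts per time bin, and both histograms are rescaled to sum to one. The helper names are hypothetical, and the sketch does not reproduce the normalization used in Figure~\ref{fig:sel-dist}. For one-dimensional distributions on a common grid, the Wasserstein (earth mover's) distance equals the area between the two cumulative distribution functions.

```python
import math
from itertools import accumulate


def to_distribution(hist):
    """Rescale non-negative bin counts so that they sum to one (assumed normalization)."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must contain at least one event")
    return [h / total for h in hist]


def euclidean(p, q):
    """Euclidean (L2) distance between two histograms over the same bins."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def wasserstein_1d(p, q, bin_width=1.0):
    """1-Wasserstein distance between two distributions over the same equally
    spaced bins, i.e., the area between their cumulative distribution functions."""
    return bin_width * sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Hypothetical histograms of landmark timestamps over four time bins.
original = to_distribution([1, 0, 0, 0])
shifted_by_one = to_distribution([0, 1, 0, 0])
shifted_by_three = to_distribution([0, 0, 0, 1])

# Euclidean compares bins independently, so both shifts look equally far...
print(euclidean(original, shifted_by_one), euclidean(original, shifted_by_three))
# ...whereas Wasserstein grows with how far the mass moves along the time axis.
print(wasserstein_1d(original, shifted_by_one), wasserstein_1d(original, shifted_by_three))
```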
\begin{figure}[htp]
\centering
\subcaptionbox{Euclidean\label{fig:sel-dist-norm}}{%
\includegraphics[width=.495\linewidth]{evaluation/sel-dist-norm}%
}%
\hfill
\subcaptionbox{Wasserstein\label{fig:sel-dist-emd}}{%
\includegraphics[width=.495\linewidth]{evaluation/sel-dist-emd}%
}%
\caption{The normalized (a)~Euclidean and (b)~Wasserstein distance of the generated {\thething} sets for different {\thething} percentages.}
\label{fig:sel-dist}
\end{figure}
Comparing the results of the Euclidean distance in Figure~\ref{fig:sel-dist-norm} with those of the Wasserstein distance in Figure~\ref{fig:sel-dist-emd}, we conclude that the Euclidean distance provides more consistent results for all possible distributions.
% (1 + (0.25 + 0.25 + 0.45 + 0.45)/4 + (0.25 + 0.25 + 0.3 + 0.3)/4 + (0.2 + 0.2 + 0.2 + 0.2)/4 + (0.15 + 0.15 + 0.15 + 0.15)/4)/6
% (1 + (0.1 + 0.1 + 0.25 + 0.25)/4 + (0.075 + 0.075 + .15 + 0.15)/4 + (0.075 + 0.075 + 0.1 + 0.1)/4 + (0.025 + 0.025 + 0.025 + 0.025)/4)/6
The maximum difference per {\thething} percentage between the bimodal and skewed {\thething} distributions is approximately $0.2$ for the Euclidean and $0.15$ for the Wasserstein distance.
Overall, the Euclidean distance achieves a mean normalized distance of $0.3$, while the Wasserstein distance achieves a mean normalized distance of $0.2$.
Therefore, and as Figure~\ref{fig:sel-dist} shows, the Wasserstein distance exhibits a less consistent and less linear behavior across all possible {\thething} distributions.
Thus, we use the Euclidean distance metric in the implementation of the privacy-preserving dummy {\thething} selection module in Section~\ref{subsec:lmdk-sel-sol}.
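The means and maximum differences quoted above can be reproduced from approximate readings of Figure~\ref{fig:sel-dist}. These readings are listed in the commented arithmetic of this paragraph. The Python sketch below assumes six {\thething} percentages, as that arithmetic implies:
\begin{itemize}
\item one with a normalized distance of $1$ for every distribution;
\item four with four per-distribution readings each;
\item one, implied by the division by $6$, that contributes $0$.
\end{itemize}
The readings are approximate, and the row order is illustrative.

```python
# Approximate normalized distances per landmark percentage (rows) and
# landmark distribution (columns), copied from the commented arithmetic.
# The all-zero row stands for the sixth percentage implied by dividing by 6.
EUCLIDEAN = [
    [1, 1, 1, 1],
    [0.25, 0.25, 0.45, 0.45],
    [0.25, 0.25, 0.3, 0.3],
    [0.2, 0.2, 0.2, 0.2],
    [0.15, 0.15, 0.15, 0.15],
    [0, 0, 0, 0],
]
WASSERSTEIN = [
    [1, 1, 1, 1],
    [0.1, 0.1, 0.25, 0.25],
    [0.075, 0.075, 0.15, 0.15],
    [0.075, 0.075, 0.1, 0.1],
    [0.025, 0.025, 0.025, 0.025],
    [0, 0, 0, 0],
]


def mean_distance(rows):
    """Average over percentages of the per-percentage mean over distributions."""
    return sum(sum(row) / len(row) for row in rows) / len(rows)


def max_difference(rows):
    """Largest gap between distributions at any single percentage."""
    return max(max(row) - min(row) for row in rows)


for name, rows in (("Euclidean", EUCLIDEAN), ("Wasserstein", WASSERSTEIN)):
    print(f"{name}: mean {mean_distance(rows):.2f}, max difference {max_difference(rows):.2f}")
```

Rounded to one decimal, the means give the values $0.3$ and $0.2$ reported above. The largest gaps are $0.2$ and $0.15$.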
\subsection{Privacy budget tuning}
\label{subsec:sel-eps}
In Figure~\ref{fig:sel-eps}, we test the \texttt{Uniform} scheme on real data by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the dummy {\thething} selection module and the remainder in perturbing the original data values, in order to determine the optimal ratio.
\texttt{Uniform} is our baseline implementation and hence allows us to derive more accurate conclusions in this case.
In general, we expect greater ratios to result in more accurate, i.e.,~smaller, {\thething} sets and less accurate values in the released data.
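The ratio experiment relies on sequential composition. If the selection module consumes $r\varepsilon$ and the data release consumes $(1 - r)\varepsilon$, the total privacy loss is still bounded by $\varepsilon$. The Python sketch below illustrates the resulting trade-off. It assumes that the released values are perturbed with the Laplace mechanism at a fixed, hypothetical sensitivity, as is the case for the HUE measurements. It ignores how each scheme further divides the release budget among timestamps, and the function names are illustrative.

```python
def split_budget(epsilon, ratio):
    """Divide epsilon between dummy landmark selection and data publishing.
    By sequential composition, the two parts together consume epsilon."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must be in (0, 1)")
    eps_selection = ratio * epsilon
    eps_publishing = epsilon - eps_selection
    return eps_selection, eps_publishing


def laplace_scale(sensitivity, eps):
    """Scale b = sensitivity / eps of the Laplace noise; its expected absolute value is b."""
    return sensitivity / eps


EPSILON = 1.0      # hypothetical total budget
SENSITIVITY = 1.0  # hypothetical sensitivity of the released values
for ratio in (0.01, 0.10, 0.25, 0.50):
    eps_sel, eps_pub = split_budget(EPSILON, ratio)
    print(f"ratio={ratio:.2f} selection={eps_sel:.2f} publishing={eps_pub:.2f} "
          f"noise scale={laplace_scale(SENSITIVITY, eps_pub):.2f}")
```

For $\varepsilon = 1$, moving from a ratio of $1$\% to $50$\% roughly doubles the noise scale of each release, from about $1.01$ to $2$ times the sensitivity. Over the same range, the budget of the selection module grows fiftyfold.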
\begin{figure}[htp]
\centering
\subcaptionbox{Copenhagen\label{fig:copenhagen-sel-eps}}{%
\includegraphics[width=.495\linewidth]{evaluation/copenhagen-sel-eps}%
}%
\hspace{\fill}
\\ \bigskip
\subcaptionbox{HUE\label{fig:hue-sel-eps}}{%
\includegraphics[width=.495\linewidth]{evaluation/hue-sel-eps}%
}%
\hfill
\subcaptionbox{T-drive\label{fig:t-drive-sel-eps}}{%
\includegraphics[width=.495\linewidth]{evaluation/t-drive-sel-eps}%
}%
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the \texttt{Uniform} {\thething} privacy scheme and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the dummy {\thething} selection module.}
\label{fig:sel-eps}
\end{figure}
The randomized response mechanism that we apply to the Copenhagen data set (Figure~\ref{fig:copenhagen-sel-eps}) is tolerant to fluctuations of the privacy budget and maintains a relatively constant data utility.
For HUE (Figure~\ref{fig:hue-sel-eps}) and T-drive (Figure~\ref{fig:t-drive-sel-eps}), we observe that our implementation performs better for lower ratios, e.g.,~$1$\%, where we allocate the majority of the available privacy budget to the data release process instead of the dummy {\thething} selection module.
The results of this experiment indicate that we can safely allocate the majority of $\varepsilon$ to the data publishing process, and therefore achieve better data utility, while still guaranteeing more robust privacy protection.
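For reference, the following Python sketch shows the classic binary randomized response mechanism. It reports the true bit with probability $e^{\varepsilon}/(1 + e^{\varepsilon})$ and the flipped bit otherwise, and thus satisfies $\varepsilon$-differential privacy for a single bit. It is a generic illustration that assumes binary values. The encoding of the Copenhagen data set is not reproduced, and the names used here are hypothetical rather than taken from our implementation. Note that the probability of a flipped report never exceeds $1/2$, whatever the value of $\varepsilon$.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Report `bit` (0 or 1) truthfully with probability e^eps / (1 + e^eps), else flip it."""
    if rng.random() < keep_probability(epsilon):
        return bit
    return 1 - bit


rng = random.Random(0)  # fixed seed for reproducibility
reports = [randomized_response(1, 1.0, rng) for _ in range(10_000)]
# The fraction of truthful reports should be close to keep_probability(1.0).
print(sum(reports) / len(reports), keep_probability(1.0))
```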
\subsection{Privacy schemes and dummy {\thething} selection}
\label{subsec:sel-prv}
Figure~\ref{fig:real-sel} exhibits the performance of the \texttt{Skip}, \texttt{Uniform}, and \texttt{Adaptive} schemes (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the dummy {\thething} selection module (Section~\ref{subsec:lmdk-sel-sol}).
@ -89,12 +88,12 @@ Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptiv
\label{fig:real-sel} \label{fig:real-sel}
\end{figure} \end{figure}
In comparison with the utility without the dummy {\thething} selection module (solid bars), we notice a slight deterioration for all three schemes (markers).
This is expected, since we allocate part of the available privacy budget to the privacy-preserving dummy {\thething} selection module, which in turn increases the number of {\thethings}, except for the case of $100$\% {\thethings}.
Therefore, less privacy budget is available for data publishing throughout the time series.
% for $0$\% and $100$\% {\thethings}.
% \kat{why not for the other percentages?}
\texttt{Skip} performs best in our experiments with HUE (Figure~\ref{fig:hue-sel}), due to the low range of the energy consumption values and the high scale of the Laplace noise, which it avoids through the employed approximation.
However, for the Copenhagen (Figure~\ref{fig:copenhagen-sel}) and T-drive (Figure~\ref{fig:t-drive-sel}) data sets, \texttt{Skip} attains a high mean absolute error and thus offers no benefit with respect to user-level protection.
Overall, \texttt{Adaptive} performs consistently in terms of utility for all of the data sets that we experimented with, and almost always outperforms user-level privacy protection.
Thus, we select \texttt{Adaptive} as the best scheme to use in general.


@ -4,21 +4,22 @@
% \mk{OK.}
In this section, we present the experiments that we performed on real and synthetic data sets to test the methodology that we presented in Section~\ref{subsec:lmdk-sol}.
With the experiments on the real data sets (Section~\ref{subsec:lmdk-expt-bgt}), we show the data utility of our three {\thething} privacy schemes: \texttt{Skip}, \texttt{Uniform}, and \texttt{Adaptive}.
We measure data utility in terms of the mean absolute error introduced by the privacy mechanism; a lower error corresponds to higher utility.
We compare with event- and user-level differential privacy and show that, in the general case, {\thething} privacy allows for better data utility than user-level differential privacy while balancing between the two protection levels.
With the experiments on the synthetic data sets (Section~\ref{subsec:lmdk-expt-cor}), we show how the temporal privacy loss,
% \kat{in the previous set of experiments we were measuring the MAE, now we are measuring the privacy loss... Why is that? Isn't it two sides of the same coin? }
i.e.,~the privacy budget $\varepsilon$ plus the extra privacy loss due to temporal correlation, changes
%under temporal correlation within our framework
when tuning the size and statistical characteristics of the input {\thething} set $L$.
% \kat{mention briefly what you observe}
We observe that a greater average {\thething}--regular event distance in a time series can result in greater temporal privacy loss under moderate and strong temporal correlation.
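We use the mean absolute error as the utility measure. For a univariate series of original values $x_t$ and released values $\tilde{x}_t$ over $T$ timestamps, it can be written as $\frac{1}{T}\sum_{t=1}^{T}|x_t - \tilde{x}_t|$. The following Python sketch computes it and, as a sanity check, estimates it for Laplace noise of scale $b$, whose expected absolute value is exactly $b$. The series, the scale, and the helper names are hypothetical. The sketch does not model the data-set-specific units in which we report the error: a percentage, kWh, or meters.

```python
import random


def mean_absolute_error(original, released):
    """Mean absolute difference between original and released values."""
    if len(original) != len(released) or not original:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(original, released)) / len(original)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


rng = random.Random(42)       # fixed seed for reproducibility
original = [0.88] * 20_000    # hypothetical constant series
scale = 2.0                   # hypothetical Laplace scale b
released = [x + laplace_noise(scale, rng) for x in original]
# The empirical error should be close to the Laplace scale b.
print(mean_absolute_error(original, released))
```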
\subsection{{\Thething} privacy schemes}
\label{subsec:lmdk-expt-bgt}
Figure~\ref{fig:real} exhibits the performance of the three schemes, \texttt{Skip}, \texttt{Uniform}, and \texttt{Adaptive}, applied on the three data sets that we study.
Notice that, when $0\%$ and $100\%$ of the events are {\thethings}, we get the same behavior as in event- and user-level privacy, respectively.
This happens because, when there are no {\thethings}, at each timestamp we take into account only the data items at the current timestamp and ignore the rest of the time series (event-level).
In contrast, when every timestamp corresponds to a {\thething}, we consider and protect all the events throughout the entire series (user-level).
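The two extremes can be checked with a simplified allocation in the spirit of \texttt{Uniform}. The Python sketch below makes two assumptions for illustration only. Every timestamp receives the same share of $\varepsilon$. The shares of all {\thethings} plus any single regular event must add up to at most $\varepsilon$. The helper names are hypothetical, and the allocation may differ from the exact definition in Section~\ref{subsec:lmdk-mechs}. With $L = \emptyset$, every timestamp receives the full $\varepsilon$, as in event-level privacy. When all $T$ timestamps belong to $L$, each receives $\varepsilon / T$ and the entire series consumes $\varepsilon$, as in user-level privacy.

```python
def uniform_budget(num_timestamps, num_landmarks, epsilon):
    """Hypothetical per-timestamp budget under a uniform allocation.

    Each regular event is protected together with all |L| landmarks, so
    |L| + 1 equal shares must fit in epsilon. When every timestamp is a
    landmark, there is no regular event and the |L| landmarks share epsilon.
    """
    if num_timestamps <= 0 or not 0 <= num_landmarks <= num_timestamps:
        raise ValueError("need timestamps > 0 and 0 <= landmarks <= timestamps")
    if num_landmarks == num_timestamps:
        return epsilon / num_landmarks
    return epsilon / (num_landmarks + 1)


def protected_set_budget(num_timestamps, num_landmarks, epsilon):
    """Budget spent on all landmarks plus one regular event (if any exists)."""
    share = uniform_budget(num_timestamps, num_landmarks, epsilon)
    has_regular = num_landmarks < num_timestamps
    return share * (num_landmarks + (1 if has_regular else 0))


T, EPSILON = 10, 1.0
for percentage in (0, 20, 40, 60, 80, 100):
    k = T * percentage // 100
    print(f"{percentage:3d}% landmarks: per-timestamp budget "
          f"{uniform_budget(T, k, EPSILON):.3f}, "
          f"protected-set budget {protected_set_budget(T, k, EPSILON):.3f}")
```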
@ -29,21 +30,21 @@ Whereas, when each timestamp corresponds to a {\thething} we consider and protec
\begin{figure}[htp]
\centering
\subcaptionbox{Copenhagen\label{fig:copenhagen}}{%
\includegraphics[width=.495\linewidth]{evaluation/copenhagen}%
}%
\\ \bigskip
\subcaptionbox{HUE\label{fig:hue}}{%
\includegraphics[width=.495\linewidth]{evaluation/hue}%
}%
\hfill
\subcaptionbox{T-drive\label{fig:t-drive}}{%
\includegraphics[width=.495\linewidth]{evaluation/t-drive}%
}%
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages.}
\label{fig:real}
\end{figure}
For the Copenhagen data set (Figure~\ref{fig:copenhagen}), \texttt{Adaptive} has an
% constant
% \kat{it is not constant, for 0 it is much lower}
overall consistent performance and works best for $60$\% and $80$\% {\thethings}.
@ -52,21 +53,21 @@ overall consistent performance and works best for $60$\% and $80$\% {\thethings}
We notice that, for $0$\% {\thethings}, it achieves better utility than event-level protection
% \kat{what does this mean? how is it possible?}
due to the combination of more available privacy budget per timestamp (since there are no {\thethings}) and its adaptive sampling methodology.
Compared to the other schemes, \texttt{Skip} excels in the cases where it needs to approximate $20$\%, $40$\%, or $100$\% of the time.
% \kat{it seems a little random.. do you have an explanation? (rather few times or all?)}
In general, we notice that, for this data set and due to the application of the randomized response technique, it is more beneficial to either invest more privacy budget per event or prefer approximation over introducing randomization.
The combination of the small range of measurements in HUE (Figure~\ref{fig:hue}), i.e.,~$[0.28, 4.45]$~kWh with an average of $0.88$~kWh, and the large scale of the Laplace mechanism allows schemes that favor approximation over noise injection to achieve better data utility.
Hence, \texttt{Skip} achieves a consistently low mean absolute error.
% \kat{why?explain}
Nevertheless, the \texttt{Adaptive} scheme performs far better than \texttt{Uniform} and
% strikes a nice balance\kat{???}
balances between event- and user-level protection for all {\thething} percentages.
In T-drive (Figure~\ref{fig:t-drive}), \texttt{Adaptive} outperforms \texttt{Uniform} by $10$\%--$20$\% for all {\thething} percentages greater than $40$\% and \texttt{Skip} by more than $20$\%.
The lower density (average distance of $623$~m) of the T-drive data set has a negative impact on the performance of \texttt{Skip}, because republishing a previous perturbed value is then less accurate than perturbing the current location.
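The effect can be illustrated with a toy Python sketch that assumes a \texttt{Skip}-like release policy. At timestamps in a given set, the value is perturbed with Laplace noise; at all other timestamps, the last released value is republished. The publishing set, the noise scale, and the one-dimensional trajectories are hypothetical. The sketch omits how \texttt{Skip} actually chooses where to spend its budget. When consecutive values are far apart, the republished values lag behind the current ones, so the approximation error grows with the step size.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def skip_like_release(values, publish_at, scale, rng):
    """Hypothetical Skip-like policy: perturb the value at timestamps in
    `publish_at` (and always at timestamp 0); otherwise republish the last
    released value."""
    released = []
    for t, value in enumerate(values):
        if t == 0 or t in publish_at:
            released.append(value + laplace_noise(scale, rng))
        else:
            released.append(released[-1])
    return released


def mean_absolute_error(original, released):
    """Mean absolute difference between original and released values."""
    return sum(abs(x - y) for x, y in zip(original, released)) / len(original)


T = 1_000
publish_at = set(range(0, T, 10))       # hypothetical: perturb every 10th timestamp
dense = [5.0 * t for t in range(T)]     # hypothetical positions, 5 m apart
sparse = [600.0 * t for t in range(T)]  # hypothetical positions, 600 m apart

results = {}
for name, series in (("dense", dense), ("sparse", sparse)):
    rng = random.Random(7)  # same noise draws for both series
    released = skip_like_release(series, publish_at, scale=100.0, rng=rng)
    results[name] = mean_absolute_error(series, released)
print(results)
```

With these hypothetical values, the republished positions of the sparse trajectory lag behind the true ones by up to $9 \times 600$~m. The error is therefore dominated by approximation rather than by noise, in line with the effect of the lower density of T-drive.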
Principally, we can claim that \texttt{Adaptive} is the most reliable and best-performing scheme,
% with a minimal and generic parameter tuning
% \kat{what does minimal tuning mean?}
if we take into consideration the drawbacks of \texttt{Skip}, particularly in spatiotemporal data, e.g.,~sporadic location data publishing~\cite{gambs2010show, russell2018fitness} or misapplied location cloaking~\cite{xssfopes2020tweet}, which could reveal privacy-sensitive attribute values.
@ -77,12 +78,11 @@ Moreover, implementing a more advanced and data-dependent sampling method
that accounts for changes in the trends of the input data and adapts its rate accordingly, would
% possibly
% \kat{possibly is not good enough, if you are sure remove it. Otherwise mention that more experiments need to be done?}
result in a more effective budget allocation that would improve the performance of \texttt{Adaptive} in terms of data utility.
\subsection{Temporal distance and correlation}
\label{subsec:lmdk-expt-cor}
As previously mentioned, temporal correlation is inherent in continuous publishing, and it causes additional privacy loss in privacy-preserving time series publishing.
In this section, we study the effect that the distance of the {\thethings} from every regular event has on the privacy loss caused by temporal correlation.