evaluation: Minor corrections

Manos Katsomallos 2021-10-11 01:13:45 +02:00
parent 34404faeed
commit d5c39c4e42
2 changed files with 13 additions and 12 deletions

@@ -43,7 +43,7 @@ We take into account only the temporal order of the points and the position of r
\subsection{Configurations} \subsection{Configurations}
\label{subsec:eval-conf} \label{subsec:eval-conf}
\subsubsection{{\Thethings}' percentage} \subsubsection{{\Thething} percentage}
For the Copenhagen data set, we achieve For the Copenhagen data set, we achieve
$0\%$ {\thethings} by considering an empty list of contact devices, $0\%$ {\thethings} by considering an empty list of contact devices,
@@ -53,16 +53,16 @@ $60\%$ with $[181$, $182$, $192$, $195$, $196$, $201$, $203$, $207$, $221$, $230
$80\%$ with $[260$, $282$, $287$, $289$, $290$, $291$, $308$, $311$, $318$, $323$, $324$, $330$, $334$, $335$, $344$, $350$, $353$, $355$, $357$, $358$, $361$, $363]$, and $80\%$ with $[260$, $282$, $287$, $289$, $290$, $291$, $308$, $311$, $318$, $323$, $324$, $330$, $334$, $335$, $344$, $350$, $353$, $355$, $357$, $358$, $361$, $363]$, and
$100\%$ by including all of the possible contacts. $100\%$ by including all of the possible contacts.
In HUE, we get $0$, $20$ $40$, $60$, $80$, and $100$ {\thethings} percentages by setting the energy consumption threshold below $0.28$, $1.12$, $0.88$, $0.68$, $0.54$, $4.45$kWh respectively. In HUE, we get $0$\%, $20$\%, $40$\%, $60$\%, $80$\%, and $100$\% {\thethings} by setting the energy consumption threshold below $0.28$kWh, $1.12$kWh, $0.88$kWh, $0.68$kWh, $0.54$kWh, $4.45$kWh respectively.
In T-drive, we achieved the desired {\thethings} percentages by utilizing the method of Li et al.~\cite{li2008mining} for detecting stay points in trajectory data. In T-drive, we achieved the desired {\thething} percentages by utilizing the method of Li et al.~\cite{li2008mining} for detecting stay points in trajectory data.
In more detail, the algorithm checks for each data item if each subsequent item is within a given distance threshold $\Delta l$ and measures the time period $\Delta t$ between the present point and the last subsequent point. In more detail, the algorithm checks for each data item if each subsequent item is within a given distance threshold $\Delta l$ and measures the time period $\Delta t$ between the present point and the last subsequent point.
We achieve $0$, $20$ $40$, $60$, $80$, and $100$ {\thethings} percentages by setting the ($\Delta l$ in meters, $\Delta t$ in minutes) pairs input to the stay point discovery method as [($0$, $1000$), ($2095$, $30$), ($2790$, $30$), ($3590$, $30$), ($4825$, $30$), ($10350$, $30$)]. We achieve $0$\%, $20$\%, $40$\%, $60$\%, $80$\%, and $100$\% {\thethings} by setting the ($\Delta l$ in meters, $\Delta t$ in minutes) pairs input to the stay point discovery method as [($0$, $1000$), ($2095$, $30$), ($2790$, $30$), ($3590$, $30$), ($4825$, $30$), ($10350$, $30$)].
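The stay point discovery step described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `(lat, lon, t_seconds)` point layout, and the use of the haversine distance are assumptions made here, and the sketch follows the text in measuring $\Delta t$ between the present point and the last subsequent point that is still within $\Delta l$.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon, ...) points in degrees."""
    earth_radius_m = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def stay_points(points, delta_l_m, delta_t_min):
    """Detect stay points in a time-ordered list of (lat, lon, t_seconds) tuples.

    Hypothetical helper: returns (mean_lat, mean_lon, t_arrive, t_leave) tuples.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        # Advance j while the subsequent points stay within delta_l_m of point i.
        j = i + 1
        while j < n and haversine_m(points[i], points[j]) <= delta_l_m:
            j += 1
        last = j - 1  # last subsequent point still within the distance threshold
        elapsed_s = points[last][2] - points[i][2]
        if last > i and elapsed_s >= delta_t_min * 60:
            cluster = points[i:j]
            mean_lat = sum(p[0] for p in cluster) / len(cluster)
            mean_lon = sum(p[1] for p in cluster) / len(cluster)
            stays.append((mean_lat, mean_lon, points[i][2], points[last][2]))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stays
```

Under these assumptions, a call such as `stay_points(trajectory, 2095, 30)` would correspond to one of the ($\Delta l$, $\Delta t$) pairs listed above; larger values of $\Delta l$ mark more points as belonging to a stay.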
We generated synthetic data with \emph{skewed} (the {\thethings} are distributed towards the beginning/end of the series), \emph{symmetric} (in the middle), \emph{bimodal} (both end and beginning), and \emph{uniform} (all over the time series) {\thething} distributions. We generated synthetic data with \emph{skewed} (the {\thethings} are distributed towards the beginning/end of the series), \emph{symmetric} (in the middle), \emph{bimodal} (both end and beginning), and \emph{uniform} (all over the time series) {\thething} distributions.
In order to get {\thethings} with the above distribution features, we generate probability distributions with appropriate characteristics and sample from them, without replacement, the desired number of points. In order to get {\thethings} with the above distribution features, we generate probability distributions with appropriate characteristics and sample from them, without replacement, the desired number of points.
%The generated distributions are representative of the cases that we wish to examine during the experiments. %The generated distributions are representative of the cases that we wish to examine during the experiments.
For example, for a left-skewed {\thethings} distribution we would utilize a truncated distribution resulting from the restriction of the domain of a distribution to the beginning and end of the time series with its location shifted to the center of the right half of the series. For example, for a left-skewed {\thething} distribution we would utilize a truncated distribution resulting from the restriction of the domain of a distribution to the beginning and end of the time series with its location shifted to the center of the right half of the series.
For consistency, we calculate the scale parameter depending on the length of the series by setting it equal to the series' length over a constant. For consistency, we calculate the scale parameter depending on the length of the series by setting it equal to the series' length over a constant.
Notice that in our experiments, in the cases when we have $0\%$ and $100\%$ of the events being {\thethings}, we get the same behavior as in event- and user-level privacy respectively. Notice that in our experiments, in the cases when we have $0\%$ and $100\%$ of the events being {\thethings}, we get the same behavior as in event- and user-level privacy respectively.
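The synthetic placement of {\thethings} described above (build a probability distribution over the series, then sample positions without replacement) can be illustrated with the sketch below. It rests on assumptions that the text does not state: a normal-shaped weight curve restricted to the series' index range, a hypothetical constant `scale_div` for the "length over a constant" scale, and the function name `sample_positions`.

```python
import math
import random


def sample_positions(length, count, locs, scale_div=10.0, seed=None):
    """Sample `count` distinct positions in range(length), weighted by
    normal-shaped bumps centered at each value in `locs`.

    Restricting the weights to indices 0..length-1 plays the role of the
    truncation; the scale is the series length over a constant (assumed 10).
    """
    if count > length:
        raise ValueError("cannot sample more positions than the series has")
    rng = random.Random(seed)
    scale = length / scale_div
    weights = [
        sum(math.exp(-0.5 * ((t - loc) / scale) ** 2) for loc in locs)
        for t in range(length)
    ]
    candidates = list(range(length))
    chosen = []
    for _ in range(count):
        # Weighted draw, then remove the drawn position so it cannot repeat.
        idx = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.append(candidates.pop(idx))
        weights.pop(idx)
    return sorted(chosen)


# Left-skewed example from the text: location at the center of the right half.
left_skewed = sample_positions(100, 20, locs=[75.0], seed=0)
# Bimodal example: one bump near the beginning and one near the end.
bimodal = sample_positions(100, 20, locs=[0.0, 99.0], seed=0)
```

In this sketch a symmetric placement would use a single location at the middle of the series, and a uniform one would replace the weights with equal values.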

@@ -25,20 +25,21 @@ Figure~\ref{fig:real} exhibits the performance of the three mechanisms: Skip, Un
\subcaptionbox{T-drive\label{fig:t-drive}}{% \subcaptionbox{T-drive\label{fig:t-drive}}{%
\includegraphics[width=.5\linewidth]{evaluation/t-drive}% \includegraphics[width=.5\linewidth]{evaluation/t-drive}%
}% }%
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thethings} percentages.} \caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages.}
\label{fig:real} \label{fig:real}
\end{figure} \end{figure}
% For the Geolife data set (Figure~\ref{fig:geolife}), Skip has the best performance (measured in Mean Absolute Error, in meters) because it invests the most budget overall at every regular event, by approximating the {\thething} data based on previous releases. % For the Geolife data set (Figure~\ref{fig:geolife}), Skip has the best performance (measured in Mean Absolute Error, in meters) because it invests the most budget overall at every regular event, by approximating the {\thething} data based on previous releases.
% Due to the data set's high density (every $1$--$5$ seconds or every $5$--$10$ meters per point) approximating constantly has a low impact on the data utility. % Due to the data set's high density (every $1$--$5$ seconds or every $5$--$10$ meters per point) approximating constantly has a low impact on the data utility.
% On the contrary, the lower density of the T-drive data set (Figure~\ref{fig:t-drive}) has a negative impact on the performance of Skip. % On the contrary, the lower density of the T-drive data set (Figure~\ref{fig:t-drive}) has a negative impact on the performance of Skip.
For the Copenhagen data set (Figure~\ref{fig:copenhagen}), Adaptive has a constant overall performance and performs best for $0$, $60$, and $80$\% {\thethings}. For the Copenhagen data set (Figure~\ref{fig:copenhagen}), Adaptive has a constant overall performance and performs best for $0$\%, $60$\%, and $80$\% {\thethings}.
The Skip model excels, compared to the others, at cases where it needs to approximate a lot ($100$\%). We notice that for $0$\% {\thethings}, it achieves better utility than the event-level protection.
The combination of the low range in HUE ($[0.28$, $4.45]$ with an average of $0.88$kWh) and the large scale in the Laplace mechanism results in a low mean absolute error for Skip(Figure~\ref{fig:hue}). The Skip model excels, compared to the others, in cases where it needs to approximate $20$\%--$40$\% or $100$\% of the time.
The combination of the low range in HUE ($[0.28$, $4.45]$ with an average of $0.88$kWh) and the large scale in the Laplace mechanism results in a low mean absolute error for Skip (Figure~\ref{fig:hue}).
In general, a scheme that favors approximation over noise injection would achieve a better performance in this case. In general, a scheme that favors approximation over noise injection would achieve a better performance in this case.
However, the Adaptive model performs by far better than Uniform and strikes a nice balance between event- and user-level protection for all {\thethings} percentages. However, the Adaptive model performs by far better than Uniform and strikes a nice balance between event- and user-level protection for all {\thething} percentages.
In the T-drive data set (Figure~\ref{fig:t-drive}), the Adaptive mechanism outperforms Uniform by $10$\%--$20$\% for all {\thethings} percentages greater than $40$ and Skip by more than $20$\%. In the T-drive data set (Figure~\ref{fig:t-drive}), the Adaptive mechanism outperforms Uniform by $10$\%--$20$\% for all {\thething} percentages greater than $40$\% and Skip by more than $20$\%.
The lower density (average distance of $623$ meters) of the T-drive data set has a negative impact on the performance of Skip. The lower density (average distance of $623$m) of the T-drive data set has a negative impact on the performance of Skip.
In general, we can claim that the Adaptive is the most reliable and best performing mechanism with minimal tuning, if we take into consideration the drawbacks of the Skip mechanism mentioned in Section~\ref{subsec:lmdk-mechs}. In general, we can claim that the Adaptive is the most reliable and best performing mechanism with minimal tuning, if we take into consideration the drawbacks of the Skip mechanism mentioned in Section~\ref{subsec:lmdk-mechs}.
Moreover, designing a data-dependent sampling scheme would possibly result in better results for Adaptive. Moreover, designing a data-dependent sampling scheme would possibly result in better results for Adaptive.