evaluation: Reviewed and replied to Katerina
This commit is contained in: parent 427fbaf0ff, commit fc904af1fb
@@ -1,103 +1,125 @@
\section{Experimental setting and data sets}
|
||||
\label{sec:eval-dtl}
|
||||
|
||||
In this section we list all the relevant details regarding the evaluation setting (Section~\ref{subsec:eval-setup}), and we present the real and synthetic data sets that we used (Section~\ref{subsec:eval-dat}), along with the corresponding configurations (Section~\ref{subsec:eval-conf}).
|
||||
|
||||
|
||||
\subsection{Machine setup}
|
||||
\label{subsec:eval-setup}
|
||||
|
||||
We implemented our experiments\footnote{Source code available at \url{https://git.delkappa.com/manos/the-last-thing}} in Python $3$.$9$.$7$ and executed them on a machine with an Intel i$7$-$6700$HQ at $3$.$5$GHz CPU and $16$GB RAM, running Manjaro Linux $21$.$1$.$5$.
|
||||
We repeated each experiment $100$ times and we report the mean over these iterations.
|
||||
% \kat{It could be interesting to report also on the diagrams the std}
|
||||
% \mk{I'll keep it in mind.}
|
||||
|
||||
|
||||
\subsection{Data sets}
|
||||
\label{subsec:eval-dat}
|
||||
We performed experiments on real (Section~\ref{subsec:eval-dat-real}) and synthetic data sets (Section~\ref{subsec:eval-dat-syn}).
|
||||
|
||||
\subsubsection{Real data sets}
|
||||
\label{subsec:eval-dat-real}
|
||||
For consistency, we sample from each of the following data sets the first $1,000$ entries that satisfy the configuration criteria that we discuss in detail in Section~\ref{subsec:eval-conf}.
|
||||
|
||||
\paragraph{Copenhagen}~\cite{sapiezynski2019interaction}
|
||||
data set was collected via the smartphone devices of $851$ university students over a period of $4$ weeks as part of the Copenhagen Networks Study.
|
||||
Each device was configured to be discoverable by and to discover nearby Bluetooth devices every $5$ minutes.
|
||||
Upon discovery, each device registers (i)~the timestamp in seconds, (ii)~the device's unique identifier, (iii)~the unique identifier of the device that it discovered ($- 1$ when no device was found or $- 2$ for any non-participating device), and (iv)~the Received Signal Strength Indicator (RSSI) in dBm.
|
||||
Half of the devices registered data for at least $81\%$ of the possible timestamps.
|
||||
Three devices ($449$, $550$, $689$) satisfy our configuration criteria (Section~\ref{subsec:eval-conf}) within their first $1,000$ entries.
|
||||
From those three devices, we picked the first one, i.e.,~the device with identifier `$449$', and utilized its first $1,000$ entries out of $12,167$ unique valid contacts.
|
||||
% \kat{why only the 1000 first contacts? why device 449? why only one device and not multiple ones, and then report the mean?}
|
||||
% \mk{I explained why 449 and I added a general explanation in the intro of the subsection.}
|
||||
|
||||
\paragraph{HUE}~\cite{makonin2018hue}
|
||||
contains the hourly energy consumption data of $22$ residential customers of BCHydro, a provincial power utility in British Columbia.
|
||||
The measurements for each residence are saved individually and each measurement contains (i)~the date (YYYY-MM-DD), (ii)~the hour, and (iii)~the energy consumption in kWh.
|
||||
In our experiments, we used the first residence that satisfies our configuration criteria (Section~\ref{subsec:eval-conf}) within its first $1,000$ entries, i.e.,~the residence with identifier `$1$'.
|
||||
In those entries, out of a total of $29,231$ measurements, we estimated an average energy consumption of $0.88$kWh and a value range of $[0.28$, $4.45]$kWh.
|
||||
% \kat{again, explain your choices. Moreover, you make some conclusions later on, based on the characteristics of the data set, for example the density of the measurement values. You should describe all these characteristics in these paragraphs.}
|
||||
% \mk{OK}
|
||||
|
||||
\paragraph{T-drive}~\cite{yuan2010t}
|
||||
consists of $15$ million GPS data points of the trajectories of $10,357$ taxis in Beijing, spanning a period of $1$ week and a total distance of $9$ million kilometers.
|
||||
The taxis reported their location data approximately every $177$ seconds and every $623$ meters on average.
|
||||
Each vehicle registers (i)~the taxi unique identifier, (ii)~the timestamp (YYYY-MM-DD HH:MM:SS), (iii)~longitude, and (iv)~latitude.
|
||||
These measurements are stored individually per vehicle.
|
||||
We sampled the first $1,000$ data items of the taxi with identifier `$2$', which satisfied our configuration criteria (Section~\ref{subsec:eval-conf}).
|
||||
% \kat{again, explain your choices}
|
||||
% \mk{OK}
|
||||
|
||||
|
||||
\subsubsection{Synthetic data sets}
|
||||
\label{subsec:eval-dat-syn}
|
||||
We generated synthetic time series of length equal to $100$ timestamps, for which we varied the number and distribution of {\thethings}.
|
||||
In this way, we have a controlled data set that we can use to study the behavior of our proposal.
|
||||
% \kat{more details needed. eg. what is the distributions and number of timestamps used? How many time series you generated? }
|
||||
We take into account only the temporal order of the points and the position of regular and {\thething} events within the series.
|
||||
In Section~\ref{subsec:eval-conf}, we explain in more detail our configuration criteria.
|
||||
% \kat{why is the value not important? at the energy consumption, they mattered}
|
||||
|
||||
|
||||
\subsection{Configurations}
|
||||
\label{subsec:eval-conf}
|
||||
% \kat{add some info here.. what are the configurations for? What does landmark percentage refer to, and how does it matter? }
|
||||
We vary the {\thething} percentage (Section~\ref{subsec:eval-conf-lmdk}), i.e.,~the proportion of timestamps that we attribute to {\thethings} as opposed to regular events, in order to identify the limitations of our methodology.
|
||||
For each data set, we implement a privacy mechanism that injects noise related to the type of its attribute values and we tune the parameters of each mechanism accordingly (Section~\ref{subsec:eval-conf-prv}).
|
||||
Last but not least, we explain how we generate synthetic data sets with the desired degree of temporal correlation (Section~\ref{subsec:eval-conf-cor}).
|
||||
|
||||
|
||||
\subsubsection{{\Thething} percentage}
|
||||
|
||||
\label{subsec:eval-conf-lmdk}
|
||||
In the Copenhagen data set, a {\thething} represents a timestamp when a contact device is registered.
|
||||
After identifying the unique contacts within the sample, we achieve each desired ratio of {\thethings} to regular events by considering a list that contains a subset of these contact devices.
|
||||
In more detail, we achieve
|
||||
$0\%$ {\thethings} by considering an empty list of contact devices,
|
||||
$20\%$ by extending the list with $[3$, $6$, $11$, $12$, $25$, $29$, $36$, $39$, $41$, $46$, $47$, $50$, $52$, $56$, $57$, $61$, $63$, $78$, $80]$,
|
||||
$40\%$ with $[81$, $88$, $90$, $97$, $101$, $128$, $130$, $131$, $137$, $145$, $146$, $148$, $151$, $158$, $166$, $175$, $176]$,
|
||||
$60\%$ with $[181$, $182$, $192$, $195$, $196$, $201$, $203$, $207$, $221$, $230$, $235$, $237$, $239$, $241$, $254]$,
|
||||
$80\%$ with $[260$, $282$, $287$, $289$, $290$, $291$, $308$, $311$, $318$, $323$, $324$, $330$, $334$, $335$, $344$, $350$, $353$, $355$, $357$, $358$, $361$, $363]$, and
|
||||
$100\%$ by including all of the possible contacts.
|
||||
% \kat{How did you decide which devices to add at each point?}
|
||||
% \mk{I discussed it earlier.}
|
||||
|
||||
% \kat{Say what time-stamps are landmarks in this data set. What is the consumption threshld?}
|
||||
% \mk{OK}
|
||||
In HUE, we consider as {\thethings} the events that have energy consumption values below a certain threshold.
|
||||
That is, we get $0$\%, $20$\%, $40$\%, $60$\%, $80$\%, and $100$\% {\thethings} by setting the energy consumption threshold at $0.28$kWh, $1.12$kWh, $0.88$kWh, $0.68$kWh, $0.54$kWh, and $4.45$kWh, respectively.
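For illustration, with the hourly measurements of a residence loaded as a list, the {\thething} selection for a given threshold reduces to a simple filter; the following Python sketch uses our own naming and is not the actual implementation.
\begin{verbatim}
# Hypothetical sketch: landmarks are the timestamps whose energy
# consumption falls below the chosen threshold.
def hue_landmarks(consumption_kwh, threshold_kwh):
    return [t for t, value in enumerate(consumption_kwh)
            if value < threshold_kwh]

# e.g., hue_landmarks(series, 0.88) corresponds to the 40% setting above
\end{verbatim}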
|
||||
|
||||
In T-drive, a {\thething} represents a location where a vehicle spent some time.
|
||||
We achieved the desired {\thething} percentages by utilizing the method of Li et al.~\cite{li2008mining} for detecting stay points in trajectory data.
|
||||
In more detail, for each data item the algorithm checks whether the subsequent items lie within a given distance threshold $\Delta l$, and measures the time period $\Delta t$ between the present point and the last subsequent point that still lies within that threshold.
|
||||
After analyzing the data and experimenting with different pairs of distance and time period, we achieve $0$\%, $20$\%, $40$\%, $60$\%, $80$\%, and $100$\% {\thethings} by setting the ($\Delta l$ in meters, $\Delta t$ in minutes) pairs that we input to the stay point discovery method to [($0$, $1000$), ($2095$, $30$), ($2790$, $30$), ($3590$, $30$), ($4825$, $30$), ($10350$, $30$)].
|
||||
% \kat{how did you come up with these numbers?}
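For illustration, a simplified Python sketch of the stay point detection loop could look as follows; the function names and the haversine helper are our own and only approximate the method of Li et al.~\cite{li2008mining}.
\begin{verbatim}
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def stay_point_indices(points, delta_l, delta_t):
    """points: time-ordered list of (timestamp_s, lat, lon) tuples.
    Returns the indices belonging to a stay point, i.e., stretches of
    consecutive points within delta_l meters that last at least delta_t
    seconds (convert the minutes quoted above to seconds beforehand)."""
    marked, i, n = set(), 0, len(points)
    while i < n:
        j = i + 1
        while j < n and haversine_m(points[i][1:], points[j][1:]) <= delta_l:
            j += 1
        # points[i .. j-1] all lie within delta_l of points[i]
        if points[j - 1][0] - points[i][0] >= delta_t:
            marked.update(range(i, j))
            i = j  # jump past the detected stay point
        else:
            i += 1
    return marked
\end{verbatim}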
|
||||
|
||||
We generated synthetic data with \emph{skewed} (the {\thethings} are distributed towards the beginning/end of the series), \emph{symmetric} (in the middle), \emph{bimodal} (both end and beginning), and \emph{uniform} (all over the time series) {\thething} distributions.
|
||||
In order to get {\thething} sets with the above distribution features, we generate probability distributions with restricted domain to the beginning and end of the time series, and sample from them, without replacement, the desired number of points.
|
||||
For each case, we place the location, i.e.,~centre, of the distribution accordingly.
|
||||
That is, for a symmetric distribution we put the location in the middle of the time series, and for a left/right-skewed distribution to the right/left, respectively.
|
||||
For the bimodal we combine two mirrored skewed distributions.
|
||||
Finally, for the uniform distribution we distribute the {\thethings} randomly throughout the time series.
|
||||
For consistency, we calculate the scale parameter depending on the length of the series by setting it equal to the series' length over a constant.
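As an illustrative sketch (not the actual implementation), such a {\thething} timestamp set could be drawn as follows; the use of a normal density restricted to the series domain and the scale constant of $10$ are our own assumptions.
\begin{verbatim}
import numpy as np
from scipy import stats

def landmark_timestamps(n=100, pct=0.2, kind="symmetric", seed=0):
    """Sample round(pct * n) landmark timestamps out of 0..n-1 according
    to the requested distribution shape."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    scale = n / 10  # scale parameter tied to the series length
    peaks = {"symmetric": [n / 2],            # mass in the middle
             "right-skewed": [n / 4],         # mass towards the beginning
             "left-skewed": [3 * n / 4],      # mass towards the end
             "bimodal": [n / 4, 3 * n / 4]}   # two mirrored skewed components
    if kind == "uniform":
        weights = np.ones(n)
    else:
        weights = sum(stats.norm.pdf(t, loc, scale) for loc in peaks[kind])
    weights = weights / weights.sum()
    k = round(pct * n)
    # sample timestamps without replacement according to the chosen weights
    return np.sort(rng.choice(t, size=k, replace=False, p=weights))
\end{verbatim}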
|
||||
|
||||
|
||||
|
||||
|
||||
\subsubsection{Privacy parameters}
|
||||
|
||||
\label{subsec:eval-conf-prv}
|
||||
% \kat{Explain why you select each of these perturbation mechanisms for each of the datasets. Is the random response differential private? Mention it! }
|
||||
For all of the real data sets, we implement $\varepsilon$-differential privacy.
|
||||
To perturb the contact tracing data of the Copenhagen data set, we utilize the \emph{random response} technique~\cite{wang2017locally}, and at each timestamp we report truthfully, with probability $p = \frac{e^\varepsilon}{e^\varepsilon + 1}$, whether the current contact is a {\thething} or not.
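For illustration, the reporting step could be sketched as follows; the function name and structure are our own and only mirror the probability $p$ defined above.
\begin{verbatim}
import math, random

def randomized_response(true_bit, eps):
    """Report the true bit with probability p = e^eps / (e^eps + 1),
    and its negation otherwise."""
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return true_bit if random.random() < p else not true_bit
\end{verbatim}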
|
||||
We randomize the energy consumption in HUE with the Laplace mechanism (described in detail in Section~\ref{subsec:prv-mech}).
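As a minimal sketch (with our own naming), adding Laplace noise with sensitivity $\Delta f$ and privacy budget $\varepsilon$ to a consumption value could look like the following.
\begin{verbatim}
import numpy as np

def laplace_mechanism(value, eps, delta_f=1.0, rng=np.random.default_rng()):
    """Add zero-mean Laplace noise with scale delta_f / eps."""
    return value + rng.laplace(loc=0.0, scale=delta_f / eps)
\end{verbatim}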
|
||||
For T-drive, we perturb the location data with noise that we sample from the Planar Laplace mechanism~\cite{andres2013geo}.
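A simplified sketch of sampling the Planar Laplace noise follows, assuming the coordinates have already been projected to a planar system in meters; the actual implementation may differ, e.g.,~by operating directly on latitude and longitude.
\begin{verbatim}
import numpy as np
from scipy.special import lambertw

def planar_laplace(x, y, eps, rng=np.random.default_rng()):
    """Perturb a planar location with noise drawn from the Planar Laplace
    distribution of Andres et al. (2013)."""
    theta = rng.uniform(0.0, 2.0 * np.pi)  # random direction
    p = rng.uniform(0.0, 1.0)
    # radius via the inverse CDF, using the -1 branch of the Lambert W function
    r = -(1.0 / eps) * (np.real(lambertw((p - 1.0) / np.e, k=-1)) + 1.0)
    return x + r * np.cos(theta), y + r * np.sin(theta)
\end{verbatim}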
|
||||
|
||||
We set the privacy budget $\varepsilon = 1$ for all of our experiments and, for simplicity, we assume that for every query sensitivity it holds that $\Delta f = 1$.
|
||||
% \kat{why don't you consider other values as well?}
|
||||
For the experiments performed on the synthetic data sets, the original values to be released do not influence the outcome of our conclusions, thus we ignore them.
|
||||
% \kat{why are the values not important for the synthetic dataset? This seems a little weird, when said out of context.. our goal is to perturb the values, but do not really care about the way we perturb our values?}
|
||||
% Finally, notice that, depending on the results' variation, most diagrams are in logarithmic scale.
|
||||
|
||||
|
||||
\subsubsection{Temporal correlation}
|
||||
|
||||
\label{subsec:eval-conf-cor}
|
||||
% \kat{Did you find any correlation in the other data? Do you need the correlation matrix to be known a priori? Describe a little why you did not use the real data for correlations }
|
||||
Despite the inherent presence of temporal correlation in time series, it is challenging to correctly discover and quantify it.
|
||||
For this reason, and in order to create a more controlled environment for our experiments, we chose to conduct tests relevant to temporal correlation using synthetic data sets.
|
||||
We model the temporal correlation in the synthetic data as a \emph{stochastic matrix} $P$, using a \emph{Markov Chain}~\cite{gagniuc2017markov}.
|
||||
$P$ is an $n \times n$ matrix, where the element $P_{ij}$
|
||||
%at the $i$th row of the $j$th column that
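For illustration only, a row-stochastic transition matrix with a tunable correlation degree could be generated as follows; the interpolation between a uniform matrix and the identity is our own simplifying choice, not necessarily the construction used in our experiments.
\begin{verbatim}
import numpy as np

def transition_matrix(n, s):
    """Return an n x n row-stochastic matrix whose correlation degree s
    in [0, 1] interpolates between no correlation (uniform rows) and
    perfect correlation (identity)."""
    uniform = np.full((n, n), 1.0 / n)  # every next state equally likely
    identity = np.eye(n)                # next state fully determined
    P = (1.0 - s) * uniform + s * identity
    assert np.allclose(P.sum(axis=1), 1.0)  # each row sums to 1
    return P
\end{verbatim}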
|
||||
|
@@ -1,7 +1,6 @@
|
||||
\chapter{Evaluation}
|
||||
\label{ch:eval}
|
||||
|
||||
In this chapter we present the experiments that we performed in order to evaluate {\thething} privacy (Chapter~\ref{ch:lmdk-prv}) on real and synthetic data sets.
|
||||
Section~\ref{sec:eval-dtl} contains all the details regarding the data sets that we used for our experiments, along with the system configurations.
|
||||
Section~\ref{sec:eval-lmdk} evaluates the data utility of the {\thething} privacy mechanisms that we designed in Section~\ref{sec:thething} and investigates the behavior of the privacy loss under temporal correlation for different distributions of {\thethings}.
|
||||
Section~\ref{sec:eval-lmdk-sel} justifies the decisions that we made while designing the privacy-preserving {\thething} selection component in Section~\ref{sec:theotherthing}, and evaluates the data utility impact of the latter.
|
||||
|
@@ -1,9 +1,10 @@
|
||||
\section{Summary}
|
||||
\label{sec:eval-sum}
|
||||
|
||||
In this chapter we presented the experimental evaluation of the {\thething} privacy mechanisms and the privacy-preserving {\thething} selection mechanism that we developed in Chapter~\ref{ch:lmdk-prv}, on real and synthetic data sets.
|
||||
The Adaptive mechanism is the most reliable and best performing mechanism, in terms of overall data utility, with minimal tuning across most of the cases.
|
||||
Skip performs optimally in data sets with a smaller target value range, where approximation fits best.
|
||||
The {\thething} selection mechanism introduces a reasonable data utility decline to all of our mechanisms; however, the Adaptive mechanism handles it well and keeps the data utility at higher levels compared to user-level protection.
|
||||
% \kat{it would be nice to see it clearly on Figure 5.5. (eg, by including another bar that shows adaptive without landmark selection)}
|
||||
% \mk{Done.}
|
||||
In terms of temporal correlation, we observe that under moderate and strong temporal correlation, a greater average regular--{\thething} event distance in a {\thething} distribution causes greater overall privacy loss.
|
||||
Finally, the contribution of {\thething} privacy to enhancing the data utility, while preserving $\varepsilon$-differential privacy, is demonstrated by the fact that the selected Adaptive mechanism provides better data utility than the user-level mechanism.
|
||||
|
@@ -1,25 +1,26 @@
|
||||
\section{Selection of landmarks}
|
||||
\label{sec:eval-lmdk-sel}
|
||||
|
||||
In this section, we present the experiments on the methodology for the {\thethings} selection presented in Section~\ref{subsec:lmdk-sel-sol}, on the real and synthetic data sets.
|
||||
With the experiments on the synthetic data sets (Section~\ref{subsec:sel-utl}) we show the normalized Euclidean and Wasserstein distance metrics (not to be confused with the temporal distances in Figure~\ref{fig:avg-dist})
|
||||
% \kat{is this distance the landmark distance that we saw just before ? clarify }
|
||||
of the time series histograms for various distributions and {\thething} percentages.
|
||||
This allows us to justify our design decisions for our concept that we showcased in Section~\ref{subsec:lmdk-sel-sol}.
|
||||
With the experiments on the real data sets (Section~\ref{subsec:sel-prv}), we show the performance in terms of utility of our three {\thething} mechanisms in combination with the privacy-preserving {\thething} selection mechanism, which enhances the privacy protection of our concept.
|
||||
% \kat{Mention whether it improves the original proposal or not.}
|
||||
|
||||
|
||||
\subsection{{\Thething} selection utility metrics}
|
||||
\label{subsec:sel-utl}
|
||||
|
||||
Figure~\ref{fig:sel-dist} shows the normalized distance that we obtain when we utilize either (a)~the Euclidean or (b)~the Wasserstein distance metric to generate a set of {\thethings} that also includes regular events.
|
||||
|
||||
\begin{figure}[htp]
|
||||
\centering
|
||||
\subcaptionbox{Euclidean\label{fig:sel-dist-norm}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/sel-dist-norm}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/sel-dist-norm}%
|
||||
}%
|
||||
\hfill
|
||||
\subcaptionbox{Wasserstein\label{fig:sel-dist-emd}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/sel-dist-emd}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/sel-dist-emd}%
|
||||
}%
|
||||
\caption{The normalized (a)~Euclidean, and (b)~Wasserstein distance of the generated {\thething} sets for different {\thething} percentages.}
|
||||
\label{fig:sel-dist}
|
||||
@@ -29,67 +30,71 @@ Comparing the results of the Euclidean distance in Figure~\ref{fig:sel-dist-norm
|
||||
% (1 + (0.25 + 0.25 + 0.45 + 0.45)/4 + (0.25 + 0.25 + 0.3 + 0.3)/4 + (0.2 + 0.2 + 0.2 + 0.2)/4 + (0.15 + 0.15 + 0.15 + 0.15)/4)/6
|
||||
% (1 + (0.1 + 0.1 + 0.25 + 0.25)/4 + (0.075 + 0.075 + .15 + 0.15)/4 + (0.075 + 0.075 + 0.1 + 0.1)/4 + (0.025 + 0.025 + 0.025 + 0.025)/4)/6
|
||||
The maximum difference per {\thething} percentage is approximately $0.2$ for the former and $0.15$ for the latter between the bimodal and skewed {\thething} distributions.
|
||||
Overall, the Euclidean distance achieves a mean normalized distance of $0.3$, while the Wasserstein distance achieves $0.2$.
|
||||
Therefore, and by observing Figure~\ref{fig:sel-dist}, Wasserstein demonstrates a less consistent performance and less linear behavior among all possible {\thething} distributions.
|
||||
Thus, we choose to utilize the Euclidean distance metric for the implementation of the privacy-preserving {\thething} selection mechanism in Section~\ref{subsec:lmdk-sel-sol}.
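For reference, a hypothetical sketch of how the two metrics could be computed over the timestamp histograms follows; the normalization constants are our own assumptions.
\begin{verbatim}
import numpy as np
from scipy.stats import wasserstein_distance

def normalized_euclidean(h1, h2):
    """Euclidean distance between two histograms, scaled to [0, 1]."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return np.linalg.norm(h1 - h2) / (np.linalg.norm(h1)
                                      + np.linalg.norm(h2) + 1e-12)

def normalized_wasserstein(h1, h2, n):
    """Wasserstein distance between two histograms over positions 0..n-1,
    scaled by the largest possible transport distance n - 1."""
    pos = np.arange(n)
    return wasserstein_distance(pos, pos,
                                u_weights=h1, v_weights=h2) / (n - 1)
\end{verbatim}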
|
||||
|
||||
|
||||
\subsection{Privacy budget tuning}
|
||||
\label{subsec:sel-eps}
|
||||
|
||||
In Figure~\ref{fig:sel-eps} we test the Uniform mechanism on real data by investing different ratios ($1$\%, $10$\%, $25$\%, and $50$\%) of the available privacy budget $\varepsilon$ in the {\thething} selection mechanism and the remainder in perturbing the data values, in order to identify the optimal ratio.
|
||||
Uniform is our baseline implementation, and hence allows us to derive more accurate conclusions in this case.
|
||||
In general, we expect greater ratios to result in more accurate, i.e.,~smaller, {\thething} sets and less accurate values in the released data.
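In our own simplified notation, this budget split can be sketched as follows; the uniform per-timestamp allocation of the publishing budget is an assumption made for illustration.
\begin{verbatim}
def split_budget(eps, ratio, n_timestamps):
    """Return (budget for landmark selection, per-timestamp publishing
    budget) for a given ratio, e.g., 0.01, 0.1, 0.25, or 0.5."""
    eps_select = ratio * eps
    eps_publish = (1.0 - ratio) * eps / n_timestamps
    return eps_select, eps_publish
\end{verbatim}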
|
||||
|
||||
\begin{figure}[htp]
|
||||
\centering
|
||||
\subcaptionbox{Copenhagen\label{fig:copenhagen-sel-eps}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/copenhagen-sel-eps}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/copenhagen-sel-eps}%
|
||||
}%
|
||||
\hspace{\fill}
|
||||
\\ \bigskip
|
||||
\subcaptionbox{HUE\label{fig:hue-sel-eps}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/hue-sel-eps}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/hue-sel-eps}%
|
||||
}%
|
||||
\hfill
|
||||
\subcaptionbox{T-drive\label{fig:t-drive-sel-eps}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/t-drive-sel-eps}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/t-drive-sel-eps}%
|
||||
}%
|
||||
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages. We apply the Uniform {\thething} privacy mechanism and vary the ratio of the privacy budget $\varepsilon$ that we allocate to the {\thething} selection mechanism.}
|
||||
\label{fig:sel-eps}
|
||||
\end{figure}
|
||||
|
||||
The application of the randomized response mechanism, in the Copenhagen data set, is tolerant to the fluctuations of the privacy budget and maintains a relatively constant performance in terms of data utility.
|
||||
For HUE and T-drive, we observe that our implementation performs better for lower ratios, e.g.,~$0.01$, where we end up allocating the majority of the available privacy budget to the data release process instead of the {\thething} selection mechanism.
|
||||
The results of this experiment indicate that we can safely allocate the majority of $\varepsilon$ for publishing the data values, and therefore achieve better data utility, while providing more robust privacy protection to the {\thething} set.
|
||||
|
||||
|
||||
\subsection{Budget allocation and {\thething} selection}
|
||||
\label{subsec:sel-prv}
|
||||
|
||||
Figure~\ref{fig:real-sel} exhibits the performance of Skip, Uniform, and Adaptive mechanisms (presented in detail in Section~\ref{subsec:lmdk-mechs}) in combination with the {\thething} selection mechanism (Section~\ref{subsec:lmdk-sel-sol}).
|
||||
|
||||
\begin{figure}[htp]
|
||||
\centering
|
||||
\subcaptionbox{Copenhagen\label{fig:copenhagen-sel}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/copenhagen-sel}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/copenhagen-sel}%
|
||||
}%
|
||||
\hspace{\fill}
|
||||
\hfill
|
||||
\\ \bigskip
|
||||
\subcaptionbox{HUE\label{fig:hue-sel}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/hue-sel}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/hue-sel}%
|
||||
}%
|
||||
\hfill
|
||||
\subcaptionbox{T-drive\label{fig:t-drive-sel}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/t-drive-sel}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/t-drive-sel}%
|
||||
}%
|
||||
\caption{
|
||||
The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data, for different {\thething} percentages, with the incorporation of the privacy-preserving {\thething} selection mechanism.
|
||||
The light short horizontal lines indicate the corresponding measurements from Figure~\ref{fig:real} without the {\thething} selection mechanism.
|
||||
}
|
||||
\label{fig:real-sel}
|
||||
\end{figure}
|
||||
|
||||
In comparison with the utility performance without the {\thething} selection mechanism (light short horizontal lines), we notice a slight deterioration for all three mechanisms.
|
||||
This is natural since we allocated part of the available privacy budget to the privacy-preserving {\thething} selection mechanism, which in turn increased the number of {\thethings}, except for the case of $100$\% {\thethings}.
|
||||
Therefore, there is less privacy budget available for data publishing throughout the time series.
|
||||
% for $0$\% and $100$\% {\thethings}.
|
||||
% \kat{why not for the other percentages?}
|
||||
Skip performs best in our experiments with HUE, due to the low range in the energy consumption and the high scale of the Laplace noise that it avoids due to the employed approximation.
|
||||
However, for the Copenhagen data set and T-drive, Skip attains greater mean absolute error than the user-level protection scheme, which exposes no benefit with respect to user-level protection.
|
||||
Overall, Adaptive has a consistent performance in terms of utility for all of the data sets that we experimented with, and almost always outperforms the user-level privacy.
|
||||
Thus, it is selected as the best mechanism to use in general.
|
||||
|
@@ -1,66 +1,90 @@
|
||||
\section{{\Thething} events}
|
||||
\label{sec:eval-lmdk}
|
||||
|
||||
% \kat{After discussing with Dimitris, I thought you are keeping one chapter for the proposals of the thesis. In this case, it would be more clean to keep the theoretical contributions in one chapter and the evaluation in a separate chapter. }
|
||||
% \mk{OK.}
|
||||
In this section, we present the experiments that we performed to test the methodology that we presented in Section~\ref{subsec:lmdk-sol}, on real and synthetic data sets.
|
||||
|
||||
With the experiments on the real data sets (Section~\ref{subsec:lmdk-expt-bgt}), we show the performance in terms of data utility of our three {\thething} privacy mechanisms: Skip, Uniform and Adaptive.
|
||||
We define data utility as the mean absolute error introduced by the privacy mechanism.
|
||||
We compare with the event- and user-level differential privacy protection levels, and show that, in the general case, {\thething} privacy allows for better data utility than user-level differential privacy while balancing between the two protection levels.
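As a sketch, the utility measure above is simply the mean absolute error between the original and the privately released series.
\begin{verbatim}
import numpy as np

def mean_absolute_error(original, released):
    """Mean absolute error between the original and the released values."""
    original = np.asarray(original, dtype=float)
    released = np.asarray(released, dtype=float)
    return float(np.mean(np.abs(original - released)))
\end{verbatim}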
|
||||
|
||||
With the experiments on the synthetic data sets (Section~\ref{subsec:lmdk-expt-cor}) we show the overall privacy loss,
|
||||
% \kat{in the previous set of experiments we were measuring the MAE, now we are measuring the privacy loss... Why is that? Isn't it two sides of the same coin? }
|
||||
i.e.,~the privacy budget $\varepsilon$, under temporal correlation within our framework when tuning the size and statistical characteristics of the input {\thething} set $L$.
|
||||
% \kat{mention briefly what you observe}
|
||||
We observe that a greater average {\thething}--regular event distance in a time series can result into greater overall privacy loss under moderate and strong temporal correlation.
|
||||
|
||||
|
||||
\subsection{Budget allocation schemes}
|
||||
\label{subsec:lmdk-expt-bgt}
|
||||
|
||||
Figure~\ref{fig:real} exhibits the performance of the three mechanisms, Skip, Uniform, and Adaptive, applied to the three data sets that we study.
|
||||
Notice that, in the cases when we have $0\%$ and $100\%$ of the events being {\thethings}, we get the same behavior as in event- and user-level privacy respectively.
|
||||
This happens due to the fact that, when there are no {\thethings}, at each timestamp we take into account only the data items at the current timestamp and ignore the rest of the time series (event-level).
|
||||
Whereas, when each timestamp corresponds to a {\thething} we consider and protect all the events throughout the entire series (user-level).
|
||||
% For the Geolife data set (Figure~\ref{fig:geolife}), Skip has the best performance (measured in Mean Absolute Error, in meters) because it invests the most budget overall at every regular event, by approximating the {\thething} data based on previous releases.
|
||||
% Due to the data set's high density (every $1$--$5$ seconds or every $5$--$10$ meters per point) approximating constantly has a low impact on the data utility.
|
||||
% On the contrary, the lower density of the T-drive data set (Figure~\ref{fig:t-drive}) has a negative impact on the performance of Skip.
|
||||
|
||||
|
||||
|
||||
\begin{figure}[htp]
|
||||
\centering
|
||||
\subcaptionbox{Copenhagen\label{fig:copenhagen}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/copenhagen}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/copenhagen}%
|
||||
}%
|
||||
\hspace{\fill}
|
||||
\\ \bigskip
|
||||
\subcaptionbox{HUE\label{fig:hue}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/hue}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/hue}%
|
||||
}%
|
||||
\hfill
|
||||
\subcaptionbox{T-drive\label{fig:t-drive}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/t-drive}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/t-drive}%
|
||||
}%
|
||||
\caption{The mean absolute error (a)~as a percentage, (b)~in kWh, and (c)~in meters of the released data for different {\thething} percentages.}
|
||||
\label{fig:real}
|
||||
\end{figure}
|
||||
|
||||
For the Copenhagen data set (Figure~\ref{fig:copenhagen}), Adaptive has an
|
||||
% constant
|
||||
% \kat{it is not constant, for 0 it is much lower}
|
||||
overall consistent performance and works best for $60$\% and $80$\% {\thethings}.
|
||||
% \kat{this is contradictory: you say that it is constant overall, and then that it is better for certain percentages. }.
|
||||
% \mk{`Consistent' is the right word.}
|
||||
We notice that for $0$\% {\thethings}, it achieves better utility than the event-level protection
|
||||
% \kat{what does this mean? how is it possible?}
|
||||
due to the combination of more available privacy budget per timestamp (due to the absence of {\thethings}) and its adaptive sampling methodology.
|
||||
The Skip model excels, compared to the others, at cases where it needs to approximate $20$\%, $40$\%, or $100$\% of the times.
|
||||
% \kat{it seems a little random.. do you have an explanation? (rather few times or all?)}
|
||||
In general, we notice that, for this data set, it is more beneficial to either invest more privacy budget per event or prefer approximation over introducing randomization.
|
||||
|
||||
The combination of the small range of measurements in HUE ($[0.28$, $4.45]$ with an average of $0.88$kWh) and the large scale in the Laplace mechanism, allows for schemes that favor approximation over noise injection to achieve a better performance in terms of data utility.
|
||||
Hence, Skip (Figure~\ref{fig:hue}) achieves a constant low mean absolute error.
|
||||
% \kat{why?explain}
|
||||
Regardless, the Adaptive mechanism performs far better than Uniform and
|
||||
% strikes a nice balance\kat{???}
|
||||
balances between event- and user-level protection for all {\thething} percentages.
|
||||
|
||||
In the T-drive data set (Figure~\ref{fig:t-drive}), the Adaptive mechanism outperforms Uniform by $10$\%--$20$\% for all {\thething} percentages greater than $40$\% and Skip by more than $20$\%.
|
||||
The lower density (average distance of $623$m) of the T-drive data set has a negative impact on the performance of Skip; republishing a previous perturbed value is now less accurate than perturbing the new location.
|
||||
|
||||
Overall, we can claim that Adaptive is the most reliable and best-performing mechanism,
|
||||
% with a minimal and generic parameter tuning
|
||||
% \kat{what does minimal tuning mean?}
|
||||
if we take into consideration the drawbacks of the Skip mechanism, particularly in spatiotemporal data, e.g.,~sporadic location data publishing~\cite{gambs2010show, russell2018fitness} or misapplied location cloaking~\cite{xssfopes2020tweet}, which could reveal privacy-sensitive attribute values.
|
||||
% (mentioned in Section~\ref{subsec:lmdk-mechs})
|
||||
% \kat{you can mention them also here briefly, and give the pointer for the section}
|
||||
Moreover, implementing a more advanced and data-dependent sampling method
|
||||
% \kat{what would be the main characteristic of the scheme? that it picks landmarks how?}
|
||||
that accounts for changes in the trends of the input data and adapts its rate accordingly, would
|
||||
% possibly
|
||||
% \kat{possibly is not good enough, if you are sure remove it. Otherwise mention that more experiments need to be done?}
|
||||
result in a more effective budget allocation that would improve the performance of Adaptive in terms of data utility.
|
||||
|
||||
|
||||
\subsection{Temporal distance and correlation}
|
||||
\label{subsec:lmdk-expt-cor}
|
||||
|
||||
As previously mentioned, temporal correlation is inherent in continuous publishing, and it is the cause of supplementary privacy loss in the case of privacy-preserving data publication.
|
||||
In this section, we are interested in studying the effect that the distance of the {\thethings} from every event has on the loss caused by temporal correlation.
|
||||
|
||||
Figure~\ref{fig:avg-dist} shows a comparison of the average temporal distance of the events from the previous/next {\thething} or the start/end of the time series for various distributions in our synthetic data.
|
||||
More specifically, we model the distance of an event as the number of events between itself and the nearest {\thething} or the series edge.
|
||||
@@ -68,7 +92,7 @@ More specifically, we model the distance of an event as the count of the total n
|
||||
\begin{figure}[htp]
|
||||
\centering
|
||||
\includegraphics[width=.5\linewidth]{evaluation/avg-dist}%
|
||||
\caption{Average temporal distance of regular events from the {\thethings} for different {\thethings} percentages within a time series in various {\thething} distributions.}
|
||||
\label{fig:avg-dist}
|
||||
\end{figure}
|
||||
|
||||
@@ -79,28 +103,39 @@ On the contrary, distributing the {\thethings} at one part of the sequence, as i
|
||||
This study provides us with different distance settings that we are going to use in the subsequent temporal leakage study.
|
||||
|
||||
Figure~\ref{fig:dist-cor} illustrates a comparison among the aforementioned distributions regarding the overall privacy loss under (a)~weak, (b)~moderate, and (c)~strong temporal correlation degrees.
|
||||
The line shows the overall privacy loss---for all cases of {\thething} distribution---without temporal correlation.
|
||||
|
||||
\begin{figure}[htp]
|
||||
\centering
|
||||
\subcaptionbox{Weak correlation\label{fig:dist-cor-wk}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/dist-cor-wk}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/dist-cor-wk}%
|
||||
}%
|
||||
\hspace{\fill}
|
||||
\hfill
|
||||
\\ \bigskip
|
||||
\subcaptionbox{Moderate correlation\label{fig:dist-cor-mod}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/dist-cor-mod}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/dist-cor-mod}%
|
||||
}%
|
||||
\hfill
|
||||
\subcaptionbox{Strong correlation\label{fig:dist-cor-stg}}{%
|
||||
\includegraphics[width=.5\linewidth]{evaluation/dist-cor-stg}%
|
||||
\includegraphics[width=.49\linewidth]{evaluation/dist-cor-stg}%
|
||||
}%
|
||||
\caption{
|
||||
The overall privacy loss (privacy budget $\varepsilon$)
|
||||
% \kat{what is the unit for privacy loss? I t should appear on the diagram}
|
||||
% \mk{It's the privacy budget epsilon}
|
||||
for different {\thething} percentages and distributions under (a)~weak, (b)~moderate, and (c)~strong degrees of temporal correlation.
|
||||
The line shows the overall privacy loss without temporal correlation.
|
||||
}
|
||||
\label{fig:dist-cor}
|
||||
\end{figure}
|
||||
|
||||
In combination with Figure~\ref{fig:avg-dist}, we conclude that a greater average {\thething}--regular event
|
||||
% \kat{it was even, I changed it to event but do not know what youo want ot say}
|
||||
% \mk{Fixed}
|
||||
distance in a distribution can result into greater overall privacy loss under moderate and strong temporal correlation.
|
||||
This is due to the fact that the backward/forward privacy loss accumulates more over time within longer stretches of the series that contain no {\thethings} (see Section~\ref{sec:correlation}).
|
||||
Furthermore, the behavior of the privacy loss is as expected regarding the temporal correlation degree: a stronger correlation degree generates higher privacy loss while widening the gap between the different distribution cases.
|
||||
On the contrary, a weaker correlation degree makes it harder to differentiate among the {\thething} distributions.
|
||||
The privacy loss under a weak correlation degree converges
|
||||
% \kat{with what?}
|
||||
across all possible distributions for all {\thething} percentages.
|
||||