diff --git a/text/preliminaries/data.tex b/text/preliminaries/data.tex
index 086428f..f8f532d 100644
--- a/text/preliminaries/data.tex
+++ b/text/preliminaries/data.tex
@@ -3,18 +3,21 @@
 \subsection{Data categories}
 \label{subsec:data-categories}
+\kat{Again, the title of the thesis is user-generated data, so there should exist also a distinction between user-generated and third party generated data. Hospital data for example, would fall in the third party generated data.}
+In this thesis, we are interested in data that contain information about individuals and their actions, as these are highly privacy-sensitive.
+We firstly classify data based on their content \kat{'based on their content' reminds me of health data, trajectories, etc., not if they are aggregated or not. }:
 
-The data that we are interested in, contain information about individuals and their actions.
-We firstly classify the data based on their content:
-
+\kat{Use full sentences, even in the bullets. }
 \begin{itemize}
-  \item \emph{Microdata}---the data items in their raw, usually tabular, form pertaining to individuals or objects.
+  \item \emph{Microdata}---the data items \kat{define data item} in their raw, usually tabular, form pertaining to individuals or objects \kat{objects?}.
   \item \emph{Statistical data}---the outcome of statistical processes on microdata.
 \end{itemize}
 
+\kat{I miss the definition of data. You speak of data items, data values, what is the difference to data?}
 An example of microdata is displayed in Table~\ref{tab:snapshot-micro}, while an example of statistical data in Table~\ref{tab:snapshot-statistical}.
 
-Data, in either of these two forms, may have a special property called~\emph{continuity}, i.e.,~their values change and can be observed through time.
-Depending on the span of observation, we distinguish the following categories:
+Data, in either of these two forms, may have a special property called~\emph{continuity}, i.e.,~their values change and can be observed through time. \kat{The way that you define it here reminds temporal data. What is the difference?}
+\kat{If you say that data may have a special property called continuity, we wonder about the existence of other properties. Be more explicit on why you choose to mention only this property.}
+Depending on the span of the observation, we distinguish the following categories:
 
 \begin{itemize}
   \item \emph{Finite data}---data are observed during a predefined time interval.
@@ -23,14 +26,16 @@ Depending on the span of observation, we distinguish the following categories:
 
 \begin{example}
   \label{ex:continuous}
-  Extending Example~\ref{ex:snapshot}, Table~\ref{tab:continuous} shows an example of continuous data observation, by introducing one data table for each consecutive timestamp.
-  The two data tables, over the time-span $[t_1, t_2]$ are an example of finite data.
+  Extending Example~\ref{ex:snapshot}, \kat{Maybe put these three tables in a Figure instead of a table?} Table~\ref{tab:continuous} shows an example of continuous data observation \kat{maybe mention explicitly before what is data observation and continuous data observation }, by introducing one data table for each consecutive timestamp.
+  The two data tables over the time-span $[t_1, t_2]$ are an example of finite data.
   Infinite data are the whole series of data obtained over the period~$[t_1, \infty)$ (infinity is denoted by `\dots').
 
   \includetable{continuous}
 \end{example}
 
+\kat{Why isn't here the presentation of sequential and incremental in bullets?}
+
 We further define two sub-categories applicable to both finite and infinite data: \emph{sequential} and \emph{incremental} data; these two subcategories are not exhaustive, i.e.,~not all data sets belong to the one or the other category.
In sequential data, the value of the observed variable changes, depending on its previous value. For example, trajectories are finite sequences of location stamps, as naturally the position at each timestamp is connected to the position at the previous timestamp.