diff --git a/text/introduction/main.tex b/text/introduction/main.tex
index 786d3bd..3b5badc 100644
--- a/text/introduction/main.tex
+++ b/text/introduction/main.tex
@@ -3,15 +3,15 @@
 \nnfootnote{This chapter was presented during the $11$th International Workshop on Information Search, Integration, and Personalization~\cite{kotzinos2016data} and at the DaQuaTa International Workshop~\cite{kotzinos2017data}, as well as at the S{\~a}o Paulo School of Advanced Science on Smart Cities~\cite{katsomallos2016measuring}.}
 
 Data privacy is becoming an increasingly important issue, both at a technical and at a societal level, and introduces various challenges ranging from the way we share and publish data sets to the way we use online and mobile services.
-Personal information, also described as \emph{microdata}, acquired increasing value and are in many cases used as the `currency'~\cite{economist2016data} to pay for access to various services, i.e.,~users are asked to exchange their personal information with the service provided.
+Personal information, also described as \emph{microdata}, has acquired increasing value and is in many cases used as the `currency'~\cite{economist2016data} to pay for access to various services, i.e.,~users are asked to exchange their personal information for the service provided.
 This is particularly true for many \emph{Location-Based Services} (LBSs), e.g.,~Google Maps~\cite{gmaps}, Waze~\cite{waze}, etc.
 These services exchange their `free' service with collecting and using user-generated data, such as timestamped geolocalized information.
-Besides navigation and location-based services, social media applications, e.g.,~Facebook~\cite{facebook}, Twitter~\cite{twitter}, Foursquare~\cite{foursquare}, etc. take advantage of user-generated and user-related data, to make relevant recommendations and show personalized advertisement.
+Besides navigation and location-based services, social media applications, e.g.,~Facebook~\cite{facebook}, Twitter~\cite{twitter}, Foursquare~\cite{foursquare}, etc., take advantage of user-generated and user-related data to make relevant recommendations and show personalized advertisements.
 In this case, the location is also part of the important required personal data to be shared.
 Last but not least, \emph{data brokers}, e.g.,~Experian~\cite{experian}, TransUnion~\cite{transunion}, Acxiom~\cite{acxiom}, etc. collect data from public and private resources, e.g.,~censuses, bank card transaction records, voter registration lists, etc.
 Most of these data are georeferenced and contain directly or indirectly location information; protecting the location of the user has become one of the most important privacy goals so far.
-These different sources and types of data, on the one hand give useful feedback to the involved users and/or services, and on the other hand, when combined together, provide valuable information to various internal/external analytical services.
+On the one hand, these different sources and types of data give useful feedback to the involved users and/or services; on the other hand, when combined, they provide valuable information to various internal/external analytical services.
 While these activities happen within the boundaries of the law~\cite{tankard2016gdpr}, it is important to be able to protect the privacy (by anonymizing, perturbing, encrypting, etc.) the corresponding data before sharing, and to take into account the possibility of correlating, linking, and crossing diverse independent data sets.
 Especially the latter is becoming quite important in the era of Big Data, where the existence of diverse linked data sets is one of the promises; as an example, one can refer to the discussion on Entity Resolution problems using Linked Open Data in~\cite{efthymiou2015big}.
 In some cases, personal data might be so representative that even if de-identified, when integrated with a small amount of external data, one can trace back to their original source.
@@ -60,7 +60,7 @@ This specific subfield of data privacy becomes increasingly important since it:
 \end{enumerate}
 
 Additionally, data in continuous data publishing use cases require a timely processing because their value usually decreases over time depending on the use case as demonstrated in Figure~\ref{fig:data-value}.
-For this reason, we provide an insight into time-related properties of the algorithms, e.g.,~if they work on infinite, real-time data, or if they take into consideration existing data dependencies.
+For this reason, we provide insight into time-related properties of the algorithms, e.g.,~whether they work on finite or infinite data, or whether they take into consideration any underlying data dependence.
 The importance of continuous data publishing is stressed by the fact that, commonly, many types of data have such properties, with geospatial data being a prominent case.
 A few examples include---but are not limited to---data being produced while tracking the movement of individuals for various purposes (where data might also need to be privacy-protected in real-time and in a continuous fashion); crowdsourced data that are used to report measurements, such as noise or pollution (where again we have a continuous timestamped and usually georeferenced stream of data); and even isolated data items that might include location information, such as photographs or social media posts.
 Typically, in such cases, we have a collection of data referring to the same individual or set of individuals over a period of time, which can also be infinite.