Table~\ref{tab:micro} summarizes the literature for the Microdata category.
Each reviewed work is abstractly described in this table by its category (finite or infinite), its publishing mode (batch or streaming) and scheme (global or local), the level of privacy achieved (user, event, or w-event), the attacks addressed, the privacy operation applied, and the base method it is built upon.
We observe that privacy-preserving algorithms for microdata rely mostly on $k$-anonymity or derivatives of it.
Ganta et al.~\cite{ganta2008composition} revealed that $k$-anonymity methods are vulnerable to complementary release attacks (or \emph{composition attacks} in the original publication).
Consequently, the research community proposed solutions based on $k$-anonymity, focusing on different threats linked to continuous publication, as we review later on.
However, notice that only a couple of the following works~\cite{li2016hybrid,shmueli2015privacy} assume that data sets are privacy-protected \emph{independently} of one another, meaning that the publisher is oblivious to the rest of the publications.
On the other hand, algorithms based on differential privacy are not concerned with such specific attacks since, by definition, differential privacy assumes that the adversary may possess any kind of background knowledge.
Moreover, more recent works also consider data dependencies to account for the extra privacy loss that they entail.
We begin the discussion with the works designed for microdata as finite observations (Section~\ref{subsec:micro-finite}), to continue with the infinite observations setting (Section~\ref{subsec:micro-infinite}).
% - complementary release (form the quasi-identifiers from joining releases)
% - user
% - k-anonymity
% - generalization + specialisation
\hypertarget{wang2006anonymizing}{Wang and Fung}~\cite{wang2006anonymizing} address the problem of anonymously releasing different projections (i.e.,~subsets of the attributes) of the same data set in subsequent timestamps.
More precisely, the authors want to protect individual information that could be revealed from joining various releases of the same data set.
To do so, instead of locating the quasi-identifiers in a single release, the authors suggest that the identifiers may span the current and all previous releases of the (projections of the) data set.
Then, the proposed method uses the join of the different releases on the common identifying attributes.
The goal is to generalize the identifying attributes of the current release, given that previous releases are immutable.
The generalization is performed in a top-down manner, meaning that the attributes are initially over-generalized and are then specialized step by step until predefined quality and privacy requirements are met.
The privacy requirement is the so-called \emph{($X$, $Y$)-privacy} for a threshold $k$, meaning that the identifying attributes in $X$ are linked with at most $k$ sensitive values in $Y$, in the join of the previously released and current data sets.
The quality requirement is tunable within the framework.
Namely, the authors propose three alternatives: the reduction of the class entropy~\cite{quinlan2014c4, shannon2001mathematical}, the notion of distortion, and the discernibility~\cite{bayardo2005data}.
The anonymization algorithm for releasing a data set in the presence of a previously released data set takes into account the scalability and performance problems that a join between the two may entail.
Still, when many previous releases exist, the complexity remains high.
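To make the top-down strategy concrete, the following minimal Python sketch (with an illustrative toy hierarchy and helper names of our own) starts from the most generalized representation of a single quasi-identifier and specializes it step by step while a privacy requirement still holds; for simplicity, the requirement here is a plain group-size check on a single release rather than the actual ($X$, $Y$)-privacy requirement over the join of releases.
\begin{verbatim}
# Minimal sketch of top-down specialization over a toy generalization
# hierarchy for a single quasi-identifier ("age"). The privacy check is
# a simplified group-size requirement on a single release, standing in
# for the paper's (X, Y)-privacy over the join of releases.
from collections import Counter

records = [23, 27, 25, 52, 57, 55]          # toy quasi-identifier values

LEVELS = [lambda a: '*',                                  # most general
          lambda a: f'{a // 10 * 10}-{a // 10 * 10 + 9}', # decade range
          lambda a: str(a)]                               # most specific

def satisfies_privacy(level, k):
    """Every generalized group must contain at least k records."""
    groups = Counter(LEVELS[level](a) for a in records)
    return all(count >= k for count in groups.values())

def top_down_specialize(k):
    level = 0                               # start over-generalized
    while level + 1 < len(LEVELS) and satisfies_privacy(level + 1, k):
        level += 1                          # specialize while still private
    return [LEVELS[level](a) for a in records]

print(top_down_specialize(k=2))  # ['20-29', '20-29', '20-29', '50-59', ...]
\end{verbatim}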
\hypertarget{fung2008anonymity}{Fung et al.}~\cite{fung2008anonymity} introduce the problem of privately releasing continuous incremental data sets.
As a reminder, the invariant of this kind of release is that at every timestamp $t_i$, the records previously released at $t_j$ ($j < i$) are released again together with a set of new records.
The authors first focus on two consecutive releases and describe three classes of possible attacks, which fall under the general category of complementary attacks.
They name these attacks \emph{correspondence attacks} because they rely on the principle that every tuple from an original data set $D_1$, released at timestamp $t_1$, corresponds to a tuple in the data set $D_2$, released at timestamp $t_2$.
Naturally, the opposite does not hold, as tuples added at $t_2$ do not exist in $D_1$.
Assuming that the attacker knows the quasi-identifiers and the timestamp of the record of a person, they define the \emph{backward}, \emph{cross}, and \emph{forward} (\emph{BCF}) attacks.
They show that combining two individually $k$-anonymized subsequent releases using one of the aforementioned attacks can lead to `cracking' some of the records in the set of $k$ candidate tuples, rendering the privacy level lower than $k$.
Besides detecting cases where BCF anonymity is compromised between two releases, the authors also provide an anonymization algorithm for a release $\pmb{o}_2$ in the presence of a private release $\pmb{o}_1$.
The algorithm starts from the most generalized possible state for the quasi-identifiers of the records in $D_2$.
Step by step, it checks which combinations of specializations on the attributes do not violate BCF anonymity and outputs the most specialized possible version of the data set.
The authors discuss how the framework extends to multiple releases and to different kinds of privacy methods (other than $k$-anonymity).
It is worth noting that, to maintain a certain quality for a release, it is essential that the delta between subsequent releases is large enough; otherwise, the needed generalization level may destroy the utility of the data set.
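The intuition behind such correspondence attacks can be illustrated with the following minimal sketch, in which the candidate sets are hypothetical record identifiers matching the target's quasi-identifiers in each release; the exact backward, cross, and forward reasoning of the paper is more involved.
\begin{verbatim}
# Minimal sketch of how combining two individually k-anonymized releases
# can shrink an adversary's candidate set below k. The candidate sets are
# hypothetical record identifiers matching the target's quasi-identifiers
# in each release; the exact BCF reasoning of the paper is more involved.

def effective_anonymity(candidates_release1, candidates_release2):
    """Records still plausible for the target after seeing both releases."""
    return candidates_release1 & candidates_release2

# the target appears in both releases (incremental setting: D1 subset of D2)
cands_r1 = {'r1', 'r2', 'r3'}      # 3-anonymous group in release 1
cands_r2 = {'r2', 'r3', 'r7'}      # 3-anonymous group in release 2

cracked = effective_anonymity(cands_r1, cands_r2)
print(cracked, len(cracked))       # e.g. {'r2', 'r3'} 2 -> below k = 3
\end{verbatim}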
% K anonymity for trajectories with spatial distortion
% - microdata
% - finite (sequential)(trajectories)
% - batch
% - complementary release
% - user
% - clustering & k-anonymity
% - distortion (on the centroid)
\hypertarget{abul2008never}{Abul et al.}~\cite{abul2008never} define \emph{($k$, $\delta$)-anonymity} for enabling the high-quality publishing of moving-object data sets.
The authors claim that the classical $k$-anonymity framework cannot be directly applied to this kind of data from a data-centric perspective.
The traditional distortion techniques in $k$-anonymity, i.e.,~generalization or suppression, incur a great loss of information.
On the one hand, suppression diminishes the size of the database.
On the other hand, generalization demands the existence of quasi-identifiers, the values of which are going to be generalized.
In trajectories, however, all points can equally be considered quasi-identifiers.
Obviously, generalizing all the trajectory points would yield high levels of distortion.
For this reason, a new, spatial-based distortion method is proposed.
After clustering the trajectories in groups of at least $k$ elements, each trajectory is translated into a new one within a vicinity bounded by a predefined threshold $\delta$.
Of course, the newly generated trajectories should still form a $k$-anonymous set.
The authors validate their theory by experimentally showing that the difference between the results of count queries executed over a data set and over its ($k$, $\delta$)-anonymous version remains low.
However, a comparative evaluation against existing clustering techniques, e.g.,~$k$-means, would have been interesting to better support the contributions of this part of the solution as well.
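A simplified check of the ($k$, $\delta$) requirement on a cluster of trajectories could look as follows, assuming (as one possible reading of co-localization) that every point of every trajectory must lie within $\delta/2$ of the cluster's central trajectory at the same timestamp; the central trajectory and the toy cluster are illustrative.
\begin{verbatim}
# Simplified check that a cluster of trajectories is (k, delta)-anonymous,
# reading co-localization as: every point lies within delta/2 of the
# cluster's central (average) trajectory at the same timestamp. This is
# an illustration, not the exact definition or algorithm of the paper.
import math

def central_trajectory(trajectories):
    """Point-wise average of equally sampled trajectories."""
    return [(sum(t[i][0] for t in trajectories) / len(trajectories),
             sum(t[i][1] for t in trajectories) / len(trajectories))
            for i in range(len(trajectories[0]))]

def is_k_delta_anonymous(trajectories, k, delta):
    if len(trajectories) < k:
        return False
    center = central_trajectory(trajectories)
    return all(math.dist(point, center[i]) <= delta / 2
               for traj in trajectories
               for i, point in enumerate(traj))

cluster = [[(0.0, 0.0), (1.0, 1.0)],
           [(0.2, 0.1), (1.1, 0.9)],
           [(0.1, 0.2), (0.9, 1.1)]]
print(is_k_delta_anonymous(cluster, k=3, delta=1.0))
\end{verbatim}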
% Privacy-utility trade-off under continual observation
% - microdata
% - finite
% - batch/streaming
% - dependence
% - user
% - perturbation (randomization)
% - temporal correlations (HMM)
% - local
\hypertarget{erdogdu2015privacy}{Erdogdu and Fawaz}~\cite{erdogdu2015privacy} consider the scenario where privacy-conscious individuals separate the data that they generate into sensitive and non-sensitive.
The individuals keep the former unreleased, and publish samples of the latter to a service provider.
Privacy mapping, implemented as a stochastic process, distorts the non-sensitive data samples locally, and a separable distortion metric (e.g.,~Hamming distance) calculates the discrepancy of the distorted data from the original.
The goal of the privacy mapping is to find a balance between the distortion and privacy metric, i.e.,~achieve maximum released data utility, while offering sufficient privacy guarantees.
The authors assume that there is a data dependence (modeled with an HMM) between the two data sets, and thus the release of the distorted data set can reveal information about the sensitive one.
They investigate both a simple and a complex attack setting.
In the simple attack, the adversary makes static assumptions, based only on the observations made so far, that cannot be altered later.
In the complex attack, past and future data releases dynamically affect the assumptions that an adversarial entity makes.
In both cases, the framework quantifies the information leakage at any time point using a privacy metric that measures the improvement of the adversarial inference of the sensitive data set, which the individual kept secret, after observing the data released at that particular point.
Throughout the process, the authors consider both the batch and the streaming processing schemes.
However, the assumption that individuals are privacy-conscious can drastically limit the applicability of the framework.
Furthermore, the metrics that the framework utilizes for the evaluation of the privacy guarantees that it provides are not intuitive.
% M-invariance: towards privacy preserving re-publication of dynamic data sets
% - microdata
% - finite
% - batch
% - complementary release (intersection of sensitive values)
% - user
% - k-anonymity
% - generalization + synthetic data insertion
\hypertarget{xiao2007m}{Xiao et al.}~\cite{xiao2007m} consider the case where a data set is (re)published at different timestamps in an update (tuple insertion/deletion) manner.
More precisely, they address data anonymization in continuous publishing by implementing $m$-\emph{invariance}.
In a simple $k$-anonymity (or $l$-diversity) scenario, the privacy of an individual existing in two updates can be compromised by intersecting the sets of sensitive values.
In contrast, an individual who exists in a series of $m$-invariant releases is always associated with the same set of $m$ different sensitive values.
To enable the publishing of $m$-invariant data sets, artificial tuples (\emph{counterfeits}) may be added in a release.
To minimize the noise added to the data sets, the authors provide an algorithm with two extra desiderata: limit the counterfeits, and minimize the quasi-identifiers' generalization level.
Still, the choice of adding tuples with specific sensitive values disturbs the value distribution, with a direct effect on any relevant statistical analysis.
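The core invariant can be expressed with a small sketch: for an individual present in several releases, the set of sensitive values of their group must consist of the same $m$ distinct values in every release; the group representation below is illustrative.
\begin{verbatim}
# Minimal sketch of the m-invariance condition for one individual: the set
# of sensitive values in the individual's group must contain m distinct
# values and be identical across all releases in which the individual
# appears. The group representation is illustrative.

def is_m_invariant(groups_over_time, m):
    """groups_over_time: list of sets of sensitive values, one per release."""
    first = set(groups_over_time[0])
    return (len(first) == m
            and all(set(g) == first for g in groups_over_time[1:]))

releases = [{'flu', 'hiv', 'cold'},   # group of the individual at t1
            {'flu', 'hiv', 'cold'},   # at t2 (possibly with counterfeits)
            {'flu', 'hiv', 'cold'}]   # at t3
print(is_m_invariant(releases, m=3))  # True
\end{verbatim}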
% Preventing equivalence attacks in updated, anonymized data
% - microdata
% - finite
% - batch
% - complementary release (equivalence attack)
% - user
% - m-invariance (k-anonymity)
% - generalization + synthetic data insertion
In the same update setting (insert/delete tuple), \hypertarget{he2011preventing}{He et al.}~\cite{he2011preventing} introduce another kind of attack, namely the \emph{equivalence} attack, not taken into account by the aforementioned $m$-invariance technique.
The equivalence attack allows sets of individuals to be considered equivalent, as far as the sensitive attribute is concerned, across different timestamps.
In this way, all the members of an equivalence class are harmed if the sensitive value of even one member is learned.
For a number of releases to be private, they have to be both $m$-invariant and $e$-equivalent ($e < m$).
The authors propose an algorithm incorporating $m$-invariance, based on the \emph{min-cut} graph optimization problem, for publishing $e$-equivalent data sets.
The proposed method can achieve better levels of privacy, with execution time and quality comparable to $m$-invariance.
% Privacy by diversity in sequential releases of databases
% - generalization + permutation of sensitive information among tuples with the same quasi-identifiers
\hypertarget{Shmueli}{Shmueli and Tassa}~\cite{shmueli2015privacy} identified the computational inefficiency of anonymously releasing a data set, taking previous releases into account, in scenarios of continuous data publishing.
The released data sets contain subsets of attributes of an original data set, while the authors propose an extension for attribute addition.
Their algorithm can compute $l$-diverse anonymized releases (over different subsets of attributes) in parallel by generating $l-1$ so-called \emph{fake} worlds.
A fake world is generated from the base data set by randomly permuting non-identifier and sensitive values among the tuples, in such a way that minimal information loss (quality desideratum) is incurred.
This is partially accomplished by verifying that the permutation is done among tuples with similar quasi-identifiers.
Then, the algorithm creates buckets of tuples with at least $l$ different sensitive values, in which the quasi-identifiers will then be generalized in order to achieve $l$-diversity (privacy protection desideratum).
The generalization step is also conducted in an information-loss efficient way.
All different releases will be $l$-diverse because they are created assuming the same possible worlds, with which they are consistent.
Tuples/attributes deletion is briefly discussed and left as an open question.
The authors contrast this article with a previous work~\cite{shmueli2012limiting} of theirs, claiming that the new approach considers a stronger adversary (one who knows all individuals in the data set along with their quasi-identifiers, and not only one individual), and that the computation is much more efficient, as its complexity is not exponential in the number of previous publications.
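The fake-world generation step can be illustrated with the following sketch, which permutes sensitive values among tuples whose quasi-identifiers fall into the same (illustrative) similarity bucket; the actual algorithm additionally controls the incurred information loss and generates $l-1$ such worlds consistently across releases.
\begin{verbatim}
# Minimal sketch of generating one "fake world" by permuting sensitive
# values among tuples whose quasi-identifiers fall in the same
# (illustrative) similarity bucket; the actual algorithm also controls
# information loss and builds l-1 such worlds consistently across releases.
import random
from collections import defaultdict

records = [({'age': 23}, 'flu'), ({'age': 25}, 'cold'),
           ({'age': 52}, 'hiv'), ({'age': 55}, 'flu')]

def similarity_bucket(qi):
    return qi['age'] // 10            # illustrative: same age decade

def fake_world(records, seed=0):
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for idx, (qi, _) in enumerate(records):
        buckets[similarity_bucket(qi)].append(idx)
    world = [sens for _, sens in records]
    for idxs in buckets.values():     # permute sensitive values per bucket
        shuffled = [world[i] for i in idxs]
        rng.shuffle(shuffled)
        for i, s in zip(idxs, shuffled):
            world[i] = s
    return [(qi, s) for (qi, _), s in zip(records, world)]

print(fake_world(records))
\end{verbatim}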
% A hybrid approach to prevent composition attacks for independent data releases
% - microdata
% - finite
% - batch
% - complementary release (releases unknown to the publisher)
% - user
% - k-anonymity
% - generalization + noise (from normal distribution)
\hypertarget{li2016hybrid}{Li et al.}~\cite{li2016hybrid} identified a common assumption in most of the privacy-preserving techniques: when anonymizing a data set, all previous releases are known to the data publisher.
However, it is probable that the releases are independent of each other, and that the data publisher is unaware of these releases when anonymizing the data set.
In such a setting, the previous techniques would suffer from composition attacks.
The authors define this kind of adversary and propose a hybrid model for data anonymization.
More precisely, the adversary knows that an individual exists in two different anonymized versions of the same data set and has access to both versions, but the anonymization of each data set is done independently (i.e.,~without considering the previously anonymized data sets).
The key idea in fighting a composition attack is to increase the probability that the matches among tuples from the two data sets are random, linking different rather than the same individuals.
To do so, the proposed privacy protection method applies three preprocessing steps before applying a traditional $k$-anonymity or $l$-diversity algorithm.
First, the data set is sampled so as to blur the knowledge of whether particular individuals are present.
Then, especially in small data sets, quasi-identifiers are distorted by noise addition before the classical generalization step.
The noise is taken from a normal distribution with the mean and standard deviation values calculated on the corresponding quasi-identifier values.
In the case of sparse data, the sensitive values are generalized along with the quasi-identifiers.
The danger of composition attacks is less prominent when this method is used on top of $k$-anonymity than when $k$-anonymity is used alone, while the quality of the results remains comparable.
The authors also provide a comparison to releasing the data set with $\varepsilon$-differential privacy, demonstrating that their techniques are superior with respect to quality, because in the competing algorithm the noise adds up over the sensitive attributes to be protected.
Even though the authors experiment with two different values of $\varepsilon$, a better experiment would have been to compare the quality/privacy ratio of the two methods.
This is a good attempt to independently anonymize the same data set multiple times; nevertheless, the scenario is restricted to releases over the same database schema that use the same perturbation and generalization functions.
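The first two preprocessing steps can be sketched as follows; drawing zero-mean noise scaled by the attribute's standard deviation is an assumption made here for illustration, as the paper's exact parametrization of the normal distribution may differ.
\begin{verbatim}
# Minimal sketch of the first two preprocessing steps: sample the data
# set, then perturb a numeric quasi-identifier with normally distributed
# noise whose scale is derived from the attribute's values. Zero-mean
# noise scaled by the column's standard deviation is an assumption made
# here for illustration; the paper's exact parametrization may differ.
import numpy as np

rng = np.random.default_rng(0)

def sample_and_perturb(qi_column, sample_rate=0.8, noise_scale=0.1):
    # step 1: sampling blurs whether a given individual is present
    mask = rng.random(len(qi_column)) < sample_rate
    sampled = qi_column[mask]
    # step 2: noise addition before the usual generalization step
    noise = rng.normal(loc=0.0, scale=noise_scale * sampled.std(),
                       size=len(sampled))
    return sampled + noise

ages = np.array([23, 27, 25, 52, 57, 55, 41, 44], dtype=float)
print(sample_and_perturb(ages))
\end{verbatim}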
% Publishing trajectories with differential privacy guarantees
% - microdata (trajectories)
% - finite
% - batch
% - linkage
% - event
% - differential privacy
% - perturbation (Laplace)
% - Seems to belong to the local scheme but in the scenario/evaluation they release multiple trajectories.
\hypertarget{jiang2013publishing}{Jiang et al.}~\cite{jiang2013publishing} focus on ship trajectories with known starting and terminal points.
More specifically, they study different noise addition mechanisms for publishing trajectories with differential privacy guarantees.
These mechanisms include adding global noise to the trajectory, and local noise to either each location point or the coordinates of each point of the trajectory.
The first two mechanisms sample a noisy radius from an exponential distribution, while the last adds noise drawn from a Laplace distribution to each coordinate of every location.
By comparing these different techniques, they conclude that the last offers a better privacy guarantee and a smaller error bound.
Nonetheless, the resulting trajectory is noticeably distorted due to the addition of Laplace noise to the original coordinates.
To tackle this issue, they design the \emph{Sampling Distance and Direction} (SDD) mechanism.
This mechanism publishes an optimal next trajectory point by sampling a suitable distance and direction at the current position from the probability distribution of the exponential mechanism, while taking into account the ship's maximum speed constraint.
Because SDD utilizes the exponential mechanism, it outperforms the other three mechanisms and maintains a good utility-privacy balance.
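As a rough illustration of the coordinate-wise baseline discussed above (not of SDD itself), the following sketch adds Laplace noise with scale sensitivity$/\varepsilon$ to each coordinate of every trajectory point; the sensitivity value and the example trajectory are illustrative.
\begin{verbatim}
# Minimal sketch of the coordinate-wise baseline: add Laplace noise with
# scale sensitivity/epsilon to each coordinate of every trajectory point.
# The sensitivity value is an illustrative parameter, not from the paper.
import numpy as np

rng = np.random.default_rng(1)

def perturb_trajectory(points, epsilon, sensitivity=1.0):
    scale = sensitivity / epsilon
    return [(x + rng.laplace(0.0, scale), y + rng.laplace(0.0, scale))
            for x, y in points]

trajectory = [(12.34, 45.67), (12.36, 45.70), (12.40, 45.72)]
print(perturb_trajectory(trajectory, epsilon=0.5))
\end{verbatim}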
% Differentially private trajectory data publication
% - microdata (trajectories)
% - finite
% - batch
% - linkage
% - user
% - differential privacy
% - perturbation (Laplace)
\hypertarget{chen2011differentially}{Chen et al.}~\cite{chen2011differentially} propose a non-interactive data-dependent privacy-preserving algorithm to generate a differentially private release of trajectory data.
The algorithm relies on a noisy prefix tree, i.e.,~an ordered search tree data structure used to store an associative array.
Each node represents a location of a trajectory, drawn from the set of possible locations where any user can be present, and contains a perturbed count, i.e.,~the number of individuals at that location with noise drawn from a Laplace distribution added to it.
The privacy budget is allocated equally to the levels of the tree, each of which represents a timestamp.
At each level, and for every node, the algorithm looks for children nodes with a non-zero number of trajectories (non-empty nodes) and continues expanding them.
An empty node has a noisy count lower than a threshold that depends on the available privacy budget and the height of the tree.
All children nodes are associated with disjoint data subsets, and thus, according to the parallel composition theorem of differential privacy, the algorithm can utilize for every node all of the budget available at its tree level.
To generate the anonymized database, it is necessary to traverse the prefix tree once in post-order, paying attention to terminating (empty) nodes.
During this process, taking into account some consistency constraints helps to avoid erroneous trajectories due to the noise injection.
Namely, each node of a path should have a count greater than or equal to the count of each of its children, and greater than the sum of the counts of all of its children.
Increasing the privacy budget results in less average relative error because less noise is added at each level, and thus improves quality.
By increasing the height of the tree, the relative error initially decreases as more information is retained from the database.
However, after a certain threshold, the increase of height can result in less available privacy budget at each level, and thus more relative error due to the increased perturbation.
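The construction of the noisy prefix tree can be sketched as follows, with the privacy budget split equally across the levels and nodes pruned when their noisy count falls below a threshold; the threshold formula, the location alphabet, and the toy data are illustrative choices, not those of the paper.
\begin{verbatim}
# Minimal sketch of a noisy prefix tree over trajectories: the budget is
# split equally across the tree levels, each node stores a Laplace-
# perturbed count of the trajectories sharing its prefix, and nodes whose
# noisy count falls below a threshold are treated as empty and not
# expanded. The threshold formula is an illustrative choice.
import numpy as np

rng = np.random.default_rng(2)
LOCATIONS = ['A', 'B', 'C']

def noisy_prefix_tree(trajectories, epsilon, height):
    eps_level = epsilon / height             # equal budget per level
    threshold = 2.0 / eps_level              # illustrative pruning threshold
    tree = {}

    def expand(prefix, level):
        if level == height:
            return
        for loc in LOCATIONS:
            new_prefix = prefix + (loc,)
            true_count = sum(1 for t in trajectories
                             if t[:len(new_prefix)] == new_prefix)
            noisy = true_count + rng.laplace(0.0, 1.0 / eps_level)
            if noisy >= threshold:           # only expand non-empty nodes
                tree[new_prefix] = noisy
                expand(new_prefix, level + 1)

    expand((), 0)
    return tree

data = [('A', 'B'), ('A', 'B'), ('A', 'C'), ('B', 'C')] * 5
print(noisy_prefix_tree(data, epsilon=1.0, height=2))
\end{verbatim}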
% Protecting Locations with Differential Privacy under Temporal Correlations
\hypertarget{xiao2015protecting}{Xiao et al.}~\cite{xiao2015protecting} propose another privacy definition based on differential privacy that accounts for temporal correlations in geo-tagged data.
Location transitions between two consecutive timestamps are determined by temporal correlations modeled through a Markov chain.
A \emph{$\delta$-location} set includes all the probable locations a user might appear at, excluding locations of low probability.
Therefore, the true location is hidden in the resulting set, in which any pair of locations is indistinguishable.
The lower the value of $\delta$, the more locations are included and hence, the higher the level of privacy that is achieved.
The authors use the \emph{Planar Isotropic Mechanism} (PIM) as the perturbation mechanism, which they designed based on their proof that $l_1$-norm sensitivity fails to capture the exact sensitivity in a multidimensional space.
For this reason, PIM instead utilizes the \emph{sensitivity hull}, a notion that is independent of the location privacy context.
In~\cite{xiao2017loclok}, the authors demonstrate the functionality of their system \emph{LocLok}, which implements the concept of $\delta$-location.
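One way to read the construction of the $\delta$-location set is sketched below: keep the most probable locations of the (Markov-predicted) prior until their cumulative probability reaches $1-\delta$, dropping the low-probability tail; the prior used here is illustrative.
\begin{verbatim}
# Simplified sketch of constructing a delta-location set: starting from
# the prior probabilities of the user's possible locations at the current
# timestamp (e.g., predicted by the Markov model), keep the most probable
# locations until their cumulative probability reaches 1 - delta, dropping
# the low-probability tail. A smaller delta therefore keeps more locations.

def delta_location_set(location_probs, delta):
    """location_probs: dict mapping location id -> prior probability."""
    ranked = sorted(location_probs.items(), key=lambda kv: kv[1],
                    reverse=True)
    kept, cumulative = [], 0.0
    for loc, prob in ranked:
        kept.append(loc)
        cumulative += prob
        if cumulative >= 1.0 - delta:
            break
    return set(kept)

prior = {'home': 0.55, 'office': 0.30, 'gym': 0.10, 'airport': 0.05}
print(delta_location_set(prior, delta=0.05))   # keeps more locations
print(delta_location_set(prior, delta=0.20))   # keeps fewer locations
\end{verbatim}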
% Time distortion anonymization for the publication of mobility data with high utility
% - microdata (trajectory)
% - finite
% - batch
% - linkage
% - event
% - temporal transformation
% - perturbation
% - local
\hypertarget{primault2015time}{Primault et al.}~\cite{primault2015time} proposed \emph{Promesse}, an algorithm that builds on time distortion instead of location distortion when releasing trajectories.
Promesse takes as input an individual's mobility trace, comprising pairs of geolocations and timestamps, and a parameter $\varepsilon$.
The latter indicates the desired distance between the location points that will be publicly released.
Initially, Promesse extracts regularly spaced locations, and interpolates each one of the locations at a distance depending on the previous location and the value of $\varepsilon$.
Then, it removes the first and last locations of the mobility trace, and assigns uniformly distributed timestamps to the remaining locations of the trajectory.
Hence, the resulting trace has a smooth speed, and therefore places where the individual stayed longer, e.g.,~home, work, etc., are indistinguishable.
The algorithm needs to know the starting and ending point of the trajectory; thus, it can only apply to offline scenarios.
Furthermore, it works better with fine-grained data sets, because in this way it can achieve optimal pairing of geolocations and timestamps.
Moreover, the definition of $\varepsilon$ cannot provide versatile privacy protection since it is data dependent.
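The time-distortion step described above can be sketched as follows, assuming the locations have already been re-sampled at regular spatial intervals; the spatial interpolation step and the exact handling of $\varepsilon$ are omitted.
\begin{verbatim}
# Minimal sketch of the time-distortion step: given locations already
# re-sampled at (roughly) regular spatial intervals, drop the first and
# last points and assign uniformly spaced timestamps to the rest, so the
# released trace has a smooth speed. The spatial interpolation step and
# the exact handling of epsilon are omitted.

def uniform_timestamps(locations, t_start, t_end):
    inner = locations[1:-1]                  # drop first and last points
    n = len(inner)
    step = (t_end - t_start) / (n + 1)
    return [(loc, t_start + (i + 1) * step) for i, loc in enumerate(inner)]

trace = [(0.0, 0.0), (0.0, 0.1), (0.0, 0.2), (0.0, 0.3), (0.0, 0.4)]
print(uniform_timestamps(trace, t_start=0, t_end=400))
\end{verbatim}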
% Differentially Private and Utility Preserving Publication of Trajectory Data
% - microdata (trajectory)
% - finite
% - batch
% - linkage
% - user
% - differential privacy
% - perturbation (Laplace)
% - global
\hypertarget{gursoy2018differentially}{Gursoy et al.}~\cite{gursoy2018differentially} designed \emph{DP-Star}, a differential privacy framework that publishes synthetic trajectories featuring similar statistics compared to the original ones.
By utilizing the \emph{Minimum Description Length} (MDL) principle~\cite{grunwald2007minimum}, DP-Star eliminates redundant data points in the original trajectories, and generates trajectories containing only representative points.
In this way, it is necessary to allocate the available privacy budget to far less data points, striking a balance between preciseness and conciseness.
Moreover, the algorithm constructs a density-aware grid, with granularity that adapts to the geographical density of the trajectory points of the data set and preserves the spatial density despite any necessary perturbation.
Then, DP-Star preserves the dependence between the trajectories' start and end points by extracting (through a first-order Markov mobility model) the trip distribution, and the intra-trajectory mobility.
Finally, a Median Length Estimation (MLE) mechanism approximates the trajectories' lengths, and the framework generates privacy and utility preserving synthetic trajectories.
Every phase of the process consumes a predefined share of the privacy budget, keeping the respective product of each phase private and eligible for publishing.
The authors compare their design with that of~\cite{chen2012differentially} and~\cite{he2015dpt} by running several tests, and ascertain that it outperforms them in terms of data utility.
However, due to DP-Star's privacy budget distribution to its different phases, for small values of $\varepsilon$ the framework's privacy performance is inferior to that of its competitors.
% An Optimal Pufferfish Privacy Mechanism for Temporally Correlated Trajectories
% - microdata
% - finite (sequential)
% - batch
% - dependence (temporal)
% - local
% - event
% - differential privacy
% - perturbation (randomized response, Laplace)
\hypertarget{ou2018optimal}{Ou et al.}~\cite{ou2018optimal} designed \emph{FGS-Pufferfish} for publishing temporally correlated trajectory data while protecting temporal correlation.
FGS-Pufferfish transforms a user's daily trajectories into a set of sine and cosine waves of different frequencies along with the corresponding Fourier coefficients.
Then, it adds Laplace noise to the Fourier coefficients' geometric sum.
The authors obtain the optimal noisy Fourier coefficients by solving the constrained optimization problem via the Lagrange Multiplier method depending on the available privacy budget.
They evaluate both the location data utility and the temporal correlation utility.
The experimental evaluation shows that FGS-Pufferfish outperforms CTS-DP~\cite{wang2017cts} in terms of the trade-off between privacy and location utility.
% Continuous privacy preserving publishing of data streams
% - microdata
% - infinite
% - stream
% - as k-anonymity
% - event
% - k-anonymity
% - generalization
\hypertarget{zhou2009continuous}{Zhou et al.}~\cite{zhou2009continuous} introduce the problem of infinite private data publishing, and propose a randomized solution based on $k$-anonymity.
More precisely, they continuously publish equivalence classes of size greater than or equal to $k$ containing generalized tuples from distinct persons (or identifiers in general).
To create the equivalence classes they set several desiderata.
Besides the size of a class, which should be greater than or equal to $k$, the information loss incurred by the generalization should be minimal, and the delay in forming and publishing the class should be kept low as well.
To meet these requirements, they build a randomized model using the popular structure of $R$-trees, extended to accommodate data density distribution information.
In this way, they achieve a better quality/publishing delay ratio for the released private data.
On the one hand, the formed classes contain data items that are close to each other (in dense areas), while on the other hand, classes with tuples of sparse areas are released as soon as possible so that the delay will remain low.
% Maskit: Privately releasing user context streams for personalized mobile applications
% - microdata (context)
% - infinite
% - streaming
% - dependence
% - event
% - $\delta$-privacy
% - suppression
% - temporal (Markov)
% - local
\hypertarget{gotz2012maskit}{Gotz et al.}~\cite{gotz2012maskit} developed \emph{MaskIt}, a system that interfaces the sensors of a personal device, identifies various sets of contexts, and releases a stream of privacy-preserving contexts to untrusted applications installed on the device.
A context represents the circumstances that form the setting for an event, e.g.,~`at the office', `running', etc.
The individuals have to define the sensitive contexts that they wish to be protected, and the desired level of privacy.
The system models the individuals' various contexts, and transitions between them.
It captures temporal correlations, and models individuals' movement in the space using Markov chains while taking into account historical observations.
After the initialization, MaskIt filters the stream of an individual's contexts by checking, for each context, whether it is safe to release it or whether it is necessary to suppress it.
The authors define \emph{$\delta$-privacy} as the privacy model of MaskIt.
More specifically, a system preserves $\delta$-privacy if the difference between the adversary's posterior and prior knowledge, after observing an output at any possible timestamp, is bounded by $\delta$.
After filtering all the elements of an input stream, MaskIt releases an output sequence for a single day.
The system can repeat the process to publish longer context streams.
The expected number of released contexts quantifies the utility of the system.
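Paraphrasing the above definition in symbols, a system is $\delta$-private if, roughly, for every sensitive context $s$, timestamp $t$, and possible output sequence $o$,
\[
\Pr[\text{the individual is in } s \text{ at } t \mid o] - \Pr[\text{the individual is in } s \text{ at } t] \leq \delta .
\]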
% PLP: Protecting location privacy against correlation analyze Attack in crowdsensing
% - microdata (context, location)
% - infinite
% - streaming
% - dependence
% - event
% - $\delta$-privacy
% - suppression
% - spatiotemporal (CRF)
% - local
\hypertarget{ma2017plp}{Ma et al.}~\cite{ma2017plp} propose \emph{PLP} (Protecting Location Privacy), a crowdsensing scheme that protects location privacy against adversaries that can extract spatiotemporal correlations from crowdsensing data.
PLP filters an individual's context (location, sensing data) stream while taking into consideration long-range dependencies among locations and reported sensing data, which are modeled by conditional random fields (CRFs).
It suppresses sensing data at all sensitive locations while data at non-sensitive locations are reported with a certain probability defined by observing the corresponding CRF model.
On the one hand, the scheme estimates the privacy of the reported data by the difference $\delta$ between the probability that an individual is at a specific location given the supplementary information and the same probability without the extra information.
On the other hand, it quantifies the utility by measuring the total amount of reported data (more is better).
An estimation algorithm searches for the optimal strategy that maximizes utility while preserving a predefined privacy threshold.
% An adaptive geo-indistinguishability mechanism for continuous LBS queries
% - microdata
% - infinite/finite (not clear)
% - streaming
% - dependence
% - event
% - geo-indistinguishability
% - perturbation (planar Laplace)
% - local
\hypertarget{al2018adaptive}{Al-Dhubhani and Cazalas}~\cite{al2018adaptive} propose an adaptive privacy-preserving technique based on geo-indistinguishability, which adjusts the amount of noise required to obfuscate an individual's location based on its correlation level with the previously published locations.
Before adding noise, an evaluation of the adversary's ability to estimate an individual's position takes place.
This process utilizes a regression algorithm for a certain prediction window that exploits previous location releases.
More concretely, in areas with locations presenting strong correlations, an adversary can predict the current location with low estimation error.
Consequently, it is necessary to add more noise to the locations prior to their release.
Adapting the amount of injected noise depending on the data correlation level might lead to a better performance, in terms of both privacy and utility, in the short term.
However, varying the amount of injected noise at each timestamp, without ensuring the preservation of the features (including correlations) present in the original data, might lead to arbitrary utility loss.
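For reference, the standard planar Laplace mechanism of geo-indistinguishability, on which such adaptive schemes build, can be sketched as follows; the adaptive, correlation-dependent choice of $\varepsilon$ proposed in the paper is not reproduced here.
\begin{verbatim}
# Minimal sketch of the standard planar Laplace mechanism used in
# geo-indistinguishability: sample an angle uniformly and a radius from
# the inverse CDF of the planar Laplace radius, then shift the true
# location. The adaptive, correlation-dependent choice of epsilon
# proposed in the paper is not reproduced here.
import math
import numpy as np
from scipy.special import lambertw

rng = np.random.default_rng(3)

def planar_laplace(x, y, epsilon):
    theta = rng.uniform(0.0, 2.0 * math.pi)
    p = rng.uniform(0.0, 1.0)
    # inverse CDF of the planar Laplace radius
    r = -(1.0 / epsilon) * (lambertw((p - 1.0) / math.e, k=-1).real + 1.0)
    return x + r * math.cos(theta), y + r * math.sin(theta)

print(planar_laplace(48.8566, 2.3522, epsilon=1.0))
\end{verbatim}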
% Preventing velocity-based linkage attacks in location-aware applications
% - microdata (trajectory)
% - infinite
% - streaming
% - dependence (velocity)
% - event
% - temporal and spatial cloaking
% - local and global
\hypertarget{ghinita2009preventing}{Ghinita et al.}~\cite{ghinita2009preventing} tackle attacks to location privacy that arise from the linkage of maximum velocity with cloaked regions when using an LBS.
The authors propose methods that can prevent the disclosure of the exact location coordinates of an individual, and bound the association probability of an individual to a sensitive location-related feature.
The first method is based on temporal cloaking and utilizes deferral and postdating.
Deferral delays the disclosure of a cloaked region that is impossible for an individual to have reached based on the latest region that she published and her known maximum speed.
Postdating reports the nearest previous cloaked region that will allow the LBS to return relevant results with high probability, since the two regions are close.
The second method implements spatial cloaking.
First, it creates cloaked regions by taking into account all of the user-specified sensitive features that are relevant to the current location (filtering of features).
Then, it enlarges the area of the region to satisfy the privacy requirements (cloaking).
Finally, it defers the publishing of the region until it includes the current timestamp (safety enforcement) similar to temporal cloaking.
The system measures the quality of service of both methods in terms of the cloaked region size, time and space error, and failure ratio.
The cloaked region size is important because larger regions may decrease the utility of the information that the LBS might return.
The time and space error is possible due to delayed location reporting and region cloaking.
Failure ratio corresponds to the percentage of dropped queries in cases where it is impossible to satisfy the privacy requirements.
Although both methods experimentally prove to offer adequate quality of service, the privacy requirements and metrics that the authors consider do not offer substantial privacy guarantees for commercial application.
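The deferral idea can be sketched as follows: a newly cloaked region is withheld until the individual could plausibly have reached it from the previously published region at the known maximum speed; the rectangular region representation and the inter-region distance are illustrative choices.
\begin{verbatim}
# Minimal sketch of temporal cloaking by deferral: a new cloaked region is
# withheld until the individual could plausibly have reached it from the
# previously published region at the known maximum speed. Regions are
# axis-aligned rectangles and the inter-region distance is the minimum
# distance between them; both choices are illustrative.
import math

def min_distance(region_a, region_b):
    """Minimum distance between two rectangles ((x1, y1), (x2, y2))."""
    (ax1, ay1), (ax2, ay2) = region_a
    (bx1, by1), (bx2, by2) = region_b
    dx = max(bx1 - ax2, ax1 - bx2, 0.0)
    dy = max(by1 - ay2, ay1 - by2, 0.0)
    return math.hypot(dx, dy)

def earliest_release_time(prev_region, prev_time, new_region, max_speed):
    travel_time = min_distance(prev_region, new_region) / max_speed
    return prev_time + travel_time

prev = ((0.0, 0.0), (1.0, 1.0))
new = ((5.0, 0.0), (6.0, 1.0))
print(earliest_release_time(prev, 100.0, new, max_speed=0.5))
\end{verbatim}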
% A Trajectory Privacy-Preserving Algorithm Based on Road Networks in Continuous Location-based Services
% - microdata (trajectory)
% - infinite
% - streaming
% - linkage
% - event
% - $l$-diversity
% - generalization (cloaking)
% - LBS but global
\hypertarget{ye2017trajectory}{Ye et al.}~\cite{ye2017trajectory} present an $l$-diversity method for producing a cloaked area, based on the local road network, for protecting trajectories.
A trusted entity divides the spatial region of interest based on the density of the road network, using quadtree structures, until every subregion contains at least $l$ road segments.
Then, it creates a database for each subregion by generating all the possible trajectories based on real road network information.
The trusted entity uses this database, when individuals attempt to interact with an LBS by sending their current location, to predict their next locations.
Thereafter, it selects the $l-1$ nearest trajectories to the individual's current location, and constructs a minimum cloaking region.
The resulting cloaking area covers the $l$ nearest trajectories and ensures a minimum area of coverage.
This method addresses the limitations of $k$-anonymity in terms of continuous data publishing of trajectories.
The required calculation of every possible trajectory, for the construction of a trajectory database for every subregion, might require an arbitrarily large amount of computation, depending on the area's features.
Nonetheless, the utilization of quadtrees can limit the overhead of the searching process.
% Quantifying Differential Privacy under Temporal Correlations
% - statistical
% - infinite/finite
% - streaming
% - dependence
% - mainly (w-)event but also user
% - differential privacy
% - perturbation (Laplace)
% - temporal correlations (Markov)
\hypertarget{cao2017quantifying}{Cao et al.}~\cite{cao2017quantifying,cao2018quantifying} propose a method for computing the temporal privacy loss of a differential privacy mechanism in the presence of temporal correlations and background knowledge.
The goal of their technique is to guarantee privacy protection and to bound the privacy loss at every time point under the assumption of independent data releases.
It calculates the temporal privacy loss as the sum of the backward and forward privacy loss minus the default privacy loss $\varepsilon$ of the mechanism (because it is counted twice in the two aforementioned terms).
This calculation is done for each individual that is included in the original data set, and the overall temporal privacy loss is equal to the maximum calculated value at every time point.
The backward/forward privacy loss at any time point depends on the backward/forward privacy loss at the previous/next instance, the backward/forward temporal correlations, and $\varepsilon$.
The authors propose solutions to bound the temporal privacy loss, under the presence of weak to moderate correlations, in both finite and infinite data publishing scenarios.
In the latter case, they try to find a value for $\varepsilon$ for which the backward and forward privacy loss are equal.
In the former, they similarly try to balance the backward and forward privacy loss while they allocate more $\varepsilon$ at the first and last time points, since they have higher impact to the privacy loss of the next and previous ones.
This way they achieve an overall constant temporal privacy loss throughout the time series.
According to the technique's intuition, stronger correlations result in higher privacy loss.
However, the loss is smaller when the dimension of the transition matrix, which is extracted according to the modeling of the correlations (here, a Markov chain), is larger, because larger transition matrices tend to be more uniform, resulting in weaker data dependence.
The authors briefly investigate all of the possible privacy levels; however, the solutions that they propose are suitable only for the event level.
Last but not least, the technique requires the calculation of the temporal privacy loss for every individual within the data set, which might prove computationally inefficient in real-time scenarios.
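In simplified notation, the temporal privacy loss at time $t$ described above can be written as
\[
\alpha_t = \alpha^{B}_{t} + \alpha^{F}_{t} - \varepsilon ,
\]
where $\alpha^{B}_{t}$ and $\alpha^{F}_{t}$ denote the backward and forward privacy loss at time $t$, respectively, and $\varepsilon$ is subtracted because it is included in both terms.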
\hypertarget{naim2019off}{Naim et al.}~\cite{naim2019off, ye2019preserving, ye2020off, ye2021off} proposed the notion of \emph{ON-OFF privacy}, according to which users require privacy protection only at certain timestamps.
They investigate the privacy risk due to the correlation between a user's requests when toggling the privacy protection ON and OFF.
The goal is to minimize the information throughput while always answering users' requests and protecting their requests to online services when privacy is set to ON.
They model the dependence between requests using a Markov chain, which is publicly known, where each state represents an available service.
When privacy is set to ON, the user obfuscates their original query by randomly sending requests to (and receiving answers from) a subset of all the available services.
Although this randomization step makes the original query indistinguishable while making sure that the users always get the information that they need, there is no clear quantification of the privacy guarantee that the scheme offers over time.