privacy: Addressed \kat{} in microdata

Manos Katsomallos 2021-09-03 13:56:22 +03:00
parent a78c03127d
commit f3fe1149a5


@ -131,7 +131,12 @@ For completeness, in this section we present the seminal works for privacy-prese
Sweeney coined \emph{$k$-anonymity}~\cite{sweeney2002k}, one of the first established privacy models.
A released data set satisfies $k$-anonymity when, for a set of identifying attributes called the \emph{quasi-identifiers}, every combination of quasi-identifier values that appears in the data set is shared by at least $k$ records.
Identifying the quasi-identifiers among a set of attributes remains a hard problem in its own right~\cite{motwani2007efficient}.
Being a \emph{syntactic} privacy notion, $k$-anonymity constrains only the form of the released data set, not what an adversary can infer from it.
% $k$-anonymity
% is syntactic,
% \kat{meaning?}
% it
% constitutes an individual indistinguishable from at least $k-1$ other individuals in the same data set.
% \kat{you just said this in another way,two sentences before}
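As an illustration, the following Python sketch checks the $k$-anonymity condition on a toy table by counting how many records share each combination of quasi-identifier values. The function \texttt{is\_k\_anonymous}, the attribute names, and the records are hypothetical, and the sketch assumes that the quasi-identifiers are already known and their values already generalized.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurring in `records` is shared by at least `k` records."""
    # Group records by their tuple of quasi-identifier values and count them.
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical, already generalized records (masked ZIP code, age range).
table = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "hepatitis"},
]

print(is_k_anonymous(table, ["zip", "age"], 2))  # True
print(is_k_anonymous(table, ["zip", "age"], 3))  # False
```

Each equivalence class in the toy table has exactly two records, so the table is $2$-anonymous but not $3$-anonymous.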
In a follow-up work~\cite{sweeney2002achieving}, the author describes how to achieve $k$-anonymity by suppressing or generalizing certain values of the quasi-identifiers.
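The two operations can be sketched on hypothetical records as follows: the code generalizes ZIP codes by masking trailing digits and ages into fixed-width ranges, and then suppresses the records whose equivalence class still has fewer than $k$ members. The helper names and the hand-picked generalization levels are assumptions made for illustration; this is a simplified sketch, not the algorithm of~\cite{sweeney2002achieving}.

```python
from collections import Counter


def generalize_zip(zip_code: str, masked_digits: int) -> str:
    """Replace the last `masked_digits` digits of a ZIP code with '*'."""
    if masked_digits == 0:
        return zip_code
    return zip_code[:-masked_digits] + "*" * masked_digits


def generalize_age(age: int, width: int) -> str:
    """Map an exact age to the inclusive range of `width` years containing it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records, k, masked_digits, age_width):
    """Generalize the quasi-identifiers (zip, age), then suppress small classes."""
    generalized = [
        {**r,
         "zip": generalize_zip(r["zip"], masked_digits),
         "age": generalize_age(r["age"], age_width)}
        for r in records
    ]

    def qi(r):
        return (r["zip"], r["age"])

    counts = Counter(qi(r) for r in generalized)
    # Suppression: drop every record whose equivalence class has fewer than k members.
    return [r for r in generalized if counts[qi(r)] >= k]


# Hypothetical raw records; the last one is an outlier.
raw = [
    {"zip": "13053", "age": 28, "disease": "flu"},
    {"zip": "13068", "age": 29, "disease": "cancer"},
    {"zip": "14853", "age": 47, "disease": "flu"},
    {"zip": "14850", "age": 49, "disease": "hepatitis"},
    {"zip": "14850", "age": 31, "disease": "flu"},
]

released = anonymize(raw, k=2, masked_digits=2, age_width=10)
for r in released:
    print(r)
```

With these levels the outlier falls alone into the class \texttt{("148**", "30-39")} and is suppressed, while the remaining four records form two classes of size two. Coarser generalization would have kept the outlier at the cost of less precise data, which is the trade-off between the two operations.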
Several works identified and addressed privacy concerns about $k$-anonymity. Machanavajjhala et al.~\cite{machanavajjhala2006diversity} pointed out that $k$-anonymity is vulnerable to homogeneity and background knowledge attacks.
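The homogeneity attack can be illustrated with a toy release that is $2$-anonymous yet discloses a sensitive value, because all records of one equivalence class share it. The data and the helper \texttt{disclosed\_values} are hypothetical; requiring several distinct sensitive values per class, in the spirit of $\ell$-diversity, would flag this release.

```python
from collections import defaultdict


def disclosed_values(records, quasi_identifiers, sensitive):
    """Map each equivalence class whose records all share a single sensitive
    value to that value (a homogeneity disclosure)."""
    classes = defaultdict(set)
    for r in records:
        classes[tuple(r[a] for a in quasi_identifiers)].add(r[sensitive])
    return {qi: next(iter(vals)) for qi, vals in classes.items() if len(vals) == 1}


# Hypothetical release that is 2-anonymous with respect to (zip, age).
release = [
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "148**", "age": "40-49", "disease": "flu"},
    {"zip": "148**", "age": "40-49", "disease": "hepatitis"},
]

leaks = disclosed_values(release, ["zip", "age"], "disease")
# An adversary who knows that a target lives in ZIP 13053 and is 28 years old
# places the target in class ("130**", "20-29") and learns the diagnosis.
print(leaks)  # {('130**', '20-29'): 'cancer'}
```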
@ -146,7 +151,8 @@ A data set features $\theta$-closeness when all of its groups satisfy $\theta$-
The main drawback of $k$-anonymity (and its derivatives) is that it is not robust against external re-identification attacks on the released data set.
The problems identified in~\cite{sweeney2002k} arise when $k$-anonymity is applied to continuous data publishing (as we will see in Section~\ref{sec:micro}).
These attacks exploit multiple $k$-anonymous releases of a data set with the same record order, subsequent releases that do not take previous $k$-anonymous releases into account, and tuple updates.
Proposed solutions include reordering the records, treating the whole attribute set of previously released data sets as quasi-identifiers, and releasing data based on previous $k$-anonymous releases~\cite{simi2017extensive}.
% \kat{and the citations of these solutions?}
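The attack on multiple releases that share the same record order can be sketched with two hypothetical releases of the same four records: each release is $2$-anonymous on its own, but pairing rows by position yields a table in which every record is unique. The data, the helper \texttt{min\_class\_size}, and the shuffling countermeasure at the end are illustrative assumptions rather than code from the cited works; shuffling removes only the positional link and does not address the other attacks on continuous publishing.

```python
import random
from collections import Counter


def min_class_size(rows):
    """Size of the smallest equivalence class among quasi-identifier tuples."""
    return min(Counter(rows).values())


# Hypothetical original quasi-identifiers: (ZIP code, age).
original = [("13053", 25), ("13068", 27), ("13053", 34), ("13068", 36)]

# Release A generalizes the ZIP code and keeps the age decade.
release_a = [(z[:3] + "**", f"{a // 10 * 10}s") for z, a in original]
# Release B keeps the ZIP code and coarsens the age.
release_b = [(z, "20-39") for z, _ in original]

print(min_class_size(release_a), min_class_size(release_b))  # 2 2

# Both releases keep the original record order, so row i of A and row i of B
# describe the same individual; combining them undoes both generalizations.
linked = [(b[0], a[1]) for a, b in zip(release_a, release_b)]
print(min_class_size(linked))  # 1

# Countermeasure: shuffle each release independently before publishing.
# Shuffling leaves each release's equivalence classes intact but removes
# the positional correspondence that the attack relies on.
rng = random.Random()
shuffled_a = release_a[:]
rng.shuffle(shuffled_a)
shuffled_b = release_b[:]
rng.shuffle(shuffled_b)
```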
\subsubsection{Statistical data}