From 8902e2b66ff927a52ea068ef78d08b79c2352c13 Mon Sep 17 00:00:00 2001 From: Manos Katsomallos Date: Sat, 17 Jul 2021 16:53:10 +0200 Subject: [PATCH] Sectioning --- acknowledgements.tex | 2 ++ background.tex | 35 ++++++++++++++++------------------- main.tex | 1 + 3 files changed, 19 insertions(+), 19 deletions(-) diff --git a/acknowledgements.tex b/acknowledgements.tex index 51d5341..3e991f7 100644 --- a/acknowledgements.tex +++ b/acknowledgements.tex @@ -1,4 +1,6 @@ \chapter{Acknowledgements} +\label{ch:ack} + Upon the completion of my thesis, I would like to express my deep gratitude to my research supervisors for their patient guidance, enthusiastic encouragement and useful critiques of this research work. Besides my advisors, I would like to thank the reporters as well as the rest of the jury for their invaluable contribution. diff --git a/background.tex b/background.tex index 68454c8..8e63759 100644 --- a/background.tex +++ b/background.tex @@ -1,15 +1,12 @@ -\chapter{Background} -\label{ch:bg} +\chapter{Preliminaries} +\label{ch:prel} -\section{Background} -\label{sec:background} - -In this section, we introduce some relevant terminology and background knowledge around the problem of continuous publishing of sensitive data sets. +In this chapter, we introduce some relevant terminology and background knowledge around the problem of continuous publishing of sensitive data sets. First, we categorize data as we view them in the context of continuous data publishing. Second, we define data privacy, we list the kinds of attacks that have been identified in the literature, as well as the desired privacy levels that can be achieved, and the basic privacy operations that are applied to achieve data privacy. Third, we provide a brief overview of the seminal works on privacy-preserving data publishing, used also in continuous data publishing, fundamental in the domain and important for the understanding of the rest of the survey. 
-To accompany and facilitate the descriptions in this section, we provide the following running example. +To accompany and facilitate the descriptions in this chapter, we provide the following running example. \begin{example} \label{ex:snapshot} @@ -54,11 +51,11 @@ To accompany and facilitate the descriptions in this section, we provide the fol \end{example} -\subsection{Data} -\label{subsec:data} +\section{Data} +\label{sec:data} -\subsubsection{Categories} +\subsection{Categories} \label{subsec:data-categories} As this survey is about privacy, the data that we are interested in, contain information about individuals and their actions. @@ -145,7 +142,7 @@ In incremental data, an original data set is augmented in each subsequent timest For example, trajectories can be considered as incremental data, when at each timestamp we consider all the previously visited locations by an individual, incremented by his current position. -\subsubsection{Processing and publishing} +\subsection{Processing and publishing} \label{subsec:data-publishing} We categorize data processing and publishing based on the implemented scheme, as: @@ -221,8 +218,8 @@ We identify two main data processing and publishing modes: Batch data processing and publishing (Figure~\ref{fig:mode-batch}) is performed (usually offline) over both finite and infinite data, while streaming processing and publishing (Figure~\ref{fig:mode-streaming}) is by definition connected to infinite data (usually in real-time). -\subsection{Privacy} -\label{subsec:privacy} +\section{Privacy} +\label{sec:privacy} When personal data are publicly released, either as microdata or statistical data, individuals' privacy can be compromised, i.e.,~an adversary becomes certain about an individual's personal information with a probability higher than a desired threshold. 
In the literature, this compromise is known as \emph{information disclosure} and is usually categorized as~\cite{li2007t, wang2010privacy, narayanan2008robust}: @@ -242,7 +239,7 @@ Identity disclosure appears when we can guess that the sixth record of (a privac Attribute disclosure appears when it is revealed from (a privacy-protected version of) the microdata of Table~\ref{tab:snapshot-micro} that Quackmore is $62$ years old. -\subsubsection{Levels} +\subsection{Levels} \label{subsec:privacy-levels} The information disclosure that a data release may entail is often linked to the protection level that a privacy-preserving algorithm is trying to achieve. @@ -287,7 +284,7 @@ In the extreme cases where $w$ is set to either $1$ or to the size of the entire Although the described levels have been coined in the context of \emph{differential privacy}~\cite{dwork2006calibrating}, a seminal privacy method that we will discuss in more detail in Section~\ref{subsec:privacy-statistical}, it is possible to apply their definitions to other privacy protection techniques as well. -\subsubsection{Attacks} +\subsection{Attacks} \label{subsec:privacy-attacks} Information disclosure is typically achieved by combining supplementary (background) knowledge with the released data or by setting unrealistic assumptions while designing the privacy-preserving algorithms. @@ -354,7 +351,7 @@ By the data dependence attack, the status of Donald could be more certainly infe In order to better protect the privacy of Donald in case of attacks, the data should be privacy-protected in a more adequate way (than without the attacks). -\subsubsection{Operations} +\subsection{Operations} \label{subsec:privacy-operations} Protecting private information, which is known by many names (obfuscation, cloaking, anonymization, etc.), is achieved by using a specific basic privacy protection operation. 
@@ -379,13 +376,13 @@ Our focus is limited to techniques that achieve a satisfying balance between bot For these reasons, there will be no further discussion around this family of techniques in this article. -\subsubsection{Seminal works} +\subsection{Seminal works} \label{subsec:privacy-seminal} For completeness, in this section we present the seminal works for privacy-preserving data publishing, which, even though originally designed for the snapshot publishing scenario, have paved the way, since many of the works in privacy-preserving continuous publishing are based on or extend them. -\paragraph{Microdata} +\subsubsection{Microdata} \label{subsec:privacy-micro} Sweeney coined \emph{$k$-anonymity}~\cite{sweeney2002k}, one of the first established works on data privacy. @@ -408,7 +405,7 @@ These attacks include multiple $k$-anonymous data set releases with the same rec Proposed solutions include rearranging the attributes, setting the whole attribute set of previously released data sets as quasi-identifiers or releasing data based on previous $k$-anonymous releases. -\paragraph{Statistical data} +\subsubsection{Statistical data} \label{subsec:privacy-statistical} While methods based on $k$-anonymity have been mainly employed for releasing microdata, \emph{differential privacy}~\cite{dwork2006calibrating} has been proposed for releasing high utility aggregates over microdata while providing semantic privacy guarantees. diff --git a/main.tex b/main.tex index 8792f43..76b2b70 100644 --- a/main.tex +++ b/main.tex @@ -79,6 +79,7 @@ \input{introduction} \input{background} \input{related} +\input{conclusion} \backmatter