comments katerina 2.2.4

katerinatzo 2021-09-14 17:09:41 +02:00
parent e8abe5b3e1
commit 202d77c750


@ -94,6 +94,7 @@ Moreover, in user-level (Figure~\ref{fig:level-user}) it is hard to determine wh
Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to determine whether Quackmore was ever included in the released series of events between the timestamps $t_1$ and $t_2$, $t_2$ and $t_3$, etc. (i.e.,~for a window $w = 2$).
\kat{Already, by looking at the original counts, for the reader it is hard to see if Quackmore was in the event/database. So, we don't really get the difference among the different levels here.}
\mk{It is without background knowledge.}
\kat{But you discuss event and level here by showing just counts, with no background knowledge, and you want the reader to understand how in one case we are not sure if he participated in the event t1 or in any of the events. It is not clear to me what is the difference, just by looking at the example with the counts. }
\begin{figure}[htp]
\centering
@ -114,7 +115,7 @@ Finally, in $2$-event-level (Figure~\ref{fig:level-w-event}) it is hard to deter
\end{figure}
Contrary to event-level, which provides privacy guarantees for a single event, user- and $w$-event-level offer stronger privacy protection by protecting a series of events.
Event- and $w$-event-level handle better scenarios of infinite data observation, whereas user-level is more appropriate when the span of data observation is finite.
Event- and $w$-event-level better fit scenarios of infinite data observation, whereas user-level is more appropriate when the span of data observation is finite.
$w$-event-level protection is narrower than user-level protection, because its sliding window covers only $w$ consecutive timestamps at a time rather than the entire series.
In the extreme cases where $w$ is equal either to $1$ or to the length of the time series, $w$-event- matches event- or user-level protection, respectively.
Although the described levels have been coined in the context of \emph{differential privacy}~\cite{dwork2006calibrating}, a seminal privacy method that we will discuss in more detail in Section~\ref{subsec:prv-statistical}, they are used for other privacy protection techniques as well.
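As a minimal sketch of the relation between the three levels, in notation that we introduce here only for illustration (the symbols $T$ and $W_i$ are assumptions, not part of the original definitions), consider a finite time series with timestamps $t_1, \dots, t_T$.
$w$-event-level protection then covers every window of $w$ consecutive timestamps:
\[
  W_i = \{t_i, t_{i+1}, \dots, t_{i+w-1}\}, \qquad 1 \leq i \leq T - w + 1 .
\]
For $w = 2$, the windows are $\{t_1, t_2\}$, $\{t_2, t_3\}$, etc., as in the $2$-event-level example above.
For $w = 1$, every window $W_i = \{t_i\}$ holds a single event, which corresponds to event-level protection; for $w = T$, the only window is $W_1 = \{t_1, \dots, t_T\}$, i.e.,~the whole series, which corresponds to user-level protection.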
@ -123,17 +124,18 @@ Although the described levels have been coined in the context of \emph{different
\subsection{Privacy-preserving operations}
\label{subsec:prv-operations}
Protecting private information
%Protecting private information
% , which is known by many names (obfuscation, cloaking, anonymization, etc.),
% \kat{the techniques are not equivalent, so it is correct to say that they are different names for the same thing}
is achieved by using a specific basic
%is achieved by using a specific basic
% \kat{but later you mention several ones.. so what is the specific basic one ?}
privacy protection operation.
Depending on the
technique
%privacy protection operation.
%Depending on the
%technique
% intervention
% \kat{?, technique, algorithm, method, operation, intervention.. we are a little lost with the terminology and the difference among all these }
that we choose to perform on the original data, we identify the following operations:
%that we choose to perform on the original data,
We identify the following privacy operations that can be applied to the original data to achieve privacy preservation:
% \kat{you can mention that the different operations have different granularity}
% \mk{``granularity''?}
@ -153,11 +155,11 @@ that we choose to perform on the original data, we identify the following operat
\end{itemize}
For example, consider the table schema \emph{User(Name, Age, Location, Status)}.
If we want to protect the \emph{Age} of the user by aggregation, we may replace it by the average age in her Location; by generalization, we may replace the Age by age intervals; by suppression we may delete the entire table column corresponding to \emph{Age}; by perturbation, we may augment each age by a predefined percentage of the age; by randomization we may randomly replace each age by a value taken from the probability density function of the attribute.
If we want to protect the \emph{Age} of the user by aggregation, we may replace it with the average age in her \emph{Location}\kat{This example does not follow the description you give before for aggregation. Indeed, it fits better the perturbation (you replaced the value with the average age of the same location, which is a deterministic process). Don't you mean counts by aggregation? If you mean aggregation as in sql functions then you should not say in the definition that you replace the rows with the aggregate, but a specific attribute's value. }; by generalization, we may replace the \emph{Age} with age intervals; by suppression, we may delete the entire table column corresponding to \emph{Age}; by perturbation, we may augment each age by a predefined percentage of the age; by randomization, we may randomly replace each age with a value drawn from the probability density function of the attribute.
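The five operations of this example can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the rows, the function names, the interval width, and the percentage are hypothetical, the aggregation follows the example as worded above (average age per \emph{Location}), and the randomization samples from the empirical distribution of the observed ages as a stand-in for the probability density function of the attribute.

```python
import random
from statistics import mean

# Hypothetical rows of User(Name, Age, Location, Status); all values are invented.
USERS = [
    {"Name": "A", "Age": 27, "Location": "Paris", "Status": "student"},
    {"Name": "B", "Age": 33, "Location": "Paris", "Status": "employed"},
    {"Name": "C", "Age": 45, "Location": "Athens", "Status": "employed"},
]


def aggregate(rows):
    """Replace each Age with the average Age of the user's Location."""
    by_location = {}
    for row in rows:
        by_location.setdefault(row["Location"], []).append(row["Age"])
    return [{**row, "Age": mean(by_location[row["Location"]])} for row in rows]


def generalize(rows, width=10):
    """Replace each Age with the half-open interval [low, low + width) containing it."""
    out = []
    for row in rows:
        low = (row["Age"] // width) * width
        out.append({**row, "Age": f"[{low}, {low + width})"})
    return out


def suppress(rows):
    """Delete the Age column entirely."""
    return [{k: v for k, v in row.items() if k != "Age"} for row in rows]


def perturb(rows, pct=0.10):
    """Augment each Age by a predefined percentage of itself."""
    return [{**row, "Age": row["Age"] * (1 + pct)} for row in rows]


def randomize(rows, rng):
    """Replace each Age with a draw from the empirical distribution of Age
    (an assumption standing in for the attribute's density function)."""
    ages = [row["Age"] for row in rows]
    return [{**row, "Age": rng.choice(ages)} for row in rows]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    for name, result in [
        ("aggregation", aggregate(USERS)),
        ("generalization", generalize(USERS)),
        ("suppression", suppress(USERS)),
        ("perturbation", perturb(USERS)),
        ("randomization", randomize(USERS, rng)),
    ]:
        print(name, result)
```

Every function returns new rows and leaves the input untouched, so the protected outputs can be compared side by side with the original table.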
It is worth mentioning that there is a series of algorithms (e.g.,~\cite{benaloh2009patient, kamara2010cryptographic, cao2014privacy}) based on the \emph{cryptography} operation.
However, the majority of these methods, among other assumptions that they make, place minimal or even no trust in the entities that handle the personal information.
Furthermore, the amount and the way of data processing of these techniques usually burden the overall procedure, deteriorate the utility of the resulting data sets to a point where they are completely useless, and restrict their applicability.
Furthermore, the amount and the manner of data processing that these techniques require usually burden the overall procedure, deteriorate the utility of the resulting data sets to the point where they are completely useless, and thus restrict their usage by third parties.
% \kat{All these points apply also to the non-cryptography techniques. So you should mostly point out that they do not only deteriorate the utility but make them non-usable at all.}
Our focus is limited to techniques that achieve a satisfactory balance between participants' privacy and data utility.
% For these reasons, there will be no further discussion around this family of techniques in this article.