\section{Significant events}
\label{sec:thething}

The privacy mechanisms already proposed in the literature for the user, $w$-event, and event levels assume that, in a time series, any single event, any sequence of events, or the entire series of events is equally privacy-significant for the users.
In reality, this is
% a simplistic
% \kat{I would not say simplistic, but unrealistic assumption that deteriorates unnecessarily the quality of the perturbed data}
an assumption that unnecessarily deteriorates the utility of the released data.
The significance of an event is related to certain user-defined privacy criteria, to its adjacent events, or to the entire time series.
We term significant events \emph{{\thething} events}, or simply \emph{\thethings}, following relevant literature~\cite{gaskell2000telescoping}.
% \kat{can you find some other work that uses the same term? otherwise one can raise the question why not ot use the word significant }
% \mk{OK, but then again `significant privacy doesn't' sound great}

Identifying {\thethings} in time series can be done automatically or manually.
For example, in spatiotemporal data, \emph{places where an individual spent some time} denote \emph{points of interest} (POIs), also called stay points~\cite{zheng2015trajectory}.
Such events, and more particularly their spatial attribute values, can be less privacy-sensitive~\cite{primault2018long}, e.g.,~parks or theaters, or, if individuals frequent them, they can reveal supplementary information, e.g.,~residences (home addresses)~\cite{gambs2010show} or places of worship (religious beliefs)~\cite{franceschi-bicchierairussell2015redditor}.
POIs are one example of how {\thethings} can be chosen, but the idea is not limited to them.
Another example is the detection of privacy-sensitive user interactions by \emph{contact tracing} applications.
This can be practical in disease control~\cite{eames2003contact}, as in the recent outbreak of the Coronavirus disease 2019 (COVID-19) epidemic~\cite{ahmed2020survey}.
Last but not least, {\thethings} in \emph{smart grid} electricity usage patterns may not only reveal the energy consumption of a user but also information regarding activities, e.g.,~`at work', `sleeping', etc., or types of appliances already installed or recently purchased~\cite{khurana2010smart}.
We stress that {\thething} identification is a problem orthogonal to ours, and that we consider the {\thethings} as given input to our problem.

We argue that protecting only the {\thething} events, along with any regular event that is released, is sufficient for the user's privacy protection, while it improves data utility with respect to conventional user-level privacy.
Considering {\thethings} can prevent over-perturbing the data, to the benefit of their final utility.
Revisiting the scenario in Figure~\ref{fig:st-cont}, if we want to protect the {\thething} points, we have to allocate a budget of at most $\varepsilon$ to the {\thethings}, while saving some of it for the release of regular events.
Essentially, the more budget we allocate to an event, the less we protect it, but at the same time the more we maintain its utility.
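As a rough illustration of this trade-off, suppose that an event is perturbed with the Laplace mechanism for a query of sensitivity $\Delta$ and budget $\varepsilon_i$ (the mechanism and the symbols $\Delta$, $\varepsilon_i$, $b_i$ are our assumptions; the discussion here does not fix a mechanism).
The scale $b_i$ and the variance of the injected noise would then be
\[
  b_i = \frac{\Delta}{\varepsilon_i}, \qquad 2 b_i^2 = \frac{2\Delta^2}{\varepsilon_i^2},
\]
so halving the budget of an event doubles the noise scale and quadruples the noise variance.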
With {\thething} privacy, we propose to distribute the budget by accounting only for the {\thethings} when we release an event of the time series, i.e.,~allocating $\frac{\varepsilon}{5}$ ($4$ {\thethings} $+ 1$ regular point) to each event (see Figure~\ref{fig:st-cont}).
This way, we still guarantee
% \footnote{$\varepsilon$-differential privacy guarantees that the allocated budget should be less or equal to $\varepsilon$, and not precisely how much.
% \kat{Mano check.}
% \mk{It's not clear what you want to say}
% }
that the {\thethings} are adequately protected, as they receive a total budget of $\frac{4\varepsilon}{5} < \varepsilon$.
At the same time, we avoid over-perturbing the regular events, as we allocate to them a higher total budget ($\frac{4\varepsilon}{5}$) than under user-level privacy ($\frac{\varepsilon}{2}$), and thus add less noise.
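Concretely, for a release at a regular timestamp of the running scenario, and assuming that the individual budgets compose sequentially, the budget spent on the four {\thethings} and on the released event adds up to
\[
  4 \cdot \frac{\varepsilon}{5} + \frac{\varepsilon}{5} = \varepsilon ,
\]
where the first term corresponds to the {\thethings} and the second to the released event.
When the released event is itself a {\thething}, only the first term remains and the sum is $\frac{4\varepsilon}{5}$.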
Hence, at any timestamp we achieve an overall privacy protection bounded by $\varepsilon$ in the event set consisting of the released event and the {\thethings}.

\begin{example}
\label{ex:st-cont}
Continuing Example~\ref{ex:scenario}, Quackmore cares about protecting his {\thethings} ($p_1$, $p_3$, $p_5$, $p_8$) along with every release that he makes; however, he is not equally concerned about the other regular events in his trajectory.
More technically, he cares about allocating a total budget of at most $\varepsilon$ to any set of timestamps containing the {\thethings} and one regular event.
Event-level protection is not suitable for this case, since it can only protect one event at a time.
So, let us assume that we apply user-level privacy\footnote{In this scenario, in order to protect all the {\thethings} from timestamp $1$ to $8$, $w$ must be set to $8$, which makes $w$-event privacy equivalent to user-level privacy.} by distributing equal portions of $\varepsilon$ to all the events, i.e.,~$\frac{\varepsilon}{8}$ to each one (see Figure~\ref{fig:st-cont}).
Indeed, this protects the {\thething} points plus one regular event at any release, as expected: a total of $\frac{5\varepsilon}{8} < \varepsilon$ is allocated to these $5$ events.

\begin{figure}[htp]
  \centering
  \includegraphics[width=.75\linewidth]{problem/st-cont}
  \caption{User-level and {\thething} $\varepsilon$-differential privacy protection for the time series of Figure~\ref{fig:lmdk-scenario}.}
  \label{fig:st-cont}
\end{figure}

However, perturbing each one of the regular points with a budget of only $\frac{\varepsilon}{8}$ deteriorates the data utility unnecessarily; any budget lower than or equal to $\frac{4\varepsilon}{8}$ would be sufficient for covering the user's privacy requirements.
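One way to read the bound of $\frac{4\varepsilon}{8}$ is as the slack that the four {\thethings} leave under user-level privacy, assuming that the individual budgets compose sequentially:
\[
  \varepsilon - 4 \cdot \frac{\varepsilon}{8} = \frac{4\varepsilon}{8} = \frac{\varepsilon}{2} ,
\]
i.e.,~a single regular event could receive up to $\frac{\varepsilon}{2}$, instead of $\frac{\varepsilon}{8}$, without the set of the four {\thethings} and that event exceeding $\varepsilon$.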
On the other hand, our proposed privacy model, {\thething} privacy, directly considers only the $5$ events of interest ($4$ {\thethings} $+ 1$ current event) in every release, thus changing the scope from the entire time series to a significant subset of events.
Subsequently, it allocates $\frac{\varepsilon}{5}$ to each one of these events.
Consequently, we still manage to protect all the significant events, while each perturbed event receives a larger budget, and hence retains higher utility, than under user-level privacy ($\frac{\varepsilon}{5}>\frac{\varepsilon}{8}$).
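To give a rough sense of this gain, suppose that every event is perturbed with the Laplace mechanism for a query of sensitivity $\Delta$ (an assumption made only for this illustration; the symbols $\Delta$, $b_{\mathrm{user}}$, and $b_{\mathrm{sig}}$ are ours).
The noise scales under user-level and {\thething} privacy would then be
\[
  b_{\mathrm{user}} = \frac{\Delta}{\varepsilon / 8} = \frac{8\Delta}{\varepsilon}, \qquad
  b_{\mathrm{sig}} = \frac{\Delta}{\varepsilon / 5} = \frac{5\Delta}{\varepsilon}, \qquad
  \frac{b_{\mathrm{user}}}{b_{\mathrm{sig}}} = \frac{8}{5} = 1.6 ,
\]
which corresponds to a noise variance that is $\left(\frac{8}{5}\right)^2 = 2.56$ times larger under user-level privacy.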
\end{example}

\input{problem/thething/contribution}
\input{problem/thething/problem}
\input{problem/thething/solution}