\section{Significant events}
\label{sec:thething}
The privacy mechanisms already proposed in the literature for the user, $w$-event, and event levels assume that any single event, any sequence of events, or the entire series of events in a time series is equally privacy-significant for the users.
In reality, this is
% a simplistic
% \kat{I would not say simplistic, but unrealistic assumption that deteriorates unnecessarily the quality of the perturbed data}
an assumption that unnecessarily deteriorates the utility of the released data.
The significance of an event is related to certain user-defined privacy criteria, or to its adjacent events, as well as to the entire time series.
We term significant events \emph{{\thething} events} or simply \emph{\thethings}, following relevant literature~\cite{gaskell2000telescoping}.
% \kat{can you find some other work that uses the same term? otherwise one can raise the question why not ot use the word significant }
% \mk{OK, but then again `significant privacy doesn't' sound great}
Identifying {\thethings} in time series can be done in an automatic or manual way.
For example, in spatiotemporal data, \emph{places where an individual spent some time} denote \emph{points of interest} (POIs) (also called stay points)~\cite{zheng2015trajectory}.
Such events, and more particularly their spatial attribute values, can be less privacy-sensitive~\cite{primault2018long}, e.g.,~parks, theaters, etc., or, if individuals frequent them, they can reveal supplementary information, e.g.,~residences (home addresses)~\cite{gambs2010show}, places of worship (religious beliefs)~\cite{franceschi-bicchierairussell2015redditor}, etc.
POIs can be an example of how we can choose {\thethings}, but the idea is not limited to these.
Another example is the detection of privacy-sensitive user interactions by \emph{contact tracing} applications.
This can be practical in disease control~\cite{eames2003contact}, as in the recent outbreak of the Coronavirus disease 2019 (COVID-19) epidemic~\cite{ahmed2020survey}.
Last but not least, {\thethings} in \emph{smart grid} electricity usage patterns may not only reveal the energy consumption of a user but also information regarding activities, e.g.,~`at work', `sleeping', etc., or types of appliances already installed or recently purchased~\cite{khurana2010smart}.
We stress that {\thething} identification is a problem orthogonal to ours, and that we consider {\thethings} as given input to our problem.
We argue that protecting only the {\thething} events, along with any regular event that is released, is sufficient for protecting user privacy, while it improves data utility with respect to conventional user-level privacy.
Considering {\thethings} can prevent over-perturbing the data, to the benefit of their final utility.
Revisiting the scenario in Figure~\ref{fig:st-cont}, if we want to protect the {\thething} points, we have to allocate a budget of at most $\varepsilon$ to the {\thethings}, while saving some for the release of regular events.
Essentially, the more budget we allocate to an event the less we protect it, but at the same time the more we maintain its utility.
With {\thething} privacy we propose to distribute the budget by accounting only for the {\thethings} when we release an event of the time series, i.e.,~allocating $\frac{\varepsilon}{5}$ ($4$ {\thethings} $+ 1$ regular point) to each event (see Figure~\ref{fig:st-cont}).
This way, we still guarantee
% \footnote{$\varepsilon$-differential privacy guarantees that the allocated budget should be less or equal to $\varepsilon$, and not precisely how much.
% \kat{Mano check.}
% \mk{It's not clear what you want to say}
% }
that the {\thethings} are adequately protected, as they receive a total budget of $\frac{4\varepsilon}{5} < \varepsilon$.
At the same time, we avoid over-perturbing the regular events, as we allocate to them a higher total budget ($\frac{4\varepsilon}{5}$) than in user-level privacy ($\frac{\varepsilon}{2}$), and thus add less noise.
Hence, at any timestamp we achieve an overall privacy protection bounded by $\varepsilon$ in the event set consisting of the released event and the {\thethings}.
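The budget accounting above can be written out explicitly.
The following is a sketch under the uniform allocation used in this scenario; the symbols $\varepsilon_i$ (budget of the event at timestamp $i$), $L$ (the set of {\thething} timestamps, with $|L| = 4$), $R$ (the set of regular timestamps, with $|R| = 4$), and $t$ (the timestamp of a released regular event) are introduced here for illustration only.
\[
  \varepsilon_i = \frac{\varepsilon}{|L| + 1} = \frac{\varepsilon}{5},
  \qquad
  \sum_{i \in L} \varepsilon_i = \frac{4\varepsilon}{5} < \varepsilon,
  \qquad
  \varepsilon_t + \sum_{i \in L} \varepsilon_i = \frac{\varepsilon}{5} + \frac{4\varepsilon}{5} = \varepsilon
  \quad \mbox{for any } t \in R .
\]
For the regular events, the totals compared above are
\[
  \sum_{i \in R} \frac{\varepsilon}{5} = \frac{4\varepsilon}{5}
  \qquad \mbox{versus} \qquad
  \sum_{i \in R} \frac{\varepsilon}{8} = \frac{4\varepsilon}{8} = \frac{\varepsilon}{2}
\]
for {\thething} and user-level privacy, respectively.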
\begin{example}
\label{ex:st-cont}
Continuing Example~\ref{ex:scenario}, Quackmore cares about protecting his {\thethings} ($p_1$, $p_3$, $p_5$, $p_8$) along with every release that he makes; however, he is not equally interested in the other regular events in his trajectory.
More technically, he cares about allocating a total budget of $\varepsilon$ on any set of timestamps containing the {\thethings} and one regular event.
Event-level protection is not suitable for this case, since it can only protect one event at a time.
So, let us assume that we apply user-level privacy\footnote{In this scenario, in order to protect all the {\thethings} from timestamp $1$ to $8$, $w$ must be set to $8$, which makes $w$-event privacy equivalent to user-level.}, by distributing equal portions of $\varepsilon$ to all the events, i.e.,~$\frac{\varepsilon}{8}$ to each one (see Figure~\ref{fig:st-cont}).
Indeed, we have protected the {\thething} points plus one regular event at any release as expected; we have allocated a total of $\frac{5\varepsilon}{8}<\varepsilon$ to these $5$ events.
\begin{figure}[htp]
\centering
\includegraphics[width=.75\linewidth]{problem/st-cont}
\caption{User-level and {\thething} $\varepsilon$-differential privacy protection for the time series of Figure~\ref{fig:lmdk-scenario}.}
\label{fig:st-cont}
\end{figure}
However, perturbing each one of the regular points with a budget of $\frac{\varepsilon}{8}$ deteriorates the data utility unnecessarily; since the $4$ {\thethings} consume $\frac{4\varepsilon}{8}$ in total, any budget lower than or equal to the remaining $\frac{4\varepsilon}{8}$ would be sufficient for covering the user privacy requirements.
On the other hand, our proposed privacy model, {\thething} privacy, directly considers only the $5$ events of interest ($4$ {\thethings} $+ 1$ current event) in every release, thus changing the scope from the entire time series to a significant subset of events.
Subsequently, it allocates $\frac{\varepsilon}{5}$ to each one of these events.
Consequently, we still protect all the significant events, while the utility of a perturbed event is higher than in the case of user-level privacy ($\frac{\varepsilon}{5}>\frac{\varepsilon}{8}$).
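To make the difference in utility concrete, assume, purely for illustration, that each event is released with the Laplace mechanism, whose noise scale for a query of sensitivity $\Delta f$ and a budget $\varepsilon_i$ is $b = \Delta f / \varepsilon_i$.
Neither the choice of mechanism nor the symbols $\Delta f$, $b_8$, and $b_5$ (the noise scales under the budgets $\frac{\varepsilon}{8}$ and $\frac{\varepsilon}{5}$) are fixed by this example.
Under this assumption,
\[
  b_8 = \frac{\Delta f}{\varepsilon / 8} = \frac{8 \Delta f}{\varepsilon},
  \qquad
  b_5 = \frac{\Delta f}{\varepsilon / 5} = \frac{5 \Delta f}{\varepsilon},
  \qquad
  \frac{b_5}{b_8} = \frac{5}{8},
\]
that is, the noise scale of every released event would be $37.5\%$ smaller than with user-level privacy.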
\end{example}
\input{problem/thething/contribution}
\input{problem/thething/problem}
\input{problem/thething/solution}