\section{Significant events}
\label{sec:thething}
The privacy mechanisms already proposed in the literature for the user, w-event, and event levels assume that, in a time series, any single event, any sequence of events, or the entire series of events is equally privacy-significant for the users.
In reality, this is a simplistic\kat{I would not say simplistic, but unrealistic assumption that deteriorates unnecessarily the quality of the perturbed data} assumption.
Whether an event is significant can depend on certain user-defined privacy criteria, on its adjacent events, or on the entire time series.
We term significant events as \emph{{\thething} events} or simply \emph{\thethings}, following relevant literature\kat{can you find some other work that uses the same term? otherwise one can raise the question why not ot use the word significant }.
Identifying {\thethings} in time series can be done automatically or manually.
For example, in spatiotemporal data, \emph{places where an individual spent some time} denote \emph{points of interest} (POIs) (also called stay points)~\cite{zheng2015trajectory}.
Such events, and more particularly their spatial attribute values, can be less privacy-sensitive~\cite{primault2018long}, e.g.,~parks, theaters, etc. or, if individuals frequent them, they can reveal supplementary information, e.g.,~residences (home addresses)~\cite{gambs2010show}, places of worship (religious beliefs)~\cite{franceschi-bicchierairussell2015redditor}, etc.
POIs are one example of how {\thethings} can be chosen, but the idea is not limited to them.
Another example is the detection of privacy-sensitive user interactions by \emph{contact tracing} applications.
This can be practical in disease control~\cite{eames2003contact}, as in the recent outbreak of the Coronavirus disease 2019 (COVID-19) epidemic~\cite{ahmed2020survey}.
Last but not least, {\thethings} in \emph{smart grid} electricity usage patterns could not only reveal the energy consumption of a user but also information regarding activities, e.g.,~`at work', `sleeping', etc. and types of appliances already installed or recently purchased~\cite{khurana2010smart}.
We stress that {\thething} identification is a problem orthogonal to ours, and that we consider {\thethings} as given input to our problem.
\begin{example}
\label{ex:st-cont}
Figure~\ref{fig:st-cont} shows the case when we want to protect all of Bob's significant events ($p_1$, $p_3$, $p_5$, $p_8$) in his trajectory shown in Figure~\ref{fig:scenario}.
% That is, we have to allocate privacy budget $\varepsilon$ such that at any timestamp $t$ it holds that $\varepsilon_t + \varepsilon_1 + \varepsilon_3 + \varepsilon_5 + \varepsilon_8 \leq \varepsilon$.
In this scenario, event-level protection is not suitable since it can only protect one event at a time.
Hence, we have to apply user-level privacy protection by distributing equal portions of $\varepsilon$ to all the events, i.e.,~$\frac{\varepsilon}{8}$ to each one (the equivalent of applying $8$-event privacy).
In this way, we have protected the {\thething} points; we have allocated a total of $\frac{\varepsilon}{2}<\varepsilon$ to the {\thethings}.
\begin{figure}[htp]
\centering
\includegraphics[width=\linewidth]{problem/st-cont}
\caption{User-level and {\thething} $\varepsilon$-differential privacy protection for the time series of Figure~\ref{fig:scenario}.}
\label{fig:st-cont}
\end{figure}
However, perturbing each regular point with a budget of $\frac{\varepsilon}{8}$ deteriorates the data utility unnecessarily.
Notice that the overall privacy budget that we ended up allocating to the user-defined significant events is equal to $\frac{\varepsilon}{2}$, which leaves an equal amount of budget to distribute to any current event.
In other words, uniformly allocating $\frac{\varepsilon}{5}$ to every event would still achieve Bob's privacy goal, i.e.,~protect every significant event, while achieving better utility overall.
\end{example}
We argue that protecting only {\thething} events along with any regular event release is sufficient for the user's protection, while also improving data utility.
Considering {\thething} events can prevent over-perturbing the data, to the benefit of their final quality.
Take for example the scenario in Figure~\ref{fig:st-cont}, where {\thethings} are highlighted in gray.
If we want to protect the {\thething} points, we have to allocate at most a budget of $\varepsilon$ to the {\thethings}, while saving some for the release of regular events.
Essentially, the more budget we allocate to an event, the less we protect it, but the better we preserve its utility.
With {\thething} privacy we propose to distribute the budget taking into account only the existence of the {\thethings} when we release an event of the time series, i.e.,~allocating $\frac{\varepsilon}{5}$ ($4\ \text{\thethings} + 1\ \text{regular point}$) to each event (see Figure~\ref{fig:st-cont}).
This way, we still guarantee that the {\thethings} are adequately protected, as they receive a total budget of $\frac{4\varepsilon}{5}<\varepsilon$.
At the same time, we avoid over-perturbing the regular events, as we allocate to them a higher total budget ($\frac{4\varepsilon}{5}$) than under user-level privacy ($\frac{\varepsilon}{2}$), and thus add less noise.
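As a quick check of the arithmetic above, the following worked comparison is only a sketch: it assumes that the budgets of individual events add up (sequential composition) and writes $\varepsilon_i$ for the budget assigned to $p_i$, a notation used here purely for illustration.
For Bob's {\thethings} $p_1$, $p_3$, $p_5$, $p_8$ we obtain
\begin{align*}
  \text{user-level:} \quad & \varepsilon_i = \frac{\varepsilon}{8}, & \sum_{i \in \{1,3,5,8\}} \varepsilon_i &= 4 \cdot \frac{\varepsilon}{8} = \frac{\varepsilon}{2}, \\
  \text{\thething:} \quad & \varepsilon_i = \frac{\varepsilon}{5}, & \sum_{i \in \{1,3,5,8\}} \varepsilon_i &= 4 \cdot \frac{\varepsilon}{5} = \frac{4\varepsilon}{5}.
\end{align*}
The four regular events $p_2$, $p_4$, $p_6$, $p_7$ receive the same totals in each case.
Moreover, under the {\thething} allocation, releasing any regular event at timestamp $t$ together with the {\thethings} spends $\varepsilon_t + \varepsilon_1 + \varepsilon_3 + \varepsilon_5 + \varepsilon_8 = \frac{\varepsilon}{5} + \frac{4\varepsilon}{5} = \varepsilon$, so the overall budget $\varepsilon$ is not exceeded.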
\input{problem/thething/contribution}
\input{problem/thething/problem}
\input{problem/thething/solution}