Merge branch 'master' of git.delkappa.com:manos/the-last-thing

This commit is contained in:
Manos Katsomallos 2021-10-15 00:00:18 +02:00
commit 4e6f04cdbe

View File

@ -8,45 +8,48 @@ We term significant events as \emph{{\thething} events} or simply \emph{\thethin
Identifying {\thethings} in time series can be done in an automatic or manual way. Identifying {\thethings} in time series can be done in an automatic or manual way.
For example, in spatiotemporal data, \emph{places where an individual spent some time} denote \emph{points of interest} (POIs) (called also stay points)~\cite{zheng2015trajectory}. For example, in spatiotemporal data, \emph{places where an individual spent some time} denote \emph{points of interest} (POIs) (called also stay points)~\cite{zheng2015trajectory}.
Such events, and more particularly their spatial attribute values, can be less privacy-sensitive~\cite{primault2018long}, e.g.,~parks, theaters, etc. or, if individuals frequent them, they can reveal supplementary information, e.g.,~residences (home addresses)~\cite{gambs2010show}, places of worship (religious beliefs)~\cite{franceschi-bicchierairussell2015redditor}, etc. Such events, and more particularly their spatial attribute values, can be less privacy-sensitive~\cite{primault2018long}, e.g.,~parks, theaters, etc., or, if individuals frequent them, they can reveal supplementary information, e.g.,~residences (home addresses)~\cite{gambs2010show}, places of worship (religious beliefs)~\cite{franceschi-bicchierairussell2015redditor}, etc.
POIs can be an example of how we can choose {\thethings}, but the idea is not limited to these. POIs can be an example of how we can choose {\thethings}, but the idea is not limited to these.
Another example is the detection of privacy-sensitive user interactions by \emph{contact tracing} applications. Another example is the detection of privacy-sensitive user interactions by \emph{contact tracing} applications.
This can be practical in decease control~\cite{eames2003contact}, similar to the recent outbreak of the Coronavirus disease 2019 (COVID-19) epidemic~\cite{ahmed2020survey}. This can be practical in decease control~\cite{eames2003contact}, similar to the recent outbreak of the Coronavirus disease 2019 (COVID-19) epidemic~\cite{ahmed2020survey}.
Last but not least, {\thethings} in \emph{smart grid} electricity usage patterns could not only reveal the energy consumption of a user but also information regarding activities, e.g.,~`at work', `sleeping', etc. and types of appliances already installed or recently purchased~\cite{khurana2010smart}. Last but not least, {\thethings} in \emph{smart grid} electricity usage patterns may not only reveal the energy consumption of a user but also information regarding activities, e.g.,~`at work', `sleeping', etc., or types of appliances already installed or recently purchased~\cite{khurana2010smart}.
We stress out that {\thething} identification is an orthogonal problem to ours, and that we consider {\thethings} given as input to our problem. We stress out that {\thething} identification is an orthogonal problem to ours, and that we consider {\thethings} given as input to our problem.
\begin{example} We argue that protecting only {\thething} events along with any regular event release -- instead of protecting every event in the timeseries -- is sufficient for the user's protection, while it improves data utility.
\label{ex:st-cont} More specifically, important events are adequately protected, while less important ones are not excessively perturbed. \kat{something feels wrong with this statement, because in figure 2 regular and landmarks seem to receive the same amount of noise..}
%In fact, considering {\thething} events can prevent over-perturbing the data in the benefit of their final quality.
Figure~\ref{fig:st-cont} shows the case when we want to protect all of Bob's significant events ($p_1$, $p_3$, $p_5$, $p_8$) in his trajectory shown in Figure~\ref{fig:scenario}.
% That is, we have to allocate privacy budget $\varepsilon$ such that at any timestamp $t$ it holds that $\varepsilon_t + \varepsilon_1 + \varepsilon_3 + \varepsilon_5 + \varepsilon_8 \leq \varepsilon$.
In this scenario, event-level protection is not suitable since it can only protect one event at a time.
Hence, we have to apply user-level privacy protection by distributing equal portions of $\varepsilon$ to all the events, i.e.,~$\frac{\varepsilon}{8}$ to each one (the equivalent of applying $8$-event privacy).
In this way, we have protected the {\thething} points; we have allocated a total of $\frac{\varepsilon}{2}<\varepsilon$ to the {\thethings}.
\begin{figure}[htp]
\centering
\includegraphics[width=\linewidth]{problem/st-cont}
\caption{User-level and {\thething} $\varepsilon$-differential privacy protection for the time series of Figure~\ref{fig:scenario}.}
\label{fig:st-cont}
\end{figure}
However, perturbing by $\frac{\varepsilon}{8}$ each regular point deteriorates the data utility unnecessarily.
Notice that the overall privacy budget that we ended up allocating to the user-defined significant events is equal to $\frac{\varepsilon}{2}$ and leaves an equal amount of budget to distribute to any current event.
In other words, uniformly allocating $\frac{\varepsilon}{5}$ to every event would still achieve the Bob's privacy goal, i.e.,~protect every significant event, while achieving better utility overall.
\end{example}
We argue that protecting only {\thething} events along with any regular event release is sufficient for the user's protection, while it improves data utility.
Considering {\thething} events can prevent over-perturbing the data in the benefit of their final quality.
Take for example the scenario in Figure~\ref{fig:st-cont}, where {\thethings} are highlighted in gray. Take for example the scenario in Figure~\ref{fig:st-cont}, where {\thethings} are highlighted in gray.
If we want to protect the {\thething} points, we have to allocate at most a budget of $\varepsilon$ to the {\thethings}, while saving some for the release of regular events. If we want to protect the {\thething} points, we have to allocate at most a budget of $\varepsilon$ to the {\thethings}, while saving some for the release of regular events.
Essentially, the more budget we allocate to an event the less we protect it, but at the same time we maintain its utility. Essentially, the more budget we allocate to an event the less we protect it, but at the same time we maintain its utility.
With {\thething} privacy we propose to distribute the budget taking into account only the existence of the {\thethings} when we release an event of the time series, i.e.,~allocating $\frac{\varepsilon}{5}$ ($4\ \text{\thethings} + 1\ \text{regular point}$) to each event (see Figure~\ref{fig:st-cont}). With {\thething} privacy we propose to distribute the budget taking into account only the existence of the {\thethings} when we release an event of the time series, i.e.,~allocating $\frac{\varepsilon}{5}$ ($4\ \text{\thethings} + 1\ \text{regular point}$) to each event (see Figure~\ref{fig:st-cont}).
This way, we still guarantee that the {\thethings} are adequately protected, as they receive a total budget of $\frac{4\varepsilon}{5}<\varepsilon$. This way, we still guarantee\footnote{$\epsilon$-differential privacy guarantees that the allocated budget should be less or equal to $\epsilon$, and not precisely how much.\kat{Mano check.}} that the {\thethings} are adequately protected, as they receive a total budget of $\frac{4\varepsilon}{5}<\varepsilon$.
At the same time, we avoid over-perturbing the regular events, as we allocate to them a higher total budget ($\frac{4\varepsilon}{5}$) than in user-level ($\frac{\varepsilon}{2}$), and thus less noise. At the same time, we avoid over-perturbing the regular events, as we allocate to them a higher total budget ($\frac{4\varepsilon}{5}$) compared to the user-level scenario ($\frac{\varepsilon}{2}$), and thus less noise.
\begin{example}
\label{ex:st-cont}
Figure~\ref{fig:st-cont} shows the case when we want to protect all of Bob's significant events ($p_1$, $p_3$, $p_5$, $p_8$) in his trajectory shown in Figure~\ref{fig:scenario}.
% That is, we have to allocate privacy budget $\varepsilon$ such that at any timestamp $t$ it holds that $\varepsilon_t + \varepsilon_1 + \varepsilon_3 + \varepsilon_5 + \varepsilon_8 \leq \varepsilon$.
In this scenario, event-level protection is not suitable since it can only protect one event at a time.
Hence, we have to apply user-level privacy protection by distributing equal portions of $\varepsilon$ to all the events, i.e.,~$\frac{\varepsilon}{8}$ to each one (the equivalent of applying $8$-event privacy).
In this way, we have protected the {\thething} points; we have allocated a total of $\frac{\varepsilon}{2}<\varepsilon$ to the {\thethings}.
\begin{figure}[htp]
\centering
\includegraphics[width=\linewidth]{problem/st-cont}
\caption{User-level and {\thething} $\varepsilon$-differential privacy protection for the time series of Figure~\ref{fig:scenario}.}
\label{fig:st-cont}
\end{figure}
However, perturbing by $\frac{\varepsilon}{8}$ each regular point deteriorates the data utility unnecessarily.
Notice that the overall privacy budget that we ended up allocating to the user-defined significant events is equal to $\frac{\varepsilon}{2}$ and leaves an equal amount of budget to distribute to any current event.
In other words, uniformly allocating $\frac{\varepsilon}{5}$ to every event would still achieve the Bob's privacy goal, i.e.,~protect every significant event, while achieving better utility overall.
\end{example}
\input{problem/thething/contribution} \input{problem/thething/contribution}
\input{problem/thething/problem} \input{problem/thething/problem}
\input{problem/thething/solution} \input{problem/thething/solution}