Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
To appear at the 35th International Conference on Machine Learning (ICML), 2018
Abstract
The Gaussian mechanism is an essential building block used in a multitude of differentially private data analysis algorithms. In this paper we revisit the Gaussian mechanism and show that the original analysis has several important limitations. Our analysis reveals that the variance formula for the original mechanism is far from tight in the high privacy regime (ε → 0) and that it cannot be extended to the low privacy regime (ε → ∞). We address these limitations by developing an analytic Gaussian mechanism whose variance is calibrated directly using the Gaussian cumulative distribution function instead of a tail bound approximation. We also propose to equip the Gaussian mechanism with a post-processing step based on adaptive denoising estimators by leveraging the fact that the variance of the perturbation is known. Our experiments show that analytical calibration removes at least a third of the variance of the noise compared to the classical Gaussian mechanism, and that denoising dramatically improves the accuracy of the Gaussian mechanism in the high-dimensional regime.
1 Introduction
Output perturbation is a cornerstone of mechanism design in differential privacy (DP). Well-known mechanisms in this class are the Laplace and Gaussian mechanisms Dwork et al. (2006); Dwork and Roth (2014). More complex mechanisms are often obtained by composing multiple applications of these basic output perturbation mechanisms. For example, the Laplace mechanism is the basic building block of the sparse vector mechanism Dwork et al. (2009), and the Gaussian mechanism is the building block of private empirical risk minimization algorithms based on stochastic gradient descent Bassily et al. (2014). Analysing the privacy of such complex mechanisms turns out to be a delicate and error-prone task Lyu et al. (2017). In particular, obtaining tight privacy analyses leading to optimal utility is one of the main challenges in the design of advanced DP mechanisms. An alternative to tight a-priori analyses is to equip complex mechanisms with algorithmic noise calibration and accounting methods. These methods use numerical computations to, e.g., calibrate perturbations and compute cumulative privacy losses at run time, without relying on hand-crafted worst-case bounds. For example, recent works have proposed methods to account for the privacy loss under compositions occurring in complex mechanisms Rogers et al. (2016); Abadi et al. (2016).
In this work we revisit the Gaussian mechanism and develop two ideas to improve the utility of output perturbation DP mechanisms based on Gaussian noise. The first improvement is an algorithmic noise calibration strategy that uses numerical evaluations of the Gaussian cumulative density function (CDF) to calibrate the variance of Gaussian perturbations without relying on tail bounds. The analysis and the resulting algorithm are provided in Section 3. In order to motivate the need for a numerical approach to calibrate the noise of a DP Gaussian perturbation mechanism, we start with an analysis of the main limitations of the classical Gaussian mechanism in Section 2.
The second improvement equips the Gaussian perturbation mechanism with a post-processing step based on adaptive denoising estimators coming from the statistical estimation literature. Since DP is preserved by post-processing and the distribution of the perturbation added to the desired outcome is known, this allows a mechanism to achieve the desired privacy guarantee while increasing the accuracy of the released value. The relevant denoising estimators and their utility guarantees are discussed in Section 4. This is aligned with other works that considered private learning algorithms based on denoising privatized sufficient statistics. For example, Bernstein et al. (2017) used Expectation–Maximization to obtain the parameters of a class of graphical models starting from noisy empirical moments obtained from the Laplace mechanism. A key difference is that our approach provides tight utility analyses in the Bayesian and minimax sense because in the Gaussian case one can leverage sophisticated results from the adaptive estimation literature.
Our contributions also include a thorough experimental evaluation showcasing the benefits of numerical noise calibration and denoising using synthetic data. We also evaluate our mechanism on a New York City taxi dataset. The experimental results are described in Section 5 and yield dramatic utility improvements in the high-privacy and high-dimensional regimes.
2 Limitations of the Classical Gaussian Mechanism
Let 𝒳 be an input space equipped with a symmetric neighbouring relation x ≃ x′. Let ε > 0 and δ ∈ [0, 1] be two privacy parameters. A 𝒴-valued randomized algorithm M: 𝒳 → 𝒴 is (ε, δ)-DP if for every pair of neighbouring inputs x ≃ x′ and every possible (measurable) output set E the following inequality holds:

P[M(x) ∈ E] ≤ e^ε · P[M(x′) ∈ E] + δ.   (1)
The definition of DP captures the intuition that a computation on private data will not reveal sensitive information about individuals in a dataset if removing or replacing an individual in the dataset has a negligible effect on the output distribution.
In this paper we focus on the family of so-called output perturbation DP mechanisms. An output perturbation mechanism for a deterministic vector-valued computation f: 𝒳 → ℝ^d is obtained by computing the function f on the input data and then adding random noise sampled from a random variable Z to the output, i.e. M(x) = f(x) + Z. The amount of noise required to ensure the mechanism satisfies a given privacy guarantee typically depends on how sensitive the function f is to changes in the input and the specific distribution chosen for Z. The Gaussian mechanism gives a way to calibrate a zero mean isotropic Gaussian perturbation Z ∼ N(0, σ²I) to the global ℓ₂ sensitivity Δ = sup_{x ≃ x′} ‖f(x) − f(x′)‖ of f as follows.
Theorem 1 (Classical Gaussian Mechanism).
For any ε, δ ∈ (0, 1), the Gaussian output perturbation mechanism M(x) = f(x) + Z with Z ∼ N(0, σ²I) and σ = Δ√(2 log(1.25/δ))/ε is (ε, δ)-DP.
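As a minimal sketch of this calibration (function and parameter names are ours, not from the paper), σ = Δ√(2 log(1.25/δ))/ε can be computed and applied as:

```python
import math
import numpy as np

def classical_gaussian_sigma(sens, eps, delta):
    """Noise scale of the classical Gaussian mechanism (Theorem 1, eps < 1)."""
    assert 0 < eps < 1, "the classical analysis only covers eps in (0, 1)"
    return sens * math.sqrt(2 * math.log(1.25 / delta)) / eps

def gaussian_mechanism(value, sens, eps, delta, rng=None):
    """Release value + N(0, sigma^2 I) with the classical calibration."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = classical_gaussian_sigma(sens, eps, delta)
    return value + rng.normal(scale=sigma, size=np.shape(value))
```

Note how the scale blows up as ε → 0, which is the sub-optimality discussed in Section 2.1.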
A natural question one can ask about this result is whether this value of σ provides the minimal amount of noise required to obtain (ε, δ)-DP with Gaussian perturbations. Another natural question is what happens in the case ε ≥ 1. This section addresses both these questions. First we show that the value of σ given in Theorem 1 is suboptimal in the high privacy regime ε → 0. Then we show that this problem is in fact inherent to the usual proof strategy used to analyze the Gaussian mechanism. We conclude the section by showing that for large values of ε the standard deviation of a Gaussian perturbation that provides (ε, δ)-DP must scale like 1/√ε. This implies that the Θ(1/ε) scaling provided by the classical Gaussian mechanism in the range ε ∈ (0, 1) cannot be extended beyond any bounded interval.
2.1 Limitations in the High Privacy Regime
To illustrate the sub-optimality of the classical Gaussian mechanism in the regime ε → 0 we start by showing it is possible to achieve (0, δ)-DP using Gaussian perturbations. This clearly falls outside the capabilities of the classical Gaussian mechanism, since the standard deviation provided by Theorem 1 grows to infinity as ε → 0.
Theorem 2.
A Gaussian output perturbation mechanism with σ = Δ/(√(2π) δ) is (0, δ)-DP. (Proofs for all results given in the paper are presented in Appendix A.)
Previous analyses of the Gaussian mechanism are based on a simple sufficient condition for DP in terms of the privacy loss random variable Dwork and Roth (2014). The next section explains why the usual analysis of the Gaussian mechanism cannot yield tight bounds in the regime ε → 0. This shows that our example is not a corner case, but a fundamental limitation of trying to establish (ε, δ)-DP through said sufficient condition.
2.2 Limitations of Privacy Loss Analyses
Given a vector-valued mechanism M(x) = f(x) + Z, let p_x denote the density of the random variable M(x). The privacy loss function of M on a pair of neighbouring inputs x ≃ x′ is defined as

ℓ_{x,x′}(y) = log( p_x(y) / p_{x′}(y) ).
The privacy loss random variable L_{x,x′} = ℓ_{x,x′}(M(x)) is the transformation of the output random variable M(x) by the function ℓ_{x,x′}. For the particular case of a Gaussian mechanism with Z ∼ N(0, σ²I) it is well-known that the privacy loss random variable is also Gaussian Dwork and Rothblum (2016).
Lemma 3.
The privacy loss L_{x,x′} of a Gaussian output perturbation mechanism follows a N(η, 2η) distribution with η = Δ²_{x,x′}/(2σ²), where Δ_{x,x′} = ‖f(x) − f(x′)‖.
The privacy analysis of the classical Gaussian mechanism relies on the following sufficient condition: a mechanism M is (ε, δ)-DP if for every x ≃ x′ the privacy loss satisfies

P[L_{x,x′} ≥ ε] ≤ δ.   (2)
Since Lemma 3 shows the privacy loss of the Gaussian mechanism is a Gaussian random variable with mean η > 0, we have P[L_{x,x′} ≥ 0] > 1/2 for any pair of datasets with f(x) ≠ f(x′). This observation shows that in general it is not possible to use this sufficient condition for (ε, δ)-DP to prove that the Gaussian mechanism achieves (0, δ)-DP for any δ < 1/2. In other words, the sufficient condition is not necessary in the regime ε → 0. We conclude that an alternative analysis is required in order to improve the dependence on ε in the Gaussian mechanism.
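The N(η, 2η) shape of the privacy loss in Lemma 3 is easy to confirm by simulation. The following sketch (with illustrative parameter values of our choosing) evaluates the log-density ratio on samples drawn from the mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, dist = 1.0, 0.8            # noise scale and ||f(x) - f(x')||
eta = dist**2 / (2 * sigma**2)    # eta = Delta^2 / (2 sigma^2)

# Sample the mechanism on x (mean 0) and evaluate log p_x(y) - log p_x'(y).
y = rng.normal(loc=0.0, scale=sigma, size=1_000_000)
loss = (-(y**2) + (y - dist) ** 2) / (2 * sigma**2)

# Empirically: loss.mean() is close to eta and loss.var() to 2 * eta,
# matching the N(eta, 2 eta) distribution of Lemma 3.
```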
2.3 Limitations in the Low Privacy Regime
The last question we address in this section is whether the order of magnitude Θ(1/ε) given by Theorem 1 for σ can be extended to privacy parameters of the form ε ≥ 1. We show this is not the case by providing the following lower bound.
Theorem 4.
Let f have global sensitivity Δ. Suppose ε > 0 and δ ≤ 1/2 − e^ε Φ(−√(2ε)). If the mechanism M(x) = f(x) + Z with Z ∼ N(0, σ²I) is (ε, δ)-DP, then σ ≥ Δ/√(2ε).
Note that as ε → ∞ the upper bound on δ in Theorem 4 converges to 1/2. Thus, as ε increases the range of δ’s requiring noise of the order Δ/√ε increases to include all parameters of practical interest. This shows that the Θ(1/ε) rate provided by the classical Gaussian mechanism cannot be extended beyond the interval ε ∈ (0, 1). Note this provides an interesting contrast with the Laplace mechanism, which can achieve (ε, 0)-DP with standard deviation O(Δ₁/ε) in the low privacy regime.
3 The Analytic Gaussian Mechanism
The limitations of the classical Gaussian mechanism described in the previous section suggest there is room for improvement in the calibration of the variance of a Gaussian perturbation to the corresponding global sensitivity. Here we present a method for optimal noise calibration for Gaussian perturbations that we call the analytic Gaussian mechanism. To do so we must address the two sources of slack in the classical analysis: the sufficient condition (2) used to reduce the analysis to finding an upper bound for P[L_{x,x′} ≥ ε], and the use of a Gaussian tail approximation to obtain such an upper bound. We address the first source of slack by showing that the sufficient condition in terms of the privacy loss random variable comes from a relaxation of a necessary and sufficient condition involving two privacy loss random variables. When specialized to the Gaussian mechanism, this condition involves probabilities about Gaussian random variables, which instead of approximating by a tail bound we represent explicitly in terms of the CDF of the standard univariate Gaussian distribution:

Φ(t) = P[N(0, 1) ≤ t] = (1/√(2π)) ∫_{−∞}^{t} e^{−y²/2} dy.
Using this point of view, we introduce a calibration strategy for Gaussian perturbations that requires solving a simple optimization problem involving Φ. We discuss how to solve this optimization at the end of this section.
The first step in our analysis is to provide a necessary and sufficient condition for differential privacy in terms of privacy loss random variables. This is captured by the following result.
Theorem 5.
A mechanism M is (ε, δ)-DP if and only if the following holds for every x ≃ x′:

P[L_{x,x′} ≥ ε] − e^ε P[L_{x′,x} ≤ −ε] ≤ δ.   (3)
Note that Theorem 5 immediately implies the sufficient condition given in (2) through the inequality

P[L_{x,x′} ≥ ε] − e^ε P[L_{x′,x} ≤ −ε] ≤ P[L_{x,x′} ≥ ε].
Now we can use Lemma 3 to specialize (3) for a Gaussian output perturbation mechanism. The relevant computations are packaged in the following result, where we express the probabilities in (3) in terms of the Gaussian CDF Φ.
Lemma 6.
Suppose M is a Gaussian output perturbation mechanism with Z ∼ N(0, σ²I). For any x ≃ x′ let Δ_{x,x′} = ‖f(x) − f(x′)‖. Then the following hold for any ε ≥ 0:

P[L_{x,x′} ≥ ε] = Φ( Δ_{x,x′}/(2σ) − εσ/Δ_{x,x′} ),   (4)
P[L_{x′,x} ≤ −ε] = Φ( −Δ_{x,x′}/(2σ) − εσ/Δ_{x,x′} ).   (5)
This result specializes the left hand side of (3) in terms of the distance Δ_{x,x′} between the output means on a pair of neighbouring datasets. To complete the derivation of our analytic Gaussian mechanism we need to ensure that (3) is satisfied for every pair x ≃ x′. The next lemma shows that this reduces to plugging the global sensitivity Δ in the place of Δ_{x,x′} in (4) and (5).
Lemma 7.
For any ε ≥ 0, the function g: ℝ_{>0} → ℝ defined as follows is monotonically increasing:

g(s) = Φ(s/2 − ε/s) − e^ε Φ(−s/2 − ε/s).

(Taking s = Δ_{x,x′}/σ, this shows the left hand side of (3) is maximized when Δ_{x,x′} equals the global sensitivity Δ.)
Now we are ready to state our main result, whose proof follows directly from Theorem 5, Lemma 7, and equations (4) and (5).
Theorem 8 (Analytic Gaussian Mechanism).
Let f be a function with global sensitivity Δ. For any ε ≥ 0 and δ ∈ [0, 1], the Gaussian output perturbation mechanism M(x) = f(x) + Z with Z ∼ N(0, σ²I) is (ε, δ)-DP if and only if

Φ( Δ/(2σ) − εσ/Δ ) − e^ε Φ( −Δ/(2σ) − εσ/Δ ) ≤ δ.   (6)
This result shows that in order to obtain an (ε, δ)-DP Gaussian output perturbation mechanism for a function with global sensitivity Δ it is enough to find a noise standard deviation σ satisfying (6). One could now use upper and lower bounds for the tail of the Gaussian CDF to derive an analytic expression for a parameter σ satisfying this constraint. However, this again leads to a suboptimal result due to the slack in these tail bounds in the non-asymptotic regime. Instead, we propose to find σ using a numerical algorithm by leveraging the fact that the Gaussian CDF can be written as Φ(t) = (1 + erf(t/√2))/2, where erf is the standard error function. Efficient implementations of this function to very high accuracies are provided by most statistical and numerical software packages. However, this strategy requires some care in order to avoid numerical stability issues around the point where the expression in (6) changes sign. Thus, by further massaging the left hand side of (6) we obtain the implementation of the analytic Gaussian mechanism given in Algorithm 1. The correctness of this implementation is provided by the following result.
Theorem 9.
Let f be a function with global sensitivity Δ. For any ε > 0 and δ ∈ (0, 1), the mechanism described in Algorithm 1 is (ε, δ)-DP.
Given a numerical oracle for computing Φ based on the error function it is relatively straightforward to implement a solver for finding the values u* and v* needed in Algorithm 1. For example, using the fact that the calibration function is monotonically increasing we see that computing u* is a root finding problem for which one can use Newton’s method, since the derivative of Φ can be computed in closed form using Leibniz’s rule. In practice we find that a simple scheme based on binary search, initiated from an interval obtained by doubling until the target value is crossed, provides a very efficient and robust way to find u* up to arbitrary accuracies (the same applies to v*).
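A simplified sketch of this numerical calibration (ours, not the paper's Algorithm 1, which uses a numerically stabler reparameterization) bisects directly on σ using the exact condition (6) and SciPy's Gaussian CDF:

```python
import math
from scipy.stats import norm

def analytic_gaussian_sigma(eps, delta, sens, tol=1e-12):
    """Find the smallest sigma satisfying condition (6) by bisection.

    The left-hand side of (6) is monotonically decreasing in sigma, so the
    feasible region is an interval [sigma*, inf) and bracketing + bisection
    converges to sigma*. This direct scheme can lose precision for extreme
    parameters, which is what Algorithm 1's reparameterization avoids.
    """
    def priv_level(sigma):
        a = sens / (2 * sigma)      # Delta / (2 sigma)
        b = eps * sigma / sens      # eps sigma / Delta
        return norm.cdf(a - b) - math.exp(eps) * norm.cdf(-a - b)

    lo, hi = 1e-12 * sens, sens
    while priv_level(hi) > delta:   # double until condition (6) holds
        lo, hi = hi, 2 * hi
    while hi - lo > tol * hi:       # bisect down to the threshold
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if priv_level(mid) > delta else (lo, mid)
    return hi
```

For example, at ε = 1, δ = 10⁻⁵ and Δ = 1 this returns a σ noticeably below the classical calibration √(2 log(1.25/δ)) ≈ 4.84.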
4 Optimal Denoising
Can we improve the performance of the analytic Gaussian mechanism even further? The answer is both “yes” and “no”. We can’t because Algorithm 1 already gives the exact calibration of the Gaussian noise level to the given privacy budget. But if we consider the problem of designing the best differentially private procedure that approximates f, then there could still be room for improvement.
In this section, we consider a specific class of mechanisms that denoise the output of a Gaussian mechanism. Let Y = f(x) + Z with Z ∼ N(0, σ²I); we are interested in designing a post-processing function g such that g(Y) is closer to f(x) than Y. This class of mechanisms is of particular interest for differential privacy because (1) since differential privacy is preserved by post-processing, releasing a function of a differentially private output is again differentially private; and (2) since the distribution of the noise Z is publicly known, this information can be leveraged to design denoising functions.
This is a statistical estimation problem, where θ = f(x) is the underlying parameter and Y is the data. The fact that we add the noise ourselves allows us to tap into nearly a century’s worth of research in statistical estimation and come up with powerful post-processing techniques that are simultaneously optimal and parameter-free. It is worth pointing out that this is one of the rare cases where classical statistical theory on Gaussian models can be applied as is, because the Gaussian assumption is now true by construction.
This is however also an unusual estimation problem where all we observe is a single data point. Since Y is the maximum likelihood estimator of θ, if there is no additional information about θ, we cannot hope to improve the estimation error uniformly over all θ. But there is still something we can do when we consider either of the following assumptions: (A.1) θ is drawn from some underlying prior distribution, thus inducing some distribution on Y; or, (A.2) ‖θ‖_p ≤ r for some p, r > 0, where ‖·‖_p is the ℓ_p-norm (or pseudo-norm when p < 1).
Optimal Bayesian denoising.
Assumption A.1 translates the problem of optimal denoising into a Bayesian estimation problem, where the underlying parameter has a prior distribution π, and the task is to find an estimator that attains the Bayes risk — the minimum of the average estimation error integrated over the prior π, defined as

R_Bayes(π) = inf_{θ̂} E_{θ∼π} E_{Y∼N(θ, σ²I)} ‖θ̂(Y) − θ‖².
For square loss, the Bayes estimator is simply the posterior mean estimator, as the following theorem shows:
Theorem 10.
Let θ ∼ π and assume the induced distribution of Y is square integrable. Then the Bayes estimator is given by

θ̂_Bayes(Y) = E[θ | Y].
The proof can be found in any standard statistics textbook (see, e.g., Lehmann and Casella, 2006). One may ask what the corresponding MSE is and how much it improves over the version without post-processing. The answer depends on the prior and the amount of noise added for differential privacy. When π = N(0, τ²I), the posterior mean estimator can be written analytically as

θ̂_Bayes(Y) = ( τ²/(τ² + σ²) ) Y,   (7)

and the corresponding Bayes risk is

R_Bayes(π) = d σ²τ²/(σ² + τ²).   (8)
In other words, we get a factor of τ²/(σ² + τ²) improvement over simply releasing Y, whose risk is dσ².
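A small simulation (with illustrative values of τ and σ of our choosing) confirms the shrinkage rule (7) and the risk improvement (8):

```python
import numpy as np

rng = np.random.default_rng(1)
d, tau, sigma = 10_000, 1.0, 2.0
theta = rng.normal(scale=tau, size=d)            # theta ~ N(0, tau^2 I)
y = theta + rng.normal(scale=sigma, size=d)      # noisy DP release

shrink = tau**2 / (tau**2 + sigma**2)            # shrinkage factor from (7)
mse_raw = np.mean((y - theta) ** 2)              # close to sigma^2 = 4
mse_bayes = np.mean((shrink * y - theta) ** 2)   # close to the Bayes risk
# Per-coordinate Bayes risk (8): sigma^2 tau^2 / (sigma^2 + tau^2) = 0.8.
```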
In general, there is no analytical form for the posterior mean, but if we can evaluate the density of the prior or sample from it, then we can obtain an arbitrarily good approximation of E[θ | Y] using Markov chain Monte Carlo techniques.
Optimal frequentist denoising.
Assumption A.2 spells out a minimax estimation problem, where the underlying parameter is assumed to be within a set Θ. In particular, we are interested in finding an estimator θ̂ that attains the minimax risk

R*(Θ_{p,r}) = inf_{θ̂} sup_{θ ∈ Θ_{p,r}} E ‖θ̂(Y) − θ‖²

associated with ℓ_p balls Θ_{p,r} = {θ : ‖θ‖_p ≤ r} of radius r.
A complete characterization of this minimax risk (up to a constant) is given by Birgé and Massart (2001, Proposition 5), which shows that in the non-trivial region of the signal-to-noise ratio r/σ (roughly, when r/σ lies between c_p √(1 + log d) and c_p d^{1/p} for a constant c_p that depends only on p), the ℓ_p ball satisfies

R*(Θ_{p,r}) ≍ r^p σ^{2−p} ( 1 + log(d σ^p / r^p) )^{1−p/2}   (9)

for 0 < p < 2, and when p ≥ 2, Donoho et al. (1990) show that

R*(Θ_{p,r}) ≍ d σ² r² / ( d^{2/p} σ² + r² ).
Deriving exact minimax estimators is challenging and people often work with certain asymptotic regimes (see the case p = ∞ treated by Bickel et al. (1981)), but several techniques have been shown to match R*(Θ_{p,r}) up to a small constant factor in finite samples (see, e.g., Donoho et al., 1990; Donoho and Johnstone, 1994).
What this means is that when we have the additional information that θ is in some ℓ_p ball, then we can improve the squared error from dσ² to the minimax rates above. This could be especially helpful in the high-dimensional case for p < 2. For instance, if p = 1 and r = σ, then we obtain a risk of order σ²√(1 + log d), which improves exponentially in d over the dσ² risk of Y. More practically, if θ is a sparse histogram with s non-zero elements, then taking Θ to be the set of s-sparse vectors results in an error bound on the order of σ² s log(d/s), which is linear in the sparsity s rather than the dimension d.
Adaptive estimation.
What if we do not know the prior parameter τ, or a right choice of p and r? Can we still come up with estimators that take advantage of these structures? It turns out that this is the problem of designing adaptive estimators, which sits at the heart of statistical research. An adaptive estimator, in our case, is one that does not need to know τ or a pair of p and r, yet behaves nearly as well as the Bayes estimator that knows τ, or the minimax estimator that knows p and r, in each parameter regime.
We first give an example of an adaptive Bayes estimator that does not require us to specify a prior, yet can perform almost as well as the optimal Bayes estimator for all isotropic Gaussian priors simultaneously.
Theorem 11 (James-Stein estimator and its adaptivity).
When π = N(0, τ²I), substituting in (7) the marginal variance τ² + σ² with an estimate based on its maximum likelihood under

Y ∼ N(0, (τ² + σ²) I)

produces the James–Stein estimator

θ̂_JS(Y) = ( 1 − (d − 2)σ²/‖Y‖² ) Y.

Moreover, it has an MSE

E ‖θ̂_JS(Y) − θ‖² = d σ²τ²/(σ² + τ²) + 2σ⁴/(σ² + τ²).
The James–Stein estimator has the property that it always improves over the MLE when d ≥ 3 (Stein, 1956), and it always achieves a risk that is within a multiplicative factor 1 + 2σ²/(dτ²) of the Bayes risk in (8) for any τ > 0.
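A short sketch of the James–Stein post-processing step (variable names and parameter values are ours). Note that no prior needs to be specified; here the true signal is not even Gaussian:

```python
import numpy as np

def james_stein(y, sigma):
    """James-Stein shrinkage of a noisy release toward 0 (requires d >= 3)."""
    d = y.size
    return (1.0 - (d - 2) * sigma**2 / np.dot(y, y)) * y

rng = np.random.default_rng(2)
d, sigma = 500, 1.5
theta = rng.uniform(-1, 1, size=d)               # non-Gaussian ground truth
y = theta + rng.normal(scale=sigma, size=d)      # Gaussian-mechanism release

mse_mle = np.mean((y - theta) ** 2)              # close to sigma^2 = 2.25
mse_js = np.mean((james_stein(y, sigma) - theta) ** 2)
```

Because σ is known exactly from the privacy calibration, the estimator has no tuning parameters.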
We now move on to describe a method that is adaptive to p and r in minimax estimation. Quite remarkably, Donoho (1995) shows that choosing λ = σ√(2 log d) in the soft-thresholding estimator

θ̂^ST_λ(Y)_i = sign(Y_i) · max( |Y_i| − λ, 0 )   (10)

yields a nearly optimal estimator for every ℓ_p ball.
Theorem 12 (The adaptivity of soft-thresholding, Theorem 4.2 of (Donoho, 1995)).
Let Θ = Θ_{p,r} for some p, r > 0. The soft-thresholding estimator with λ = σ√(2 log d) obeys

sup_{θ ∈ Θ_{p,r}} E ‖θ̂^ST_λ(Y) − θ‖² ≤ (2 log d + 1) ( σ² + R*(Θ_{p,r}) ).
The result implies that the soft-thresholding estimator is nearly optimal for all ℓ_p balls up to a multiplicative factor of order log d. In other words, a single estimator simultaneously pays at most an O(log d) factor (plus an additive σ² term) over the minimax risk of every ℓ_p ball.
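The soft-thresholding rule (10) with the universal threshold λ = σ√(2 log d) is a one-liner. The sketch below (illustrative sparsity and noise levels of our choosing) shows the gain on a sparse signal:

```python
import numpy as np

def soft_threshold(y, lam):
    """Soft-thresholding (10): shrink every coordinate toward 0 by lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(3)
d, s, sigma = 10_000, 50, 1.0
theta = np.zeros(d)
theta[:s] = 10.0                                 # s-sparse signal
y = theta + rng.normal(scale=sigma, size=d)      # Gaussian-mechanism release

lam = sigma * np.sqrt(2 * np.log(d))             # universal threshold: no tuning
err_st = np.sum((soft_threshold(y, lam) - theta) ** 2)
err_raw = np.sum((y - theta) ** 2)               # close to d sigma^2 = 10_000
```

The error of the thresholded release scales with the sparsity s rather than the ambient dimension d.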
Thanks to the fact that we know the noise level σ exactly, both θ̂_JS and θ̂^ST_λ are completely free of tuning parameters. Yet, they achieve remarkable adaptivity that covers a large class of model assumptions and function classes. A relatively small price to pay for such adaptivity is that we might lose a constant (or a log d) factor. Whether such adaptivity is worth it varies on a case-by-case basis.
Take the problem of privately releasing a histogram of n items in d bins. Theorem 12 and Equation (9) imply that the soft-thresholding estimator obeys

E ‖θ̂^ST_λ(Y) − θ‖² ≤ Õ( min{ ‖θ‖₀, σ^{−1/γ}, d } σ² ),

where ‖θ‖₀ denotes the number of nonzero elements in θ, γ is the largest power-law exponent greater than 1 such that the order statistics satisfy |θ|_{(i)} ≲ i^{−γ} for all i, and Õ hides logarithmic factors in d. The fact that min{‖θ‖₀, σ^{−1/γ}, d} ≤ d implies that the soft-thresholding estimator improves over the naive private release for all θ, and the σ^{−1/γ} term suggests that we can take advantage of an unknown power-law distribution even if the histogram is not really sparse. This makes our technique effective in the many data mining problems where power law distributions occur naturally (Faloutsos et al., 1999).
A note on related work and novelty.
To conclude the section, we would like to emphasize again that the optimal denoising techniques that we described above are not our novel contributions. They are results known in the statistics literature for decades. The purpose of our exposition is to compile facts that are relevant to the practice of DP and initiate a systematic study of the application of these ideas.
As a matter of fact, denoising as a post-processing step in the context of differential privacy is not new either. Notably, Barak et al. (2007) and Hay et al. (2009) show that post-processing steps that enforce the consistency of contingency table releases and graph degree sequences can lead to more accurate estimates. Williams and McSherry (2010) set up the general statistical (Bayesian) inference problem for DP releases by integrating auxiliary information (a prior). Karwa et al. (2016) exploit the known distribution of the noise added for privacy in the inference procedure of a network model and show that it helps to preserve asymptotic efficiency. Nikolov et al. (2013) demonstrate that projecting linear regression solutions onto a known ℓ₁-ball can improve the estimation error when the underlying ground truth is sparse.
In all the above references, some form of prior knowledge (constraint sets, sparsity, or a Bayesian prior) is used to improve the utility of DP releases. To the best of our knowledge, we are the first to consider “adaptive estimation” and to demonstrate how classical techniques can be helpful even without such prior knowledge, performing optimal denoising for a variety of minimax setups at the same time.
5 Numerical Experiments
This section provides an experimental evaluation of the improvements in utility provided by our two contributions. First we numerically compare the variance of the analytic Gaussian mechanism and the classical mechanism for a variety of privacy parameters. Then we evaluate the contributions of denoising and analytic calibration against a series of baselines for the task of private mean estimation using synthetic data. We also evaluate several denoising strategies on the task of releasing heat maps based on the New York City taxi dataset under differential privacy. Further experiments are presented in Appendix B, including an evaluation of our denoising strategies for the task of private histogram release.
5.1 Analytic Gaussian Mechanism
We implemented Algorithm 1 in Python and ran experiments to compare the standard deviation of the perturbation obtained with the analytic Gaussian mechanism versus the standard deviation required by the classical Gaussian mechanism. In all our experiments the values of u* and v* were solved to high accuracy using binary search and the implementation of the erf function provided by SciPy Jones et al. (2001). Overall, all the calls to Algorithm 1 required by our experiments took in total less than one second on a standard laptop.
The results are presented in the two leftmost panels in Figure 1. As expected, the plots show that as ε grows our new method outperforms the classical mechanism by several orders of magnitude. Furthermore, we see that even for small values of ε our mechanism reduces the noise standard deviation by a substantial constant factor, with higher improvements for larger values of ε.
5.2 Denoising for Mean Estimation
In this section we evaluate the utility of denoising combined with the analytic Gaussian mechanism for the task of private mean estimation. The input to the mechanism is a dataset x = (x₁, …, x_n) containing n vectors x_i ∈ ℝ^d and the underlying deterministic functionality is the mean f(x) = (1/n) Σ_i x_i. This relatively simple task is a classic example from the family of linear queries which are frequently considered in the differential privacy literature, and it arises in a number of applications, including releasing summary statistics and performing stochastic optimization. We compare the utility of several mechanisms for releasing a private version of f(x) in terms of the Euclidean distance to the true mean. We evaluate a series of mechanisms, including the analytic Gaussian mechanism with either James–Stein denoising cf. Theorem 11 (aGM-JS) or optimal thresholding denoising cf. Theorem 12 (aGM-TH), as well as several baselines to assess the relative improvements provided by denoising and analytic calibration. In particular, our set of baselines includes the classical Gaussian mechanism (cGM), the analytic Gaussian mechanism without denoising (aGM), and the classical Laplace mechanism (Lap) using the same ε as the Gaussian mechanisms.
To provide a thorough comparison of all the mechanisms listed above we explore the effect on the utility of the different parameters of the problem. The main parameters are the dimension d and the privacy parameters ε and δ. The dimension affects the utility through the error bounds provided in Theorem 11 and Theorem 12. The privacy parameters affect the accuracy through the variance σ² of the mechanism, which is also affected by the sample size n through the global sensitivity. This means we can characterize the effect of n by keeping it fixed and trying several values for the privacy parameters. In our experiments we fix the sample size n and the privacy parameter δ while trying several values for ε and d.
The last parameter that affects the utility guarantees is the “size” of f(x), controlled either through the prior variance τ² or the radius r of the norm ball. Since the two denoising estimators we use in our mechanisms are adaptive to these parameters and do not need to know them in advance, in our experiments we sample the dataset repeatedly to obtain a diversity of values for ‖f(x)‖. Each dataset is sampled as follows: we first sample a center c and then let x_i = c + u_i, where each u_i is i.i.d. with independent coordinates sampled uniformly from the interval [−1, 1]. Thus, in each dataset the points all lie in an ℓ∞-ball of radius 1 around c, leading to a global ℓ₂ sensitivity of 2√d/n and a global ℓ₁ sensitivity of 2d/n. These are used to calibrate the Gaussian and Laplace perturbations, respectively.
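A self-contained sketch of this pipeline in the high-privacy regime (using the classical calibration of Theorem 1 so the snippet stands alone, and illustrative parameter values rather than those of the paper):

```python
import math
import numpy as np

rng = np.random.default_rng(4)
n, d, eps, delta = 100, 1_000, 0.1, 1e-6         # high-privacy, high-dimensional
center = rng.normal(size=d)
X = center + rng.uniform(-1, 1, size=(n, d))     # points in an l_inf ball of radius 1
target = X.mean(axis=0)                          # the query f(x)

sens = 2 * math.sqrt(d) / n                      # l2 sensitivity of the mean
sigma = sens * math.sqrt(2 * math.log(1.25 / delta)) / eps  # cGM calibration
y = target + rng.normal(scale=sigma, size=d)     # private release

# James-Stein post-processing (aGM-JS applies the same rule to the analytic release).
js = (1.0 - (d - 2) * sigma**2 / np.dot(y, y)) * y
err_raw = np.linalg.norm(y - target)
err_js = np.linalg.norm(js - target)             # much smaller than err_raw here
```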
The results of our experiments are presented in the two rightmost panels of Figure 1. Each point in every plot is the result of averaging the error obtained over repeated runs with a different sampled dataset every time. The first plot shows that the denoised methods improve the accuracy over all the other methods, sometimes by orders of magnitude. The second plot shows that for this problem the James–Stein estimator provides better accuracy in the high-dimensional setting.
5.3 New York City Taxi Heat Maps
In this section, we apply our method to New York City taxi data. The data set is a collection of time-stamped pick-ups and drop-offs of taxi drivers and we are interested in sharing a density map of such pick-ups and drop-offs in Manhattan at a specific time of a specific day under differential privacy.
This is a problem of significant practical interest. Ever since the NYC Taxi & Limousine Commission released this data set, there have been multiple independent reports concerning the security and privacy risks this data set poses for taxi drivers and their passengers (see, e.g., Pandurangan; Douriez et al., 2016). The techniques presented in this paper allow us to provably prevent individuals (on both the per-trip level and per-cab level) in the dataset from being identified, while, remarkably, permitting the release of rich information about the data set with fine-grained spatial and temporal resolution.
Specifically, we apply the analytic Gaussian mechanism to release the number of pick-ups and drop-offs at every traffic junction in Manhattan. There are a total of 3,784 such traffic junctions and they are connected by 7,070 sections of roads. We treat them as nodes and edges of a graph. In the post-processing phase, we apply graph smoothing techniques to reveal the underlying signal despite the noise due to aGM. Specifically, we compare the JS estimator and the soft-thresholding estimator described in Section 4, as well as the same soft-thresholding estimator applied to the coefficients of a graph wavelet transform due to Sharpnack et al. (2013). The basis transformation is important because the data might be sparser in the transformed domain. For reference, we also include the state-of-the-art graph smoothing technique called graph trend filtering (Wang et al., 2016), which has one additional tuning parameter but has been shown to perform significantly better than wavelet smoothing in practice.
In the experiment that follows, we assume that every taxi driver can have a bounded number of trips within an hour (a conservative but reasonable assumption that can be enforced by preprocessing the data), so that we have a bounded global ℓ₁-sensitivity. Since ‖·‖₂ ≤ ‖·‖₁, this ℓ₁ sensitivity bound implies the ℓ₂ sensitivity bound that we use for the analytic Gaussian mechanism in this paper. In addition, all results are presented for data collected on September 24, 2014, and the data within each hour are gathered and distributed to each traffic junction through a kernel density estimator. The detailed approach is documented in Doraiswamy et al. (2014) and we thank the authors for sharing their processed data.
Here we present some qualitative comparisons in Figure 2, where we visualize the privately released heat map with and without post-processing. As we can see, at different times of the day the active hotspots of taxi activity are clearly different. Note that there might be a gap between the results that best approximate the input data and the most visually appealing denoised results since, arguably, the original data is itself noisy and what we are interested in is the underlying mean. Relatively speaking, trend filtering performs better than wavelet smoothing, but both approaches significantly improve the RMSE over the unprocessed differentially private release. Appendix B provides quantitative results comparing the mean squared error (MSE) of cGM, aGM, and the aforementioned denoising techniques.
Lastly, we remark that all experiments are stated for the stronger cab-level differential privacy. To achieve trip-level differential privacy the global sensitivity over the entire data set is reduced accordingly, and we can release heat maps with arbitrary temporal and spatial resolution by adding noise of the correspondingly smaller scale to every value of the spatio-temporal heat map of interest.
6 Conclusion and Discussion
In this paper, we embark on a journey of pushing the utility limit of the Gaussian mechanism for (ε, δ)-differential privacy. We propose a novel analytical Gaussian mechanism which calibrates the noise using the exact CDF of the Gaussian distribution rather than concentration inequalities, yielding a strict improvement in all cases and a substantial one when ε is small. We also review decades of research in statistical estimation theory and show that by combining these techniques with differential privacy one obtains powerful adaptivity that denoises differentially private outputs nearly optimally without additional hyperparameters. On synthetic data and on the New York City Taxi dataset we illustrate a significant gain in estimation error and fine-grained spatio-temporal resolution.
There are a number of theoretical problems of interest for future work. First, there is the problem of differentially private estimation: our post-processing approach effectively restricts the choice of algorithms to the composition of a private release and a post-processing step. While we now know that we are optimal in both components, it is unclear whether we lose anything relative to the best differentially private algorithms. Secondly, the analytical calibration proposed in this paper is optimal for achieving (ε, δ)-DP with Gaussian noise. But when building complex mechanisms we face a dilemma between (a) using the aGM with the advanced composition theorem (Kairouz et al., 2015); or (b) using Rényi DP (Mironov, 2017) or zCDP (Bun and Steinke, 2016) for tighter composition and calculating δ from moment bounds. While (a) is tighter in the calculation of the privacy parameters of each intermediate mechanism, (b) is tighter in the composition but cannot take advantage of aGM. It would be interesting if we could get the best of both worlds.
References
- Abadi et al. (2016) Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016.
- Barak et al. (2007) Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, and Kunal Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282. ACM, 2007.
- Bassily et al. (2014) Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 464–473. IEEE, 2014.
- Bernstein et al. (2017) Garrett Bernstein, Ryan McKenna, Tao Sun, Daniel Sheldon, Michael Hay, and Gerome Miklau. Differentially private learning of undirected graphical models using collective graphical models. In International Conference on Machine Learning (ICML), 2017.
- Bickel (1981) P. J. Bickel. Minimax estimation of the mean of a normal distribution when the parameter space is restricted. The Annals of Statistics, 9(6):1301–1309, 1981.
- Birgé and Massart (2001) Lucien Birgé and Pascal Massart. Gaussian model selection. Journal of the European Mathematical Society, 3(3):203–268, 2001.
- Bun and Steinke (2016) Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pages 635–658. Springer, 2016.
- Donoho (1995) David L Donoho. De-noising by soft-thresholding. IEEE transactions on information theory, 41(3):613–627, 1995.
- Donoho and Johnstone (1994) David L Donoho and Iain M Johnstone. Minimax risk over ℓp-balls for ℓp-error. Probability Theory and Related Fields, 99(2):277–303, 1994.
- Donoho et al. (1990) David L Donoho, Richard C Liu, and Brenda MacGibbon. Minimax risk over hyperrectangles, and implications. The Annals of Statistics, pages 1416–1437, 1990.
- Doraiswamy et al. (2014) Harish Doraiswamy, Nivan Ferreira, Theodoros Damoulas, Juliana Freire, and Cláudio T Silva. Using topological analysis to support event-guided exploration in urban data. IEEE transactions on visualization and computer graphics, 20(12):2634–2643, 2014.
- Douriez et al. (2016) Marie Douriez, Harish Doraiswamy, Juliana Freire, and Cláudio T Silva. Anonymizing nyc taxi data: Does it matter? In Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on, pages 140–148. IEEE, 2016.
- Dwork and Roth (2014) Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
- Dwork and Rothblum (2016) Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.
- Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography, pages 265–284. Springer, 2006.
- Dwork et al. (2009) Cynthia Dwork, Moni Naor, Omer Reingold, Guy N Rothblum, and Salil Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 381–390. ACM, 2009.
- Faloutsos et al. (1999) Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. On power-law relationships of the internet topology. In ACM SIGCOMM computer communication review, volume 29, pages 251–262. ACM, 1999.
- Gordon (1941) Robert D Gordon. Values of mills’ ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. The Annals of Mathematical Statistics, 12(3):364–366, 1941.
- Hay et al. (2009) Michael Hay, Chao Li, Gerome Miklau, and David Jensen. Accurate estimation of the degree distribution of private networks. In Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on, pages 169–178. IEEE, 2009.
- Jones et al. (2001) Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001. URL http://www.scipy.org/.
- Kairouz et al. (2015) Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International Conference on Machine Learning (ICML), 2015.
- Karwa et al. (2016) Vishesh Karwa, Aleksandra Slavković, et al. Inference using noisy degrees: Differentially private β-model and synthetic graphs. The Annals of Statistics, 44(1):87–112, 2016.
- Lehmann and Casella (2006) Erich L Lehmann and George Casella. Theory of point estimation. Springer Science & Business Media, 2006.
- Lyu et al. (2017) Min Lyu, Dong Su, and Ninghui Li. Understanding the sparse vector technique for differential privacy. Proceedings of the VLDB Endowment, 10(6):637–648, 2017.
- Mironov (2017) Ilya Mironov. Renyi differential privacy. In Computer Security Foundations Symposium (CSF), 2017 IEEE 30th, pages 263–275. IEEE, 2017.
- Nikolov et al. (2013) Aleksandar Nikolov, Kunal Talwar, and Li Zhang. The geometry of differential privacy: the sparse and approximate cases. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 351–360. ACM, 2013.
- Pandurangan Vijay Pandurangan. On taxi and rainbows. https://tech.vijayp.ca/of-taxis-and-rainbows-f6bc289679a1. Accessed: 2014-06-21.
- Rogers et al. (2016) Ryan M Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. Privacy odometers and filters: Pay-as-you-go composition. In Advances in Neural Information Processing Systems, pages 1921–1929, 2016.
- Sharpnack et al. (2013) James Sharpnack, Aarti Singh, and Akshay Krishnamurthy. Detecting activations over graphs using spanning tree wavelet bases. In Artificial Intelligence and Statistics, pages 536–544, 2013.
- Stein (1956) Charles Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley symposium on mathematical statistics and probability, volume 1, pages 197–206, 1956.
- Telgarsky (2013) Matus Telgarsky. Dirichlet draws are sparse with high probability. CoRR, abs/1301.4917, 2013.
- Wang et al. (2016) Yu-Xiang Wang, James Sharpnack, Alexander J. Smola, and Ryan J. Tibshirani. Trend filtering on graphs. Journal of Machine Learning Research, 17(105):1–41, 2016.
- Williams and McSherry (2010) Oliver Williams and Frank McSherry. Probabilistic inference and differential privacy. In Advances in Neural Information Processing Systems, pages 2451–2459, 2010.
Appendix A Proofs
In this appendix we present supporting proofs for all the results mentioned in the main text.
A.1 Proofs from Section 2
Proof of Theorem 2.
A simple way to see that (0, δ)-DP is achievable with Gaussian noise is to recall that (0, δ)-DP is equivalent to a bound of δ on the total variation (TV) distance between the output distributions of M(x) and M(x′) for any neighbouring pair x ≃ x′. If M is an output perturbation mechanism for f with noise N(0, σ²I), then using Pinsker's inequality we have
TV(M(x), M(x′)) ≤ √(KL(M(x) ‖ M(x′)) / 2) = ‖f(x) − f(x′)‖₂ / (2σ) ≤ Δ / (2σ),
where Δ is the global ℓ2-sensitivity of f. Thus, we see that a Gaussian perturbation with standard deviation σ = Δ/(2δ) is enough to achieve (0, δ)-DP. ∎
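This calibration can be sanity-checked numerically: in the worst case the two output distributions are one-dimensional Gaussians with means Δ apart, whose TV distance is exactly 2Φ(Δ/(2σ)) − 1. A quick sketch assuming Δ = 1 and the Pinsker-based choice σ = Δ/(2δ):

```python
from scipy.stats import norm

delta = 1e-3          # target total variation bound
sensitivity = 1.0     # assumed l2 sensitivity
sigma = sensitivity / (2 * delta)   # Pinsker-based calibration

# Exact TV distance between N(0, sigma^2) and N(sensitivity, sigma^2).
tv = 2 * norm.cdf(sensitivity / (2 * sigma)) - 1
print(tv <= delta)  # True: the bound holds, with room to spare
```

The slack (tv ≈ 0.8·delta here) reflects the looseness of Pinsker's inequality, which is exactly what the analytic calibration of Section 3 removes.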
Proof of Theorem 4.
Note that the proof of Theorem 9 shows that a Gaussian perturbation with yields a -DP mechanism, where . Thus, it is not possible to attain -DP with without increasing the variance of the perturbation.
The result follows by showing that the upper bound for proposed in Theorem 4 is a lower bound for . Since , all we need to show is .
Let Φ̄(t) = P[N(0, 1) > t] denote the complement of the standard Gaussian CDF, and let φ denote its density. The Mills ratio for the Gaussian distribution is the quantity Φ̄(t)/φ(t). Bounding the Mills ratio is a standard approach to approximating the tail of the Gaussian distribution. A well-known bound for the Mills ratio is Gordon's inequality Gordon [1941]. By using symmetry we obtain:
∎
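Gordon's inequality states that for t > 0 the Gaussian tail satisfies φ(t)·t/(t² + 1) ≤ Φ̄(t) ≤ φ(t)/t, i.e. the Mills ratio Φ̄(t)/φ(t) lies between t/(t² + 1) and 1/t. A quick numerical check of this standard bound:

```python
import numpy as np
from scipy.stats import norm

for t in np.linspace(0.1, 10.0, 100):
    mills = norm.sf(t) / norm.pdf(t)   # Mills ratio of the standard Gaussian
    assert t / (t**2 + 1) <= mills <= 1 / t

print("Gordon's inequality holds on [0.1, 10]")
```

Both bounds become tight as t grows, which is why tail-bound-based calibration is accurate only far in the tail.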
Proof of Lemma 3.
Recall that the density of the Gaussian output perturbation mechanism with is given by . Plugging this expression into the definition of the privacy loss function and performing a quick computation we get
To compute the privacy loss random variable we need to plug the output y = f(x) + Z, with Z ∼ N(0, σ²I), into the above inner product. By observing that , we obtain that the distribution of the privacy loss random variable is given by
Therefore, the privacy loss of the Gaussian mechanism has the form for some . ∎
A.2 Proofs from Section 3
Proof of Theorem 5.
Given a pair of neighbouring datasets let and be the densities of the output random variables and . Note that given an event one can rewrite (1) as follows:
(11) |
Defining the event and its complement , we can partition into the sets and . Therefore, by the definition of we have
Because (11) has to hold for any event and the upper bound above holds for any event, we conclude that is -DP if and only if
(12) |
holds for any . To complete the proof we need to show that (12) is equivalent to (3). Expanding the definition of we get:
A similar argument with also shows:
Putting the last two equations together we see that the left hand side of (3) equals the left hand side of (12). ∎
Proof of Lemma 6.
Note that Lemma 3 shows that the privacy loss random variables and both follow the same distribution with . This allows us to write the left hand side of (4) in terms of the Gaussian CDF as follows:
where we used and the symmetry of the Gaussian distribution around its mean. A similar argument applied to the left hand side of (5) yields:
∎
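The CDF expression in Lemma 6 can also be checked by simulation: by Lemma 3 the privacy loss of the Gaussian mechanism is distributed as N(η, 2η) with η = Δ²/(2σ²), so its tail satisfies P[ℓ ≥ ε] = Φ(Δ/(2σ) − εσ/Δ). A Monte Carlo sketch with the illustrative values Δ = σ = 1:

```python
import numpy as np
from scipy.stats import norm

sensitivity, sigma, eps = 1.0, 1.0, 0.5
eta = sensitivity**2 / (2 * sigma**2)      # mean of the privacy loss

rng = np.random.default_rng(0)
# Privacy loss of the Gaussian mechanism: N(eta, 2*eta).
loss = rng.normal(eta, np.sqrt(2 * eta), size=2_000_000)

empirical = np.mean(loss >= eps)
analytic = norm.cdf(sensitivity / (2 * sigma) - eps * sigma / sensitivity)
print(abs(empirical - analytic) < 2e-3)  # the two tail estimates agree
```

With these values η = ε = 0.5, so both sides equal Φ(0) = 1/2, making the agreement easy to verify by hand as well.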
Proof of Lemma 7.
We prove the result by using Leibniz’s rule for differentiation under the integral sign to show that the function of interest has non-negative derivatives. First note that from the derivation of (4) we have
where . Now we can use Leibniz’s rule to write
Similarly, for the second term in the function we have where . Using Leibniz’s rule again we get
Therefore, we see that the derivative of satisfies:
where we used that . ∎
Proof of Theorem 9.
Recall that the derivations in Section 3 establish that in order to calibrate a Gaussian perturbation to achieve (ε, δ)-DP all that is required is to find the smallest σ such that
(13) |
To establish the correctness of the analytic Gaussian mechanism we begin by observing that the argument in the first term of (13) changes sign at , while the argument of the second term is always negative. Thus, we substitute in the expression above and obtain:
To solve the optimization using numerical evaluations of it is convenient to consider the cases and separately. In the case we define and substitute the corresponding in to obtain
Similarly, by taking in the case we obtain
Note that, as expected, these definitions satisfy and , since the limits correspond to and , respectively. Furthermore, we have
which corresponds to the privacy guarantee -DP obtained by taking ; i.e. .
These observations motivate the mechanism described in Algorithm 1. In particular, for we can achieve -DP with , and the smallest such that corresponds to the largest such that . Similarly, for we require , and the smallest such that corresponds to the smallest such that . ∎
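Condition (13) also lends itself to direct numerical calibration. The sketch below is a simplified alternative to the case analysis of Algorithm 1: it simply bisects on σ against the exact CDF condition Φ(Δ/(2σ) − εσ/Δ) − e^ε Φ(−Δ/(2σ) − εσ/Δ) ≤ δ. The bracketing interval is a crude placeholder rather than anything from the paper, and the bisection is valid assuming the achieved δ is monotone in σ, which is what the monotonicity analysis establishes after a change of variables:

```python
from math import exp, log, sqrt
from scipy.stats import norm

def analytic_gaussian_sigma(eps, delta, sensitivity=1.0, tol=1e-12):
    """Smallest sigma making Gaussian output perturbation (eps, delta)-DP."""
    def dp_delta(sigma):
        a = sensitivity / (2 * sigma)
        b = eps * sigma / sensitivity
        # Exact delta achieved by N(0, sigma^2 I) noise at privacy level eps.
        return norm.cdf(a - b) - exp(eps) * norm.cdf(-a - b)

    lo, hi = 1e-6, 1e6   # crude bracket: dp_delta falls from ~1 to ~0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if dp_delta(mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

sigma = analytic_gaussian_sigma(eps=1.0, delta=1e-5)
classical = sqrt(2 * log(1.25 / 1e-5))   # classical calibration, Delta = eps = 1
print(sigma < classical)  # True: the analytic calibration needs less noise
```

At ε = 1 and δ = 10⁻⁵ the bisection returns σ ≈ 3.7 versus roughly 4.8 for the classical tail-bound calibration, illustrating the variance savings reported in the experiments.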
A.3 Proofs from Section 4
The proofs in this section are well known and not part of the contribution of the current paper. We include them because they are short and revealing, and because we hope to keep the paper as self-contained as possible.
Proof of Theorem 10.
Let be the distribution of induced by . Let , define its posterior error
Take the gradient with respect to on both sides and apply Fubini’s theorem
Setting the gradient to zero, we get that the minimizer is .
Now, assume is suboptimal; then there exists such that