
RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Responses)

RAPPOR reports consist of randomly generated data that is biased based on data collected from the user. Data from many users can be aggregated to learn information about the population, but little or nothing can be concluded about individual users from their reports.

Descriptions of individual metrics can be found in tools/metrics/rappor/rappor.xml.

Algorithm

The first time the RapporService is started, a client will generate and save a random 128-byte secret key, which won't change and is never transmitted to the server.  It will also assign itself to a random cohort.
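
As a rough illustration of this first-run step, the sketch below generates a 128-byte secret and picks a cohort. The names ClientState and CreateClientState and the cohort_count parameter are hypothetical, and std::random_device is only a stand-in: it is not guaranteed to be cryptographically secure, so a real client would use a platform CSPRNG.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Size of the secret key, taken from the description above.
constexpr std::size_t kSecretBytes = 128;

// Hypothetical container for the values a client saves on first run.
struct ClientState {
  std::array<std::uint8_t, kSecretBytes> secret;
  std::uint32_t cohort;
};

// Generates a random secret and a uniformly random cohort in
// [0, cohort_count). cohort_count is an assumed parameter; the number of
// cohorts is not stated in this document.
ClientState CreateClientState(std::uint32_t cohort_count) {
  assert(cohort_count > 0);
  std::random_device entropy;  // Stand-in for a cryptographic RNG.
  ClientState state;
  std::uniform_int_distribution<unsigned int> random_byte(0, 255);
  for (auto& byte : state.secret) {
    byte = static_cast<std::uint8_t>(random_byte(entropy));
  }
  std::uniform_int_distribution<std::uint32_t> random_cohort(
      0, cohort_count - 1);
  state.cohort = random_cohort(entropy);
  return state;
}
```

Both values would then be persisted locally; only the cohort is ever included in a report.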

For each metric we collect, we store a Bloom filter, represented as an array of m bits.  Each cohort uses a different set of hash functions for the Bloom filter. When the RapporService is passed a sample for recording, it sets bits in the Bloom filter for that metric.  For example, with the "Settings.HomePage2" metric, which is collected only for users who opt in to UMA, our Bloom filter will be an array of 128 bits, and one or two of those bits will be set based on the eTLD+1 of the user's homepage.
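
The following is a minimal sketch of how a sample might be mapped to bits in a cohort-specific Bloom filter. It assumes two hash functions per cohort, to match the "one or two bits" in the example above. BloomIndex, MakeBloomFilter, and the constants are hypothetical names, and std::hash is used only to keep the sketch self-contained; it is not the hash that Chrome uses.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Assumed parameters: a 128-bit filter with two hash functions per cohort.
constexpr std::size_t kBloomBits = 128;
constexpr std::uint32_t kHashFunctionCount = 2;

// Stand-in for the cohort-specific hash functions. Salting the input with
// the cohort and the hash index gives each cohort its own set of functions.
std::size_t BloomIndex(std::uint32_t cohort, std::uint32_t hash_index,
                       const std::string& sample) {
  assert(hash_index < kHashFunctionCount);
  const std::string salted = std::to_string(cohort) + ":" +
                             std::to_string(hash_index) + ":" + sample;
  return std::hash<std::string>{}(salted) % kBloomBits;
}

// Sets one bit per hash function. Two hash functions can land on the same
// index, in which case only one bit ends up set.
std::bitset<kBloomBits> MakeBloomFilter(std::uint32_t cohort,
                                        const std::string& sample) {
  std::bitset<kBloomBits> bits;
  for (std::uint32_t i = 0; i < kHashFunctionCount; ++i) {
    bits.set(BloomIndex(cohort, i, sample));
  }
  return bits;
}
```

Because the cohort is mixed into every hash, two clients in different cohorts will usually set different bits for the same homepage.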

Once we have collected samples, and are ready to generate a report, we take the array of bits we've gathered for the metric and introduce two levels of noise by taking the following steps.
  1. Add deterministic noise.
    1. Create a deterministic pseudo-random function by passing the client's secret key, the metric name, and the Bloom filter value into an HMAC_DRBG function.
    2. For each bit use this deterministic function to flip two weighted coins.
      1. If the first is heads, replace the bit with the result of the second coin.
      2. Otherwise, leave the bit as is.
  2. Add fresh random noise.
    1. Using fresh randomness, flip two more coins, with different weights.
      1. If the bit is true, report the result of the first coin flip.
      2. Otherwise, report the result of the second.
The cohort that the client belongs to and the results of the above process are sent to the server.
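
The two noise steps above can be sketched as follows. This is an illustration under stated assumptions, not the Chrome implementation: a seeded std::mt19937 stands in for HMAC_DRBG (it is not cryptographically secure), and NoiseParameters, AddDeterministicNoise, and AddFreshNoise are hypothetical names. The coin weights are left as parameters because this document does not specify them.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

constexpr std::size_t kBloomBits = 128;
using Bits = std::bitset<kBloomBits>;

// Hypothetical coin weights; each value is the probability of heads (or 1).
struct NoiseParameters {
  double replace_probability;      // Step 1, first coin: replace the bit?
  double replacement_probability;  // Step 1, second coin: replacement is 1?
  double report_one_if_set;        // Step 2, first coin.
  double report_one_if_clear;      // Step 2, second coin.
};

// Step 1: noise that is a fixed function of (secret, metric name, filter),
// so the same inputs always produce the same noisy bits.
Bits AddDeterministicNoise(const std::string& secret,
                           const std::string& metric_name, const Bits& bloom,
                           const NoiseParameters& params) {
  assert(!secret.empty());
  // Stand-in for HMAC_DRBG: seed a PRNG from all three inputs. The value
  // 0x100 cannot be a byte, so it acts as a field separator.
  std::vector<std::uint32_t> material;
  for (unsigned char c : secret) material.push_back(c);
  material.push_back(0x100);
  for (unsigned char c : metric_name) material.push_back(c);
  material.push_back(0x100);
  for (std::size_t i = 0; i < kBloomBits; ++i) material.push_back(bloom[i]);
  std::seed_seq seed(material.begin(), material.end());
  std::mt19937 prf(seed);

  std::bernoulli_distribution replace(params.replace_probability);
  std::bernoulli_distribution replacement(params.replacement_probability);
  Bits noisy = bloom;
  for (std::size_t i = 0; i < kBloomBits; ++i) {
    const bool first = replace(prf);
    const bool second = replacement(prf);
    if (first) noisy[i] = second;  // Otherwise the bit is left as is.
  }
  return noisy;
}

// Step 2: fresh noise, drawn from a caller-supplied random source each time
// a report is generated.
template <typename Rng>
Bits AddFreshNoise(const Bits& noisy, const NoiseParameters& params,
                   Rng& rng) {
  std::bernoulli_distribution if_set(params.report_one_if_set);
  std::bernoulli_distribution if_clear(params.report_one_if_clear);
  Bits report;
  for (std::size_t i = 0; i < kBloomBits; ++i) {
    const bool first = if_set(rng);
    const bool second = if_clear(rng);
    report[i] = noisy[i] ? first : second;
  }
  return report;
}
```

Keeping the first step deterministic is what makes repeated reports of the same value share their noise, while the second step ensures that no two reports are bit-for-bit identical in general.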

The large amount of randomness means that we can't draw meaningful conclusions from a small number of reports.  Aggregating many reports from the same user does not help much either: every report of the same value includes the same pseudo-random noise, so we are effectively limited to one report for each distinct value.

Indeed, even with infinite amounts of data on a RAPPOR statistic, there are strict bounds on how much information can be learned, as outlined in more detail at http://arxiv.org/abs/1407.6981.  In particular, the data collected from any given user or client contains such significant uncertainty, and guarantees such strong deniability, as to prevent observers from drawing conclusions with any certainty.
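
To make the deniability concrete, here is a short worked calculation for a single bit. The symbols are introduced only for illustration and are not defined elsewhere in this document: f and r are the probabilities of heads for the two deterministic coins (replace the bit, replacement equals 1), and q and p are the probabilities of reporting 1 when the noisy bit is 1 or 0 respectively. B is the true Bloom filter bit, B' the bit after deterministic noise, and S the reported bit.

```latex
\begin{align*}
P(B' = 1 \mid B = 1) &= (1 - f) + f r \\
P(B' = 1 \mid B = 0) &= f r \\
P(S = 1 \mid B = 1) &= q\,(1 - f + f r) + p\,f\,(1 - r) \\
P(S = 1 \mid B = 0) &= q\,f r + p\,(1 - f r) \\
P(S = 1 \mid B = 1) - P(S = 1 \mid B = 0) &= (q - p)(1 - f)
\end{align*}
```

Assuming all four weights lie strictly between 0 and 1, both conditional probabilities also lie strictly between 0 and 1, so a reported 1 or 0 is always consistent with either value of the true bit. The gap (q - p)(1 - f) is the signal that aggregation over many users recovers.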

Code overview


CL/49753002 introduces a RapporService object which is instantiated by the browser process object.  The service allows collection of RAPPOR metrics and periodically uploads reports to our servers.

Sample Collection


To collect samples for a metric, we call RapporService::RecordSample. RecordSample requires the name of the metric, a RapporType value which determines the parameters used to record and report the metric, and the sample itself.


#include "components/rappor/rappor_service.h"


g_browser_process->rappor_service()->RecordSample(
    "Settings.HomePage2",
    rappor::ETLD_PLUS_ONE_RAPPOR_TYPE,
    net::registry_controlled_domains::GetDomainAndRegistry(
        homepage_url,
        net::registry_controlled_domains::INCLUDE_PRIVATE_REGISTRIES));

The first call to RecordSample after the browser starts or a report is generated will instantiate a new RapporMetric object to hold the data for that metric. If subsequent calls to RecordSample are made for the same metric during the same reporting interval, one sample is randomly selected for generating the report. Calls to RecordSample should take place on the browser UI thread.
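
The text above says only that one sample is randomly selected, not how. One standard way to pick uniformly without storing every sample is reservoir sampling with a reservoir of size one, sketched below. SampleReservoir is a hypothetical class and this is an assumption about a possible approach, not a description of what RapporMetric does.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical holder for the single sample kept during one reporting
// interval.
class SampleReservoir {
 public:
  // Records a sample. The k-th sample replaces the retained one with
  // probability 1/k, which leaves every sample seen so far equally likely
  // to be the one kept.
  template <typename Rng>
  void Record(const std::string& sample, Rng& rng) {
    ++count_;
    std::uniform_int_distribution<std::uint64_t> pick(1, count_);
    if (pick(rng) == 1) selected_ = sample;
  }

  bool empty() const { return count_ == 0; }

  // Returns the retained sample; at least one sample must have been
  // recorded.
  const std::string& selected() const {
    assert(!empty());
    return selected_;
  }

 private:
  std::uint64_t count_ = 0;
  std::string selected_;
};
```

The first sample is always kept, since the distribution over [1, 1] can only return 1; later samples displace it with decreasing probability.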


Currently, samples may only be collected in the browser process, but RapporMetrics could be serialized to IPC calls to enable collection from other processes.

Report generation and uploading


The other function of RapporService is generating and uploading reports of the collected data. First RapporService::RegisterPrefs() should be called to register prefs::kRapporSecret and prefs::kRapporCohort, and then RapporService::Start() may be called to begin generating reports.


The RapporService generates a report shortly after it is started and at fixed time intervals afterwards.  If any new RapporMetric objects have been created, randomized data is generated for each of them by calling RapporMetric::GetReport() and recorded into a RapporReports proto, and the RapporMetric is then deleted.


message RapporReports {
  // Which cohort these reports belong to. The RAPPOR participants are
  // partitioned into cohorts in different ways, to allow better statistics and
  // increased coverage. In particular, the cohort will serve to choose the
  // hash functions used for Bloom-filter-based reports.
  optional int32 cohort = 2;
  message Report {
    // The name of the metric, hashed.
    optional fixed64 name_hash = 1;
    // The sequence of bits produced by random coin flips in
    // RapporMetric::GetReport(). For a complete description of RAPPOR
    // metrics, refer to the design document at:
    // http://www.chromium.org/developers/design-documents/rappor
    optional bytes bits = 2;
  }
  repeated Report report = 3;
}

The proto is passed to the LogUploader, which stores all of the logs it is passed in a queue and sends them to the server.  When an upload fails, it retries with exponential backoff.  For now, if Chrome exits before the logs are uploaded, they are lost.  In the future, we may cache unsent logs in prefs, as UMA does.
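
The queue-and-retry behavior can be sketched as below. BackoffUploadQueue and its methods are hypothetical and are not the LogUploader interface; the doubling factor and the delay cap are assumptions, since this document says only that the backoff is exponential. The sender callback stands in for the network request.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical in-memory upload queue with exponential backoff.
class BackoffUploadQueue {
 public:
  // The sender returns true if the upload succeeded.
  using Sender = std::function<bool(const std::string&)>;

  BackoffUploadQueue(Sender sender, std::chrono::seconds initial_delay,
                     std::chrono::seconds max_delay)
      : sender_(std::move(sender)),
        initial_delay_(initial_delay),
        max_delay_(max_delay),
        delay_(initial_delay) {
    assert(initial_delay > std::chrono::seconds::zero());
    assert(initial_delay <= max_delay);
  }

  void Queue(std::string log) { logs_.push_back(std::move(log)); }

  // Tries to send the oldest log and returns how long the caller should
  // wait before the next attempt. A failed log stays at the front of the
  // queue and the delay doubles, up to max_delay; a success resets it.
  std::chrono::seconds UploadNext() {
    if (logs_.empty()) return std::chrono::seconds::zero();
    if (sender_(logs_.front())) {
      logs_.pop_front();
      delay_ = initial_delay_;
      return std::chrono::seconds::zero();
    }
    const std::chrono::seconds wait = delay_;
    delay_ = std::min<std::chrono::seconds>(delay_ * 2, max_delay_);
    return wait;
  }

  std::size_t pending() const { return logs_.size(); }

 private:
  Sender sender_;
  std::chrono::seconds initial_delay_;
  std::chrono::seconds max_delay_;
  std::chrono::seconds delay_;
  std::deque<std::string> logs_;
};
```

Because the queue lives only in memory, anything still pending when the process exits is lost, which matches the current limitation described above.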