RAPPOR reports consist of randomly generated data that is biased based on data collected from the user. Data from many users can be aggregated to learn information about the population, but little or nothing can be concluded about individual users from their reports.
Descriptions of individual metrics can be found in tools/metrics/rappor/rappor.xml.
For full technical details of the algorithm, see the RAPPOR paper.
The first time the RapporService is started, a client will generate and save a random 128-byte secret key, which won't change and is never transmitted to the server. It will also assign itself to a random cohort.
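As a rough illustration of this one-time setup, the sketch below generates the per-client state in Python. The cohort count (`NUM_COHORTS`) and the function name are assumptions made for the example; the text above only fixes the key size.

```python
import secrets

SECRET_KEY_BYTES = 128  # key size stated in the text
NUM_COHORTS = 128       # assumption: the text does not give the number of cohorts


def initialize_client_state():
    """Generate the per-client state once, the first time the service starts."""
    return {
        # Stays on the client; never sent to the server.
        "secret": secrets.token_bytes(SECRET_KEY_BYTES),
        # Uniformly random cohort assignment.
        "cohort": secrets.randbelow(NUM_COHORTS),
    }


state = initialize_client_state()
```

In a real client this state would be persisted so that later runs reuse the same secret and cohort.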
For each metric we collect, we store a Bloom filter, represented as an array of m bits. Each cohort uses a different set of hash functions for the Bloom filter. When the RapporService is passed a sample for recording, it sets bits in the Bloom filter for that metric. For example, with the "Settings.HomePage2" metric, which is collected only for users who opt in to UMA, our Bloom filter will be an array of 128 bits, and one or two of those bits will be set based on the eTLD+1 of the user's homepage.
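A minimal sketch of such a cohort-dependent Bloom filter follows. It assumes two hash functions (which is consistent with "one or two bits" being set, since the two can collide) and derives them from SHA-256 by mixing in the cohort number; the actual hash construction is not specified here, so treat both choices as illustrative.

```python
import hashlib

FILTER_BITS = 128  # filter size given in the text for Settings.HomePage2
NUM_HASHES = 2     # assumption: two hash functions per filter


def bloom_bit_indexes(sample, cohort):
    """Return the set of bit positions that a sample sets for a given cohort."""
    indexes = set()
    for i in range(NUM_HASHES):
        # Mixing the cohort and the hash index into the input gives each
        # cohort its own family of hash functions (hypothetical construction).
        digest = hashlib.sha256(f"{cohort}:{i}:{sample}".encode()).digest()
        indexes.add(int.from_bytes(digest[:4], "big") % FILTER_BITS)
    return indexes


def bloom_filter(sample, cohort):
    """Build the m-bit array for a single sample."""
    bits = [0] * FILTER_BITS
    for index in bloom_bit_indexes(sample, cohort):
        bits[index] = 1
    return bits


bits = bloom_filter("example.com", cohort=3)
```

Because the hash input includes the cohort, the same homepage generally maps to different bits in different cohorts, which helps the server tell apart values that happen to collide within one cohort.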
Once we have collected samples, and are ready to generate a report, we take the array of bits we've gathered for the metric and introduce two levels of noise by taking the following steps.
The cohort that the client belongs to and the results from the above process are sent to the server.
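The two levels of noise can be sketched as follows, following the general shape described in the RAPPOR paper: a first level that is pseudo-random but stable for a given client, metric, and value, and a second level that is freshly random for every report. The probabilities `F`, `P`, and `Q`, the HMAC-based seeding, and the function names are assumptions for illustration, not the parameters or generator Chrome uses.

```python
import hashlib
import hmac
import random

F = 0.5   # assumption: probability that a bit is replaced in the stable level
P = 0.5   # assumption: probability of reporting 1 when the noised bit is 0
Q = 0.75  # assumption: probability of reporting 1 when the noised bit is 1


def stable_noise(bits, secret, metric):
    """First level: noise that is identical for the same secret, metric, and value."""
    seed = hmac.new(secret, metric.encode() + b"\0" + bytes(bits), hashlib.sha256).digest()
    # random.Random is a stand-in for a proper deterministic generator; it is
    # used here only because the same seed always yields the same sequence.
    rng = random.Random(seed)
    noised = []
    for bit in bits:
        r = rng.random()
        if r < F / 2:
            noised.append(1)    # forced to 1
        elif r < F:
            noised.append(0)    # forced to 0
        else:
            noised.append(bit)  # true bit kept
    return noised


def fresh_noise(noised_bits, rng=None):
    """Second level: new randomness drawn for every report."""
    rng = rng or random.SystemRandom()
    return [1 if rng.random() < (Q if bit else P) else 0 for bit in noised_bits]


def make_report(bits, secret, metric):
    """Apply both levels of noise to a metric's Bloom filter bits."""
    return fresh_noise(stable_noise(bits, secret, metric))
```

The payload sent to the server would then be the client's cohort together with the output of `make_report`.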
The large amount of randomness means that we can't draw meaningful conclusions from a small number of reports. Aggregating many reports from the same user does not help either: all of a user's reports of the same value include the same pseudo-random noise, so we are effectively limited to one report for each distinct value.
Indeed, even with infinite amounts of data on a RAPPOR statistic, there are strict bounds on how much information can be learned, as outlined in more detail at http://arxiv.org/abs/1407.6981. In particular, the data collected from any given user or client contains such significant uncertainty, and guarantees such strong deniability, as to prevent observers from drawing conclusions with any certainty.
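For reference, the RAPPOR paper expresses the bound for the stable level of noise as a differential privacy guarantee. With h hash functions per Bloom filter and f the probability that a bit is replaced by noise in that level, it states the guarantee as:

```latex
\epsilon_\infty = 2h \ln\!\left(\frac{1 - \tfrac{1}{2}f}{\tfrac{1}{2}f}\right)
```

For example, h = 2 and f = 0.5 (illustrative values, not necessarily those used by Chrome) give a bound of 4 ln 3, or about 4.39.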
To add a new RAPPOR metric, you need to add a bit of code to collect your sample, and add your metric to tools/metrics/rappor/rappor.xml. Samples should be recorded on the UI thread of the browser process. For most use cases, you will want to use one of the helper methods in chrome/browser/metrics/rappor/sampling.h, e.g.
If you need to do something more specific, you may need to call RapporService::RecordSample directly, e.g.
If you collect multiple samples for the same metric in one reporting interval (currently 30 minutes), a single sample will be randomly selected for generating the randomized report.
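One way to pick a single sample uniformly at random without storing every sample is reservoir sampling with a reservoir of size one. The sketch below uses that technique as an assumption; the text does not say how the selection is actually implemented, and the class and method names are hypothetical.

```python
import random


class IntervalSampler:
    """Keeps one uniformly chosen sample per metric for the current interval."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._chosen = {}  # metric name -> currently selected sample
        self._seen = {}    # metric name -> number of samples seen so far

    def record(self, metric, sample):
        n = self._seen.get(metric, 0) + 1
        self._seen[metric] = n
        # Replace the kept sample with probability 1/n, so each of the n
        # samples seen so far is equally likely to be the one kept.
        if self._rng.randrange(n) == 0:
            self._chosen[metric] = sample

    def flush(self):
        """Return the selected samples and reset for the next interval."""
        chosen, self._chosen, self._seen = self._chosen, {}, {}
        return chosen
```

At the end of each reporting interval, `flush()` would hand one sample per metric to the report generation step.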
Remember to add documentation for your metric to tools/metrics/rappor/rappor.xml.
CL/49753002 introduces a
In order to collect samples, we call
Currently, samples may only be collected in the browser process, but
The other function of
The proto is passed to the