Google open-sources a library for confidential data processing

Google has published the source code of the "Differential Privacy" library, which implements differential privacy methods. These methods make it possible to perform statistical operations on a data set with sufficiently high accuracy while ruling out the identification of individual records in it. The library is written in C++ and released under the Apache 2.0 license.

Analysis based on differential privacy methods lets organizations produce analytical summaries from statistical databases without making it possible to isolate the data of specific individuals from the aggregate information. For example, to identify differences in patient care, researchers can be given information that allows them to compare the average length of time patients spend in hospitals, while the confidentiality of the patients is preserved and information about individual patients cannot be singled out.
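As a rough illustration of the hospital example, a differentially private average can be sketched in Python. This is not the library's C++ API: the function names, the clamping bounds of 0 to 30 days, and the treatment of the number of records as public knowledge are all assumptions made for the sketch. It also uses ordinary floating-point noise, which is fine for illustration but is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean_stay(days, lower, upper, epsilon, rng):
    """Noisy average length of stay (hypothetical helper, not a library call).

    Assumes the record count is public and that neighboring data sets
    differ by replacing a single patient's record.
    """
    if not days or epsilon <= 0 or upper <= lower:
        raise ValueError("need data, epsilon > 0 and upper > lower")
    # Clamping bounds how much any single patient can influence the result.
    clamped = [min(max(d, lower), upper) for d in days]
    n = len(clamped)
    # Replacing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    noisy = sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)
    # Post-processing: keep the published value inside the plausible range.
    return min(max(noisy, lower), upper)


if __name__ == "__main__":
    stays = [2, 3, 5, 1, 8, 4, 40, 3]  # made-up sample data, in days
    rng = random.SystemRandom()
    print(private_mean_stay(stays, lower=0, upper=30, epsilon=1.0, rng=rng))
```

A smaller epsilon means more noise and stronger privacy; with only a handful of records, as in this toy data, the noise dominates, which is why such methods are normally applied to large data sets.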

The library includes implementations of several algorithms for generating aggregate statistics from numerical data sets that contain confidential information. A stochastic tester is provided to check the correctness of the algorithms. The algorithms support summation, counting, and the calculation of the mean, standard deviation, variance and order statistics, including the minimum, maximum and median. Also included is an implementation of the Laplace mechanism, which can be used for calculations not covered by the predefined algorithms.
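The idea behind the Laplace mechanism can be sketched in a few lines of Python. The sketch is a generic textbook version rather than the library's implementation: the names `laplace_mechanism` and `count_long_stays` are hypothetical, and the custom query (counting stays longer than a threshold) is only an assumed example of a calculation built on top of the mechanism.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value plus Laplace noise with scale sensitivity / epsilon.

    `sensitivity` is the most the query result can change when one
    individual's record is added to or removed from the data set.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.SystemRandom()
    scale = sensitivity / epsilon
    # Difference of two exponential draws with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def count_long_stays(days, threshold, epsilon, rng=None):
    # Hypothetical custom query: each patient changes the count by at most 1,
    # so the sensitivity is 1.
    exact = sum(1 for d in days if d > threshold)
    return laplace_mechanism(exact, sensitivity=1.0, epsilon=epsilon, rng=rng)


if __name__ == "__main__":
    stays = [2, 3, 5, 1, 8, 4, 40, 3]  # made-up sample data, in days
    print(count_long_stays(stays, threshold=4, epsilon=0.5))
```

The only query-specific input is the sensitivity, which is why a single mechanism can serve calculations that have no dedicated algorithm, provided the sensitivity can be bounded.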

The library uses a modular architecture that makes it possible to extend the existing functionality and add further mechanisms, aggregate functions, and privacy controls.

An extension for the PostgreSQL 11 DBMS has been built on top of the library. It provides a set of anonymized aggregate functions that use differential privacy methods: ANON_COUNT, ANON_SUM, ANON_AVG, ANON_VAR, ANON_STDDEV and ANON_NTILE.

Source: opennet.ru
