Google intends to add telemetry to the Go toolchain

Google plans to add telemetry collection to the Go toolchain and to enable sending the collected data by default. The telemetry will cover command-line utilities developed by the Go team, such as the "go" command, the compiler, gopls, and govulncheck. Collection will be limited to information about how these utilities themselves are used, i.e. telemetry will not be added to user applications built with the toolchain.

The motivation for collecting telemetry is to obtain information about developers' needs and working habits that cannot be captured through bug reports and surveys. Telemetry will help identify anomalies and abnormal behavior, evaluate how developers interact with the toolchain, and show which options are most in demand and which are almost never used. The accumulated statistics are expected to make it possible to modernize the toolchain, make it more efficient and convenient to work with, and focus attention on the capabilities developers actually need.

For data collection, a new "transparent telemetry" architecture is proposed. It is designed to allow an independent public audit of the data received and to collect only the minimum necessary aggregate information, so that detailed traces of user activity cannot leak. For example, when assessing the traffic consumed by the toolchain, it is planned to use metrics such as a counter of data in kilobytes for the entire year. All collected data will be published for public inspection and analysis. To disable sending telemetry, set the environment variable "GOTELEMETRY=off".
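As an illustration of the opt-out described above, here is a minimal sketch of how a tool could honor the variable. The function name telemetryEnabled is hypothetical and is not part of the Go toolchain; only the value "off" comes from the article, and treating any other value (including an unset variable) as enabled is an assumption that follows from the enabled-by-default plan.

```go
package main

import (
	"fmt"
	"os"
)

// telemetryEnabled reports whether a hypothetical tool should send telemetry
// for the given value of the GOTELEMETRY environment variable.
// Only "off" is documented in the article; every other value, including an
// empty (unset) one, is assumed here to leave telemetry enabled.
func telemetryEnabled(value string) bool {
	return value != "off"
}

func main() {
	if telemetryEnabled(os.Getenv("GOTELEMETRY")) {
		fmt.Println("telemetry: enabled (default)")
	} else {
		fmt.Println("telemetry: disabled via GOTELEMETRY=off")
	}
}
```

Running the sketch with GOTELEMETRY=off set in the environment takes the second branch; with the variable unset it takes the first.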

Key principles for building transparent telemetry:

  • Decisions about the metrics collected will be made through an open, public process.
  • The telemetry collection configuration will be generated automatically from the list of actively monitored metrics, so that no data unrelated to those metrics is collected.
  • The telemetry collection configuration will be maintained in a transparent audit log with verifiable records, making it more difficult to selectively apply different collection settings for different systems.
  • The telemetry collection configuration will be distributed as a Go module that can be cached and proxied, so systems that already use a local Go proxy will pick it up automatically. The configuration download will be attempted no more than once a week, with a probability of 10% (i.e. each system will download the configuration about 5 times a year).
  • Information transmitted to external servers will include only aggregate counters that cover a full week and are not tied to a specific time.
  • Submitted reports will not include system or user identifiers of any kind.
  • Submitted reports will contain only strings that are already known to the server, i.e. names of counters, names of standard programs, known version numbers, and names of functions in the standard toolchain utilities (when sending stack traces). Non-string data will be limited to counters, dates, and line counts.
  • IP addresses from which telemetry servers are accessed will not be stored in the logs.
  • To obtain the required sample, it is planned to collect 16,000 reports per week, which, given two million installations of the toolchain, will require reports each week from only 2% of the systems.
  • The collected metrics will be published in aggregated form as graphs and tables. The full raw data accumulated during telemetry collection will also be published.
  • Telemetry collection will be enabled by default, but an easy way to disable it will be provided.
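
The "about 5 times a year" figure for configuration downloads follows from simple arithmetic: 52 weekly opportunities, each taken with a probability of 10%, give an expected 5.2 downloads. The sketch below checks this both directly and by simulation. The function names, the 52-week year, the seed, and the number of simulated systems are assumptions made for illustration and are not taken from the proposal.

```go
package main

import (
	"fmt"
	"math/rand"
)

// weeksPerYear is the number of weekly download opportunities assumed here.
const weeksPerYear = 52

// expectedDownloads returns the mean number of configuration downloads when
// each of the weekly opportunities is taken with probability p.
func expectedDownloads(weeks int, p float64) float64 {
	return float64(weeks) * p
}

// simulateYear counts how many weekly opportunities result in a download for
// one hypothetical system. draw must return values in [0, 1).
func simulateYear(draw func() float64, weeks int, p float64) int {
	downloads := 0
	for i := 0; i < weeks; i++ {
		if draw() < p {
			downloads++
		}
	}
	return downloads
}

func main() {
	const p = 0.10 // weekly download probability quoted in the list above

	fmt.Printf("expected downloads per year: %.1f\n", expectedDownloads(weeksPerYear, p))

	// Average over many simulated systems; the seed is arbitrary.
	rng := rand.New(rand.NewSource(1))
	const systems = 100000
	total := 0
	for i := 0; i < systems; i++ {
		total += simulateYear(rng.Float64, weeksPerYear, p)
	}
	fmt.Printf("simulated average: %.2f\n", float64(total)/systems)
}
```

The first line printed is the expected value of 5.2; the simulated average should land close to it, with the exact digits depending on the random source.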

Source: opennet.ru
