MIT removes Tiny Images collection due to racist and misogynistic terms

The Massachusetts Institute of Technology has removed the Tiny Images data set, an annotated collection of 80 million small 32×32-pixel images. The set was maintained by MIT's computer vision group and had been used since 2008 by researchers to train and test object recognition in machine learning systems.

The removal followed the discovery of racist and misogynistic terms among the labels describing the objects in the images, as well as images that were perceived as offensive. For example, pictures of genitals were labeled with slang terms, some images of women were labeled as “whores”, and slurs unacceptable in modern society were applied to Black and Asian people.

The paper cited by MIT also names more serious problems with such collections: computer vision technology can be used to build facial recognition systems and to search for members of population groups targeted for some reason, and an image-generating neural network can reconstruct the original image from anonymized data.

The offensive words entered the labels through an automated classification process that drew semantic relationships from WordNet, a lexical database of the English language created in the 1980s at Princeton University. Since it is not feasible to manually check 80 million small images for offensive content, it was decided to block access to the database entirely. MIT also urged other researchers to stop using the collection and to delete any copies of it. Similar problems are seen in ImageNet, the largest annotated image database, which also uses labels drawn from WordNet.
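For illustration, the kind of pipeline that produces such a vocabulary can be sketched with NLTK's WordNet interface: every noun lemma in the database becomes a candidate image label, with no human review of individual terms. This is a minimal sketch under that assumption, not MIT's actual code; the helper name collect_label_vocabulary is hypothetical, and it assumes NLTK with its wordnet corpus is installed (nltk.download('wordnet')).

```python
from nltk.corpus import wordnet as wn

def collect_label_vocabulary():
    """Harvest every noun lemma in WordNet as a candidate image label.

    A pipeline built this way inherits every term in the database,
    including slurs, because no individual lemma is reviewed.
    """
    labels = set()
    for synset in wn.all_synsets(pos=wn.NOUN):   # iterate all noun concepts
        for lemma in synset.lemma_names():       # every surface form of the concept
            labels.add(lemma.replace("_", " "))  # WordNet joins multiword terms with "_"
    return labels

if __name__ == "__main__":
    vocab = collect_label_vocabulary()
    print(f"{len(vocab)} candidate labels harvested from WordNet")
```

A collection assembled by querying image search engines for each such label, as Tiny Images was, therefore picks up offensive categories automatically, which is why manual cleanup at the scale of 80 million images was ruled out.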


Source: opennet.ru
