52 datasets for training projects

  1. Mall Customer Dataset - data on store visitors: id, gender, age, income, spending score. (Use case: Customer Segmentation Project with Machine Learning)
  2. Iris Dataset — a dataset for beginners containing sepal and petal sizes for various flowers.
  3. MNIST Dataset - a dataset of handwritten digits: 60,000 training images and 10,000 test images.
  4. The Boston Housing Dataset - a popular dataset for pattern recognition. Contains information about housing in the Boston area: the average number of rooms, the property tax rate, the crime rate.
  5. Fake News Detection Dataset - contains 7796 news entries, each labeled real or fake. (Use case with Python source: Fake News Detection Python Project)
  6. Wine quality dataset - contains information about wine: 4898 entries with 14 parameters.
  7. SOCR data - Heights and Weights Dataset - a good place to start. Contains 25,000 height and weight records of 18-year-olds.

    This article was translated with the support of EDISON Software, which fulfills orders from South China "with excellence" and develops web applications and websites.

  8. Parkinson Dataset - 195 records of patients with Parkinson's disease, with 25 test parameters. It can be used for a preliminary assessment of the difference between sick people and healthy people. (Use case with Python source: Machine Learning Project on Detecting Parkinson's Disease)
  9. Titanic Dataset - contains information about passengers (age, gender, relatives on board, etc.): 891 records in the training set and 418 in the test set.
  10. Uber Pickup Dataset - Information about 4.5 million Uber trips in 2014 and 14 million in 2015. (Use case with R source code: Uber Data Analysis Project in R)
  11. Chars74k Dataset - contains images of English and Kannada characters in 64 classes (0-9, A-Z, a-z): about 7.7k characters from natural images, 3.4k hand-drawn characters, and 62k characters synthesized from computer fonts.
  12. Credit Card Fraud Detection Dataset - contains information about transactions of compromised credit cards. (Application with source code: Credit Card Fraud Detection Machine Learning Project)
  13. Chatbot Intents Dataset - JSON file that contains various tags: greetings, goodbye, hospital_search, pharmacy_search, etc. Contains a set of question and answer templates. (Use case with Python source: Chatbot Project in Python)
  14. Enron Email Dataset - contains about half a million emails from 150 Enron managers.
  15. The Yelp Dataset - contains 1.2 million tips from 1.6 million users and over 1.2 million business attributes.
  16. Jeopardy Dataset - over 200,000 question-and-answer entries from the popular TV quiz show.
  17. Recommender Systems Dataset - a portal with a collection of datasets from UCSD. Contains reviews from popular sites (Goodreads, Amazon). Great for building recommender systems. (Use case with R source code: Movie Recommendation System Project in R)
  18. UCI Spambase Dataset - a dataset for training spam detectors. Contains 4601 emails described by 57 attributes.
  19. Flickr 30k Dataset - more than 30,000 images with captions. (Flickr 8k Dataset - 8000 images. Use case with Python source: Image Caption Generator Python Project)
  20. IMDB reviews - 25,000 movie reviews in the training set and 25,000 in the test set. (Use case with R source code: Sentiment Analysis Data Science Project)
  21. MS COCO dataset - 1.5 million labeled object instances.
  22. CIFAR-10 and CIFAR-100 dataset - CIFAR-10 contains 60,000 small 32x32-pixel images in 10 classes (labeled 0-9). CIFAR-100 has, correspondingly, 100 classes (labeled 0-99).
  23. GTSRB (German Traffic Sign Recognition Benchmark) Dataset - more than 50,000 images of road signs in 43 classes. (Use case with Python source: Traffic Signs Recognition Python Project)
  24. ImageNet dataset - contains over 100,000 phrases and, on average, about 1000 images per phrase.
  25. Breast Histopathology Images Dataset - contains images of breast cancer samples. (Use case with Python source: Breast Cancer Classification Python Project)
  26. Cityscapes Dataset - contains high-quality annotations of video sequences of streets of different cities.
  27. Kinetics Dataset - contains URL links to about 650,000 high-quality videos.
  28. MPII human pose dataset - contains about 25,000 images of human poses annotated by joints.
  29. 20BN-something-something dataset v2 - a set of high-quality videos that show how a person performs some action.
  30. Object 365 dataset — dataset of high-quality images with bounding boxes of objects.
  31. Photo sketching dataset - contains more than 1000 images with their contour drawings.
  32. CQ500 Dataset - contains 491 head CT scans with 193,317 slices.
  33. IMDB-Wiki dataset - a dataset with more than 500,000 images of faces labeled by gender and age. (Use case with Python source: Gender & Age Detection Python Project)
  34. YouTube-8M Dataset - a labeled video dataset that contains 6.1 million YouTube video IDs.
  35. Urban Sound 8K dataset - a set of urban sound data (contains 8732 urban sounds from 10 classes).
  36. LSUN Dataset - a data set of millions of color images of scenes and objects (about 59 million images, 10 different categories of scenes and 20 different categories of objects).
  37. RAVDESS Dataset — audiovisual database of emotional speech. (Use case with source on Speech Emotion Recognition Python Project)
  38. Librispeech Dataset — the dataset contains 1000 hours of English speech with different accents.
  39. Baidu Apolloscape Dataset — dataset for the development of self-driving technologies.
  40. Quandl Data Portal - a repository of economic and financial data (there is free and paid content).
  41. The World Bank Open Data Portal — information on loans issued by the World Bank to developing countries.
  42. IMF Data Portal - the portal of the International Monetary Fund, which publishes data on international finance, debt rates, investments, foreign exchange reserves and commodities.
  43. American Economic Association (AEA) Data Portal is a resource for searching US macroeconomic data.
  44. Google Trends Data Portal - Google Trends data can be used for visual exploration and data analysis.
  45. Financial Times Market Data Portal is a resource for obtaining up-to-date information on financial markets from around the world.
  46. Data.gov Portal - US government open data portal (agriculture, health, climate, education, energy, finance, science and research, etc.).
  47. Data Portal: Open Government Data (India) - India's open government data platform.
  48. Food environment Atlas Data Portal - contains data from research on nutrition in the United States.
  49. Health Data Portal is the US Department of Health and Human Services portal.
  50. Centers for Disease Control and Prevention Data Portal - contains a wide range of health-related data.
  51. London Datastore Portal - data about the life of people in London.
  52. Canada Government Open Data Portal - open data portal about Canadians (agriculture, art, music, education, government, healthcare, etc.)
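
With most of the tabular datasets in the list above (Iris, Mall Customer, Wine Quality), a reasonable first step is to load the file and look at per-class statistics. Below is a minimal sketch in Python using only the standard library. The column names, the helper functions load_rows and mean_by_class, and the inline sample are illustrative assumptions, not part of any dataset's official tooling; the six sample rows follow the layout of the UCI Iris file (four measurements in cm, then the class label, no header row).

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# A few rows in the layout of the UCI Iris file (no header row):
# sepal length, sepal width, petal length, petal width, class.
SAMPLE_CSV = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

# Column names are our own choice; the raw file has no header.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_rows(text_stream):
    """Parse headerless CSV rows into dicts, converting measurements to float."""
    rows = []
    for record in csv.reader(text_stream):
        if not record:  # skip blank lines
            continue
        row = dict(zip(COLUMNS, record))
        for name in COLUMNS[:-1]:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def mean_by_class(rows, feature):
    """Return the mean of one numeric feature for each class label."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["species"]].append(row[feature])
    return {label: mean(values) for label, values in grouped.items()}


rows = load_rows(io.StringIO(SAMPLE_CSV))
petal_means = mean_by_class(rows, "petal_length")
for label, value in sorted(petal_means.items()):
    print(f"{label}: mean petal length {value:.2f} cm")
```

To run the same summary on a downloaded file, pass an opened file object (for example, open(path, newline="")) to load_rows instead of the in-memory sample.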

Source: habr.com