14 open-source projects to improve Data Science skills (easy, normal, hard)

Data Science for Beginners

1. Sentiment Analysis (Mood analysis through text)

View the complete implementation of the Data Science project using source code - Sentiment Analysis Project in R.

Sentiment Analysis is the analysis of words to identify sentiments and opinions, which can be positive or negative. This is a type of classification where the classes can be binary (positive and negative) or multiple (happy, angry, sad, nasty...). We will implement this Data Science project in R and will use the dataset in the "janeaustenR" package. We will use general-purpose dictionaries like AFINN, bing and loughran, do an inner join between the dictionary and the words of the text, and at the end create a word cloud to display the result.

Language: R
Dataset/Package: janeaustenR
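
The dictionary join described above can be sketched in a few lines. The project itself is written in R; the Python below is only an illustration, and the tiny TOY_LEXICON is a hypothetical stand-in for a real dictionary such as bing, not its actual contents.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> sentiment label); a real lexicon is far larger.
TOY_LEXICON = {
    "love": "positive",
    "good": "positive",
    "happy": "positive",
    "sad": "negative",
    "angry": "negative",
    "bad": "negative",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_counts(text, lexicon):
    """Inner join: keep only tokens found in the lexicon, then count labels."""
    matched = [lexicon[word] for word in tokenize(text) if word in lexicon]
    return Counter(matched)


def net_sentiment(text, lexicon):
    """Positive matches minus negative matches."""
    counts = sentiment_counts(text, lexicon)
    return counts["positive"] - counts["negative"]


SAMPLE = "I love this good book, but the ending was sad."
print(dict(sentiment_counts(SAMPLE, TOY_LEXICON)))
print(net_sentiment(SAMPLE, TOY_LEXICON))
```

Words that are missing from the dictionary simply drop out of the join, which is why the choice of dictionary matters so much for the final score.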

The article was translated with the support of EDISON Software, which makes virtual fitting rooms for multi-brand stores and tests software.

2. Fake News Detection

Take your skills to the next level by working on the Data Science Project for Beginners - fake news detection with Python.

Fake news is false information spread through social media and other online media in order to achieve political goals. In this Data Science project idea, we will use Python to build a model that can accurately determine whether news is real or fake. We'll create a TfidfVectorizer and use the PassiveAggressiveClassifier to classify news into "real" and "fake". We will use a 7796×4 shape dataset and do everything in Jupyter Lab.

Language: Python

Dataset/Package: news.csv
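
To show what the two steps above do, here is a from-scratch sketch of TF-IDF weighting followed by a passive-aggressive update rule. It is not scikit-learn's TfidfVectorizer or PassiveAggressiveClassifier, only a simplified illustration of the same ideas. The headlines are made up, the smoothed idf formula is one common variant chosen as an assumption, and the labels use +1 for "real" and -1 for "fake".

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn a smoothed inverse document frequency for every term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc.lower().split()))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Turn one document into an L2-normalised sparse TF-IDF vector."""
    counts = Counter(t for t in doc.lower().split() if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def score(weights, vec):
    """Dot product between the weight vector and a sparse document vector."""
    return sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_passive_aggressive(vectors, labels, epochs=5):
    """Stay passive when the margin is at least 1, otherwise correct just enough."""
    weights = {}
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            loss = max(0.0, 1.0 - y * score(weights, vec))  # hinge loss
            sq_norm = sum(v * v for v in vec.values())
            if loss > 0.0 and sq_norm > 0.0:
                tau = loss / sq_norm  # step size of the basic PA rule
                for t, v in vec.items():
                    weights[t] = weights.get(t, 0.0) + tau * y * v
    return weights


def predict(weights, vec):
    """Return +1 (real) or -1 (fake)."""
    return 1 if score(weights, vec) >= 0.0 else -1


# Hypothetical toy headlines, invented for this sketch.
HEADLINES = [
    "aliens secretly control senate",
    "miracle pill cures everything overnight",
    "council approves new budget",
    "parliament passes transport bill",
]
LABELS = [-1, -1, 1, 1]

idf = fit_idf(HEADLINES)
vectors = [tfidf_vector(h, idf) for h in HEADLINES]
weights = train_passive_aggressive(vectors, LABELS)
print([predict(weights, v) for v in vectors])
```

On a real dataset you would hold out a test split and report accuracy there; four toy headlines only demonstrate the mechanics.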

3. Detecting Parkinson's Disease

Move forward by working on the Data Science Project Idea - detection of Parkinson's disease with XGBoost.

We have started using Data Science to improve healthcare and services - if we can predict the disease at an early stage, then we will have many advantages. So, in this Data Science project idea, we will learn how to detect Parkinson's disease using Python. It is a neurodegenerative, progressive disease of the central nervous system that affects movement and causes trembling and stiffness. It affects the dopamine-producing neurons in the brain, and every year, it affects over 1 million people in India.

Language: Python

Dataset/Package: UCI ML Parkinsons dataset

Data Science projects of medium complexity

4. Speech Emotion Recognition

Check out the full implementation of the Data Science sample project - speech emotion recognition with Librosa.

Let's now learn how to use different libraries. This Data Science project uses librosa for speech emotion recognition (SER), the process of identifying human emotions and affective states from speech. Because we use tone and pitch to express emotions with our voice, SER is possible. But since emotions are subjective, audio annotation is a difficult task. We will extract the mfcc, chroma and mel features and use the RAVDESS dataset for emotion recognition. We will create an MLPClassifier for this model.

Language: Python

Dataset/Package: RAVDESS dataset

5. Gender and Age Detection

Impress employers with the latest Data Science project - gender and age detection with OpenCV.

This is an interesting Data Science project with Python. Using only one image, you will learn how to predict a person's gender and age. In this project, we will introduce you to Computer Vision and its principles. We will build a convolutional neural network and use models trained by Tal Hassner and Gil Levi on the Adience dataset. We will use some .pb, .pbtxt, .prototxt and .caffemodel files along the way.

Language: Python

Dataset/Package: Adience

6. Uber Data Analysis

View the complete implementation of the Data Science project with source code - Uber Data Analysis Project in R.

This is a data visualization project with ggplot2 in which we will use R and its libraries and analyze various parameters. We will use the Uber Pickups New York dataset and create visualizations for different time frames of the year. This tells us how time affects customer journeys.

Language: R

Dataset/Package: Uber Pickups in New York City dataset
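
Before anything can be plotted per time frame, the pickups have to be counted per hour, day or month. The project itself does this in R with ggplot2; the Python below is only a sketch of the counting step. The sample timestamps and the "month/day/year hour:minute:second" format are assumptions made for illustration, not values taken from the dataset.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format for this sketch; check the real file before reusing it.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def pickups_by(timestamps, part, fmt=TIMESTAMP_FORMAT):
    """Count pickups per time frame, e.g. part="hour", "day" or "month"."""
    counts = Counter()
    for raw in timestamps:
        moment = datetime.strptime(raw, fmt)
        counts[getattr(moment, part)] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample rows, invented for this sketch.
SAMPLE_PICKUPS = [
    "04/01/2014 00:11:00",
    "04/01/2014 17:45:00",
    "05/12/2014 17:05:00",
    "09/30/2014 08:30:00",
]

print(pickups_by(SAMPLE_PICKUPS, "hour"))
print(pickups_by(SAMPLE_PICKUPS, "month"))
```

Each resulting dictionary maps a time frame to a pickup count, which is the shape a bar chart of trips per hour or per month needs.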

7. Driver Drowsiness Detection

Upgrade your skills by working on the Top Data Science Project - sleepiness detection system with OpenCV & Keras.

Sleepy driving is extremely dangerous, with about a thousand accidents every year due to drivers falling asleep while driving. In this Python project, we will create a system that can detect sleepy drivers and also alert them with a beep.

This project is implemented using Keras and OpenCV. We will use OpenCV to detect the face and eyes and with the help of Keras we will classify the state of the eye (Open or Closed) using deep neural network methods.

8. Chatbot

Build a chatbot with Python and take a step forward in your career - Chatbot with NLTK & Keras.

Chatbots are an integral part of business. Many businesses have to offer services to their customers and it takes a lot of manpower, time and effort to serve them. Chatbots can automate much of the customer interaction by answering some of the common questions that customers ask. There are basically two types of chatbots: Domain-specific and Open-domain. A domain-specific chatbot is often used to solve a specific problem. Thus, you need to customize it to work effectively in your field. Open-domain chatbots can be asked any questions, so training them requires a huge amount of data.

Data set: Intents json file

Language: Python

Advanced Data Science projects

9. Image Caption Generator

Check out the complete project implementation with source code - Image Caption Generator with CNN & LSTM.

Describing what's in an image is an easy task for humans, but for computers, an image is just a collection of numbers that represent the color value of each pixel, so the task is difficult. Understanding what is in an image and then creating a natural language description (e.g. in English) is another difficult task. This project uses deep learning techniques in which we combine a Convolutional Neural Network (CNN) with a recurrent neural network (LSTM) to create an image description generator.

Data set: Flickr 8K

Language: Python

Framework: Keras

10. Credit Card Fraud Detection

Do your best by working on the Data Science project idea - credit card fraud detection with machine learning.

By now you have begun to understand the methods and concepts. Let's move on to some advanced data science projects. In this project, we will use the R language with algorithms such as decision trees, logistic regression, artificial neural networks and a gradient boosting classifier. We will use the card transactions dataset to classify credit card transactions as fraudulent or genuine. We will fit different models to the data and plot their performance curves.

Language: R

Dataset/Package: Card Transactions dataset

11. Movie Recommendation System

Explore the implementation of the best Data Science project with Source Code - Movie Recommendation System in R

In this Data Science project, we will use R to build movie recommendations through machine learning. The recommendation system sends suggestions to users through a filtering process based on other users' preferences and browsing history. If A and B like Home Alone, and B likes Mean Girls, then you can suggest Mean Girls to A, who might like it too. This keeps customers interacting with the platform.

Language: R

Dataset/Package: MovieLens dataset
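
The Home Alone example above can be written as a very small user-based filter. The project itself is in R; this Python version is only a sketch. Jaccard similarity is one possible similarity measure chosen here as an assumption, and user C with the title "Alien" is a hypothetical addition to show that users with no overlap contribute nothing.

```python
def jaccard(a, b):
    """Share of liked movies two users have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes):
    """Rank movies the target has not liked by the similarity of users who liked them."""
    scores = {}
    for other, other_likes in likes.items():
        if other == target:
            continue
        similarity = jaccard(likes[target], other_likes)
        if similarity == 0.0:
            continue  # users with nothing in common contribute no suggestions
        for movie in other_likes - likes[target]:
            scores[movie] = scores.get(movie, 0.0) + similarity
    # Highest score first, ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda movie: (-scores[movie], movie))


LIKES = {
    "A": {"Home Alone"},
    "B": {"Home Alone", "Mean Girls"},
    "C": {"Alien"},  # hypothetical user with no overlap with A
}

print(recommend("A", LIKES))
```

Because A and B share Home Alone, B's other favourite is suggested to A, while C's taste is ignored.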

12. Customer Segmentation

Impress employers with a Data Science project (including source code) - Customer segmentation with machine learning.

Customer segmentation is a popular application of unsupervised learning. Using clustering, companies define customer segments to work with a potential user base. They divide customers into groups according to common characteristics such as gender, age, interests, and spending habits, so that they can effectively market their products to each group. We will use K-means clustering, as well as visualize the distribution by gender and age. We then analyze their annual income and expenditure levels.

Language: R

Dataset/Package: Mall_Customers dataset
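
K-means itself is short enough to sketch in full. The project uses R; the Python below is an illustration only. The (annual income, spending score) pairs are made-up values, and seeding the centroids with the first k points is a simplification chosen to keep the run deterministic, not what a production implementation would do.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means: assign each point to its nearest centroid, then re-average."""
    centroids = [points[i] for i in range(k)]  # naive seeding: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical customers as (annual income, spending score) pairs.
CUSTOMERS = [(15, 80), (88, 15), (17, 76), (90, 20), (16, 82), (86, 12)]

centroids, assignments = kmeans(CUSTOMERS, k=2)
print(centroids)
print(assignments)
```

With real data the features should usually be scaled first, since K-means relies on plain Euclidean distance and a large-valued column would otherwise dominate.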

13. Breast Cancer Classification

See the complete implementation of the Data Science project in Python - Breast Cancer Classification Using Deep Learning.

Returning to the medical applications of data science, let's learn how to detect breast cancer with Python. We will use the IDC_regular dataset to detect invasive ductal carcinoma, the most common form of breast cancer. It develops in the milk ducts, penetrating into the fibrous or fatty tissue of the mammary gland outside the duct. In this Data Science project idea, we will use Deep Learning and the Keras library for classification.

Language: Python

Dataset/Package: IDC_regular

14. Traffic Signs Recognition

Achieve precision in self-driving car technology with the open-source Data Science project on traffic sign recognition using CNN.

Road signs and traffic rules are very important for every driver to avoid accidents. To follow a rule, you first need to understand what the road sign looks like. A person must learn all road signs before being given the right to drive any vehicle. But now the number of autonomous vehicles is growing, and in the near future, people may no longer drive cars themselves. In the Traffic Signs Recognition project, you will learn how a program can recognize the type of a road sign by taking an image as input. The German Traffic Sign Recognition Benchmark (GTSRB) dataset is used to build a deep neural network that recognizes the class to which a traffic sign belongs. We also create a simple GUI for interacting with the application.

Language: Python

Data set: GTSRB (German Traffic Sign Recognition Benchmark)

Source: habr.com