5 Simple Machine Learning Project Ideas For Beginners Using Python

Machine learning is a branch of artificial intelligence that enables software to learn without being explicitly programmed.

If you’re new to machine learning and looking for simple projects to start with, even without much experience, this is the right place for you.

In this article, I will demonstrate 5 simple project ideas that can help you get started in machine learning as a beginner.

Iris Flowers Classification

When we learn the C or C++ programming language, the first program we write is usually a “Hello world” program. In machine learning, we usually start with the Iris Flowers Classification project.


The aim of this project is to predict the class (species) of a new Iris flower based on the length and width of its petals and sepals. There are three species: Setosa, Versicolor, and Virginica.

The model is trained on the measurements of the Iris flowers given in the Iris flower dataset (150 flowers, 50 of each species). The dataset can be downloaded from this link: Iris Flowers Classification Dataset.


You can also import the Iris flower dataset directly from the Scikit-Learn library, but here I recommend downloading it manually and loading it in the project yourself.
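For readers who prefer the Scikit-Learn route mentioned above, here is a minimal sketch using `load_iris`. The variable names (`iris_bunch`, `iris_df`) are my own, and the column names of the bundled copy (for example "sepal length (cm)") may differ from those in a downloaded CSV file.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the copy of the Iris dataset bundled with scikit-learn (no download needed).
iris_bunch = load_iris()

# Put the four measurements into a DataFrame with readable column names.
iris_df = pd.DataFrame(iris_bunch.data, columns=iris_bunch.feature_names)

# Map the numeric targets (0, 1, 2) to the species names.
iris_df["species"] = [iris_bunch.target_names[i] for i in iris_bunch.target]

print(iris_df.shape)
print(iris_df["species"].unique())
```

The resulting `iris_df` has a "species" column, so it can be split into features and target in the same way as a DataFrame read from a CSV file.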

Python Implementation

Now I will show you the Python implementation of this project.


Start by importing the necessary libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Load the Iris flower data set.

iris = pd.read_csv("Iris.csv")

The file must be in the same directory as your script; otherwise, use its full path.

Now I will train the model. For that, I will first split the dataset into a training set and a test set in the ratio of 80:20.

Then I will use the K-Nearest Neighbors (KNN) algorithm to train the model.

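# Separate the features (x) from the target column (y)
x = iris.drop("species", axis=1)
y = iris["species"]
# Hold out 20% of the rows as a test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# Train a KNN classifier that looks at the single closest flower
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)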

Now our model is ready, and we can give it new measurements to predict the class of an Iris flower.

# One new flower: four measurements in the same order as the training columns
new_x = np.array([[5, 2.9, 1, 0.2]])
prediction = knn.predict(new_x)
print("Predicted species is: {}".format(prediction))

So, this is how we train our machine learning model for Iris Flower classification. This is a simple and must-do project for every machine learning newbie.
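The walkthrough splits the data 80:20 but never uses the test set. As a possible follow-up, here is a minimal sketch that measures accuracy on the held-out 20%. It assumes the copy of the dataset bundled with Scikit-Learn (`load_iris`) rather than the downloaded CSV file, so the exact number may differ from what you get with your own file.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: use scikit-learn's bundled Iris data instead of Iris.csv.
x, y = load_iris(return_X_y=True)

# Same 80:20 split and random_state as in the walkthrough.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)

# score() returns the fraction of test flowers classified correctly.
accuracy = knn.score(x_test, y_test)
print("Test accuracy:", accuracy)
```

Checking the score on data the model has not seen is the usual way to tell whether the predictions can be trusted on new flowers.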

Spam Mail Detection

Today, spam is distributed and consumed in many ways. It’s found on social media, through email, on websites, and even in the news.

There are many applications that automatically detect spam and protect users from it. Google’s Gmail is a well-known example of such an application: its spam filters automatically detect spam mails and separate them from other mails.

Here I will show you how to build a Spam Detection Model using Python.

Python Implementation of Spam Detection Model.

The first step in any project is to import necessary libraries. So, let’s import the libraries that we need in this project.

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

Now let’s import the dataset on which we train our model. You can download the dataset from this link: Spam Mails Dataset.

We only need two columns from this dataset, class and message. So, we load the dataset and then keep just those two columns.

Spam_data = pd.read_csv("Spam_mail_data.csv")
New_data = Spam_data[["class", "message"]]

Now let’s split the new dataset into training and test sets in the ratio of 70:30, and train the model.

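# Turn the messages into word-count vectors
X = np.array(New_data["message"])
Y = np.array(New_data["class"])
cv = CountVectorizer()
X = cv.fit_transform(X)
# Hold out 30% of the messages for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=40)
# Train a Multinomial Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train, Y_train)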

So, your model is ready. To test it, take a message as user input, transform it with the same vectorizer, and then predict the result. Here is an example:

mail = "You have won $100 cash prize"
mail_data = cv.transform([mail]).toarray()
print(clf.predict(mail_data))

So, this is how we built a spam detection machine learning model that reads the message contents of a mail and predicts whether it is spam or not.
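To see what CountVectorizer actually hands to the classifier, here is a small sketch on three made-up messages (they are not from the spam dataset). Each message becomes a row of word counts, with one column per word in the learned vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical messages, invented for illustration only.
messages = ["win cash now", "meeting at noon", "win a cash prize now"]

cv = CountVectorizer()
counts = cv.fit_transform(messages)

# vocabulary_ maps each word to its column index; sort the words by that index.
vocab = sorted(cv.vocabulary_, key=cv.vocabulary_.get)

print(vocab)
print(counts.toarray())
```

Note that the one-letter word "a" does not appear in the vocabulary, because the default tokenizer only keeps tokens of two or more characters.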

Boston House Pricing Prediction

This is another simple machine learning project. The aim of the project is to predict the price of houses in Boston.

If you are thinking of buying a home in Boston, a model like this can give you a rough idea of what a fair price looks like.

In this project you don’t have to find the class of a house. You need to find a number (the price of the house), and for that we need an algorithm called Linear Regression.

Linear Regression Algorithm

Linear Regression is a Machine Learning algorithm that learns a model which is a linear combination of features of the input example.

In simple words, this algorithm finds a relationship between a dependent variable and one or more independent variables. It is mainly used to predict numeric values, which is what we need in this project.
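As a tiny illustration of that idea, the sketch below fits Linear Regression to four made-up points that lie exactly on the line y = 2x + 1. The data is invented for this example; the point is only to show that the model recovers the slope and intercept and can then predict a new value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: one independent variable x and a dependent variable y = 2x + 1.
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * x.ravel() + 1.0

model = LinearRegression()
model.fit(x, y)

slope = model.coef_[0]        # learned weight for x
intercept = model.intercept_  # learned constant term

# Predict y for an unseen input, x = 5.
prediction = model.predict(np.array([[5.0]]))[0]

print(slope, intercept, prediction)
```

With more than one feature, as in the house price data, the model learns one weight per feature in the same way.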

Python Implementation of Boston House Price Prediction.

First, I will import the necessary libraries:

import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

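Now, we need the Boston House pricing dataset. You can download it from this link: Boston House Prices. Or you can use the load_boston() function of sklearn.datasets to import the dataset directly. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in version 1.2, so it only works with older versions; with a newer version, use the downloaded dataset instead.

I will use the load_boston() function to load the dataset in this project. So, I will load the dataset as: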

from sklearn.datasets import load_boston
Boston_Data = load_boston()

If you print the data, you will find that it is a complete mess.

[Image: boston data]

In the above image, it is hard to understand anything, so I will transform the data into a data frame to make it more understandable. For that I will write the following code:

data_frame_x = pd.DataFrame(Boston_Data.data, columns=Boston_Data.feature_names)
data_frame_y = pd.DataFrame(Boston_Data.target)

Now, I will initialize the Linear Regression model and split the data into 70% training and 30% testing data. You can also change the ratio to see how the result varies.

Then I will train the model using the training data set.

Regression = linear_model.LinearRegression()
x_train, x_test, y_train, y_test = train_test_split(data_frame_x, data_frame_y, test_size=0.30, random_state=42)
Regression.fit(x_train, y_train)

The training of our model is complete. I have trained the model on 70% of the data, and I will test it on the remaining 30% to check the accuracy of our model.

predictions = Regression.predict(x_test)

Now, to find the difference between the model predictions and actual values, let’s plot the actual values(y_test) vs predicted values(predictions) as:

import matplotlib.pyplot as plt
plt.scatter(y_test, predictions)
plt.xlabel("Prices")
plt.ylabel("Predicted Prices")
plt.title("Prices vs Predicted Prices")
plt.show()

Now I will use the mean squared error (MSE) to find out how accurate my model is. The output is never negative, and a value closer to zero is better.

from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, predictions))

Output: 21.51744423117755

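An MSE of about 21.5 means the predictions are typically off by roughly 4.6 (the square root of the MSE), in the units of the target, which for this dataset are thousands of dollars. That is a reasonable result for such a simple model. Note that the test error alone does not tell us whether the model is overfitting; for that, compare it with the error on the training data.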
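To make the mean squared error less of a black box, here is a minimal sketch that computes it by hand with NumPy and compares the result with scikit-learn's mean_squared_error. The price values are made up for illustration and are not taken from the Boston dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted prices (illustrative values only).
actual = np.array([20.0, 30.0, 25.0, 15.0])
predicted = np.array([21.0, 29.0, 27.0, 13.0])

# MSE is the average of the squared differences.
mse_manual = np.mean((actual - predicted) ** 2)

# scikit-learn computes the same quantity.
mse_sklearn = mean_squared_error(actual, predicted)

# The square root puts the error back in the units of the prices.
rmse = np.sqrt(mse_manual)

print(mse_manual, mse_sklearn, rmse)
```

Here the errors are -1, 1, -2 and 2, so the squared errors are 1, 1, 4 and 4, and their mean is 2.5.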

So, this is how we built a Boston house pricing model that predicts the prices of houses in Boston.

News Classification ML Model

As you can see in the above picture (taken from The Hindu e-paper), news is classified into different categories. This project is about the same task: classifying news into different sections using machine learning.

Sorting news into sections manually takes a good amount of time, because we have to put every news item in its respective category.

If we use a news classification machine learning model for the same task, we can save a considerable amount of time.

The machine learning model reads the headlines or the contents of the news and categorizes them into different sections.

Python Implementation of News Classification Machine Learning Model.

Let’s import the necessary libraries used in this project. These libraries make our work much easier.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

I have downloaded the dataset from Kaggle. You can download it from this link: News Classification Dataset.

This dataset has 3 columns:

  • news_headline
  • news_article
  • news_category

It contains news from 7 different categories:

  • Technology
  • Sports
  • Politics
  • Entertainment
  • World
  • Automobile
  • Science

Now, let’s load the dataset:

data = pd.read_csv("inshort_news_data.csv")
print(data.head())

Now, we train the model to read the headline and classify the news. For that I will keep only the news_headline and news_category columns and split the dataset into training data and testing data in the ratio of 70:30.

data = data[["news_headline", "news_category"]]
x = np.array(data["news_headline"])
y = np.array(data["news_category"])
# Turn the headlines into word-count vectors
cv = CountVectorizer()
x = cv.fit_transform(x)
# Hold out 30% of the headlines for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)

Now, I will train the model using the Multinomial Naive Bayes (MNB) algorithm:

model = MultinomialNB()
model.fit(x_train, y_train)

Now the model is ready, and we can test it with a news headline that I have taken from The Hindu e-paper:

title = "‘Kerala Savari’, India’s first online taxi service as a public option"
data1 = cv.transform([title]).toarray()
print(model.predict(data1))

So, this is how we built a news classification machine learning model that predicts the category of the news.
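One optional refinement, sketched here under assumptions: scikit-learn's make_pipeline can bundle the CountVectorizer and the MultinomialNB classifier into a single object, so raw headlines can be passed straight to fit and predict. The four headlines and their labels below are invented for illustration and are not from the Kaggle dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, made up for this sketch.
headlines = [
    "Team wins cricket final",
    "Striker scores in football match",
    "New smartphone launched with faster chip",
    "Software update improves laptop battery",
]
categories = ["sports", "sports", "technology", "technology"]

# The pipeline vectorizes the text and then trains the classifier.
news_model = make_pipeline(CountVectorizer(), MultinomialNB())
news_model.fit(headlines, categories)

# Raw text goes in; the pipeline applies the same vectorizer before predicting.
predicted = news_model.predict(["Football team wins match"])[0]
print(predicted)
```

The design choice here is convenience: with a pipeline there is no separate transform step to forget when classifying a new headline.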

Article Recommendation Model

A recommendation system is one of the best applications of machine learning. Because of the enormous amount of data present in every field of life, a person needs a lot of time to choose what they need.

For example, you want to read an article, but hundreds of articles are written every day. So which article should you read? A recommendation system helps you in that case.

The Recommendation System checks your reading history, learns what you like to read, and recommends similar articles to you.

This is a beginner’s project: a content-based recommendation system that suggests articles based on the similarities among them. Source Code:  code
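To give a rough idea of how similarity among articles can be computed, here is a minimal sketch that uses TF-IDF vectors and cosine similarity from scikit-learn. This is one common approach and not necessarily the method used in the linked source code. The article titles and the recommend helper are hypothetical, made up for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical article titles, invented for illustration.
articles = [
    "Introduction to machine learning with Python",
    "Machine learning project ideas for beginners",
    "How to bake sourdough bread at home",
    "Python tips for data analysis",
]

# Represent each article as a TF-IDF vector, ignoring common English words.
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(articles)

# similarity[i][j] is the cosine similarity between article i and article j.
similarity = cosine_similarity(matrix)


def recommend(index, top_n=2):
    """Return the top_n articles most similar to articles[index]."""
    scores = similarity[index].copy()
    scores[index] = -1.0  # exclude the article itself
    ranked = scores.argsort()[::-1][:top_n]
    return [articles[i] for i in ranked]


print(recommend(0))
```

For a reader of the first article, the sketch ranks the other machine learning article first and the Python article second, while the bread article shares no words with it and is left out.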