Perceptron Algorithm Python Implementation with Examples

In this post, we use examples to deepen our understanding of the Perceptron algorithm and its Python implementation. This tutorial shows you how to build a Perceptron in Python from scratch.

We will discuss the implementation of the Perceptron algorithm in Python. The Perceptron is the most fundamental single-layer neural network used for binary classification. After looking at the Unit Step Function and at how the Perceptron works as a neural net, we will discuss the Perceptron update rule.

Finally, we will plot the decision boundary for the data. Since the Perceptron is a binary classifier, we will use data with two classes, and with only two features so that the boundary can be drawn in a plane.

Moving on…


What is Perceptron algorithm in python?

The Perceptron algorithm was inspired by how neurons, the basic processing units in the brain, process input signals. Frank Rosenblatt invented it, building on the McCulloch-Pitts neuron and Hebb’s results.

The Perceptron algorithm is a machine learning approach for two-class (binary) classification.

It is a type of artificial neural network model, possibly the simplest one. It is hardly deep learning, but it is an essential building block.

It consists of a single node or neuron that predicts a class label given a row of data as input. This is accomplished by calculating the weighted sum of the inputs plus a bias (equivalently, a weight on an extra input that is fixed at 1). This weighted sum plus the bias is known as the activation.

activation = sum(weight_i * x_i) + bias

The model will output 1.0 if the activation is greater than 0.0; else, it will output 0.0.

Predict 1: If Activation > 0.0
Predict 0: If Activation <= 0.0
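
As a small illustration of the activation and threshold rule above, the following sketch computes the prediction for one row of data. The function name `predict` and the weight, bias, and input values are made up for this example only.

```python
def predict(row, weights, bias):
    """Return 1.0 if the weighted sum plus bias is positive, else 0.0."""
    activation = sum(w * x for w, x in zip(weights, row)) + bias
    return 1.0 if activation > 0.0 else 0.0

# Hypothetical values, chosen only for illustration.
weights = [0.5, -1.0]
bias = 0.25

print(predict([2.0, 0.5], weights, bias))  # activation = 1.0 - 0.5 + 0.25 = 0.75
print(predict([0.0, 1.0], weights, bias))  # activation = -1.0 + 0.25 = -0.75
```

The first row gives a positive activation and is assigned class 1; the second gives a negative activation and is assigned class 0.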

Because the inputs are multiplied by model coefficients, as in linear regression and logistic regression, it is good practice to normalize or standardize the data before using the model.
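
One common way to standardize is to rescale every feature column to zero mean and unit variance. The sketch below does this with NumPy; the helper name `standardize` and the sample values are assumptions made for illustration, not part of the tutorial's own code.

```python
import numpy as np

def standardize(X):
    """Scale each column of X to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against constant columns to avoid division by zero.
    std[std == 0] = 1.0
    return (X - mean) / std

# Hypothetical data: the second feature is on a much larger scale.
X_raw = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_scaled = standardize(X_raw)
print(X_scaled)
```

In practice the mean and standard deviation would usually be computed on the training data and then reused for any new data.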


The Perceptron is an algorithm for linear classification. This means that it learns a decision boundary that separates two classes in the feature space using a line (more generally, a hyperplane). Therefore, it is suited to problems in which the classes can be separated well by a line or linear model, a property known as being linearly separable.

In addition to this…

The stochastic gradient descent optimization algorithm is used to train the coefficients of the model, which are referred to as input weights.

Examples from the training dataset are presented to the model one at a time; the model makes a prediction and the error is calculated. The model’s weights are then adjusted to reduce the error on that example.

This is called the Perceptron update rule. One pass in which the rule is applied to every example in the training dataset is referred to as an epoch. This process of updating the model with examples is then repeated for many epochs.

At each update, the weights are changed by a small proportion of the error; this proportion is controlled by a hyperparameter known as the learning rate, which is normally set to a small value.


A small learning rate ensures that learning does not occur too rapidly, which could result in a less skillful model; this is known as premature convergence of the optimization (search) procedure for the model weights.

weights(t + 1) = weights(t) + learning_rate * (expected_i - predicted_i) * input_i

After the model’s error falls to a low level or stops improving, or when a maximum number of epochs have been completed, training is terminated.
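
To make the update rule concrete, here is a minimal sketch that applies it once to a single training example. The function name `update_weights` and all numeric values are hypothetical; the bias is treated as a weight whose input is always 1.

```python
def update_weights(weights, bias, row, expected, learning_rate):
    """Apply one Perceptron update for a single training example."""
    activation = sum(w * x for w, x in zip(weights, row)) + bias
    predicted = 1.0 if activation > 0.0 else 0.0
    error = expected - predicted
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, row)]
    # The bias is updated like a weight whose input is always 1.
    new_bias = bias + learning_rate * error
    return new_weights, new_bias

# Hypothetical starting point: zero weights and one positive example.
weights, bias = update_weights([0.0, 0.0], 0.0, [2.0, 1.0],
                               expected=1.0, learning_rate=0.5)
print(weights, bias)  # [1.0, 0.5] 0.5
```

When the prediction already matches the expected label, the error is 0 and the weights are left unchanged.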

The model’s initial weights are set to small random values. In addition, the training dataset is shuffled before each training epoch. This is intended to improve and accelerate the model training process.

As a result, the learning algorithm is stochastic and may produce different results each time it is run. It is therefore standard practice to summarize the performance of the algorithm on a dataset by evaluating it multiple times and reporting the mean classification accuracy.
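
The sketch below illustrates this practice under some assumptions: weights start as small random values, the example order is shuffled every epoch, and the model is trained once per random seed. The function `train_and_score`, the toy data, and the default hyperparameters are hypothetical, and accuracy is measured on the training data only to keep the example short.

```python
import numpy as np

def train_and_score(X, y, seed, lr=0.1, epochs=50):
    """Train with random init and shuffling; return training accuracy."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])     # prepend bias input x0 = 1
    theta = rng.normal(scale=0.01, size=Xb.shape[1])  # small random weights
    for _ in range(epochs):
        for i in rng.permutation(len(y)):             # shuffle the order each epoch
            y_hat = 1.0 if Xb[i] @ theta > 0 else 0.0
            theta += lr * (y[i] - y_hat) * Xb[i]
    preds = (Xb @ theta > 0).astype(float)
    return float(np.mean(preds == y))

# Toy, hand-made data (hypothetical): two well-separated groups.
X = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.5],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# One run per seed, then report the mean accuracy.
scores = [train_and_score(X, y, seed) for seed in range(5)]
print("mean accuracy:", np.mean(scores))
```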

The algorithm’s learning rate and number of training epochs are hyperparameters that can be tuned or set using heuristics.

Moving on, let us proceed to the Perceptron as a neural net…

Perceptron as a neural net

The graphic above helps us grasp the Perceptron visually. For each training example, we first compute the dot product of the input features and the parameters theta. The Unit Step Function is then applied to this value to make the prediction (y_hat).

If the prediction is incorrect, that is, if the model misclassifies the example, we adjust the parameters theta. When the prediction is right (equal to the true/target value y), we do not update.

Example program

Let us try to understand the Perceptron algorithm using the following data as a motivating example.

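import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Generate 150 samples with two features.
# centers=2 is assumed here so that the data has two classes.
X, y = datasets.make_blobs(n_samples=150, n_features=2, centers=2)

# Plot class 0 as red triangles and class 1 as blue squares.
fig = plt.figure(figsize=(10,8))
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], 'r^')
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], 'bs')
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.title('Random Classification Data with 2 classes')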

Let’s code the step function.

def step_func(z):
    # Unit step: 1.0 for a positive input, 0.0 otherwise.
    return 1.0 if (z > 0) else 0.0

Perceptron update rule

The Perceptron update rule is very similar to the Gradient Descent update rule. For a misclassified example, theta is moved by lr * (y - y_hat) * x.

Example program

def perceptron(X, y, lr, epochs):
    # X --> Inputs.
    # y --> labels/target.
    # lr --> learning rate.
    # epochs --> Number of iterations.
    # m-> number of training examples
    # n-> number of features 
    m, n = X.shape
    # Initializing parameters (theta) to zeros.
    # +1 in n+1 for the bias term.
    theta = np.zeros((n+1,1))
    # Empty list to store how many examples were 
    # misclassified at every iteration.
    n_miss_list = []
    # Training.
    for epoch in range(epochs):
        # variable to store #misclassified.
        n_miss = 0
        # looping for every example.
        for idx, x_i in enumerate(X):
            # Inserting 1 for bias, X0 = 1.
            x_i = np.insert(x_i, 0, 1).reshape(-1,1)
            # Calculating prediction/hypothesis.
            y_hat = step_func(np.dot(x_i.T, theta))
            # Updating if the example is misclassified.
            if (np.squeeze(y_hat) - y[idx]) != 0:
                theta += lr*((y[idx] - y_hat)*x_i)
                # Incrementing by 1.
                n_miss += 1
        # Appending number of misclassified examples
        # at every iteration.
        n_miss_list.append(n_miss)
    return theta, n_miss_list

Plotting Decision boundary

We already know that the model predicts

y_hat = 1 when theta.X > 0
y_hat = 0 when theta.X ≤ 0

So, theta.X = 0 is going to be our Decision boundary.
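
To see where the slope and intercept of the boundary come from, set theta0 + theta1*x1 + theta2*x2 = 0 and solve for x2. The sketch below does this for hypothetical parameter values (it assumes theta2 is not zero) and checks that points on the resulting line have an activation of zero.

```python
import numpy as np

# Hypothetical parameters [theta0 (bias), theta1, theta2], for illustration only.
theta = np.array([-3.0, 1.0, 2.0])

# theta0 + theta1*x1 + theta2*x2 = 0  =>  x2 = -(theta1/theta2)*x1 - theta0/theta2
m = -theta[1] / theta[2]
c = -theta[0] / theta[2]

x1 = np.array([0.0, 1.0, 4.0])
x2 = m * x1 + c

# Every (x1, x2) pair on the line should give an activation of zero.
activations = theta[0] + theta[1] * x1 + theta[2] * x2
print(m, c)  # -0.5 1.5
print(activations)
```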

The following code for charting the Decision Boundary is valid only when X has two features.

Example program

def plot_decision_boundary(X, theta):
    # X --> Inputs
    # theta --> parameters
    # Note: the class labels y are read from the global scope.
    # The boundary is theta0*X0 + theta1*X1 + theta2*X2 = 0 with X0 = 1.
    # Solving for X2 gives the line X2 = m*X1 + c.
    x1 = np.array([min(X[:,0]), max(X[:,0])])
    m = -theta[1]/theta[2]
    c = -theta[0]/theta[2]
    x2 = m*x1 + c
    # Plotting
    fig = plt.figure(figsize=(10,8))
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "r^")
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs")
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")
    plt.title('Perceptron Algorithm')
    plt.plot(x1, x2, 'y-')

Training and Plotting

The accompanying decision boundary graph shows that we can perfectly distinguish the red and blue classes. That is, we achieve 100% accuracy.

Training program

theta, miss_l = perceptron(X, y, 0.5, 100)
plot_decision_boundary(X, theta)
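
An accuracy figure can also be checked numerically rather than read off the plot. The helpers `predict` and `accuracy` below are a sketch and not part of the original code; the demo parameters and data are hypothetical stand-ins for the trained theta and the dataset.

```python
import numpy as np

def predict(X, theta):
    """Predict 0/1 labels for every row of X using theta (bias first)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # add x0 = 1 for the bias
    return (Xb @ theta.reshape(-1) > 0).astype(float)

def accuracy(X, y, theta):
    """Fraction of examples whose predicted label matches y."""
    return float(np.mean(predict(X, theta) == y))

# Hypothetical stand-ins for the trained theta and the dataset.
theta_demo = np.array([[-3.0], [1.0], [1.0]])
X_demo = np.array([[0.0, 1.0], [1.0, 1.0], [3.0, 2.0], [4.0, 4.0]])
y_demo = np.array([0.0, 0.0, 1.0, 1.0])
print(accuracy(X_demo, y_demo, theta_demo))  # 1.0
```

With the variables from the training program, calling accuracy(X, y, theta) would give the training accuracy in the same way.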

Limitations of the Perceptron algorithm in Python

  • It can never separate data that are not linearly separable, as it is simply a linear classifier.
  • The algorithm applies only to binary classification problems.
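
The first limitation can be demonstrated with the XOR problem, a classic dataset that is not linearly separable. The sketch below is a compact variant of the training loop with a hypothetical helper name; it records the number of misclassifications in each epoch, which cannot reach zero because no line classifies all four XOR points correctly.

```python
import numpy as np

# XOR: no straight line separates the 1s from the 0s.
X_xor = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_xor = np.array([0.0, 1.0, 1.0, 0.0])

def count_errors_per_epoch(X, y, lr=0.5, epochs=20):
    """Train a Perceptron and record the misclassifications in each epoch."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # add x0 = 1 for the bias
    theta = np.zeros(Xb.shape[1])
    errors = []
    for _ in range(epochs):
        n_miss = 0
        for x_i, y_i in zip(Xb, y):
            y_hat = 1.0 if x_i @ theta > 0 else 0.0
            if y_hat != y_i:
                theta += lr * (y_i - y_hat) * x_i
                n_miss += 1
        errors.append(n_miss)
    return errors

errors = count_errors_per_epoch(X_xor, y_xor)
print(errors)  # every entry is at least 1 for XOR
```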


Here is a summary of what you have learned about the Perceptron Algorithm Python Implementation:

  • The Perceptron is modeled on the neurons of the human brain.
  • The Perceptron is a machine learning algorithm because it learns the weights of the input signals.
  • The weights are learned with the Perceptron update rule, which closely resembles gradient descent. Both stochastic gradient descent and batch gradient descent may be used to learn the input signal weights.
  • The Perceptron’s activation function is the unit step function, which returns 1 when the net input value is greater than 0 and 0 otherwise.
  • The prediction is likewise based on the unit step function.
