Chatbot Capstone Project in Python with Source Code (2026)

Most chatbot capstones we see fail in defense for the same reason. The panel asks “what model did you train?” and the team looks at each other for a second too long before someone says “we used the OpenAI API.” End of defense.

This guide gives you a chatbot capstone that won’t have that problem.

What you’ll build below is a Python chatbot that actually trains its own intent classifier on a JSON file of questions and answers you control. No API key. No external service. Runs on your laptop. Train it on a school FAQ, a clinic schedule, a barangay services list — whatever your capstone domain is — and the panel can poke at it live during your defense.

You’ll have a working version running in about 30 minutes. Customizing it for your own domain takes another 1 to 2 days.

What you’ll build

A small Flask web app that opens a chat interface in your browser. You type a question. The chatbot classifies your intent (greeting, asking about hours, asking about fees, etc.), then pulls a matching reply from your training data. If it’s not confident, it tells you so instead of hallucinating.

Features

Real intent classification using TF-IDF and Logistic Regression
Trained on a JSON file of intents you control
Flask-based chat UI
Confidence score on every reply
Fallback message when the bot isn’t sure
Easy to extend with a database, an API, or even a hybrid LLM later

Tech stack

Python 3.10 or higher
scikit-learn (the model)
Flask (web server)
NLTK (optional preprocessing; the core scripts below only lowercase text)
Plain HTML, CSS, and JS for the chat UI
About 150 lines of code in total

Nothing exotic. Nothing that needs a paid API. Nothing that needs a GPU.

Why this chatbot survives a panel defense

Most chatbots students bring to defense get torn apart in three minutes. Here’s why this one doesn’t.

It trains a real classifier. Not regex. Not keyword matching. Not an API call. TF-IDF turns your training questions into numerical vectors, then Logistic Regression learns to map vectors to intents. You can print the confusion matrix and put it in Chapter 4.

It’s domain-customizable. Your intents.json is what makes the chatbot yours. Train it on hospital FAQs and you have a clinic bot. Train it on school enrollment questions and you have a registrar bot. Same code, different data. That’s how you defend “what’s new about this” — your dataset, not your model.

It has a fallback. When confidence drops below a threshold (we set it at 0.5), the bot says “I’m not sure what you’re asking — try rephrasing.” That single line saves you from the panel question “what happens when the user asks something not in the data?”

Before you start

You’ll need:

Python 3.10 or newer installed (python --version to check)
A code editor — VS Code is free and works well
Comfort with the terminal or PowerShell
Around 30 minutes for the first run

If you’ve never used pip before, install Python first from python.org (pip comes with it) and tick the “Add to PATH” option during install. Skipping that step is the #1 reason students get stuck two minutes in.

Project file structure

By the end, your folder will look like this:

chatbot-capstone/
├── app.py
├── train.py
├── chatbot.py
├── intents.json
├── model.pkl
├── requirements.txt
├── templates/
│   └── index.html
└── static/
    └── style.css

Create that folder now. Open it in VS Code.

Step 1 — Install the dependencies

In your terminal, inside the project folder:

pip install flask scikit-learn nltk numpy

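Then create a requirements.txt so your panel and your future self can reinstall the same versions with pip install -r requirements.txt. Run pip freeze to see which versions you actually got, and adjust the pins below to match: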

flask==3.0.0
scikit-learn==1.4.0
nltk==3.8.1
numpy==1.26.0

NLTK’s tokenizer data is only needed if you add NLTK preprocessing (the scripts below only lowercase text), but it is quick to grab now. Run this once in a Python shell:

import nltk
nltk.download('punkt')

That’s it for setup.

Step 2 — Define your intents (intents.json)

This file is the brain of your chatbot. Each “intent” is a category of question. Each intent has a list of example questions (patterns) and a list of possible answers (responses).

Create intents.json with this starting structure:

{
  "intents": [
    {
      "tag": "greeting",
      "patterns": [
        "hi", "hello", "hey", "good morning", "good afternoon",
        "kumusta", "hi po", "magandang umaga"
      ],
      "responses": [
        "Hello! How can I help you today?",
        "Hi there! What can I do for you?"
      ]
    },
    {
      "tag": "hours",
      "patterns": [
        "what are your office hours",
        "when are you open",
        "anong oras kayo bukas",
        "are you open today",
        "operating hours"
      ],
      "responses": [
        "We're open Monday to Friday, 8 AM to 5 PM.",
        "Office hours are 8 AM to 5 PM on weekdays."
      ]
    },
    {
      "tag": "location",
      "patterns": [
        "where are you located",
        "what is your address",
        "asan kayo",
        "how do I find your office"
      ],
      "responses": [
        "We're located at 123 Sample Street, Binalbagan, Negros Occidental."
      ]
    },
    {
      "tag": "fees",
      "patterns": [
        "how much",
        "what are the fees",
        "magkano",
        "is there a charge",
        "do I need to pay"
      ],
      "responses": [
        "Our fees vary by service. Please call us for a quote."
      ]
    },
    {
      "tag": "fallback",
      "patterns": [],
      "responses": [
        "I'm not sure what you're asking. Could you try rephrasing?"
      ]
    }
  ]
}

A few things to notice. The tag is the intent name your model will predict. The patterns are example user inputs — the more variations you add (including Tagalog or Bisaya for local capstones), the better your model gets. The responses are what the bot will randomly pick from when it matches that intent.

The fallback intent has no patterns. It’s only triggered when confidence is low.

A real capstone should have 15 to 30 intents with at least 5 to 10 patterns each. The example above is just enough to get the system running. Expand it once you confirm everything works.
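As the file grows, a quick sanity check before each training run helps. The sketch below is a hypothetical helper: check_intents.py and the check_intents function are names made up for this example and are not part of the project files above. It flags duplicate tags, intents with too few patterns or no responses, the same pattern filed under two tags, and a missing fallback intent, which chatbot.py relies on.

```python
import json
from collections import Counter


def check_intents(data, min_patterns=5):
    """Return human-readable warnings for an intents dict loaded from intents.json."""
    warnings = []
    intents = data.get("intents", [])
    tags = [intent.get("tag") for intent in intents]

    # chatbot.py builds a tag -> intent dict, so a duplicate tag silently keeps only the last one
    for tag, count in Counter(tags).items():
        if count > 1:
            warnings.append(f"duplicate tag {tag!r} appears {count} times")

    seen = {}  # lowercased pattern -> tag it was first filed under
    for intent in intents:
        tag = intent.get("tag")
        patterns = intent.get("patterns", [])
        if not intent.get("responses"):
            warnings.append(f"{tag!r} has no responses")
        if tag == "fallback":
            continue  # never trained, so it needs no patterns
        if len(patterns) < min_patterns:
            warnings.append(f"{tag!r} has only {len(patterns)} patterns (aim for {min_patterns}+)")
        for pattern in patterns:
            key = pattern.lower().strip()
            # The same question under two tags gives the model contradictory examples
            if key in seen and seen[key] != tag:
                warnings.append(f"pattern {pattern!r} is under both {seen[key]!r} and {tag!r}")
            seen.setdefault(key, tag)

    if "fallback" not in tags:
        warnings.append("no 'fallback' intent; chatbot.py looks it up when confidence is low")
    return warnings


if __name__ == "__main__":
    try:
        with open("intents.json", "r", encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        print("intents.json not found; run this from your project folder.")
    else:
        problems = check_intents(data)
        for problem in problems:
            print("WARNING:", problem)
        print(f"{len(problems)} warning(s)")
```

On the starter intents.json from this step, it should report one warning: location has only four patterns.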

Step 3 — Train the model (train.py)

Create train.py:

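import json
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load the intents you wrote in Step 2
with open('intents.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Build parallel lists: X = example questions, y = their intent tags
X = []
y = []
for intent in data['intents']:
    if intent['tag'] == 'fallback':
        continue  # fallback has no patterns; chatbot.py picks it at runtime via the threshold
    for pattern in intent['patterns']:
        X.append(pattern.lower())
        y.append(intent['tag'])

# TF-IDF over single words and two-word phrases (unigrams + bigrams)
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_vec = vectorizer.fit_transform(X)

# Multiclass Logistic Regression; max_iter raised from the default so the solver can converge
model = LogisticRegression(max_iter=1000)
model.fit(X_vec, y)

# This scores the model on the same patterns it was trained on, so expect
# optimistic, often near-perfect numbers. It confirms training worked; it does
# not measure how well the bot handles new questions.
predictions = model.predict(X_vec)
print(classification_report(y, predictions))

# Save the model and the fitted vectorizer together; chatbot.py needs both
with open('model.pkl', 'wb') as f:
    pickle.dump({'model': model, 'vectorizer': vectorizer}, f)

print("\nModel trained and saved to model.pkl")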

Run it:

python train.py

You should see a classification report (precision, recall, F1 per intent) and a confirmation that model.pkl was saved.

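Screenshot that classification report for Chapter 4, but label it for what it is: train.py scores the model on the same patterns it learned from, so the numbers will look better than real-world performance. It proves the model trained, not that it generalizes to new questions.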
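Because the report from train.py is computed on the training patterns, a fairer Chapter 4 number comes from patterns the model has not seen. Here is a minimal sketch, assuming the intents.json layout from Step 2. It uses stratified k-fold cross-validation through scikit-learn’s cross_val_predict, so every pattern is predicted by a model trained only on the other folds, and then prints a per-intent report and a confusion matrix. The file name (say, evaluate.py) and the helper names are assumptions for illustration.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline


def load_patterns(path="intents.json"):
    """Same loading rules as train.py: lowercase patterns, skip the fallback intent."""
    with open(path, "r", encoding="utf-8") as f:
        intents = json.load(f)["intents"]
    X, y = [], []
    for intent in intents:
        if intent["tag"] == "fallback":
            continue
        for pattern in intent["patterns"]:
            X.append(pattern.lower())
            y.append(intent["tag"])
    return X, y


def cross_validated_predictions(X, y, max_folds=5):
    """Predict each pattern with a model trained only on the other folds."""
    # The smallest intent caps the fold count, so every test fold contains each intent
    folds = min(max_folds, min(y.count(tag) for tag in set(y)))
    if folds < 2:
        raise ValueError("every intent needs at least 2 patterns for cross-validation")
    # The vectorizer sits inside the pipeline, so its vocabulary is learned per training fold
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=42)
    return cross_val_predict(pipeline, X, y, cv=cv)


if __name__ == "__main__":
    try:
        X, y = load_patterns()
    except FileNotFoundError:
        print("intents.json not found; run this from your project folder.")
    else:
        predictions = cross_validated_predictions(X, y)
        labels = sorted(set(y))
        print(classification_report(y, predictions, labels=labels, zero_division=0))
        # Rows are true intents, columns are predicted intents, both in `labels` order
        print(labels)
        print(confusion_matrix(y, predictions, labels=labels))
```

Expect lower scores than train.py reports, especially on the small starter file. That gap is honest information the panel can respect.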

Step 4 — Build the chatbot logic (chatbot.py)

Create chatbot.py:

import json
import pickle
import random

class Chatbot:
    def __init__(self, model_path='model.pkl', intents_path='intents.json', threshold=0.5):
        # Load the classifier and the fitted TF-IDF vectorizer saved by train.py
        with open(model_path, 'rb') as f:
            data = pickle.load(f)
        self.model = data['model']
        self.vectorizer = data['vectorizer']

        # Index intents by tag so responses can be looked up directly
        with open(intents_path, 'r', encoding='utf-8') as f:
            self.intents = {i['tag']: i for i in json.load(f)['intents']}

        # Predictions below this probability are answered with the fallback intent
        self.threshold = threshold

    def reply(self, message):
        # Lowercase like train.py does, and trim surrounding spaces
        message = message.lower().strip()
        if not message:
            return {'tag': 'fallback', 'response': "Please type a message.", 'confidence': 0.0}

        # Probability for every trained intent; take the most likely one
        X = self.vectorizer.transform([message])
        probs = self.model.predict_proba(X)[0]
        max_prob = float(probs.max())
        tag = self.model.classes_[probs.argmax()]

        # Low confidence, or a tag missing from intents.json (e.g. a stale model.pkl), falls back
        if max_prob < self.threshold or tag not in self.intents:
            tag = 'fallback'

        response = random.choice(self.intents[tag]['responses'])
        return {'tag': tag, 'response': response, 'confidence': round(max_prob, 3)}

This class loads the trained model, takes a user message, predicts the intent, checks if confidence is high enough, and returns a response with the confidence score. The confidence score is what saves you in defense — the panel will love that the bot can say “I don’t know” instead of guessing.
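Before the defense, it helps to check the bot against phrasings that are not copied from intents.json, since the panel will not type your exact patterns. Here is a small, hypothetical smoke-test helper: run_smoke_tests and the sample cases are made up for this sketch. It works with any function that returns the same dict shape as Chatbot.reply.

```python
def run_smoke_tests(reply, cases):
    """Call reply(message) for each case and return the ones whose tag is wrong."""
    failures = []
    for message, expected_tag in cases:
        result = reply(message)
        if result["tag"] != expected_tag:
            failures.append((message, expected_tag, result["tag"], result["confidence"]))
    return failures


# Hypothetical test phrases: paraphrases that are NOT copied from intents.json
CASES = [
    ("hello po", "greeting"),
    ("what time do you open", "hours"),
    ("where is your office", "location"),
    ("tell me a joke", "fallback"),
]

if __name__ == "__main__":
    try:
        from chatbot import Chatbot  # the class from Step 4; it loads model.pkl
        bot = Chatbot()
    except (ImportError, FileNotFoundError) as exc:
        print(f"Run this from the project folder after train.py: {exc}")
    else:
        failures = run_smoke_tests(bot.reply, CASES)
        for message, expected, got, confidence in failures:
            print(f"{message!r}: expected {expected}, got {got} ({confidence:.0%})")
        print(f"{len(CASES) - len(failures)}/{len(CASES)} passed")
```

With the small starter intents.json, expect some of these to miss. Each miss tells you which intent needs more paraphrased patterns.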

Step 5 — Build the Flask web app (app.py)

Create app.py:

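from flask import Flask, render_template, request, jsonify
from chatbot import Chatbot

app = Flask(__name__)
bot = Chatbot()  # loads model.pkl once at startup, so run train.py first

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/chat', methods=['POST'])
def chat():
    # silent=True returns None instead of raising on a missing or malformed JSON body
    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        data = {}
    message = str(data.get('message', ''))
    result = bot.reply(message)
    return jsonify(result)

if __name__ == '__main__':
    app.run(debug=True, port=5000)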

That’s the entire backend. One route to serve the page, one route to handle messages.
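You can also exercise the /chat route without a browser or a trained model. Here is a sketch using Flask’s built-in test client. It assumes a hypothetical create_app factory and a StubBot stand-in, neither of which is in app.py above, so it runs without model.pkl.

```python
from flask import Flask, jsonify, request


class StubBot:
    """Hypothetical stand-in for Chatbot so this sketch runs without model.pkl."""

    def reply(self, message):
        if message.strip().lower() == "hi":
            return {"tag": "greeting", "response": "Hello!", "confidence": 0.9}
        return {"tag": "fallback", "response": "I'm not sure.", "confidence": 0.0}


def create_app(bot):
    """Hypothetical app factory: a /chat route that takes the bot as a parameter."""
    app = Flask(__name__)

    @app.route("/chat", methods=["POST"])
    def chat():
        data = request.get_json(silent=True)
        if not isinstance(data, dict):
            data = {}  # missing, malformed, or non-object JSON body
        return jsonify(bot.reply(str(data.get("message", ""))))

    return app


if __name__ == "__main__":
    client = create_app(StubBot()).test_client()
    ok = client.post("/chat", json={"message": "hi"})
    print(ok.status_code, ok.get_json())
    # A plain-text body still gets a JSON answer instead of an error page
    bad = client.post("/chat", data="not json", content_type="text/plain")
    print(bad.status_code, bad.get_json())
```

Wrapping the routes in a factory like this is a common Flask pattern. If you adopt it, app.py can pass the real Chatbot() while tests pass StubBot().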

Step 6 — Build the chat UI

Create templates/index.html:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Capstone Chatbot</title>
  <link rel="stylesheet" href="/static/style.css" />
</head>
<body>
  <div class="chat-container">
    <header>
      <h1>Capstone Chatbot</h1>
    </header>
    <div id="messages" class="messages"></div>
    <form id="form" class="input-row">
      <input id="input" type="text" placeholder="Type your question..." autocomplete="off" />
      <button type="submit">Send</button>
    </form>
  </div>

  <script>
    const form = document.getElementById('form');
    const input = document.getElementById('input');
    const messages = document.getElementById('messages');

    function addMessage(text, who, confidence) {
      const div = document.createElement('div');
      div.className = 'msg ' + who;
      div.textContent = text + (confidence !== undefined ? ' (' + (confidence * 100).toFixed(0) + '%)' : '');
      messages.appendChild(div);
      messages.scrollTop = messages.scrollHeight;
    }

    form.addEventListener('submit', async (e) => {
      e.preventDefault();
      const text = input.value.trim();
      if (!text) return;
      addMessage(text, 'user');
      input.value = '';

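      try {
        const res = await fetch('/chat', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ message: text })
        });
        if (!res.ok) throw new Error('HTTP ' + res.status);
        const data = await res.json();
        addMessage(data.response, 'bot', data.confidence);
      } catch (err) {
        // Server stopped or returned an error: tell the user instead of failing silently
        addMessage('Sorry, the server could not be reached. Is app.py running?', 'bot');
      }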
    });
  </script>
</body>
</html>

Create static/style.css. The colors below are a sample palette; swap in your own:

* { box-sizing: border-box; }
body {
  font-family: system-ui, -apple-system, sans-serif;
  margin: 0;
  background: #fafafa;
  color: #2c3e50;
}
.chat-container {
  max-width: 600px;
  margin: 40px auto;
  background: white;
  border-radius: 12px;
  box-shadow: 0 4px 20px rgba(0,0,0,0.06);
  overflow: hidden;
}
header {
  background: #1F3A5F;
  color: white;
  padding: 16px 20px;
}
header h1 { margin: 0; font-size: 18px; }
.messages {
  height: 420px;
  overflow-y: auto;
  padding: 20px;
  display: flex;
  flex-direction: column;
  gap: 10px;
}
.msg {
  max-width: 75%;
  padding: 10px 14px;
  border-radius: 12px;
  line-height: 1.4;
}
.msg.user {
  align-self: flex-end;
  background: #C9A961;
  color: #1F3A5F;
}
.msg.bot {
  align-self: flex-start;
  background: #f0f3f7;
}
.input-row {
  display: flex;
  border-top: 1px solid #eee;
  padding: 12px;
  gap: 8px;
}
.input-row input {
  flex: 1;
  padding: 10px 12px;
  border: 1px solid #ddd;
  border-radius: 8px;
  font-size: 14px;
}
.input-row button {
  background: #1F3A5F;
  color: white;
  border: none;
  padding: 0 20px;
  border-radius: 8px;
  cursor: pointer;
}

Step 7 — Run the chatbot

Train the model (once now, and again every time you change intents.json; restart app.py afterward so it loads the new model.pkl):

python train.py

Run the server:

python app.py

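Open your browser at http://localhost:5000. Type “hi” and watch it respond. Type “what time are you open” and watch it match the hours intent. Type something with no overlap with your patterns, like “tell me a joke”, and watch the fallback kick in. (Avoid “hello world” as your off-topic test: “hello” is one of the greeting patterns, so the bot will lean toward greeting.)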

If you got this far in 30 minutes, you’ve finished what most chatbot capstone groups don’t reach until week 4 of development.

How to defend this in your panel

The four questions every chatbot panel asks, and what to say.

“Is this just ChatGPT?” No. The model is a Logistic Regression classifier trained on our own intents.json file. You can see the model file at model.pkl. No external API is called at any point. (Pull up the project folder during defense if needed.)

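“What model and why?” TF-IDF for feature extraction, Logistic Regression for classification. (Back it up with a comparison table in your appendix: train Naive Bayes and Random Forest on the same TF-IDF features and report each model’s macro F1 on held-out data. If Logistic Regression wins, say so. If another model edges it out, report that honestly and explain why you kept Logistic Regression, for example training speed or the probability scores your confidence threshold depends on.)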
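Here is a minimal sketch of that comparison, assuming the intents.json layout from Step 2. Each model gets identical TF-IDF features inside a scikit-learn pipeline, and each score is the mean macro F1 over stratified cross-validation folds. The compare_models name and the Random Forest settings are illustrative choices, not requirements.

```python
import json

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def compare_models(X, y, max_folds=5):
    """Mean and standard deviation of cross-validated macro F1 for each model."""
    # The smallest intent caps the fold count, so every test fold contains each intent
    folds = min(max_folds, min(y.count(tag) for tag in set(y)))
    if folds < 2:
        raise ValueError("every intent needs at least 2 patterns for cross-validation")
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=42)

    candidates = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": MultinomialNB(),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    }
    results = {}
    for name, clf in candidates.items():
        # Identical TF-IDF features for every model, refit on each training fold
        pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
        scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1_macro")
        results[name] = (float(scores.mean()), float(scores.std()))
    return results


if __name__ == "__main__":
    try:
        with open("intents.json", "r", encoding="utf-8") as f:
            intents = [i for i in json.load(f)["intents"] if i["tag"] != "fallback"]
    except FileNotFoundError:
        print("intents.json not found; run this from your project folder.")
    else:
        X = [p.lower() for i in intents for p in i["patterns"]]
        y = [i["tag"] for i in intents for _ in i["patterns"]]
        ranked = sorted(compare_models(X, y).items(), key=lambda kv: kv[1][0], reverse=True)
        for name, (mean, std) in ranked:
            print(f"{name:20s} macro F1 = {mean:.3f} +/- {std:.3f}")
```

Whatever these numbers say on your data is what goes in the appendix table. With only a few patterns per intent the standard deviations will be large, so mention that too.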

“Where did your data come from?” We built the intents.json ourselves by interviewing X staff members at Y office and collecting the most common questions. We have Z intents with an average of N example patterns per intent.

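“What’s your accuracy?” Show a per-intent classification report: precision, recall, and F1 for each intent. Use one measured on held-out or cross-validated data, not the training-set report from train.py, which will look unrealistically good. The panel doesn’t need a single overall number; they need to see that you understand which intents the model handles well and which need more training data.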

If you can answer those four questions calmly, you’ll pass.

How to customize this for your domain

The whole point of this setup is that the code stays the same and only your intents.json changes. Some directions:

School FAQ chatbot. Intents like enrollment, schedule, fees, scholarships, lost ID, graduation requirements. Interview your registrar for the questions.
Clinic appointment bot. Intents for appointment booking, services, doctors available, hours, emergency contacts.
Barangay services chatbot. Clearance, certifications, complaints, contact info, schedules.
Library reference bot. Borrowing rules, fines, hours, book search instructions.
Small business customer support. Hours, location, products, returns, contact.

For local relevance, add Tagalog, Bisaya, or Hiligaynon patterns to each intent. TF-IDF treats every word as just a token and doesn’t care which language it comes from, so mixed-language input works as long as those words appear somewhere in your patterns.

Common errors and how to fix them

ModuleNotFoundError: No module named 'sklearn' — you missed pip install scikit-learn. Rerun the install command from Step 1.

FileNotFoundError: 'model.pkl' — you ran app.py before train.py. Run python train.py first.

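Port 5000 already in use — change the port in app.py to 5001 or 8000, then open that port in your browser instead. On macOS, the AirPlay Receiver often occupies port 5000.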

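Bot says “I’m not sure” to every message — your training data is too small or too repetitive. Add more patterns per intent, aiming for at least 5 to 10 example patterns each. With many intents, probabilities also spread thinner, so you may need to lower the threshold in chatbot.py slightly; re-test with off-topic messages after any change.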

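NLTK punkt download fails — try python -m nltk.downloader punkt from the terminal instead of inside Python. Newer NLTK releases (3.9 and later) look for the punkt_tab resource instead, so download that as well.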

Confidence is always 0.95+ — you don’t have enough variety in your patterns. The model is memorizing instead of generalizing. Add paraphrased versions of your patterns.
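For both of the last two problems, it helps to look past the single top score. Here is a small, hypothetical helper: top_intents is a name made up for this sketch. It loads model.pkl as saved by train.py and lists the three most likely intents for a message.

```python
import pickle

import numpy as np


def top_intents(model, vectorizer, message, k=3):
    """Return the k most likely (tag, probability) pairs for one message, highest first."""
    probs = model.predict_proba(vectorizer.transform([message.lower()]))[0]
    best = np.argsort(probs)[::-1][:k]
    return [(str(model.classes_[i]), float(probs[i])) for i in best]


if __name__ == "__main__":
    try:
        # model.pkl as written by train.py: {'model': ..., 'vectorizer': ...}
        with open("model.pkl", "rb") as f:
            saved = pickle.load(f)
    except FileNotFoundError:
        print("model.pkl not found; run python train.py first.")
    else:
        for message in ["hi", "what time are you open", "tell me a joke"]:
            ranked = top_intents(saved["model"], saved["vectorizer"], message)
            print(message, "->", ", ".join(f"{tag} {p:.0%}" for tag, p in ranked))
```

If the top two intents are close, their patterns overlap and need more distinctive examples. If every probability is low, the message shares few words with any pattern.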

How to extend this project (beyond the basics)

Once your basic version works, here are extensions that look great in your Chapter 5 (Recommendations):

Database integration. Log every conversation to MySQL or SQLite. Your panel will love seeing analytics in your demo.
Voice input. Add the speech_recognition library and a microphone button to the UI.
Multilingual support. Train separate models per language, or one combined model with language-tagged intents.
Hybrid LLM mode. If confidence is below 0.3, optionally pass the message to a small open-source LLM (for example a Phi or Mistral model from Hugging Face) for a generated answer. Document the privacy trade-offs.
Deploy to the web. Platforms such as Render and Railway can host a small Flask app like this one; check their current free-tier or trial limits, since those change often. Demo it live from a public URL during defense. Panels remember the team whose project actually worked over WiFi.
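For the database idea, here is a minimal SQLite sketch using only Python’s built-in sqlite3 module. The table layout, the chat_logs.db file name, and the helper names (init_db, log_reply, intent_counts) are assumptions for illustration. log_reply expects the dict that Chatbot.reply() returns.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(path="chat_logs.db"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS conversations (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               created_at TEXT NOT NULL,
               message TEXT NOT NULL,
               tag TEXT NOT NULL,
               confidence REAL NOT NULL
           )"""
    )
    conn.commit()
    return conn


def log_reply(conn, message, result):
    """Store one user message plus the tag and confidence from Chatbot.reply()."""
    conn.execute(
        "INSERT INTO conversations (created_at, message, tag, confidence) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), message,
         str(result["tag"]), float(result["confidence"])),
    )
    conn.commit()


def intent_counts(conn):
    """Demo analytics: how many messages landed in each intent, most common first."""
    return conn.execute(
        "SELECT tag, COUNT(*) AS n FROM conversations GROUP BY tag ORDER BY n DESC, tag"
    ).fetchall()


if __name__ == "__main__":
    conn = init_db()
    log_reply(conn, "hi", {"tag": "greeting", "response": "Hello!", "confidence": 0.91})
    print(intent_counts(conn))
    conn.close()
```

To wire it in, call log_reply inside the /chat route right after bot.reply(message). Flask’s development server handles requests on multiple threads, and by default an sqlite3 connection can only be used in the thread that created it. So open a connection inside the route and close it afterward, rather than sharing one global connection.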

Free download — source code

Grab the full source code. The folder includes everything above plus a sample expanded intents.json (35 intents, 250+ patterns) you can use as a starting point for your own.

Downloadable Source Code

UML diagrams you’ll need for your documentation

Most schools require these for an AI capstone:

Use Case Diagram — actors and what they can do
Activity Diagram — the conversation flow from user input to response
Sequence Diagram — the request from browser to Flask to model and back
Class Diagram — the Chatbot class and its dependencies
DFD Level 0 and Level 1 — data flow from user, through preprocessing, into the model, and out

We have detailed guides on each of these. Use them as templates and adapt to this project specifically.

Frequently Asked Questions

Is a chatbot a good capstone project for IT students?

Yes, a chatbot is one of the strongest AI capstone choices for IT students in 2026, as long as you train your own model instead of just wrapping an external API. Chatbots are easy to demo, easy to extend, and applicable to almost any domain such as schools, clinics, barangay offices, small businesses, and libraries. Panels approve them quickly because the use case is concrete and the AI does real work.

Do I need to use ChatGPT to make a chatbot capstone?

No, you do not need to use ChatGPT or any external API for a chatbot capstone. The chatbot in this guide uses a TF-IDF and Logistic Regression model trained on your own data, all running locally. Many panels actually prefer this approach because the AI part is something you built, not something you called. If you want to add LLM features later, you can do so as an optional extension.

What machine learning model does this chatbot use?

This chatbot uses TF-IDF vectorization to convert text into numerical features, then a Logistic Regression classifier to predict the intent of each user message. The combination is well-documented, fast to train, and easy to defend in a capstone panel. If you want to compare with other models, you can also try Naive Bayes or Random Forest using the same training script with minor changes.

How do I add my own questions and answers?

Open intents.json and add a new intent block with a unique tag, a list of example user questions called patterns, and a list of possible chatbot responses. Save the file, then re-run python train.py to retrain the model on the updated data. Restart python app.py afterward: the app loads model.pkl and intents.json once at startup, so a server that is already running keeps using the old model. Aim for at least 5 to 10 example patterns per intent for reliable predictions.

Can I use this chatbot for my Tagalog or Filipino capstone?

Yes, you can add Tagalog, Bisaya, Hiligaynon, or any other local language patterns directly into intents.json. The TF-IDF and Logistic Regression model does not require a separate language model and handles mixed Tagalog-English input well as long as you provide enough examples per intent. For a stronger Filipino-language capstone, add at least 8 to 10 Tagalog patterns per intent alongside the English ones.

Pick it up. Make it yours. Defend it well.

Most chatbot capstones fail because students treat the chatbot as the whole project. It isn’t. The chatbot is the engine. Your data, your domain, your interface, and your documentation are the project.

If you’ve got this running, you’re already ahead of most groups.

For more AI capstone source code to study, browse our Python projects library. If you haven’t picked your capstone topic yet, see 150 Best Capstone Project Ideas for IT Students 2026 for the full list. And for the UML diagrams panels will ask you to draw, our UML guides cover every diagram type.

Now stop reading. Open VS Code. Train your first model tonight.
