🎓 Free Capstone Projects with Full Documentation, ER Diagrams & Source Code — Updated Weekly for 2026
👨‍💻 Free Source Code & Capstone Projects for Developers

How to Build a RAG Capstone Project (Step-by-Step with Code)

RAG is the most-approved LLM capstone pattern in 2026. Panels approve it because the engineering is real. Students approve it because once you understand the four steps, it’s not hard.

Most students hear “RAG” and assume it’s some elite-tier AI thing. It’s not. The whole pipeline is: parse documents, chunk them, embed them, store them, search for relevant chunks at query time, stuff them into the LLM prompt, generate the answer. That’s it. Four steps. About 250 lines of Python.

This guide walks through all four steps. By the end, you’ll have a working Q&A system over a folder of PDFs — your school handbook, your barangay services manual, your hospital’s clinical guidelines, whatever your capstone domain is. Cost during development: about 50 pesos in OpenAI API charges. Or free if you use the local LLM alternative covered at the end.

RAG capstone project in Python with ChromaDB and Flask

What you’ll build

A Flask web app where a user asks a question in natural language and the system answers using your indexed documents. Every answer includes source citations (which file, which page) so you can verify it’s not hallucinating.

Features

  • Ingest PDFs from a folder
  • Chunk + embed + store in ChromaDB (free, local vector database)
  • Flask web app with question input and cited answer display
  • Top-k retrieval with similarity scores
  • Source citations on every answer (filename + page number)
  • Fallback message when no relevant context is found

Tech stack

  • Python 3.10 or higher
  • ChromaDB (free, local, persistent vector database)
  • pypdf (PDF text extraction)
  • OpenAI Python SDK (embeddings + chat completion)
  • Flask (web server)
  • python-dotenv (API key management)
  • About 250 lines of code total

The Ollama + Llama 3.1 free-local-alternative is covered near the end if your school doesn’t allow paid APIs.

How RAG actually works

Four steps. Every RAG system is built on this pattern. Memorize this — your panel will ask.

1. Ingest. You take your documents (PDFs, Word files, web pages, whatever) and split them into chunks of ~500 tokens each. Each chunk gets turned into a vector (an embedding) using an embedding model. The chunks and their embeddings are stored in a vector database with metadata showing where each chunk came from.

2. Retrieve. When a user asks a question, you turn the question into a vector using the same embedding model. You search the vector database for the top-k chunks whose vectors are closest to the question’s vector. That’s your “relevant context.”

3. Augment. You build a prompt that includes the retrieved chunks plus the original question. Something like: “Based on the following context, answer the question. Context: [retrieved chunks]. Question: [user input].”

4. Generate. You send the prompt to an LLM. The LLM generates the answer based only on what’s in the context — not on its training data. That’s the magic, and the defensibility.

Draw this diagram in your Chapter 3. Every RAG panel will want to see it.

Why RAG beats fine-tuning and vanilla ChatGPT

Three reasons RAG is better than the alternatives for a capstone:

Vanilla ChatGPT can’t answer questions about your data. Your school handbook isn’t in its training set. Even if you paste the whole handbook into the chat, you’ll hit context limits. RAG solves this by selectively retrieving only the relevant parts.

Fine-tuning is expensive, slow, and brittle. To fine-tune a model on your documents, you need thousands of labeled examples, hours of GPU time, and the result is locked to that specific data — change a document and you re-fine-tune everything. RAG updates instantly when you re-ingest a changed file.

RAG is defensible. The panel can see exactly where each answer came from (the source citation). They can verify the answer against the actual document. That transparency is gold during defense.

Before you start

You need:

  • Python 3.10 or higher
  • An OpenAI API key (set a usage cap of 500 pesos to be safe — actual development cost is around 50)
  • 5 to 10 PDF documents to index. We’ll talk about what makes a good document set in a moment.
  • About 60 minutes for the first complete run

If your school doesn’t allow paid APIs, skip ahead to the “Free local alternative” section near the end — same code, swap two libraries.

The document choice — what makes RAG capstones defensible

This is where most students get lazy and pay for it in defense.

A defensible RAG capstone has a document set the panel can verify is real, relevant, and substantial. Some examples that work well:

  • School handbook + faculty manual + policies — partner with your registrar
  • Barangay services manual + ordinances — partner with your local hall
  • Hospital clinical guidelines (with disclaimers, non-diagnostic only) — partner with a clinic adviser
  • Research paper collection in a specific field — 50 papers minimum
  • Legal contracts — anonymized, with permission
  • Filipino recipe collection — published cookbooks or a recipe blog you have rights to
  • Tourism guide for one province or region — DOT brochures, municipal tourism sites
  • HR policy + benefits manual — partner with a small business

The defense-winning sentence: “We ingested 47 PDFs containing 312 pages of barangay ordinances from Binalbagan, Negros Occidental, organized into 1,148 retrievable chunks.” Specific numbers. Specific source. Verifiable.

The defense-losing sentence: “We used some PDFs we downloaded from the internet.”

Project file structure

rag-capstone/
├── ingest.py
├── rag.py
├── app.py
├── .env
├── .env.example
├── requirements.txt
├── documents/
│   ├── handbook.pdf
│   └── policies.pdf
├── chroma_db/
├── templates/
│   └── index.html
└── static/
    └── style.css

The chroma_db/ folder will be created automatically when you run ingest.py. Don’t commit it to Git — add it to .gitignore.

Step 1 — Install the dependencies

In your terminal, inside the project folder:

pip install openai chromadb pypdf flask python-dotenv tiktoken

Create requirements.txt:

openai==1.40.0
chromadb==0.4.24
pypdf==4.0.0
flask==3.0.0
python-dotenv==1.0.1
tiktoken==0.6.0

ChromaDB will take a minute to install — it’s a heavier dependency than the others. Don’t worry about the warnings during install.

Step 2 — Set up your OpenAI API key

Sign up at platform.openai.com, add a payment method, and create an API key. Then in your project folder, create two files:

.env (this stays on your computer, never commit it):

OPENAI_API_KEY=sk-proj-your-actual-key-here

.env.example (this you can commit, shows the format without leaking the key):

OPENAI_API_KEY=your-key-here

Add .env to your .gitignore. Forgetting this step is how students accidentally leak API keys to GitHub and wake up to a 5000-peso bill from someone using their key.

For cost expectations: ingesting 50 PDFs of 10 pages each costs about 5 pesos in embedding fees. Each question costs about 0.10 to 0.30 pesos. Total development: 30 to 80 pesos across the whole capstone.

Step 3 — Build the ingest pipeline (ingest.py)

This script walks your documents/ folder, parses each PDF, chunks the text, embeds the chunks, and stores them in ChromaDB with metadata.

Create ingest.py:

import os
from pypdf import PdfReader
from openai import OpenAI
from dotenv import load_dotenv
import chromadb
import tiktoken

load_dotenv()
client = OpenAI()
chroma_client = chromadb.PersistentClient(path='chroma_db')
collection = chroma_client.get_or_create_collection('documents')

CHUNK_SIZE = 500
CHUNK_OVERLAP = 50
EMBED_MODEL = 'text-embedding-3-small'
encoder = tiktoken.get_encoding('cl100k_base')

def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    tokens = encoder.encode(text)
    chunks = []
    i = 0
    while i < len(tokens):
        chunk_tokens = tokens[i:i + size]
        chunks.append(encoder.decode(chunk_tokens))
        i += size - overlap
    return chunks

def embed(texts):
    response = client.embeddings.create(input=texts, model=EMBED_MODEL)
    return [item.embedding for item in response.data]

def ingest_pdf(path):
    reader = PdfReader(path)
    filename = os.path.basename(path)
    docs, ids, metas = [], [], []
    chunk_idx = 0
    for page_num, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ''
        if not text.strip():
            continue
        for chunk in chunk_text(text):
            docs.append(chunk)
            ids.append(f"{filename}_p{page_num}_c{chunk_idx}")
            metas.append({'source': filename, 'page': page_num})
            chunk_idx += 1

    if not docs:
        print(f"  No text extracted from {filename}")
        return

    print(f"  Embedding {len(docs)} chunks...")
    for i in range(0, len(docs), 100):
        batch = docs[i:i+100]
        embeddings = embed(batch)
        collection.add(
            documents=batch,
            embeddings=embeddings,
            ids=ids[i:i+100],
            metadatas=metas[i:i+100]
        )
    print(f"  Done: {filename}")

if __name__ == '__main__':
    docs_folder = 'documents'
    pdfs = [f for f in os.listdir(docs_folder) if f.lower().endswith('.pdf')]
    print(f"Found {len(pdfs)} PDFs")
    for pdf in pdfs:
        print(f"\nIngesting {pdf}")
        ingest_pdf(os.path.join(docs_folder, pdf))

    print(f"\nTotal chunks in collection: {collection.count()}")

Drop a few PDFs into the documents/ folder, then run:

python ingest.py

You’ll see each PDF being parsed, chunked, and embedded. Total time depends on document size — expect 30 seconds to 5 minutes for a typical handbook.

The script is idempotent only if you change the IDs — if you re-run with the same PDFs, you’ll get a “duplicate ID” error. Either delete chroma_db/ first or add an existence check.

Step 4 — Build the RAG wrapper (rag.py)

This class connects to ChromaDB, embeds the user question, retrieves the top-k chunks, builds the prompt, and calls the LLM.

Create rag.py:

from openai import OpenAI
from dotenv import load_dotenv
import chromadb

load_dotenv()
client = OpenAI()
chroma_client = chromadb.PersistentClient(path='chroma_db')
collection = chroma_client.get_collection('documents')

EMBED_MODEL = 'text-embedding-3-small'
CHAT_MODEL = 'gpt-4o-mini'
TOP_K = 4

SYSTEM_PROMPT = (
    "You are a helpful assistant that answers questions based ONLY on the "
    "provided context. If the context does not contain the answer, reply with "
    "'I do not have that information in my documents.' Always cite the source "
    "filename and page number from the context. Do not make up information."
)

class RAGSystem:
    def __init__(self, top_k=TOP_K):
        self.top_k = top_k

    def retrieve(self, question):
        question_embedding = client.embeddings.create(
            input=[question], model=EMBED_MODEL
        ).data[0].embedding

        results = collection.query(
            query_embeddings=[question_embedding],
            n_results=self.top_k
        )
        return [
            {
                'text': doc,
                'source': meta['source'],
                'page': meta['page'],
                'distance': dist
            }
            for doc, meta, dist in zip(
                results['documents'][0],
                results['metadatas'][0],
                results['distances'][0]
            )
        ]

    def answer(self, question):
        chunks = self.retrieve(question)
        if not chunks:
            return {'answer': 'No documents indexed.', 'sources': []}

        context = '\n\n'.join(
            f"[{c['source']} page {c['page']}]\n{c['text']}"
            for c in chunks
        )
        user_prompt = f"Context:\n{context}\n\nQuestion: {question}"

        completion = client.chat.completions.create(
            model=CHAT_MODEL,
            messages=[
                {'role': 'system', 'content': SYSTEM_PROMPT},
                {'role': 'user', 'content': user_prompt}
            ],
            temperature=0.2
        )
        answer = completion.choices[0].message.content
        sources = [
            {'source': c['source'], 'page': c['page'], 'distance': round(c['distance'], 3)}
            for c in chunks
        ]
        return {'answer': answer, 'sources': sources}

The temperature=0.2 keeps the LLM close to the source material instead of getting creative. For a Q&A system, you want low temperature. For creative writing, you’d raise it.

The SYSTEM_PROMPT is the most important piece. It tells the model to only use the provided context and to admit when it doesn’t know. Without this, the model will hallucinate confidently.

Step 5 — Build the Flask app (app.py)

Create app.py:

from flask import Flask, render_template, request, jsonify
from rag import RAGSystem

app = Flask(__name__)
rag = RAGSystem()

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/ask', methods=['POST'])
def ask():
    question = request.json.get('question', '').strip()
    if not question:
        return jsonify({'error': 'Please enter a question.'}), 400

    result = rag.answer(question)
    return jsonify(result)

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Two routes. The home page renders the form. The ask route takes a question and returns the answer with sources.

Step 6 — Build the UI

Create templates/index.html:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>RAG Capstone — Document Q&A</title>
  <link rel="stylesheet" href="/static/style.css" />
</head>
<body>
  <div class="container">
    <header>
      <h1>Document Q&A</h1>
      <p>Ask questions about your indexed documents.</p>
    </header>

    <form id="form">
      <textarea id="question" rows="3" placeholder="Ask a question about your documents..."></textarea>
      <button type="submit">Ask</button>
    </form>

    <div id="status" class="status hidden">Thinking...</div>

    <div id="result" class="result hidden">
      <h2>Answer</h2>
      <p id="answer"></p>
      <h3>Sources</h3>
      <ul id="sources"></ul>
    </div>
  </div>

  <script>
    const form = document.getElementById('form');
    const status = document.getElementById('status');
    const result = document.getElementById('result');
    const answerEl = document.getElementById('answer');
    const sourcesEl = document.getElementById('sources');

    form.addEventListener('submit', async (e) => {
      e.preventDefault();
      const question = document.getElementById('question').value.trim();
      if (!question) return;

      result.classList.add('hidden');
      status.classList.remove('hidden');

      const res = await fetch('/ask', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question })
      });
      const data = await res.json();

      status.classList.add('hidden');
      answerEl.textContent = data.answer || data.error;
      sourcesEl.innerHTML = '';
      (data.sources || []).forEach(s => {
        const li = document.createElement('li');
        li.textContent = s.source + ' — page ' + s.page + ' (distance ' + s.distance + ')';
        sourcesEl.appendChild(li);
      });
      result.classList.remove('hidden');
    });
  </script>
</body>
</html>

Create static/style.css:

* { box-sizing: border-box; }
body {
  font-family: system-ui, -apple-system, sans-serif;
  margin: 0;
  background: #fafafa;
  color: #2c3e50;
}
.container {
  max-width: 720px;
  margin: 40px auto;
  background: white;
  border-radius: 12px;
  box-shadow: 0 4px 20px rgba(0,0,0,0.06);
  padding: 28px;
}
header h1 {
  margin: 0 0 4px;
  color: #1F3A5F;
}
header p { margin: 0 0 20px; color: #5a6a7a; }
textarea {
  width: 100%;
  padding: 12px;
  border: 1px solid #ddd;
  border-radius: 8px;
  font-size: 14px;
  font-family: inherit;
  resize: vertical;
}
button {
  background: #1F3A5F;
  color: white;
  border: none;
  padding: 12px 24px;
  border-radius: 8px;
  cursor: pointer;
  margin-top: 8px;
  font-size: 14px;
}
button:hover { background: #163049; }
.status {
  margin-top: 16px;
  padding: 12px;
  background: #fdfaf2;
  border-left: 4px solid #C9A961;
  border-radius: 4px;
  color: #5a6a7a;
}
.status.hidden { display: none; }
.result { margin-top: 24px; padding-top: 24px; border-top: 1px solid #eee; }
.result.hidden { display: none; }
.result h2 { color: #1F3A5F; margin: 0 0 8px; }
.result h3 { color: #5a6a7a; margin: 16px 0 8px; font-size: 14px; text-transform: uppercase; letter-spacing: 0.5px; }
.result p {
  background: #f0f3f7;
  padding: 14px;
  border-radius: 8px;
  line-height: 1.6;
  white-space: pre-wrap;
}
.result ul {
  list-style: none;
  padding: 0;
  margin: 0;
}
.result li {
  background: #fdfaf2;
  border-left: 3px solid #C9A961;
  padding: 8px 12px;
  margin-bottom: 6px;
  border-radius: 4px;
  font-size: 14px;
}

Step 7 — Run the system

Make sure you’ve already run ingest.py and your chroma_db/ folder exists. Then start the server:

python app.py

Open http://localhost:5000. Ask a question that should be answerable from your documents — like “What are the requirements for graduation?” if you indexed your school handbook. The answer comes back with source citations below.

Try a question your documents don’t cover, like “Who won the NBA finals in 2024?” Watch the model say “I do not have that information in my documents.” That refusal is the defense-saving behavior — the model isn’t hallucinating, it’s admitting uncertainty.

How to defend a RAG capstone

Four questions you’ll definitely hear.

“Is this just ChatGPT?” No. Vanilla ChatGPT can’t answer questions about our indexed documents — they aren’t in its training data. The retrieval step is where our engineering lives. Show the architecture diagram. Point at the chunking, embedding, vector search. The LLM only sees what we retrieved.

“How do you know the answer is correct?” Two answers. First, every response includes source citations — the panel can verify against the original document. Second, we built an evaluation set of 30 question-answer pairs the answers should match, and our accuracy on that set is X%. Have the eval set in your appendix.

“What if the documents don’t have the answer?” The system prompt instructs the model to reply “I do not have that information in my documents” when context is irrelevant. Demo this live by asking an off-topic question.

“What about hallucinations?” Three mitigations: low temperature (0.2) keeps the model close to the source, the system prompt forbids inventing information, and source citations let users verify. We accept the model can still hallucinate occasionally, but the citation layer makes hallucinations detectable.

Answer those four calmly and you’ll pass.

How to customize for your domain

Same code, different documents in the documents/ folder.

  • School handbook + policies bot — partner with your registrar for the documents
  • Barangay services Q&A — partner with your local hall
  • Hospital clinical guidelines lookup — non-diagnostic, with disclaimers
  • Legal contract Q&A — anonymized contracts with permission
  • Local cookbook + recipe assistant — Filipino dishes, regional cuisines
  • Tourism guide for one province — DOT materials + municipal sites
  • Code documentation search — index your own codebase as text
  • HR policy chatbot — employee handbooks, benefits manuals

The code doesn’t care about the domain. It cares about the document set being substantial, relevant, and verifiable.

Common errors and how to fix them

Error code: 401 — invalid OpenAI API key. Check .env for typos and confirm the key is active at platform.openai.com.

Rate limit exceeded during ingest — you’re hitting OpenAI’s per-minute embedding limit. Add a 1-second sleep between batches in ingest.py.

RuntimeError: collection 'documents' not found — you ran app.py before ingest.py. Run ingestion first.

pypdf.errors.PdfReadError on a specific PDF — that PDF is corrupted or scanned (no text layer). Either skip it or run it through OCR first using Tesseract.

The bot says “I do not have that information” for everything — chunks too small or embedding model mismatch. Confirm CHUNK_SIZE is 500 and you’re using the same EMBED_MODEL in both ingest.py and rag.py.

Answers reference content not in the documents (hallucination) — your SYSTEM_PROMPT isn’t strict enough, or your retrieved chunks have very high distance scores. Add a distance threshold: only use chunks with distance below 0.5.

ChromaDB throws on re-ingest — delete the chroma_db/ folder before re-running ingest.py, OR add an “upsert” check on document IDs.

How to extend this project

Chapter 5 (Recommendations) extensions panels love:

  • Replace OpenAI with Ollama + Llama 3.1 — free, local, no API key. See the next section.
  • Add reranking — use a cross-encoder model (like cross-encoder/ms-marco-MiniLM-L-6-v2) to rerank the top-20 retrieved chunks down to top-4. Usually improves accuracy by 5-15%.
  • Multi-modal RAG — index images alongside text using CLIP embeddings.
  • Hybrid search — combine BM25 keyword search with embedding search for better retrieval on rare terms.
  • Conversational RAG — track chat history so follow-up questions like “what about the second one?” make sense.
  • RAGAS evaluation harness — formal RAG evaluation metrics for Chapter 4.
  • Deploy to Render or Railway — public URL, demo from anywhere during defense.

Free local alternative — Ollama + sentence-transformers

If your school doesn’t allow paid APIs, swap two libraries and the rest of the code stays the same.

Install Ollama from ollama.ai, then pull a model:

ollama pull llama3.1:8b

Install Python dependencies:

pip install sentence-transformers ollama

In ingest.py and rag.py, replace the OpenAI embedding calls with sentence-transformers:

from sentence_transformers import SentenceTransformer
embedder = SentenceTransformer('all-MiniLM-L6-v2')

def embed(texts):
    return embedder.encode(texts).tolist()

Replace the OpenAI chat completion with Ollama:

import ollama

response = ollama.chat(
    model='llama3.1:8b',
    messages=[
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': user_prompt}
    ]
)
answer = response['message']['content']

Quality drops a bit (Llama 3.1 8B is slightly weaker than GPT-4o-mini for complex questions), but it’s free, fully local, and survives the “you just paid OpenAI to do your capstone” criticism. For most BSIT capstones, this is the better choice.

Free download — source code

UML diagrams you’ll need for documentation

RAG projects have specific diagram needs:

  • Use Case Diagram — actors are user (asks questions) and admin (uploads documents); main use cases include question answering and document ingestion.
  • Sequence Diagram — request flow: user submits question → embed → query vector DB → top-k retrieve → build prompt → LLM call → return answer with citations.
  • Activity Diagram — two pipelines: ingestion (parse → chunk → embed → store) and query (embed → retrieve → augment → generate).
  • Data Flow Diagram — document → text → chunks → embeddings → vector DB → context → answer.
  • Class Diagram — RAGSystem class, embedding service, vector store wrapper, Flask routes.

We have detailed guides on each. Use them as templates and adapt to RAG specifically — your panel will love seeing both the ingestion and query pipelines as separate activity diagrams.


Frequently Asked Questions

What is a RAG capstone project?
A RAG (Retrieval-Augmented Generation) capstone project is an AI system that answers questions using your own documents instead of relying only on what an LLM learned during training. The system parses your documents, chunks them into smaller pieces, embeds each chunk as a vector, stores the vectors in a database, and at query time retrieves the most relevant chunks to feed into an LLM as context. RAG is the most defensible LLM capstone pattern in 2026 because the data layer is your engineering contribution, not just an API call.
Do I need ChatGPT API for a RAG project?
No, you do not need a paid ChatGPT API for a RAG capstone. You can build the entire system using Ollama with Llama 3.1 (free, runs locally on a 16GB RAM laptop) and sentence-transformers for embeddings (also free). The code structure stays the same — you only swap two libraries. Using local models is often more defensible during a panel because you can speak directly to the privacy and cost benefits of not sending data to OpenAI.
How is RAG different from fine-tuning?
RAG retrieves relevant document chunks at query time and feeds them to a general-purpose LLM as context. Fine-tuning bakes your data directly into the model weights through additional training. RAG is faster to set up (minutes versus hours), easier to update (just re-ingest changed documents), and more defensible because every answer cites which document it came from. Fine-tuning is more appropriate when you need the model to learn a specific writing style or to handle highly technical jargon not easily retrieved. For most student capstones, RAG is the right choice.
How many documents do I need for a RAG capstone?
A defense-ready RAG capstone should have at least 5 to 10 substantial documents (50+ pages total minimum), with 20 to 50 documents being ideal. Quality matters more than quantity — 10 well-chosen documents from a single real domain (your school’s handbook, a hospital’s protocols, a barangay’s records) are more defensible than 100 random PDFs from the internet. The defense-winning sentence is something like “We ingested 47 PDFs containing 312 pages of barangay ordinances from Binalbagan, organized into 1,148 retrievable chunks.”
Can I build RAG without paying for OpenAI?
Yes, you can build a complete RAG capstone without any paid API. Use Ollama (free LLM runner) with Llama 3.1 or Mistral as the language model, and sentence-transformers (free Python library) for embeddings. ChromaDB itself is free. Total cost: zero. The trade-off is slightly lower answer quality compared to GPT-4o-mini and slower inference on a laptop without a GPU, but for most BSIT capstones the difference is negligible during defense, and the privacy argument (no data leaves your machine) is often a stronger defense angle.

Index your docs tonight. Defend the system in a month.

RAG is the cleanest engineering story you can tell in a 2026 capstone defense. Four pipeline steps. Real documents. Source citations. A model that admits when it doesn’t know.

If you’ve followed this guide, you’re already past the point where most LLM capstones fall apart.

For more defensible LLM capstone patterns, see our 100 AI Capstone Project Ideas for IT Students (2026). If you haven’t picked your topic yet, browse 150 Best Capstone Project Ideas for IT Students 2026. For other Python AI source code to study, see our Python projects library. And for the UML diagrams your documentation will need, our UML guides cover every diagram type panels ask about.

Now stop scrolling. Drop your PDFs in the folder. Run ingest tonight.

Leave a Comment