You called orders.merge(customers, on="customer_id") and got KeyError. Both DataFrames have a “customer_id” column, you can see them, but pandas insists one is missing. This guide walks through the 4 most common causes.

📌 Quick answer: Print orders.columns.tolist() and customers.columns.tolist(). The join column name must match exactly in both. If one side has it as the index, use left_on="customer_id", right_index=True instead of on=.
Cause 1: Column name mismatch (case or whitespace)
“customer_id” vs “Customer_ID” vs “customer_id ” (trailing space) are all different to pandas.
orders.columns.tolist() # ['order_id', 'customer_id', 'amount']
customers.columns.tolist() # ['Customer_ID', 'name', 'email']
orders.merge(customers, on="customer_id") # ❌ KeyError: 'customer_id' not in customers
# Fix: normalize both sides
orders.columns = orders.columns.str.strip().str.lower()
customers.columns = customers.columns.str.strip().str.lower()
orders.merge(customers, on="customer_id") # ✓Cause 2: Join key is in the index, not a column
If you did customers.set_index("customer_id") earlier, customer_id is no longer a column.
customers = customers.set_index("customer_id")
orders.merge(customers, on="customer_id") # ❌ KeyError
# Fix 1: use right_index
orders.merge(customers, left_on="customer_id", right_index=True)
# Fix 2: reset the index
customers = customers.reset_index()
orders.merge(customers, on="customer_id") # ✓Cause 3: Column has different names in each DataFrame
“customer_id” in orders, “id” in customers, the natural join key but different labels.
orders.merge(customers, on="customer_id") # ❌ KeyError: 'customer_id' not in customers
# Fix: use left_on and right_on
orders.merge(customers, left_on="customer_id", right_on="id")Cause 4: dtype mismatch causes empty result, looks like KeyError downstream
Both columns exist with matching names, but one is int and the other is string. Merge succeeds but produces empty rows, and downstream code (like result["amount"]) raises KeyError because the column is now empty.
print(orders["customer_id"].dtype) # int64
print(customers["customer_id"].dtype) # object (string)
# Fix: cast to matching dtype first
customers["customer_id"] = customers["customer_id"].astype(int)
result = orders.merge(customers, on="customer_id") # ✓ rows now matchPrevention
- Normalize column names on both DataFrames immediately after load
- Verify dtypes match for join keys:
print(a["k"].dtype, b["k"].dtype) - Print columns before merge:
print(a.columns.tolist(), b.columns.tolist()) - Use suffixes parameter when both sides have overlapping non-key columns
Related Guides
Frequently Asked Questions
Why does merge raise KeyError when both DataFrames have the same column?
Case or whitespace mismatch. ‘customer_id’ in one and ‘Customer_ID’ or ‘customer_id ‘ in the other are different to pandas. Print both .columns.tolist() and normalize.
How do I merge when the key is in the index of one DataFrame?
Use left_index=True / right_index=True instead of on=. Example: orders.merge(customers, left_on=’customer_id’, right_index=True).
How do I merge when the join column has different names?
Use left_on and right_on: orders.merge(customers, left_on=’customer_id’, right_on=’id’). The result has both columns; drop one with .drop(columns=[‘id’]) afterward.
Why does my merge return an empty DataFrame?
dtype mismatch on the join key. orders[‘customer_id’] as int64 won’t match customers[‘customer_id’] as object (string). Cast both to the same dtype before merging.
What’s the difference between merge, join, and concat?
merge() is SQL-like, matches on key columns. join() defaults to joining on index. concat() stacks DataFrames vertically or horizontally without key matching. For column-key joins use merge; for index-key joins use join; for stacking use concat.
