TypeError: Column is not iterable

Having trouble fixing “TypeError: Column is not iterable”?

Don’t worry! Read this article to solve your problem.

In this article, we will discuss “TypeError: Column is not iterable”, explain the possible causes of this error, and give solutions to resolve it.

Let’s start by understanding what this error means.

What is “TypeError: Column is not iterable”?

“TypeError: Column is not iterable” is an error message indicating that you are trying to iterate over a column object that cannot be looped over directly, most commonly a Column of a PySpark DataFrame.

In Python, an iterable is an object that can be looped over, such as a list or a tuple.

However, some objects, such as a Column of a PySpark DataFrame, are not iterable.
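To make this concrete, here is a minimal sketch (it assumes a running SparkSession named spark, just like the later examples in this article); looping over the Column object itself triggers the error:

df = spark.createDataFrame([("2023-04-13", 1)], ["date", "increment"])

for value in df.date:    # df.date is a Column object, not the column's values
    print(value)         # raises: TypeError: Column is not iterable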

Now let’s look at the different reasons why this error occurs.

How does “TypeError: Column is not iterable” occur?

“TypeError: Column is not iterable” is an error that can happen in various situations, such as:

  • Iterating over a single column

Iterating directly over a single column of a PySpark DataFrame without first materializing its values, for example with .collect() or by converting it with .toPandas(), as shown in the sketch below.
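Here is a hedged sketch of the safe pattern (again assuming a running SparkSession named spark): collect the selected column first, then loop over the returned rows.

df = spark.createDataFrame([("2023-04-13", 1), ("2023-04-14", 2)], ["date", "increment"])

# collect() returns the selected column as a plain Python list of Row objects,
# which can be looped over safely
for row in df.select("date").collect():
    print(row["date"])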

  • Attempting to loop over a non-iterable object

The error happens when you pass a Column object to a plain Python built-in such as max() or map(), which expects an iterable and therefore tries to loop over the Column.

  • Mistakenly referencing a column

Mistakenly referencing a column that does not exist in the data structure.

It causes the program to try to iterate over a non-existent object.

  • Adding a column with a value from another column

An error occurs when you try to add a new column to a PySpark DataFrame with a value derived from another column, especially when you are not using the correct syntax.

For example, this happens when we try to add months to the date column using a value from the increment column.

Here is an example code:

from pyspark.sql.functions import add_months

data = [("2023-04-13", 1), ("2023-04-14", 2), ("2023-04-15", 3)]
df = spark.createDataFrame(data).toDF("date", "increment")

# Passing the increment Column as the months argument is what triggers the error here
df.select(df.date, df.increment, add_months(df.date, df.increment)).show()

When we run this code, we get an error stating:

TypeError: Column is not iterable

  • Grouping by one column and getting the max of another column

An error can occur when you try to group a PySpark DataFrame by one column and get the maximum value of another column using Python’s built-in max function.

Here is an example:

The DataFrame linesWithSparkDF has two columns, id and cycle. We want to group by id and find the maximum value of cycle.

# max here is Python's built-in max(), not pyspark.sql.functions.max, so it tries to iterate over the Column
linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(max(col("cycle")))

From this code, we get the error message:

TypeError: Column is not iterable

  • Creating a new column by mapping two column values

The error can occur when you try to create a new column in a PySpark DataFrame by mapping two column values, for example by using Python’s built-in map() instead of PySpark’s create_map() function.

For example:

We have a DataFrame df with a column desc and a column age.

We want to create a new column complex_map by mapping the desc and age column values.

# map() here is Python's built-in map, not PySpark's create_map(), so it tries to iterate over a Column and fails
df.select(map(col("desc"), col("age")).alias("complex_map")).selectExpr("explode(complex_map)").show(2)

When we run this code, we get the error:

TypeError: Column is not iterable

Now let’s fix this error.

TypeError: Column is not iterable – Solutions

Here are the alternative solutions to fix “TypeError: Column is not iterable”:

Solution 1: Use the expr() function

We can use the expr() function, which can evaluate a string expression containing column references and literals.

By using expr(), you can write the add_months() call as a string in which both the date column and the increment column are referenced by name, so no Column object is passed where a literal is expected.

Here is the updated code for adding months to the date column using the value from the increment column:

from pyspark.sql.functions import expr

df.select(df.date, df.increment,
          expr("add_months(date, increment)").alias("inc_date")).show()
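As a related sketch that goes beyond the original example, the same string expression can also be passed to selectExpr(), which evaluates SQL-style expressions directly:

# Equivalent approach using selectExpr with the same df as above
df.selectExpr("date", "increment", "add_months(date, increment) as inc_date").show()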

Solution 2: Use the agg() function

We can call agg() with a dictionary argument instead of passing a Column wrapped in Python’s built-in max():

# The {"cycle": "max"} form tells Spark which aggregate to apply, without calling Python's built-in max() on a Column
linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg({"cycle": "max"})

Alternatively, we can import PySpark’s max function under an alias using the as keyword, so it does not clash with Python’s built-in max:

from pyspark.sql.functions import col, max as spark_max

linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(spark_max(col("cycle")))
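If you want a cleaner name for the aggregated column, you can also attach an alias to the result; the name max_cycle below is only an illustrative choice, not part of the original code:

# alias() renames the aggregated column (max_cycle is an illustrative name)
linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(
    spark_max(col("cycle")).alias("max_cycle")
)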

Solution 3: Use the PySpark create_map() function

Instead of using the map function, we can use the create_map function.

The map function is a Python built-in function, not a PySpark function.

Here is the updated code:

import pyspark.sql.functions as F

# create_map builds a MapType column from alternating key and value columns
df.select(F.create_map(F.col("desc"), F.col("age")).alias("complex_map")) \
    .selectExpr("explode(complex_map)").show(2)
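After explode(), each entry of complex_map comes back as a pair of key and value columns, with the desc values as keys and the age values as values.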

Those are the alternative solutions that you can use to solve the “TypeError: Column is not iterable” error.

I hope one or more of them helps you.


Conclusion

In conclusion, this article discussed “TypeError: Column is not iterable”, explained its causes, and gave solutions that resolve the error.

By following the given solutions, you can fix the error quickly and get back to your coding project.

I hope this article helps you solve your problem regarding the TypeError stating that a Column is not iterable.

We’re happy to help you.

Happy coding! Have a Good day and God bless.