Rule F012

Always wrap literal values with lit()

Severity

🟢 LOW — Minor performance impact.

PySpark version

Compatible with PySpark 1.3 and later.

Information

Passing a literal value directly (e.g., withColumn("a", 1), or withColumn("a", b) where b is a plain Python variable) without wrapping it in lit() can lead to unexpected behavior:

  • Spark may interpret the value as a column reference if a column with the same name exists
  • Comparisons or assignments may produce incorrect results due to ambiguity between columns and literal values
  • Omitting lit() reduces code clarity and can introduce subtle bugs into transformations

Best practices

  • Use pyspark.sql.functions.lit() to explicitly mark literal values in any DataFrame operation
  • Wrapping with lit() ensures the intended value is treated as a literal, not resolved as a column
  • Explicit literals improve readability and correctness across all transformations, including filter(), withColumn(), and others

Example

Bad:

df.withColumn("new_col", 1)
df.filter(df['status'] == 'active')

Good:

from pyspark.sql.functions import lit
df.withColumn("new_col", lit(1))
df.filter(df['status'] == lit('active'))

Rule of thumb: Always wrap literal values with lit() to avoid ambiguity with columns and ensure correct DataFrame transformations.