Rule F012
Always wrap literal values with lit()
Severity
🟢 LOW — Primarily a clarity and correctness safeguard; negligible performance impact.
PySpark version
Compatible with PySpark 1.3 and later.
Information
Using a literal value directly (e.g., df.withColumn("a", 1), or df.withColumn("a", b) where b holds a plain Python value) without lit() can lead to unexpected behavior:
- Spark may interpret the value as a column if a column with the same name exists
- Comparisons or assignments may produce incorrect results due to ambiguity between columns and literal values
- Reduces code clarity and can introduce subtle bugs in transformations
Best practices
- Use pyspark.sql.functions.lit() to explicitly indicate literal values in any DataFrame operation
- Ensures the intended value is used, not a column
- Improves readability and correctness across all transformations, including filter(), withColumn(), and others
Example
Bad:
df.withColumn("new_col", 1)
df.filter(df['status'] == 'active')
Good:
from pyspark.sql.functions import lit
df.withColumn("new_col", lit(1))
df.filter(df['status'] == lit('active'))
Rule of thumb: Always wrap literal values with lit() to avoid ambiguity with columns and ensure correct DataFrame transformations.