Rule ARR002
Avoid array_except(col, None) — use array_compact() instead
Severity
🟢 LOW — Minor performance impact.
PySpark version
Compatible with PySpark 3.4 and later.
Information
array_except(arr, None) and array_except(arr, lit(None)) are a roundabout way to remove null values from an array. array_compact() is the purpose-built function for exactly this operation and expresses the intent directly.
- array_except(col1, None) computes the set difference between the array and an empty/null operand; the only elements it removes are nulls, which is exactly what array_compact() does.
- Using array_except with a null second argument obscures the intent and adds unnecessary overhead computing a set difference.
- array_compact() is available since Spark 3.4 and is semantically clear.
Best practices
Replace:
# Bad
df.withColumn("cleaned", array_except(col("items"), lit(None)))
df.withColumn("cleaned", array_except(col("items"), None))
With: