Rule ARR002

Avoid array_except(col, None) — use array_compact() instead

Severity

🟢 LOW — Minor performance impact.

PySpark version

Compatible with PySpark 3.4 and later.

Information

array_except(arr, None) and array_except(arr, lit(None)) are a roundabout way to remove null values from an array. array_compact() is the purpose-built function for exactly this operation and expresses the intent directly.

  • array_except(col1, None) computes a set difference against a null operand; in practice the only elements this removes are nulls, which is exactly what array_compact() does
  • Passing a null second argument to array_except obscures the intent and pays the overhead of a set-difference computation for what is a simple null filter
  • array_compact() has been available since Spark 3.4 and states the intent directly

Best practices

Replace:

# Bad
from pyspark.sql.functions import array_except, col, lit

df.withColumn("cleaned", array_except(col("items"), lit(None)))
df.withColumn("cleaned", array_except(col("items"), None))

With:

# Good
from pyspark.sql.functions import array_compact, col

df.withColumn("cleaned", array_compact(col("items")))