ARR — Array
Rules that catch inefficient or incorrect use of PySpark array functions.
| Rule | Title |
|---|---|
| ARR001 | Avoid array_distinct(collect_list()) — use collect_set() instead |
| ARR002 | Avoid array_except(col, None/lit(None)) — use array_compact() instead |
| ARR003 | Avoid array_distinct(collect_set()) — collect_set already returns distinct values |
| ARR004 | Avoid size(collect_set()) inside .agg() — use count_distinct() instead |
| ARR005 | Avoid size(collect_list()) inside .agg() — use count() instead |
| ARR006 | Avoid size(collect_list().over(w)) — use count().over(w) instead |