
U — UDF

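Rules that flag user-defined functions (UDFs) where native PySpark equivalents exist. UDFs are opaque to Spark's Catalyst optimizer, and Python UDFs also serialize every row between the JVM and a Python worker process, so they are typically 10–100× slower than built-in functions.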

| Rule | Title |
|------|-------|
| U001 | Avoid UDFs that return StringType — use built-in string functions |
| U002 | Avoid UDFs that return ArrayType — use built-in array functions |
| U003 | Avoid UDFs — use Spark built-in functions instead |
| U004 | Avoid nested UDF calls — merge logic or use plain Python helpers |
| U005 | Avoid loops inside a UDF body — use transform instead |
| U006 | Avoid all() inside a UDF body — use forall instead |
| U007 | Avoid any() inside a UDF body — use exists instead |