Rule F017
Avoid expr() — use native PySpark functions instead
Severity
🟢 LOW — No performance impact; a maintainability and tooling concern.
PySpark version
Compatible with PySpark 1.5 and later.
Information
expr() embeds a raw SQL string inside a DataFrame API call. It bypasses the Python type system, IDE autocompletion, and static analysis — errors are only caught at runtime when Spark parses the SQL fragment.
- `expr("a + b")` and `col("a") + col("b")` are equivalent, but only the latter is refactorable and statically analysable
- SQL strings inside `expr()` cannot be linted, renamed, or traced by standard tooling
- Mixing `expr()` with the DataFrame API is the same footgun as mixing `spark.sql()` — it fragments your code between two paradigms
- Every operation supported by `expr()` has a native PySpark equivalent: arithmetic, string functions, conditionals, window functions, etc.
Best practices
Replace SQL string expressions with their native PySpark equivalents: