Rule F003
Avoid using selectExpr(); prefer select()
Severity
🟢 LOW — Minor performance impact.
PySpark version
Compatible with PySpark 1.3 and later.
Information
Using selectExpr() can make transformations less readable and harder to maintain:
- Expressions as strings are prone to typos and errors
- It’s harder to track column lineage in complex transformations
- Debugging becomes more difficult compared to using select() with column objects
Best practices
- Prefer select() with col() or column expressions for clarity
- Using select() improves readability and makes transformations easier to maintain
- Enables better compatibility with IDEs, static analysis, and refactoring
Rule of thumb: Use select() instead of selectExpr() for more readable, maintainable, and safer DataFrame transformations.
Example
Bad:
Good: