F — Format
Rules that enforce idiomatic use of the PySpark DataFrame API — covering code style, readability, and correct function usage.
| Rule | Title |
|---|---|
| F001 | Avoid chaining withColumn() and withColumnRenamed() |
| F002 | Avoid drop() — use select() for explicit columns |
| F003 | Avoid selectExpr() — prefer select() with col() |
| F004 | Avoid spark.sql() — prefer native DataFrame API |
| F005 | Avoid stacking multiple withColumn() — use withColumns() |
| F006 | Avoid stacking multiple withColumnRenamed() — use withColumnsRenamed() |
| F007 | Prefer filter() before select() for clarity |
| F008 | Avoid print() — prefer the logging module |
| F009 | Avoid nested when() — use stacked .when().when().otherwise() |
| F010 | Always include otherwise() at the end of a when() chain |
| F011 | Avoid backslash line continuation — use parentheses |
| F012 | Always wrap literal values with lit() |
| F013 | Avoid reserved column names with a `__` prefix or `__` suffix |
| F014 | Avoid explode_outer() — handle nulls with higher-order functions |
| F015 | Avoid multiple consecutive filter() calls — combine conditions |
| F016 | Avoid long DataFrame renaming chains — overwrite the same variable |
| F017 | Avoid expr() — use native PySpark functions instead |
| F018 | Use Spark native datetime functions instead of Python datetime objects |
| F019 | Avoid inferSchema=True and mergeSchema=True in Spark read options |
| F020 | Avoid select("*") — use explicit column names |