F — Format

Rules that enforce idiomatic use of the PySpark DataFrame API — covering code style, readability, and correct function usage.

| Rule | Title |
| --- | --- |
| F001 | Avoid chaining withColumn() and withColumnRenamed() |
| F002 | Avoid drop() — use select() for explicit columns |
| F003 | Avoid selectExpr() — prefer select() with col() |
| F004 | Avoid spark.sql() — prefer native DataFrame API |
| F005 | Avoid stacking multiple withColumn() — use withColumns() |
| F006 | Avoid stacking multiple withColumnRenamed() — use withColumnsRenamed() |
| F007 | Prefer filter() before select() for clarity |
| F008 | Avoid print() — prefer the logging module |
| F009 | Avoid nested when() — use stacked .when().when().otherwise() |
| F010 | Always include otherwise() at the end of a when() chain |
| F011 | Avoid backslash line continuation — use parentheses |
| F012 | Always wrap literal values with lit() |
| F013 | Avoid reserved column names with __ prefix and __ suffix |
| F014 | Avoid explode_outer() — handle nulls with higher-order functions |
| F015 | Avoid multiple consecutive filter() calls — combine conditions |
| F016 | Avoid long DataFrame renaming chains — overwrite the same variable |
| F017 | Avoid expr() — use native PySpark functions instead |
| F018 | Use Spark native datetime functions instead of Python datetime objects |
| F019 | Avoid inferSchema=True and mergeSchema=True in Spark read options |
| F020 | Avoid select("*") — use explicit column names |