PERF — Performance
Rules that catch runtime performance antipatterns not covered by other categories.
| Rule | Title |
|---|---|
| PERF001 | Avoid .rdd.collect() — use .toPandas() for driver-side consumption |
| PERF002 | Too many getOrCreate() calls — use getActiveSession() everywhere else |
| PERF003 | Too many shuffle operations without a checkpoint |
| PERF004 | Avoid bare .persist() — always pass an explicit StorageLevel |
| PERF005 | DataFrame persisted but never unpersisted |
| PERF006 | Avoid bare .checkpoint() / .localCheckpoint() — always pass an explicit eager argument |
| PERF007 | DataFrame used 2 or more times without caching |
| PERF008 | Avoid spark.read.csv(parallelize()) — use spark.createDataFrame(pd.read_csv()) instead |
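The table gives only rule IDs and titles. To illustrate how some of these patterns can be spotted in source code, here is a minimal sketch built on Python's standard `ast` module. It covers PERF001, PERF004, PERF005 and PERF006.

It is not this tool's implementation. The names `check_source`, `PerfVisitor` and `Finding` are hypothetical. The matching is purely syntactic, so it cannot tell a DataFrame from an RDD or follow a DataFrame through aliases. Treating `.cache()` as a persist for PERF005 is an assumption of the sketch.

```python
import ast
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Finding:
    rule: str
    line: int
    message: str


def _method_name(call: ast.Call) -> Optional[str]:
    """Return `method` for a call shaped like `<expr>.method(...)`, else None."""
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return None


def _has_arg(call: ast.Call, keyword: str) -> bool:
    """True if the call passes any positional argument or the named keyword."""
    return bool(call.args) or any(kw.arg == keyword for kw in call.keywords)


class PerfVisitor(ast.NodeVisitor):
    """Hypothetical checker: purely syntactic, no type information."""

    def __init__(self) -> None:
        self.findings: List[Finding] = []
        self.persisted: Dict[str, int] = {}  # variable name -> line of first persist
        self.unpersisted: Set[str] = set()

    def _report(self, rule: str, node: ast.AST, message: str) -> None:
        self.findings.append(Finding(rule, node.lineno, message))

    def visit_Assign(self, node: ast.Assign) -> None:
        # PERF005 bookkeeping: `name = <expr>.persist(...)` or `.cache()`
        # (counting .cache() as a persist is an assumption of this sketch).
        if isinstance(node.value, ast.Call) and _method_name(node.value) in ("persist", "cache"):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self.persisted.setdefault(target.id, node.lineno)
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        name = _method_name(node)
        if (name == "collect"
                and isinstance(node.func.value, ast.Attribute)
                and node.func.value.attr == "rdd"):
            # PERF001: <expr>.rdd.collect()
            self._report("PERF001", node, "use .toPandas() instead of .rdd.collect()")
        elif name == "persist" and not _has_arg(node, "storageLevel"):
            # PERF004: .persist() with no StorageLevel
            self._report("PERF004", node, "pass an explicit StorageLevel to .persist()")
        elif name in ("checkpoint", "localCheckpoint") and not _has_arg(node, "eager"):
            # PERF006: .checkpoint() / .localCheckpoint() with no eager argument
            self._report("PERF006", node, f"pass an explicit eager argument to .{name}()")
        elif name == "unpersist" and isinstance(node.func.value, ast.Name):
            # PERF005 bookkeeping: `name.unpersist()` releases `name`
            self.unpersisted.add(node.func.value.id)
        self.generic_visit(node)


def check_source(source: str) -> List[Finding]:
    """Parse `source` and return findings sorted by line, then rule ID."""
    visitor = PerfVisitor()
    visitor.visit(ast.parse(source))
    # PERF005: a name assigned from persist()/cache() with no name.unpersist() anywhere
    for var, line in visitor.persisted.items():
        if var not in visitor.unpersisted:
            visitor.findings.append(
                Finding("PERF005", line, f"{var} is persisted but never unpersisted"))
    return sorted(visitor.findings, key=lambda f: (f.line, f.rule))


sample = """
from pyspark import StorageLevel
df = spark.range(100).persist()
rows = df.rdd.collect()
other = spark.range(10).persist(StorageLevel.MEMORY_AND_DISK)
other.checkpoint()
other.localCheckpoint(eager=False)
other.unpersist()
"""

for finding in check_source(sample):
    print(finding.rule, finding.line, finding.message)
```

Because these checks look only at call shapes, a `checkpoint()` call on a plain RDD would also be flagged. `RDD.checkpoint()` takes no `eager` argument, so that finding would be a false positive. A production rule would need type or dataflow information to avoid it.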