PERF — Performance

Rules that catch runtime performance antipatterns not covered by other categories.

Rule Title
PERF001 Avoid .rdd.collect() — use .toPandas() for driver-side consumption
PERF002 Too many getOrCreate() calls — use getActiveSession() everywhere else
PERF003 Too many shuffle operations without a checkpoint
PERF004 Avoid bare .persist() — always pass an explicit StorageLevel
PERF005 DataFrame persisted but never unpersisted
PERF006 Avoid bare .checkpoint() / .localCheckpoint() — always pass an explicit eager argument
PERF007 DataFrame used 2 or more times without caching
PERF008 Avoid spark.read.csv(parallelize()) — use spark.createDataFrame(pd.read_csv()) instead
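
To make the "bare call" rules in the table concrete, here is a minimal sketch of how a check in the spirit of PERF004 and PERF006 could be written with Python's standard `ast` module. It is not this linter's actual implementation: the names `BARE_CALL_RULES` and `find_bare_calls` are hypothetical, and the sketch matches on the method name alone, so it cannot tell whether the receiver is really a DataFrame.

```python
import ast

# Hypothetical mapping from a method name to the rule ID listed in the table.
# Assumption: a call counts as "bare" when it passes no arguments at all.
BARE_CALL_RULES = {
    "persist": "PERF004",
    "checkpoint": "PERF006",
    "localCheckpoint": "PERF006",
}


def find_bare_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, rule ID) for each no-argument call to a listed method."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)  # a method-style call: x.name(...)
            and node.func.attr in BARE_CALL_RULES
            and not node.args        # no positional arguments
            and not node.keywords    # no keyword arguments
        ):
            findings.append((node.lineno, BARE_CALL_RULES[node.func.attr]))
    return sorted(findings)


# Source that the sketch reports: both calls omit their argument.
flagged = '''
df = df.persist()
df = df.checkpoint()
'''

# The forms the rule titles ask for: an explicit StorageLevel and an explicit eager flag.
preferred = '''
df = df.persist(StorageLevel.MEMORY_AND_DISK)
df = df.checkpoint(eager=True)
'''

print(find_bare_calls(flagged))
print(find_bare_calls(preferred))
```

The snippets are only parsed, never executed, so the sketch runs without a Spark installation. A real rule would likely also need to resolve the receiver's type to avoid flagging unrelated objects that happen to expose a `persist` or `checkpoint` method.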