D — Driver

Rules that flag operations which pull data from the distributed cluster onto the driver node. These are the most dangerous antipatterns: they can silently run the driver out of memory (OOM) or stall a pipeline entirely.
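To make "flag" concrete, here is a minimal sketch of how a check of this kind could be written with Python's standard `ast` module. It is not the implementation behind these rules. The function name `find_driver_calls` and the `DRIVER_METHODS` mapping are hypothetical, and only three of the rule codes are covered for brevity.

```python
import ast

# Hypothetical subset of method names to flag, keyed to the rule codes
# they correspond to. Chosen for illustration only.
DRIVER_METHODS = {"collect": "D001", "show": "D003", "display": "D008"}


def find_driver_calls(source: str) -> list:
    """Return sorted (line, rule, method) tuples for flagged method calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # A method call parses as Call(func=Attribute(attr="collect"), ...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            rule = DRIVER_METHODS.get(node.func.attr)
            if rule is not None:
                findings.append((node.lineno, rule, node.func.attr))
    return sorted(findings)
```

A purely name-based check like this cannot tell whether the receiver is really a Spark DataFrame, so it would also flag an unrelated object's `.collect()` method. A real linter would likely need extra context to reduce such false positives.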

| Rule | Title |
| --- | --- |
| D001 | Avoid using `collect()` |
| D002 | Avoid accessing `.rdd` |
| D003 | Avoid `.show()` in production |
| D004 | Avoid `.count()` on large DataFrames |
| D005 | Avoid `.rdd.isEmpty()`; use `.isEmpty()` directly |
| D006 | Avoid `df.count() == 0`; use `.isEmpty()` |
| D007 | Avoid `.filter(...).count() == 0`; use `.filter(...).isEmpty()` |
| D008 | Avoid `.display()` in production |
| D009 | Avoid `.count()` as a boolean; use `.isEmpty()` |
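
Rules D005, D006, D007 and D009 all point to the same replacement, which can be sketched as follows. This assumes PySpark 3.3 or later, where `DataFrame.isEmpty()` is available. The helper names `has_rows` and `has_matching_rows` are hypothetical, and the functions accept any DataFrame-like object so the block itself does not import `pyspark`.

```python
def has_rows(df):
    """Return True if the DataFrame has at least one row.

    Flagged forms (D006, D009): `df.count() == 0` or `if df.count():`,
    which compute the full row count just to answer a yes/no question.
    """
    # isEmpty() only needs to establish whether a single row exists.
    return not df.isEmpty()


def has_matching_rows(df, condition):
    """Return True if any row satisfies `condition`.

    Flagged form (D007): `df.filter(condition).count() == 0`.
    """
    return not df.filter(condition).isEmpty()
```

With a real session this would be called as, for example, `has_matching_rows(df, df["status"] == "failed")`, where the column name `status` is purely illustrative.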