Rule D006
Avoid using df.count() == 0; use .isEmpty() instead
Severity
🔴 HIGH — Major performance impact.
PySpark version
Compatible with PySpark 3.3 and later.
Information
Using df.count() == 0 to check whether a DataFrame is empty forces Spark to run a full job that scans every partition just to compute the total row count, which is expensive on large datasets. This can lead to:
- High computation and memory usage
- Slower performance
- Unnecessary resource consumption
Best practices
- Use .isEmpty() for an efficient emptiness check; it only needs to find a single row
- Reserve .count() for cases where you actually need the exact number of rows
Rule of thumb: Use .isEmpty() for emptiness checks to avoid costly full scans.
Example
Bad:
Good: