Rule D005
Avoid using .rdd.isEmpty(); use .isEmpty() on DataFrames
Severity
🔴 HIGH — Major performance impact.
PySpark version
Compatible with PySpark 3.3 and later.
Information
Calling .rdd.isEmpty() converts the DataFrame to an RDD, bypassing Spark’s optimized DataFrame API. This can lead to:
- Loss of Catalyst optimizations
- Slower execution
- Less readable and maintainable code
Best practices
- Use .isEmpty() directly on the DataFrame
- Only use RDD methods when a DataFrame operation is not available
Rule of thumb: Stick to DataFrame APIs to benefit from Spark optimizations; avoid falling back to .rdd unnecessarily.
Example
Bad:
Good: