Rule D003

Avoid using .display() in production code

Severity

🟡 MEDIUM — Moderate performance impact.

Compatible with PySpark 3.0 and later.

Using .display() in production is generally bad practice: it adds avoidable work to the job, and nobody is watching the rendered output in a scheduled run. Instead:

If you need to inspect rows, combine .limit(n) with .collect() and keep n small
Log DataFrame statistics (.count(), .describe()) instead of displaying full data
Avoid .display() in scheduled jobs or ETL pipelines
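
The recommendations above could be combined into one small logging helper. This is a minimal sketch, not part of PySpark: the function name log_dataframe_summary, its parameters, and the default sample size are all hypothetical. It assumes logging a row count, the schema, and a few rows is enough for your pipeline.

```python
from __future__ import annotations

import logging
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame

logger = logging.getLogger(__name__)


def log_dataframe_summary(df: DataFrame, name: str, sample_size: int = 5) -> int:
    """Log row count, schema, and a small bounded sample; return the row count.

    Hypothetical helper for illustration; not a PySpark API.
    """
    # count() triggers a Spark job, so call it once and reuse the result.
    row_count = df.count()
    logger.info("%s: %d rows", name, row_count)
    logger.info("%s schema: %s", name, df.schema.simpleString())

    # limit() bounds how many rows collect() brings back to the driver.
    for row in df.limit(sample_size).collect():
        logger.info("%s sample: %s", name, row.asDict())

    return row_count
```

In a scheduled job you might call log_dataframe_summary(df, "orders") after a transformation step. Note that .count() is itself a full pass over the data, so on large or expensive DataFrames consider logging it only where the number is actually useful.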

Rule of thumb: Reserve .display() for interactive debugging; never use it in production code.

Bad:

df.display()

Good:

# Inspect a small, bounded sample instead of calling display()
sample_rows = df.limit(5).collect()