Rule F008
Avoid using print(); prefer logging
Severity
🟢 LOW — Minor performance impact.
PySpark version
Compatible with PySpark 1.0 and later.
Information
Using print() in production code can lead to:
- High I/O overhead, especially in distributed environments
- Large amounts of unstructured output that are hard to manage
- Reduced performance due to unnecessary console writes
- Difficulty in filtering and tracing important information
Best practices
- Use Python’s logging module to control log levels and destinations
- Configure logs appropriately for the environment (e.g., DEBUG, INFO, WARN, ERROR)
- Avoid leaving print() statements in production pipelines
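One way to pick the log level per environment is sketched here. The environment variable name LOG_LEVEL, the helper configure_logging, and the fallback to INFO are assumptions made for illustration, not part of the rule:

```python
import logging
import os

# Level names this sketch accepts; anything else falls back to INFO.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging(default_level="INFO"):
    """Set the root logger level from LOG_LEVEL (hypothetical variable name)."""
    name = os.environ.get("LOG_LEVEL", default_level).upper()
    level = _LEVELS.get(name, logging.INFO)
    # basicConfig attaches a stderr handler only if the root logger has none.
    logging.basicConfig(
        format="%(asctime)s %(levelname)s %(name)s: %(message)s"
    )
    # Set the level explicitly so it applies even if handlers already exist.
    logging.getLogger().setLevel(level)
    return level
```

With this in place, a development run could export LOG_LEVEL=DEBUG while production leaves it unset, without any change to the pipeline code.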
Rule of thumb: Replace print() with structured logging to reduce I/O overhead and improve maintainability.
Example
Bad:
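A minimal sketch of the pattern this rule flags. The function name summarize_batch and its rows argument (a plain list standing in for rows collected from a DataFrame) are hypothetical, chosen so the snippet runs without a Spark session:

```python
def summarize_batch(rows):
    """Hypothetical pipeline step; `rows` stands in for collected rows."""
    for row in rows:
        # Flagged: one unconditional console write per row, with no level,
        # timestamp, or logger name to filter on later.
        print("processing row:", row)
    print("done, rows processed:", len(rows))
    return len(rows)
```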
Good:
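A minimal sketch of the logging-based alternative. The logger name pipeline.summarize, the function summarize_batch, and its rows argument (a plain list standing in for rows collected from a DataFrame) are hypothetical, chosen so the snippet runs without a Spark session:

```python
import logging

# Module-level logger; the name is an illustrative assumption.
logger = logging.getLogger("pipeline.summarize")


def summarize_batch(rows):
    """Hypothetical pipeline step; `rows` stands in for collected rows."""
    for row in rows:
        # %-style arguments are only formatted if DEBUG is enabled,
        # so per-row messages cost little when the level is INFO or higher.
        logger.debug("processing row: %s", row)
    logger.info("done, rows processed: %d", len(rows))
    return len(rows)


# Emit INFO and above; the per-row DEBUG messages are suppressed.
logging.basicConfig(level=logging.INFO)
summarize_batch([{"id": 1}, {"id": 2}])
```

Per-row detail stays available by lowering the level to DEBUG, without editing the function.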