Rule F013
Avoid reserved column names (double-underscore prefix and suffix)
Severity
🟢 LOW — Minor performance impact.
PySpark version
Compatible with PySpark 1.0 and later.
Information
Column names with both a leading __ and a trailing __ (e.g. __index__, __natural_order__) are reserved by the pandas API on Spark for internal use.
- The pandas API on Spark relies on such internal columns to implement behaviors such as indexing and row ordering
- Using reserved column names is not guaranteed to work and may produce incorrect or undefined results
- The behavior may silently change across Spark or pandas API on Spark versions
Affected methods: withColumn(), withColumnRenamed(), alias(), selectExpr().
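The naming pattern this rule targets can be expressed as a simple string check. The following is a minimal sketch, not the rule's actual implementation: the helper names is_reserved_name and find_reserved_columns are hypothetical, and the requirement that at least one character sit between the two wrappers is an assumption of this sketch. With a real DataFrame, the list of names would come from df.columns.

```python
from typing import Iterable, List


def is_reserved_name(name: str) -> bool:
    """Return True if `name` has both a leading and a trailing double underscore.

    Assumption of this sketch: at least one character must sit between the
    wrappers, so "__" and "____" are not treated as reserved.
    """
    return len(name) > 4 and name.startswith("__") and name.endswith("__")


def find_reserved_columns(columns: Iterable[str]) -> List[str]:
    """Return, in input order, the column names that match the reserved pattern."""
    return [c for c in columns if is_reserved_name(c)]


if __name__ == "__main__":
    # With PySpark this would typically be find_reserved_columns(df.columns).
    names = ["id", "__index__", "_internal_id", "value__", "__natural_order__"]
    print(find_reserved_columns(names))  # ['__index__', '__natural_order__']
```

Only names wrapped on both sides are flagged; a single leading or trailing double underscore (value__ above) does not match the pattern described by this rule.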
Best practices
- Never name a column with both a __ prefix and a __ suffix
- Use descriptive names without double-underscore wrappers: _internal_id instead of __internal_id__
- If you are interoperating with the pandas API on Spark, check the reserved column list in its documentation
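Following the best practice of replacing __internal_id__ with _internal_id, existing reserved-looking names can be mapped to safe ones mechanically. This is a hedged sketch: safe_rename_map is a hypothetical helper, and appending a numeric suffix when the suggested name is already taken is a choice made here, not something the rule prescribes.

```python
from typing import Dict, Iterable


def safe_rename_map(columns: Iterable[str]) -> Dict[str, str]:
    """Map each reserved-looking column to a single-underscore name.

    Mirrors the best-practice example (__internal_id__ -> _internal_id).
    If the suggested name already exists, a numeric suffix is appended
    (an assumption of this sketch, not part of the rule).
    """
    cols = list(columns)
    taken = set(cols)
    mapping: Dict[str, str] = {}
    for name in cols:
        # Same pattern as the rule: leading "__" and trailing "__",
        # with at least one character in between.
        if not (len(name) > 4 and name.startswith("__") and name.endswith("__")):
            continue
        base = "_" + name[2:-2]
        candidate, n = base, 1
        while candidate in taken:
            candidate = f"{base}_{n}"
            n += 1
        taken.add(candidate)
        mapping[name] = candidate
    return mapping


if __name__ == "__main__":
    print(safe_rename_map(["id", "__internal_id__", "__index__"]))
    # {'__internal_id__': '_internal_id', '__index__': '_index'}
```

With PySpark, one way to apply the mapping is positional renaming, e.g. df.toDF(*[mapping.get(c, c) for c in df.columns]), which leaves non-reserved columns unchanged.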
Example
Bad:
from pyspark.sql.functions import col, lit
df.withColumn("__index__", lit(1))
df.withColumnRenamed("id", "__natural_order__")
df.select(col("value").alias("__metadata__"))
Good: