Rule F013

Avoid reserved column names (double-underscore prefix and suffix)

Severity

🟢 LOW — Minor performance impact.

Compatible with PySpark 1.0 and later.

Columns with a leading __ and trailing __ (e.g. __index__, __natural_order__) are reserved by the pandas API on Spark for internal use.

The pandas API on Spark uses such columns to manage internal behaviors such as indexing and ordering.
Using reserved column names is not guaranteed to work and may produce incorrect or undefined results.
The behavior may silently change across Spark or pandas API on Spark versions.
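The naming pattern described above can be checked mechanically. The following is a minimal sketch, not the rule's actual implementation: it assumes "reserved" means any name that both starts and ends with a double underscore, and the helper names is_reserved and find_reserved_columns are hypothetical. It works on a plain list of names, such as the one returned by df.columns.

```python
import re

# Assumption: a name is "reserved" if it starts and ends with "__".
# The middle part may be empty, so "____" also counts.
RESERVED_NAME = re.compile(r"__.*__", re.DOTALL)


def is_reserved(name: str) -> bool:
    """Return True if the whole column name is wrapped in double underscores."""
    return RESERVED_NAME.fullmatch(name) is not None


def find_reserved_columns(columns):
    """Return the reserved-looking names, preserving their original order."""
    return [name for name in columns if is_reserved(name)]


# Example input; in PySpark this list could come from df.columns.
columns = ["id", "__index__", "_private", "__natural_order__", "value__"]
print(find_reserved_columns(columns))  # ['__index__', '__natural_order__']
```

Names with only a prefix or only a suffix (such as value__) are not flagged, which matches the wording of the rule.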

Affected methods: withColumn(), withColumnRenamed(), alias(), selectExpr().

Never name a column with both a __ prefix and a __ suffix.
Use descriptive names without double-underscore wrappers, for example _internal_id instead of __internal_id__.
If you are interoperating with the pandas API on Spark, check the reserved column list in the documentation.
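One possible way to act on these recommendations is to plan the renames before touching the DataFrame. The sketch below is illustrative only: suggest_safe_name and plan_renames are hypothetical helpers, and replacing the double-underscore wrapper with a single leading underscore is just one convention, chosen here to match the _internal_id example.

```python
def suggest_safe_name(name: str) -> str:
    """Replace a double-underscore wrapper with a single leading underscore."""
    wrapped = len(name) >= 4 and name.startswith("__") and name.endswith("__")
    if not wrapped:
        return name  # Not reserved-looking; keep the name unchanged.
    core = name.strip("_")
    if not core:
        raise ValueError(f"cannot derive a safe name from {name!r}")
    return "_" + core


def plan_renames(columns):
    """Map each reserved-looking name to a safe one, refusing collisions."""
    existing = set(columns)
    renames = {}
    for name in columns:
        new_name = suggest_safe_name(name)
        if new_name == name:
            continue
        if new_name in existing:
            raise ValueError(f"{new_name!r} would collide with an existing column")
        renames[name] = new_name
        existing.add(new_name)
    return renames


print(plan_renames(["id", "__internal_id__", "value"]))
# {'__internal_id__': '_internal_id'}
```

The resulting mapping could then be applied one pair at a time with df.withColumnRenamed(old, new). Failing on a collision, rather than silently overwriting a column, is a deliberate choice in this sketch.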

Bad:

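from pyspark.sql.functions import col, lit

# df is an existing DataFrame; every new name below is wrapped in double underscores.
df.withColumn("__index__", lit(1))
df.withColumnRenamed("id", "__natural_order__")
col("value").alias("__metadata__")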

Good:

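from pyspark.sql.functions import col, lit

# df is an existing DataFrame; the same operations with plain, descriptive names.
df.withColumn("internal_index", lit(1))
df.withColumnRenamed("id", "natural_order")
col("value").alias("metadata")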