Rule F013

Avoid reserved column names (double-underscore prefix and suffix)

Severity

🟢 LOW — Minor performance impact.

PySpark version

Compatible with PySpark 1.0 and later.

Information

Columns with a leading __ and trailing __ (e.g. __index__, __natural_order__) are reserved by the pandas API on Spark for internal use.

  • The pandas API on Spark uses such columns internally to track things like the index and row order
  • Reserved column names are not guaranteed to work and may produce incorrect or undefined results
  • The behavior may change silently between versions of Spark or of the pandas API on Spark

Affected methods: withColumn(), withColumnRenamed(), alias(), selectExpr().
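
To illustrate what the rule looks for, here is a minimal sketch of a name check in plain Python. The helper name find_reserved_columns is hypothetical, and the matching logic (both a leading and a trailing double underscore, at least four characters) is an assumption based on the description above, not the linter's actual implementation.

```python
def is_reserved_style(name):
    """True if the name has both a leading and a trailing double underscore.

    Assumption: names shorter than four characters are ignored so that the
    prefix and the suffix cannot overlap (e.g. "__" or "___").
    """
    return len(name) >= 4 and name.startswith("__") and name.endswith("__")


def find_reserved_columns(columns):
    """Return the reserved-style names from an iterable of column names."""
    return [name for name in columns if is_reserved_style(name)]
```

With a real DataFrame, the check could be run as find_reserved_columns(df.columns) before the data is handed to the pandas API on Spark.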

Best practices

  • Never name a column with both a __ prefix and __ suffix
  • Use descriptive names without double-underscore wrappers: _internal_id instead of __internal_id__
  • If you are interoperating with the pandas API on Spark, check the reserved column list in the documentation
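
One way to apply these practices to an existing schema is to compute replacement names up front. The sketch below is an assumption-laden illustration: suggest_safe_name and build_rename_mapping are hypothetical helpers, the "col" fallback is arbitrary, and collisions with existing column names are not handled.

```python
def suggest_safe_name(name):
    """Strip the double-underscore wrapper from a reserved-style name."""
    if len(name) >= 4 and name.startswith("__") and name.endswith("__"):
        stripped = name[2:-2].strip("_")
        # Fall back to a placeholder if nothing is left after stripping.
        return stripped or "col"
    return name


def build_rename_mapping(columns):
    """Map each reserved-style name to its suggested replacement."""
    mapping = {}
    for name in columns:
        new_name = suggest_safe_name(name)
        if new_name != name:
            mapping[name] = new_name
    return mapping
```

The resulting mapping could then be applied by calling df.withColumnRenamed(old, new) for each pair. Check the suggested names against the existing columns first, since a stripped name such as index may already be in use.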

Example

Bad:

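from pyspark.sql.functions import col, lit

# Each call below introduces a name wrapped in double underscores
df.withColumn("__index__", lit(1))
df.withColumnRenamed("id", "__natural_order__")
col("value").alias("__metadata__")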

Good:

from pyspark.sql.functions import col, lit

# Same operations with plain, descriptive names
df.withColumn("internal_index", lit(1))
df.withColumnRenamed("id", "natural_order")
col("value").alias("metadata")