# CLI reference

## Synopsis

```shell
pyspark-antipattern check <PATH> [OPTIONS]
```

`<PATH>` can be a single `.py` file or a directory (scanned recursively).

## Options
All options are optional. When both a `pyproject.toml` section and a CLI flag
are present, the CLI flag wins.
### --config

Path to the `pyproject.toml` that contains the `[tool.pyspark-antipattern]`
section. Useful in monorepos where the config lives outside the working
directory.
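For example (the package path and the relative config path here are hypothetical), linting a nested package against a config stored at the repository root:

```shell
# Lint a nested package against the repo-root config
pyspark-antipattern check packages/etl/src/ --config ../../pyproject.toml
```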
### --select

Show only the listed rules; everything else is silenced. Accepts exact rule
IDs or single-letter group prefixes.

```shell
# Show only F018
pyspark-antipattern check src/ --select=F018

# Show only driver-side rules
pyspark-antipattern check src/ --select=D

# Show a specific mix
pyspark-antipattern check src/ --select=D001,S002,F018
```
### --warn

Downgrade rules from error to warning. Warnings are printed but do not cause
exit code 1.
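For example, using the same rule-ID syntax as the other flags in this reference:

```shell
# Report S004 and S008 as warnings instead of errors
pyspark-antipattern check src/ --warn=S004,S008
```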
### --ignore

Completely silence one or more rules. Accepts exact rule IDs or single-letter
group prefixes. Violations for silenced rules produce no output and do not
affect the exit code.

```shell
# Silence one rule
pyspark-antipattern check src/ --ignore=D001

# Silence an entire category
pyspark-antipattern check src/ --ignore=F

# Silence a mix
pyspark-antipattern check src/ --ignore=S,D001,L003
```
### --show_best_practice

Print the *Best practices* section from the rule documentation below each
violation.
### --show_information

Print the *Information* section from the rule documentation below each
violation.
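The `--show_best_practice=true` form appears in the combined example later in this reference; assuming `--show_information` takes the same boolean syntax, both sections can be printed at once:

```shell
# Print extra rule documentation below each violation
pyspark-antipattern check src/ --show_best_practice=true --show_information=true
```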
### --distinct_threshold

S004 fires when the weighted count of `.distinct()` calls in a file exceeds
this value. Loop-multiplied calls count more than once.
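For example:

```shell
# Allow up to 3 weighted .distinct() calls per file before S004 fires
pyspark-antipattern check src/ --distinct_threshold=3
```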
### --explode_threshold

S008 fires when the weighted count of `explode()` / `explode_outer()` calls in
a file exceeds this value.
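For example (the threshold value here is arbitrary):

```shell
# Tolerate a higher weighted count of explode() calls before S008 fires
pyspark-antipattern check src/ --explode_threshold=5
```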
### --loop_threshold

L001/L002/L003 fire when a `for` loop over `range(N)` exceeds this iteration
count. `while` loops always assume 99 iterations.
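The thresholds above use a *weighted* count: a call inside a loop counts once per assumed iteration. The linter's actual algorithm is not documented here, but a minimal sketch of the idea, assuming literal `range(N)` bounds and the 99-iteration rule for `while` loops, might look like:

```python
import ast

# Assumption taken from the description above: a `while` loop is always
# treated as 99 iterations.
WHILE_ITERATIONS = 99


def weighted_call_count(source: str, method: str) -> int:
    """Count calls to `method`, multiplying each call by the assumed
    iteration count of every enclosing loop. Purely illustrative; not
    the linter's actual implementation."""
    def visit(node: ast.AST, multiplier: int) -> int:
        total = 0
        for child in ast.iter_child_nodes(node):
            child_multiplier = multiplier
            if isinstance(child, ast.For):
                # `for i in range(N)` with a literal N multiplies by N
                it = child.iter
                if (
                    isinstance(it, ast.Call)
                    and isinstance(it.func, ast.Name)
                    and it.func.id == "range"
                    and len(it.args) == 1
                    and isinstance(it.args[0], ast.Constant)
                ):
                    child_multiplier = multiplier * it.args[0].value
            elif isinstance(child, ast.While):
                child_multiplier = multiplier * WHILE_ITERATIONS
            if (
                isinstance(child, ast.Call)
                and isinstance(child.func, ast.Attribute)
                and child.func.attr == method
            ):
                total += multiplier
            total += visit(child, child_multiplier)
        return total

    return visit(ast.parse(source), 1)


code = (
    "df.distinct()\n"
    "for i in range(3):\n"
    "    df.distinct()\n"
    "while not done:\n"
    "    df.distinct()\n"
)
print(weighted_call_count(code, "distinct"))  # 1 + 3 + 99 = 103
```

Under this sketch, a single `.distinct()` inside a `while` loop already contributes 99 to the weighted count, which is why long-running loops trip the S004/S008 thresholds so easily.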
### --max_shuffle_operations

PERF003 fires when more than N shuffle-inducing operations occur between two
checkpoints (or between the start of the file and the first checkpoint).
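For example:

```shell
# Allow at most 5 shuffle-inducing operations between checkpoints
pyspark-antipattern check src/ --max_shuffle_operations=5
```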
### --exclude_dirs

Directory names to skip during recursive scanning. Replaces (does not extend)
the built-in default exclusion list.
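For example:

```shell
# Skip tests/ and vendor/ while scanning.
# Note: this replaces the default exclusion list rather than extending it.
pyspark-antipattern check src/ --exclude_dirs=tests,vendor
```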
### --severity

Only report violations whose performance impact meets or exceeds this level.
Rules below the threshold are silenced for this run (no output, no exit code
impact).

| Value | Shows |
|---|---|
| `low` | 🟢 LOW + 🟡 MEDIUM + 🔴 HIGH (same as the default) |
| `medium` | 🟡 MEDIUM + 🔴 HIGH |
| `high` | 🔴 HIGH only |

```shell
# Focus on the most impactful issues only
pyspark-antipattern check src/ --severity=high

# Include moderate issues too
pyspark-antipattern check src/ --severity=medium
```
The severity of each rule is shown as a colored badge in the terminal output,
immediately after the rule ID.
### --pyspark-version

Tells the linter which PySpark version your cluster runs. Rules that recommend
APIs introduced in a newer version are silenced, since they are irrelevant if
your cluster cannot use those APIs yet.

```shell
# My cluster runs PySpark 3.3, so suppress rules requiring 3.4+
pyspark-antipattern check src/ --pyspark-version=3.3

# Pin to an exact patch release
pyspark-antipattern check src/ --pyspark-version=3.5.1
```

When not set, all rules are shown regardless of their minimum version
requirement.
## Combining options

All options can be combined freely:

```shell
pyspark-antipattern check src/pipelines/ \
  --config pyproject.toml \
  --pyspark-version=3.3 \
  --severity=medium \
  --ignore=F008,F011 \
  --warn=S004,S008 \
  --show_best_practice=true \
  --max_shuffle_operations=5 \
  --distinct_threshold=3 \
  --exclude_dirs=tests,vendor
```
## Exit codes

| Code | Meaning |
|---|---|
| `0` | No error-level violations found |
| `1` | One or more error-level violations found |
Warnings never cause a non-zero exit code.
## Priority: CLI vs pyproject.toml

When the same option is set in both places, the CLI flag always takes
precedence. This makes it easy to tighten or relax rules for a single run
without editing config files, which is useful in CI matrix builds or one-off
audits.

```shell
# pyproject.toml has warn = ["D001"],
# but this run shows only F018:
pyspark-antipattern check src/ --select=F018
```