PySpark with pip
Published on Jan 8, 2024 · 1 min read

Prelude
You want to run something on pyspark. You cannot use conda.
Prerequisites
- a machine
- a terminal
- Java 8/11 installed & JAVA_HOME set
- Python 3.8+ including pip installed
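Before installing, it is worth confirming the prerequisites are actually in place. A quick sanity check (the expected versions in the comments follow the list above; `python3` is assumed to be on your PATH):

```shell
# Quick sanity checks for the prerequisites
java -version            # should report 1.8.x or 11.x
python3 --version        # should report Python 3.8 or newer
python3 -m pip --version # pip must be available
echo "$JAVA_HOME"        # should print your JDK install path
```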
Terminal
pip install pyspark==3.4
pyspark
Addendum
I’m on Windows. It’s not working :(
You likely need the Hadoop .dll and some environment variables. To be continued in a future article…
This looks even easier than with conda. How so?
Is it, though? Both Python and Java still need to be installed here. Doing this in isolation would require additional tools, such as pyenv, venv, or jenv. Expect further pain on Windows (see above).
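To sketch the isolation point: on Linux/macOS, a virtual environment (the directory name .venv is just a convention) contains the Python side, but note that venv does nothing for Java, which still has to be managed separately (e.g. with jenv):

```shell
# Create and activate an isolated environment (Python side only)
python3 -m venv .venv
. .venv/bin/activate

# Install PySpark into the environment rather than system-wide
pip install pyspark==3.4

# Java is NOT isolated by venv; JAVA_HOME must still point at a JDK 8/11
echo "$JAVA_HOME"
```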
What about other Spark / Python / Java versions?
See the version compatibility requirements in the official Spark documentation.