PySpark with pip
Published on Jan 8, 2024
Prelude
You want to run something on PySpark. You cannot use conda.
Prerequisites
- a machine
- a terminal
- Java 8/11 installed & JAVA_HOME set
- Python 3.8+ including pip installed
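Before installing anything, the prerequisites can be checked from Python itself. A minimal sketch, assuming only the standard library:

```python
import os
import shutil
import sys

# Check the prerequisites listed above: Java reachable, JAVA_HOME set,
# Python 3.8+, pip available.
java_home = os.environ.get("JAVA_HOME")          # None if not set
java_on_path = shutil.which("java") is not None  # True if `java` resolves on PATH
python_ok = sys.version_info >= (3, 8)
pip_ok = shutil.which("pip") is not None

print(f"JAVA_HOME:      {java_home or 'not set'}")
print(f"java on PATH:   {java_on_path}")
print(f"Python >= 3.8:  {python_ok}")
print(f"pip on PATH:    {pip_ok}")
```

If JAVA_HOME is "not set" here, PySpark will most likely fail to start later.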
Terminal
```shell
pip install "pyspark==3.4.*"
pyspark
```

(Note the `.*`: PyPI only carries patch releases such as 3.4.1, so a bare `pyspark==3.4` would find no matching distribution.)
Addendum
I’m on Windows. It’s not working :(
You likely need the Hadoop .dll and some environment variables. To be continued in a future article…
This looks even easier than with conda. How so?
Is it, though? Both Python and Java need to be installed here. Doing this in isolation would require additional tools, such as pyenv, venv, or jenv. It gets even more painful on Windows (see above).
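For the Python half at least, the standard library already ships the tooling. A minimal isolation sketch with venv, assuming a POSIX shell (the directory name .venv is an arbitrary choice):

```shell
# Create and activate an isolated environment.
python3 -m venv .venv
. .venv/bin/activate

# Install PySpark into the environment only, not system-wide.
pip install "pyspark==3.4.*"

# Verify the import resolves from inside the venv.
python -c "import pyspark; print(pyspark.__version__)"
```

Java isolation is a separate problem; venv does nothing for JAVA_HOME, which is where jenv comes in.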
What about other Spark / Python / Java versions?
See the version requirements in the official Spark documentation.