PySpark with pip

Published on Jan 8, 2024 · 1 min read


Prelude

You want to run something on PySpark, and you cannot use conda.

Prerequisites

  • a machine
  • a terminal
  • Java 8/11 installed & JAVA_HOME set
  • Python 3.8+ including pip installed
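A quick way to confirm the prerequisites from a terminal (the version numbers are the ones assumed above):

```shell
# Check the Java side: version and JAVA_HOME
java -version                  # expect 8 or 11
echo "JAVA_HOME=$JAVA_HOME"    # should point at that JDK

# Check the Python side: interpreter and pip
python3 --version              # expect 3.8 or newer
python3 -m pip --version
```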

Terminal

pip install pyspark==3.4
pyspark

Addendum

I’m on Windows. It’s not working :(

You likely need the Hadoop .dll and some environment variables. To be continued in a future article…

This looks even easier than with conda. How so?

Is it, though? Both Python and Java need to be installed on the machine. Isolating them would require additional tools, such as pyenv or venv for Python and jenv for Java. Windows adds further pain (see above).
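Isolating the Python side could look like the following sketch (the `.venv` directory name is arbitrary; jenv would play the analogous role for the Java side):

```shell
# Create and activate an isolated Python environment
python3 -m venv .venv
. .venv/bin/activate

# Installs pyspark only inside .venv, not system-wide
pip install pyspark==3.4
```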

What about other Spark / Python / Java versions?

See requirements here.

Notice something wrong? Have an additional tip?

Contribute to the discussion here