Apache Spark Installation + IPython Notebook Integration Guide for macOS


Apache Spark installation + IPython/Jupyter notebook integration guide for macOS. Tested with Apache Spark 2.1.0, Python 2.7.13 and Java 1.8.0_112.

For older versions of Spark and IPython, see the previous version of this guide at the end of this document.


Install Java Development Kit

Download and install the JDK from oracle.com.

Add the following to your shell profile, e.g. ~/.bash_profile:
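For example, a minimal sketch using macOS's /usr/libexec/java_home helper to locate the installed JDK:

```bash
# Resolve the path of the installed JDK (avoids hardcoding the version)
export JAVA_HOME=$(/usr/libexec/java_home)
```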

Install Apache Spark


You can use the macOS package manager Homebrew (http://brew.sh/):
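For example:

```bash
brew install apache-spark
```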

Set up env variables

Add the following to your shell profile, e.g. ~/.bash_profile:
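A sketch of the exports, assuming Spark 2.1.0 installed via Homebrew; the py4j version and subpath (0.10.4 here) are an assumption, see the note below:

```bash
# Homebrew keeps Spark's runtime under libexec; adjust the version to your install
export SPARK_HOME="/usr/local/Cellar/apache-spark/2.1.0/libexec"
# Make pyspark and the bundled py4j importable from Python
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH"
```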

You can check the SPARK_HOME path using the following brew command:
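```bash
brew info apache-spark
```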

Also check the py4j version and subpath, as it may differ from version to version.

IPython profile

Profiles are no longer supported in Jupyter; running the notebook with a profile now only triggers a deprecation warning.

It is no longer possible to run custom startup files as it was with IPython profiles. Thus, the easiest way is to run a pyspark init script manually at the beginning of your notebook, or to follow the alternative way below.

Run ipython
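With Jupyter installed, start the notebook server:

```bash
jupyter notebook
```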

Initialize pyspark
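A minimal init sketch for a Python 2 notebook, assuming SPARK_HOME and PYTHONPATH are exported as above; it executes Spark's bundled shell.py bootstrap, which creates the SparkContext:

```python
import os

# Run Spark's interactive-shell bootstrap, which creates `sc` (and `spark` in Spark 2.x)
execfile(os.path.join(os.environ['SPARK_HOME'], 'python/pyspark/shell.py'))
```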

The sc variable should now be available.
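For example, a quick sanity check:

```python
# Should print the SparkContext and compute 45 on the local backend
print(sc)
print(sc.parallelize(range(10)).sum())
```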

Alternatively

You can also force the pyspark shell command to run an IPython/Jupyter web notebook instead of the command-line interactive interpreter. To do so, add the following environment variables:
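A sketch for Spark 2.x, added to the same .bash_profile (PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS are Spark's driver-Python settings):

```bash
# Use jupyter as the driver Python and launch it in notebook mode
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
```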

and then simply run
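```bash
pyspark
```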

which will open a web notebook with sc available automatically.

Previous version

Tested with Apache Spark 1.3.1, Python 2.7.9 and Java 1.8.0_45, plus a workaround for Spark 1.4.x from @enahwe.

Install Java Development Kit

Download and install the JDK from oracle.com.

Add the following to your shell profile, e.g. ~/.bash_profile:
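As above, a minimal sketch using macOS's java_home helper:

```bash
export JAVA_HOME=$(/usr/libexec/java_home)
```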

Install Apache Spark

You can use the macOS package manager Homebrew (http://brew.sh/):
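For example:

```bash
brew install apache-spark
```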

Set up env variables

Add the following to your shell profile, e.g. ~/.bash_profile:
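A sketch, assuming Spark 1.3.1 installed via Homebrew (the startup file below handles the Python path):

```bash
export SPARK_HOME="/usr/local/Cellar/apache-spark/1.3.1/libexec"
```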

You can check the SPARK_HOME path using the following brew command:
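```bash
brew info apache-spark
```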

Create an IPython profile

Run
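For example, creating a profile named pyspark:

```bash
ipython profile create pyspark
```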

Create a startup file
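A sketch of such a startup file, e.g. ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py (the py4j version, 0.8.2.1 here, is an assumption and should match your Spark bundle):

```python
import os
import sys

# Locate the Spark installation from the environment
spark_home = os.environ.get('SPARK_HOME')
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Make pyspark and the bundled py4j importable
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

# Run Spark's shell bootstrap, which creates `sc`
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
```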

UPD: for Spark 1.4.x

You can try the universal 00-pyspark-setup.py script from @enahwe, which works for both Spark 1.3.x and 1.4.x:
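The original script is not reproduced here; a sketch of the same idea, globbing for whichever py4j zip ships with the installed Spark so one file covers both 1.3.x and 1.4.x:

```python
import glob
import os
import sys

spark_home = os.environ.get('SPARK_HOME')
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

sys.path.insert(0, os.path.join(spark_home, 'python'))
# Spark 1.4.x renamed the bundled py4j zip, so glob instead of hardcoding a version
for py4j in glob.glob(os.path.join(spark_home, 'python/lib/py4j-*-src.zip')):
    sys.path.insert(0, py4j)

execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
```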

Run ipython
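Launch the notebook with the profile created above:

```bash
ipython notebook --profile=pyspark
```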

The sc variable should be available, as in the sanity check shown earlier.