docker-livy

Check the article here: Building Real-time communication with Apache Spark through Apache Livy

Dockerizing and Consuming an Apache Livy environment

As you can see, in order to reproduce a real example we need three components:

As additional components, I would add Docker for a faster implementation, and a PostgreSQL database server to simulate an external data source available to Apache Spark.

In order to reproduce the experiment, follow these steps:

```shell
ramse@DESKTOP-K6K6E5A MINGW64 /c
$ git clone https://github.com/Wittline/docker-livy.git

ramse@DESKTOP-K6K6E5A MINGW64 /c
$ cd docker-livy

ramse@DESKTOP-K6K6E5A MINGW64 /c/docker-livy
$ cd code

ramse@DESKTOP-K6K6E5A MINGW64 /c/docker-livy/code
$ cd apps

ramse@DESKTOP-K6K6E5A MINGW64 /c/docker-livy/code/apps
$ docker-compose up -d --build
$ docker ps
```

```shell
ramse@DESKTOP-K6K6E5A MINGW64 ~
$ jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0
```
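Before running the notebook, it is worth verifying that the Livy server is reachable. A minimal sketch using only the standard library, assuming Livy is exposed on its default port 8998 (adjust the URL if your compose file maps it elsewhere):

```python
import json
import urllib.request

def livy_is_up(base_url: str = "http://localhost:8998") -> bool:
    """Return True if the Livy REST API answers GET /sessions (8998 is Livy's default port)."""
    try:
        with urllib.request.urlopen(f"{base_url}/sessions", timeout=5) as resp:
            return "sessions" in json.load(resp)
    except OSError:
        return False

def active_session_ids(payload: dict) -> list:
    """Extract the session ids from a GET /sessions response body."""
    return [s["id"] for s in payload.get("sessions", [])]
```

If `livy_is_up()` returns False, give the containers a little more time to start before submitting any work.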

Note that, among the actions performed in the test-livy.ipynb notebook, the PostgreSQL database is populated with data from an external .csv file, so that Apache Spark has data to interact with.
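The CSV-to-Postgres load can be sketched roughly as follows. The helper below is hypothetical (the notebook may use pandas or Spark's JDBC writer instead); it only builds a parameterized INSERT statement and the row tuples from CSV text:

```python
import csv
import io

def csv_to_inserts(csv_text: str, table: str):
    """Build one parameterized INSERT statement plus the row tuples from CSV text.
    Hypothetical helper for illustration; column types are left as strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cols = reader.fieldnames
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({', '.join(['%s'] * len(cols))})"
    rows = [tuple(row[c] for c in cols) for row in reader]
    return sql, rows

# With psycopg2, executing the load would then look roughly like:
# import psycopg2
# conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres", password="<password>")
# sql, rows = csv_to_inserts(open("data.csv").read(), "my_table")
# with conn, conn.cursor() as cur:
#     cur.executemany(sql, rows)
```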

The Python package livyc works well for submitting PySpark scripts dynamically and asynchronously to the Apache Livy server, which in turn interacts with the Apache Spark cluster in a transparent way. Check out that project, and remember to review all the files before interacting with the Jupyter notebook.
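As an alternative to livyc, you can also talk to Livy's REST API directly. A minimal sketch, assuming Livy at localhost:8998; `POST /sessions` and `POST /sessions/{id}/statements` are Livy's documented endpoints for interactive sessions:

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # Livy's default port; adjust for your setup

def livy_post(path: str, payload: dict) -> dict:
    """POST a JSON payload to the Livy REST API and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{LIVY_URL}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def session_payload(kind: str = "pyspark") -> dict:
    """Body for POST /sessions: start an interactive session of the given kind."""
    return {"kind": kind}

def statement_payload(code: str) -> dict:
    """Body for POST /sessions/{id}/statements: run a code snippet in the session."""
    return {"code": code}

# Typical flow (requires the docker-compose cluster to be running):
# session = livy_post("/sessions", session_payload())
# result = livy_post(f"/sessions/{session['id']}/statements",
#                    statement_payload("spark.range(10).count()"))
```

livyc wraps this same API and additionally polls for statement results, which is why it is the more convenient option for the notebook.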

Contributing and Feedback

Any ideas or feedback about this repository? Help me improve it.

Authors

License

This project is licensed under the terms of the MIT License.