SparkSQL with Python

This repository contains examples of using Spark and SparkSQL from Python through PySpark.

Profeco

We will work with the Profeco dataset, which you can download here: Profeco. It is a daily historical record of more than 2,000 products, as of 2015, in various establishments in Mexico.

Check the code here

Countries airports

Check the code here

API to count the number of tweets within a 1 km radius

I extract every distinct tweet, together with its geographic data, into a separate file, “tweets_geo.csv”. This makes the data easier to manipulate in a SparkSQL query.

Check the data preparation code here
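
As a rough illustration of this preparation step, the sketch below copies only the tweets that carry usable coordinates into a smaller CSV and drops repeated tweets. It is not the repository's actual code: the column names `tweet_id`, `latitude` and `longitude` and the helper `extract_geo_rows` are assumptions made for the example, and the real export may use different headers.

```python
import csv
import io

# Hypothetical column names; adjust them to the headers of the real tweet export.
GEO_COLUMNS = ["tweet_id", "latitude", "longitude"]


def extract_geo_rows(source, target, columns=GEO_COLUMNS):
    """Copy distinct tweets with valid coordinates from `source` to `target`.

    `source` and `target` are open text file objects. Only `columns` are
    written. Returns the number of rows written.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    seen = set()
    written = 0
    for row in reader:
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip tweets without usable coordinates
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # skip out-of-range (or NaN) coordinates
        tweet_id = row.get("tweet_id")
        if tweet_id in seen:
            continue  # keep each tweet only once
        seen.add(tweet_id)
        writer.writerow(row)
        written += 1
    return written


# Small in-memory demonstration with made-up rows.
sample = io.StringIO(
    "tweet_id,text,latitude,longitude\n"
    "1,hello,19.4326,-99.1332\n"
    "2,no location,,\n"
    "1,hello,19.4326,-99.1332\n"
)
out = io.StringIO()
count = extract_geo_rows(sample, out)
```

With real files, the same function could be called with `open("tweets.csv", newline="")` as the source (a hypothetical input name) and `open("tweets_geo.csv", "w", newline="")` as the target.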

The code for the REST API is in the API folder of this repository.
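
One way to count tweets inside a 1 km radius is the haversine great-circle distance. The sketch below shows that idea in plain Python and also builds an equivalent SparkSQL query string. It is an assumption about how such a count could be done, not a description of the API in this repository: the view name `tweets_geo`, the column names `latitude` and `longitude`, and the helper names are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within_radius(points, lat, lon, radius_km=1.0):
    """Count the (lat, lon) pairs in `points` at most `radius_km` from the centre."""
    return sum(
        1 for p_lat, p_lon in points
        if haversine_km(lat, lon, p_lat, p_lon) <= radius_km
    )


def radius_count_query(lat, lon, radius_km=1.0, view="tweets_geo"):
    """Build a SparkSQL query applying the same haversine filter.

    `view` is a hypothetical temporary view name and is interpolated as-is,
    so it must come from trusted code, never from user input.
    """
    lat, lon, radius_km = float(lat), float(lon), float(radius_km)
    return (
        f"SELECT COUNT(*) AS tweets FROM {view} "
        f"WHERE 2 * {EARTH_RADIUS_KM} * ASIN(SQRT("
        f"POW(SIN(RADIANS(latitude - ({lat})) / 2), 2) + "
        f"COS(RADIANS({lat})) * COS(RADIANS(latitude)) * "
        f"POW(SIN(RADIANS(longitude - ({lon})) / 2), 2)"
        f")) <= {radius_km}"
    )


# Made-up sample: the centre itself, a point about 0.5 km north, one about 2.2 km north.
centre = (19.4326, -99.1332)
points = [(19.4326, -99.1332), (19.4371, -99.1332), (19.4526, -99.1332)]
nearby = count_within_radius(points, *centre)  # 2 of the 3 sample points are within 1 km
query = radius_count_query(*centre)
```

Assuming the CSV has been loaded into a DataFrame and registered with `createOrReplaceTempView("tweets_geo")`, the string in `query` could be passed to `spark.sql(...)`. The pure-Python functions are useful for checking the query's results on a small sample.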

Contributing and Feedback

Any ideas or feedback about this repository? Help me improve it.

Authors

License

This project is licensed under the terms of the MIT license.