Martin Odersky, “Working Hard to Keep It Simple” - OSCON Java 2011

Youtube Video

This is an older talk. It opens by noting that the world of mainstream software is changing and shows a graph of Moore's law. One of the most relevant conclusions of this introduction is that huge workloads require horizontal scaling, which leads to the challenge of popular parallel programming.

Difference between parallelism and concurrency

Parallelism: execute programs faster on parallel hardware.
Concurrency: manage the concurrent execution of threads.
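
The distinction can be illustrated with a minimal sketch. It is not taken from the talk: the object and method names are hypothetical, and it only assumes the standard library's scala.concurrent.Future with the global execution context. The first method speeds up one pure computation by splitting it across cores (parallelism); the second coordinates several tasks that touch the same state (concurrency).

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical names, for illustration only.
object ParallelVsConcurrent {
  // Parallelism: split one pure computation into chunks so that several
  // cores can work on it at once; the result equals the sequential sum.
  // chunkSize must be positive.
  def parallelSum(xs: Vector[Int], chunkSize: Int): Long = {
    val partials = xs.grouped(chunkSize).toVector
      .map(chunk => Future(chunk.map(_.toLong).sum))
    Await.result(Future.sequence(partials), 10.seconds).sum
  }

  // Concurrency: several tasks update the same state, so access to that
  // state has to be coordinated explicitly (here with an atomic counter).
  def concurrentIncrements(tasks: Int, perTask: Int): Int = {
    val counter = new AtomicInteger(0)
    val work = (1 to tasks).map { _ =>
      Future { (1 to perTask).foreach(_ => counter.incrementAndGet()) }
    }
    Await.result(Future.sequence(work), 10.seconds)
    counter.get()
  }
}
```

Replacing the AtomicInteger with a plain var would make the second method non-deterministic, which is the kind of problem that shared mutable state introduces.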

Fundamental problems:

Scala is unifying several features:

Scala has been adopted across different kinds of businesses: financial platforms, web platforms, trading platforms, and simulations.

Why Big Data Needs To Be Functional

Youtube Video

At the beginning of the video, Dean Wampler explains what the term Big Data means and why the way data is stored needs to change, given the large volume of information that now exists. He distinguishes it from traditional storage techniques, which are more expensive and slower.

  1. The size of the data increases exponentially
  2. Schemas are less formal and more relaxed because of unstructured data; various types of data must be accepted and managed.
  3. Data-driven programs

Using Scala for Map Reduce

The mapper receives each document along with its content and tokenizes it in some useful way. The sort and shuffle phase then sorts the mapper output by key and distributes the keys to the reducers. All values associated with a given key always go to the same reducer, so each reducer holds some kind of collection of all the values for its keys.

Pipeline stages: mapper, sort and shuffle, reducers, output.
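
The flow described above (mapper, sort and shuffle, reducers, output) can be sketched with plain Scala collections. This is only an in-memory illustration of the idea using a word count, not Hadoop, Crunch, or Scrunch code; all names are hypothetical, and tokenizing on non-word characters is an assumption.

```scala
// Hypothetical names, for illustration only; runs in memory on one machine.
object WordCountSketch {
  // Mapper: receives a document id and its content, emits (word, 1) pairs.
  // The id is unused here, but a real mapper receives it as the input key.
  def mapper(docId: String, content: String): Seq[(String, Int)] =
    content.toLowerCase.split("\\W+").toSeq
      .filter(_.nonEmpty)
      .map(word => (word, 1))

  // Sort and shuffle: group values by key and order the keys, so that
  // every value for a given key ends up in the same place.
  def sortAndShuffle(pairs: Seq[(String, Int)]): Seq[(String, Seq[Int])] =
    pairs.groupBy(_._1).toSeq.sortBy(_._1)
      .map { case (word, kvs) => (word, kvs.map(_._2)) }

  // Reducer: sees one key together with the collection of all its values.
  def reducer(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  // Output: one (word, total) pair per distinct word, ordered by word.
  def run(docs: Seq[(String, String)]): Seq[(String, Int)] = {
    val mapped = docs.flatMap { case (id, text) => mapper(id, text) }
    sortAndShuffle(mapped).map { case (word, counts) => reducer(word, counts) }
  }
}
```

For example, WordCountSketch.run(Seq("d1" -> "the cat sat", "d2" -> "The cat ran")) counts "cat" and "the" twice each and "ran" and "sat" once each. In a real cluster the grouping step would partition keys across machines, which a single in-memory groupBy does not show.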

The Crunch Java API is a framework for writing and running MapReduce pipelines and workflows. Scrunch is its Scala counterpart, a wrapper around Crunch.