A very old video. At the beginning, it highlights that the world of mainstream software is changing and shows a graph of Moore's law. One of the most relevant conclusions of this introduction is that huge workloads require horizontal scaling, which leads to the popular parallel programming challenge.
Parallelism: Executes programs faster on parallel hardware.
Concurrency: Manages the concurrent execution of threads.
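As an illustration of parallelism, here is a minimal Scala sketch. The object ParallelSum and its parameters are hypothetical and do not come from the video. It splits a sum into chunks and evaluates each chunk as a Future on the global execution context, so independent chunks can run on different cores.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper: sums 1..n by splitting the range into `chunks` tasks.
object ParallelSum {
  def sum(n: Long, chunks: Int): Long = {
    val chunkSize = n / chunks
    val futures: Seq[Future[Long]] = (0 until chunks).map { i =>
      val start = i * chunkSize + 1
      // The last chunk also takes any remainder left by integer division.
      val end = if (i == chunks - 1) n else (i + 1) * chunkSize
      // Each chunk runs as an independent task on the global thread pool,
      // whose size defaults to the number of available processors.
      Future((start to end).sum)
    }
    // Combine the partial results; no task writes to shared state.
    Await.result(Future.sequence(futures), 1.minute).sum
  }
}

println(ParallelSum.sum(1000000L, 8)) // 500000500000
```

The speed-up depends on the number of cores; the result is the same either way because each chunk only reads its own range.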
Non-determinism is caused by concurrent threads accessing shared mutable state.
To get deterministic processing, avoid mutable state, which means functional programming.
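A minimal Scala sketch of the difference, assuming plain java.util.concurrent thread pools. The object SharedState and both methods are hypothetical illustrations. racyCount lets several threads increment one shared var, so updates can be lost. pureCount has each task return its own immutable result and combines the results afterwards.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}

// Hypothetical example object contrasting shared mutable state with immutable results.
object SharedState {
  def racyCount(threads: Int, perThread: Int): Int = {
    var counter = 0 // shared mutable state captured by every task
    val pool = Executors.newFixedThreadPool(threads)
    for (_ <- 1 to threads)
      // `counter += 1` is an unsynchronized read-modify-write, so increments can be lost.
      pool.execute(() => for (_ <- 1 to perThread) counter += 1)
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    counter
  }

  def pureCount(threads: Int, perThread: Int): Int = {
    val pool = Executors.newFixedThreadPool(threads)
    val futures = (1 to threads).map { _ =>
      pool.submit(new Callable[Int] {
        // Each task computes a local result instead of mutating shared state.
        def call(): Int = (1 to perThread).foldLeft(0)((acc, _) => acc + 1)
      })
    }
    // Combining the results after the tasks finish is deterministic.
    val total = futures.map(_.get()).sum
    pool.shutdown()
    total
  }
}

println(SharedState.racyCount(4, 100000)) // may be less than 400000 and can vary between runs
println(SharedState.pureCount(4, 100000)) // always 400000
```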
Scala has been adopted in many kinds of businesses: financial platforms, web platforms, trading platforms, and simulations.
At the beginning of the video, Dean Wampler explains what the term Big Data means and why the way data is stored needs to change because of the large volume of information that exists today. He distinguishes this from traditional storage techniques, which are more expensive and slower.
The mapper receives each document together with its content and tokenizes the document in some useful way. The sort-and-shuffle phase then sorts the mapper output by key and distributes the keys to the reducers. All the values for a given key always go to the same reducer, so inside each reducer there is a collection of all the values for that key.
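The map, sort-and-shuffle, and reduce flow described above can be simulated in memory with ordinary Scala collections. This is a sketch assuming word counting as the task; it does not use Hadoop, and the object WordCountSim and its methods are hypothetical names.

```scala
// Hypothetical in-memory simulation of MapReduce word count (no Hadoop involved).
object WordCountSim {
  // Map phase: tokenize one (documentId, content) pair into (word, 1) pairs.
  // docId is unused here but mirrors the mapper's input of document plus content.
  def mapper(docId: String, content: String): Seq[(String, Int)] =
    content.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty).map(word => (word, 1))

  // Sort-and-shuffle phase: group values by key and sort the keys, so every
  // value for a given key lands in the same reducer's collection.
  def shuffle(pairs: Seq[(String, Int)]): Seq[(String, Seq[Int])] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }.toSeq.sortBy(_._1)

  // Reduce phase: one key together with the collection of all its values.
  def reducer(word: String, counts: Seq[Int]): (String, Int) = (word, counts.sum)

  def run(docs: Map[String, String]): Seq[(String, Int)] = {
    val mapped = docs.toSeq.flatMap { case (id, text) => mapper(id, text) }
    shuffle(mapped).map { case (word, counts) => reducer(word, counts) }
  }
}

val docs = Map(
  "doc1" -> "Hadoop MapReduce in Scala",
  "doc2" -> "Scala and MapReduce"
)
WordCountSim.run(docs).foreach(println)
```

For these two sample documents, "mapreduce" and "scala" each get a count of 2, because both of their (word, 1) pairs are shuffled to the same reducer.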
The Crunch Java API is a framework for writing and running MapReduce pipelines and workflows, and Scrunch is a Scala wrapper around Crunch.