Big Data Processing covers recent large-scale programming models that make it easy to create algorithms that process massive amounts of information on a cluster of computer nodes. These platforms hide the complexity of coordinating parallel computations across the cooperating nodes, and instead provide developers with a high-level programming model.
The module is based on the MapReduce programming model. Lectures explain how a range of data analysis algorithms can be expressed in this model and executed automatically over clusters of machines. The module also covers the internal mechanisms that a MapReduce framework uses to coordinate and execute a job across the infrastructure. Finally, the module presents related topics in Big Data, such as alternative large-scale processing platforms, NoSQL data stores, and Cloud Computing execution infrastructure. In addition to the lectures, weekly lab sessions and coursework exercises present several applications in which real-world datasets are analysed using platforms such as the Spark framework.