Introduction to Apache Spark

Livecoding
Fri 12:10 - 12:45
You need a laptop
Atlas 2
Software Design

Summary

This livecoding session introduces Apache Spark and is aimed at seasoned developers with an interest in understanding the streaming data pipelines that power today’s real-time analytics engines.

Apache Spark is an open-source cluster computing framework that has largely replaced Hadoop MapReduce in recent years. It features in-memory processing and streaming capabilities, as well as an SQL interface and mature libraries for machine learning and graph processing workloads.

We’ll first take a look at how to build a few basic static pipelines using Spark’s new Dataset API. Towards the end, we’ll examine a relatively complex Kafka-Spark-Cassandra streaming pipeline that more closely mimics a real-life, high-load production setting.

Who is it for?

Albert Architect
Chris CTO
Diana DevOps
Megan Manager
Tamara Team Leader
David Developer
Bianca Business Analyst
Tudor Tester
