Introduction to Apache Spark

Livecoding
Fri 12:10 - 12:45
You need a laptop
Atlas 2
Software Design

Summary

This livecoding session introduces Apache Spark and is aimed at seasoned developers with an interest in understanding the streaming data pipelines that power today’s real-time analytics engines.

Apache Spark is an open-source cluster computing framework that has largely replaced Hadoop MapReduce for many workloads in recent years. It features in-memory processing and streaming capabilities, as well as an SQL interface and a mature set of tools for machine learning and graph processing.

We’ll first take a look at how to build a few basic static pipelines using Spark’s new Dataset API. Towards the end, we’ll examine a relatively complex Kafka-Spark-Cassandra streaming pipeline that more closely mimics a real-life, high-load production setting.

Who is it for?

Albert, Architect
Carol, CTO
Chris, Craftsman Programmer
Cristina, Technical Co-Founder
Diana, DevOps
Megan, Manager
Tamara, Team Leader
Tudor, Technical Consultant
