Big Data: Spark With Scala Certification Program

Big Data Analytics: Spark With Scala Certification Program

This advanced Big Data certification training builds your expertise in the Big Data Hadoop ecosystem. With this Big Data Analytics training and certification you will master essential skills such as Spark Streaming, Spark SQL, machine learning programming and GraphX programming.

Participants’ takeaways

  • Get a clear understanding of the limitations of MapReduce and the role of Spark in overcoming them
  • Understand the fundamentals of the Scala programming language and its features
  • Explain and master the process of installing Spark as a standalone cluster
  • Build expertise in using RDDs to create applications in Spark
  • Master SQL queries using Spark SQL
  • Gain a thorough understanding of Spark Streaming features
  • Master and describe the features of Spark ML programming in Scala

    Who should do this course?

    With a number of opportunities in the field, people in the following job roles will benefit from this course:

    • Software Developer / Analyst / Architect
    • Project Manager / Database Administrator
    • Big Data Developer

    Course Outline: Big Data: Spark with Scala

    Duration: 35 Hours

    Module 1: Core Scala

    • Getting Started with the Scalable Language (Scala)
    • Working with Data: Literals, Values, Variables and Types
    • Expressions and Conditionals
    • Functions
    • First-Class Functions
    • Common Collections
    • More Collections
    • Hands-on lab
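
As a small taste of the topics in this module, here is a minimal sketch of values, variables, expressions, first-class functions and collections. All names in it (`CoreScalaSketch`, `level`, `applyTwice` and so on) are made up for illustration and are not taken from the course material.

```scala
object CoreScalaSketch {
  // A val cannot be reassigned; a var can
  val courseHours: Int = 35
  var completedModules: Int = 0

  // if/else is an expression, so it yields a value directly
  def level(hours: Int): String = if (hours >= 30) "full" else "short"

  // A first-class function stored in a value
  val double: Int => Int = n => n * 2

  // A higher-order function: it takes another function as an argument
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Common collections: an immutable List and a Map derived from it
  val modules: List[String] = List("Core Scala", "Object Oriented Scala", "Spark SQL")
  val nameLengths: Map[String, Int] = modules.map(m => m -> m.length).toMap

  def main(args: Array[String]): Unit = {
    completedModules += 1
    println(level(courseHours))        // prints "full"
    println(applyTwice(double, 3))     // prints 12
    println(nameLengths("Core Scala")) // prints 10
  }
}
```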

    Module 2: Object Oriented Scala

    • Classes
    • Objects, Case Classes and Traits
    • Advanced Typing
    • Hands-on lab

    Module 3: Introduction to Apache Spark

    • What is Apache Spark?
    • A Unified Stack
    • Who uses Spark and for what?
    • A brief history of Spark
    • What is good and bad in MapReduce?
    • Spark Versions and Releases

    Module 4: Cloudera Quick Start VM Installation

    • Includes Hadoop
    • Includes Apache Spark
    • Includes Hive
    • Includes Sqoop
    • Includes Hue
    • Hands-on lab

    Module 5: Programming with RDDs

    • RDD basics
    • Creating RDDs
    • RDD operations
    • Passing Functions to Spark
    • Common transformations and actions
    • Persistence (Caching)
    • Hands-on lab

    Module 6: Playing with Pair RDDs

    • Core concepts of PairRDD
    • Creation of PairRDD
    • Aggregation in PairRDD
    • Understanding aggregation functions in depth
    • reduceByKey()
    • foldByKey()
    • combineByKey()
    • groupByKey()
    • aggregateByKey()
    • Hands-on lab
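
The per-key aggregation idea behind the functions listed above can be sketched with plain Scala collections. This is not Spark code: it only mimics what `groupByKey()`, `reduceByKey()` and `aggregateByKey()` compute, on a single machine, and the sample data and helper names are hypothetical.

```scala
object PairAggregationSketch {
  // Hypothetical (key, value) pairs standing in for a pair RDD
  val sales: Seq[(String, Int)] =
    Seq(("east", 10), ("west", 5), ("east", 20), ("west", 15), ("east", 30))

  // Similar in spirit to groupByKey(): collect every value for each key
  def groupValues(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }

  // Similar in spirit to reduceByKey(f): merge the values of each key with a binary function
  def reduceValues(pairs: Seq[(String, Int)])(f: (Int, Int) => Int): Map[String, Int] =
    groupValues(pairs).map { case (key, values) => key -> values.reduce(f) }

  // Similar in spirit to aggregateByKey((0, 0))(...): the accumulator type (sum, count)
  // differs from the value type Int
  def sumAndCount(pairs: Seq[(String, Int)]): Map[String, (Int, Int)] =
    groupValues(pairs).map { case (key, values) =>
      key -> values.foldLeft((0, 0)) { case ((sum, count), v) => (sum + v, count + 1) }
    }

  def main(args: Array[String]): Unit = {
    println(reduceValues(sales)(_ + _)("east")) // prints 60
    println(sumAndCount(sales)("west"))         // prints (20,2)
  }
}
```

In Spark itself, `reduceByKey()` is usually preferred over `groupByKey()` followed by a reduce, because it combines values within each partition before shuffling them across the cluster.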

    Module 7: Loading and Saving Data

    • File Formats
    • Text File
    • JSON
    • Comma-Separated Values and Tab-Separated Values
    • Sequence Files
    • Object Files
    • Hadoop Input and Output Formats
    • Hands-on lab

    Module 8: Build and Monitor Apache Spark Applications

    • Cluster Managers
    • Deploying Applications with spark-submit
    • A Scala Spark application built with SBT
    • Finding Information
    • Spark Web UI
    • Driver and Executor Logs
    • Hands-on lab

    Module 9: Spark SQL

    • Linking with Spark SQL
    • Using Spark SQL in Applications
    • Loading and Saving Data
    • User Defined Functions
    • Spark SQL Performance Tuning
    • Hands-on lab

    Module 10: Spark Advanced

    • Data Partitioning
    • What is Partitioning and why?
    • Data Partitioning example using Join (Hash Partitioning)
    • Understand partitioning using an example of getting recommendations for a customer
    • Understand partitioning code using Spark-Scala
    • Operations that create partitioned RDDs
    • Operations that benefit from partitioning
    • Operations that affect partitioning
    • Accumulators
    • Broadcast Variables
    • Hands-on lab
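
The core idea of hash partitioning can be sketched in a few lines of plain Scala: a key is assigned to a partition by taking its hash code modulo the number of partitions, adjusted so the result is never negative. The object and method names below are hypothetical, and the sketch assumes non-null keys.

```scala
object HashPartitionSketch {
  // Map a key to a partition index in the range [0, numPartitions)
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    // In Scala, % can return a negative result for negative hash codes, so shift it back into range
    if (raw < 0) raw + numPartitions else raw
  }

  def main(args: Array[String]): Unit = {
    println(partitionFor(7, 4))  // prints 3
    println(partitionFor(-1, 4)) // prints 3
  }
}
```

Because the same key always maps to the same partition index, two datasets partitioned this way with the same number of partitions keep matching keys together, which is why joins can benefit from partitioning.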

    Module 11: Spark Streaming

    • Real-time / near-real-time data processing
    • Streaming Sources and Sinks
    • DStream (Discretized Stream)
    • Execution of Spark Streaming
    • Spark Streaming Transformations (Stateless and Stateful)
    • Combining multiple DStreams
    • Understanding transform() operator
    • Window Transformation
    • Window Duration and Sliding Duration
    • DStream Operations
    • WordCount in DStream
    • Checkpointing
    • Hands-on lab
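
The relationship between window duration and sliding duration can be illustrated without Spark. In the sketch below, a plain Scala sequence stands in for a stream of batches, and both durations are measured in batches rather than seconds. The names and data are hypothetical, and unlike a real DStream this simplified version only produces complete windows.

```scala
object WindowSketch {
  // Hypothetical event counts, one entry per batch interval
  val batchCounts: Seq[Int] = Seq(1, 2, 3, 4, 5, 6, 7)

  // windowLength: how many batches each window covers
  // slideInterval: how many batches to advance between windows
  def windowedSums(batches: Seq[Int], windowLength: Int, slideInterval: Int): List[Int] =
    batches.sliding(windowLength, slideInterval).map(_.sum).toList

  def main(args: Array[String]): Unit = {
    // Windows are (1,2,3), (3,4,5), (5,6,7)
    println(windowedSums(batchCounts, windowLength = 3, slideInterval = 2)) // prints List(6, 12, 18)
  }
}
```

When the slide interval is smaller than the window length, as here, consecutive windows overlap and the same batch contributes to more than one result.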

    Module 12: Machine Learning with MLlib and GraphX

    • Basics of ML and Data Science
    • Example of Machine Learning
    • Supervised and Unsupervised Learning
    • Key terminology, e.g. features, training and testing
    • How to choose the right algorithm
    • Common steps of Machine Learning
    • Collect data
    • Prepare Input data
    • Analyze Input data
    • Train the algorithm
    • Test the algorithm
    • Use the Algorithm
    • GraphX
    • Hands-on lab

    Module 13: Apache Spark – Scala Project + External Certification Preparation

    • Project 1: Movie Recommendation
    • Project 2: Real-time stock market data processing
    • Project 3: Server log analysis
    • Spark developer certification preparation for the MapR and Databricks distributions

    Why Data Brio Academy?

    • Learn directly from industry practitioners with more than 22 years of corporate experience at companies such as Dell R&D, Infosys, Perot Systems and Tektronix
    • Certification from WEBEL
    • Only institute with transparent faculty profiles directly from industry. Get the business perspective instead of learning just the tools & theories
    • Unique training methodology with hands-on sessions, real-time case studies, assignments with data sets and projects covering the end-to-end life cycle
    • End-to-end life cycle experience on a real-time project, with internship provision in our parent company, Business Brio. Business Brio (a member of NASSCOM and CII, the Confederation of Indian Industry) is an award-winning company that offers consulting and project services in Big Data and Analytics to clients around the globe
    • 100% Placement assistance through dedicated placement cell (resume workshop, interview guidance and placement opportunities)


