Apache Spark and Scala Certification Training

Key features

  • 32 hours of instructor-led training
  • 15 hours of self-paced video
  • Includes topics on Spark streaming, Spark ML, and GraphX programming
  • 1 industry project for submission and 2 for hands-on practice
  • Includes downloadable ebooks and 30 demos

Exam & certification

How do I become a certified Apache Spark and Scala professional?

To become a certified Apache Spark and Scala professional, you must fulfill both of the following criteria:



  • You must complete a project given by Digital Evolution Orbit, which is evaluated by the lead trainer. Submit your project through the learning management system (LMS). If you have questions or difficulties while working on the project, you can get assistance and clarification from our experts at SimpliTalk. If you need further help, you can attend any of the ongoing batches of Apache Spark and Scala Certification Training through our Online Classroom Training to get support with your project.

  • A minimum score of 80 percent is required to pass the online examination. If you don’t pass the online exam on the first attempt, you are allowed to retake the exam once.

  • In addition, at the end of the Scala course, you will receive an experience certificate stating that you have three months’ experience implementing Spark and Scala.

What are the prerequisites for the Scala course?

The prerequisites for the Apache Spark and Scala course are:



  • Fundamental knowledge of any programming language

  • Basic understanding of databases, SQL, and query languages

  • Working knowledge of Linux- or Unix-based systems (not mandatory)

  • Certification training as a Big Data Hadoop developer (recommended)

What do I need to do to unlock my Digital Evolution Orbit certificate?

Online Classroom:



  • Attend one complete batch

  • Complete one project and one simulation test with a minimum score of 60 percent



Online Self-Learning:



  • Complete 85 percent of the course

  • Complete one project and one simulation test with a minimum score of 60 percent


Course Details

Course description

Digital Evolution Orbit’s Apache Spark and Scala certification training is designed to:



  • Advance your expertise in the Big Data Hadoop Ecosystem

  • Help you master essential Apache Spark and Scala skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Spark shell scripting

  • Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos

By completing this Apache Spark and Scala course, you will be able to:

  • Understand the limitations of MapReduce and the role of Spark in overcoming these limitations

  • Understand the fundamentals of the Scala programming language and its features

  • Explain and master the process of installing Spark as a standalone cluster

  • Develop expertise in using Resilient Distributed Datasets (RDDs) for creating applications in Spark (see the sketch after this list)

  • Master Structured Query Language (SQL) using Spark SQL

  • Gain a thorough understanding of Spark Streaming features

  • Master the features of Spark ML programming and GraphX programming
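To give a flavor of the RDD programming these objectives refer to, here is a minimal, illustrative sketch of an RDD-based word count in Scala. It is not course material: the application name, the local[*] master setting, and the input path input.txt are placeholders chosen for this example.

    // Minimal RDD word-count sketch (illustrative only; "input.txt" is a placeholder path).
    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Run locally with all available cores; the course also covers
        // installing and running Spark as a standalone cluster.
        val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Load a text file as an RDD, split each line into words,
        // and count occurrences of each word with a pair-RDD reduceByKey.
        val counts = sc.textFile("input.txt")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        sc.stop()
      }
    }

Packaging and running this kind of application with SBT, along with the equivalent Java API, is covered in Lesson 3.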
This Apache Spark and Scala course is recommended for:

  • Professionals aspiring for a career in the field of real-time big data analytics

  • Analytics professionals

  • Research professionals

  • IT developers and testers

  • Data scientists

  • BI and reporting professionals

  • Students who wish to gain a thorough understanding of Apache Spark

This Apache Spark and Scala training course includes one project. In the project scenario, a U.S.-based university has collected datasets that represent movie reviews from multiple reviewers. To gain in-depth insights from the research data collected, you must perform a series of tasks in Spark on the dataset provided.
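As an illustration only, a Spark SQL sketch of the kind of analysis such a project involves might look like the following. The actual dataset, schema, and tasks are provided through the LMS; the file name movie_reviews.csv and the columns movieId and rating are assumptions made for this example, and the SparkSession entry point shown here is the newer counterpart of the SQLContext covered in Lesson 4.

    // Hypothetical sketch: average rating and review count per movie.
    // The path "movie_reviews.csv" and the columns "movieId"/"rating" are assumed
    // for illustration; the real project data and tasks come from the LMS.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, count, desc}

    object MovieReviews {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("MovieReviews")
          .master("local[*]")
          .getOrCreate()

        // Read the reviews as a DataFrame (assumed to be a CSV file with a header row).
        val reviews = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("movie_reviews.csv")

        // Group by movie and compute the average rating and number of reviews,
        // ordered with the highest-rated movies first.
        val summary = reviews
          .groupBy("movieId")
          .agg(avg("rating").as("avgRating"), count("rating").as("numReviews"))
          .orderBy(desc("avgRating"))

        summary.show(10)
        spark.stop()
      }
    }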

Course Preview

  • 0.1 Introduction
  • 0.2 Course Objectives
  • 0.3 Course Overview
  • 0.4 Target Audience
  • 0.5 Course Prerequisites
  • 0.6 Value to the Professionals
  • 0.7 Value to the Professionals (contd.)
  • 0.8 Value to the Professionals (contd.)
  • 0.9 Lessons Covered
  • 0.10 Conclusion
  • 1.1 Introduction
  • 1.2 Objectives
  • 1.3 Evolution of Distributed Systems
  • 1.4 Need of New Generation Distributed Systems
  • 1.5 Limitations of MapReduce in Hadoop
  • 1.6 Limitations of MapReduce in Hadoop (contd.)
  • 1.7 Batch vs. Real-Time Processing
  • 1.8 Application of Stream Processing
  • 1.9 Application of In-Memory Processing
  • 1.10 Introduction to Apache Spark
  • 1.11 Components of a Spark Project
  • 1.12 History of Spark
  • 1.13 Language Flexibility in Spark
  • 1.14 Spark Execution Architecture
  • 1.15 Automatic Parallelization of Complex Flows
  • 1.16 Automatic Parallelization of Complex Flows-Important Points
  • 1.17 APIs That Match User Goals
  • 1.18 Apache Spark-A Unified Platform of Big Data Apps
  • 1.19 More Benefits of Apache Spark
  • 1.20 Running Spark in Different Modes
  • 1.21 Installing Spark as a Standalone Cluster-Configurations
  • 1.22 Installing Spark as a Standalone Cluster-Configurations
  • 1.23 Demo-Install Apache Spark
  • 1.24 Demo-Install Apache Spark
  • 1.25 Overview of Spark on a Cluster
  • 1.26 Tasks of Spark on a Cluster
  • 1.27 Companies Using Spark-Use Cases
  • 1.28 Hadoop Ecosystem vs. Apache Spark
  • 1.29 Hadoop Ecosystem vs. Apache Spark (contd.)
  • 1.30 Quiz
  • 1.31 Summary
  • 1.32 Summary (contd.)
  • 1.33 Conclusion
  • 2.1 Introduction
  • 2.2 Objectives
  • 2.3 Introduction to Scala
  • 2.4 Features of Scala
  • 2.5 Basic Data Types
  • 2.6 Basic Literals
  • 2.7 Basic Literals (contd.)
  • 2.8 Basic Literals (contd.)
  • 2.9 Introduction to Operators
  • 2.10 Types of Operators
  • 2.11 Use Basic Literals and the Arithmetic Operator
  • 2.12 Demo Use Basic Literals and the Arithmetic Operator
  • 2.13 Use the Logical Operator
  • 2.14 Demo Use the Logical Operator
  • 2.15 Introduction to Type Inference
  • 2.16 Type Inference for Recursive Methods
  • 2.17 Type Inference for Polymorphic Methods and Generic Classes
  • 2.18 Unreliability on Type Inference Mechanism
  • 2.19 Mutable Collection vs. Immutable Collection
  • 2.20 Functions
  • 2.21 Anonymous Functions
  • 2.22 Objects
  • 2.23 Classes
  • 2.24 Use Type Inference, Functions, Anonymous Function, and Class
  • 2.25 Demo Use Type Inference, Functions, Anonymous Function and Class
  • 2.26 Traits as Interfaces
  • 2.27 Traits-Example
  • 2.28 Collections
  • 2.29 Types of Collections
  • 2.30 Types of Collections (contd.)
  • 2.31 Lists
  • 2.32 Perform Operations on Lists
  • 2.33 Demo Use Data Structures
  • 2.34 Maps
  • 2.35 Maps-Operations
  • 2.36 Pattern Matching
  • 2.37 Implicits
  • 2.38 Implicits (contd.)
  • 2.39 Streams
  • 2.40 Use Data Structures
  • 2.41 Demo Perform Operations on Lists
  • 2.42 Quiz
  • 2.43 Summary
  • 2.44 Summary (contd.)
  • 2.45 Conclusion
  • 3.1 Introduction
  • 3.2 Objectives
  • 3.3 RDDs API
  • 3.4 Features of RDDs
  • 3.5 Creating RDDs
  • 3.6 Creating RDDs—Referencing an External Dataset
  • 3.7 Referencing an External Dataset—Text Files
  • 3.8 Referencing an External Dataset—Text Files (contd.)
  • 3.9 Referencing an External Dataset—Sequence Files
  • 3.10 Referencing an External Dataset—Other Hadoop Input Formats
  • 3.11 Creating RDDs—Important Points
  • 3.12 RDD Operations
  • 3.13 RDD Operations—Transformations
  • 3.14 Features of RDD Persistence
  • 3.15 Storage Levels Of RDD Persistence
  • 3.16 Choosing The Correct RDD Persistence Storage Level
  • 3.17 Invoking the Spark Shell
  • 3.18 Importing Spark Classes
  • 3.19 Creating the SparkContext
  • 3.20 Loading a File in Shell
  • 3.21 Performing Some Basic Operations on Files in Spark Shell RDDs
  • 3.22 Packaging a Spark Project with SBT
  • 3.23 Running a Spark Project With SBT
  • 3.24 Demo-Build a Scala Project
  • 3.25 Build a Scala Project
  • 3.26 Demo-Build a Spark Java Project
  • 3.27 Build a Spark Java Project
  • 3.28 Shared Variables—Broadcast
  • 3.29 Shared Variables—Accumulators
  • 3.30 Writing a Scala Application
  • 3.31 Demo-Run a Scala Application
  • 3.32 Run a Scala Application
  • 3.33 Demo-Write a Scala Application Reading the Hadoop Data
  • 3.34 Write a Scala Application Reading the Hadoop Data
  • 3.35 Demo-Run a Scala Application Reading the Hadoop Data
  • 3.36 Run a Scala Application Reading the Hadoop Data
  • 3.37 Scala RDD Extensions
  • 3.38 DoubleRDD Methods
  • 3.39 PairRDD Methods—Join
  • 3.40 PairRDD Methods—Others
  • 3.41 Java PairRDD Methods
  • 3.42 Java PairRDD Methods (contd.)
  • 3.43 General RDD Methods
  • 3.44 General RDD Methods (contd.)
  • 3.45 Java RDD Methods
  • 3.46 Java RDD Methods (contd.)
  • 3.47 Common Java RDD Methods
  • 3.48 Spark Java Function Classes
  • 3.49 Method for Combining JavaPairRDD Functions
  • 3.50 Transformations in RDD
  • 3.51 Other Methods
  • 3.52 Actions in RDD
  • 3.53 Key-Value Pair RDD in Scala
  • 3.54 Key-Value Pair RDD in Java
  • 3.55 Using MapReduce and Pair RDD Operations
  • 3.56 Reading Text File from HDFS
  • 3.57 Reading Sequence File from HDFS
  • 3.58 Writing Text Data to HDFS
  • 3.59 Writing Sequence File to HDFS
  • 3.60 Using GroupBy
  • 3.61 Using GroupBy (contd.)
  • 3.62 Demo-Run a Scala Application Performing GroupBy Operation
  • 3.63 Run a Scala Application Performing GroupBy Operation
  • 3.64 Demo-Run a Scala Application Using the Scala Shell
  • 3.65 Run a Scala Application Using the Scala Shell
  • 3.66 Demo-Write and Run a Java Application
  • 3.67 Write and Run a Java Application
  • 3.68 Quiz
  • 3.69 Summary
  • 3.70 Summary (contd.)
  • 3.71 Conclusion
  • 4.1 Introduction
  • 4.2 Objectives
  • 4.3 Importance of Spark SQL
  • 4.4 Benefits of Spark SQL
  • 4.5 DataFrames
  • 4.6 SQLContext
  • 4.7 SQLContext (contd.)
  • 4.8 Creating a DataFrame
  • 4.9 Using DataFrame Operations
  • 4.10 Using DataFrame Operations (contd.)
  • 4.11 Demo-Run SparkSQL with a Dataframe
  • 4.12 Run SparkSQL with a Dataframe
  • 4.13 Interoperating with RDDs
  • 4.14 Using the Reflection-Based Approach
  • 4.15 Using the Reflection-Based Approach (contd.)
  • 4.16 Using the Programmatic Approach
  • 4.17 Using the Programmatic Approach (contd.)
  • 4.18 Demo-Run Spark SQL Programmatically
  • 4.19 Run Spark SQL Programmatically
  • 4.20 Data Sources
  • 4.21 Save Modes
  • 4.22 Saving to Persistent Tables
  • 4.23 Parquet Files
  • 4.24 Partition Discovery
  • 4.25 Schema Merging
  • 4.26 JSON Data
  • 4.27 Hive Table
  • 4.28 DML Operation-Hive Queries
  • 4.29 Demo-Run Hive Queries Using Spark SQL
  • 4.30 Run Hive Queries Using Spark SQL
  • 4.31 JDBC to Other Databases
  • 4.32 Supported Hive Features
  • 4.33 Supported Hive Features (contd.)
  • 4.34 Supported Hive Data Types
  • 4.35 Case Classes
  • 4.36 Case Classes (contd.)
  • 4.37 Quiz
  • 4.38 Summary
  • 4.39 Summary (contd.)
  • 4.40 Conclusion
  • 5.1 Introduction
  • 5.2 Objectives
  • 5.3 Introduction to Spark Streaming
  • 5.4 Working of Spark Streaming
  • 5.5 Features of Spark Streaming
  • 5.6 Streaming Word Count
  • 5.7 Micro Batch
  • 5.8 DStreams
  • 5.9 DStreams (contd.)
  • 5.10 Input DStreams and Receivers
  • 5.11 Input DStreams and Receivers (contd.)
  • 5.12 Basic Sources
  • 5.13 Advanced Sources
  • 5.14 Advanced Sources-Twitter
  • 5.15 Transformations on DStreams
  • 5.16 Transformations on Dstreams (contd.)
  • 5.17 Output Operations on DStreams
  • 5.18 Design Patterns for Using ForeachRDD
  • 5.19 DataFrame and SQL Operations
  • 5.20 DataFrame and SQL Operations (contd.)
  • 5.21 Checkpointing
  • 5.22 Enabling Checkpointing
  • 5.23 Socket Stream
  • 5.24 File Stream
  • 5.25 Stateful Operations
  • 5.26 Window Operations
  • 5.27 Types of Window Operations
  • 5.28 Types of Window Operations (contd.)
  • 5.29 Join Operations-Stream-Dataset Joins
  • 5.30 Join Operations-Stream-Stream Joins
  • 5.31 Monitoring Spark Streaming Application
  • 5.32 Performance Tuning-High Level
  • 5.33 Performance Tuning-Detail Level
  • 5.34 Demo-Capture and Process the Netcat Data
  • 5.35 Capture and Process the Netcat Data
  • 5.36 Demo-Capture and Process the Flume Data
  • 5.37 Capture and Process the Flume Data
  • 5.38 Demo-Capture the Twitter Data
  • 5.39 Capture the Twitter Data
  • 5.40 Quiz
  • 5.41 Summary
  • 5.42 Summary (contd.)
  • 5.43 Conclusion
  • 6.1 Introduction
  • 6.2 Objectives
  • 6.3 Introduction to Machine Learning
  • 6.4 Common Terminologies in Machine Learning
  • 6.5 Applications of Machine Learning
  • 6.6 Machine Learning in Spark
  • 6.7 Spark ML API
  • 6.8 DataFrames
  • 6.9 Transformers and Estimators
  • 6.10 Pipeline
  • 6.11 Working of a Pipeline
  • 6.12 Working of a Pipeline (contd.)
  • 6.13 DAG Pipelines
  • 6.14 Runtime Checking
  • 6.15 Parameter Passing
  • 6.16 General Machine Learning Pipeline-Example
  • 6.17 General Machine Learning Pipeline-Example (contd.)
  • 6.18 Model Selection via Cross-Validation
  • 6.19 Supported Types, Algorithms, and Utilities
  • 6.20 Data Types
  • 6.21 Feature Extraction and Basic Statistics
  • 6.22 Clustering
  • 6.23 K-Means
  • 6.24 K-Means (contd.)
  • 6.25 Demo-Perform Clustering Using K-Means
  • 6.26 Perform Clustering Using K-Means
  • 6.27 Gaussian Mixture
  • 6.28 Power Iteration Clustering (PIC)
  • 6.29 Latent Dirichlet Allocation (LDA)
  • 6.30 Latent Dirichlet Allocation (LDA) (contd.)
  • 6.31 Collaborative Filtering
  • 6.32 Classification
  • 6.33 Classification (contd.)
  • 6.34 Regression
  • 6.35 Example of Regression
  • 6.36 Demo-Perform Classification Using Linear Regression
  • 6.37 Perform Classification Using Linear Regression
  • 6.38 Demo-Run Linear Regression
  • 6.39 Run Linear Regression
  • 6.40 Demo-Perform Recommendation Using Collaborative Filtering
  • 6.41 Perform Recommendation Using Collaborative Filtering
  • 6.42 Demo-Run Recommendation System
  • 6.43 Run Recommendation System
  • 6.44 Quiz
  • 6.45 Summary
  • 6.46 Summary (contd.)
  • 6.47 Conclusion
  • 7.1 Introduction
  • 7.2 Objectives
  • 7.3 Introduction to Graph-Parallel System
  • 7.4 Limitations of Graph-Parallel System
  • 7.5 Introduction to GraphX
  • 7.6 Introduction to GraphX (contd.)
  • 7.7 Importing GraphX
  • 7.8 The Property Graph
  • 7.9 The Property Graph (contd.)
  • 7.10 Features of the Property Graph
  • 7.11 Creating a Graph
  • 7.12 Demo-Create a Graph Using GraphX
  • 7.13 Create a Graph Using GraphX
  • 7.14 Triplet View
  • 7.15 Graph Operators
  • 7.16 List of Operators
  • 7.17 List of Operators (contd.)
  • 7.18 Property Operators
  • 7.19 Structural Operators
  • 7.20 Subgraphs
  • 7.21 Join Operators
  • 7.22 Demo-Perform Graph Operations Using GraphX
  • 7.23 Perform Graph Operations Using GraphX
  • 7.24 Demo-Perform Subgraph Operations
  • 7.25 Perform Subgraph Operations
  • 7.26 Neighborhood Aggregation
  • 7.27 mapReduceTriplets
  • 7.28 Demo-Perform MapReduce Operations
  • 7.29 Perform MapReduce Operations
  • 7.30 Counting Degree of Vertex
  • 7.31 Collecting Neighbors
  • 7.32 Caching and Uncaching
  • 7.33 Graph Builders
  • 7.34 Vertex and Edge RDDs
  • 7.35 Graph System Optimizations
  • 7.36 Built-in Algorithms
  • 7.37 Quiz
  • 7.38 Summary
  • 7.39 Summary (contd.)
  • 7.40 Conclusion