
Spark Fundamentals I Cognitive Class Answers

Enroll Here: Spark Fundamentals I IBM Certification

Module 1: Introduction to Spark

Question 1: What gives Spark its speed advantage for complex applications?

  • Spark extends the MapReduce model
  • Various libraries provide Spark with additional functionality
  • Spark can cover a wide range of workloads under one system
  • Spark makes extensive use of in-memory computations
  • All of the above

Question 2: For what purpose would an Engineer use Spark? Select all that apply.

  • Analyzing data to obtain insights
  • Programming with Spark’s API
  • Transforming data into a usable form for analysis
  • Developing a data processing system
  • Tuning an application for a business use case

Question 3: Which of the following statements are true of the Resilient Distributed Dataset (RDD)? Select all that apply.

  • There are three types of RDD operations
  • RDDs allow Spark to reconstruct transformations
  • RDDs only add a small amount of code due to tight integration
  • RDD action operations do not return a value
  • RDD is a distributed collection of elements parallelized across the cluster
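
For context, here is a minimal Scala sketch (assuming a SparkContext named sc, as in the Spark shell) showing an RDD as a parallelized collection, a lazy transformation, and an action that returns a value to the driver:

    val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))   // distributed collection across the cluster
    val doubled = numbers.map(_ * 2)                    // transformation: lazy, builds a new RDD
    val total   = doubled.reduce(_ + _)                 // action: returns a value to the driver (30)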

Module 2: Resilient Distributed Dataset and DataFrames

Question 1: Which of the following methods can be used to create a Resilient Distributed Dataset (RDD)? Select all that apply.

  • Creating a directed acyclic graph (DAG)
  • Parallelizing an existing Spark collection
  • Referencing a Hadoop-supported dataset
  • Using data that resides in Spark
  • Transforming an existing RDD to form a new one
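
The usual creation paths look like this in Scala (sc is assumed to be an existing SparkContext; the HDFS path is a placeholder):

    val fromCollection = sc.parallelize(1 to 100)                // parallelize an existing collection
    val fromHadoopData = sc.textFile("hdfs:///data/input.txt")   // reference a Hadoop-supported dataset
    val fromExisting   = fromCollection.filter(_ % 2 == 0)       // transform an existing RDD into a new one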

Question 2: What happens when an action is executed?

  • Executors prepare the data for operation in parallel
  • The driver sends code to be executed on each block
  • A cache is created for storing partial results in memory
  • Data is partitioned into different blocks across the cluster
  • All of the above
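
Lazy evaluation in a nutshell, as a short sketch (sc is assumed and the HDFS path is a placeholder):

    val lines  = sc.textFile("hdfs:///data/input.txt")   // nothing is read yet
    val errors = lines.filter(_.contains("ERROR"))        // still lazy: only the lineage grows
    val count  = errors.count()                           // action: the driver ships code to the executors,
                                                          // which process their partitions in parallel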

Question 3: Which of the following statements are true of RDD persistence? Select all that apply.

  • Persistence through caching provides fault tolerance
  • Future actions can be performed significantly faster
  • Each partition is replicated on two cluster nodes
  • RDD persistence always improves space efficiency
  • By default, objects that are too big for memory are stored on the disk
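
A short persistence sketch (again assuming sc; the path is a placeholder):

    import org.apache.spark.storage.StorageLevel

    val errors = sc.textFile("hdfs:///data/logs").filter(_.contains("ERROR"))
    errors.persist(StorageLevel.MEMORY_ONLY)   // keep partitions in memory after the first computation
    errors.count()                             // first action materializes and caches the RDD
    errors.take(10)                            // later actions reuse the cached partitions and run faster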

Module 3: Spark application programming

Question 1: What is SparkContext?

  • An object that represents the connection to a Spark cluster
  • A tool for linking to nodes
  • A tool that provides fault tolerance
  • The built-in shell for the Spark engine
  • A programming language for applications
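
A minimal sketch of creating the SparkContext explicitly in an application (the app name and master URL here are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("ExampleApp").setMaster("local[2]")
    val sc   = new SparkContext(conf)   // the connection to the Spark cluster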

Question 2: Which of the following methods can be used to pass functions to Spark? Select all that apply.

  • Transformations and actions
  • Passing by reference
  • Static methods in a global singleton
  • Import statements
  • Anonymous function syntax
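
Both recommended ways of passing functions, sketched in Scala (assuming an existing sc; TextUtils is a made-up helper object):

    object TextUtils {
      def lineLength(line: String): Int = line.length
    }

    val lines    = sc.parallelize(Seq("spark", "fundamentals"))
    val lengths1 = lines.map(line => line.length)     // anonymous function syntax
    val lengths2 = lines.map(TextUtils.lineLength)    // static method in a global singleton object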

Question 3: Which of the following is a main component of a Spark application’s source code?

  • SparkContext object
  • Transformations and actions
  • Business Logic
  • Import statements
  • All of the above

Module 4: Introduction to the Spark libraries

Question 1: Which of the following is NOT an example of a Spark library?

  • Hive
  • MLlib
  • Spark Streaming
  • Spark SQL
  • GraphX

Question 2: From which of the following sources can Spark Streaming receive data? Select all that apply.

  • Kafka
  • JSON
  • Parquet
  • HDFS
  • Hive
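
A minimal Spark Streaming sketch (assuming an existing sc; the host and port are placeholders). Note that nothing is processed until start() is called, which is also the point behind Question 3 below:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc   = new StreamingContext(sc, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)   // one built-in source; Kafka, HDFS, etc. also work
    lines.count().print()
    ssc.start()              // processing begins only after start()
    ssc.awaitTermination()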

Question 3: In Spark Streaming, processing begins immediately when an element of the application is executed. True or false?

  • True
  • False

Module 5: Spark configuration, monitoring and tuning

Question 1: Which of the following is a main component of a Spark cluster? Select all that apply.

  • Driver Program
  • SparkContext
  • Cluster Manager
  • Worker node
  • Cache

Question 2: What are the main locations for Spark configuration? Select all that apply.

  • The SparkConf object
  • The Spark Shell
  • Executor Processes
  • Environment variables
  • Logging properties
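
Programmatic configuration goes through a SparkConf object, as in this sketch (the property values are examples only); environment variables such as those in conf/spark-env.sh cover per-machine settings, and log4j.properties controls logging:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("ConfiguredApp")
      .set("spark.executor.memory", "2g")   // example value, not a recommendation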

Question 3: Which of the following techniques can improve Spark performance? Select all that apply.

  • Scheduler Configuration
  • Memory Tuning
  • Data Serialization
  • Using Broadcast variables
  • Using nested structures

Spark Fundamentals I Cognitive Class Final Exam Answers

Question 1: Which of the following is a type of Spark RDD operation? Select all that apply.

  • Parallelization
  • Action
  • Persistence
  • Transformation
  • Evaluation

Question 2: Spark must be installed and run on top of a Hadoop cluster. True or false?

  • True
  • False

Question 3: Which of the following operations will work improperly when using a Combiner?

  • Count
  • Maximum
  • Minimum
  • Average
  • All of the above operations will work properly

Question 4: Spark supports which of the following libraries?

  • GraphX
  • Spark Streaming
  • MLlib
  • Spark SQL
  • All of the above

Question 5: Spark supports which of the following programming languages?

  • C++ and Python
  • Scala, Java, C++, Python, Perl
  • Scala, Perl, Java
  • Scala, Python, Java, R
  • Java and Scala

Question 6: A transformation is evaluated immediately. True or false?

  • True
  • False

Question 7: Which storage level does the cache() function use?

  • MEMORY_AND_DISK_SER
  • MEMORY_AND_DISK
  • MEMORY_ONLY_SER
  • MEMORY_ONLY
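
For reference (assuming an existing sc), cache() is simply shorthand for the default storage level:

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.parallelize(1 to 1000)
    rdd.cache()                               // shorthand for persist(StorageLevel.MEMORY_ONLY)
    // rdd.persist(StorageLevel.MEMORY_ONLY)  // the equivalent explicit call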

Question 8: Which of the following statements does NOT describe accumulators?

  • They can only be read by the driver
  • Programmers can extend them beyond numeric types
  • They implement counters and sums
  • They can only be added through an associative operation
  • They are read-only
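
A small accumulator sketch using the Spark 2.x longAccumulator API (older course material uses sc.accumulator, which serves the same purpose here):

    val blankLines = sc.longAccumulator("blankLines")
    sc.parallelize(Seq("a", "", "b", "")).foreach { line =>
      if (line.isEmpty) blankLines.add(1)   // tasks can only add to the accumulator
    }
    println(blankLines.value)               // only the driver reads the value (2)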

Question 9: You must explicitly initialize the SparkContext when creating a Spark application. True or false?

  • True
  • False

Question 10: The “local” parameter can be used to specify the number of cores to use for the application. True or false?

  • True
  • False

Question 11: Spark applications can ONLY be packaged using one, specific build tool. True or false?

  • True
  • False

Question 12: Which of the following parameters of the “spark-submit” script determine where the application will run?

  • --class
  • --master
  • --deploy-mode
  • --conf
  • None of the above

Question 13: Which of the following is NOT supported as a cluster manager?

  • YARN
  • Helix
  • Mesos
  • Spark
  • All of the above are supported

Question 14: Spark SQL allows relational queries to be expressed in which of the following?

  • HiveQL only
  • Scala, SQL, and HiveQL
  • Scala and SQL
  • Scala and HiveQL
  • SQL only
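
A hedged sketch, assuming a SparkSession named spark (in the Spark 1.x releases this course targets, a SQLContext or HiveContext plays the same role); the file and view names are placeholders:

    val people = spark.read.json("people.json")
    people.createOrReplaceTempView("people")

    val adults  = spark.sql("SELECT name FROM people WHERE age >= 18")   // plain SQL
    val adults2 = people.filter(people("age") >= 18).select("name")      // the Scala DataFrame DSL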

Question 15: Spark Streaming processes live streaming data in real-time. True or false?

  • True
  • False

Question 16: The MLlib library contains which of the following algorithms?

  • Dimensionality Reduction
  • Regression
  • Classification
  • Clustering
  • All of the above

Question 17: What is the purpose of the GraphX library?

  • To create a visual representation of the data
  • To generate data-parallel models
  • To create a visual representation of a directed acyclic graph (DAG)
  • To perform graph-parallel computations
  • To convert from data-parallel to graph-parallel algorithms

Question 18: Which list describes the correct order of precedence for Spark configuration, from highest to lowest?

  • Properties set on SparkConf, values in spark-defaults.conf, flags passed to spark-submit
  • Flags passed to spark-submit, values in spark-defaults.conf, properties set on SparkConf
  • Values in spark-defaults.conf, properties set on SparkConf, flags passed to spark-submit
  • Values in spark-defaults.conf, flags passed to spark-submit, properties set on SparkConf
  • Properties set on SparkConf, flags passed to spark-submit, values in spark-defaults.conf
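
Illustrated as a sketch (property values are examples only):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("PrecedenceDemo")
      .set("spark.executor.memory", "4g")   // wins over --conf spark.executor.memory=2g on spark-submit,
                                            // which in turn wins over spark-defaults.conf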

Question 19: Spark monitoring can be performed with external tools. True or false?

  • True
  • False

Question 20: Which serialization libraries are supported in Spark? Select all that apply.

  • Apache Avro
  • Java Serialization
  • Protocol Buffers
  • Kryo Serialization
  • TPL
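
A sketch of switching from the default Java serialization to Kryo (the registered case class is a placeholder for your own types):

    import org.apache.spark.SparkConf

    case class Point(x: Double, y: Double)

    val conf = new SparkConf()
      .setAppName("KryoDemo")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[Point]))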
