Spark Fundamentals II Cognitive Class Exam Quiz Answers

Clear My Certification, January 12, 2024, Cognitive Class

Spark Fundamentals II Cognitive Class Certification Answers

Module 1: Introduction to Notebooks Quiz Answers – Cognitive Class

Question 1: Which of the following statements about Zeppelin Notebook is NOT true?

Zeppelin is open-source.
With Zeppelin, you can run code and create visualizations through a web interface.
Zeppelin comes configured with Scala, Spark, and Julia.
Zeppelin is an interactive data analytics tool started by NFLabs.

Question 2: Jupyter Notebook and Data Scientist Workbench are both open-source projects. True or false?

False
True

Question 3: Which notebook will you use in the lab section of this course?

Databricks
Zeppelin
Watson Studio
Jupyter Notebook

Module 2: RDD Architecture Quiz Answers – Cognitive Class

Question 1: Which of the following statements is NOT true?

Partitioning is what enables parallel execution of Spark jobs.
An RDD is made up of multiple partitions.
Spark normally determines the number of partitions based on the size of the hard drives in your cluster.
Spark is able to read from many different data stores in addition to HDFS, including the local file system and cloud services like Cloudant, AWS, Google, and Azure.

Question 2: In the example of an RDD with 3 partitions and no partitioner, which of the following statements is true?

It is better not to partition an RDD if you need to join it multiple times.
Joining RDDs with no partitioner will cause each executor to shuffle all values with the same key to a single machine.
Repeatedly joining on the same RDD is highly efficient.
The keys are co-located.

Question 3: Speculative execution handles slow tasks by re-launching them as necessary. True or false?

False
True

Module 3: Optimizing Transformations and Actions Quiz Answers – Cognitive Class

Question 1: Which of the following statements is true?

MapValues applies a map function to each value and performs repartitioning.
GroupByKey groups all values by key from all partitions into memory.
GroupByKey shuffles everything and it operates efficiently on large datasets.
CountByKey is designed to be used in production.

Question 2: AggregateByKey is better than GroupByKey when we want to calculate the average value for each key in an RDD. True or false?

False
True
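
For illustration, a per-key average with aggregateByKey might look like the sketch below. It assumes Spark is on the classpath and runs in local mode, written as a script for spark-shell or a Scala worksheet; the sample data and the names `scores`, `sumCount`, and `averages` are made up for the example. Each partition folds its values into a single (sum, count) pair per key, and only those pairs are merged across partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses an existing context (e.g. in spark-shell) or starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("avg-by-key-sketch"))

// Hypothetical sample data: (key, value) pairs.
val scores = sc.parallelize(
  Seq(("a", 1.0), ("a", 5.0), ("b", 6.0), ("b", 3.0), ("c", 2.0)))

// The accumulator for each key is (running sum, running count).
val sumCount = scores.aggregateByKey((0.0, 0L))(
  (acc, v) => (acc._1 + v, acc._2 + 1L),   // fold one value into the partition-local accumulator
  (x, y) => (x._1 + y._1, x._2 + y._2))    // merge accumulators coming from different partitions

// Divide sum by count; mapValues leaves the keys untouched.
val averages = sumCount.mapValues { case (sum, count) => sum / count }.collectAsMap()
```

A groupByKey version would instead move every individual value across the network before averaging.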

Question 3: Which of the following statements is NOT true?

MapValues tells Spark that the hashed keys will remain in their partitions and we can keep the same partitioner across operations.
In the example of a pair RDD with 2 partitions, running a map operation over all records will leave the keys of each record unchanged.
AggregateByKey splits the calculation into two steps. Only one pair per key, per partition is shuffled.
GroupByKey causes a shuffle of all values across the network, even if they are already co-located within a partition.

Module 4: Caching and Serialization Quiz Answers – Cognitive Class

Question 1: Which of the following statements is true?

When you no longer need the persisted RDD, Spark will automatically make room for new RDDs.
Persisting to disk would allow us to reconstitute the RDD in the event a partition is lost, instead of re-computing all the expensive operations for the lost partitions.
Ideally we want to persist before any pruning, filtering, or other transformations needed for downstream processing.
Persisting RDDs can help us save time re-computing partitions, and persistence is in-memory only.

Question 2: Which of the following statements is NOT true?

Serialization has the added benefit of helping with garbage collection, as you’ll be storing 1 object versus many small objects for each record.
The records of an RDD will be stored as one large byte array.
There is almost no CPU usage to deserialize the data.
Serialization helps by saving space that persisting RDDs occupy in memory.

Question 3: The Java serializer can store the entire RDD in less space than the original file. True or false?

False
True

Module 5: Develop and Testing Quiz Answers – Cognitive Class

Question 1: Which of the following statements is true?

We cannot use sbt for an Eclipse project.
We cannot create builds directly from the console using sbt.
sbt automatically finds source and library files using a conventional directory structure.
Maven is more powerful and customizable than sbt.

Question 2: IntelliJ fully supports sbt build files with no conversions required. True or false?

False
True

Question 3: Which of the following statements is NOT correct during unit testing?

The spark-testing-base package is handy for testing.
We want to test the code that is actually used in our application.
We should not use unit testing tools like scalatest.
We should put transformations for a given RDD in its own object or class.

Spark Fundamentals II Final Exam Answers – Cognitive Class

Question 1: Which of the following web-based notebooks is built around Jupyter and IPython?

Data Scientist Workbench
Spark Notebook
Databricks Cloud
Zeppelin

Question 2: What defines a stage boundary?

Repartition
Action
Transformation
Shuffle dependency

Question 3: What does RDD stand for?

Resilient Distributed Dataset
Reusable Distributed Dataset
Reusable Data Directory
None of the above

Question 4: Coalesce can reduce the number of partitions without causing a shuffle. True or false?

True
False
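
As a rough sketch of the difference between coalesce and repartition, the snippet below shrinks an 8-partition RDD to 2 partitions both ways and prints the resulting lineage. It assumes a local Spark setup; `wide`, `narrowed`, and `reshuffled` are illustrative names. In the Scala API, `coalesce` takes a `shuffle` flag that defaults to false, and `repartition(n)` is `coalesce(n, shuffle = true)`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("coalesce-sketch"))

// Hypothetical input: 100 integers spread over 8 partitions.
val wide = sc.parallelize(1 to 100, 8)

val narrowed = wide.coalesce(2)        // shuffle = false: existing partitions are merged
val reshuffled = wide.repartition(2)   // shuffle = true: records are redistributed

println(narrowed.partitions.length)
println(narrowed.toDebugString)        // lineage of the coalesced RDD
println(reshuffled.toDebugString)      // lineage of the repartitioned RDD
```

Comparing the two printed lineages shows whether a shuffle step was introduced.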

Question 5: Which operation should you use to map the values in a pair RDD without affecting the keys or partitions?

map
mapValues
map or mapValues
You cannot map a pair RDD without affecting the keys or partitions.
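
One way to see how map and mapValues treat partitioning is to inspect the `partitioner` field of the resulting RDDs. The sketch below assumes a local Spark setup; the data and the names `pairs`, `viaMapValues`, and `viaMap` are invented for the example.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("mapvalues-sketch"))

// Hypothetical pair RDD, explicitly hash-partitioned into 2 partitions.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
  .partitionBy(new HashPartitioner(2))

// mapValues can only change the values, so Spark keeps the partitioner.
val viaMapValues = pairs.mapValues(_ * 10)

// map may change the keys, so Spark discards the partitioner
// even though this particular function leaves the keys alone.
val viaMap = pairs.map { case (k, v) => (k, v * 10) }

println(viaMapValues.partitioner)   // the HashPartitioner is still attached
println(viaMap.partitioner)         // no partitioner
```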

Question 6: How can you view the lineage of an RDD?

showLineage()
toDebugString()
printHistory()
printGraph()
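
A small sketch of inspecting lineage, assuming a local Spark setup and made-up input lines. Note that in the Scala API `toDebugString` is declared without a parameter list, so it is called without parentheses; PySpark spells it `toDebugString()`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("lineage-sketch"))

// Hypothetical two-line input, turned into per-word counts.
val counts = sc.parallelize(Seq("a b", "b c"))
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)

// A textual description of this RDD and the RDDs it depends on.
val lineage = counts.toDebugString
println(lineage)
```

The indentation in the printed text marks where the reduceByKey shuffle separates the stages.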

Question 7: Adding a key to an RDD will automatically repartition it so that the keys are co-located. True or false?

True
False

Question 8: How can you reference an external class in a closure without serializing it?

define it as transient
define it as lazy
Both of the above
None of the above

Question 9: What does Spark do during speculative execution?

Spark looks for tasks it expects to be short and runs them first
Spark dynamically allocates more resources to large tasks
Spark identifies slow-running tasks and restarts them
None of the above

Question 10: What does the following code do?

val text = sc.textFile("SomeText.txt")
val counts = text.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collectAsMap()

Counts the total number of words in the document
Counts the number of distinct words in the document
Maps every word in the document to the number of times it occurs
None of the above
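
To experiment with the same pipeline without a file on disk, the sketch below swaps `sc.textFile` for an in-memory line. This is an assumption made for the example (the quiz code reads `SomeText.txt`), and it presumes a local Spark setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("wordcount-sketch"))

// Hypothetical stand-in for the contents of SomeText.txt.
val text = sc.parallelize(Seq("to be or not to be"))

val counts = text
  .flatMap(_.split(" "))   // one record per word
  .map((_, 1))             // pair each word with a count of 1
  .reduceByKey(_ + _)      // add up the 1s for each distinct word
  .collectAsMap()          // bring the result to the driver as a Map

counts.foreach { case (word, n) => println(s"$word -> $n") }
```

The result has one entry per distinct word, and the values sum to the total number of words.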

Question 11: Which operation has the highest chance of causing out-of-memory errors if the dataset is really large?

countByValue
groupByKey
reduceByKey
map

Question 12: What is the result of this code?

val pairs = sc.parallelize(List(("a", 1), ("a", 5), ("b", 6), ("b", 3), ("c", 2)))
val results = pairs.reduceByKey((a, b) => {
  a > b match {
    case true => a
    case false => b
  }
}).collectAsMap()

("a" -> 5, "b" -> 6, "c" -> 2)
("a" -> 6, "b" -> 9, "c" -> 2)
(5, 6, 2)
None of the above.

Question 13: You can execute asynchronous actions with the default FIFO scheduler. True or false?

True
False

Question 14: Which of the following statements about broadcast variables is true?

They are read-only
They can eliminate shuffles
They are shared between workers via the peer-to-peer protocol
All of the above
None of the above
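
As an illustration of how a broadcast variable can stand in for a join, the sketch below ships a small lookup table to the executors and resolves keys inside a plain map, so the larger RDD is never shuffled. It assumes a local Spark setup; the lookup table, the event data, and all names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("broadcast-sketch"))

// Small, read-only lookup table sent once to each executor.
val lookup = sc.broadcast(Map("a" -> "alpha", "b" -> "beta"))

// Hypothetical larger dataset keyed by the same codes.
val events = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// Map-side lookup: a narrow transformation, no join and no shuffle.
val resolved = events
  .map { case (code, n) => (lookup.value.getOrElse(code, "unknown"), n) }
  .collect()

resolved.foreach(println)
```

This only works when the lookup side is small enough to fit comfortably in each executor's memory.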

Question 15: With the MEMORY_ONLY storage level, what happens when an RDD can’t fit in memory?

Spark will automatically change the storage level to MEMORY_AND_DISK
Some of the partitions will not be cached
Some of the partitions will be spilled to disk
Spark will throw an OOM error
None of the above

Question 16: How can you reduce the amount of memory used by persisted RDDs?

Use primitive types instead of Java or Scala collections and nested classes
Enable compression
Use Kryo serialization instead of Java
All of the above
None of the above
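
A configuration sketch that combines these options, assuming a local Spark setup. `spark.serializer` and `spark.rdd.compress` are standard Spark settings; the data and the name `cached` are made up. If a SparkContext already exists (as in spark-shell), `getOrCreate` returns it and the settings below are not applied, so they would need to be supplied at launch instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("kryo-sketch")
  // Use Kryo instead of the default Java serializer.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Compress serialized RDD partitions (costs extra CPU).
  .set("spark.rdd.compress", "true")

val sc = SparkContext.getOrCreate(conf)

// Persist in serialized form: each partition is stored as a byte array.
val cached = sc.parallelize(1 to 1000)
  .map(i => (i, i.toString))
  .persist(StorageLevel.MEMORY_ONLY_SER)

val total = cached.count()   // the first action materializes the cache
```

The Storage tab of the Spark UI shows the in-memory size of `cached`, which makes it possible to compare settings.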

Question 17: Which point in an RDD lineage is the best to persist?

Before a reduceByKey operation
After outputting to disk
After a lot of transformations for downstream computations, such as filtering or joining
At the root RDD
None of the above

Question 18: A pool can have its own scheduler. True or false?

True
False

Question 19: In the event of a failure, how can Spark recover a lost partition?

Find the last good state in the RDD lineage and recompute the lost partition.
Restart from the root RDD
Find the last good state in the RDD lineage and recompute every task.
Spark’s fail-safes ensure that failures will never occur.
None of the above.

Question 20: Which of the following IDEs fully supports sbt?

Eclipse
IntelliJ
Both Eclipse and IntelliJ
None of the above

Introduction to Spark Fundamentals II

“Spark Fundamentals II” builds upon the foundational knowledge introduced in the first part, delving deeper into Apache Spark’s capabilities and features. Here’s an overview of what you might expect to learn in this continuation:

Advanced RDD Operations: Building upon the basic RDD operations like transformations and actions, this module covers more sophisticated operations such as mapPartitions, flatMap, reduceByKey, groupByKey, and sortByKey. Understanding these operations is crucial for optimizing data processing workflows in Spark.
Pair RDDs and Key-Value Operations: In many real-world scenarios, data is organized as key-value pairs. Spark provides specialized operations for working with such data structures, allowing efficient aggregation, grouping, and manipulation. Students will learn about operations like reduceByKey, groupByKey, mapValues, flatMapValues, and more.
Broadcast Variables and Accumulators: These are advanced features that facilitate efficient and distributed computation in Spark. Broadcast variables allow efficient data sharing across all nodes in the Spark cluster, while accumulators enable aggregating values from worker nodes back to the driver program.
DataFrame API: DataFrames offer a higher-level abstraction than RDDs, providing a more intuitive and optimized interface for working with structured data. Students will learn how to create DataFrames from various data sources, perform transformations, execute SQL queries, and leverage built-in optimizations for better performance.
Spark SQL: Spark SQL allows seamless integration of SQL queries with Spark programs. This module covers how to execute SQL queries directly on DataFrames and RDDs, enabling users to leverage their SQL skills for data analysis and processing in Spark.
Performance Tuning and Optimization: As data volumes grow, optimizing Spark jobs becomes crucial for maintaining performance. This module covers techniques for optimizing Spark applications, including partitioning, caching, and tuning execution settings.
Introduction to Spark Streaming: Spark Streaming enables real-time processing of streaming data using the same programming model as batch processing. Students will learn how to create streaming applications, process data in micro-batches, and integrate with various streaming sources like Kafka, Flume, and more.
Integration with External Systems: Spark can seamlessly integrate with various external systems for data ingestion, storage, and processing. This module covers how to interact with databases, cloud storage services, message queues, and other external systems using Spark.
Machine Learning with Spark MLlib: Spark MLlib provides scalable machine learning algorithms for data analysis and modeling. Students will learn how to perform common machine learning tasks such as classification, regression, clustering, and collaborative filtering using Spark MLlib.
Graph Processing with GraphX: GraphX is Spark’s API for graph processing and analytics. This module covers graph construction, manipulation, and analysis using GraphX, enabling students to analyze large-scale graph data efficiently.

Priya Dogra – Certification | Jobs | Internships
