Module 1: Introduction to MapReduce and YARN
Question 1: Which phase of MapReduce is optional?
Question 2: Which node is responsible for assigning (key, value) pairs to different reducers?
- Shuffle node
- Reducer node
- Combiner node
- Mapper node
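As background for this question: the assignment of keys to reducers is normally done by a partitioning function that hashes each key modulo the number of reducers. The sketch below is a rough illustration in plain Python, not Hadoop code. The names `partition` and `assign_to_reducers` are hypothetical, and `zlib.crc32` is an assumption standing in for Java's `hashCode()`.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key (hash modulo reducer count).

    crc32 is only a deterministic stand-in for Java's hashCode().
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def assign_to_reducers(pairs, num_reducers):
    """Group (key, value) pairs into one bucket per reducer."""
    buckets = {r: [] for r in range(num_reducers)}
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


if __name__ == "__main__":
    pairs = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]
    for reducer, bucket in assign_to_reducers(pairs, 3).items():
        print(reducer, bucket)
```

Whatever hash is used, every pair with the same key lands in the same bucket, which is what lets a single reducer see all values for that key.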
Question 3: Where are the output files of the Reducer task stored?
- A data warehouse
- Hadoop FS
- Within the Reducer node
- Linux FS
Module 2: Limitations of Hadoop v1 & MapReduce v1
Question 1: What is an issue or limitation of the original MapReduce v1 paradigm?
- It’s not scalable
- It only has one TaskTracker
- It only supports Parquet file types
- It only has one JobTracker
Question 2: How is YARN an improvement over the MapReduce v1 paradigm?
- It’s completely open source
- It splits the JobTracker into two processes: ResourceManager and ApplicationManager
- It reduces multi-tenancy to improve performance
- It splits the TaskTracker into two processes: ResourceManager and ApplicationManager
Question 3: Existing applications can run on YARN without recompilation. True or False?
Module 3: The Architecture of YARN
Question 1: The main change from Hadoop v1 to Hadoop v2 was the consolidation of both resource management and job processing. True or False?
Question 2: The NodeManager is a more generic and efficient version of the TaskTracker. True or False?
Question 3: A new ApplicationMaster is launched for each job and ends when the job completes. True or False?
Final Exam:
Question 1: Which of the following is the correct sequence of MapReduce flow?
- Reduce -> Combine -> Map
- Combine -> Reduce -> Map
- Map -> Reduce -> Combine
- Map -> Combine -> Reduce
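For reference, the data flow of a word count job can be imitated in a few lines of plain Python. This is only a single-process sketch under simplifying assumptions (no cluster, no HDFS, one combiner pass per split), and all function names are hypothetical rather than part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in line.split()]


def combine_phase(pairs):
    """Pre-aggregate the pairs produced by a single mapper."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle_phase(mapper_outputs):
    """Group values by key across all mappers, sorted by key."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    combined = [combine_phase(map_phase(split)) for split in splits]
    return reduce_phase(shuffle_phase(combined))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Dropping the `combine_phase` call changes only how much data reaches the shuffle step, not the final counts.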
Question 2: Which of the following can be used to control the number of part files in a MapReduce program’s output directory?
- Shuffle parameters
- Number of Reducers
- Number of Mappers
Question 3: Which of the following operations will work improperly when using a Combiner?
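As background: a Combiner pre-aggregates partial map output, so it is only safe for operations whose result does not depend on how the input is grouped, such as a sum or a maximum. A mean is a common counterexample. The sketch below uses made-up numbers and hypothetical function names to show the difference between averaging per-split averages and carrying (sum, count) pairs instead.

```python
def mean(values):
    return sum(values) / len(values)


def naive_mean_with_combiner(splits):
    """Incorrect: the combiner emits a mean per split, the reducer averages those."""
    partial_means = [mean(split) for split in splits]
    return mean(partial_means)


def mean_with_sum_count_combiner(splits):
    """Correct: the combiner emits (sum, count); the reducer divides once at the end."""
    partials = [(sum(split), len(split)) for split in splits]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    splits = [[1, 2, 3], [10]]  # illustrative data only
    print("true mean:       ", mean([x for s in splits for x in s]))
    print("naive combiner:  ", naive_mean_with_combiner(splits))
    print("sum/count pairs: ", mean_with_sum_count_combiner(splits))
```

The naive version is wrong whenever the splits have different sizes, because each split's mean is weighted equally regardless of how many records it covers.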
Question 4: Which of the following is true about MapReduce?
- Compression of input files is optional.
- Output from the Map phase is replicated.
- The programmer must write the Map code, the Shuffle code, and the Reduce code.
- MapReduce programs must be written in Java.
Question 5: Input data to MapReduce is record-oriented and blocks of data contain the same number of full records. True or False?
Question 6: Which statement is true about the Reduce phase of MapReduce?
- Output results are sent to the client program.
- Data arrives from the Shuffle phase already sorted by key.
- The Reducer phase sums up the values associated with each key.
- Each Reduce task processes all the data for one key only.
Question 7: Which statement is true about the Reduce phase of MapReduce?
- Containers are used instead of slots in MRv1, and can be used with either Map or Reduce tasks in MRv2.
- There is one JobTracker in the cluster.
- MapReduce jobs written in Java for MRv1 never require recompilation.
- Each job has an ApplicationManager that obtains Container IDs from the NodeManager.
Question 8: With YARN, long-running jobs acquire and retain fixed-size containers before execution starts. True or False?
Question 9: Which of the following statements is true?
- The NameNode in Hadoop 2 is fully fault-tolerant, whereas in Hadoop 1 it was a single point of failure.
- The NodeManager in Hadoop 2 replaces the TaskTracker in Hadoop 1.
- YARN requires a minimum of two nodes, one master and one slave, to run.
- Both MapReduce and YARN can scale to any cluster size.
Question 10: The command provides the CLASSPATH needed for compiling Java programs written for MapReduce or YARN. True or False?
Question 11: Which statement is true about MapReduce’s use of replication in HDFS?
- Only one copy of each replicated block is processed by MapReduce in normal operation.
- Speculative execution is normally performed on all copies of each “split.”
- Each DataNode uses RAID to store its data.
- Multiple copies of each record are kept on each node.
Question 12: On which file system (FS) is the output of a Mapper task stored?
- Linux FS, and it is replicated 3 times.
- HDFS, and it is replicated 3 times.
- Linux FS, but it is not replicated.
- HDFS, but it is not replicated.
Question 13: Which of the following statements is true?
- You can set the number of Reducers.
- The Shuffle phase is optional.
- You can set the number of Mappers and the number of Reducers.
- The number of Combiners is the same as the number of Reducers.
- You can set the number of Mappers.
Question 14: What will a Hadoop job do if you try to run it with an output directory that is already present?
- It will create new files, but with a different suffix.
- It will create another directory to store the output.
- It will erase all files in that directory before running.
- It will not run.
Question 15: What are the main components of the ResourceManager in YARN? Select two.