Module 1: Introduction to Hadoop
Q1) Hadoop is designed for Online Transaction Processing (OLTP). True or False?
- True
- False
Q2) When is Hadoop useful for an application?
- When all of the application data is unstructured
- When work can be parallelized
- When the application requires low latency data access
- When random data access is required
Q3) With the help of InfoSphere Streams, Hadoop can be used with data-at-rest as well as data-in-motion. True or False?
- True
- False
Module 2: Hadoop Architecture & HDFS
Q1) Network bandwidth between any two nodes in the same rack is greater than bandwidth between two nodes on different racks. True or False?
- True
- False
Q2) Hadoop works best on a large data set. True or False?
- True
- False
Q3) HDFS is a fully POSIX compliant file system. True or False?
- True
- False
Module 3: Hadoop Administration
Q1) You can add or remove nodes from the open source Apache Ambari console. True or False?
- True
- False
Q2) It is recommended that you start all of the services in Ambari in order to speed up communications. True or False?
- True
- False
Q3) To remove a node using Ambari, you must first remove all of the services using that node. True or False?
- True
- False
Module 4: Hadoop Components
Q1) The output of the shuffle operation goes into the mapper before going into the reducer. True or False?
- True
- False
Q2) What is true about Pig and Hive in relation to the Hadoop ecosystem?
- HiveQL requires that you create the data flow
- PigLatin requires that the data have a schema
- Fewer lines of code are required compared to a Java program (correct)
- All of the above
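As a point of comparison for the marked answer, the classic word count is a few lines of HiveQL versus dozens of lines of Java MapReduce. A minimal sketch, assuming a hypothetical table docs with one STRING column named line:

```sql
-- Word count over a hypothetical table docs(line STRING).
SELECT word, COUNT(*) AS cnt
FROM (SELECT explode(split(line, ' ')) AS word FROM docs) words
GROUP BY word;
```

The equivalent hand-written Java job (mapper, reducer, and driver class) typically runs to 50+ lines.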
Q3) Which of the following tools is designed to move data to and from a relational database?
- Pig
- Flume
- Oozie
- Sqoop
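To ground the answer (Sqoop), moving a table in either direction between a relational database and HDFS is a single command. A sketch with hypothetical connection details, table names, and paths:

```sh
# Import a MySQL table into HDFS (all names here are hypothetical).
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /user/etl/customers

# The reverse direction: push HDFS files back into a relational table.
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table customers_summary \
  --export-dir /user/etl/summary
```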
Final Exam Answers: HADOOP Certification by IBM
1. HDFS is designed for:
- Large files, streaming data access, and commodity hardware
- Large files, low latency data access, and commodity hardware
- Large files, streaming data access, and high-end hardware
- Small files, streaming data access, and commodity hardware
- None of the options is correct
2. The Hadoop distributed file system (HDFS) is the only distributed file system supported by Hadoop. True or false?
- True
- False
3. The input to a mapper takes the form <k1, v1>. What form does the mapper’s output take?
- <list(k2), v2>
- list(<k2, v2>)
- <k2, list(v2)>
- <k1, v1>
- None of the options is correct
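To make the form list(<k2, v2>) concrete, here is a minimal word-count mapper sketch using the org.apache.hadoop.mapreduce API: the input pair <k1, v1> is a byte offset and a line of text, and each map() call may emit zero or more <k2, v2> pairs.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // k1 = byte offset into the input split, v1 = one line of text
    StringTokenizer tok = new StringTokenizer(value.toString());
    while (tok.hasMoreTokens()) {
      word.set(tok.nextToken());
      // Each write adds one <k2, v2> pair to the mapper's output list.
      context.write(word, ONE);
    }
  }
}
```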
4. What is Flume?
- A service for moving large amounts of data around a cluster soon after the data is produced.
- A distributed file system.
- A programming language that translates high-level queries into map tasks and reduce tasks.
- A platform for executing MapReduce jobs.
- None of the options is correct
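To ground the first option (the correct one), here is a minimal sketch of a Flume agent configuration, with hypothetical names throughout: an agent a1 tails an application log and lands the events in HDFS soon after they are produced.

```properties
# flume.conf (agent, source, channel, and sink names are hypothetical)
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: follow a log file as new lines are appended
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory

# Sink: write the events into HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.channel = c1
```

An agent like this is launched with the flume-ng agent command, pointing it at the properties file above.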
5. What is the purpose of the shuffle operation in Hadoop MapReduce?
- To pre-sort the data before it enters each mapper node.
- To distribute input splits among mapper nodes.
- To transfer each mapper’s output to the appropriate reducer node based on a partitioning function.
- To randomly distribute mapper output among reducer nodes.
- None of the options is correct
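The partitioning function the correct option refers to is, by default, a hash of the key modulo the number of reducers. A sketch mirroring the logic of Hadoop's HashPartitioner:

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Assigns each mapper output key to a reducer by hash, so every
// occurrence of a given key lands on the same reducer.
public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    // Mask the sign bit so the result is non-negative,
    // then take it modulo the reducer count.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```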
6. Which of the following is a duty of the DataNodes in HDFS?
- Control the execution of an individual map task or a reduce task.
- Maintain the file system tree and metadata for all files and directories.
- Manage the file system namespace.
- Store and retrieve blocks when told to by clients or the NameNode.
- None of the options is correct
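The block-storage duty (the correct option) is easy to observe with the fsck tool, which lists a file's blocks and the DataNodes holding them; the path below is hypothetical:

```sh
# List blocks and their DataNode locations for a file
hdfs fsck /user/etl/customers/part-m-00000 -files -blocks -locations
```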
7. Which of the following is a duty of the NameNode in HDFS?
- Control the MapReduce job from end-to-end
- Maintain the file system tree and metadata for all files and directories
- Store the block data
- Transfer block data from the data nodes to the clients
- None of the options is correct
8. Which component determines the specific nodes that a MapReduce task will run on?
- The NameNode
- The JobTracker
- The TaskTrackers
- The JobClient
- None of the options is correct
9. Which of the following characteristics is common to Pig, Hive, and Jaql?
- All translate high-level languages to MapReduce jobs
- All operate on JSON data structures
- All are data flow languages
- All support random reads/writes
- None of the options is correct
10. Which of the following is NOT an open source project related to Hadoop?
- Pig
- UIMA
- Jackal
- Avro
- Lucene
11. During the replication process, a block of data is written to all specified DataNodes in parallel. True or false?
- True
- False
12. With IBM BigInsights, Hadoop components can be started and stopped from a command line and from the Ambari Console. True or false?
- True
- False
13. When loading data into HDFS, data is held at the NameNode until the block is filled and then the data is sent to a DataNode. True or false?
- True
- False
14. Which of the following is true about Hadoop federation?
- Uses JournalNodes to decide the active NameNode
- Allows non-Hadoop programs to access data in HDFS
- Allows multiple NameNodes with their own namespaces to share a pool of DataNodes
- Implements a resource manager external to all Hadoop frameworks
15. Which of the following is true about Hadoop high availability?
- Uses JournalNodes to decide the active NameNode
- Allows non-Hadoop programs to access data in HDFS
- Allows multiple NameNodes with their own namespaces to share a pool of DataNodes
- Implements a resource manager external to all Hadoop frameworks
16. Which of the following is true about YARN?
- Uses JournalNodes to decide the active NameNode
- Allows non-Hadoop programs to access data in HDFS
- Allows multiple NameNodes with their own namespaces to share a pool of DataNodes
- Implements a resource manager external to all Hadoop frameworks
17. Which of the following statements is true?
- Hadoop is good for OLTP, DSS, and big data
- Hadoop includes open source components and closed source components
- Hadoop is a new technology designed to replace relational databases
- All of the options are correct
- None of the options is correct
18. In which of these scenarios should Hadoop be used?
- Processing billions of email messages to perform text analytics (correct)
- Obtaining stock price trends on a per-minute basis
- Processing weather sensor information to predict a hurricane path
- Analyzing vital signs of a baby in real time
- None of the options is correct
COURSE SYLLABUS
Module 1 – Introduction to Hadoop
- Understand what Hadoop is
- Understand what Big Data is
- Learn about other open source software related to Hadoop
- Understand how Big Data solutions can work on the Cloud
Module 2 – Hadoop Architecture
- Understand the main Hadoop components
- Learn how HDFS works
- List data access patterns for which HDFS is designed
- Describe how data is stored in an HDFS cluster
Module 3 – Hadoop Administration
- Add and remove nodes from a cluster
- Verify the health of a cluster
- Start and stop a cluster's components
- Modify Hadoop configuration parameters
- Set up a rack topology
Module 4 – Hadoop Components
- Describe the MapReduce philosophy
- Explain how Pig and Hive can be used in a Hadoop environment
- Describe how Flume and Sqoop can be used to move data into Hadoop
- Describe how Oozie is used to schedule and control Hadoop job execution