Get an overview of HBase, learn how to use the HBase API and clients, and explore its integration with Hadoop.
HBase is the open source Hadoop database used for random, real-time read/write access to your Big Data.
- Learn how to set it up as a source or sink for MapReduce jobs, and explore its architecture and administration, with labs for practice and hands-on learning.
- Learn how HBase runs on a distributed architecture on top of commodity hardware, with hands-on practice covering the following features:
- Linear and modular scalability
- Strictly consistent reads and writes
- Automatic and configurable sharding of tables
- Automatic failover support between RegionServers
- Easy to use Java API for client access
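That Java API typically looks like the following. A minimal sketch, assuming the `hbase-client` dependency on the classpath and a reachable cluster that already has a table `demo` with a column family `cf` (both names are placeholders for this illustration); it is not runnable standalone:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo"))) {
            // Write one cell: row key, column family, qualifier, and value are all bytes
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("hello"));
            table.put(put);

            // Read the same cell back
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
        }
    }
}
```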
- Module 1 – Introduction to HBase
- HBase Overview, CAP Theorem and ACID properties
- Roles of HBase and differences from an RDBMS
- HBase Shell and Tables
- Module 2 – HBase Client API – The Basics
- Use of the Java API for Batch and Scan operations
- Module 3 – Client API: Administrative and Advanced Features
- Use of administrative operations and schemas
- Use of Filters, Counters, and ImportTSV tool
- Module 4 – Available HBase Clients
- Understand how interactive and batch clients interact with HBase
- Module 5 – HBase and MapReduce Integration
- Understand how MapReduce works in the Hadoop framework
- How to set up HBase as a source and a sink
- Module 6 – HBase Configuration and Administration
- Configuration of HBase for various environment optimizations
- Architecture and administrative tasks
Using HBase for Real-time Access to your Big Data Cognitive Class Exam Answers:
Module 1: Introduction to HBase
Question 1: What are some of the key properties of HBase? Select all that apply.
- All HBase data is stored as bytes
- HBase can run up to 1000 queries per second at the most
- HBase is ACID compliant across all rows and tables
- HBase is a NoSQL technology
- HBase is an open source Apache project
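The "all HBase data is stored as bytes" point is worth a quick illustration: the client API represents row keys, column names, and values as `byte[]`, converting with its `Bytes` utility. A JDK-only stand-in for those conversions (the method names mimic HBase's `Bytes.toBytes`/`Bytes.toString`):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesDemo {
    // Mimic HBase's Bytes.toBytes for String values
    public static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Mimic HBase's Bytes.toString
    public static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = toBytes("user-001");          // row keys are plain bytes
        byte[] value  = toBytes("alice@example.com"); // and so are cell values
        System.out.println(Arrays.toString(rowKey));
        System.out.println(toString(value));
    }
}
```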
Question 2: Which HBase component is responsible for storing the rows of a table?
Question 3: What is NOT a characteristic of an HBase table?
- Columns are grouped into column families
- Columns can have multiple timestamps
- Each row must have a unique row key
- NULL columns aren’t supported
- Columns can be added on the fly
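The characteristics listed above (column families grouping columns, per-cell timestamps, no stored NULLs, columns added on the fly) can be sketched with plain Java collections. This is an illustrative model of the data layout, not HBase code:

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RowModel {
    // One row: family -> qualifier -> (timestamp, newest first) -> value.
    // A column that is never written simply has no entry: no NULLs are stored.
    static final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> row =
        new TreeMap<>();

    static void put(String family, String qualifier, long ts, String value) {
        row.computeIfAbsent(family, f -> new TreeMap<>())
           .computeIfAbsent(qualifier, q -> new TreeMap<>(Comparator.reverseOrder()))
           .put(ts, value); // a new qualifier is just a new map entry: added on the fly
    }

    static String getLatest(String family, String qualifier) {
        return row.get(family).get(qualifier).firstEntry().getValue(); // newest timestamp wins
    }

    public static void main(String[] args) {
        put("info", "email", 1L, "old@example.com");
        put("info", "email", 2L, "new@example.com"); // second version of the same column
        System.out.println(getLatest("info", "email"));
    }
}
```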
Module 2: HBase Client API – The Basics
Question 1: Which HBase command is used to update existing data in a table?
Question 2: The batch command allows the user to determine the order of execution. True or false?
Question 3: Which of the following statements are true of the scan operation? Select all that apply.
- Scanner caching is enabled by default
- The startRow and endRow parameters are both inclusive
- The addColumn() method can be used to restrict a scan
- Scanning is a resource-intensive operation
- Scan operations are used to iterate over HBase tables
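Scan semantics can be illustrated with a sorted map: an HBase scan walks rows in key order from a start row (inclusive) up to a stop row (exclusive). A JDK-only sketch of that range behavior:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanDemo {
    // Return the rows in [startRow, stopRow): start inclusive, stop exclusive,
    // matching HBase's Scan range semantics.
    public static NavigableMap<String, String> scan(
            TreeMap<String, String> tableRows, String startRow, String stopRow) {
        return tableRows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        TreeMap<String, String> tableRows = new TreeMap<>();
        tableRows.put("row1", "a");
        tableRows.put("row2", "b");
        tableRows.put("row3", "c");
        System.out.println(scan(tableRows, "row1", "row3").keySet()); // row3 excluded
    }
}
```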
Module 3: Client API: Administrative and Advanced Features
Question 1: Which statement about HBase tables is incorrect?
- HColumnDescriptor is used to describe columns, not column families
- A table requires two descriptor classes
- Performance may suffer if a table has more than three column families
- Everything in HBase is stored within tables
- Each table must contain at least one column family
Question 2: When using a CompareFilter, you must specify what to include as part of the scan, rather than what to exclude. True or false?
Question 3: What is an example of a Dedicated Filter? Select all that apply.
Module 4: Available HBase Clients
Question 1: Which statements accurately describe the HBase interactive clients? Select all that apply.
- Thrift is included with HBase
- Thrift and Avro both support C++
- With REST, data transport is always performed in binary
- Avro has a dynamic schema
- REST needs to be compiled before it can run
Question 2: Unlike an interactive client, a batch client is used to run a large set of operations in the background. True or false?
Question 3: Which of the following is an example of a batch client?
Module 5: HBase and MapReduce Integration
Question 1: HBase can act both as a source and a sink of a MapReduce job. True or false?
Question 2: Which HBase class is responsible for splitting the source data?
Question 3: Which of the following is NOT a component of the MapReduce framework?
- All of the above are part of the MapReduce framework
Module 6: HBase Configuration and Administration
Question 1: Which of the following statements accurately describe the HBase run modes? Select all that apply.
- The standalone mode is suited for a production environment
- The pseudo-distributed mode is used for performance evaluation
- The standalone mode uses local file systems
- The distributed mode is suited for a production environment
- The distributed mode requires the HDFS
Question 2: Which is NOT a component of a region server?
Question 3: What is an example of an operational task? Select all that apply.
- Adding Servers
- Node decommissioning
- Import and export
Final Exam
Question 1: Which statements accurately describe column families in HBase? Select all that apply.
- You aren’t required to specify any column families when declaring a table
- Each region contains multiple column families
- You typically want no more than two or three column families per table
- Column families have their own compression methods
- Column families can be defined dynamically after table creation
Question 2: Which of the following is NOT a component of HBase?
- Region Server
Question 3: Which programming language is supported by Thrift?
- All of the above
Question 4: Which HBase command is used to retrieve data from a table?
Question 5: The HBase Shell and the native Java API are the only available tools for interacting with HBase. True or false?
Question 6: Without this filter, a scan will need to check every file to see if a piece of data exists.
Question 7: What are the characteristics of the Avro client? Select all that apply.
- Avro is included with HBase
- Data transport is performed in binary
- Avro needs to be compiled before running
- Avro is a batch client
- Avro supports Python and PHP, among others
Question 8: Deleting an internal table in Hive automatically deletes the corresponding HBase table. True or false?
Question 9: What is the main purpose of an HBase Counter?
- To count the number of regions
- To increment column values for statistical data collection
- To count the number of region servers
- To count the number of column families
- All of the above
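The counter behavior behind this question, an atomic read-modify-write increment on a column value, typically used for statistics collection, can be sketched with JDK concurrency primitives. This mimics `incrementColumnValue`-style semantics and is not actual HBase code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class CounterDemo {
    // column name -> counter value, incremented atomically (no read-then-write race)
    static final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    static long increment(String column, long amount) {
        return counters.computeIfAbsent(column, c -> new AtomicLong())
                       .addAndGet(amount);
    }

    public static void main(String[] args) {
        increment("stats:page_views", 1);
        increment("stats:page_views", 1);
        System.out.println(increment("stats:page_views", 1)); // running total after 3 increments
    }
}
```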
Question 10: Which file is used to specify configurations for HBase, HDFS, and ZooKeeper?
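For context: site-specific settings for HBase, its HDFS root directory, and its ZooKeeper quorum conventionally live in `hbase-site.xml`. A representative fragment (the hostnames are placeholders):

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value> <!-- HDFS location of HBase data -->
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value> <!-- ZooKeeper ensemble -->
  </property>
</configuration>
```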
Question 11: Which HBase component manages the race to add a backup master?
- Primary master
- Region Server
Question 12: Which component of a region server is the actual storage file of the data?
Question 13: When the master node is updated, which file can be used to automatically update the other nodes in the cluster?
Question 14: There is a single HLog for each region server. True or false?
Question 15: What is the main purpose of the Write-Ahead Log?
- To store HBase configuration details
- To store HDFS configuration details
- To flush data when the system reaches its capacity
- To prevent data loss in the event of a system crash
- To store performance details
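The write-ahead idea in that last question can be sketched in a few lines of plain Java: every mutation is appended to a durable log before the in-memory store is updated, so the store can be rebuilt by replaying the log after a crash. An illustrative model, not HBase internals:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WalDemo {
    static void put(List<String[]> wal, Map<String, String> memstore, String key, String value) {
        wal.add(new String[] {key, value}); // 1. append to the log first (durable)
        memstore.put(key, value);           // 2. only then update the in-memory store
    }

    // Rebuild the in-memory store from the log after a crash
    static Map<String, String> replay(List<String[]> wal) {
        Map<String, String> recovered = new HashMap<>();
        for (String[] entry : wal) recovered.put(entry[0], entry[1]);
        return recovered;
    }

    public static void main(String[] args) {
        List<String[]> wal = new ArrayList<>();
        Map<String, String> memstore = new HashMap<>();
        put(wal, memstore, "row1", "a");
        put(wal, memstore, "row2", "b");
        memstore = null;                      // simulate a crash: in-memory data is lost
        Map<String, String> recovered = replay(wal); // log replay restores the data
        System.out.println(recovered.size());
    }
}
```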