SQL Access for Hadoop Cognitive Class Exam Quiz Answers
Module 1: Big SQL Overview
Question 1: Which Big SQL architecture component is responsible for accepting queries?
- Hive Server
- Scheduler
- Worker Node
- DDL Processing Engine
- Master Node
Question 2: Big SQL differs from Big SQL v1 in which of the following ways? Select all that apply.
- Big SQL does not have support for HBase
- Big SQL v1 reserves double quotes for identifiers
- Big SQL requires the HADOOP keyword for table creation
- Big SQL v1 treats single and double quotes as the same
- DDL in Big SQL v1 is a superset of Big SQL
Question 3: In Big SQL, what is the term for the default directory in the distributed file system (DFS) where tables are stored?
- Schema
- Metastore
- Table
- Warehouse
- Partitioned Table
Module 2: Big SQL Data Types
Question 1: What are the main data type categories in Big SQL? Select all that apply.
- SQL
- INT
- Declared
- REAL
- Hive
Question 2: When creating a table, which keyword is used to specify the DFS directory for storing data files?
- EXTERNAL
- HADOOP
- USE
- CHECK
- LOCATION
Question 3: Which human-readable Big SQL file format uses a character to separate column values?
- Avro
- Parquet
- ORC
- Sequence
- Delimited
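The delimited format mentioned above is simply character-separated text: one row per line, with a single separator character between column values, which is what makes it human readable. A minimal Python sketch of the idea (an illustration only, not Big SQL's actual reader):

```python
import csv
import io

# A delimited file stores one row per line, with a separator
# character (here '|') between column values.
raw = "1|alice|3.5\n2|bob|4.0\n"

# Each line splits into column values on the delimiter.
reader = csv.reader(io.StringIO(raw), delimiter="|")
rows = [tuple(r) for r in reader]
```

Every value comes back as text; a table definition is what assigns types to the columns, which is one reason delimited files are readable but not the most efficient format.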
SQL Access for Hadoop Final Exam Answers – Cognitive Class
Question 1: In order to use Big SQL, you need to learn several new query languages. True or false?
- True
- False
Question 2: Which component serves as the main interface between Big SQL and Hadoop?
- Hive Metastore
- Big SQL Master Node
- Scheduler
- Big SQL Worker Node
- UDF FMP
Question 3: Officially, there are two different releases of Big SQL. True or false?
- True
- False
Question 4: Which of the following statements is true of a partitioned table?
- Query predicates can be used to avoid scanning every partition
- A table may be partitioned on one or more rows
- Data is stored in multiple directories for each partition
- The partitions are specified only when data is inserted
- All of the above
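To see why query predicates let a partitioned table avoid scanning every partition, consider that each partition value maps to its own directory in the DFS (for example, a hypothetical `/warehouse/sales/year=2023/`). A toy Python model of partition pruning (a sketch of the concept, not Big SQL internals):

```python
# Toy model: a table partitioned on 'year', with one directory
# of rows per partition value, as in HDFS.
partitions = {
    "year=2022": [("2022-03-01", 100), ("2022-07-09", 250)],
    "year=2023": [("2023-01-15", 300)],
}

def scan(predicate_year):
    """Return matching rows, skipping partitions the predicate rules out."""
    scanned = []
    for part_dir, rows in partitions.items():
        # Partition pruning: directories that cannot match are never read.
        if part_dir != f"year={predicate_year}":
            continue
        scanned.extend(rows)
    return scanned

result = scan(2023)
```

A filter on the partitioning column means only one directory is read, no matter how many partitions the table has.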
Question 5: Which of the following statements is true of JSqsh?
- JSqsh supports multiple active sessions
- JSqsh is an open source command client
- JSqsh can be used to work with Big SQL
- The term JSqsh derives from “Java SQL Shell”
- All of the above
Question 6: Which of the following statements is true of the SQL data type?
- The database engine supports the SQL data type
- There are more declared data types than SQL data types
- SQL data types are provided in the CREATE statement
- SQL data types tell SerDe how to encode and decode values
- All of the above
Question 7: In Big SQL, the STRING and VARCHAR types are equivalent and can be used interchangeably. True or false?
- True
- False
Question 8: What is the default Big SQL schema?
- “admin”
- Your login name
- “warehouse”
- “default”
- The schema that was previously used
Question 9: Which of the following statements are true of Parquet files? Select all that apply.
- Parquet files are supported by the native I/O engine
- Parquet files provide a columnar storage format
- Parquet files support the DATE and TIMESTAMP data types
- Parquet is a high-performance file format
- Parquet files are good for data interchange outside of Hadoop
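The "columnar storage" property of Parquet can be illustrated with a toy comparison in Python (a sketch of the layout idea, not the Parquet format itself): row storage keeps whole records together, while columnar storage keeps each column contiguous, so a query that needs one column reads far less data.

```python
# Row-oriented layout: each record is stored together.
row_store = [(1, "alice", 3.5), (2, "bob", 4.0)]

# Column-oriented layout (the Parquet-style idea): each column is
# stored contiguously, so reading one column touches only one array.
col_store = {
    "id": [1, 2],
    "name": ["alice", "bob"],
    "score": [3.5, 4.0],
}

# Fetching only 'score' from the column store reads a single list...
scores_columnar = col_store["score"]
# ...while the row store must visit every full record to extract it.
scores_rowwise = [rec[2] for rec in row_store]
```

Contiguous columns also compress better, since values of the same type and similar range sit next to each other.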
Question 10: Which of the following statements are true of ORC files? Select all that apply.
- ORC files are supported by the native I/O engine
- ORC files are good for data interchange outside of Hadoop
- Individual columns can be retrieved efficiently
- ORC files can be efficiently compressed
- Big SQL can exploit every advanced ORC feature
Question 11: Which of the following statements is NOT true of the Native I/O processing engine?
- There is a high-speed interface for common file formats
- The native engine supports the delimited file format, among others
- The native engine is highly optimized and parallelized
- The native engine is written in Java
- All of the above statements are true
Question 12: Which of the following statements about Big SQL are true? Select all that apply.
- Big SQL comes with comprehensive SQL support
- Big SQL provides a powerful SQL query rewriter
- Big SQL currently doesn’t support subqueries
- Big SQL queries can only be written for one data source
- Big SQL supports all the standard join operations
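The join and subquery support the question refers to is standard SQL. A runnable illustration using Python's built-in SQLite (plain SQL for demonstration, not Big SQL's engine; the table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE emp  (id INTEGER, dept_id INTEGER, salary REAL);
    INSERT INTO dept VALUES (1, 'eng'), (2, 'sales');
    INSERT INTO emp  VALUES (10, 1, 95.0), (11, 1, 80.0), (12, 2, 60.0);
""")

# A standard inner join combined with a scalar subquery: employees
# earning above the overall average salary, with their department name.
rows = con.execute("""
    SELECT d.name, e.salary
    FROM emp e JOIN dept d ON e.dept_id = d.id
    WHERE e.salary > (SELECT AVG(salary) FROM emp)
    ORDER BY e.salary DESC
""").fetchall()
```

The average salary here is about 78.3, so only the two engineering rows qualify.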
Question 13: Which keyword indicates that the data in a table is not managed by the database manager?
- USE
- LOCATION
- EXTERNAL
- HADOOP
- CHECK
Question 14: The Avro file format is more efficient than Parquet and ORC. True or false?
- True
- False
Question 15: Which statement accurately characterizes the Big SQL data types?
- Sequence files are the fastest format
- Delimited files are the most efficient format
- ORC files can be efficiently compressed
- Avro is human readable
- RC files replaced ORC files
Introduction to SQL Access for Hadoop
SQL access for Hadoop typically involves using tools or frameworks that let you query and analyze data stored in the Hadoop Distributed File System (HDFS) or accessed through Hadoop ecosystem components such as Apache HBase and Apache Hive. Here are some common methods for SQL access in Hadoop:
- Apache Hive: Apache Hive provides a SQL-like interface to Hadoop. It allows you to query data using a SQL dialect called HiveQL, which gets translated into MapReduce, Tez, or Spark jobs, depending on the Hive execution engine you choose. Hive is widely used for batch processing and is suitable for scenarios where data is already structured or semi-structured.
- Apache Impala: Impala is an open-source massively parallel processing (MPP) SQL query engine for data stored in Hadoop clusters. It provides low-latency SQL queries directly on Hadoop data, bypassing MapReduce in favor of its own execution engine. Impala is suitable for interactive queries and is often used for ad-hoc analysis.
- Apache Drill: Apache Drill is a distributed SQL query engine designed to enable users to query structured and semi-structured data from a variety of data sources, including Hadoop, NoSQL databases, and cloud storage. Drill provides a schema-free SQL query engine that can handle complex data types and nested data.
- Presto: Presto is an open-source distributed SQL query engine developed by Facebook. It is designed for interactive analytics queries on large-scale datasets stored in Hadoop, HDFS, or other data sources like relational databases or cloud storage. Presto is known for its high performance and flexibility.
- Spark SQL: Apache Spark includes a module called Spark SQL, which provides a DataFrame API and SQL interface for working with structured data. Spark SQL allows you to run SQL queries on data stored in Hadoop HDFS or other data sources supported by Spark, such as Apache Hive tables or Parquet files.
- Hadoop MapReduce with custom input/output formats: While less common for SQL access due to its complexity and verbosity, it’s possible to write custom MapReduce programs using Hadoop’s Java API to process and analyze data stored in HDFS. You can define custom input/output formats to handle SQL-like queries, but this approach requires more manual effort compared to higher-level tools like Hive or Impala.
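To make that last point concrete, here is a tiny pure-Python sketch of the map/shuffle/reduce pattern a hand-written job would use to answer a query like `SELECT word, COUNT(*) ... GROUP BY word` (an illustration of the pattern only, not Hadoop's actual Java API):

```python
from collections import defaultdict

lines = ["big sql on hadoop", "sql on hadoop"]

# Map phase: emit a (key, 1) pair for every word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group values by key, as the framework does
# between the mappers and the reducers.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce phase: aggregate each key's values -- here, a COUNT.
counts = {key: sum(values) for key, values in grouped.items()}
```

Writing even this simple GROUP BY by hand shows why SQL-on-Hadoop engines like Hive and Impala exist: they generate this plumbing for you.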
These tools provide SQL access to data stored in Hadoop, allowing organizations to leverage existing SQL skills and tools for data analysis and processing in big data environments. The choice of tool depends on factors such as performance requirements, data formats, query complexity, and existing infrastructure.