Tuesday, June 18, 2024

Apache Pig 101 Cognitive Class Exam Quiz Answers

Question 1: What are the five ways to invoke Pig?

  • Script, Interactive Mode, Java Command, Interactive Local Mode, Interactive MapReduce Mode
  • Interactive External Mode, Interactive Mode, Script, Java Command, Interactive MapReduce Mode
  • Interactive Service Mode, Interactive Local Mode, Interactive External Mode, Interactive MapReduce Mode, Java Command
  • Interactive Local Mode, Interactive MapReduce Mode, Interactive External Mode, Interactive Mode, Script

Question 2: Bags are groups of tuples, tuples are groups of fields, and fields are composed of scalar data types. True or false?

  • True
  • False

Question 3: Which of the following statements is true?

  • Names of relations and fields, as well as keywords and operators, are case sensitive. However, function names are case insensitive.
  • Keywords and operator names are case sensitive.
  • Function names are case sensitive.
  • Names of relations are case sensitive, but names of fields are case insensitive.

Question 1: For the tuples (3,5,2) (5,2,1) (3,7,3) (3,6,1), using the GROUP operator on the third field produces the following: (2,{(3,5,2)}), (1,{(5,2,1),(3,6,1)}), (3,{(3,7,3)}). True or false? Disregard order when answering.

  • True
  • False

Question 2: UNION, GROUP, and COGROUP can be used interchangeably without creating different outputs. True or false?

  • True
  • False

Question 3: Which operators can be used within a nested FOREACH block?

  • LIKE, COUNT, LIMIT, ORDER BY
  • COUNT, ORDER BY, AVG, DISTINCT
  • AVG, LIMIT, FILTER, LIKE
  • LIMIT, DISTINCT, ORDER BY, FILTER

Question 1: The COUNT operator does NOT require the use of the GROUP BY operator. True or false?

  • True
  • False

Question 2: The TOKENIZE() function splits a string and outputs a bag of words. True or false?

  • True
  • False

Question 3: The two types of UDFs are DEFINE and REGISTER. True or false?

  • True
  • False

Question 1: What is the primary purpose of Pig in the Hadoop architecture?

  • To provide logging support for Hadoop jobs
  • To support the execution of workflows consisting of a collection of actions
  • To provide a high-level programming language so that developers can simplify the task of writing MapReduce applications
  • To move data into HDFS

Question 2: When executing Pig in local mode, the process runs locally, but all of the data files are accessed via HDFS. True or false?

  • True
  • False

Question 3: Data can be loaded into Pig with or without defining a schema. True or false?

  • True
  • False

Question 4: In Pig, you can specify the delimiter used to load data by

  • doing nothing. Pig can automatically detect the delimiter used in your data file
  • adding a schema definition to your LOAD statement
  • adding ‘using PigStorage(delimiter)’ to your LOAD statement
  • All of the above

Question 5: Which of the following can be used to pass parameters into a Pig Script? Select all that apply.

  • Command line parameters
  • A parameter file
  • JSON
  • Web Services

Question 6: Which Pig Operator is used to save data into a file?

  • SAVE
  • LOAD
  • STORE
  • DUMP

Question 7: In Pig, all tuples in a relation must have the same number of fields. True or false?

  • True
  • False

Question 8: Which Pig relational operator is used to select tuples from a relation based on some criteria?

  • transform
  • filter
  • group
  • order by

Question 9: Which Pig relational operator is used to combine all the tuples in a relation that have the same key?

  • union
  • transform
  • filter
  • group
  • join

Question 10: Which Pig relational operator is used to combine two or more relations using one or more common field values?

  • union
  • transform
  • filter
  • group
  • join

Question 11: The Pig Tokenize evaluation operator splits a string and outputs a bag of words. True or false?

  • True
  • False

Question 12: When using the Pig Count evaluation operator, you must also use either the Group All or the Group By operator. True or false?

  • True
  • False

Question 13: Which of the following Pig operators can be used to review the logical, physical, and MapReduce execution plans?

  • Verbose
  • Dump
  • Store
  • Explain

Question 14: Which of the following is a valid Pig evaluation operator?

  • isempty
  • count_star
  • diff
  • count
  • All of the Above

Question 15: You can extend Pig via user defined functions. True or false?

  • True
  • False

Introduction to Apache Pig 101

Apache Pig is a platform for analyzing large datasets in a distributed computing environment. Its high-level language, Pig Latin, lets developers write complex data transformations without dealing directly with the underlying distributed framework, such as Apache Hadoop MapReduce.

Here’s a quick overview of some key concepts in Apache Pig:

  1. Pig Latin: This is the language used to write scripts in Apache Pig. It’s a data flow language that allows you to describe data transformations using a series of operations like LOAD, FILTER, GROUP, JOIN, FOREACH, etc.
  2. Execution Model: Pig compiles a Pig Latin script into a series of MapReduce jobs that run on a Hadoop cluster. Newer releases can also target Tez or Spark, and local mode runs the whole script on a single machine against the local file system, which is useful for testing.
  3. Data Types: Pig supports scalar types (int, long, float, double, boolean, chararray, bytearray, datetime) and complex types (tuple, bag, map). A value of any type can be null; null is not a separate type.
  4. Relational Model: A relation in Pig is a bag, that is, an unordered collection of tuples (rows), and each tuple is an ordered set of fields. This model makes it easy to perform SQL-like operations on structured data.
  5. LOAD and STORE: These are the primary operations for loading data into Pig from various sources (such as HDFS, local file system, HBase, etc.) and storing the results of computations back to the file system.
  6. Transformations: Pig provides a wide range of transformations to manipulate and analyze data. Some common transformations include FILTER (to select tuples based on a condition), GROUP (to group data based on a key), JOIN (to join two or more datasets), FOREACH (to apply transformations to each tuple), and many more.
  7. UDFs (User Defined Functions): Pig allows you to define custom functions in Java, Python, or other languages, which can be used within Pig Latin scripts to perform specialized computations.
  8. Schema on Read: Unlike a traditional database, Pig does not require a schema when data is loaded. A schema can be supplied in the LOAD statement with an AS clause; if it is omitted, fields default to bytearray and are referenced by position ($0, $1, and so on), which gives more flexibility in handling diverse datasets.
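
The concepts above can be sketched in plain Python. This is not Pig Latin and does not use any Pig API; it is a rough, single-machine stand-in for what LOAD, FILTER, GROUP, and FOREACH do to a relation. The sample data, the field names (`user`, `category`, `amount`), and the helper functions `load` and `group_by` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample input: comma-delimited lines, roughly what
# LOAD ... USING PigStorage(',') would read from a file.
RAW_LINES = [
    "alice,books,12.50",
    "bob,games,40.00",
    "alice,games,7.25",
    "carol,books,3.00",
]


def load(lines, delimiter=",", schema=None):
    """Rough analogue of LOAD with an optional schema (illustrative helper).

    Without a schema, each row is a tuple of uninterpreted strings addressed
    by position. With a schema, fields are named and cast to a type.
    """
    relation = []
    for line in lines:
        fields = line.split(delimiter)
        if schema is None:
            relation.append(tuple(fields))
        else:
            relation.append(
                {name: cast(value) for (name, cast), value in zip(schema, fields)}
            )
    return relation


def group_by(relation, key):
    """Rough analogue of GROUP: one (key, bag of rows) pair per distinct key."""
    bags = defaultdict(list)
    for row in relation:
        bags[row[key]].append(row)
    return list(bags.items())


# Load with a schema, comparable to: AS (user:chararray, category:chararray, amount:double)
schema = [("user", str), ("category", str), ("amount", float)]
orders = load(RAW_LINES, schema=schema)

# Comparable to: FILTER orders BY amount > 5.0
large = [row for row in orders if row["amount"] > 5.0]

# Comparable to: GROUP large BY user
grouped = group_by(large, "user")

# Comparable to: FOREACH grouped GENERATE group, COUNT(large), SUM(large.amount)
totals = {user: (len(bag), sum(r["amount"] for r in bag)) for user, bag in grouped}

# Load without a schema: fields are only reachable by position, like $0, $1, $2.
untyped = load(RAW_LINES)
third_field_of_first_row = untyped[0][2]  # still the string "12.50"
```

The same `group_by` helper works on positional rows: passing a tuple index as `key` groups on that field, which mirrors grouping on `$2` in a relation loaded without a schema. In real Pig, each of these steps would be compiled into distributed jobs rather than run in a single process.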

By leveraging these concepts, developers can efficiently process and analyze large-scale data using Apache Pig, abstracting away much of the complexity of distributed computing.
