Tuesday, February 27, 2024

Text Analysis 101 Cognitive Class Exam Quiz Answers


Question 1: Your client would like to know how its advertising campaign impressed customers. Which IE task would provide this data?

  • Event Extraction
  • Co-reference resolution
  • Sentiment Extraction
  • Relation Extraction

Question 2: Which extraction phase can turn a dictionary match of a common first name plus an adjacent regular expression into a “potential person name” entity?

  • None of these
  • Entity Resolution
  • Named Entity Recognition
  • Feature Selection

Question 3: Consider a set of news articles that contains 20 mentions of person names. From this source, an extractor extracts 15 entities, 3 of which are incorrect. What are the Precision (P) and Recall (R) values?

  • P = 0.30, R = 0.70
  • P = 0.70, R = 0.30
  • P = 0.80, R = 0.60
  • P = 1.00, R = 1.30
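
The arithmetic behind Question 3 can be checked with a short Python sketch. It assumes the usual definitions: precision is the number of correct extractions divided by everything extracted, and recall is the number of correct extractions divided by all mentions in the source. The function and parameter names are illustrative, not part of the course material.

```python
def precision_recall(extracted, incorrect, total_mentions):
    """Return (precision, recall) for an extractor.

    extracted      -- number of entities the extractor produced
    incorrect      -- how many of those entities are wrong
    total_mentions -- number of true mentions in the source text
    """
    correct = extracted - incorrect
    precision = correct / extracted      # share of the output that is right
    recall = correct / total_mentions    # share of the true mentions that were found
    return precision, recall


# Numbers from the question: 20 mentions, 15 extracted, 3 incorrect.
p, r = precision_recall(extracted=15, incorrect=3, total_mentions=20)
print(f"P = {p:.2f}, R = {r:.2f}")  # prints "P = 0.80, R = 0.60"
```

With 12 correct entities, precision is 12/15 and recall is 12/20.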

Question 1: Which of the following poses huge demands on the IE engine?

  • Complex IE tasks
  • Heterogeneous text inputs
  • Different types of data
  • All of the above

Question 2: A typical IE grammar-based workflow:

  • Targets the step closest to the nature of the rule applied.
  • Follows a unidirectional sequence of steps.
  • Targets each step based on the nature of the input data.
  • Creates a branched path according to the input and the desired output.

Question 3: We can overcome IE performance limitations by:

  • Separating extractor semantics from execution strategy.
  • Coupling extractor semantics with execution strategy.
  • Parallel processing.
  • Making faster finite state transducers.

Question 1: What outputs do extractors generate in SystemT?

  • Extractors
  • Regular Expressions
  • Annotations
  • None of the above

Question 2: An output refiner helps you to:

  • Define multiple filters.
  • Union multiple extractors.
  • Define multiple extractors.
  • None of the above

Question 3: By selecting the Mapping Table checkbox in the Dictionary extractor, you can:

  • Map dictionary terms against categories.
  • Create a two-column dictionary.
  • Add a column of metadata.
  • All of the above.

Question 1: Which of the following statements describes AQL?

  • AQL has a syntax that is similar to SQL.
  • AQL has expressive power of algebra.
  • AQL separates semantics from implementation.
  • All of the above.

Question 2: What are the main advantages of SystemT’s approach towards Information Extraction?

  • Richer and cleaner rule semantics
  • Better performance through optimization
  • Improved quality of results
  • A and B

Question 3: Which of the following files can be part of an AQL module?

  • Dictionary file
  • AQL file
  • UDF jar
  • All of the above

Question 1: Which factor is essential for the Union All statement to work?

  • The tuples should be from the same input text.
  • The schemas of the tuples should be different.
  • The schemas of the tuples should be from a single view.
  • The schemas of the tuples should be the same.

Question 2: Which of the following options is a valid consolidate policy?

  • ContainsButNotEqual
  • RightToLeft
  • ExactEqual
  • ContainedInside

Question 3: When is the Minus statement useful?

  • When the two sets of input tuples have different schemas
  • When you want to find matches for a sequence pattern
  • When you want to subtract a set of tuples from another set of tuples
  • All of the above

Question 1: Which type of text can be extracted using the Detag statement?

  • Semi-structured text
  • Unstructured text
  • Structured text
  • None of the above

Question 2: When should you use a standard tokenizer?

  • When token boundaries are defined by punctuation and whitespace.
  • When extraction of person names from Chinese text is needed.
  • When extraction of parts of speech is required.
  • All of the above.

Question 3: Which best practices should you use when developing an AQL module?

  • Place large dictionaries and tables in separate modules.
  • Avoid using the output view statement when developing extractor libraries.
  • Document the source code using AQL Doc.
  • All of the above.

Question 1: Which of the following leads to mistakes when two rules match the same region of text?

  • Limited expressivity
  • Lossy sequencing
  • Rigid matching priority
  • None of the above

Question 2: Which of the following strategies can overcome lossy sequencing?

  • Expand rule patterns to include features such as aggregation.
  • Impose modular tokenization.
  • Include matching regimes that increase flexibility on priority.
  • Use grammar rules that operate on graphs rather than sequences of annotations.

Question 3: In which stage of the SystemT optimizer do you merge block plans into a single operator graph?

  • Post-processor
  • Planner
  • Pre-processor
  • None of the above

Question 1: Why is the first document in a collection often at the top of the AQL Profiler’s “hot documents” view?

  • The optimizer is trying to produce plans that are sensitive to each input document.
  • This is because of how Java implements regex.
  • This is due to the Java compiler.
  • SystemT sorts documents by length for processing, so the first document is the longest.

Question 2: Which of the following is NOT a best practice for writing AQL?

  • Use the AQL profiler to find and address hot spots.
  • Follow simple rules of thumb when writing AQL.
  • Don’t hand-tune while writing AQL.
  • Always ignore throughput levels when designing extractors.

Question 3: Why is it necessary to be selective about performance tuning?

  • It might adversely affect code readability
  • It might reduce the quality of your results
  • It might make your code more difficult to maintain
  • A and B.
  • A and C.

Question 1: Identify the logical sequence of phases in an IE system.

  • Entity Identification > Feature Selection > Entity Resolution
  • Entity Identification > Entity Resolution > Feature Selection
  • Feature Selection > Entity Resolution > Entity Identification
  • Feature Selection > Entity Identification > Entity Resolution

Question 2: Consider a set of news articles that contains 100 mentions of organizations. From this source, an extractor extracts 75 entities, 50 of which are correct. What are the Precision (P) and Recall (R) values of this extractor?

  • P = 0.75, R = 0.50
  • P = 0.67, R = 1.50
  • P = 0.67, R = 0.50
  • P = 0.50, R = 0.67

Question 3: What problem is caused by an IE system having a rigid matching priority?

  • Regular expressions cannot be used when specifying rules.
  • There is no support for matching strings spanning more than one token.
  • The system cannot express aggregation operations.
  • When multiple rules match the same region of text, mistakes are likely to occur.

Question 4: The SystemT consolidate policy:

  • Applies a filtering predicate to output tuples.
  • Specifies how to handle tuples with overlapping spans.
  • Specifies which tuple columns to group on.
  • Specifies a tuple ordering.

Question 5: Which of the following AQL statements uses expressions, dictionaries, and sequence patterns to perform extraction?

  • Relational style statement.
  • Extract statement.
  • Create table statement.
  • Select statement.

Question 6: Which of the following statements are part of an AQL file?

  • Create external table statements.
  • Import statements.
  • Create external dictionary statements.
  • Export statements.
  • All of the above.

Question 7: Which of the following types is a return value for table UDFs?

  • Tuples.
  • Integer.
  • Span.
  • Boolean.

Question 8: Which predicate would you use to check if a span is exactly equal to one of a predefined set of words?

  • FollowsTok.
  • MatchesRegex.
  • MatchesDict.
  • ContainsDict.

Question 9: Why is correct text tokenization important?

  • Dictionary evaluation and many extraction operators, such as regex, are done on token boundaries, and incorrect tokenization will lead to incorrect results.
  • Several built-in predicates and functions are token sensitive.
  • AQL extract statements will not compile if tokenization is incorrect.
  • A and B.
  • A and C.

Question 10: Which of the following is NOT a best practice rule of thumb to follow when writing AQL?

  • Use dictionaries instead of regex whenever possible.
  • Make sure each module has its own copy of every dictionary.
  • Avoid using UDFs as join predicates.
  • Avoid Cartesian products.

Introduction to Text Analysis 101

Text analysis, also called text mining, applies natural language processing (NLP) techniques to extract meaningful insights and patterns from unstructured text. It lets software interpret textual content and derive structured information from it. Common tasks include sentiment analysis, named entity recognition, and topic modeling. The key aspects of text analysis are:

1. Tokenization:

  • Definition: Tokenization is the process of breaking down a text into individual units, usually words or phrases, known as tokens.
  • Purpose: It facilitates further analysis by converting text into a format that can be easily processed.

2. Part-of-Speech Tagging (POS):

  • Definition: POS tagging involves labeling each word in a sentence with its grammatical category, such as noun, verb, adjective, etc.
  • Purpose: Helps in understanding the syntactic structure of sentences and extracting valuable information about the roles of words.

3. Named Entity Recognition (NER):

  • Definition: NER identifies and classifies entities (e.g., names of people, organizations, locations) in a text.
  • Purpose: Useful for extracting structured information and identifying key entities within a document.

4. Sentiment Analysis:

  • Definition: Sentiment analysis, or opinion mining, determines the sentiment expressed in a piece of text, whether it is positive, negative, or neutral.
  • Purpose: Commonly used for understanding public opinion, customer feedback, and social media sentiment.

5. Text Classification:

  • Definition: Text classification involves categorizing text into predefined categories or classes based on its content.
  • Purpose: Used for tasks such as spam detection, topic categorization, and sentiment classification.

6. Topic Modeling:

  • Definition: Topic modeling algorithms identify topics present in a collection of texts and assign each document a distribution over these topics.
  • Purpose: Useful for organizing and summarizing large text corpora, such as news articles or research papers.

7. Word Embeddings:

  • Definition: Word embeddings represent words as dense vectors in a continuous vector space, capturing semantic relationships between words.
  • Purpose: Enhances the understanding of word semantics and improves the performance of various NLP tasks.

8. Text Summarization:

  • Definition: Text summarization algorithms generate concise and coherent summaries of longer texts while preserving key information.
  • Purpose: Aids in quickly grasping the main points of a document and is valuable for information retrieval.

9. Information Extraction:

  • Definition: Information extraction involves identifying and extracting specific pieces of information from text, such as dates, numbers, or relationships.
  • Purpose: Enables the extraction of structured data from unstructured text sources.
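
Two of the aspects above, tokenization and information extraction, can be sketched with only Python's standard library. This is a minimal illustration under simplifying assumptions: tokens are runs of word characters or single punctuation marks, and the only information extracted is dates written as YYYY-MM-DD. The patterns, function names, and sample sentences are hypothetical; libraries such as NLTK or spaCy handle far more cases.

```python
import re

# Assumption: a token is a run of word characters or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Assumption: dates appear in ISO form, e.g. 2024-02-27.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def extract_dates(text):
    """Return every ISO-formatted date found in the text."""
    return DATE_RE.findall(text)


tokens = tokenize("Acme hired 25 engineers in 2023.")
print(tokens)
# ['Acme', 'hired', '25', 'engineers', 'in', '2023', '.']

dates = extract_dates("The report was filed on 2024-02-27 and revised on 2024-03-05.")
print(dates)
# ['2024-02-27', '2024-03-05']
```

The tokenizer turns raw text into units that later steps can process, while the date extractor turns unstructured text into structured values.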

Tools and Libraries for Text Analysis:

  • Natural Language Toolkit (NLTK): A powerful library for working with human language data in Python.
  • spaCy: An open-source library for advanced NLP in Python.
  • Stanford NLP: A suite of natural language processing tools for Java, Python, and other languages.
  • scikit-learn: A machine learning library in Python that includes tools for text analysis.
  • Gensim: A library for topic modeling and document similarity analysis.

Text analysis is a vast and evolving field with applications in various industries, including finance, healthcare, marketing, and more. As technology advances, the capabilities of text analysis continue to grow, enabling more sophisticated and nuanced insights from unstructured textual data.
