Data Science Methodology Cognitive Class Answers



Module 1 – From Problem to Approach

Question 1: Select the correct statement.

  • A methodology is an application for a computer program.
  • A methodology is a set of instructions.
  • A methodology is a system of methods used in a particular area of study or activity.
  • All of the above statements are correct.

Question 2: Select the correct statement.

  • The data science methodology described in this course is only used by certified data scientists.
  • The data science methodology described in this course is outlined by John Rollins from IBM.
  • The data science methodology described in this course is limited to IBM.
  • None of the above statements are correct.

Question 3: Select the correct statement.

  • The first stage of the data science methodology is data understanding.
  • The first stage of the data science methodology is modeling.
  • The first stage of the data science methodology is business understanding.
  • The first stage of the data science methodology is data collection.

Module 2 – From Requirements to Collection

Question 1: Select the correct statement.

  • If a problem is a dish, then data is an answer.
  • If a problem is a dish, then data is an ingredient.
  • If a problem is a dish, then data is a list of information.
  • None of the above statements are correct.

Question 2: Select the correct statement.

  • A data requirement is never refined.
  • A data requirement is set in stone.
  • A data requirement is the initial set of ingredients.
  • None of the above statements are correct.

Question 3: Select the correct statement.

  • Data scientists determine how to prepare the data.
  • Data scientists identify the data that is required for data modeling.
  • Data scientists determine how to collect the data.
  • All of the above.

Module 3 – From Understanding to Preparation

Question 1: Select the correct statement about data preparation.

  • Data preparation involves properly formatting the data.
  • Data preparation involves correcting invalid values and addressing outliers.
  • Data preparation involves removing duplicate data.
  • Data preparation involves addressing missing values.
  • All of the above statements are correct.
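Each preparation task listed above can be seen in a small Python sketch that uses only the standard library. The records, field names, the 0 to 120 age range and the income cap are all hypothetical choices made for illustration. Capping outliers and imputing the median are only two of several reasonable strategies.

```python
import statistics

# Hypothetical raw records: field names and values are invented for illustration.
raw = [
    {"id": "1", "age": " 34 ", "income": "52000"},
    {"id": "2", "age": "-5", "income": "48000"},     # invalid age
    {"id": "1", "age": " 34 ", "income": "52000"},   # exact duplicate of id 1
    {"id": "3", "age": "", "income": "61000"},       # missing age
    {"id": "4", "age": "41", "income": "9900000"},   # income outlier
]


def prepare(records, income_cap=500_000):
    """Deduplicate, reformat, fix invalid values, cap outliers, impute missing ages."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:  # removing duplicate data
            continue
        seen.add(key)
        # Formatting: strip whitespace and convert text to numbers.
        age_text = rec["age"].strip()
        age = int(age_text) if age_text else None
        income = float(rec["income"])
        # Invalid values: treat impossible ages as missing (assumed valid range 0-120).
        if age is not None and not 0 <= age <= 120:
            age = None
        # Outliers: cap income at an assumed threshold.
        income = min(income, income_cap)
        cleaned.append({"id": int(rec["id"]), "age": age, "income": income})
    # Missing values: impute with the median of the known ages.
    median_age = statistics.median(r["age"] for r in cleaned if r["age"] is not None)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = median_age
    return cleaned


print(prepare(raw))
```

The duplicate row is dropped, the invalid and missing ages both become the median of the remaining ages (37.5), and the extreme income is capped.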

Question 2: Select the correct statement about data understanding.

  • Data understanding encompasses removing redundant data.
  • Data understanding encompasses all activities related to constructing the dataset.
  • Data understanding encompasses sorting the data.
  • All of the above statements about data understanding are correct.

Question 3: Select the correct statement about what data scientists and database administrators (DBAs) do during data preparation.

  • During data preparation, data scientists and DBAs identify missing data.
  • During data preparation, data scientists and DBAs determine the timing of events.
  • During data preparation, data scientists and DBAs aggregate the data and merge them from different sources.
  • During data preparation, data scientists and DBAs define the variables to be used in the model.
  • All of the above statements are correct.
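Aggregating data and merging it from different sources, while flagging missing data along the way, can be sketched as follows. The two sources (a customer table and a transaction log) and their fields are hypothetical; in practice this work is often done in SQL or with a dataframe library.

```python
from collections import defaultdict

# Two hypothetical sources, joined on customer_id.
customers = {101: {"region": "North"}, 102: {"region": "South"}, 103: {"region": "North"}}
transactions = [(101, 20.0), (101, 35.5), (103, 12.0), (104, 8.0)]


def aggregate_and_merge(customers, transactions):
    """Aggregate transactions per customer, merge with customer data, report orphans."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cust_id, amount in transactions:  # aggregate to one row per customer
        totals[cust_id] += amount
        counts[cust_id] += 1
    merged = {}
    for cust_id, info in customers.items():  # merge the two sources
        merged[cust_id] = {
            **info,
            "n_transactions": counts.get(cust_id, 0),  # 0 when no transactions exist
            "total_spent": totals.get(cust_id, 0.0),
        }
    # Missing data: transactions that reference a customer absent from the customer source.
    orphans = sorted(set(totals) - set(customers))
    return merged, orphans


merged, orphans = aggregate_and_merge(customers, transactions)
print(merged)
print(orphans)
```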

Module 4 – From Modeling to Evaluation

Question 1: Select the correct statement.

  • A training set is used for data visualization.
  • A training set is used for predictive modeling.
  • A training set is used for statistical analysis.
  • A training set is used for descriptive modeling.
  • None of the above statements are correct.
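In predictive modeling the available data is usually divided into a training set, used to fit the model, and a held-out test set, used later to check it. Below is a minimal sketch of such a split. The helper name `split_train_test`, the 25% test fraction and the fixed seed are assumptions for illustration, and this is not the API of any particular library.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and return (train, test); hypothetical helper."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(20))
print(len(train), len(test))
```

Both sets keep the same variables and structure; they differ only in which records they contain.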

Question 2: A statistician calls a false negative a type I error and a false positive a type II error.

  • True
  • False
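Under the usual statistical convention, a false positive is a type I error and a false negative is a type II error. A short sketch that tallies the four cells of a confusion matrix for a binary classifier makes the mapping explicit. The labels below are made-up example data.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN and TN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1  # false positive = type I error
        elif not p and a:
            fn += 1  # false negative = type II error
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


actual = [1, 0, 1, 1, 0, 0]      # hypothetical true labels
predicted = [1, 1, 0, 1, 0, 0]   # hypothetical model output
print(confusion_counts(actual, predicted))
```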

Question 3: Select the correct statement about model evaluation.

  • Model evaluation can include statistical significance testing.
  • Model evaluation includes ensuring that the data are properly handled and interpreted.
  • Model evaluation includes ensuring the model is designed as intended.
  • Model evaluation includes ensuring that the model is working as intended.
  • All of the above statements are correct.

Module 5 – From Deployment to Feedback

Question 1: The final stages of the data science methodology are an iterative cycle between modeling, evaluation, deployment, and feedback.

  • True
  • False

Question 2: What is model evaluation used for?

  • Assessing the model after it is deployed.
  • Assessing the model before it is deployed.
  • Determining if the model is good for other uses.
  • All of the above.
  • None of the above.

Question 3: Select the correct statement about the feedback stage of the data science methodology.

  • Feedback is essential to the long-term viability of the model.
  • Feedback is not helpful and gets in the way.
  • Feedback is not required once launched.
  • None of the above statements are correct.

Data Science Methodology Final Exam Answers

Question 1: Select the correct sentence about the data science methodology explained in the course.

  • Data science methodology is not an iterative process – one does not go back and forth between methodological steps.
  • Data science methodology is a specific strategy that guides processes and activities relating to data science only for text analytics.
  • Data science methodology always starts with data collection.
  • Data science methodology provides the data scientist with a framework for how to proceed to obtain answers.
  • Data science methodology depends on a specific set of technologies or tools.

Question 2: Why is the business understanding stage important in the data science methodology?

  • Because it shapes the rest of the methodological steps.
  • Because it clearly defines the problem and the needs from a business perspective.
  • Because it ensures that the work generates the intended solution.
  • Because it involves domain expertise.
  • All of the above.

Question 3: A data scientist determines that building a recommender system is the solution for a particular business problem at hand. What stage of the data science methodology does this represent?

  • Modeling
  • Deployment
  • Model evaluation
  • Analytic approach
  • Data understanding

Question 4: Which of the following represent the two important characteristics of the data science methodology?

  • It is a highly iterative process and immediately ends when the model is deployed.
  • It is not an iterative process and it never ends.
  • It has no endpoint because data collection occurs before identifying the data requirements.
  • It immediately ends when the model is deployed because no feedback is required.
  • It is a highly iterative process and it never ends.

Question 5: What do data scientists typically use for exploratory analysis of data and to get acquainted with them?

  • They use support vector machines and neural networks as feature extraction techniques.
  • They begin with regression, classification, or clustering.
  • They use deep learning.
  • They use descriptive statistics and data visualization techniques.
  • All of the above.
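The descriptive statistics used to get acquainted with a variable can be computed with Python's standard `statistics` module, as in the sketch below. The helper name `describe` and the sample values are illustrative assumptions. Visualization, such as histograms or box plots, would normally be done with a separate plotting library.

```python
import statistics


def describe(values):
    """Summarize one numeric variable with common descriptive statistics."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```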

Question 6: Select the correct statement about data preparation.

  • Data preparation cannot be accelerated through automation.
  • Data preparation involves dealing with missing or improperly coded data and can include using text analysis to structure unstructured or semi-structured text data.
  • Data preparation is typically the least time-consuming methodological step.
  • All of the above.
  • None of the above.

Question 7: Which statement best describes the modeling stage of the data science methodology?

  • Modeling is followed by the analytic approach stage.
  • Modeling may require testing multiple algorithms and parameters.
  • Modeling is always based on predictive models.
  • Modeling always uses training and test sets.
  • All of the above.

Question 8: Which of the following statements best describe the model evaluation stage of the data science methodology?

  • Model evaluation may entail statistical significance tests, particularly when additional proof is necessary to justify some of the emerging recommendations.
  • Model evaluation is important because it examines how well the model performs in the context of the business problem.
  • Model evaluation entails computing graphs and/or various diagnostic measures such as a confusion matrix.
  • Model evaluation is done using a test set if the model is a predictive one.
  • All of the above.

Question 9: What does deploying a model into production represent?

  • It represents the end of the iterative process that includes feedback, model refinement, and redeployment.
  • It represents the beginning of an iterative process that includes feedback, model refinement and redeployment and requires the input of additional groups, such as marketing personnel and business owners.
  • It represents the final data science product.
  • None of the above.

Question 10: A data scientist, John, was asked to help reduce readmission rates at a local hospital. After some time, John provided a model that predicted which patients were more likely to be readmitted to the hospital and declared that his work was done. Which of the following best describes this scenario?

  • John only provided one model as a solution and he should have provided multiple models.
  • The scenario is already optimal.
  • Even though John only submitted one solution, it might be a good one. However, John needed feedback on his model from the hospital to confirm that his model was able to address the problem appropriately and sufficiently.
  • John’s mistake is that he lied in the analytic approach step of the data science methodology.
  • John still needed to collect more data.

Question 11: A car company asked a data scientist to determine what type of customers are more likely to purchase their vehicles. However, the data comes from several sources and is in a relatively “raw format”. What kind of processing can the data scientist perform on the data to prepare it for modeling?

  • Feature engineering.
  • Transforming the data into more useful variables.
  • Combining the data from the various sources.
  • Addressing missing/invalid values.
  • All of the above.
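For the car-company scenario, feature engineering means transforming raw fields into variables a model can use directly. The sketch below derives an age, a scaled income, a repeat-buyer flag and one-hot columns for the current vehicle type. Every field name, the vehicle categories and the fixed reference date are hypothetical.

```python
from datetime import date

VEHICLE_TYPES = ("sedan", "suv", "truck")  # assumed set of categories


def engineer_features(customer, as_of=date(2024, 1, 1)):
    """Turn one raw customer record into model-ready variables (hypothetical fields)."""
    birth = date.fromisoformat(customer["birth_date"])
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    features = {
        "age": as_of.year - birth.year - (0 if had_birthday else 1),
        "income_k": round(float(customer["income"]) / 1000, 1),  # income in thousands
        "repeat_buyer": int(customer.get("prior_purchases", 0) > 0),
    }
    for vt in VEHICLE_TYPES:  # one-hot encode the currently owned vehicle type
        features[f"owns_{vt}"] = int(customer.get("current_vehicle") == vt)
    return features


raw_customer = {"birth_date": "1985-06-15", "income": "72500",
                "prior_purchases": 2, "current_vehicle": "suv"}
print(engineer_features(raw_customer))
```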

Question 12: High-performance, massively parallel systems can be used to facilitate the following methodological steps.

  • Data preparation and Modeling.
  • Modeling only.
  • Deployment.
  • Business understanding.
  • All of the above.

Question 13: Data scientists may use either a “top-down” approach or a “bottom-up” approach to data science. These two approaches refer to:

  • “Top-down” approach – the data, when sorted, is modeled from the “top” of the data towards the “bottom”. “Bottom-up” approach – the data is modeled from the “bottom” of the data to the “top”.
  • “Top-down” approach – models are fit before the data is explored. “Bottom-up” approach – data is explored, and then a model is fit.
  • “Top-down” approach – first defining a business problem then analyzing the data to find a solution. “Bottom-up” approach – starting with the data, and then coming up with a business problem based on the data.
  • “Top-down” approach – using massively parallel warehouses with huge data volumes as the data source. “Bottom-up” approach – using a sample of small data before using large data.
  • All of the above.

Question 14: The following are all examples of rapidly evolving technologies that affect data science methodology EXCEPT for?

  • Data sampling.
  • Automation.
  • Text analysis.
  • Platform growth.
  • In-database analytics.

Question 15: Data understanding involves all of the following EXCEPT for?

  • Discovering initial insights about the data.
  • Visualizing the data.
  • Assessing data quality.
  • Understanding the content of the data.
  • Gathering and analyzing feedback for assessment of the model’s performance.

Question 16: For predictive models, a test set, which is similar to – but independent of – the training set, is used to determine how well the model predicts outcomes. This is an example of what step in the methodology?

  • Data preparation.
  • Deployment.
  • Analytic approach.
  • Model evaluation.
  • Data requirements.

Question 17: “When ______ data is available (such as customer call center logs or physicians’ notes in unstructured or semi-structured format), _______ analytics can be useful in deriving new structured variables to enrich the set of predictors and improve model accuracy.” Which of the following most appropriately fills in the blanks?

  • text; text
  • market; statistical
  • big; digital
  • highly structured; text
  • text; predictive

Question 18: Typically in a predictive model, the training set and the test set are very different and independent, such as having a different set of variables or structure.

  • True
  • False

Question 19: Data scientists may frequently return to a previous stage to make adjustments, as they learn more about the data and the modeling.

  • True
  • False

Question 20: Why should data scientists maintain continuous communication with business sponsors throughout a project?

  • So that business sponsors can provide domain expertise.
  • So that business sponsors can ensure the work remains on track to generate the intended solution.
  • So that business sponsors can review intermediate findings.
  • All of the above.
  • None of the above.
