Friday, November 22, 2024
Data Science Methodology Cognitive Class Exam Quiz Answers

Data Science Methodology Cognitive Class Certification Answers

Question 1: Select the correct statement.

  • A methodology is an application for a computer program.
  • A methodology is a set of instructions.
  • A methodology is a system of methods used in a particular area of study or activity.
  • All of the above statements are correct.

Question 2: Select the correct statement.

  • The data science methodology described in this course is only used by certified data scientists.
  • The data science methodology described in this course is outlined by John Rollins from IBM.
  • The data science methodology described in this course is limited to IBM.
  • None of the above statements are correct.

Question 3: Select the correct statement.

  • The first stage of the data science methodology is data understanding.
  • The first stage of the data science methodology is modeling.
  • The first stage of the data science methodology is business understanding.
  • The first stage of the data science methodology is data collection.

Question 1: Select the correct statement.

  • If a problem is a dish, then data is an answer.
  • If a problem is a dish, then data is an ingredient.
  • If a problem is a dish, then data is a list of information.
  • None of the above statements are correct.

Question 2: Select the correct statement.

  • A data requirement is never refined.
  • A data requirement is set in stone.
  • A data requirement is the initial set of ingredients.
  • None of the above statements are correct.

Question 3: Select the correct statement.

  • Data scientists determine how to prepare the data.
  • Data scientists identify the data that is required for data modeling.
  • Data scientists determine how to collect the data.
  • All of the above.

Question 1: Select the correct statement about data preparation.

  • Data preparation involves properly formatting the data.
  • Data preparation involves correcting invalid values and addressing outliers.
  • Data preparation involves removing duplicate data.
  • Data preparation involves addressing missing values.
  • All of the above statements are correct.

Question 2: Select the correct statement about data understanding.

  • Data understanding encompasses removing redundant data.
  • Data understanding encompasses all activities related to constructing the dataset.
  • Data understanding encompasses sorting the data.
  • All of the above statements about data understanding are correct.

Question 3: Select the correct statement about what data scientists and database administrators (DBAs) do during data preparation.

  • During data preparation, data scientists and DBAs identify missing data.
  • During data preparation, data scientists and DBAs determine the timing of events.
  • During data preparation, data scientists and DBAs aggregate the data and merge them from different sources.
  • During data preparation, data scientists and DBAs define the variables to be used in the model.
  • All of the above statements are correct.

Question 1: Select the correct statement.

  • A training set is used for data visualization.
  • A training set is used for predictive modeling.
  • A training set is used for statistical analysis.
  • A training set is used for descriptive modeling.
  • None of the above statements are correct.

Question 2: A statistician calls a false negative a type I error and a false positive a type II error.

  • True
  • False
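
As a quick illustration of the terminology in this question: in standard statistical usage, a type I error is a false positive and a type II error is a false negative. The sketch below counts both from a handful of labels. The function name `count_errors` and the sample labels are assumptions made up for illustration, not part of the course material.

```python
def count_errors(actual, predicted):
    """Count type I (false positive) and type II (false negative) errors.

    `actual` and `predicted` are equal-length sequences of booleans,
    where True means the positive class.
    """
    # Type I: the model says positive, the truth is negative.
    type_1 = sum(1 for a, p in zip(actual, predicted) if not a and p)
    # Type II: the model says negative, the truth is positive.
    type_2 = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return type_1, type_2


# Hypothetical labels, made up for this example.
actual = [True, True, False, False, True, False]
predicted = [True, False, True, False, False, False]

type_1, type_2 = count_errors(actual, predicted)
print(f"type I (false positives): {type_1}")
print(f"type II (false negatives): {type_2}")
```

With these sample labels the model raises one false alarm and misses two real positives.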

Question 3: Select the correct statement about model evaluation.

  • Model evaluation can include statistical significance testing.
  • Model evaluation includes ensuring that the data are properly handled and interpreted.
  • Model evaluation includes ensuring the model is designed as intended.
  • Model evaluation includes ensuring that the model is working as intended.
  • All of the above statements are correct.

Question 1: The final stages of the data science methodology are an iterative cycle between modeling, evaluation, deployment, and feedback.

  • True
  • False

Question 2: What is model evaluation used for?

  • Assessing the model after getting deployed.
  • Assessing the model before getting deployed.
  • Determining if the model is good for other uses.
  • All of the above.
  • None of the above.

Question 3: Select the correct statement about the feedback stage of the data science methodology.

  • Feedback is essential to the long term viability of the model.
  • Feedback is not helpful and gets in the way.
  • Feedback is not required once launched.
  • None of the above statements are correct.

Question 1: Select the correct sentence about the data science methodology explained in the course.

  • Data science methodology is not an iterative process – one does not go back and forth between methodological steps.
  • Data science methodology is a specific strategy that guides processes and activities relating to data science only for text analytics.
  • Data science methodology always starts with data collection.
  • Data science methodology provides the data scientist with a framework for how to proceed to obtain answers.
  • Data science methodology depends on a specific set of technologies or tools.

Question 2: Business understanding is an important stage of the data science methodology. Why?

  • Because it shapes the rest of the methodological steps.
  • Because it clearly defines the problem and the needs from a business perspective.
  • Because it ensures that the work generates the intended solution.
  • Because it involves domain expertise.
  • All of the above.

Question 3: A data scientist determines that building a recommender system is the solution for a particular business problem at hand. What stage of the data science methodology does this represent?

  • Modeling
  • Deployment
  • Model evaluation
  • Analytic approach
  • Data understanding

Question 4: Which of the following represent the two important characteristics of the data science methodology?

  • It is a highly iterative process and immediately ends when the model is deployed.
  • It is not an iterative process and it never ends.
  • It has no endpoint because data collection occurs before identifying the data requirements.
  • It immediately ends when the model is deployed because no feedback is required.
  • It is a highly iterative process and it never ends.

Question 5: What do data scientists typically use for exploratory analysis of data and to get acquainted with them?

  • They use support vector machines and neural networks as feature extraction techniques.
  • They begin with regression, classification, or clustering.
  • They use deep learning.
  • They use descriptive statistics and data visualization techniques.
  • All of the above.
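
Descriptive statistics and simple plots are a common first look at a dataset. The following is a minimal sketch using only Python's standard library; the `ages` sample is made up, and the text histogram stands in for a real plotting library.

```python
import statistics
from collections import Counter

# Hypothetical sample: ages of eight customers, made up for this example.
ages = [23, 35, 31, 47, 35, 52, 29, 35]

# Basic descriptive statistics.
summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "stdev": round(statistics.stdev(ages), 2),
    "min": min(ages),
    "max": max(ages),
}
for name, value in summary.items():
    print(f"{name}: {value}")

# A crude text histogram: one '#' per customer in each ten-year bin.
bins = Counter((age // 10) * 10 for age in ages)
for start in sorted(bins):
    print(f"{start}-{start + 9}: {'#' * bins[start]}")
```

Even on a tiny sample like this, the summary and the histogram show where the values cluster and how widely they spread, which is the point of this stage.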

Question 6: Select the correct statement about data preparation.

  • Data preparation cannot be accelerated through automation.
  • Data preparation involves dealing with missing or improperly coded data and can include using text analysis to structure unstructured or semi-structured text data.
  • Data preparation is typically the least time-consuming methodological step.
  • All of the above.
  • None of the above.

Question 7: Which statement best describes the modeling stage of the data science methodology?

  • Modeling is followed by the analytic approach stage.
  • Modeling may require testing multiple algorithms and parameters.
  • Modeling is always based on predictive models.
  • Modeling always uses training and test sets.
  • All of the above.

Question 8: Which of the following statements best describe the model evaluation stage of the data science methodology?

  • Model evaluation may entail statistical significance tests, particularly when additional proof is necessary to justify some of the emerging recommendations.
  • Model evaluation is important because it examines how well the model performs in the context of the business problem.
  • Model evaluation entails computing graphs and/or various diagnostic measures such as a confusion matrix.
  • Model evaluation is done using a test set if the model is a predictive one.
  • All of the above.

Question 9: What does deploying a model into production represent?

  • It represents the end of the iterative process that includes feedback, model refinement, and redeployment.
  • It represents the beginning of an iterative process that includes feedback, model refinement and redeployment and requires the input of additional groups, such as marketing personnel and business owners.
  • It represents the final data science product.
  • None of the above.

Question 10: A data scientist, John, was asked to help reduce readmission rates at a local hospital. After some time, John provided a model that predicted which patients were more likely to be readmitted to the hospital and declared that his work was done. Which of the following best describes this scenario?

  • John only provided one model as a solution and he should have provided multiple models.
  • The scenario is already optimal.
  • Even though John only submitted one solution, it might be a good one. However, John needed feedback on his model from the hospital to confirm that his model was able to address the problem appropriately and sufficiently.
  • John’s mistake is that he lied in the analytic approach step of the data science methodology.
  • John still needed to collect more data.

Question 11: A car company asked a data scientist to determine what type of customers are more likely to purchase their vehicles. However, the data comes from several sources and is in a relatively “raw format”. What kind of processing can the data scientist perform on the data to prepare it for modeling?

  • Feature engineering.
  • Transforming the data into more useful variables.
  • Combining the data from the various sources.
  • Addressing missing/invalid values.
  • All of the above.
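
The preparation tasks listed in this question can be sketched in a few lines of plain Python. Everything below is hypothetical: the two sources (`crm_rows`, `web_visits`), the field names, and the choice of mean imputation are assumptions for illustration, not a prescribed method.

```python
# Two hypothetical sources keyed by customer id; all values are made up.
crm_rows = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},  # missing age
    {"id": 2, "age": None, "income": 61000},  # duplicate record
    {"id": 3, "age": 45, "income": -1},       # invalid income
]
web_visits = {1: 12, 2: 3, 3: 7}  # id -> number of site visits


def prepare(rows, visits):
    # Remove duplicate records (same id), keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))

    # Treat a negative income as invalid and mark it as missing.
    for row in unique:
        if row["income"] is not None and row["income"] < 0:
            row["income"] = None

    # Fill missing values with the mean of the known values (one simple choice).
    for field in ("age", "income"):
        known = [r[field] for r in unique if r[field] is not None]
        fill = sum(known) / len(known)
        for r in unique:
            if r[field] is None:
                r[field] = fill

    # Combine the two sources and derive a new feature.
    for r in unique:
        r["visits"] = visits.get(r["id"], 0)
        r["income_per_visit"] = r["income"] / max(r["visits"], 1)
    return unique


prepared = prepare(crm_rows, web_visits)
for row in prepared:
    print(row)
```

Mean imputation is only one option; depending on the data, dropping the record or using a model-based estimate may be more appropriate.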

Question 12: High-performance, massively parallel systems can be used to facilitate which of the following methodological steps?

  • Data preparation and Modeling.
  • Modeling only.
  • Deployment.
  • Business understanding.
  • All of the above.

Question 13: Data scientists may use either a “top-down” approach or a “bottom-up” approach to data science. These two approaches refer to:

  • “Top-down” approach – the data, when sorted, is modeled from the “top” of the data towards the “bottom”. “Bottom-up” approach – the data is modeled from the “bottom” of the data to the “top”.
  • “Top-down” approach – models are fit before the data is explored. “Bottom-up” approach – data is explored, and then a model is fit.
  • “Top-down” approach – first defining a business problem then analyzing the data to find a solution. “Bottom-up” approach – starting with the data, and then coming up with a business problem based on the data.
  • “Top-down” approach – using massively parallel warehouses with huge data volumes as the data source. “Bottom-up” approach – using a sample of small data before using large data.
  • All of the above.

Question 14: The following are all examples of rapidly evolving technologies that affect data science methodology EXCEPT for?

  • Data sampling.
  • Automation.
  • Text analysis.
  • Platform growth.
  • In-database analytics.

Question 15: Data understanding involves all of the following EXCEPT for?

  • Discovering initial insights about the data.
  • Visualizing the data.
  • Assessing data quality.
  • Understanding the content of the data.
  • Gathering and analyzing feedback for assessment of the model’s performance.

Question 16: For predictive models, a test set, which is similar to – but independent of – the training set, is used to determine how well the model predicts outcomes. This is an example of what step in the methodology?

  • Data preparation.
  • Deployment.
  • Analytic approach.
  • Model evaluation.
  • Data requirements.
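
To make the idea of a held-out test set concrete, here is a minimal sketch under simplifying assumptions: the data is a made-up list of `(x, label)` pairs, and the "model" is a single cut-off on `x`. The helper names `split`, `fit_threshold`, and `accuracy` are hypothetical.

```python
import random


def split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(records, threshold):
    """Share of records where the rule 'x >= threshold' matches the label."""
    hits = sum(1 for x, label in records if (x >= threshold) == label)
    return hits / len(records)


def fit_threshold(train):
    """Pick the cut-off on x with the best accuracy on the training set."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))


# Hypothetical data: (x, label) pairs where the label is True for x >= 50.
data = [(x, x >= 50) for x in range(0, 100, 5)]

train, test = split(data)
threshold = fit_threshold(train)
print(f"threshold learned from training set: {threshold}")
print(f"accuracy on held-out test set: {accuracy(test, threshold):.2f}")
```

Both subsets come from the same records and share the same structure; the only difference is that the test records are never seen while the threshold is chosen.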

Question 17: “When ______ data is available (such as customer call center logs or physicians’ notes in unstructured or semi-structured format), _______ analytics can be useful in deriving new structured variables to enrich the set of predictors and improve model accuracy.” Which of the following most appropriately fills in the blanks?

  • text; text
  • market; statistical
  • big; digital
  • highly structured; text
  • text; predictive

Question 18: Typically in a predictive model, the training set and the test set are very different and independent, such as having a different set of variables or structure.

  • True
  • False

Question 19: Data scientists may frequently return to a previous stage to make adjustments, as they learn more about the data and the modeling.

  • True
  • False

Question 20: Why should data scientists maintain continuous communication with business sponsors throughout a project?

  • So that business sponsors can provide domain expertise.
  • So that business sponsors can ensure the work remains on track to generate the intended solution.
  • So that business sponsors can review intermediate findings.
  • All of the above.
  • None of the above.

Introduction to Data Science Methodology

Data science methodology is a structured approach used by data scientists to tackle various problems and extract insights from data. While specific methodologies may vary depending on the organization or project, they generally follow a similar framework. Here’s a typical data science methodology:

  1. Problem Definition: Clearly define the problem you want to solve or the question you want to answer. This involves understanding the business context, identifying stakeholders, and establishing objectives.
  2. Data Collection: Gather relevant data from various sources, such as databases, APIs, files, or web scraping. Ensure the data is clean, complete, and relevant to the problem at hand.
  3. Data Preparation: Clean and preprocess the data to make it suitable for analysis. This may involve handling missing values, removing outliers, transforming variables, and normalizing data.
  4. Exploratory Data Analysis (EDA): Explore the data to gain insights and understand its characteristics. This typically involves summarizing the main characteristics of the data, visualizing distributions, detecting patterns, and identifying relationships between variables.
  5. Feature Engineering: Create new features or transform existing ones to improve the performance of machine learning models. This may involve techniques such as one-hot encoding, feature scaling, or dimensionality reduction.
  6. Model Selection: Choose the most appropriate machine learning algorithms or statistical models for the problem at hand. Consider factors such as the type of problem (e.g., classification, regression), the size and complexity of the data, and the interpretability of the model.
  7. Model Training: Train the selected model(s) on the training data, using techniques such as cross-validation to evaluate performance and tune hyperparameters.
  8. Model Evaluation: Evaluate the performance of the trained model(s) on unseen data using appropriate metrics. This may involve comparing multiple models and selecting the best-performing one.
  9. Deployment: Deploy the model into production, making it available for use in real-world applications. This may involve integrating the model into existing systems, creating APIs, or building user interfaces.
  10. Monitoring and Maintenance: Continuously monitor the performance of the deployed model and update it as necessary. This may involve monitoring for concept drift, retraining the model with new data, or improving the model based on user feedback.
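
The cross-validation mentioned in step 7 can be sketched as follows. This is a minimal illustration, not a production implementation: the helper names `k_fold_indices` and `mean_predictor` are hypothetical, the target values are made up, and the "model" simply predicts the training mean.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k roughly equal, contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def mean_predictor(train_targets):
    """A deliberately trivial 'model': always predict the training mean."""
    return sum(train_targets) / len(train_targets)


# Hypothetical target values, made up for this example.
targets = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 5.0, 4.0, 6.0, 2.0]

fold_errors = []
for train_idx, val_idx in k_fold_indices(len(targets), k=5):
    prediction = mean_predictor([targets[i] for i in train_idx])
    # Mean absolute error on the fold that was held out.
    mae = sum(abs(targets[i] - prediction) for i in val_idx) / len(val_idx)
    fold_errors.append(mae)

print(f"MAE per fold: {[round(e, 2) for e in fold_errors]}")
print(f"mean MAE across folds: {sum(fold_errors) / len(fold_errors):.2f}")
```

Each record is used for validation exactly once and for training in the other folds. Because the folds here are contiguous, real data would normally be shuffled first so that any ordering in the source does not leak into the folds.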

Throughout the entire process, it’s important to iterate and refine the steps as needed based on insights gained and results obtained. Communication with stakeholders is also crucial to ensure that the results are actionable and aligned with business objectives.
