# Cognitive Class: Machine Learning with Python Exam Answers 2020 | Machine Learning with Python Course Certificate Exam Answers

#### Module 1: Machine Learning

1) Machine Learning uses algorithms that can learn from data without relying on explicitly programmed methods.

• True
• False

2) Which are the two types of Supervised learning techniques?

• Classification and Clustering
• Classification and K-Means
• Regression and Clustering
• Regression and Partitioning
• Classification and Regression

3) Which of the following statements best describes the Python scikit library?

• A library for scientific and high-performance computation.
• A collection of algorithms and tools for machine learning.
• A popular plotting package that provides 2D plotting as well as 3D plotting.
• A library that provides high-performance, easy to use data structures.
• A collection of numerical algorithms and domain-specific toolboxes.

#### Module 2: Regression

1) Training and testing on the same dataset might give a high training accuracy, but the out-of-sample accuracy can be low.

• True
• False

2) Which of the following matrices can be used to show the results of model accuracy evaluation or the model’s ability to correctly predict or separate the classes?

• Confusion matrix
• Evaluation matrix
• Accuracy matrix
• Error matrix
• Identity matrix
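
As an illustration of how a table of actual versus predicted classes can be tallied, here is a minimal sketch in plain Python. The function name `confusion_matrix`, the labels `churn` and `stay`, and the sample data are all hypothetical and chosen only for this example.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a table with one row per actual class and one column per predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical labels for six customers.
actual = ["churn", "churn", "stay", "stay", "stay", "churn"]
predicted = ["churn", "stay", "stay", "stay", "churn", "churn"]
labels = ["churn", "stay"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal cells count correct predictions for each class, and the off-diagonal cells count the samples that were assigned to the wrong class.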

3) When should we use Multiple Linear Regression?

• When we would like to identify the strength of the effect that the independent variables have on a dependent variable.
• When there are multiple dependent variables.

#### Module 3: Classification

1) In K-Nearest Neighbors, which of the following is true:

• A very high value of K (e.g. K = 100) produces an overly generalized model, while a very low value of K (e.g. K = 1) produces a highly complex model.
• A very high value of K (e.g. K = 100) produces a model that is better than one with a very low value of K (e.g. K = 1).
• A very high value of K (e.g. K = 100) produces a highly complex model, while a very low value of K (e.g. K = 1) produces an overly generalized model.
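
To see how the choice of K changes the behaviour of a nearest-neighbor classifier, the following sketch runs a tiny one-dimensional KNN at the two extremes. The helper `knn_predict` and the training data are hypothetical, and distance is simply the absolute difference between numbers.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set of (feature, label) pairs.
train = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (10.0, "b"), (11.0, "b"), (12.0, "b")]
xs = [x for x, _ in train]
labels = [y for _, y in train]

# K = 1: every training point is its own nearest neighbor.
preds_k1 = [knn_predict(train, x, k=1) for x in xs]
# K = number of training points: every prediction is the overall majority class.
preds_kn = [knn_predict(train, x, k=len(train)) for x in xs]

print(preds_k1)
print(preds_kn)
```

With K = 1 the model reproduces every training label, including the irregular point at 3.0, while with K equal to the training set size it ignores the input and always returns the majority class.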

2) A classifier with lower log loss has better accuracy.

• True
• False
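
For reference, binary log loss can be computed as the mean negative log-likelihood of the true labels under the predicted probabilities. The sketch below assumes labels encoded as 1 or 0 and probabilities strictly between 0 and 1; the function name and both sets of predictions are hypothetical.

```python
import math

def log_loss(actual, predicted_prob):
    """Mean negative log-likelihood for binary labels (1 or 0)."""
    total = 0.0
    for y, p in zip(actual, predicted_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(actual)

actual = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.7]  # probabilities close to the true labels
hesitant = [0.6, 0.4, 0.6, 0.6]   # probabilities close to 0.5

print(log_loss(actual, confident))
print(log_loss(actual, hesitant))
```

Both sets of predictions classify every sample correctly at a 0.5 threshold, yet the confident predictions have the lower log loss, because log loss scores the predicted probabilities rather than only the predicted class.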

3) When building a decision tree, we want to split the nodes in a way that decreases entropy and increases information gain.

• True
• False
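
The two quantities in this question can be computed directly. The sketch below uses the standard definitions: entropy as the sum of -p log2(p) over the class proportions, and information gain as the parent entropy minus the size-weighted entropy of the child nodes. The function names and the two candidate splits are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a node."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
pure_split = [["yes"] * 4, ["no"] * 4]
mixed_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, pure_split))
print(information_gain(parent, mixed_split))
```

The split that separates the classes completely removes all entropy from the children and therefore has the largest possible gain, while the split that leaves both children as mixed as the parent gains nothing.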

#### Module 4: Clustering

1) Which one is NOT TRUE about k-means clustering?

• k-means divides the data into non-overlapping clusters without any cluster-internal structure.
• The objective of k-means is to form clusters in such a way that similar samples go into a cluster, and dissimilar samples fall into different clusters.
• As k-means is an iterative algorithm, it guarantees that it will always converge to the global optimum.
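
To explore how the starting centroids affect where k-means ends up, the following sketch runs plain Lloyd iterations on one-dimensional data from two different initializations. The function `kmeans_1d`, the data, and both sets of starting centroids are hypothetical and picked to keep the arithmetic easy to follow.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Run Lloyd iterations on 1-D data from the given starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    # Sum of squared distances from each point to its nearest centroid.
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
good_centroids, good_sse = kmeans_1d(points, [0, 10, 20])
poor_centroids, poor_sse = kmeans_1d(points, [0, 1, 2])

print(good_centroids, good_sse)
print(poor_centroids, poor_sse)
```

Both runs converge, but to different solutions: starting from one centroid per group recovers the three obvious clusters, while starting with all three centroids inside the first group gets stuck with a much larger squared error. This is one reason k-means is commonly run several times with different random initializations.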

2) Customer Segmentation is a supervised way of clustering data, based on the similarity of customers to each other.

• True
• False

3) How is a center point (centroid) picked for each cluster in k-means?

• We can randomly choose some observations out of the data set and use these observations as the initial means.
• We can select the centroid through correlation analysis.

#### Module 5: Recommender System

1) Collaborative filtering is based on relationships between products and people’s rating patterns.

• True
• False

2) Which one is TRUE about Content-based recommendation systems?

• Content-based recommendation system tries to recommend items to the users based on their profile.
• In content-based approach, the recommendation process is based on similarity of users.
• In content-based recommender systems, similarity of users should be measured based on the similarity of the actions of users.

3) Which one is correct about user-based and item-based collaborative filtering?

• In the item-based approach, the recommendation is based on the profile of a user, which shows the user’s interest in specific items.
• In the user-based approach, the recommendation is based on users of the same neighborhood, with whom the user shares common preferences.

## Final Exam

Question 1) You can define Jaccard as the size of the intersection divided by the size of the union of two label sets.

• True
• False
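
One way to make this definition concrete for classifier evaluation is sketched below. It assumes that each label set is the set of (sample index, label) pairs, so the intersection counts the samples on which the actual and predicted labels agree. The function name and the ten sample labels are hypothetical.

```python
def jaccard_index(actual, predicted):
    """Size of the intersection divided by the size of the union of two label sets."""
    # Assumption: each element of a label set is a (sample index, label) pair.
    a = set(enumerate(actual))
    b = set(enumerate(predicted))
    return len(a & b) / len(a | b)

actual = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

score = jaccard_index(actual, predicted)
print(round(score, 2))
```

Here eight of the ten labels agree, so the intersection has 8 elements and the union has 10 + 10 - 8 = 12, giving a score of 8/12. Identical label lists give 1.0 and lists that disagree everywhere give 0.0.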

Question 2) When building a decision tree, we want to split the nodes in a way that increases entropy and decreases information gain.

• True
• False

Question 3) Which of the following statements are true? (Select all that apply.)

• K needs to be initialized in K-Nearest Neighbor.
• Supervised learning works on labelled data.
• A high value of K in KNN creates a model that is over-fit
• KNN takes a bunch of unlabelled points and uses them to predict unknown points.
• Unsupervised learning works on unlabelled data.

Question 4) To calculate a model’s accuracy using the test set, you pass the test set to your model to predict the class labels, and then compare the predicted values with actual values.

• True
• False

Question 5) Which is the definition of entropy?

• The purity of each node in a decision tree.
• Information collected that can increase the level of certainty in a particular prediction.
• The information that is used to randomly select a subset of data.
• The amount of information disorder in the data.

• Average linkage is the average distance of each point in one cluster to every point in another cluster
• Complete linkage is the shortest distance between a point in two clusters
• Centroid linkage is the distance between two randomly generated centroids in two clusters
• Single linkage is the distance between any points in two clusters
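
For comparison with the statements above, the sketch below computes the four linkage criteria using their standard definitions: single linkage as the shortest pairwise distance between the two clusters, complete linkage as the longest, average linkage as the mean of all pairwise distances, and centroid linkage as the distance between the two cluster means. The function names and the two small clusters are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance from every point in one cluster to every point in the other."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def single_linkage(a, b):
    return min(pairwise_distances(a, b))

def complete_linkage(a, b):
    return max(pairwise_distances(a, b))

def average_linkage(a, b):
    distances = pairwise_distances(a, b)
    return sum(distances) / len(distances)

def centroid_linkage(a, b):
    return math.dist(centroid(a), centroid(b))

cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

for linkage in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(linkage.__name__, linkage(cluster_a, cluster_b))
```

On any pair of clusters the average linkage falls between the single and complete linkage values, since it is the mean of the same set of pairwise distances.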

Question 7) The goal of regression is to build a model to accurately predict the continuous value of a dependent variable for an unknown case.

• True
• False

Question 8) Which of the following statements are true about linear regression? (Select all that apply)

• With linear regression, you can fit a line through the data.
• y = a + b·x1 is the equation for a straight line, which can be used to predict the continuous value y.
• In y = θ^T.X, θ is the feature set and X is the “weight vector” or “confidences of the equation”, with both of these terms used interchangeably.
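
As a small worked example of fitting a straight line through data, the sketch below estimates the intercept and slope by ordinary least squares using the closed-form formulas. The function name `fit_line` and the four data points are hypothetical; the variable names `a` and `b` follow the line equation quoted in the options.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)

# Predict the continuous value y for an unseen x.
prediction = a + b * 5
print(a, b, prediction)
```

In the vector form used by the course, the fitted parameters (here `a` and `b`) are collected into one vector and the input values, with a leading 1 for the intercept, into another, so that the prediction is their dot product.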

Question 9) The sigmoid function is the main part of logistic regression, where the sigmoid of θ^T.X gives us the probability of a point belonging to a class, instead of the value of y directly.

• True
• False
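
The sigmoid step described in this question can be sketched as follows. The function names, the parameter values in `theta`, and the input `x` are hypothetical, and the first element of `x` is assumed to be a constant 1 for the intercept term.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """Sigmoid of the dot product of the parameter vector and the feature vector."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [-1.0, 0.5]  # hypothetical parameters: intercept and one weight
x = [1.0, 2.0]       # leading 1.0 multiplies the intercept

probability = predict_proba(theta, x)
print(probability)
```

With these values the dot product is exactly 0, so the output is 0.5; large positive dot products push the output toward 1 and large negative ones push it toward 0.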

Question 10) In comparison to supervised learning, unsupervised learning has:

• Fewer tests (evaluation approaches)
• More models
• A better controlled environment
• More tests (evaluation approaches), but fewer models

Question 11) The points that are classified by Density-Based Clustering and do not belong to any cluster are outliers.

• True
• False

Question 12) Which of the following is false about Simple Linear Regression?

• It does not require tuning parameters
• It is highly interpretable
• It is fast
• It is used for finding outliers

Question 13) Which one of the following statements is the most accurate?

• Machine Learning is the branch of AI that covers the statistical and learning part of artificial intelligence.
• Deep Learning is a branch of Artificial Intelligence where computers learn by being explicitly programmed.
• Artificial Intelligence is a branch of Machine Learning that covers the statistical part of Deep Learning.
• Artificial Intelligence is the branch of Deep Learning that allows us to create models.

Question 14) Which of the following are types of supervised learning?

• Classification
• Regression
• KNN
• K-Means
• Clustering

Question 15) A Bottom-Up version of hierarchical clustering is known as Divisive clustering. It is a more popular method than the Agglomerative method.

• True
• False

Question 16) Select all the true statements related to Hierarchical clustering and K-Means.

• Hierarchical clustering does not require the number of clusters to be specified.
• Hierarchical clustering always generates different clusters, whereas k-Means returns the same clusters each time it is run.
• K-Means is more efficient than Hierarchical clustering for large datasets.

Question 17) What is a content-based recommendation system?

• Content-based recommendation system tries to recommend items to the users based on their profile built upon their preferences and taste.
• Content-based recommendation system tries to recommend items based on similarity among items.
• Content-based recommendation system tries to recommend items based on the similarity of users when buying, watching, or enjoying something.
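
To show what a profile built from a user's preferences can look like, here is a minimal sketch of one common content-based scheme: weight each rated item's feature vector by the user's rating, sum and normalize the result into a profile, then score unseen items against that profile. The genres, movie titles, ratings, and function names are all hypothetical.

```python
genres = ["comedy", "adventure", "sci-fi"]

# Hypothetical items the user has rated: (genre vector, rating out of 10).
rated = {
    "Movie A": ([0, 1, 1], 2),
    "Movie B": ([1, 0, 1], 10),
}

# Hypothetical items the user has not seen yet.
candidates = {
    "Movie C": [1, 0, 0],
    "Movie D": [0, 1, 1],
}

def build_profile(rated_items):
    """Rating-weighted sum of genre vectors, normalized to sum to 1."""
    weighted = [0.0] * len(genres)
    for vector, rating in rated_items.values():
        for i, value in enumerate(vector):
            weighted[i] += value * rating
    total = sum(weighted)
    return [w / total for w in weighted]

def score(profile, vector):
    """Dot product of the user profile and an item's genre vector."""
    return sum(p * v for p, v in zip(profile, vector))

profile = build_profile(rated)
scores = {title: score(profile, vector) for title, vector in candidates.items()}
best = max(scores, key=scores.get)
print(profile)
print(scores, best)
```

Only this one user's ratings and the items' own features are used here; no other users' behaviour enters the calculation.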

Question 18) Before running Agglomerative clustering, you need to compute a distance/proximity matrix, which is an n by n table of all distances between each data point in each cluster of your dataset.

• True
• False
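
A distance/proximity matrix of the kind this question mentions can be sketched in a few lines. The example assumes Euclidean distance and three hypothetical two-dimensional points; `math.dist` requires Python 3.8 or later.

```python
import math

def distance_matrix(points):
    """n by n table of Euclidean distances between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = distance_matrix(points)

for row in matrix:
    print(row)
```

The matrix is symmetric with zeros on the diagonal. At the start of agglomerative clustering each point is its own cluster, so the point-to-point distances are also the initial cluster-to-cluster distances.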

Question 19) Which of the following statements are true about DBSCAN? (Select all that apply)

• DBSCAN can be used when examining spatial data.
• DBSCAN can be applied to tasks with arbitrary shaped clusters, or clusters within clusters.
• DBSCAN is a hierarchical algorithm that finds core and border points.
• DBSCAN can find any arbitrary shaped cluster without getting affected by noise.

Question 20) In recommender systems, “cold start” happens when you have a large dataset of users who have rated only a limited number of items.

• True
• False