**UPSC Fever Free Certificate | Machine Learning and Deep Learning Quiz Answers**

Hello everyone! Here’s a great opportunity for all students to grab a free online certification from UPSC Fever. The answers to the machine learning and deep learning quizzes are available in this post. Check them out. All the best!

**About the Quiz :**

- The quiz is conducted online, and participants have only one attempt.
- The quiz has 20 MCQs in total, and an E-Certificate will be issued only to those who secure a minimum of 50% marks.
- Enter your details carefully, as the same will be reflected in the e-certificate.
- The registration is free.
- E-Certificates will be provided to registered participants who score 50% and above in the quiz.

**Here are the Deep Learning quiz answers:**

**Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False? ***

- True
- **False**

**Consider the two following random arrays “a” and “b”: What will be the shape of “c”? a = np.random.randn(2, 3) and a.shape = (2, 3) b = np.random.randn(2, 1) with b.shape = (2, 1) then c = a + b will be ***

- **c.shape = (2, 3)**
- c.shape = (3, 2)
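As a quick check of the broadcasting rule behind this answer, here is a minimal sketch that repeats the addition from the question. The array names follow the question; the random values are arbitrary.

```python
import numpy as np

a = np.random.randn(2, 3)
b = np.random.randn(2, 1)

# b has a single column, so NumPy stretches it across the 3 columns of a.
c = a + b
print(c.shape)  # (2, 3)

# Every column of c is the matching column of a plus the one column of b.
first_column_matches = np.allclose(c[:, 0], a[:, 0] + b[:, 0])
print(first_column_matches)  # True
```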

**You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer? ***

- ReLU
- Leaky ReLU
- **sigmoid**
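To see why sigmoid suits a binary output layer, the sketch below compares it with ReLU on a few made-up pre-activation values (the numbers in `z` are assumptions, not part of the quiz). Sigmoid squashes every value into (0, 1), so it can be read as an estimate of P(y = 1), while ReLU is unbounded above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical output-layer pre-activations for three images.
z = np.array([-3.0, 0.0, 2.5])

p = sigmoid(z)                   # each value lies strictly between 0 and 1
labels = (p > 0.5).astype(int)   # 1 = cucumber, 0 = watermelon
relu_out = np.maximum(z, 0.0)    # not a probability: can exceed 1

print(p)
print(labels)    # [0 0 1]
print(relu_out)  # [0.  0.  2.5]
```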

**Images for cat recognition is an example of “structured” data, because it is represented as a structured array in a computer. True/False? ***

- True
- **False**

**Among the following, which ones are “hyperparameters”? ***

- **size of the hidden layer**
- activation values
- weight matrices

**The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False? ***

- **True**
- False

**Suppose img is a (32,32,3) array, representing a 32×32 image with 3 color channels red, green and blue. How do you reshape this into a column vector? ***

- `x = img.reshape((3,32*32))`
- **`x = img.reshape((32*32*3,1))`**
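The reshape can be tried directly. This is a small sketch with a random stand-in for `img` (the pixel values are an assumption; only the shape matters).

```python
import numpy as np

# Stand-in for a 32x32 RGB image.
img = np.random.rand(32, 32, 3)

# Flatten all 32*32*3 = 3072 values into a single column.
x = img.reshape((32 * 32 * 3, 1))
print(x.shape)  # (3072, 1)
```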

**During forward propagation, in the forward function for a layer l you need to know what is the activation function in a layer (Sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what is the activation function for layer l, since the gradient depends on it. True/False? ***

- **True**
- False

**What does a neuron compute? ***

- A neuron computes a function g that scales the input x linearly (Wx + b)
- **A neuron computes a linear function (z = Wx + b) followed by an activation function**
- A neuron computes the mean of all features before applying the output to an activation function

**When an experienced deep learning engineer works on a new problem, they can usually use insight from previous problems to train a good model on the first try, without needing to iterate multiple times through different models. True/False? ***

- True
- **False**

**You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen? ***

- It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
- **This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.**
- This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.
- This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.
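The saturation effect is easy to see numerically. The sketch below uses the derivative of tanh, 1 − tanh²(z), on a few made-up pre-activations and on the same values scaled by 1000, mimicking the weight scaling in the question. The specific numbers are assumptions for illustration.

```python
import numpy as np

def tanh_grad(z):
    # Derivative of tanh(z) with respect to z.
    return 1.0 - np.tanh(z) ** 2

# Hypothetical pre-activations with modest weights...
z_small = np.array([-0.5, 0.1, 0.8])
# ...and the same inputs when the weights are 1000 times larger.
z_large = z_small * 1000

grad_small = tanh_grad(z_small)
grad_large = tanh_grad(z_large)

print(grad_small)  # healthy gradients, all above 0.5
print(grad_large)  # tanh has saturated, so the gradients vanish
```

With gradients this close to zero, each gradient descent step barely moves the weights, which is why learning slows down.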

**Which of the following statements is true? ***

- **The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.**
- The earlier layers of a neural network are typically computing more complex features of the input than the deeper layers.

**Consider the two following random arrays “a” and “b”: What will be the shape of “c”? a = np.random.randn(4, 3) and a.shape = (4, 3) b = np.random.randn(3, 2) and b.shape = (3, 2) then c = a*b will be ***

- c.shape = (4, 3)
- **The computation cannot happen because the sizes don’t match. It’s going to be “Error”!**
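Running the multiplication confirms the error. In this sketch the flag name `broadcast_failed` is just an illustrative helper; the last lines show, for contrast, that `np.dot` accepts the same two shapes.

```python
import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)

broadcast_failed = False
try:
    # Element-wise product: (4, 3) and (3, 2) cannot be broadcast together.
    c = a * b
except ValueError as err:
    broadcast_failed = True
    print("Error:", err)

# As a matrix product the shapes are fine: (4, 3) x (3, 2) -> (4, 2).
d = np.dot(a, b)
print(d.shape)  # (4, 2)
```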

**There are certain functions with the following properties: (i) To compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) To compute it using a deep network circuit, you need only an exponentially smaller network. True/False? ***

- **True**
- False

**Which of these is NOT a reason for Deep Learning recently taking off? ***

- We have access to a lot more data.
- **Neural Networks are a brand new field.**
- We have access to a lot more computational power.
- Deep learning has resulted in significant improvements in important applications such as online advertising, speech recognition, and image recognition.

**Consider the following code: What will be c? (If you’re not sure, feel free to run this in python to find out). a = np.random.randn(3, 3) and b = np.random.randn(3, 1) then c = a*b will be ***

- **This will invoke broadcasting, so b is copied three times to become (3,3), and * is an element-wise product so c.shape will be (3, 3)**
- This will invoke broadcasting, so b is copied three times to become (3, 3), and * invokes a matrix multiplication operation of two 3×3 matrices so c.shape will be (3, 3)
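As the question suggests, this one can simply be run. The sketch below also checks that the result really is element-wise: row i of `a` ends up scaled by `b[i, 0]`.

```python
import numpy as np

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)

# b is broadcast across the columns of a, then multiplied element by element.
c = a * b
print(c.shape)  # (3, 3)

# Row 1 of c is row 1 of a times the scalar b[1, 0].
row_scaled = np.allclose(c[1], a[1] * b[1, 0])
print(row_scaled)  # True

# A true matrix product of the same arrays has a different shape.
print(np.dot(a, b).shape)  # (3, 1)
```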

**Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true? ***

- **Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.**
- Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.
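The symmetry problem can be reproduced on a tiny network. This is only a sketch under stated assumptions: a toy AND dataset, three sigmoid hidden units, one sigmoid output, cross-entropy loss, and a learning rate of 0.5. None of these choices come from the quiz.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (an assumption for illustration): the AND of two binary inputs.
X = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0]])   # shape (2, 4)
Y = np.array([[0.0, 0.0, 0.0, 1.0]])   # shape (1, 4)
m = X.shape[1]

# All weights and biases start at zero, as in the question.
W1, b1 = np.zeros((3, 2)), np.zeros((3, 1))
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))

learning_rate = 0.5
for _ in range(50):
    # Forward pass.
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backward pass for the cross-entropy loss.
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1.0 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # Gradient descent update.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

# Each row of W1 holds the weights of one hidden unit.
rows_identical = np.allclose(W1, W1[0])
print(W1)
print(rows_identical)
```

Every hidden unit receives the same gradient at every step, so the rows of `W1` never diverge from one another.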

**Vectorization allows you to compute forward propagation in an L-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers l=1, 2, … ,L. True/False? ***

- True
- **False**
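A short sketch makes the distinction concrete: vectorization removes the loop over training examples, but the layers still have to be visited one after another. The layer sizes, the tanh activation, and the random data here are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layer sizes: 4 inputs, two hidden layers, 1 output.
layer_dims = [4, 5, 3, 1]
params = [
    (rng.standard_normal((n_out, n_in)) * 0.01, np.zeros((n_out, 1)))
    for n_in, n_out in zip(layer_dims[:-1], layer_dims[1:])
]

X = rng.standard_normal((4, 10))  # 10 examples stacked as columns

A = X
for W, b in params:           # explicit loop over layers l = 1..L
    A = np.tanh(W @ A + b)    # vectorized over all 10 examples at once

print(A.shape)  # (1, 10)
```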

**Recall that “np.dot(a,b)” performs a matrix multiplication on a and b, whereas “a*b” performs an element-wise multiplication. Consider the two following random arrays “a” and “b”: a = np.random.randn(12288, 150) a.shape = (12288, 150) and b = np.random.randn(150, 45) with b.shape = (150, 45) then c = np.dot(a,b) will be ***

- c.shape = (150,150)
- **c.shape = (12288, 45)**
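The shape rule for `np.dot` is (m, n) times (n, p) gives (m, p). A minimal check with the shapes from the question follows; `swapped_ok` is an illustrative flag showing that reversing the order no longer lines up.

```python
import numpy as np

a = np.random.randn(12288, 150)
b = np.random.randn(150, 45)

# Inner dimensions (150 and 150) match, so the result keeps the outer ones.
c = np.dot(a, b)
print(c.shape)  # (12288, 45)

# Reversed order: inner dimensions would be 45 and 12288, which do not match.
try:
    np.dot(b, a)
    swapped_ok = True
except ValueError:
    swapped_ok = False
print(swapped_ok)  # False
```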

**A demographic dataset with statistics on different cities’ population, GDP per capita, economic growth is an example of “unstructured” data because it contains data coming from different sources. True/False? ***

- True
- **False**

**Why is an RNN (Recurrent Neural Network) used for machine translation, say translating English to French? ***

- **It can be trained as a supervised learning problem.**
- It is strictly more powerful than a Convolutional Neural Network (CNN).

**What does the analogy “AI is the new electricity” refer to? ***

- AI runs on computers and is thus powered by electricity, but it is letting computers do things not possible before.
- AI is powering personal devices in our homes and offices, similar to electricity.
- **Similar to electricity starting about 100 years ago, AI is transforming multiple industries.**
- Through the “smart grid”, AI is delivering a new wave of electricity.

**APPLY FOR THIS QUIZ CERTIFICATION**

**Here are the Machine Learning quiz answers:**

**You are carrying out error analysis and counting up what errors the algorithm makes. Which of these datasets do you think you should manually go through and carefully examine, one image at a time? ***

- 500 randomly chosen images
- 10,000 randomly chosen images
- 10,000 images on which the algorithm made a mistake
- **500 images on which the algorithm made a mistake**

**The main options to address the issue of overfitting ***

- **Reduce the number of features**
- Increase the number of features
- **Regularization**
- No Regularization

**You’re running a company, and you want to develop learning algorithms to address each of two problems. Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months. Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised. Should you treat these as classification or as regression problems? ***

- Treat both as classification problems.
- Treat problem 1 as a classification problem, problem 2 as a regression problem.
- **Treat problem 1 as a regression problem, problem 2 as a classification problem.**
- Treat both as regression problems.

**Techniques to fix high bias ***

- Trying smaller sets of features
- Getting more training examples
- **Adding features**
- **Adding polynomial features**
- **Decreasing regularization parameter**
- Increasing regularization parameter

**Techniques to fix high variance ***

- **Trying smaller sets of features**
- **Getting more training examples**
- Adding features
- Adding polynomial features
- Decreasing regularization parameter
- **Increasing regularization parameter**

**Structuring your data Before implementing your algorithm, you need to split your data into train/dev/test sets. Which of these do you think is the best choice? ***

- Train – 3,333,334; Dev – 3,333,333 ; Test – 3,333,333
- **Train – 9,500,000; Dev – 250,000 ; Test – 250,000**

**When the form of our hypothesis function maps poorly to the trend of the data ***

- **Underfitting**
- Overfitting
- Correlating features
- Covariance
- Low bias
- **High bias**

**If the difference between human-level error and the training error is bigger than the difference between the training error and the development error, the focus should be on ***

- bias enhancing
- **bias reducing**
- variance enhancing
- variance reducing

**“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting? ***

- **Classify emails as spam or not spam.**
- Watching you label emails as spam or not spam.
- The number (or fraction) of emails correctly classified as spam/not spam.
- None of the above, this is not a machine learning algorithm

**The field of study that gives computers the ability to learn without being explicitly programmed. ***

- Neural networks
- Recurrent network
- Convolutional Networks
- **Machine Learning**
- Dynamic Programming

**Over time, as you keep training the algorithm, maybe bigger and bigger models on more and more data, the performance approaches but never surpasses some theoretical limit, which is called the ***

- landmark error
- Goldilocks zone
- **Bayes error**
- Distribution error
- Bernoullis error

**You train a system, and its errors are as follows (error = 100%-Accuracy): Training set error 4.0% Dev set error 4.5% This suggests that one good avenue for improving performance is to train a bigger network so as to drive down the 4.0% training error. Do you agree? ***

- Yes, because having 4.0% training error shows you have high bias.
- Yes, because this shows your bias is higher than your variance.
- No, because this shows your variance is higher than your bias.
- **No, because there is insufficient information to tell.**
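The missing piece of information is the human-level error. The sketch below applies the rule used elsewhere in this quiz: compare avoidable bias (training error minus human-level error) with variance (dev error minus training error). The function name `focus` and the two human-level error values are hypothetical, chosen only to show that the same 4.0% and 4.5% can point either way.

```python
def focus(human_err, train_err, dev_err):
    """Return which gap is larger, using error rates given in percent."""
    avoidable_bias = train_err - human_err
    variance = dev_err - train_err
    if avoidable_bias > variance:
        return "bias reducing"
    return "variance reducing"

# Same training and dev errors as the question, two assumed human-level errors.
print(focus(0.5, 4.0, 4.5))  # bias reducing: gap of 3.5 vs 0.5
print(focus(3.9, 4.0, 4.5))  # variance reducing: gap of about 0.1 vs 0.5
```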

**It involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. ***

- **Feature scaling**
- Mean normalization
- Correlating features
- Covariance

**It involves subtracting the average value for an input variable from the values for that input variable resulting in a new average value for the input variable of just zero ***

- Feature scaling
- **Mean normalization**
- Correlating features
- Covariance
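Both definitions, feature scaling and mean normalization, can be checked on a small array. The values in `x` are made up for illustration; the last line combines the two steps, which is how they are often applied together.

```python
import numpy as np

# Hypothetical values of one input variable.
x = np.array([100.0, 150.0, 200.0, 300.0])

# Feature scaling: divide by the range (max minus min).
value_range = x.max() - x.min()
scaled = x / value_range
print(scaled.max() - scaled.min())  # 1.0

# Mean normalization: subtract the average value.
centered = x - x.mean()
print(centered.mean())  # 0.0

# Both together: zero mean and a range of 1.
normalized = (x - x.mean()) / value_range
print(normalized)
```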

**We are given an unlabeled data set and we would like to have an algorithm automatically group the data into coherent subsets or into coherent clusters for us ***

- Classification
- Regression
- **Clustering**
- Data mining
- Exploratory data analysis

**A system design property that assures that modifying an instruction or a component of an algorithm will not create or propagate side effects to other components of the system ***

- use case analysis
- waterfall model
- **orthogonality**
- optimization

**Method to assure that our backpropagation works as intended ***

- Gradient boosting
- **Gradient checking**
- Regularization
- Gradient descent
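Gradient checking compares the gradient from backpropagation with a numerical estimate. Here is a minimal sketch on a stand-in cost function, J(θ) = Σ θ³, whose analytic gradient 3θ² plays the role of the backprop result. The cost function, `eps`, and the test point are assumptions for illustration.

```python
import numpy as np

def cost(theta):
    # Stand-in cost function: J(theta) = sum(theta^3).
    return np.sum(theta ** 3)

def analytic_grad(theta):
    # What "backprop" would return for the cost above.
    return 3 * theta ** 2

def numerical_grad(f, theta, eps=1e-5):
    # Two-sided difference, one parameter at a time.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

theta = np.array([1.0, -2.0, 0.5])
g_num = numerical_grad(cost, theta)
g_ana = analytic_grad(theta)

# Relative difference between the two gradients; a tiny value means they agree.
rel_diff = np.linalg.norm(g_num - g_ana) / (
    np.linalg.norm(g_num) + np.linalg.norm(g_ana)
)
print(rel_diff)
```

A relative difference far below roughly 1e-7 is a common sign that the analytic gradient is implemented correctly; a large value points to a bug.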

**Simplification of a processing or learning system into one neural network ***

- Transfer learning
- Fine tuning
- Weight shifting
- Gradient boosting
- **End-to-end deep learning**
- Data synthesis
- Multi-task learning

**One of the most powerful ideas in deep learning is that sometimes you can take knowledge the neural network has learned from one task and apply that knowledge to a separate task ***

- **Transfer learning**
- Fine tuning
- End-to-end deep learning
- Weight shifting
- Gradient boosting
- Data synthesis
- Multi-task learning

**If the difference between training error and the development error is bigger than the difference between the human-level error and the training error, the focus should be on ***

- bias enhancing
- bias reducing
- variance enhancing
- **variance reducing**

**Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.) ***

- Given email labeled as spam/not spam, learn a spam filter.
- **Given a set of news articles found on the web, group them into set of articles about the same story**
- **Given a database of customer data, automatically discover market segments and group customers into different market segments**
- Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not

**The λ, or lambda, is the regularization parameter. If lambda is chosen to be too large, it may ***

- **cause underfitting**
- cause overfitting
- no effect
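The effect of a large λ can be seen on a tiny regularized linear regression. This sketch assumes an L2 (ridge) penalty solved in closed form, with the bias term left unregularized; the data is a noiseless line made up for illustration, and `ridge_fit` and `mse` are hypothetical helper names.

```python
import numpy as np

# Noiseless toy data: y = 3x + 1.
x = np.linspace(-1, 1, 20)
X = np.column_stack([np.ones_like(x), x])  # bias column plus one feature
y = 3.0 * x + 1.0

def ridge_fit(X, y, lam):
    # Closed-form L2-regularized least squares; the bias is not penalized.
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def mse(X, y, theta):
    return np.mean((X @ theta - y) ** 2)

theta_none = ridge_fit(X, y, lam=0.0)
theta_huge = ridge_fit(X, y, lam=1000.0)

print(theta_none, mse(X, y, theta_none))  # slope near 3, training error near 0
print(theta_huge, mse(X, y, theta_huge))  # slope shrunk toward 0, large training error
```

With λ = 1000 the slope is pushed almost to zero, so the model cannot even fit the training data, which is underfitting.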

**When a supervised learning system is designed, these are the assumptions that need to be true ***

- **Fit training set well on cost function**
- Use of a bigger development set
- **Fit development set well on cost function**
- Regularization or using bigger training set
- **Fit test set well on cost function**

**Your goal is to detect road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. You plan to use a deep neural network with ReLU units in the hidden layers. A softmax activation would be a good choice for the output layer because this is a multi-task learning problem. True/False? ***

- True
- **False**
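The reason softmax is a poor fit here is that its outputs must sum to 1, so it favors a single class, while one image can contain several of these objects at once. The sketch below contrasts it with an independent sigmoid per label. The five logits are made up and stand for the five object types in the question.

```python
import numpy as np

# Hypothetical output-layer logits for: stop sign, pedestrian crossing,
# construction ahead, red light, green light.
z = np.array([4.0, 3.5, -2.0, -3.0, 1.0])

# Softmax: probabilities compete with each other and sum to 1.
exp_z = np.exp(z - z.max())
softmax = exp_z / exp_z.sum()
print(softmax.sum())  # 1.0 (up to rounding)

# Sigmoid per label: each object is scored independently.
sig = 1.0 / (1.0 + np.exp(-z))
present = sig > 0.5
print(present)  # [ True  True False False  True]
```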

**Having one neural network do simultaneously several tasks ***

- Transfer learning
- Fine tuning
- End-to-end deep learning
- Weight shifting
- Gradient boosting
- Data synthesis
- **Multi-task learning**

**APPLY FOR THIS CERTIFICATION**
