Thursday, August 15, 2024

# RapidMiner Machine Learning Professional Certification Quiz Exam Answers

## Topic: Auto Model

• is able to use GPU processors in your computer to speed up the modeling process.
• encourages users to do feature selection which is often overlooked.
• follows many data science ‘best practices’.
• uses modeling algorithms that are not available as individual operators.
• Generalized Linear Model
• Deep Learning
• Support Vector Machine
• Naïve Bayes
• Logistic Regression
• Decision Tree
• Fast Large Margin

## Topic: Unsupervised Techniques

• iteratively improving the position of k centroids in the sample space until an optimal placement is found.
• starting with one point in the sample space, finding more points in the space within a neighborhood ε until no more points can be found, and then repeating this process for k-1 points.
• iteratively determining the Gaussian distribution (via its mean and standard deviation) of k clusters until the probabilities of all points in the sample space are maximized.
• pairing each point with another point such that their distance is minimized, and then repeating this process with larger groups of points until there are only k clusters remaining.
• attributes a2 and a4 partition the data set well between cluster_0 and cluster_1.
• attributes a1 and a3 do not partition the data set well between cluster_0 and cluster_1.
• attributes a2 and a4 partition the data set well between cluster_0 and cluster_2.
• attributes a2 and a4 do not partition the data set between cluster_0 and cluster_2.
• ensure that a regression model is not overfitting the data.
• find attributes that may have a relationship to one another.
• eliminate data that do not fit a particular model.
• compute the accuracy of a linear regression model.
• Wind
• Play
• Outlook
• Temperature
• Humidity
• Decision Tree
• k-Means clustering
• Support Vector Machine
• FP-Growth
• If you are not sure, then use the default value, 5. It is almost always optimal.
• Start with X-Means instead of k-Means; it will find an optimal k according to a heuristic.
• Start with a value of k that is large relative to the number of attributes that you have and apply k-Means. Then visualize the results with a scatter plot and set k to the number of distinct clusters.
• There is no method that is consistent across all applications.
• Year 1
• Year 2
• Year 3
• Year 4
• Year 5
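
The k-Means option above ("iteratively improving the position of k centroids") can be illustrated with a minimal sketch in plain Python. This is not RapidMiner's implementation: the function names `nearest` and `k_means`, the toy points, and the fixed starting centroids are all assumptions chosen for illustration.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, start_centroids, max_iter=100):
    """Repeat assignment and update steps until the centroids stop moving."""
    centroids = list(start_centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # no centroid moved, so the placement is stable
        centroids = updated
    return centroids

# Hypothetical toy data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids = k_means(points, start_centroids=[(0, 0), (10, 10)])
labels = [nearest(p, centroids) for p in points]
print(centroids)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on k and on the starting centroids, which is why choosing k usually involves a heuristic or visual inspection rather than a fixed default.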

## Topic: Classification & Regression

• The bias and variance both increase.
• The bias and variance both decrease.
• The bias increases and the variance decreases.
• The bias decreases and the variance increases.
• the attributes individually follow a Gaussian conditional probability distribution, given the class.
• the attributes individually follow a Gaussian probability distribution, independent of the class.
• the value of any attribute is statistically independent of the value of any other attribute (given the class value).
• the value of any attribute is statistically dependent on the value of any other attribute (given the class value).
• The model will likely overfit the data.
• Building the model will likely take a very long time on a standard laptop.
• The model will likely be too complex to interpret by humans.
• Building the model will require multiple GPU processors installed on a large server.
• For which examples will the model predict “yes”? (Select ALL correct answers)
• Outlook=rain, Wind=true, Humidity=60
• Outlook=overcast, Wind=false, Humidity=90
• Outlook=sunny, Wind=true, Humidity=60
• none of the examples above predict ‘true’
• you have a numerical label and numerical attributes.
• you have a binominal label and numerical attributes.
• you have a numerical label and polynominal attributes.
• the data is from a logistics use case.
• increasing the number of training cycles
• increasing the learning rate
• increasing the momentum
• GLM
• Naïve Bayes
• k-NN
• Decision Tree
• you have polynominal attributes with many values.
• you need to get the fastest runtime (Gain Ratio always has a shorter runtime than Information Gain).
• you have a relatively small data set (they will both take similar time to run but Gain Ratio always gives better performance over Information Gain).
• you want a criterion that takes Information Gain, and adjusts it for each attribute based on the number of possible values.
• This model had 67 false positive predictions.
• This model had 67 false negative predictions.
• This model was able to correctly predict 705 “BAD” values out of a total of 772 “BAD” values in the ExampleSet.
• Data scientists would consider this a ‘balanced’ data set.
• Linear Regression
• Naïve Bayes
• k-NN
• GLM
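
The Gain Ratio option above (Information Gain adjusted for the number of possible values of each attribute) can be sketched as follows. The sketch uses the textbook C4.5-style definition, information gain divided by the entropy of the attribute's own value distribution; RapidMiner's Decision Tree operator may differ in details. The function names and the toy columns are assumptions for illustration only.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(attribute, labels):
    """Label entropy minus the weighted label entropy within each attribute value."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(attribute, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(attribute)  # grows with the number of distinct values
    if split_info == 0:
        return 0.0  # a constant attribute cannot split the data
    return information_gain(attribute, labels) / split_info

# Hypothetical toy data: both attributes separate the label perfectly,
# but `row_id` does so only because every value is unique.
labels = ["yes", "yes", "no", "no"]
row_id = ["a", "b", "c", "d"]
group = ["x", "x", "y", "y"]

ig_id, ig_group = information_gain(row_id, labels), information_gain(group, labels)
gr_id, gr_group = gain_ratio(row_id, labels), gain_ratio(group, labels)
print(ig_id, ig_group)  # 1.0 1.0
print(gr_id, gr_group)  # 0.5 1.0
```

Information Gain cannot tell the two attributes apart, while Gain Ratio penalizes the many-valued one, which is the motivation for preferring it when polynominal attributes have many values.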

## Topic: Validation & Scoring

• Label 1 points to the training set wire.
• Label 1 points to the testing set wire.
• Operator 2 is the operator that builds the model (e.g., Decision Tree, SVM, etc.).
• Operator 3 is the operator that builds the model (e.g., Decision Tree, SVM, etc.).
• ranking the performances of more than one model to choose the best one.
• applying a model to unseen data.
• using a model in production.
• determining whether or not a model is overfit.
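
The roles named in the options above (a training set used to build the model, a testing set used to measure it, and a comparison that reveals overfitting) can be sketched in plain Python. This is a minimal illustration, not a RapidMiner process: the 1-nearest-neighbor model, the function names, and the toy data with two deliberately noisy training labels are all assumptions.

```python
def train_1nn(training_set):
    """'Build the model': 1-NN simply memorizes the training examples."""
    return list(training_set)

def predict(model, x):
    """Label of the closest memorized example (the first one wins on ties)."""
    return min(model, key=lambda example: abs(example[0] - x))[1]

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(model, x) == y for x, y in examples)
    return hits / len(examples)

# Hypothetical data: class "a" for small x and "b" for large x,
# except for two noisy training examples at x=3 and x=8.
training_set = [(1, "a"), (2, "a"), (3, "b"), (4, "a"),
                (6, "b"), (7, "b"), (8, "a"), (9, "b")]
testing_set = [(1.5, "a"), (2.6, "a"), (3.4, "a"),
               (6.5, "b"), (7.6, "b"), (8.4, "b")]

model = train_1nn(training_set)              # training side
train_acc = accuracy(model, training_set)    # measured on data the model has seen
test_acc = accuracy(model, testing_set)      # measured on held-out data
print(train_acc, test_acc)
```

A perfect score on the training set combined with a much lower score on the testing set is the typical signature of an overfit model. Scoring on the training set alone would hide it.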