Data Science with Scala Cognitive Class Exam Answers

Course : Data Science with Scala

Module 1: Basic Statistics and Data Types

Question 1: From which package do you import MLlib’s vectors?

  • org.apache.spark.mllib.TF
  • org.apache.spark.mllib.numpy
  • org.apache.spark.mllib.linalg
  • org.apache.spark.mllib.pandas

Question 2: Select the types of distributed matrices:

  • RowMatrix
  • IndexedRowMatrix
  • CoordinateMatrix

Question 3: How would you calculate the mean of the following?

val observations: RDD[Vector] = sc.parallelize(Array(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(4.0, 5.0),
  Vectors.dense(7.0, 8.0)))

val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)

  • summary.normL1
  • summary.numNonzeros
  • summary.mean
  • summary.normL2
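As a rough illustration of what a column-wise mean over these three observations looks like, here is a minimal sketch in plain Scala with no Spark dependency. The object name `ColumnMeanSketch` and the helper `columnMeans` are hypothetical names introduced for this example; they are not part of MLlib.

```scala
object ColumnMeanSketch {
  // Column-wise mean of equally sized rows.
  // Assumes a non-empty input where every row has the same length.
  def columnMeans(rows: Seq[Array[Double]]): Array[Double] = {
    val dims = rows.head.length
    Array.tabulate(dims)(j => rows.map(row => row(j)).sum / rows.length)
  }

  def main(args: Array[String]): Unit = {
    // Same values as the observations in the question above.
    val observations = Seq(Array(1.0, 2.0), Array(4.0, 5.0), Array(7.0, 8.0))
    println(columnMeans(observations).mkString("[", ", ", "]")) // prints [4.0, 5.0]
  }
}
```

Each output entry is the average of one column, so the result has one value per feature rather than one per observation.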

Question 4: What task do the following lines of code perform?

import org.apache.spark.mllib.random.RandomRDDs._

val million = poissonRDD(sc, mean=1.0, size=1000000L, numPartitions=10)

  • Calculate the variance
  • Calculate the mean
  • Generate random samples

Question 5: MLlib uses the compressed sparse column format for sparse matrices; as such, it only keeps the non-zero entries.

  • True
  • False
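To make the compressed sparse column (CSC) layout concrete, the following sketch builds the three CSC arrays from a small dense matrix in plain Scala. The names `CscSketch`, `Csc`, and `fromDense` are hypothetical and exist only for this illustration; the field names `colPtrs`, `rowIndices`, and `values` are chosen to mirror the arrays MLlib's sparse matrix takes, but this is not MLlib's implementation.

```scala
import scala.collection.mutable.ArrayBuffer

object CscSketch {
  // The three arrays that describe a matrix in CSC form.
  final case class Csc(colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double])

  // Build CSC arrays from a dense matrix given as an array of rows.
  // Assumes a non-empty, rectangular input.
  def fromDense(dense: Array[Array[Double]]): Csc = {
    val numRows = dense.length
    val numCols = dense.head.length
    val colPtrs = ArrayBuffer(0)
    val rowIndices = ArrayBuffer.empty[Int]
    val values = ArrayBuffer.empty[Double]
    for (j <- 0 until numCols) {
      for (i <- 0 until numRows if dense(i)(j) != 0.0) {
        rowIndices += i          // row of this non-zero entry
        values += dense(i)(j)    // the non-zero entry itself
      }
      colPtrs += values.length   // where the next column starts in `values`
    }
    Csc(colPtrs.toArray, rowIndices.toArray, values.toArray)
  }

  def main(args: Array[String]): Unit = {
    // 3 x 2 matrix with three non-zero entries.
    val m = Array(Array(9.0, 0.0), Array(0.0, 8.0), Array(0.0, 6.0))
    val csc = fromDense(m)
    println(csc.colPtrs.mkString(","))    // prints 0,1,3
    println(csc.rowIndices.mkString(",")) // prints 0,1,2
    println(csc.values.mkString(","))     // prints 9.0,8.0,6.0
  }
}
```

In this sketch the column-pointer array has one more entry than the number of columns, and a pointer on its own only marks where a column begins in `values`; the row of each entry comes from `rowIndices`.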

Module 2: Preparing Data

Question 1: For a DataFrame object, the describe method calculates the:

  • count
  • mean
  • standard deviation
  • max
  • min
  • all of the above

Question 2: Which line of code drops the rows that contain null values? Select the best answer.

  • val dfnan = df.withColumn("nanUniform", halfTonNaN(df("uniform")))
  • dfnan.na.replace("uniform", Map(Double.NaN -> 0.0))
  • dfnan.na.drop(minNonNulls = 3)
  • dfnan.na.fill(0.0)

Question 3: What task do the following lines of code perform?

val lr = new LogisticRegression()


val model1 = lr.fit(training)

  • perform one hot encoding
  • Train a linear regression model
  • Train a Logistic regression model
  • Perform PCA on the data

Question 4: The StandardScalerModel transforms the data such that:

  • each feature has a max value of 1
  • each feature is orthogonal
  • each feature has unit standard deviation and zero mean
  • each feature has a min value of -1
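For readers who want to see what standardizing a single feature involves, here is a minimal plain-Scala sketch. It assumes the corrected sample standard deviation (dividing by n - 1) and a non-constant feature with at least two values; `StandardizeSketch` and `standardize` are hypothetical names for this example. In Spark's StandardScaler, centering and scaling are switched on separately through the `withMean` and `withStd` parameters, so this sketch corresponds to having both enabled.

```scala
object StandardizeSketch {
  // Shift a feature to zero mean and scale it to unit standard deviation.
  // Assumes xs has at least two values and is not constant.
  def standardize(xs: Array[Double]): Array[Double] = {
    val n = xs.length
    val mean = xs.sum / n
    // Corrected sample variance (divides by n - 1): an assumption of this sketch.
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / (n - 1)
    val std = math.sqrt(variance)
    xs.map(x => (x - mean) / std)
  }

  def main(args: Array[String]): Unit = {
    // mean = 4.0, sample standard deviation = 3.0
    println(standardize(Array(1.0, 4.0, 7.0)).mkString("[", ", ", "]")) // prints [-1.0, 0.0, 1.0]
  }
}
```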

Module 3: Feature Engineering

Question 1: Spark ML works with:

  • tensors
  • vectors
  • dataframes
  • lists

Question 2: The function IndexToString() performs one-hot encoding.

  • True
  • False

Question 3: Principal Component Analysis is primarily used for:

  • converting categorical variables to integers
  • predicting discrete values
  • dimensionality reduction

Question 4: One important step prior to using PCA is:

  • normalizing your data
  • making sure every feature is not correlated
  • taking the log for your data
  • subtracting the mean

Module 4: Fitting a Model

Question 1: You can use decision trees for:

  • regression
  • classification
  • classification and regression
  • data normalization

Question 2: The following line of code: val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

  • split the data into training and testing data
  • train the model
  • use 70% of the data for testing
  • use 30% of the data for training
  • make a prediction
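The idea behind a weighted random split can be sketched in plain Scala as follows. This is only an illustration of the technique under stated assumptions, not Spark's implementation: `RandomSplitSketch` and its `randomSplit` helper are hypothetical, and each element is assigned independently by drawing a uniform random number and comparing it against the cumulative, normalized weights.

```scala
import scala.util.Random

object RandomSplitSketch {
  // Assign each element to one of several splits with probability
  // proportional to the corresponding weight. Assumes positive weights.
  def randomSplit[A](data: Seq[A], weights: Seq[Double], seed: Long): Seq[Seq[A]] = {
    val total = weights.sum
    // Cumulative upper bounds of each split, normalized to [0, 1].
    val bounds = weights.scanLeft(0.0)(_ + _).tail.map(_ / total)
    val rng = new Random(seed)
    val assigned = data.map { x =>
      val u = rng.nextDouble()
      val idx = bounds.indexWhere(b => u < b)
      // Fall back to the last split if rounding leaves u above every bound.
      (if (idx >= 0) idx else weights.length - 1, x)
    }
    weights.indices.map(i => assigned.filter(_._1 == i).map(_._2))
  }

  def main(args: Array[String]): Unit = {
    val Seq(trainingData, testData) = randomSplit(1 to 1000, Seq(0.7, 0.3), seed = 42L)
    println(s"training: ${trainingData.size}, test: ${testData.size}")
  }
}
```

Because every element is assigned independently, the split sizes are only approximately 70% and 30%; the weights describe expected proportions, not exact counts.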

Question 3: When constructing a Random Forest Classifier, .setNumTrees():

  • sets the max depth of trees
  • sets the minimum number of classes before a split
  • sets the number of trees

Question 4: Elastic net regularization uses:

  • L0-norm
  • L1-norm
  • L2-norm
  • a convex combination of the L1 norm and L2 norm
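As a small worked example, the elastic net penalty can be computed as below. The sketch assumes the common parameterization lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), where `lambda` is the overall regularization strength and `alpha` in [0, 1] mixes the two norms; `ElasticNetSketch` and `penalty` are hypothetical names for this illustration.

```scala
object ElasticNetSketch {
  // Elastic net penalty for a weight vector w.
  // alpha = 1.0 gives a pure L1 penalty, alpha = 0.0 a pure (halved, squared) L2 penalty.
  def penalty(w: Array[Double], lambda: Double, alpha: Double): Double = {
    val l1 = w.map(x => math.abs(x)).sum
    val l2Squared = w.map(x => x * x).sum
    lambda * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }

  def main(args: Array[String]): Unit = {
    val w = Array(3.0, -4.0) // L1 norm = 7.0, squared L2 norm = 25.0
    println(penalty(w, lambda = 1.0, alpha = 1.0)) // prints 7.0
    println(penalty(w, lambda = 1.0, alpha = 0.0)) // prints 12.5
    println(penalty(w, lambda = 1.0, alpha = 0.5)) // prints 9.75
  }
}
```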

Module 5: Pipeline and Grid Search

Question 1: What task does the following code perform: withColumn("paperscore", data("A2") * 4 + data("A") * 3)?

  • add 4 columns to A2
  • add 3 columns to A1
  • add 4 to each element in column A2
  • assign a higher weight to A2 and A journals

Question 2: In an estimator:

  • there is no need to call the fit method
  • the fit function is called
  • only the transform function is called

Question 3: Which is not a valid type of Evaluator in MLlib?

  • RegressionEvaluator
  • MultiClassClassificationEvaluator
  • MultiLabelClassificationEvaluator
  • BinaryClassificationEvaluator
  • All are valid

Question 4: In the following lines of code, the last transform in the pipeline is a:

val rf = new RandomForestClassifier().setFeaturesCol("assembled").setLabelCol("status").setSeed(42)

import org.apache.spark.ml.Pipeline

val pipeline = new Pipeline().setStages(Array(value_band_indexer, category_indexer, label_indexer, assembler, rf))

  • principal component analysis
  • Vector Assembler
  • String Indexer
  • Random Forest Classifier

Final Exam Answers

Question 1

What is not true about labeled points?

  • They associate sparse vectors with a corresponding label/response
  • They associate dense vectors with a corresponding label/response
  • They are used in unsupervised machine learning algorithms
  • All are true
  • None are true

Question 2

Which is true about column pointers in sparse matrices?

  • By themselves, they do not represent the specific physical location of a value in the matrix
  • They never repeat values
  • They have the same number of values as the number of columns
  • All are true
  • None are true

Question 3

What is the name of the most basic type of distributed matrix?

  • CoordinateMatrix
  • IndexedRowMatrix
  • SparseMatrix
  • SimpleMatrix
  • RowMatrix

Question 4

A perfect correlation is represented by what value?

  • 3
  • 1
  • -1
  • 100
  • 0

Question 5

A MinMaxScaler is a transformer which:

  • Rescales each feature to a specific range
  • Takes no parameters
  • Makes zero values remain untransformed
  • All are true
  • None are true
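The rescaling a min-max scaler performs on one feature can be sketched in plain Scala like this. The helper assumes the usual linear formula (x - min) / (max - min) * (hi - lo) + lo, with a constant feature mapped to the midpoint of the target range; `MinMaxSketch`, `rescale`, `lo`, and `hi` are hypothetical names chosen for this example.

```scala
object MinMaxSketch {
  // Linearly rescale a feature to the target range [lo, hi].
  // Assumes a non-empty input; a constant feature maps to the midpoint of the range.
  def rescale(xs: Array[Double], lo: Double = 0.0, hi: Double = 1.0): Array[Double] = {
    val eMin = xs.min
    val eMax = xs.max
    if (eMax == eMin) xs.map(_ => 0.5 * (hi + lo))
    else xs.map(x => (x - eMin) / (eMax - eMin) * (hi - lo) + lo)
  }

  def main(args: Array[String]): Unit = {
    println(rescale(Array(2.0, 4.0, 10.0)).mkString("[", ", ", "]"))           // prints [0.0, 0.25, 1.0]
    println(rescale(Array(0.0, 5.0, 10.0), 1.0, 2.0).mkString("[", ", ", "]")) // prints [1.0, 1.5, 2.0]
  }
}
```

The second call shows that a zero input does not necessarily stay zero once the feature is mapped to a different range.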

Question 6

Which is not a supported Random Data Generation distribution?

  • Poisson
  • Uniform
  • Exponential
  • Delta
  • Normal

Question 7

Sampling without replacement means:

  • The expected number of times each element is chosen is randomized
  • The expected size of the sample is a fraction of the RDD’s size
  • The expected number of times each element is chosen
  • The expected size of the sample is unknown
  • The expected size of the sample is the same as the RDD’s size

Question 8

What are the supported types of hypothesis testing?

  • Pearson’s Chi-Squared Test for goodness of fit
  • Pearson’s Chi-Squared Test for independence
  • Kolmogorov-Smirnov test for equality of distribution
  • All are supported
  • None are supported

Question 9

For Kernel Density Estimation, which kernel is supported by Spark?

  • KDEMultivariate
  • KDEUnivariate
  • Gaussian
  • KernelDensity
  • All are supported

Question 10

Which DataFrames statistics method computes the pairwise frequency table of the given columns?

  • freqItems()
  • cov()
  • crosstab()
  • pairwiseFreq()
  • corr()
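A pairwise frequency table simply counts how often each combination of two column values occurs. The plain-Scala sketch below illustrates that idea on a list of value pairs; `CrosstabSketch` and `pairCounts` are hypothetical names for this example, and the flat map it returns is a simplification compared with a tabular result that has one row and one column per distinct value.

```scala
object CrosstabSketch {
  // Count how often each (first column, second column) combination occurs.
  def pairCounts[A, B](pairs: Seq[(A, B)]): Map[(A, B), Int] =
    pairs.groupBy(identity).map { case (pair, occurrences) => pair -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows: (journal rank, status).
    val rows = Seq(("A", "accepted"), ("A", "accepted"), ("A2", "rejected"))
    pairCounts(rows).foreach { case ((rank, status), n) => println(s"$rank / $status: $n") }
  }
}
```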

Question 11

Which is not true about the fill method for DataFrame NA functions?

  • It is used for replacing NaN values
  • It is used for replacing nil values
  • It is used for replacing null values
  • All are true
  • None are true

Question 12

Which transformer listed below is used for Natural Language processing?

  • StandardScaler
  • OneHotEncoder
  • ElementwiseProduct
  • Normalizer
  • None are used for Natural Language processing

Question 13

Which is true about the Mahalanobis Distance?

  • It is a scale-variant distance
  • It does not take into account the correlations of the dataset
  • It is measured along each Principal Component axis
  • It is a multi-dimensional generalization of measuring how many standard deviations a point is away from the median
  • It has units of distance

Question 14

Which is true about OneHotEncoder?

  • It must be told which column to create for its output
  • It creates a Sparse Vector
  • It must be told which column is its input
  • All are true
  • None are true

Question 15

Principal Component Analysis is:

  • Never used for feature engineering
  • Used for supervised machine learning
  • A dimension reduction technique
  • All are true
  • None are true

Question 16

MLlib’s implementation of decision trees:

  • Supports only multiclass classification
  • Does not support regressions
  • Partitions data by rows, allowing distributed training
  • Supports only continuous features
  • None are true

Question 17

Which is not a tunable of SparkML decision trees?

  • maxBins
  • maxMemoryInMB
  • minInstancesPerNode
  • minDepth
  • minInfoGain

Question 18

Which is true about Random Forests?

  • They support non-categorical features
  • They combine many decision trees in order to reduce the risk of overfitting
  • They do not support regression
  • They only support binary classification
  • None are true

Question 19

When comparing Random Forest versus Gradient-Based Trees, what must you consider?

  • How the number of trees affects the outcome
  • Depth of Trees
  • Parallelization abilities
  • All of these
  • None of these

Question 20

Which is not a valid type of Evaluator in MLlib?

  • MultiLabelClassificationEvaluator
  • RegressionEvaluator
  • BinaryClassificationEvaluator
  • MultiClassClassificationEvaluator
  • All are valid
