Data Science with Scala Cognitive Class Exam Quiz Answers

Clear My Certification, February 4, 2021. Categories: Certification, Cognitive Class, Featured


Data Science with Scala Cognitive Class Certification Answers

Module 1: Basic Statistics and Data Types

Question 1: You import MLlib’s vectors from?

org.apache.spark.mllib.TF
org.apache.spark.mllib.numpy
org.apache.spark.mllib.linalg
org.apache.spark.mllib.pandas

Question 2: Select the types of distributed Matrices:

RowMatrix
IndexedRowMatrix
CoordinateMatrix

Question 3: How would you calculate the mean of the following?

val observations: RDD[Vector] = sc.parallelize(Array(

Vectors.dense(1.0, 2.0),

Vectors.dense(4.0, 5.0),

Vectors.dense(7.0, 8.0)))

val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)

summary.normL1
summary.numNonzeros
summary.mean
summary.normL2
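
For reference, the column-wise mean that the question asks about can be reproduced by hand in plain Scala, without Spark. This is only a minimal sketch: the names `ColumnMeanSketch` and `columnMeans` are illustrative and are not part of MLlib.

```scala
object ColumnMeanSketch {
  // The same three observations as in the question, as plain Scala rows.
  val observations: Seq[Seq[Double]] = Seq(
    Seq(1.0, 2.0),
    Seq(4.0, 5.0),
    Seq(7.0, 8.0)
  )

  // Average each column across all rows.
  def columnMeans(rows: Seq[Seq[Double]]): Seq[Double] = {
    require(rows.nonEmpty, "need at least one row")
    rows.transpose.map(column => column.sum / column.size)
  }

  def main(args: Array[String]): Unit =
    println(columnMeans(observations).mkString("[", ", ", "]")) // prints [4.0, 5.0]
}
```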

Question 4: What task do the following lines of code perform?

import org.apache.spark.mllib.random.RandomRDDs._

val million = poissonRDD(sc, mean=1.0, size=1000000L, numPartitions=10)

Calculate the variance
Calculate the mean
Generate random samples

Question 5: MLlib uses the compressed sparse column format for sparse matrices; as such, it only keeps the non-zero entries?

True
False
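
To make the compressed sparse column (CSC) idea concrete, the sketch below builds the three CSC arrays from a small dense matrix in plain Scala. It is an illustration of the format only, not MLlib's implementation; `CscSketch`, `Csc` and `fromDense` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

object CscSketch {
  // Compressed sparse column view of a numRows x numCols matrix.
  final case class Csc(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],    // length numCols + 1; column j spans indices colPtrs(j) until colPtrs(j + 1)
      rowIndices: Array[Int], // row index of each stored value
      values: Array[Double]   // non-zero entries only, stored column by column
  )

  // Walk the dense matrix column by column and keep only the non-zero entries.
  def fromDense(rows: Array[Array[Double]]): Csc = {
    val numRows = rows.length
    val numCols = if (numRows == 0) 0 else rows(0).length
    val colPtrs = ArrayBuffer(0)
    val rowIndices = ArrayBuffer.empty[Int]
    val values = ArrayBuffer.empty[Double]
    for (j <- 0 until numCols) {
      for (i <- 0 until numRows if rows(i)(j) != 0.0) {
        rowIndices += i
        values += rows(i)(j)
      }
      colPtrs += values.length // running count marks where the next column starts
    }
    Csc(numRows, numCols, colPtrs.toArray, rowIndices.toArray, values.toArray)
  }
}
```

For the 3 x 2 matrix with rows (9, 0), (0, 8), (0, 6), this yields column pointers 0, 1, 3: one more entry than there are columns, and meaningful only together with the row indices.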

Module 2: Preparing Data

Question 1: For a DataFrame object, the describe method calculates the?

count
mean
standard deviation
max
min
all of the above

Question 2: Which line of code drops the rows that contain null values? Select the best answer.

val dfnan = df.withColumn("nanUniform", halfTonNaN(df("uniform")))
dfnan.na.replace("uniform", Map(Double.NaN -> 0.0))
dfnan.na.drop(minNonNulls = 3)
dfnan.na.fill(0.0)

Question 3: What task do the following lines of code perform?

val lr = new LogisticRegression()

lr.setMaxIter(10).setRegParam(0.01)

val model1 = lr.fit(training)

Perform one-hot encoding
Train a linear regression model
Train a logistic regression model
Perform PCA on the data

Question 4: The StandardScalerModel transforms the data such that?

each feature has a max value of 1
each feature is orthogonal
each feature has unit standard deviation and zero mean
each feature has a min value of -1
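
As a rough illustration of what standardization means, the plain-Scala sketch below centres one feature column on zero and scales it to unit standard deviation. It assumes the sample (n - 1) standard deviation; `StandardizeSketch` and `standardize` are hypothetical names, not Spark API.

```scala
object StandardizeSketch {
  // Shift a column to zero mean and scale it to unit (sample) standard deviation.
  def standardize(column: Seq[Double]): Seq[Double] = {
    require(column.size > 1, "need at least two values")
    val mean = column.sum / column.size
    val variance = column.map(x => (x - mean) * (x - mean)).sum / (column.size - 1)
    val std = math.sqrt(variance)
    require(std > 0.0, "a constant column cannot be scaled to unit standard deviation")
    column.map(x => (x - mean) / std)
  }

  def main(args: Array[String]): Unit =
    println(standardize(Seq(1.0, 4.0, 7.0)).mkString("[", ", ", "]")) // prints [-1.0, 0.0, 1.0]
}
```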

Module 3: Feature Engineering

Question 1: Spark ML works with?

tensors
vectors
dataframes
lists

Question 2: The function IndexToString() performs one-hot encoding?

True
False

Question 3: Principal Component Analysis is primarily used for?

to convert categorical variables to integers
to predict discrete values
dimensionality reduction

Question 4: One important step prior to using PCA is?

normalizing your data
making sure every feature is not correlated
taking the log for your data
subtracting the mean

Module 4: Fitting a Model

Question 1: You can use decision trees for?

regression
classification
classification and regression
data normalization

Question 2: The following lines of code: val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

split the data into training and testing data
train the model
use 70% of the data for testing
use 30% of the data for training
make a prediction

Question 3: In the Random Forest Classifier constructor .setNumTrees()?

sets the max depth of trees
sets the minimum number of classes before a split
sets the number of trees

Question 4: Elastic net regularization uses?

L0-norm
L1-norm
L2-norm
a convex combination of the L1 norm and L2 norm
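
The convex combination can be written out directly. The sketch below computes an elastic net penalty for a weight vector in plain Scala, assuming the common convention in which the L2 term is the squared norm scaled by one half. The parameter names `regParam` and `elasticNetParam` mirror Spark's setters, but the function itself is only an illustration.

```scala
object ElasticNetSketch {
  // regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), with alpha = elasticNetParam in [0, 1].
  def penalty(weights: Seq[Double], regParam: Double, elasticNetParam: Double): Double = {
    require(elasticNetParam >= 0.0 && elasticNetParam <= 1.0, "mixing parameter must lie in [0, 1]")
    val l1 = weights.map(w => math.abs(w)).sum
    val l2Squared = weights.map(w => w * w).sum
    regParam * (elasticNetParam * l1 + (1.0 - elasticNetParam) / 2.0 * l2Squared)
  }
}
```

With `elasticNetParam = 1.0` only the L1 term remains (lasso); with `0.0` only the L2 term remains (ridge).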

Module 5: Pipeline and Grid Search

Question 1: What task does the following code perform: withColumn("paperscore", data("A2") * 4 + data("A") * 3)?

add 4 columns to A2
add 3 columns to A1
add 4 to each element in column A2
assign a higher weight to A2 and A journals
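
The same weighting can be shown without a DataFrame. In the plain-Scala sketch below, the `Researcher` case class and its fields are hypothetical stand-ins for rows with an "A2" column and an "A" column; only the arithmetic comes from the question.

```scala
object PaperScoreSketch {
  // Hypothetical row type: number of papers in A2-ranked and A-ranked journals.
  final case class Researcher(name: String, a2: Int, a: Int)

  // Weighted score: each A2 paper counts 4 points, each A paper counts 3.
  def paperScore(r: Researcher): Int = r.a2 * 4 + r.a * 3

  def main(args: Array[String]): Unit =
    println(paperScore(Researcher("example", a2 = 2, a = 1))) // prints 11
}
```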

Question 2: In an Estimator?

there is no need to call the fit method
the fit function is called
only the transform function is called

Question 3: Which is not a valid type of Evaluator in MLlib?

RegressionEvaluator
MultiClassClassificationEvaluator
MultiLabelClassificationEvaluator
BinaryClassificationEvaluator
All are valid

Question 4: In the following lines of code, the last transform in the pipeline is a:

val rf = new RandomForestClassifier().setFeaturesCol("assembled").setLabelCol("status").setSeed(42)

import org.apache.spark.ml.Pipeline

val pipeline = new Pipeline().setStages(Array(value_band_indexer,category_indexer,label_indexer,assembler,rf))

Principal Component Analysis
Vector Assembler
String Indexer
Random Forest Classifier

Data Science with Scala Final Exam Answers – Cognitive Class

Question 1: What is not true about labeled points?

They associate sparse vectors with a corresponding label/response
They associate dense vectors with a corresponding label/response
They are used in unsupervised machine learning algorithms
All are true
None are true

Question 2: Which is true about column pointers in sparse matrices?

By themselves, they do not represent the specific physical location of a value in the matrix
They never repeat values
They have the same number of values as the number of columns
All are true
None are true

Question 3: What is the name of the most basic type of distributed matrix?

CoordinateMatrix
IndexedRowMatrix
SparseMatrix
SimpleMatrix
RowMatrix

Question 4: A perfect correlation is represented by what value?

3
1
-1
100
0
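
As a quick check on what a perfect correlation looks like, the sketch below computes the Pearson coefficient in plain Scala. A perfectly increasing linear relationship gives a value of 1 (up to floating-point rounding) and a perfectly decreasing one gives -1. `PearsonSketch` and `pearson` are illustrative names, not Spark API.

```scala
object PearsonSketch {
  // Pearson correlation: covariance divided by the product of the two spreads.
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.size > 1, "need two equal-length samples")
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    val covariance = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val spreadX = xs.map(x => (x - meanX) * (x - meanX)).sum
    val spreadY = ys.map(y => (y - meanY) * (y - meanY)).sum
    covariance / math.sqrt(spreadX * spreadY)
  }
}
```

For example, `pearson(Seq(1.0, 2.0, 3.0, 4.0), Seq(2.0, 4.0, 6.0, 8.0))` describes a perfect positive correlation, while reversing the second sequence describes a perfect negative one.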

Question 5: A MinMaxScaler is a transformer which:

Rescales each feature to a specific range
Takes no parameters
Makes zero values remain untransformed
All are true
None are true
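
The rescaling itself is a one-line formula. The plain-Scala sketch below maps one feature column onto a target range, assuming the usual min-max formula and sending a constant column to the midpoint of the range. `MinMaxSketch` and `rescale` are hypothetical names.

```scala
object MinMaxSketch {
  // Linearly map the column's observed [min, max] onto [lower, upper].
  def rescale(column: Seq[Double], lower: Double = 0.0, upper: Double = 1.0): Seq[Double] = {
    require(column.nonEmpty, "need at least one value")
    require(lower < upper, "target range must be non-empty")
    val lo = column.min
    val hi = column.max
    if (hi == lo) column.map(_ => 0.5 * (lower + upper)) // constant column: use the midpoint
    else column.map(x => (x - lo) / (hi - lo) * (upper - lower) + lower)
  }
}
```

Calling `rescale(Seq(0.0, 5.0, 10.0), -1.0, 1.0)` maps the input 0.0 to -1.0, which shows that zero values are generally not left untouched.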

Question 6: Which is not a supported Random Data Generation distribution?

Poisson
Uniform
Exponential
Delta
Normal

Question 7: Sampling without replacement means:

The expected number of times each element is chosen is randomized
The expected size of the sample is a fraction of the RDD's size
The expected number of times each element is chosen
The expected size of the sample is unknown
The expected size of the sample is the same as the RDD's size

Question 8: What are the supported types of hypothesis testing?

Pearson’s Chi-Squared Test for goodness of fit
Pearson’s Chi-Squared Test for independence
Kolmogorov-Smirnov test for equality of distribution
All are supported
None are supported

Question 9: For Kernel Density Estimation, which kernel is supported by Spark?

KDEMultivariate
KDEUnivariate
Gaussian
KernelDensity
All are supported

Question 10: Which DataFrames statistics method computes the pairwise frequency table of the given columns?

freqItems()
cov()
crosstab()
pairwiseFreq()
corr()

Question 11: Which is not true about the fill method for DataFrame NA functions?

It is used for replacing NaN values
It is used for replacing nil values
It is used for replacing null values
All are true
None are true

Question 12: Which transformer listed below is used for Natural Language processing?

StandardScaler
OneHotEncoder
ElementwiseProduct
Normalizer
None are used for Natural Language processing

Question 13: Which is true about the Mahalanobis Distance?

It is a scale-variant distance
It does not take into account the correlations of the dataset
It is measured along each Principal Component axis
It is a multi-dimensional generalization of measuring how many standard deviations a point is away from the median
It has units of distance

Question 14: Which is true about OneHotEncoder?

It must be told which column to create for its output
It creates a Sparse Vector
It must be told which column is its input
All are true
None are true

Question 15: Principal Component Analysis is:

Never used for feature engineering
Used for supervised machine learning
A dimension reduction technique
All are true
None are true

Question 16: MLlib’s implementation of decision trees:

Supports only multiclass classification
Does not support regressions
Partitions data by rows, allowing distributed training
Supports only continuous features
None are true

Question 17: Which is not a tunable of SparkML decision trees?

maxBins
maxMemoryInMB
minInstancesPerNode
minDepth
minInfoGain

Question 18: Which is true about Random Forests?

They support non-categorical features
They combine many decision trees in order to reduce the risk of overfitting
They do not support regression
They only support binary classification
None are true

Question 19: When comparing Random Forest versus Gradient-Boosted Trees, what must you consider?

How the number of trees affects the outcome
Depth of Trees
Parallelization abilities
All of these
None of these

Question 20: Which is not a valid type of Evaluator in MLlib?

MultiLabelClassificationEvaluator
RegressionEvaluator
BinaryClassificationEvaluator
MultiClassClassificationEvaluator
All are valid

Introduction to Data Science with Scala

Scala is a capable language for data science: it combines strong functional programming features with full compatibility with the Java ecosystem. Python is more commonly used for data science because of its extensive libraries and ease of use, but Scala is a good choice when working with large-scale distributed systems or when integration with existing Java codebases matters.

Here are some key aspects of doing data science with Scala:

Libraries: Scala has several libraries for data manipulation, analysis, and machine learning. Apache Spark is the most prominent, providing a scalable distributed computing framework. Scala also has Breeze for numerical computing, which belongs to the ScalaNLP family of libraries for machine learning and natural language processing.
Functional Programming: Scala’s functional programming features, such as immutability and higher-order functions, can make code more concise and maintainable. This can be particularly useful when dealing with complex data transformations and analysis pipelines.
Integration with Java: Scala runs on the Java Virtual Machine (JVM), which means it seamlessly interoperates with Java libraries and frameworks. This can be advantageous when working in environments where Java is already heavily used, or when leveraging existing Java code for data processing tasks.
Type Safety: Scala’s static typing system can help catch errors at compile time, reducing the likelihood of runtime errors in data processing pipelines. This can be especially valuable when working with large, complex datasets where errors can be costly.
Concurrency and Parallelism: Scala provides powerful abstractions for concurrent and parallel programming, which can be beneficial when dealing with large-scale data processing tasks. This is particularly important in the context of distributed computing frameworks like Apache Spark, where efficient parallelism is crucial for performance.
Tooling: While Python has a richer ecosystem of data science libraries and tools, Scala is supported by popular integrated development environments (IDEs) like IntelliJ IDEA and Scala-specific tools like sbt for build automation. Additionally, there are emerging tools and libraries aimed at making data science workflows in Scala more productive.
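
To illustrate the functional programming and type safety points above, here is a small plain-Scala sketch of a data transformation built from immutable collections and higher-order functions. The `Reading` type and `meanBySensor` function are hypothetical examples.

```scala
object FunctionalPipelineSketch {
  // Immutable, statically typed record: a misspelled field name fails at compile time.
  final case class Reading(sensor: String, value: Double)

  // Drop NaN readings, group by sensor, and average each group.
  def meanBySensor(readings: Seq[Reading]): Map[String, Double] =
    readings
      .filter(r => !r.value.isNaN)
      .groupBy(_.sensor)
      .map { case (sensor, group) => sensor -> (group.map(_.value).sum / group.size) }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("a", 1.0),
      Reading("a", 3.0),
      Reading("b", Double.NaN),
      Reading("b", 4.0)
    )
    println(meanBySensor(readings))
  }
}
```

Each step returns a new collection rather than mutating its input, so intermediate results can be reused or tested in isolation.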

Overall, while Scala may not be as commonly associated with data science as Python, it offers a compelling set of features and capabilities for building robust and scalable data processing pipelines, especially in environments where integration with existing Java codebases or efficient distributed computing is required.
