Machine learning with Apache SystemML

Module 1 – What is SystemML?

Question 1: In machine learning, as analytical models are exposed to new data, they are able to independently adapt. True or false?

  • True
  • False

Question 2: Which of the following are alternatives to SystemML?

  • R
  • MLlib
  • SparkR
  • Mahout
  • All of the above

Question 3: The R language was designed for machine learning and works great for big data. True or false?

  • True
  • False

Module 2 – SystemML and the Spark MLContext

Question 1: What are the ways you can use SystemML’s Spark MLContext?

  • spark-shell
  • Through an application using the API
  • Through the SystemML console
  • A notebook interface
  • None of the above

Question 2: You must pass in the reference of the SparkContext to the MLContext constructor. True or false?

  • True
  • False

Question 3: Why would you use the Spark MLContext?

  • Programmatic interface into SystemML’s libraries
  • To benefit from the optimizations that come with SystemML
  • When you need to convert the data to a binary block matrix
  • A and B only
  • None of the above

Module 3 – SystemML algorithms

Question 1: The Classification algorithm is an ensemble learning method that creates a model composed of a set of tree models for classification. True or false?

  • True
  • False

Question 2: K-means is an unsupervised learning algorithm used to assign a category label to each record so that similar records tend to get the same label. True or false?

  • True
  • False
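
As background for the k-means question above, here is a minimal plain-Python sketch of the general technique. It is not SystemML's implementation; the function names, the sample points, and the starting centroids are all invented for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def assign_labels(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # leave an empty cluster's centroid unchanged
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Alternate the assignment and update steps a fixed number of times."""
    for _ in range(iterations):
        labels = assign_labels(points, centroids)
        centroids = update_centroids(points, labels, centroids)
    return assign_labels(points, centroids), centroids


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (6.0, 6.0)])
print(labels)
```

No labels are supplied as input; the grouping comes only from distances between records, and the label values are arbitrary cluster indices.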

Question 3: The Kaplan-Meier algorithm predicts how likely it is that someone will purchase a product in a similar category. True or false?

  • True
  • False

Module 4 – Declarative Machine Learning (DML)

Question 1: What does DML stand for?

  • Data manipulation language
  • Data machine language
  • Declarative machine learning
  • Declarative machine language

Question 2: To run a DML script, which of the following jar files is required at runtime?

  • MLContext.jar
  • DML.jar
  • SystemML.jar
  • spark-context.jar

Question 3: Which of the following ways to pass command-line arguments is recommended?

  • positional arguments
  • named arguments
  • a comma-separated list
  • a file

Module 5 – SystemML architecture and optimization

Question 1: In the ALS performance comparison, on which dataset does the MLlib code run out of memory?

  • Large
  • Medium
  • Small
  • None

Question 2: Which of the following does NOT belong to the SystemML Optimizer stack?

  • Create the RDDs for the high level algorithm
  • Compute memory estimates
  • Generate runtime program
  • Live variable analysis

Question 3: How does SystemML know it is better to run the code on one machine?

  • Advanced Rewrites
  • Propagation of statistics
  • Live variable analysis
  • Efficient runtime
  • The developer tells it to

Final Exam

Question 1: What is machine learning?

  • Artificial intelligence for machines to make decisions
  • Same as data science to gather insight using machines
  • Enable computers to learn without being explicitly programmed
  • Learning about how machines operate

Question 2: What is the purpose of SystemML?

  • Programming language for big data
  • In-memory analytics engine
  • Machine learning for Spark
  • Machine learning on Hadoop
  • All of the above

Question 3: What are the challenges of machine learning on big data using R?

  • Programmers are needed to convert the high-level code to low-level code for parallel computing
  • Each iteration of the code takes time to be rewritten and recompiled
  • Chances for errors are higher during the translation of the algorithms
  • All of the above

Question 4: What is the vision of SystemML?

  • Run the same algorithm developed for small data on big data
  • Provide flexible specification of ML algorithms
  • Automatic generation of hybrid runtime plans
  • All of the above

Question 5: Which of the following languages is SystemML most similar to?

  • R
  • Python
  • Java
  • Scala
  • Perl
  • R and Python
  • Java and Scala

Question 6: Which of the following lines of code will launch the Spark shell with SystemML?

  • ./bin/spark-shell --jars SystemML.jar
  • ./bin/spark-shell --executor-memory 4G --jars SystemML.jar
  • ./bin/spark-shell --driver-memory 4G --jars SystemML.jar
  • ./bin/spark-shell --executor-memory 4G --driver-memory 4G --jars SystemML.jar
  • All of the above

Question 7: Why would you convert a DataFrame to a binary-block matrix?

  • To enable parallelization within the Spark engine
  • To use the rich set of APIs provided by the binary-block matrix
  • Allows algorithm performance to be measured separately from data conversion time
  • Allows a more efficient runtime processing of the data

Question 8: Which of the following is TRUE with regard to helper methods in SystemML?

  • SystemML’s output is encapsulated in the MLContext object
  • SystemML’s output is encapsulated in the MLOutput object
  • Helper methods retrieve the values from the MLOutput object
  • Helper methods retrieve the values from the MLContext object
  • A and D only
  • B and C only

Question 9: Which is NOT a benefit of using SystemML algorithms?

  • Run in parallel
  • It is faster than all other algorithms
  • No need for translation into a lower-level language
  • Algorithms are optimized based on data and cluster characteristics

Question 10: Which of the following classes of algorithms provide a recommendation?

  • Regression
  • Classification
  • Matrix Factorization
  • Descriptive statistics

Question 11: Which of the following algorithms can group a set of data into known categories?

  • Regression
  • Clustering
  • Survival Analysis
  • Classification

Question 12: Which of the following algorithms can be used for prediction, forecasting, or error reduction?

  • Clustering
  • Regression
  • Survival Analysis
  • Descriptive statistics

Question 13: Which of the following value types is NOT supported in the DML language?

  • String
  • Double
  • Varchar
  • Boolean

Question 14: Matrix-vector operations avoid the need to create a replicated matrix for a certain subset of operations. True or false?

  • True
  • False
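
To make the term "replicated matrix" in the question above concrete, the following plain-Python sketch contrasts two ways of adding a column vector to every column of a matrix. It is only an illustration of the general idea, not DML or SystemML code; both function names and the sample values are hypothetical.

```python
def add_with_replication(matrix, vector):
    """Expand the vector into a full rows x cols matrix, then add cell by cell."""
    n_cols = len(matrix[0])
    replicated = [[v] * n_cols for v in vector]  # extra intermediate, same shape as matrix
    return [
        [a + b for a, b in zip(row, rep_row)]
        for row, rep_row in zip(matrix, replicated)
    ]


def add_column_vector(matrix, vector):
    """Add vector[i] to every cell of row i without building the intermediate."""
    if len(matrix) != len(vector):
        raise ValueError("vector length must match the number of matrix rows")
    return [[cell + v for cell in row] for row, v in zip(matrix, vector)]


# Hypothetical sample data: a 2 x 3 matrix and a column vector of length 2.
X = [[1, 2, 3], [4, 5, 6]]
v = [10, 20]

direct = add_column_vector(X, v)
via_copy = add_with_replication(X, v)
print(direct)
```

Both functions return the same result; the second one simply never materializes the expanded copy of the vector.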

Question 15: Global variables cannot be accessed within a function. True or false?

  • True
  • False

Question 16: Which of the following is NOT a category of built-in functions in DML?

  • Derivative built-in functions
  • Matrix built-in functions
  • Statistical built-in functions
  • Casting built-in functions

Question 17: What happens in the statistics propagation phase of the SystemML optimizer?

  • To determine the confidence level of the computed results
  • All the statistics are propagated to the top node to determine the most efficient runtime for query execution
  • To determine the probability of the operation succeeding within a given period of time
  • Find the widest matrix required and determine if it all fits into the heap

Question 18: What is the benefit of doing the matrix rewrite?

  • Reduce the lines of code needed to represent the matrix
  • To determine the confidence level of the computed results
  • Clean up any unused memory from the matrix
  • To enable parallelization of the given matrix
  • Represent the final matrix without computing the intermediate matrices

Question 19: Which is NOT part of the SystemML runtime for Spark?

  • Automates critical performance decisions
  • Distributed vs. local runtime
  • Efficient linear algebra optimizations
  • Automated RDD caching
  • None of the above

Question 20: SystemML is an Apache open source project. True or false?

  • True
  • False

