Monday, September 16, 2024

Data Analysis with Python Cognitive Class Exam Quiz Answers

Data Analysis with Python Cognitive Class Certification Answers

Question 1: What does CSV stand for?

  • Comma-separated values
  • Car sold values
  • Car state values
  • None of the above

Question 2: In the data set, which of the following represents an attribute or feature?

  • Row
  • Column
  • Each element in the dataset

Question 3: What is the name of what we want to predict?

  • Target
  • Feature
  • Dataframe

Question 4: What is the command to display the first five rows of a dataframe df?

  • df.head()
  • df.tail()

Question 5: What command do you use to get the data type of each column of the dataframe df?

  • df.dtypes
  • df.head()
  • df.tail()

Question 6: How do you get a statistical summary of a dataframe df?

  • df.describe()
  • df.head()
  • df.tail()

Question 7: If you use the method describe() without changing any of the arguments, you will get a statistical summary of all the columns of type “object”.

  • False
  • True
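
The commands from Questions 4 to 7 can be tried on a small dataframe. This is only a sketch: the toy data and the column names "make" and "price" are made up for illustration and are not part of the course dataset.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi", "volvo", "bmw", "honda"],
    "price": [13950, 16430, 17450, 12940, 20970, 7295],
})

first_rows = df.head()                      # first five rows by default
column_types = df.dtypes                    # one data type per column
numeric_summary = df.describe()             # numeric columns only by default
full_summary = df.describe(include="all")   # also summarizes object columns

print(first_rows)
print(column_types)
print(list(numeric_summary.columns))
print(list(full_summary.columns))
```

With default arguments, describe() skips the text column "make" here; passing include="all" brings it into the summary.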

Question 1: Consider the dataframe df. What is the result of the following operation: df['symbolling'] = df['symbolling'] + 1?

  • Every element in the column “symbolling” will increase by one.
  • Every element in the row “symbolling” will increase by one.
  • Every element in the dataframe will increase by one.

Question 2: Consider the dataframe df. What does the command df.rename(columns={'a': 'b'}) change about the dataframe df?

  • Renames column “a” of the dataframe to “b”.
  • Renames row “a” to “b”.
  • Nothing. You must set the parameter “inplace = True”.
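
A quick way to check how rename behaves is to run it on a one-column dataframe. This is a minimal sketch; the dataframe and the variable names renamed and before are assumptions made for the example.

```python
import pandas as pd

# Hypothetical one-column dataframe.
df = pd.DataFrame({"a": [1, 2, 3]})

# rename returns a new dataframe; df itself is left untouched.
renamed = df.rename(columns={"a": "b"})
before = list(df.columns)

# Reassigning the result (or passing inplace=True) makes the change stick.
df = df.rename(columns={"a": "b"})

print(before)
print(list(renamed.columns))
print(list(df.columns))
```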

Question 3: Consider the dataframe “df”. What is the result of the following operation: df['price'] = df['price'].astype(int)?

  • Convert or cast the row ‘price’ to an integer value.
  • Convert or cast the column ‘price’ to an integer value.
  • Convert or cast the entire dataframe to an integer value.

Question 4: Consider the column of the dataframe df['a']. The column has been standardized. What is the standard deviation of the values as a result of applying the following operation: df['a'].std()?

  • 1
  • 0
  • 3

Question 5: Consider the column of the dataframe, df['Fuel'], with two values: 'gas' and 'diesel'. What will be the names of the new columns created by pd.get_dummies(df['Fuel'])?

  • 1 and 0
  • Just ‘diesel’
  • Just ‘gas’
  • ‘gas’ and ‘diesel’

Question 6: What are the values of the new columns from the previous question?

  • 1 and 0
  • Just ‘diesel’
  • Just ‘gas’
  • ‘gas’ and ‘diesel’
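
Questions 4 to 6 can be checked with a short sketch. The data is made up, and dtype=int is an added assumption so that the indicator columns hold 0 and 1 (some pandas versions return True/False by default).

```python
import pandas as pd

# Hypothetical toy data reusing the column names from the questions.
df = pd.DataFrame({
    "a": [2.0, 4.0, 6.0, 8.0],
    "Fuel": ["gas", "diesel", "gas", "gas"],
})

# Standardize: subtract the mean, then divide by the standard deviation.
df["a"] = (df["a"] - df["a"].mean()) / df["a"].std()
std_after = df["a"].std()

# One indicator column per distinct value of "Fuel".
dummies = pd.get_dummies(df["Fuel"], dtype=int)

print(std_after)
print(dummies)
```

The new columns take their names from the categories, and each row holds a 1 in the column that matches its original value and a 0 elsewhere.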

Question 1: Consider the dataframe “df”. What method provides the summary statistics?

  • df.describe()
  • df.head()
  • df.tail()

Question 2: Consider the following dataframe:

df_test = df[['body-style', 'price']]

The following operation is applied:

df_grp = df_test.groupby(['body-style'], as_index=False).mean()

What are the resulting values of df_grp['price']?

  • The average price for each body style.
  • The average price.
  • The average body style.
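
The grouping in Question 2 can be sketched on a few made-up rows. The prices and the extra "make" column are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "make": ["audi", "bmw", "volvo", "saab", "honda"],
    "body-style": ["sedan", "sedan", "wagon", "wagon", "hatchback"],
    "price": [10000, 14000, 12000, 16000, 8000],
})

# Double brackets select a list of columns and return a dataframe.
df_test = df[["body-style", "price"]]

# One row per body style, holding the mean price of that group.
df_grp = df_test.groupby(["body-style"], as_index=False).mean()
print(df_grp)
```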

Question 3: Correlation implies causation:

  • False
  • True

Question 4: What is the minimum possible value of Pearson’s Correlation?

  • 1
  • -100
  • -1

Question 5: What is the Pearson correlation between variables X and Y if X=Y:

  • -1
  • 1
  • 0
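
The range of Pearson's correlation can be checked numerically. This sketch uses a made-up series and pandas' Series.corr, which computes the Pearson coefficient by default.

```python
import pandas as pd

# Hypothetical data.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

r_same = x.corr(x)        # X = Y: perfect positive correlation
r_opposite = x.corr(-x)   # Y = -X: perfect negative correlation

print(r_same, r_opposite)
```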

Question 1: Let X be a dataframe with 100 rows and 5 columns. Let y be the target with 100 samples. Assuming all the relevant libraries and data have been imported, the following lines of code have been executed:

LR = LinearRegression()

LR.fit(X, y)

yhat = LR.predict(X)

How many samples does yhat contain?

  • 5
  • 500
  • 100
  • 0

Question 2: What value of R^2 (coefficient of determination) indicates your model performs best?

  • -100
  • -1
  • 0
  • 1

Question 3: Which statement is true about polynomial linear regression?

  • Polynomial linear regression is not linear in any way.
  • Although the predictor variables of polynomial linear regression are not linear, the relationship between the parameters or coefficients is linear.
  • Polynomial linear regression uses wavelets.

Question 4: The larger the mean squared error, the better your model performs:

  • False
  • True
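
Questions 1, 2 and 4 can be explored with a small synthetic dataset. This is a sketch under assumptions: the random data is invented, and mean_squared_error and r2_score from sklearn.metrics stand in for whatever metrics the course notebook uses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data: 100 samples, 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X.sum(axis=1) + rng.normal(scale=0.5, size=100)

LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)   # one prediction per row of X

mse = mean_squared_error(y, yhat)   # lower is better
r2 = r2_score(y, yhat)              # closer to 1 is better

print(len(yhat), mse, r2)
```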

Question 5: Assume all the libraries are imported. y is the target and X is the features or independent variables. Consider the following lines of code:

Input=[('scale',StandardScaler()),('model',LinearRegression())]

pipe=Pipeline(Input)

pipe.fit(X,y)

ypipe=pipe.predict(X)

What is the result of ypipe?

  • Polynomial transform, standardize the data, then perform a prediction using a linear regression model.
  • Standardize the data, then perform prediction using a linear regression model.
  • Polynomial transform, then standardize the data.
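
The pipeline from Question 5 can be run end to end on synthetic data. This is a minimal sketch; the random X and y are assumptions made so the example is self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Each step is a (name, estimator) pair; the steps run in the order listed.
Input = [("scale", StandardScaler()), ("model", LinearRegression())]
pipe = Pipeline(Input)

pipe.fit(X, y)            # scales X, then fits the linear regression
ypipe = pipe.predict(X)   # scales X with the fitted scaler, then predicts

step_names = [name for name, _ in pipe.steps]
print(step_names, ypipe.shape)
```

The pipeline contains only the steps listed in Input, so no polynomial transform takes place unless a PolynomialFeatures step is added.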

Question 1: In the following plot, the vertical axis shows the mean squared error and the horizontal axis represents the order of the polynomial. The red line represents the training error and the blue line represents the test error. What is the best order of the polynomial, given the possible choices on the horizontal axis?

  • 2
  • 8
  • 16

Question 2: What is the correct use of the “train_test_split” function such that 40% of the data samples will be utilized for testing, the parameter “random_state” is set to zero, and the input variables for the features and targets are x_data and y_data respectively?

  • train_test_split(x_data, y_data, test_size=0, random_state=0.4)
  • train_test_split(x_data, y_data, test_size=0.4, random_state=0)
  • train_test_split(x_data, y_data)

Question 3: What is the output of cross_val_score(lre, x_data, y_data, cv=2)?

  • The predicted values of the test data using cross-validation.
  • The average R^2 on the test data for each of the two folds.
  • This function finds the free parameter alpha.
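
Questions 2 and 3 can be tried on synthetic data. This is a sketch: the random x_data and y_data are invented, and lre is assumed to be a LinearRegression object, as its name suggests.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical data: 50 samples, 3 features.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(50, 3))
y_data = x_data @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Hold out 40% of the samples for testing, with a fixed random seed.
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.4, random_state=0
)

# For a regressor the default score is R^2; cv=2 yields one score per fold.
lre = LinearRegression()
scores = cross_val_score(lre, x_data, y_data, cv=2)

print(len(x_train), len(x_test), scores)
```

cross_val_score returns an array with one entry per fold; calling scores.mean() gives the average across folds.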

Question 4: What is the code to create a ridge regression object “RR” with an alpha term equal to 10?

  • RR=LinearRegression(alpha=10)
  • RR=Ridge(alpha=10)
  • RR=Ridge(alpha=1)

Question 5: What dictionary value would we use to perform a grid search for the following values of alpha: 1, 10, 100? No other parameter values should be tested.

  • alpha=[1,10,100]
  • [{'alpha': [1,10,100]}]
  • [{'alpha': [0.001,0.1,1, 10, 100, 1000,10000,100000,100000], 'normalize': [True,False]}]
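
Questions 4 and 5 can be combined in one short sketch. The synthetic data, the name parameters and the choice of cv=4 are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hypothetical data: 60 samples, 3 features.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(60, 3))
y_data = x_data @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=60)

# Ridge regression object with a fixed regularization strength.
RR = Ridge(alpha=10)

# Grid search over exactly three alpha values and nothing else.
parameters = [{"alpha": [1, 10, 100]}]
grid = GridSearchCV(Ridge(), parameters, cv=4)
grid.fit(x_data, y_data)

tested = [p["alpha"] for p in grid.cv_results_["params"]]
best_alpha = grid.best_params_["alpha"]
print(tested, best_alpha)
```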

Question 1: What does the following command do?

df.dropna(subset=["price"], axis=0)

  • Drop the “not a number” values from the column “price”.
  • Drop the row “price”.
  • Rename the dataframe “price”.
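
A small made-up dataframe shows what the dropna call does. The data and the name cleaned are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({
    "make": ["alfa-romero", "audi", "bmw"],
    "price": [13495.0, np.nan, 16500.0],
})

# Drop every row (axis=0) whose "price" entry is missing.
# The result is a new dataframe; df keeps all three rows.
cleaned = df.dropna(subset=["price"], axis=0)
print(cleaned)
```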

Question 2: How would you provide many of the summary statistics for all the columns in the dataframe “df”?

  • df.describe(include="all")
  • df.head()
  • type(df)
  • df.shape

Question 3: How would you find the shape of the dataframe df?

  • df.describe()
  • df.head()
  • type(df)
  • df.shape

Question 4: What task does the following command, df.to_csv("A.csv"), perform?

  • Change the name of the column to “A.csv”.
  • Load the data from a csv file called “A” into a dataframe.
  • Save the dataframe df to a csv file called “A.csv”.

Question 5: What task does the following line of code perform?

result = np.linspace(min(df["city-mpg"]), max(df["city-mpg"]), 5)

  • Builds a bin array ranging from the smallest value to the largest value of “city-mpg” in order to build 4 bins of equal length.
  • Builds a bin array ranging from the smallest value to the largest value of “city-mpg” in order to build 5 bins of equal length.
  • Determines which bin each value of “city-mpg” belongs to.
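
The relationship between the number of edges and the number of bins can be checked directly. In this sketch the data, the label names and the follow-up pd.cut call are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"city-mpg": [13, 19, 24, 31, 38, 49]})

# 5 equally spaced edges from the minimum to the maximum value.
result = np.linspace(min(df["city-mpg"]), max(df["city-mpg"]), 5)

# 5 edges delimit 4 intervals, so 4 labels are needed.
labels = ["low", "medium", "high", "very high"]
df["mpg-binned"] = pd.cut(df["city-mpg"], result, labels=labels, include_lowest=True)

print(result)
print(df)
```

np.linspace only builds the array of edges; assigning each value to a bin is a separate step, done here with pd.cut.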

Question 6: What task does the following line of code perform:

df['peak-rpm'].replace(np.nan, 5, inplace=True)

  • Replace the “not a number” values with 5 in the column ‘peak-rpm’.
  • Rename the column ‘peak-rpm’ to 5.
  • Add 5 to the dataframe.
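
The replacement in Question 6 can be sketched as follows. The data is made up, and the example assigns the result back to the column instead of using inplace=True, because calling an in-place method on a selected column may raise a warning or leave df unchanged in newer pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"peak-rpm": [5000.0, np.nan, 5500.0]})

# Replace "not a number" entries with 5 and store the result in the column.
df["peak-rpm"] = df["peak-rpm"].replace(np.nan, 5)
print(df)
```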

Question 7: How do you “one-hot encode” the column ‘fuel-type’ in the dataframe df?

  • pd.get_dummies(df["fuel-type"])
  • df.mean(["fuel-type"])
  • df[df["fuel-type"])==1 ]=1

Question 8: What does the vertical axis on a scatterplot represent?

  • Independent variable
  • Dependent variable

Question 9: What does the horizontal axis on a scatterplot represent?

  • Independent variable
  • Dependent variable

Question 10: If we have 10 columns and 100 samples, how large is the output of df.corr()?

  • 10 x 100
  • 10 x 10
  • 100×100

Question 11: What is the largest possible element resulting from the following operation: “df.corr()”?

  • 100
  • 1000
  • 1
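
Questions 10 and 11 can be verified on random data. This is a sketch; the generated values and the column names c0 to c9 are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 samples, 10 numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(100, 10)),
    columns=[f"c{i}" for i in range(10)],
)

# One row and one column per dataframe column.
corr = df.corr()
largest = corr.values.max()

print(corr.shape, largest)
```

Every column is perfectly correlated with itself, so the diagonal of the matrix holds the largest value.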

Question 12: If the Pearson Correlation of two variables is zero:

  • The two variables have zero mean.
  • The two variables are not correlated.

Question 13: If the p-value of the Pearson Correlation is 1:

  • The variables are correlated.
  • The variables are not correlated.
  • None of the above.

Question 14: What does the following line of code do: lm = LinearRegression()?

  • Fit a regression object “lm”.
  • Create a linear regression object.
  • Predict a value.

Question 15: If the predicted function is:

Yhat = a + b1 X1 + b2 X2 + b3 X3 + b4 X4

The method is:

  • Polynomial Regression
  • Multiple Linear Regression

Question 16: What steps do the following lines of code perform:

Input=[('scale',StandardScaler()),('model',LinearRegression())]

pipe=Pipeline(Input)

pipe.fit(Z,y)

ypipe=pipe.predict(Z)

  • Standardize the data, then perform a polynomial transform on the features Z.
  • Find the correlation between Z and y.
  • Standardize the data, then perform a prediction using a linear regression model using the features Z and targets y.

Question 17: What is the maximum value of R^2 that can be obtained?

  • 10
  • 1
  • 0

Question 18: We create a polynomial feature PolynomialFeatures(degree=2). What is the order of the polynomial?

  • 0
  • 1
  • 2
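
The effect of degree=2 can be seen by transforming a single made-up sample. The names pr, Z and Z_poly are assumptions for this sketch.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

pr = PolynomialFeatures(degree=2)

# One hypothetical sample with two features, x1 = 2 and x2 = 3.
Z = np.array([[2.0, 3.0]])
Z_poly = pr.fit_transform(Z)

# Output columns: 1, x1, x2, x1^2, x1*x2, x2^2
print(Z_poly)
```

No term in the output has a total power above 2, which is what the degree argument controls.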

Question 19: You have a linear model. The average R^2 value on your training data is 0.5. You perform a 100th order polynomial transform on your data, then use these values to train another model. Your average R^2 is 0.99. Which comment is correct?

  • 100th order polynomial will work better on unseen data.
  • You should always use the simplest model.
  • The results on your training data are not the best indicator of how your model performs. You should use your test data to get a better idea.

Question 20: You train a ridge regression model. You get an R^2 of 1 on your validation data and an R^2 of 0.5 on your training data. What should you do?

  • Nothing. Your model performs flawlessly on your validation data.
  • Your model is underfitting; perform a polynomial transform.
  • Your model is overfitting, increase the parameter alpha.

Introduction to Data Analysis with Python

Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, infer conclusions, and support decision-making. Python, with its rich ecosystem of libraries, is widely used for data analysis tasks due to its simplicity, versatility, and the availability of powerful libraries such as Pandas, NumPy, Matplotlib, and Seaborn.

Here’s a step-by-step guide to get started with data analysis in Python:

  1. Installing Python: If you haven’t already, download and install Python from the official website (python.org). Optionally, you can use distributions like Anaconda or Miniconda, which come pre-packaged with many useful data analysis libraries.
  2. Installing Libraries: After installing Python, you’ll need to install the necessary libraries. The primary libraries for data analysis in Python are:
    • Pandas: For data manipulation and analysis.
    • NumPy: For numerical computing and working with arrays.
    • Matplotlib: For creating static, interactive, and animated visualizations.
    • Seaborn: For statistical data visualization built on top of Matplotlib.
  3. Getting Data: Before analyzing data, you need some data to work with. Data can come from various sources like CSV files, Excel files, SQL databases, web APIs, or online repositories. Pandas provides functions to read data from different formats, such as read_csv(), read_excel(), read_sql(), etc.
  4. Exploratory Data Analysis (EDA): EDA involves understanding your data by summarizing its main characteristics using statistical and visual methods. Explore the data’s structure, distribution, relationships, and anomalies. Pandas provides functions like head(), describe(), info(), and various plotting functions to aid in EDA.
  5. Data Cleaning: Data cleaning involves handling missing values, removing duplicates, correcting erroneous data, and transforming data into a suitable format for analysis. Pandas provides methods like isna(), dropna(), fillna(), drop_duplicates(), and many more for data cleaning tasks.
  6. Data Visualization: Visualizations help in understanding patterns and relationships within the data. Matplotlib and Seaborn are powerful libraries for creating various types of plots such as line plots, scatter plots, histograms, bar plots, box plots, and more.
  7. Data Analysis: Perform data analysis tasks such as aggregation, filtering, grouping, and computation of summary statistics using Pandas. You can also apply machine learning algorithms using libraries like Scikit-learn for predictive analytics or statistical modeling.
  8. Communication: Communicate your findings effectively through reports, dashboards, or presentations. Jupyter Notebooks are widely used for creating interactive documents that combine code, visualizations, and text explanations.
  9. Practice and Learn: Data analysis is a skill that improves with practice. Keep exploring new datasets, experimenting with different techniques, and learning from tutorials, books, and online courses to enhance your proficiency in data analysis with Python.
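
Steps 3 to 7 above can be sketched in a few lines. This is an illustrative example only: the CSV text is made up and read from memory so that the snippet runs without any external file.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a file path or URL.
csv_text = """make,body-style,price
audi,sedan,13950
audi,sedan,13950
bmw,sedan,
volvo,wagon,16515
"""

# Getting data
df = pd.read_csv(io.StringIO(csv_text))

# Exploratory data analysis
print(df.head())
print(df.describe())

# Data cleaning: drop duplicate rows, fill missing prices with the mean
df = df.drop_duplicates()
df["price"] = df["price"].fillna(df["price"].mean())

# Data analysis: average price per body style
summary = df.groupby("body-style", as_index=False)["price"].mean()
print(summary)
```

Filling missing values with the column mean is only one possible choice; dropping the affected rows with dropna() is another.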
