#1
What is the first step in data analysis?
Data collection
Explanation: Collecting data from relevant sources is the initial stage of data analysis.
#2
Which of the following is NOT a common data visualization technique?
Matrix multiplication
Explanation: Matrix multiplication is a mathematical operation, not a data visualization technique.
#3
Which of the following is a key component of the CRISP-DM framework for data mining?
Data Preprocessing
Explanation: Data preprocessing, which CRISP-DM calls the Data Preparation phase, transforms raw data into a format suitable for analysis, often through cleaning, normalization, and feature selection.
#4
What is the primary goal of data aggregation in data analysis?
To simplify data manipulation
Explanation: Data aggregation combines and summarizes data to provide a more concise view, which simplifies further analysis and interpretation.
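As a minimal sketch of aggregation, the snippet below collapses several rows into one total per group. The `sales` records and the `aggregate_by_key` helper are hypothetical names chosen for illustration, and only the Python standard library is used.

```python
from collections import defaultdict

# Hypothetical sales records: (region, amount)
sales = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", 20.0)]

def aggregate_by_key(records):
    """Sum the amounts for each key, collapsing many rows into one per group."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)

totals = aggregate_by_key(sales)
print(totals)
```

Four rows become two summary values, which is the more concise view the explanation refers to.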
#5
What is the primary goal of feature engineering in machine learning?
To create new features from existing ones
Explanation: Feature engineering creates new input features from existing ones to improve model performance.
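One way to picture feature engineering is deriving extra columns from the ones already present. The field names (`width`, `height`, `area`, `aspect_ratio`) and the helper below are assumptions made for this sketch, not part of any particular dataset.

```python
def add_derived_features(row):
    """Return a copy of `row` with two features derived from existing ones."""
    enriched = dict(row)  # copy so the original record is left untouched
    enriched["area"] = row["width"] * row["height"]
    enriched["aspect_ratio"] = row["width"] / row["height"]
    return enriched

example = add_derived_features({"width": 4.0, "height": 2.0})
print(example)
```

Whether such derived features actually help depends on the model and the data, so they are usually checked against a validation score.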
#6
Which of the following is a data management technique for handling missing values?
All of the above
Explanation: The listed options (mean imputation, forward filling, and backward filling) are all techniques for handling missing values in a dataset.
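The three techniques can be sketched on a plain list in which `None` marks a missing value. The function names are hypothetical, and this version assumes at least one observed value for mean imputation.

```python
from statistics import mean

def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def forward_fill(values):
    """Replace each missing value with the most recent observed value.
    A gap at the very start has nothing before it and stays None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def backward_fill(values):
    """Replace each missing value with the next observed value."""
    return forward_fill(values[::-1])[::-1]

series = [1.0, None, 3.0, None, 5.0]
print(mean_impute(series))
print(forward_fill(series))
print(backward_fill(series))
```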
#7
What does 'EDA' stand for in data analysis?
Exploratory Data Analysis
Explanation: EDA involves analyzing datasets to summarize their main characteristics, often with visual methods.
#8
Which statistical measure is used to describe the spread or dispersion of a dataset?
Standard deviation
Explanation: Standard deviation measures how far the values in a dataset typically deviate from the mean; a larger value indicates a wider spread.
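A short illustration with made-up numbers, using `statistics.pstdev` (the population standard deviation) from the Python standard library. Both lists share the same mean, so only the spread differs.

```python
from statistics import mean, pstdev

narrow = [4, 5, 5, 5, 6]            # values close to the mean
wide = [2, 4, 4, 4, 5, 5, 7, 9]     # values further from the mean

narrow_spread = pstdev(narrow)
wide_spread = pstdev(wide)

print(mean(narrow), narrow_spread)
print(mean(wide), wide_spread)
```

For a sample rather than a full population, `statistics.stdev` divides by n - 1 instead of n.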
#9
What is the purpose of a pivot table in data analysis?
To arrange and summarize data
Explanation: A pivot table is a data summarization tool used in spreadsheet programs that allows you to arrange and summarize selected columns and rows of data.
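Outside a spreadsheet, the same idea can be sketched in a few lines. The `pivot_sum` helper and the `orders` records are hypothetical. One field becomes the row labels, another becomes the column labels, and a third is summed in each cell.

```python
from collections import defaultdict

def pivot_sum(rows, index, column, value):
    """Build {row_label: {column_label: summed value}} from flat records."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[column]] += row[value]
    return {label: dict(cells) for label, cells in table.items()}

# Hypothetical order records
orders = [
    {"region": "north", "quarter": "Q1", "amount": 100.0},
    {"region": "north", "quarter": "Q2", "amount": 50.0},
    {"region": "south", "quarter": "Q1", "amount": 70.0},
    {"region": "north", "quarter": "Q1", "amount": 25.0},
]

pivot = pivot_sum(orders, index="region", column="quarter", value="amount")
print(pivot)
```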
#10
In linear regression, what does the 'R-squared' value indicate?
The goodness of fit of the regression model
Explanation: R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
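A minimal sketch of the calculation, using the common definition R² = 1 − SS_res / SS_tot. The function name and the sample values are illustrative assumptions.

```python
def r_squared(actual, predicted):
    """Return 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

score = r_squared(actual, predicted)
print(round(score, 4))
```

Perfect predictions give 1, while always predicting the mean of the observed values gives 0.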
#11
What is the purpose of normalization in data analysis?
To scale data to a standard range
Explanation: Normalization adjusts values in a dataset to a common scale without distorting differences in the ranges of values.
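Min-max scaling is one common form of normalization. The sketch below maps values to the range 0 to 1 and assumes the values are not all identical. The function name is hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot rescale a constant column")
    return [(v - low) / (high - low) for v in values]

scaled = min_max_normalize([10.0, 20.0, 30.0])
print(scaled)
```

The relative spacing of the values is preserved; only the scale changes.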
#12
What is the purpose of cross-validation in machine learning?
To evaluate model performance on unseen data
Explanation: Cross-validation is a technique used to assess how well a predictive model generalizes to an independent dataset.
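As a rough sketch of k-fold cross-validation, the generator below splits sample indices into k contiguous folds. Each fold serves once as the held-out test set while the remaining samples form the training set. The name `k_fold_indices` is hypothetical, and shuffling and model fitting are deliberately left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

folds = list(k_fold_indices(5, 2))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice a model would be trained on each `train` split and scored on the matching `test` split, with the scores averaged across folds.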
#13
Which statistical test is used to determine if there is a significant difference between the means of two groups?
t-test
Explanation: The t-test compares the difference between two group means with the variability in the samples to determine whether that difference is statistically significant.
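The test statistic itself is simple to compute. The sketch below uses Welch's variant of the two-sample t-test, which does not assume equal variances. The function name and sample data are illustrative, and the p-value (which requires the t distribution) is not computed here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(a, b):
    """Difference in sample means divided by its estimated standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]

t_value = welch_t_statistic(group_a, group_b)
print(t_value)
```

A statistic far from zero suggests that the means differ by more than sampling noise alone would explain.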
#14
What is the primary purpose of a decision tree in data analysis?
To represent complex decision logic
Explanation: Decision trees are used to model and visualize the decision-making process in a tree-like structure, showing possible outcomes of decisions.
#15
Which algorithm is commonly used for imputing missing values in a dataset?
K-nearest Neighbors (KNN)
Explanation: K-nearest Neighbors is a simple algorithm that imputes a missing value from the values of the k nearest neighbors of the affected record in the feature space.
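A small sketch of the idea, under several assumptions: the reference rows are complete, distance is Euclidean over the columns the target row does have, and the missing entry is filled with the plain mean of the k nearest rows. The function name `knn_impute` and the data are hypothetical.

```python
from math import dist

def knn_impute(complete_rows, target, k):
    """Fill None entries in `target` with the column mean of its k nearest rows."""
    known = [i for i, v in enumerate(target) if v is not None]

    def distance(row):
        # Compare only on the columns that are present in the target row.
        return dist([row[i] for i in known], [target[i] for i in known])

    neighbors = sorted(complete_rows, key=distance)[:k]
    return [
        sum(n[i] for n in neighbors) / len(neighbors) if v is None else v
        for i, v in enumerate(target)
    ]

complete_rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 50.0],
]

imputed = knn_impute(complete_rows, [1.0, 2.0, None], k=2)
print(imputed)
```

The distant third row is ignored, so the filled value reflects only the records that resemble the incomplete one.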