#1
What is the primary goal of data preprocessing in data analytics?
To clean and transform data
Explanation: Preprocessing prepares raw data for analysis by handling missing values, removing noise and inconsistencies, and converting fields to usable types.
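A minimal preprocessing sketch using only the standard library. The records and field names (`city`, `amount`) are hypothetical. Rows with missing or unparseable amounts are dropped here, which is only one strategy; imputing a value is a common alternative.

```python
# Hypothetical raw records: stray whitespace, inconsistent casing,
# a missing value, a non-numeric amount, and a near-duplicate.
raw = [
    {"city": " Boston ", "amount": "120.5"},
    {"city": "boston", "amount": "120.5"},
    {"city": "Denver", "amount": ""},
    {"city": "Austin", "amount": "n/a"},
    {"city": "Austin", "amount": "80"},
]

def preprocess(rows):
    cleaned = []
    seen = set()
    for row in rows:
        city = row["city"].strip().title()  # normalize whitespace and casing
        try:
            amount = float(row["amount"])   # enforce a numeric type
        except ValueError:
            continue                        # drop missing / unparseable amounts
        key = (city, amount)
        if key in seen:                     # drop duplicates exposed by normalization
            continue
        seen.add(key)
        cleaned.append({"city": city, "amount": amount})
    return cleaned

clean = preprocess(raw)
```

Normalizing before de-duplicating matters: " Boston " and "boston" only become recognizable duplicates after trimming and case-folding.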
#2
What is the purpose of a scatter plot in data visualization?
To display the relationship between two variables
Explanation: Each observation is plotted as a point at its (x, y) values, so patterns such as correlation, clusters, or outliers become visible.
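A scatter plot is usually drawn with a plotting library (for example `matplotlib.pyplot.scatter(x, y)`). The sketch below instead quantifies the linear association such a plot displays, using Pearson's correlation coefficient r. It is a plain-Python illustration, not a replacement for looking at the plot: r only measures linear association and can miss curved patterns.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)  # close to +1: strong positive linear relationship
```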
#3
What is the main advantage of using the Naive Bayes algorithm in text classification?
It requires minimal training time
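Explanation: Naive Bayes assumes features (for example, words) are conditionally independent given the class, so training reduces to counting frequencies, which is fast even on large text collections.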
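A minimal multinomial Naive Bayes sketch for text, assuming whitespace tokenization and add-one (Laplace) smoothing; the tiny spam/ham corpus is made up. It shows why training is cheap: `train` is a single counting pass over the documents.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). One counting pass is the whole training step."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(model, text):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Independence assumption: add each word's log-likelihood separately.
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = word_counts[label][word]
            score += math.log((count + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    ("win cash prize now", "spam"),
    ("cheap prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
])
```

For instance, `predict(model, "claim your prize")` should return `"spam"` on this toy corpus.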
#4
Which of the following is a common technique for outlier detection in a dataset?
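Z-score method (standardizing values and flagging large |z|)
Explanation: Each value is standardized as z = (x - mean) / standard deviation, which shows how many standard deviations it lies from the mean; points beyond a chosen threshold (commonly |z| > 3) are flagged as outliers.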
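A short sketch of z-score outlier detection with the standard library, assuming the population standard deviation and a threshold of 3 (both are choices, not fixed rules). The sample data is hypothetical. Because the outlier itself inflates the mean and standard deviation, this method can miss outliers in small samples; median- or IQR-based rules are more robust alternatives.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []                    # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 11, 95]
outliers = zscore_outliers(readings)
```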
#5
What is the purpose of the SQL GROUP BY clause in data analysis?
To group rows based on a column's values
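Explanation: Rows that share the same value(s) in the grouping column(s) are collapsed into one group, so aggregate functions such as COUNT, SUM, or AVG are computed per group.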
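A runnable GROUP BY example using Python's built-in `sqlite3` module and an in-memory database; the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("South", 80.0), ("North", 50.0), ("South", 20.0), ("East", 70.0)],
)

# One output row per distinct region, with aggregates computed per group.
rows = conn.execute("""
    SELECT region, COUNT(*) AS n_sales, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()
```

Here `rows` holds one tuple per region: East with 1 sale, North with 2, South with 2.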
#6
Which statistical measure describes the central tendency of a dataset?
Mean
Explanation: The mean (average) is the sum of the values divided by their count and represents a typical value of the dataset.
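The arithmetic mean of n values, with a small worked example on made-up numbers:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\{2, 4, 9\}:\quad \bar{x} = \frac{2 + 4 + 9}{3} = \frac{15}{3} = 5
```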
#7
Which algorithm is commonly used for classification tasks in machine learning?
Decision Trees
Explanation: A decision tree classifies a data point by following a sequence of feature-based splits from the root to a leaf, which holds the predicted class.
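A sketch of the core step a decision tree repeats at every node: choosing the split threshold on a feature that minimizes weighted Gini impurity (a CART-style criterion; entropy is another common choice). Only one numeric feature is handled here, and the data is hypothetical; a full tree would search all features and recurse into each child.

```python
def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split x <= threshold."""
    best = (None, float("inf"))
    n = len(xs)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

threshold, impurity = best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```

On this data the split `x <= 3` separates the classes perfectly, giving zero impurity.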
#8
What is the purpose of dimensionality reduction in data analytics?
To decrease the number of features while retaining key information
Explanation: Reducing the number of input variables simplifies analysis and modeling, lowers computational cost, and can reduce the risk of overfitting.
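Principal component analysis (PCA) is one common dimensionality-reduction technique. The sketch below reduces 2-D points to 1-D by projecting them onto the first principal component, using the closed-form eigenvalue of a symmetric 2x2 covariance matrix. It is illustrative only; for real data a library implementation such as scikit-learn's `PCA` would be the usual choice.

```python
import math

def pca_1d(points):
    """Project 2-D points onto their first principal component (2 features -> 1)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix (closed form)
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    # Matching eigenvector: (b, lam - a) solves (A - lam*I) v = 0 when b != 0
    if abs(b) > 1e-12:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    scores = [x * vx + y * vy for x, y in centered]  # the 1-D representation
    explained = lam / (a + c)                        # share of total variance kept
    return scores, explained

scores, explained = pca_1d([(1, 2), (2, 4), (3, 6), (4, 8)])
```

Because these hypothetical points lie on a line, one component keeps essentially all of the variance (`explained` is about 1.0).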
#9
In machine learning, what does the term 'overfitting' refer to?
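The model is too complex and fits noise in the training data
Explanation: An overfit model captures noise or random fluctuations in the training data instead of the underlying pattern, so it performs well on training data but poorly on unseen data.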
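A small demonstration of overfitting on made-up data: points from the line y = 2x + 1 with fixed artificial noise. A degree-5 polynomial passing through all six training points (Lagrange interpolation) reaches zero training error but generalizes badly to held-out points, one of which lies just outside the training range, while a least-squares straight line does not.

```python
def lagrange_predict(xs, ys, x):
    """Degree-(n-1) polynomial through every training point: a deliberately over-complex model."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Ordinary least-squares line: a simpler model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

true_f = lambda x: 2 * x + 1
noise = [0.3, -0.4, 0.5, -0.2, 0.4, -0.5]  # fixed, made-up noise
train_x = [0, 1, 2, 3, 4, 5]
train_y = [true_f(x) + e for x, e in zip(train_x, noise)]
test_x = [1.5, 3.5, 6.0]                   # held-out points
test_y = [true_f(x) for x in test_x]

poly = lambda x: lagrange_predict(train_x, train_y, x)
line = fit_line(train_x, train_y)

for name, model in [("degree-5 polynomial", poly), ("straight line", line)]:
    print(f"{name}: train MSE={mse(model, train_x, train_y):.3f}, "
          f"test MSE={mse(model, test_x, test_y):.3f}")
```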
#10
What is the purpose of regularization techniques in machine learning?
To penalize complex models and prevent overfitting
Explanation: Regularization adds a penalty on model complexity (for example, the size of the weights) to the loss function, discouraging overly complex models.
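A minimal sketch of L2 (ridge) regularization for a single weight with no intercept; L1 (lasso) is another common penalty. Minimizing sum((y - w*x)^2) + lam * w^2 and setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam), so a larger penalty `lam` shrinks the weight toward zero. The data is hypothetical; in practice `lam` is a hyperparameter, often chosen by cross-validation.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge weight for y ~ w * x (no intercept) with L2 penalty lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs, ys = [1, 2, 3], [2, 4, 6]
weights = {lam: ridge_slope(xs, ys, lam) for lam in (0, 14, 100)}
```

With `lam = 0` this is ordinary least squares (w = 2.0); `lam = 14` halves the weight to 1.0.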
#11
What is the primary purpose of the 'ELT' (Extract, Load, Transform) process in data analytics?
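To load raw data into the target system first and transform it there
Explanation: In ELT, raw data is extracted from its sources and loaded into the target system (typically a data warehouse or data lake) before it is transformed, using that system's processing power. This reverses the Transform and Load order of traditional ETL.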
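A toy ELT flow, using an in-memory SQLite database as a stand-in for the target warehouse (an assumption for illustration; table names and rows are hypothetical). Raw rows are loaded untouched into a staging table, and the transformation runs afterwards as SQL inside the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the target warehouse

# Extract + Load: raw records land in a staging table as-is (amounts still text).
raw_rows = [("2024-01-03", " 19.99 "), ("2024-01-03", "5.01"), ("2024-01-04", "")]
conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_rows)

# Transform: performed afterwards, inside the target system, with SQL.
conn.execute("""
    CREATE TABLE daily_revenue AS
    SELECT order_date, SUM(CAST(TRIM(amount) AS REAL)) AS revenue
    FROM raw_orders
    WHERE TRIM(amount) <> ''
    GROUP BY order_date
""")
result = conn.execute("SELECT * FROM daily_revenue ORDER BY order_date").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Because the raw table is kept, the transformation can be revised and re-run later without re-extracting from the sources.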
#12
What is the role of the 'Hadoop Distributed File System (HDFS)' in big data analytics?
To store and manage large datasets across a distributed environment
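Explanation: HDFS splits large files into blocks and stores replicated copies of each block across the nodes of a cluster, providing fault tolerance and letting processing frameworks such as MapReduce or Spark read the data in parallel.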
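A back-of-the-envelope sketch of how HDFS lays out a file, assuming the defaults of Hadoop 2 and later (`dfs.blocksize` = 128 MiB, `dfs.replication` = 3); real clusters often override both. The helper name `hdfs_footprint` is hypothetical. The final block of a file only occupies its actual size, so raw storage is the file size times the replication factor.

```python
BLOCK_SIZE = 128 * 1024 ** 2  # assumed default dfs.blocksize (128 MiB)
REPLICATION = 3               # assumed default dfs.replication

def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block replicas, raw bytes stored across the cluster)."""
    n_blocks = (file_size_bytes + block_size - 1) // block_size  # integer ceiling
    return n_blocks, n_blocks * replication, file_size_bytes * replication

GiB, MiB = 1024 ** 3, 1024 ** 2
one_gib = hdfs_footprint(1 * GiB)      # 8 blocks, 24 replicas
two_hundred = hdfs_footprint(200 * MiB)  # 2 blocks (128 MiB + 72 MiB), 6 replicas
```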
#13
Which statistical test is suitable for comparing means of two independent groups in data analytics?
Independent two-sample t-test
Explanation: It assesses whether the difference between the two group means is statistically significant.
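A sketch of the independent two-sample (Student's) t statistic with pooled variance, which assumes the two groups have equal variances. It computes only t and the degrees of freedom; the p-value comes from comparing t against a t distribution with those degrees of freedom, which in practice is usually done with `scipy.stats.ttest_ind(a, b)` (passing `equal_var=False` gives Welch's test instead). The sample data is hypothetical.

```python
import math
import statistics

def pooled_t_test(a, b):
    """Student's two-sample t statistic and degrees of freedom (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))               # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2

t, df = pooled_t_test([1, 2, 3], [4, 5, 6])
```

Here t is about -3.67 with 4 degrees of freedom.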