#1
Which of the following measures is affected by extreme values?
Range
Explanation: The range is sensitive to extreme values because it is the difference between the maximum and minimum values in a dataset.
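As a small illustration of that sensitivity, the sketch below computes the range of a made-up sample before and after one extreme value is added. The data and the helper name `value_range` are assumptions for illustration only.

```python
# Hypothetical sample; the values are invented for illustration.
values = [12, 15, 14, 13, 16]
with_outlier = values + [95]  # same sample plus one extreme value


def value_range(data):
    """Return the range: maximum minus minimum."""
    return max(data) - min(data)


range_without = value_range(values)        # 16 - 12
range_with = value_range(with_outlier)     # 95 - 12
print(range_without, range_with)
```

A single added value changes the range from 4 to 83, because the range depends only on the two most extreme observations.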
#2
What is the primary purpose of a scatter plot?
To compare two variables
Explanation: A scatter plot plots one numeric variable against another, which makes the relationship or correlation between the two variables visible.
#3
Which of the following is a measure of central tendency?
Mean
Explanation: The mean is a measure of central tendency that represents the average of a dataset.
#4
What does the interquartile range measure in a dataset?
The spread of the middle 50% of the data
Explanation: The interquartile range (IQR) is the difference between the third and first quartiles (Q3 - Q1), so it measures the spread of the central 50% of the data.
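The IQR can be computed with Python's standard `statistics.quantiles`. This is a minimal sketch on made-up data; note that quartile conventions differ between tools, and the `inclusive` method used here is only one of them.

```python
from statistics import quantiles

# Hypothetical data, invented for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # largest value replaced by an outlier


def iqr(values):
    """Interquartile range Q3 - Q1 using the 'inclusive' quartile method."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


iqr_data = iqr(data)
iqr_extreme = iqr(extreme)
print(iqr_data, iqr_extreme)
```

Both samples give the same IQR, since the middle half of the data is unchanged by the extreme value.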
#5
Which graph is most effective for displaying the relationship between two categorical variables?
Bar chart
Explanation: Bar charts display the frequency or proportion of each category. A grouped or stacked bar chart extends this to two categorical variables by splitting each bar according to the second variable.
#6
What is the null hypothesis in a statistical test?
There is no difference between groups
Explanation: The null hypothesis states that there is no difference or relationship between the groups or variables being compared. A statistical test assesses whether the data provide enough evidence to reject it.
#7
How is the mode defined in a dataset?
The value that appears most frequently
Explanation: The mode is the value that occurs most frequently in a dataset. A dataset can have more than one mode.
#8
What does the standard deviation measure in a dataset?
Variability or spread of the data
Explanation: Standard deviation measures the dispersion of data points around the mean. It is the square root of the variance, so it is expressed in the same units as the data.
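The calculation can be sketched as follows, comparing a by-hand population standard deviation with the standard library's `pstdev` (population, divides by n) and `stdev` (sample, divides by n - 1). The data are made up for illustration.

```python
import math
from statistics import mean, pstdev, stdev

# Hypothetical data, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
# Population standard deviation: root of the mean squared deviation.
pop_sd_manual = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

pop_sd = pstdev(data)    # population formula (divide by n)
sample_sd = stdev(data)  # sample formula (divide by n - 1)
print(pop_sd_manual, pop_sd, sample_sd)
```

The sample version is slightly larger than the population version because it divides by n - 1 instead of n.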
#9
What does a p-value less than 0.05 typically indicate?
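There is a statistically significant difference
Explanation: A p-value less than 0.05 means that, if the null hypothesis were true, data at least as extreme as those observed would occur less than 5% of the time. At the conventional 0.05 significance level this counts as significant evidence against the null hypothesis.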
#10
Which test would you use to compare means from two related samples?
Paired samples t-test
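Explanation: The paired samples t-test assesses the difference between two means from related samples, such as measurements on the same subjects before and after a treatment. It tests whether the mean of the paired differences is zero.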
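A minimal sketch of the paired t statistic, computed by hand with the standard library. The before/after scores are invented, and the p-value is not computed here because the standard library has no t distribution; a statistics package would normally supply it.

```python
import math
from statistics import mean, stdev

# Hypothetical before/after scores for the same five subjects.
before = [72, 65, 80, 70, 68]
after = [75, 70, 82, 74, 69]

# The test works on the per-subject differences.
diffs = [a - b for a, b in zip(after, before)]

# t = mean difference / standard error of the mean difference
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
df = len(diffs) - 1  # degrees of freedom
print(round(t_stat, 4), df)
```

The statistic is then compared against a t distribution with `df` degrees of freedom.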
#11
What is the purpose of using a box plot in data analysis?
To show the distribution of data
Explanation: Box plots display the distribution of a dataset through its median and quartiles, and they flag potential outliers.
#12
In hypothesis testing, what is Type I error?
Rejecting the null hypothesis when it is true
Explanation: A Type I error (false positive) occurs when a true null hypothesis is rejected, that is, a difference is declared when none exists.
#13
What statistical test is used to compare the means of more than two groups?
ANOVA
Explanation: Analysis of Variance (ANOVA) compares means across more than two groups by comparing the variation between group means with the variation within groups.
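The one-way ANOVA F statistic is the ratio of between-group to within-group mean squares. The sketch below computes it by hand on three invented groups; it stops at the F statistic and does not compute a p-value.

```python
from statistics import mean

# Hypothetical measurements for three groups, invented for illustration.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [v for g in groups for v in g]
grand_mean = mean(all_values)

# Variation of the group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean.
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 4), df_between, df_within)
```

A large F value means the group means differ by more than the within-group scatter would suggest.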
#14
Which of the following correlation coefficients represents the strongest relationship between two variables?
-0.8
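Explanation: The strength of a linear relationship is given by the absolute value of the correlation coefficient, and the sign indicates only the direction. A coefficient of -0.8 indicates a strong negative linear relationship between two variables.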
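Comparing coefficients by absolute value can be sketched in one line. The list of candidate coefficients below is hypothetical, since the quiz's other answer options are not shown here.

```python
# Hypothetical answer options; only -0.8 comes from the quiz itself.
candidates = [0.5, -0.8, 0.3, 0.0]

# Strength is the magnitude of r, so compare absolute values.
strongest = max(candidates, key=abs)
print(strongest)
```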
#15
What does a confidence interval represent in statistical analysis?
The range in which the true mean of the population is likely to fall
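Explanation: A confidence interval is a range of values, computed from the sample, that is constructed to contain the true population parameter (such as the mean) at a stated confidence level. A 95% level means that about 95% of intervals built this way from repeated samples would contain the true value.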
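As a rough sketch, a 95% interval for a mean can be built from the sample mean, the sample standard deviation, and the sample size. The summary statistics below are invented, and the normal (z) critical value is a large-sample approximation; for small samples a t critical value would normally be used instead.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics, invented for illustration.
sample_mean = 50.0
sample_sd = 6.0
n = 36

# Two-sided 95% critical value from the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

margin = z * sample_sd / math.sqrt(n)  # critical value times standard error
lower, upper = sample_mean - margin, sample_mean + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```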
#16
What is the effect of outliers on the mean of a dataset?
Outliers can significantly increase or decrease the mean
Explanation: Because every value contributes to the mean, an extreme value pulls the mean toward itself. The median is much less affected.
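The effect is easy to see by comparing the mean and median before and after one extreme value is added. The numbers below are made up for illustration.

```python
from statistics import mean, median

# Hypothetical values, invented for illustration.
values = [30, 32, 35, 31, 33]
with_outlier = values + [300]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```

The mean more than doubles, while the median moves only from 32 to 32.5.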
#17
Which statistical test is appropriate for testing the relationship between two categorical variables?
Chi-square test
Explanation: The chi-square test assesses the association or independence between two categorical variables by comparing observed and expected frequencies.
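The comparison of observed and expected frequencies can be sketched by hand for a small contingency table. The counts are invented, no continuity correction is applied, and the p-value lookup is left out.

```python
# Hypothetical 2x2 table of observed counts (rows: group, columns: outcome).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # degrees of freedom
print(round(chi2, 4), df)
```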
#18
In a normal distribution, what percentage of data falls within one standard deviation of the mean?
Approximately 68%
Explanation: In a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, according to the empirical rule.
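The 68% figure can be checked with the standard library's `NormalDist`, using the standard normal distribution (mean 0, standard deviation 1).

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling between -1 and +1 standard deviations.
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)
print(round(within_one_sd, 4))
```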
#19
In a linear regression model, what does R-squared represent?
The proportion of the variance in the dependent variable that is predictable from the independent variable
Explanation: R-squared measures the proportion of the variance in the dependent variable that is explained by the independent variable(s) in a regression model.
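For a simple linear regression, R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below fits the least-squares line by hand on invented data.

```python
from statistics import mean

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope and intercept.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained
ss_tot = sum((yi - y_bar) ** 2 for yi in y)                   # total

r_squared = 1 - ss_res / ss_tot
print(round(slope, 4), round(intercept, 4), round(r_squared, 4))
```

Here the fitted line explains 60% of the variance in `y`.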
#20
Which of the following is not an assumption of linear regression?
All variables are categorical
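Explanation: Linear regression does not require the variables to be categorical. The dependent variable is continuous, and predictors can be continuous or categorical. The usual assumptions are a linear relationship, independent errors, constant error variance, and approximately normally distributed residuals.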
#21
What does the term 'multicollinearity' refer to in multiple regression analysis?
A strong correlation between two or more predictor variables
Explanation: Multicollinearity occurs when two or more predictor variables in a regression model are highly correlated.
#22
In a regression analysis, what does the beta coefficient represent?
The slope of the regression line, indicating the change in the dependent variable for a one-unit change in the independent variable
Explanation: The beta coefficient is the slope of the regression line: the change in the dependent variable for a one-unit change in the independent variable. In multiple regression this holds with the other predictors held constant.
#23
What is the main purpose of principal component analysis (PCA)?
To reduce the dimensionality of the data while preserving as much variability as possible
Explanation: Principal Component Analysis (PCA) is used to reduce the dimensionality of data while retaining most of its variability by transforming variables into linearly uncorrelated components.
#24
In time series analysis, what does seasonality refer to?
Fluctuations in the data that are dependent on seasonal factors
Explanation: Seasonality refers to patterns or fluctuations in data that recur at regular intervals and are driven by seasonal factors such as the time of year.
#25
What is the primary use of logistic regression?
Predicting the probability of a categorical outcome
Explanation: Logistic regression models the probability of a categorical (most often binary) outcome as a function of one or more predictor variables.
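In the binary case, the model passes a linear combination of the predictors through the logistic (sigmoid) function to obtain a probability between 0 and 1. The sketch below uses hypothetical coefficients `b0` and `b1` chosen only for illustration, not fitted to any data.

```python
import math

# Hypothetical coefficients; in practice these are estimated from data.
b0 = -4.0  # intercept
b1 = 1.0   # change in log-odds per one-unit increase in x


def predict_probability(x):
    """Probability of the positive class for a single predictor x."""
    log_odds = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-log_odds))


for x in (2.0, 4.0, 6.0):
    print(x, round(predict_probability(x), 4))
```

With these coefficients the predicted probability is exactly 0.5 at `x = 4`, where the log-odds are zero, and it rises toward 1 as `x` increases.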