#1
Which of the following measures is affected by extreme values?
Median
Mode
Range
All of the above
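A quick way to check which summary statistics react to an extreme value is to compute them before and after swapping one observation for an outlier. This minimal sketch uses Python's standard `statistics` module on a small made-up dataset. The numbers and the helper name `summarize` are illustrative only.

```python
import statistics

def summarize(data):
    """Return mean, median, mode, and range for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
    }

baseline = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 100]  # replace the largest value with an extreme one

print(summarize(baseline))      # mean 3.4, median 3, mode 3, range 3
print(summarize(with_outlier))  # mean 22.4, median 3, mode 3, range 98
```

The median and mode stay at 3, while the range and the mean jump.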
#2
What is the primary purpose of a scatter plot?
To show the relationship between two numerical variables
To track changes over time
To display distribution of a single variable
To show how a variable is affected by categories
#3
Which of the following is a measure of central tendency?
Variance
Standard deviation
Mean
Range
#4
What does the interquartile range measure in a dataset?
Variability around the median
The range between the minimum and maximum values
The average value of the dataset
The spread of the middle 50% of the data
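The interquartile range is Q3 minus Q1. A minimal sketch using `statistics.quantiles` follows. It assumes that function's default "exclusive" quartile method; other tools use other conventions and can return slightly different quartiles. The hypothetical helper `iqr` also shows that an extreme maximum does not move the IQR.

```python
import statistics

def iqr(data):
    """Interquartile range: Q3 - Q1, the spread of the middle 50% of the data.

    Uses statistics.quantiles with its default 'exclusive' method.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

values = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = values[:-1] + [800]  # replace the maximum with an extreme value

print(iqr(values))        # 4.5  (Q1 = 2.25, Q3 = 6.75)
print(iqr(with_outlier))  # 4.5  (the middle 50% is unchanged)
```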
#5
Which graph is most effective for displaying the relationship between two categorical variables?
Histogram
Scatter plot
Bar chart
Line graph
#6
What is the null hypothesis in a statistical test?
There is a real difference between groups
There is no difference between groups (no effect)
The sample data does not represent the population
The observed data is due to chance
#7
How is the mode defined in a dataset?
The average of all values
The middle value when data is sorted
The value that appears most frequently
The difference between the highest and lowest values
#8
What does the standard deviation measure in a dataset?
Central tendency
Variability or spread of the data
Skewness of the data
Peakedness (kurtosis) of the data distribution
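Standard deviation measures how far values typically sit from the mean. This sketch compares two hypothetical datasets that share a mean but differ in spread. It uses the standard library's `statistics.stdev` (sample, divides by n - 1) and `statistics.pstdev` (population, divides by n).

```python
import statistics

tight = [4, 5, 6]
spread = [0, 5, 10]

# Both lists have mean 5; only their spread around that mean differs.
print(statistics.mean(tight), statistics.mean(spread))
# Sample standard deviation (divides the squared deviations by n - 1).
print(statistics.stdev(tight))   # 1.0
print(statistics.stdev(spread))  # 5.0
# Population standard deviation (divides by n) is slightly smaller.
print(statistics.pstdev(tight))  # about 0.816
```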
#9
What does a p-value less than 0.05 typically indicate?
The null hypothesis is true
There is no significant difference
There is a statistically significant difference
The sample size is too large
#10
Which test would you use to compare means from two related samples?
Independent samples t-test
ANOVA
Paired samples t-test
Chi-square test
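A paired-samples t-test works on the per-subject differences rather than on two separate groups. The sketch below computes only the t statistic and degrees of freedom with the standard library, using hypothetical before/after scores. A p-value would additionally need the t distribution's CDF, which the standard library does not provide. SciPy's `scipy.stats.ttest_rel` is one option that returns both.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t statistic for a paired-samples t-test.

    Uses the differences d = after - before:
    t = mean(d) / (stdev(d) / sqrt(n)), with n - 1 degrees of freedom.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical scores for the same four subjects measured twice.
before = [10, 12, 9, 11]
after = [12, 13, 11, 12]
t, df = paired_t_statistic(before, after)
print(round(t, 3), df)  # 5.196 3
```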
#11
What is the purpose of using a box plot in data analysis?
To show the distribution of data
To display the mean of the data
To plot the correlation between two variables
To represent categorical data
#12
In hypothesis testing, what is a Type I error?
Accepting the null hypothesis when it is false
Rejecting the null hypothesis when it is true
Failing to reject the null hypothesis when it is false
Both the first and third options
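One way to see what a Type I error rate means is to simulate many tests in which the null hypothesis is actually true and count how often it gets rejected. This sketch assumes a two-sided z-test with known standard deviation 1 on data drawn from N(0, 1). The function name and its defaults are hypothetical. With alpha = 0.05, roughly 5% of the tests should reject a true null.

```python
import random
from statistics import NormalDist

def type_i_error_rate(trials=10_000, n=30, alpha=0.05, seed=0):
    """Estimate how often a two-sided z-test rejects a TRUE null hypothesis.

    Data come from N(0, 1), so H0: mu = 0 holds in every trial and
    every rejection is a Type I error.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        # Known sigma = 1, so the standard error of the mean is 1 / sqrt(n).
        z = (sum(sample) / n) * n ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

print(type_i_error_rate())  # close to 0.05
```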
#13
What statistical test is used to compare the means of more than two groups?
ANOVA
Independent samples t-test
Paired samples t-test
Chi-square test
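One-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic by hand on three hypothetical groups. A p-value would come from the F distribution with the returned degrees of freedom; SciPy's `scipy.stats.f_oneway` is one library routine that reports both.

```python
import statistics

def one_way_anova_f(*groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)

f, dfs = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, dfs)  # 27.0 (2, 6)
```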
#14
Which of the following correlation coefficients represents the strongest relationship between two variables?
#15
What does a confidence interval represent in statistical analysis?
A range of plausible values for a population parameter (such as the mean), constructed at a stated confidence level
The level of significance for hypothesis testing
The probability of Type I error
The interval within which all data points lie
#16
What is the effect of outliers on the mean of a dataset?
Outliers do not affect the mean
Outliers can significantly increase or decrease the mean
Outliers only affect the median, not the mean
Outliers make the mean equal to the median
#17
Which statistical test is appropriate for testing the relationship between two categorical variables?
ANOVA
Chi-square test
Independent samples t-test
Linear regression
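The chi-square test of independence compares observed counts in a contingency table with the counts expected if the two categorical variables were unrelated. This sketch computes the Pearson statistic for a hypothetical 2x2 table without any continuity correction. Note that SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its statistic would differ.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Expected count per cell = row total * column total / grand total.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows = group A / group B, columns = yes / no.
chi2, dof = chi_square_statistic([[10, 20], [30, 40]])
print(round(chi2, 4), dof)  # 0.7937 1
```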
#18
In a normal distribution, what percentage of data falls within one standard deviation of the mean?
Approximately 68%
Approximately 95%
Approximately 99.7%
Approximately 50%
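The 68-95-99.7 rule can be checked directly from the normal CDF. This sketch uses the standard library's `statistics.NormalDist` for a standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    # Probability mass within k standard deviations of the mean.
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {share:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```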
#19
In a linear regression model, what does R-squared represent?
The proportion of the variance in the dependent variable that is predictable from the independent variable
The correlation between the dependent and independent variables
The average distance of the data points from the regression line
The slope of the regression line
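For a simple least-squares line, R-squared is 1 minus the ratio of unexplained (residual) variation to total variation. The slope is the estimated change in y for a one-unit change in x. This sketch fits a line by hand to a small hypothetical dataset; the helper name `fit_simple_ols` is illustrative.

```python
def fit_simple_ols(x, y):
    """Least-squares fit y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                  # total variation
    return intercept, slope, 1 - ss_res / ss_tot

intercept, slope, r2 = fit_simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(intercept, 2), round(slope, 2), round(r2, 2))  # 2.2 0.6 0.6
```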
#20
Which of the following is not an assumption of linear regression?
Homoscedasticity
Normality of residuals
Independence of observations
All variables are categorical
#21
What does the term 'multicollinearity' refer to in multiple regression analysis?
A strong correlation between two or more predictor variables
A linear relationship between the predictor and outcome variables
Multiple regression models used in parallel
Collinearity within a single predictor variable
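Multicollinearity is often quantified with the variance inflation factor (VIF). In general, a predictor's VIF is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors. With exactly two predictors, that R^2 is just the squared correlation between them, which keeps this sketch short. The predictor values are hypothetical. Cut-offs such as 5 or 10 are commonly cited rules of thumb, not fixed standards.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictors that move almost in lockstep.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
print(round(vif_two_predictors(x1, x2), 1))  # 122.0
```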
#22
In a regression analysis, what does the beta coefficient represent?
The intercept of the regression line
The slope of the regression line, indicating the change in the dependent variable for a one-unit change in the independent variable
The degree of spread in the data
The correlation between the dependent and independent variables
#23
What is the main purpose of principal component analysis (PCA)?
To classify the data into distinct groups
To reduce the dimensionality of the data while preserving as much variability as possible
To predict the outcome of a dependent variable based on independent variables
To test the hypothesis of no association between two variables
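PCA finds directions (principal components) ordered by how much variance they capture. The eigenvalues of the covariance matrix give the variance along each component. This toy sketch handles only 2-D data, using the closed-form eigenvalues of a symmetric 2x2 matrix. Real analyses with many variables would use an eigendecomposition or SVD routine instead, for example scikit-learn's `sklearn.decomposition.PCA`. The points are hypothetical.

```python
import math

def pca_2d_explained_variance(points):
    """Share of total variance captured by each principal component of 2-D data.

    Builds the sample covariance matrix [[a, b], [b, c]] and uses the
    closed-form eigenvalues of a symmetric 2x2 matrix.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = (half_trace + radius, half_trace - radius)
    total = a + c  # trace = sum of eigenvalues = total variance
    return [ev / total for ev in eigenvalues]

points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 11)]
shares = pca_2d_explained_variance(points)
print([round(s, 3) for s in shares])  # [0.999, 0.001]
```

Here, keeping only the first component reduces two dimensions to one while retaining almost all of the variance.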
#24
In time series analysis, what does seasonality refer to?
The trend of the data over time
Cyclic patterns that repeat at irregular intervals
Patterns that repeat at fixed, known intervals (for example, every year, quarter, or week)
Random variations in the data
#25
What is the primary use of logistic regression?
Predicting a continuous outcome
Estimating the mean of a population
Predicting the probability of a categorical outcome
Comparing means across multiple groups
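Logistic regression models the log-odds of a binary outcome as a linear function of the predictors. The logistic (sigmoid) function then maps those log-odds to a probability between 0 and 1. This sketch assumes the coefficients have already been estimated. The values -4.0 and 0.8 and the "hours" predictor are hypothetical.

```python
import math

def predict_probability(x, intercept, slope):
    """P(y = 1 | x) = 1 / (1 + exp(-(intercept + slope * x)))."""
    log_odds = intercept + slope * x  # the linear part of the model
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical coefficients, as if already fitted to data.
b0, b1 = -4.0, 0.8
for hours in (0, 5, 10):
    print(hours, round(predict_probability(hours, b0, b1), 3))
# 0 0.018
# 5 0.5
# 10 0.982
```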