#1
What is the mean of the following dataset: {3, 5, 7, 9, 11}?
7
Explanation: The mean is the sum of all values divided by the number of values: (3 + 5 + 7 + 9 + 11) / 5 = 35 / 5 = 7.
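As a quick check of the arithmetic, here is a minimal Python sketch that computes the mean by hand and compares it with the standard library's `statistics.mean`. The variable names are illustrative.

```python
from statistics import mean

data = [3, 5, 7, 9, 11]

# Mean = sum of the values divided by how many values there are.
manual_mean = sum(data) / len(data)

print(manual_mean)                 # 7.0
print(mean(data) == manual_mean)   # True
```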
#2
Which measure of central tendency is most affected by extreme values or outliers?
Mean
Explanation: The mean uses the magnitude of every data point, so a single extreme value can shift it substantially. The median and mode depend on rank order and frequency, which makes them far less sensitive to outliers.
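A small sketch can make the effect concrete. It assumes a hypothetical dataset in which one value is replaced by an extreme one, then compares how the mean and the median respond.

```python
from statistics import median

original = [3, 5, 7, 9, 11]
with_outlier = [3, 5, 7, 9, 1011]   # hypothetical: 11 replaced by an extreme value

def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)

print(arithmetic_mean(original), median(original))          # 7.0 7
print(arithmetic_mean(with_outlier), median(with_outlier))  # 207.0 7
```

The mean jumps from 7 to 207 while the median stays at 7.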
#3
Which statistical measure is most affected by outliers?
Mean
Explanation: The mean is most affected by outliers because it uses all data points in its calculation, so extreme values can heavily influence its value.
#4
What does a p-value represent in hypothesis testing?
Probability of observing data at least as extreme as the given data if the null hypothesis is true
Explanation: The p-value is the probability of obtaining the observed data, or more extreme results, when the null hypothesis is true. A low p-value (below the chosen significance level) means the observed data would be unlikely under the null hypothesis, which is taken as evidence for rejecting it. It is not the probability that the null hypothesis itself is true.
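To illustrate the definition, the sketch below computes an exact one-sided p-value for a hypothetical experiment: 9 heads in 10 flips, tested against the null hypothesis of a fair coin. The experiment and the helper `binomial_pmf` are assumptions made for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_flips = 10      # hypothetical experiment: 10 coin flips
observed = 9      # hypothetical result: 9 heads
p_null = 0.5      # null hypothesis: the coin is fair

# One-sided p-value: P(X >= observed), computed under the null hypothesis.
p_value = sum(binomial_pmf(k, n_flips, p_null) for k in range(observed, n_flips + 1))

print(p_value)  # 0.0107421875
```

At a 0.05 significance level this result would lead to rejecting the null hypothesis. At a 0.01 level it would not.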
#5
Which distribution is commonly used to model the number of successes in a fixed number of independent Bernoulli trials?
Binomial distribution
Explanation: The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials that all share the same probability of success, for example the number of heads in 10 coin flips.
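A minimal sketch of the binomial probability mass function, assuming a hypothetical setup of 5 fair coin flips:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.5   # hypothetical: 5 fair coin flips
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(pmf)        # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
print(sum(pmf))   # 1.0
```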
#6
What is the coefficient of determination (R-squared) in linear regression used for?
To measure the goodness of fit of the regression model
Explanation: The coefficient of determination (R-squared) in linear regression is used to measure how well the regression line fits the data. It represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
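The sketch below fits a one-predictor least-squares line to a small hypothetical dataset and computes R-squared as 1 minus the ratio of residual to total sum of squares. The data and variable names are assumptions for illustration.

```python
x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical responses

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares slope and intercept for a single predictor.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))   # unexplained variation
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                   # total variation

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))   # 0.6
```

Here the fitted line accounts for about 60% of the variance in `y`.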
#7
What is the formula for calculating the z-score?
(x - μ) / σ
Explanation: The z-score measures how many standard deviations a data point is from the mean. It is calculated as the difference between the data point and the mean, divided by the standard deviation: (x - mean) / standard deviation.
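A short sketch of the formula, using hypothetical values (a mean of 70 and a standard deviation of 10):

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

print(z_score(85, 70, 10))   # 1.5
print(z_score(60, 70, 10))   # -1.0
```

A positive z-score means the value lies above the mean, and a negative one means it lies below.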
#8
What is the formula for calculating the standard deviation of a population?
sqrt((1/n) * Σ(x - μ)^2)
Explanation: The standard deviation of a population is the square root of the average of the squared differences between each data point and the population mean. The formula is sqrt((1/n) * sum((x - mean)^2)), where n is the number of data points and mean is the population mean. Each difference is squared before summing.
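The formula can be checked step by step. This sketch reuses the dataset {3, 5, 7, 9, 11} and compares the manual result with the standard library's `statistics.pstdev`.

```python
from math import sqrt
from statistics import pstdev

data = [3, 5, 7, 9, 11]
n = len(data)
mu = sum(data) / n                                # population mean: 7.0

# Average of the squared deviations: (16 + 4 + 0 + 4 + 16) / 5 = 8.0
variance = sum((x - mu) ** 2 for x in data) / n
sigma = sqrt(variance)

print(round(sigma, 4))          # 2.8284
print(round(pstdev(data), 4))   # 2.8284
```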
#9
In hypothesis testing, what does Type I error refer to?
Rejecting the null hypothesis when it is true
Explanation: A Type I error (false positive) occurs when a true null hypothesis is rejected, i.e., concluding that there is a significant effect or difference when in reality there is none. Its probability is the significance level of the test.
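As an illustration, the sketch below computes the exact Type I error rate of a hypothetical decision rule for a 10-flip coin test. The rule and its rejection region are assumptions chosen for this example.

```python
from math import comb

n_flips = 10   # hypothetical test: flip a coin 10 times
p_null = 0.5   # null hypothesis: the coin is fair

# Hypothetical decision rule: reject the null if heads <= 1 or heads >= 9.
rejection_region = [0, 1, 9, 10]

# Type I error rate = P(reject the null | the null is true).
type_i_rate = sum(
    comb(n_flips, k) * p_null**k * (1 - p_null)**(n_flips - k)
    for k in rejection_region
)

print(type_i_rate)   # 0.021484375
```

Under this rule, a fair coin would be wrongly declared unfair in about 2.1% of experiments.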
#10
What is the purpose of a Q-Q plot in statistics?
To test the normality assumption of residuals
Explanation: A Q-Q plot (quantile-quantile plot) is used to compare the distribution of a dataset to a theoretical distribution, such as the normal distribution. It is often used to assess the normality assumption of residuals in regression analysis.
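The points of a normal Q-Q plot can be computed without a plotting library. This is a minimal sketch that assumes a hypothetical sample and the plotting positions (i - 0.5) / n, which is one common convention among several.

```python
from statistics import NormalDist, mean, stdev

sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]   # hypothetical residuals or observations
n = len(sample)

# Standardize and sort the sample to get the observed quantiles.
m, s = mean(sample), stdev(sample)
observed = sorted((x - m) / s for x in sample)

# Theoretical standard-normal quantiles at plotting positions (i - 0.5) / n.
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each (theoretical, observed) pair is one point on the Q-Q plot.
for t, o in zip(theoretical, observed):
    print(f"{t:7.3f} {o:7.3f}")
```

If the pairs fall close to the line y = x, the sample is roughly consistent with a normal distribution. Systematic curvature suggests skew or heavy tails.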
#11
What is the purpose of the Kolmogorov-Smirnov test?
To test for normality in a dataset
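Explanation: The Kolmogorov-Smirnov test checks whether a dataset follows a specified reference distribution, such as the normal distribution. It does so by measuring the largest gap between the empirical cumulative distribution function of the data and the cumulative distribution function of the reference distribution.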
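The sketch below computes only the one-sample Kolmogorov-Smirnov statistic against a standard normal reference, not the p-value. The helper `ks_statistic` and the sample are hypothetical. In practice a library routine such as SciPy's `scipy.stats.kstest` would normally be used, since it also returns a p-value.

```python
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of sample and a reference cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]   # hypothetical observations
d = ks_statistic(sample, NormalDist(0, 1).cdf)
print(round(d, 4))
```

A larger statistic indicates a bigger discrepancy between the data and the reference distribution.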
#12
What is the formula for calculating covariance between two variables?
Σ(x - μ)(y - ν) / (n - 1)
Explanation: Covariance measures how two variables vary together. The sample covariance is the sum of the products of each pair's deviations from the respective means (μ for x, ν for y), divided by n - 1. Dividing by n instead gives the population covariance.
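A minimal sketch of the sample covariance formula, using hypothetical paired observations:

```python
x = [1, 2, 3, 4, 5]   # hypothetical paired observations
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n   # 3.0
mean_y = sum(y) / n   # 4.0

# Sum of products of paired deviations, divided by n - 1 (sample covariance).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

print(cov_xy)   # 1.5
```

The positive result indicates that `x` and `y` tend to increase together in this sample.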