Data Analysis Exercises 1 (1)

Question 1

Which of the following classifications of variable types is false?

Your Answer Score Explanation

Student height → continuous numerical

Population of each state in the US → continuous numerical Correct 1.00 Counted data are

can’t take on no

Customer satisfaction: very unsatisfied, unsatisfied, satisfied, very satisfied → ordinal categorical

Whether a student has previously taken a statistics course → categorical

Total 1.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Identify variables as numerical and categorical.

- If the variable is numerical, further classify it as continuous or discrete based on whether or not the variable can take on an infinite number of values or only non-negative whole numbers, respectively.

- If the variable is categorical, determine if it is ordinal based on whether or not the levels have a natural ordering.

Question 2

A study is designed to test the effect of type of light on exam performance of students. 180 students are randomly assigned to three classrooms: one that is dimly lit, another with yellow lighting, and a third with white fluorescent lighting, and given the same exam. Which of the following correctly identifies the variables used in the study as explanatory and response?

Your Answer Score Explanation

explanatory: exam performance; response: type of light (categorical with 3 levels)

explanatory: dimly lit, yellow, white fluorescent; response: exam performance

explanatory: exam performance; response: dimly lit, yellow, white fluorescent

explanatory: type of light (categorical with 3 levels); response: exam performance Correct 1.00

We are interested in the effect of type of light on e

light is the explanatory variable and exam perform

light is a categorical variable that can take on three

white light). These possible values are called levels

Total 1.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Identify the explanatory variable in a pair of variables as the variable suspected of affecting the other. Note, however, that labeling variables as explanatory and response does not guarantee that the relationship between the two is actually causal, even if an association is identified between the two variables.

Question 3

True or False: If subjects are randomly assigned to treatments, conclusions can be generalized to the population.

Your Answer Score Explanation

False

True Incorrect 0.00 Random assignment allows us to make causal conclusions. Fo[...] sampling.

Total 0.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Classify a study as observational or experimental, and determine whether the study’s results can be generalized to the population and whether they suggest correlation or causation.

- If random sampling has been employed in data collection, the results should be generalizable to the target population.

- If random assignment has been employed in study design, the results suggest causality.

Question 4

An extraneous variable that is related to the explanatory and response variables and that prevents us from deducing causal relationships based on observational studies is called a ______.

Answer for Question 4. You entered:

Your Answer Score

Incorrect 0.00

Total 0.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Question confounding variables and sources of bias in a given study.

Question 5

As part of a statistics project, Andrea would like to collect data on household size in her city. To do so, she asks each person in her statistics class for the size of their household, and reports that her sample is a simple random sample. However, this is not a simple random sample. Which of the following is the best reasoning for why this is not a random sample that is appropriate for this research question?

Your Answer

Andrea did not use any randomization; she took a convenience sample.

Andrea did not use a stratified sample.

Andrea did not use a random number table to randomize the order in which she collected the students’ responses, so the sample cannot be random.

Andrea asked everybody in her class instead of asking her classmates to volunteer.

Total

Question Explanation: This question refers to the following learning objective(s):

Distinguish between simple random, stratified, and cluster sampling, and recognize the benefits and drawbacks of choosing one sampling scheme over another.
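The three sampling schemes named in this objective can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the quiz: the city of 5 neighborhoods with 20 households each, the household labels, and the function names are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit in the population has the same chance of being drawn."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately inside every stratum."""
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(clusters[name])
    return sample

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical city: 5 neighborhoods with 20 households each.
neighborhoods = {
    f"N{i}": [f"N{i}-H{j}" for j in range(20)] for i in range(5)
}
all_households = [h for units in neighborhoods.values() for h in units]

srs = simple_random_sample(all_households, 10, rng)
strat = stratified_sample(neighborhoods, 2, rng)
clus = cluster_sample(neighborhoods, 2, rng)

print(len(srs), len(strat), len(clus))
```

Asking everyone in one statistics class fits none of these schemes, because no random draw from the city's households takes place; that is what makes it a convenience sample.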

Question 6

Which of the following is one of the four principles of experimental design?

Your Answer Score Explanation

stratify

cluster

block Correct 1.00 The four principles of experimental design are contro

non-response

Total 1.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Identify the four principles of experimental design and recognize their purposes:

- control any possible confounders,

- randomize into treatment and control groups,

- replicate by using a sufficiently large sample or repeating the experiment,

- block any variables that might influence the response.

Question 7

Which of the below data sets has the highest standard deviation? You do not need to calculate the exact standard deviations to answer this question.

Your Answer Score Explanation

0, 25, 25, 25, 25, 25, 25

0, 1, 2, 3, 4, 5, 6

0, 100, 200, 300, 400, 500, 600 Correct 1.00

The dataset with the least repeated observations tha

most variability, hence the highest standard deviation

0, 1, 1, 1, 1, 1, 2

Total 1.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Note that there are three commonly used measures of center and spread:

- center: mean (the arithmetic average), median (the midpoint), mode (the most frequent observation)

- spread: standard deviation (variability around the mean), range (max-min), interquartile range (middle 50% of the distribution)
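Although Question 7 does not require exact values, the comparison can be checked with a short Python sketch. The four lists are the answer options from the question. The labels A to D are added here only for display, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the quiz does not say which version it has in mind.

```python
import statistics

# The four answer options from Question 7 (labels A-D are illustrative).
datasets = {
    "A": [0, 25, 25, 25, 25, 25, 25],
    "B": [0, 1, 2, 3, 4, 5, 6],
    "C": [0, 100, 200, 300, 400, 500, 600],
    "D": [0, 1, 1, 1, 1, 1, 2],
}

# Sample standard deviation of each option.
spreads = {name: statistics.stdev(values) for name, values in datasets.items()}

for name, sd in spreads.items():
    print(f"{name}: sd = {sd:.2f}")

widest = max(spreads, key=spreads.get)
print("largest standard deviation:", widest)
```

Option C is option B with every value multiplied by 100, so its standard deviation is 100 times larger. Scaling a data set scales its standard deviation by the same factor.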

Question 8

The distribution of housing prices in a country where 25% of the houses cost below $350,000, 50% of the houses cost below $450,000, 75% of the houses cost below $1,000,000 and there are a meaningful number of houses that cost more than $6,000,000 is most likely

Your Answer Score Explanation

left skewed

right skewed Correct 1.00 There is a long tail on the right side of the di

uniform

symmetric

Total 1.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Identify the shape of a distribution as symmetric, right skewed, or left skewed, and unimodal, bimodal, multimodal, or uniform.
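For Question 8, the skew can be read directly from the three quartiles given in the question. The sketch below compares the distance from the median to each quartile and then computes the quartile (Bowley) skewness coefficient. That coefficient is brought in here as one possible summary and is not mentioned in the quiz; the function name is hypothetical.

```python
def quartile_skewness(q1, median, q3):
    """Bowley skewness: positive when the upper quartile sits farther
    from the median than the lower quartile does (right skew)."""
    return ((q3 - median) - (median - q1)) / (q3 - q1)

# Quartiles stated in Question 8 (housing prices in dollars).
q1, median, q3 = 350_000, 450_000, 1_000_000

lower_gap = median - q1  # distance from the median down to Q1
upper_gap = q3 - median  # distance from the median up to Q3
coefficient = quartile_skewness(q1, median, q3)

print(lower_gap, upper_gap, round(coefficient, 3))
```

The upper gap ($550,000) is much larger than the lower gap ($100,000), and houses above $6,000,000 stretch the right tail further, which is consistent with a right-skewed distribution.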

Question 9

Two distributions (A and B) are shown on the box plot below. Which of the following statements is not supported by the plot?

Your Answer Score Explanation

Both distributions are unimodal.

Median of A is higher than median of B.

B is more variable than A.

Both distributions are roughly symmetric. Incorrect 0.00 Both

Total 0.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Use histograms and box plots to visualize the shape, center, and spread of numerical distributions, and intensity maps for visualizing the spatial distribution of the data.

Question 10

A recent housing survey was conducted to determine the price of a typical home in a city that is mostly middle-class, with one very expensive suburb. The mean price of a house in this city is roughly $650,000. Which of the following statements is most likely to be true?

Your Answer Score Explanation

We need to know the standard deviation to answer this question.

There are about as many houses in this city that cost more than $650,000 as there are that cost less than this amount.

Majority of houses in this city cost more than $650,000.

Majority of houses in this city cost less than $650,000. Correct 1.00

Since the city is mostly middle-class, with one

expect the distribution to be right skewed, and

than the median. Since 50% of observations fa

observations (i.e. majority) will cost less than $

Total 1.00 / 1.00

Question Explanation: This question refers to the following learning objective:

Define a robust statistic (e.g. median, IQR) as a statistic that is not heavily affected by skewness and extreme outliers, and determine when such statistics are more appropriate measures of center and spread compared to other similar statistics.
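The reasoning behind Question 10 can be illustrated with a small made-up data set. The prices below are entirely hypothetical: 90 houses at $400,000 and 10 houses at $2,900,000, chosen only so that the mean comes out at the $650,000 quoted in the question. The survey data themselves are not given in the quiz.

```python
import statistics

# Hypothetical city: mostly mid-priced homes plus one expensive suburb.
prices = [400_000] * 90 + [2_900_000] * 10

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

# Fraction of houses that cost less than the mean.
share_below_mean = sum(p < mean_price for p in prices) / len(prices)

print("mean:", mean_price)
print("median:", median_price)
print("share below mean:", share_below_mean)
```

The few expensive houses pull the mean well above the median, so most houses cost less than the mean. The median is not affected by how expensive the suburb is, which is what makes it a robust statistic.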

Question 11

It is relatively common for fish to be mislabeled in supermarkets and even in restaurants. The table below shows the results of a study where a random sample of 156 fish for sale were collected and genetically tested. The researchers classified each sample as being labeled properly or being mislabeled. What fraction of smoked fish in the sample were mislabeled? Choose the closest answer.

Your Answer Score Explanation

28%

9%

72% Incorrect 0.00 Look among only the smoked fish, and find the proport

18%

78%

Total 0.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Use contingency tables and segmented bar plots or mosaic plots to assess the relationship between two categorical variables.

Question 12

Does meditation cure insomnia? Researchers randomly divided 400 people into two equal-sized groups. One group meditated daily for 30 minutes, the other group attended a 2-hour information session on insomnia. At the beginning of the study, the average difference between the number of minutes slept between the two groups was about 0. After the study, the average difference was about 32 minutes, and the meditation group had a higher average number of minutes slept. To test whether an average difference of 32 minutes could be attributed to chance, a statistics student decided to conduct a randomization test. She wrote the number of minutes slept by each subject in the study on an index card. She shuffled the cards together very well, and then dealt them into two equal-sized groups. Which of the following best describes the outcome?

Your Answer Score Explanation

The average difference between the two stacks of cards will be about 0 minutes. Correct 1.00

Since we’re randomly split

would expect similar averag

difference of 0 in the averag

The average difference between the two stacks of cards will be about 32 minutes.

If meditation is effective, the average difference between the two stacks of cards will be more than 32 minutes.

Total 1.00 / 1.00

Question Explanation: This question refers to the following learning objective(s):

Note that an observed difference in sample statistics suggesting dependence between variables may be due to random chance, and that we need to use hypothesis testing to determine if this difference is too large to be attributed to random chance. Set up null and alternative hypotheses for testing for independence between variables, and evaluate the data support for these hypotheses using a simulation technique.
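The card-shuffling procedure from Question 12 can be simulated as a rough sketch. The per-subject sleep times are not given in the quiz, so the data here are simulated under loud assumptions: normally distributed minutes slept with means of 432 and 400 (a gap of about 32 minutes) and a standard deviation of 40. The function name and the number of shuffles are likewise hypothetical.

```python
import random

def shuffled_differences(group_a, group_b, n_shuffles=2000, seed=0):
    """Pool both groups, shuffle, deal into two stacks of the original
    sizes, and record the difference in stack means each time."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(group_b)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / n_b
        diffs.append(mean_a - mean_b)
    return diffs

# Hypothetical data: 200 subjects per group (assumed means and spread).
data_rng = random.Random(1)
meditation = [data_rng.gauss(432, 40) for _ in range(200)]
information = [data_rng.gauss(400, 40) for _ in range(200)]

observed = sum(meditation) / 200 - sum(information) / 200
diffs = shuffled_differences(meditation, information)

# Center of the shuffled differences, and how often chance alone
# produces a difference at least as large as the observed one.
center = sum(diffs) / len(diffs)
p_value = sum(d >= observed for d in diffs) / len(diffs)

print(f"observed difference: {observed:.1f} minutes")
print(f"average shuffled difference: {center:.2f} minutes")
print(f"simulated p-value: {p_value:.4f}")
```

Shuffling breaks any link between group membership and minutes slept, so the shuffled differences center near 0, matching the correct answer. The observed difference is then judged by how rarely the shuffles produce a value that large.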