context: ideally tells who was measured, what was measured, how the data were collected, where the data were collected, and when and why the study was performed
data: systematically recorded information, whether numbers or labels, together with its context
data table: an arrangement of data in which each row represents a case and each column represents a variable
case: an individual about whom or which we have data
variable: holds information about the same characteristic for many cases
categorical variable: a variable that names categories (whether with words or numerals)
quantitative variable: a variable in which the numbers act as numerical values; always has units
units: a quantity or amount adopted as a standard of measurement, such as dollars, hours, or grams
frequency table: lists the categories in a categorical variable and gives the count or percentage of observations for each category
distribution: gives the possible values of the variable and the relative frequency of each value
area principle: in a statistical display, each data value should be represented by the same amount of area
bar chart: shows a bar representing the count of each category in a categorical variable
pie chart: shows how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category
contingency table: displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables; categorizes the individuals on all variables at once, to reveal possible patterns in one variable that may be contingent on the category of the other
marginal distribution: the distribution of either variable alone in a contingency table; the counts or percentages are the totals found in the margins (last row or column) of the table
conditional distribution: the distribution of a variable when the "who" is restricted to a smaller group of individuals
independence: variables are said to be this if the conditional distribution of one variable is the same for each category of the other
Simpson's paradox: when averages are taken across different groups, they can appear to contradict the overall averages
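
The contingency-table terms above can be illustrated with a short Python sketch. The counts, the category labels, and the helper function names are all made up for illustration; they do not come from any real study.

```python
# Hypothetical contingency table: rows are groups, columns are answers.
table = {
    "A": {"yes": 30, "no": 70},
    "B": {"yes": 60, "no": 40},
}

def marginal_distribution(table):
    """Distribution of the column variable alone (totals from the table's margin)."""
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    grand_total = sum(totals.values())
    return {category: count / grand_total for category, count in totals.items()}

def conditional_distribution(table, row_name):
    """Distribution of the column variable restricted to the cases in one row."""
    row = table[row_name]
    row_total = sum(row.values())
    return {category: count / row_total for category, count in row.items()}

marginal = marginal_distribution(table)
conditionals = {name: conditional_distribution(table, name) for name in table}

# The variables look independent only if every conditional distribution
# matches the marginal distribution.
independent = all(
    abs(dist[category] - marginal[category]) < 1e-9
    for dist in conditionals.values()
    for category in marginal
)
```

Here the conditional distributions for groups A and B differ from each other, so in this made-up table the two variables would not be called independent.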
distribution: gives the possible values of the variable and the frequency or relative frequency of each value
histogram: uses adjacent bars to show the distribution of values in a quantitative variable; each bar represents the frequency (or relative frequency) of values falling in an interval of values
stem-and-leaf display: shows quantitative data values in a way that sketches the distribution of the data
dotplot: graphs a dot for each case against a single axis
shape: to describe this aspect of a distribution, look for single vs. multiple modes, and symmetry vs. skewness
center: a value that attempts the impossible by summarizing the entire distribution with a single number, a "typical" value
spread: a numerical summary of how tightly the values are clustered around the "center"
mode: a hump or local high point in the shape of the distribution of a variable; the apparent locations of these can change as the scale of a histogram is changed
unimodal: having one mode; this is a useful term for describing the shape of a histogram when it's generally mound-shaped
bimodal: distributions with two modes
multimodal: distributions with more than two modes
uniform: a distribution that's roughly flat
symmetric: a distribution is this if the two halves on either side of the center look approximately like mirror images of each other
tails: the parts of a distribution that typically trail off on either side; they can be characterized as long or short
skewed: a distribution is this if it's not symmetric and one tail stretches out farther than the other
outliers: extreme values that don't appear to belong with the rest of the data
timeplot: displays data that change over time
center: summarized with the mean or the median
median: the middle value with half of the data above and half below it
spread: summarized with the standard deviation, interquartile range, and range
range: the difference between the lowest and highest values in a data set
quartile: the lower of this is the value with a quarter of the data below it; the upper of this has a quarter of the data above it
interquartile range: the difference between the first and third quartiles
percentile: the ith ___ is the number that falls above i% of the data
5-number summary: consists of the minimum and maximum, the quartiles Q1 and Q3, and the median
boxplot: displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values
mean: found by summing all the data values and dividing by the count
variance: the sum of squared deviations from the mean, divided by the count minus one
standard deviation: the square root of the variance
comparing distributions: when doing this, consider their shape, center, and spread
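
As a rough sketch, the summaries of center and spread defined above can be computed with Python's standard `statistics` module. The data values are made up, and `five_number_summary` is a hypothetical helper name. Quartile conventions differ between textbooks and software; this sketch simply uses the module's default ("exclusive") method, which may not match a hand calculation exactly.

```python
from statistics import mean, median, quantiles, stdev, variance

data = [2, 4, 4, 5, 7, 9, 11]  # made-up values for illustration

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4)
    return (min(values), q1, median(values), q3, max(values))

minimum, q1, med, q3, maximum = five_number_summary(data)

iqr = q3 - q1                     # interquartile range
data_range = maximum - minimum    # range
center = mean(data)               # sum of the values divided by the count
sample_variance = variance(data)  # squared deviations summed, divided by n - 1
sample_sd = stdev(data)           # square root of the variance
```

For these seven values the mean is 6 and the median is 5; a mean above the median is consistent with a slight skew to the right.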
shifting: adding a constant to each data value adds the same constant to the mean, the median, and the quartiles, but does not change the standard deviation or IQR
rescaling: multiplying each data value by a constant multiplies both the measures of position and the measures of spread by that constant
standardizing: done to eliminate units; values can be compared and combined even if the original variables had different units and magnitudes
standardized value: value found by subtracting the mean and dividing by the standard deviation
normal model: useful family of models for unimodal, symmetric distributions
parameter: numerically valued attribute of a model
statistic: value calculated from data to summarize aspects of the data
z-score: tells how many standard deviations a value is from the mean; z-scores have a mean of zero and a standard deviation of one
standard normal model: a normal model with a mean of 0 and a standard deviation of 1
68-95-99.7 rule: in a normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean
normal percentile: the normal percentile corresponding to a z-score gives the percentage of values in a standard normal distribution found at that z-score or below
normal probability plot: a display to help assess whether a distribution of data is approximately normal; if the plot is nearly straight, the data satisfy the nearly normal condition
changing center and spread: doing this to a variable is equivalent to changing its units
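
The sketch below tries out shifting, rescaling, standardizing, and the 68-95-99.7 rule in Python. The heights are made-up numbers, `standardize`, `within`, and `normal_percentile` are hypothetical helper names, and `NormalDist` from the standard library (Python 3.8+) stands in for the standard normal model.

```python
from statistics import NormalDist, mean, stdev

def standardize(values):
    """z-scores: subtract the mean, then divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

heights_in = [60.0, 62.0, 65.0, 68.0, 70.0]   # made-up heights in inches
heights_cm = [2.54 * h for h in heights_in]   # rescaling: a change of units
shifted = [h + 10 for h in heights_in]        # shifting: add a constant

# Standardizing removes the units, so both versions give the same z-scores.
z_in = standardize(heights_in)
z_cm = standardize(heights_cm)

standard_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal model within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def normal_percentile(z):
    """Proportion of a standard normal model at or below the z-score z."""
    return standard_normal.cdf(z)
```

With these definitions, `within(1)`, `within(2)`, and `within(3)` come out close to 0.68, 0.95, and 0.997, and `stdev(shifted)` matches `stdev(heights_in)` because shifting leaves the spread unchanged.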
scatterplots: show the relationship between two quantitative variables measured on the same cases
direction: a positive ____ or association means that, in general, as one variable increases, so does the other; when increases in one variable generally correspond to decreases in the other, the association is negative
form: the ____ we care about most is straight
strength: a scatterplot shows a strong association if there is little scatter around the underlying relationship
correlation: a numerical measure of the direction and strength of a linear association
outlier: a point that does not fit the overall pattern seen in the scatterplot
lurking variable: a variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two
model: an equation or formula that simplifies and represents reality
linear model: an equation of the form y-hat = b0 + b1x
residuals: the differences between data values and the corresponding values predicted by the regression model; ____ = observed value - predicted value
predicted value: found by substituting the x-value in the regression equation; they're the values on the fitted line
slope: gives a value in "y-units per x-unit"; changes of one unit in x are associated with changes of b1 units in predicted values of y
regression to the mean: each predicted y-hat tends to be fewer standard deviations from its mean than its corresponding x was from its mean
regression line: the linear equation y-hat = b0 + b1x that satisfies the least squares criterion
intercept: this, b0, gives a starting value in y-units; it's the y-hat-value when x is 0
least squares: this criterion specifies the unique line that minimizes the variance of the residuals or, equivalently, the sum of the squared residuals
r²: the square of the correlation between y and x; gives the fraction of the variability of y accounted for by the least squares linear regression on x; an overall measure of how successful the regression is in linearly relating y to x
subset: if data consist of two or more groups that have been thrown together, it is usually better to fit a separate linear model to each group than to try to fit a single model to all of the data
extrapolation: although linear models provide an easy way to predict values of y for a given value of x, it is unsafe to predict for values of x far from the ones used to find the linear model equation; such predictions should not be trusted
outlier: any data point that stands away from the others; can be extraordinary by having a large residual or by having high leverage
leverage: data points whose x-values are far from the mean of x are said to exert ____ on a linear model; with high enough ____, residuals can appear to be deceptively small
influential point: when omitting a point from the data results in a very different regression model, the point is an ____
lurking variable: a variable that is not explicitly part of a model but affects the way the variables in the model appear to be related
re-express data: we do this by applying the logarithm, the square root, the reciprocal, or some other mathematical operation to all values in the data set
ladder of powers: places in order the effects that many re-expressions have on the data
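
The regression terms above fit together as in the following Python sketch. The data are made up, and `least_squares_line` is a hypothetical helper name. It uses the standard identities b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar for the least squares line, which is one common way to compute it, not the only one.

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]                # made-up explanatory values
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # made-up response values

def least_squares_line(x, y):
    """Return (b0, b1, r) for the line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # Correlation: sum of products of z-scores, divided by n - 1.
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
            for xi, yi in zip(x, y)) / (n - 1)
    b1 = r * s_y / s_x        # slope, in y-units per x-unit
    b0 = y_bar - b1 * x_bar   # intercept: y-hat when x is 0
    return b0, b1, r

b0, b1, r = least_squares_line(x, y)
predicted = [b0 + b1 * xi for xi in x]                 # values on the fitted line
residuals = [yi - yh for yi, yh in zip(y, predicted)]  # observed - predicted
r_squared = r ** 2   # fraction of the variability in y accounted for by the line
```

The residuals from a least squares line sum to zero (up to rounding), which is a quick sanity check on the arithmetic. Predictions from `b0 + b1 * xi` should only be trusted for x-values near the ones used in the fit.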
random: an event is this if we know what outcomes could happen, but not which particular values will happen
random numbers: these are hard to generate, but several websites offer an unlimited supply of equally likely random values
simulation: models random events by using random numbers to specify event outcomes with relative frequencies that correspond to the true real-world relative frequencies we are trying to model
simulation component: the most basic situation in a simulation in which something happens at random
outcome: an individual result of a component of a simulation
trial: the sequence of several components representing events that we are pretending will take place
response variable: values of this record the results of each trial with respect to what we were interested in
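
A minimal simulation sketch in Python shows how component, outcome, trial, and response variable relate. The scenario (rolling a fair die until a 6 appears) is a made-up example, the function names are hypothetical, and Python's `random` module supplies pseudorandom numbers, which is assumed to be good enough for this purpose.

```python
import random

def one_trial(rng):
    """One trial: repeat the component (a die roll) until a 6 appears."""
    rolls = 0
    while True:
        outcome = rng.randint(1, 6)  # outcome of one component
        rolls += 1
        if outcome == 6:
            return rolls  # response variable: number of rolls needed

def simulate(n_trials, seed=1):
    """Run many trials and collect the response variable from each."""
    rng = random.Random(seed)  # seeded so the run can be repeated
    return [one_trial(rng) for _ in range(n_trials)]

responses = simulate(10_000)
average_rolls = sum(responses) / len(responses)
```

With many trials, `average_rolls` should settle near 6, the long-run average number of rolls needed to see a 6 on a fair die.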
population: the entire group of individuals or instances about whom we hope to learn
sample: a representative subset of a population, examined in hope of learning about the population
sample survey: a study that asks questions of a sample drawn from some population in the hope of learning something about the entire population
bias: any systematic failure of a sampling method to represent its population; common errors are voluntary response, undercoverage, nonresponse ____, and response ____
randomization: the best defense against bias, in which each individual is given a fair, random chance of selection
matching: any attempt to force a sample to resemble specified attributes of the population
sample size: the number of individuals in a sample
census: a sample that consists of the entire population
population parameter: a numerically valued attribute of a model for a population
representative: a sample is this if the statistics computed from it accurately reflect the corresponding population parameters
simple random sample: a sample of size n in which each set of n elements in the population has an equal chance of selection
sampling frame: a list of individuals from whom the sample is drawn
sampling variability: the natural tendency of randomly drawn samples to differ
stratified random sample: a sampling design in which the population is divided into several subpopulations, or strata, and random samples are then drawn from each stratum
cluster sample: a sampling design in which entire groups are chosen at random
multistage sample: sampling schemes that combine several sampling methods
systematic sample: a sample drawn by selecting individuals systematically from a sampling frame
voluntary response bias: bias introduced to a sample when individuals can choose on their own whether to participate in the sample
convenience sample: consists of the individuals who are conveniently available
undercoverage: a sampling scheme that biases the sample in a way that gives a part of the population less representation than it has in the population
nonresponse bias: bias introduced to a sample when a large fraction of those sampled fails to respond
response bias: anything in a survey design that influences response
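
The sampling designs above can be compared side by side in a short Python sketch. The sampling frame (100 numbered individuals in two made-up dorms), the cluster size, and all function names are assumptions made for illustration.

```python
import random

rng = random.Random(0)  # seeded so the sketch is repeatable

# Hypothetical sampling frame: (id, stratum) pairs.
frame = [(i, "dorm A" if i < 60 else "dorm B") for i in range(100)]

def simple_random_sample(frame, n, rng):
    """Every set of n individuals has the same chance of selection."""
    return rng.sample(frame, n)

def stratified_sample(frame, n_per_stratum, rng):
    """Split the frame into strata, then draw a random sample from each."""
    strata = {}
    for unit in frame:
        strata.setdefault(unit[1], []).append(unit)
    return [u for members in strata.values()
            for u in rng.sample(members, n_per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    """Choose entire groups at random and keep everyone in them."""
    chosen = rng.sample(clusters, n_clusters)
    return [u for cluster in chosen for u in cluster]

def systematic_sample(frame, step, rng):
    """Pick a random starting point, then take every step-th individual."""
    start = rng.randrange(step)
    return frame[start::step]

clusters = [frame[i:i + 10] for i in range(0, 100, 10)]  # ten groups of ten

srs = simple_random_sample(frame, 10, rng)
stratified = stratified_sample(frame, 5, rng)
clustered = cluster_sample(clusters, 2, rng)
systematic = systematic_sample(frame, 10, rng)
```

Running the sketch with different seeds gives different samples, which is sampling variability in action.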
observational study: a study based on data in which no manipulation of factors has been employed
retrospective study: an observational study in which subjects are selected and then their previous conditions or behaviors are determined
prospective study: an observational study in which subjects are followed to observe future outcomes
experiment: manipulates factor levels to create treatments, randomly assigns subjects to these treatment levels, and then compares the responses of the subject groups across treatment levels
random assignment: to be valid, an experiment must assign experimental units to treatment groups at random
factor: a variable whose levels are controlled by the experimenter
response: a variable whose values are compared across different treatments
experimental units: individuals on whom an experiment is performed
level: the specific values that the experimenter chooses for a factor
treatment: the process, intervention, or other controlled circumstance applied to randomly assigned experimental units
principles of experimental design: control, randomize, replicate, block
statistically significant: when an observed difference is too large for us to believe that it is likely to have occurred naturally
control group: the experimental units assigned to a baseline treatment level, typically either the default treatment, which is well understood, or a null, placebo treatment
blinding: keeping any individual associated with an experiment unaware of how subjects have been allocated to treatment groups; such an individual is said to be blinded
single-blind: when those in one of the two classes, either those who could influence the results or those who evaluate them, are blinded
double-blind: when both those who could influence the results and those who evaluate them are blinded
placebo: a treatment known to have no effect, administered so that all groups experience the same conditions
placebo effect: the tendency of many human subjects (often 20% or more of experiment subjects) to show a response even when administered a placebo
block: when groups of experimental units are similar, it is a good idea to gather them together into these
matched: in a retrospective or prospective study, subjects who are similar in ways not under study may be ____ and then compared with each other on the variables of interest
randomized block design: randomization occurring within blocks
completely randomized design: all experimental units have an equal chance of receiving any treatment
confounded: when the levels of one factor are associated with the levels of another factor so their effects cannot be separated
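
The difference between a completely randomized design and a randomized block design can be sketched in Python as follows. The experimental units, block names, and treatment labels are made up, and both function names are hypothetical.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal the treatments out in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """Randomize separately within each block of similar units."""
    assignment = {}
    for block_units in blocks.values():
        assignment.update(completely_randomized(block_units, treatments, rng))
    return assignment

rng = random.Random(0)  # seeded so the sketch is repeatable
units = [f"plant{i}" for i in range(12)]   # made-up experimental units
treatments = ["control", "fertilizer"]     # made-up treatment levels
blocks = {"sunny": units[:6], "shady": units[6:]}  # made-up blocking variable

crd = completely_randomized(units, treatments, rng)
rbd = randomized_block(blocks, treatments, rng)
```

In the blocked version each block contributes equally to both treatments, so the blocking variable (here, sun exposure) cannot be confounded with the treatment.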

