
AP Statistics Flashcards

Categorical vs. Quantitative Data: Data are categorical if they fall into groups or categories, and data are quantitative if they take on numerical values where it makes sense to find an average. Use bar graphs, pie graphs, or segmented bar charts for categorical variables such as color or gender. Use dotplots, stemplots, histograms, or boxplots for quantitative variables such as age or weight.
Marginal vs. Conditional Distributions: In a two-way table, marginal distributions consider only one variable and use only the total row/column of the table. Conditional distributions describe the distribution of one variable for a specific value of the other (one row/column inside the table).
SOCS: Shape (skewed left, skewed right, symmetric, uniform, unimodal, bimodal). Outliers (discuss them if there are obvious ones). Center (mean or median). Spread (range, IQR, or standard deviation). Note: Also be on the lookout for gaps, clusters, or other unusual features of the data set.
Comparing Distributions: Address Shape, Outliers, Center, and Spread in context! YOU MUST USE comparison phrases like "is greater than" or "is less than" for Center and Spread.
Outlier Rule: Upper Cutoff = Q3 + 1.5(IQR). Lower Cutoff = Q1 - 1.5(IQR). IQR = Q3 - Q1.
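As a rough illustration of the outlier rule, the sketch below computes both cutoffs in Python. The data set and the helper name `outlier_cutoffs` are made up for this example. Quartile conventions differ between textbooks, calculators, and software, so `method="inclusive"` is only one reasonable choice and other tools may report slightly different values of Q1 and Q3.

```python
from statistics import quantiles

def outlier_cutoffs(data):
    """Return (lower_cutoff, upper_cutoff) using the 1.5 x IQR rule."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data set, invented for illustration only.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 30]
low, high = outlier_cutoffs(ages)

# Any value outside the two cutoffs is flagged as an outlier.
outliers = [x for x in ages if x < low or x > high]
print(low, high, outliers)
```

With this data, Q1 = 3 and Q3 = 7 under the inclusive convention, so the cutoffs are -3 and 13 and only the value 30 is flagged.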
Interpret Standard Deviation: Standard deviation measures spread by giving the "typical" distance that the observations (context) are away from the mean (context).
How does shape affect measures of center? In general: Skewed Left (Mean < Median), Skewed Right (Mean > Median), Fairly Symmetric (Mean ≈ Median).
Interpret a z-score: z = (value - mean) / standard deviation. A z-score describes how many standard deviations a value falls away from the mean of the distribution and in what direction. The farther the z-score is from zero, the more "surprising" the value of the statistic is.
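A minimal sketch of the z-score formula, using made-up numbers (a score of 700 on a test with mean 500 and standard deviation 100). The function name `z_score` is hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical test score: 700, where the mean is 500 and the SD is 100.
z = z_score(700, 500, 100)
print(z)  # positive, so the score is above the mean
```

Here the score falls 2 standard deviations above the mean.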
Percentiles: The kth percentile of a distribution is the point that has k% of the values less than that point. For example, a student who scores at the 90th percentile got a higher score than 90% of the other test takers.
Linear Transformations: Adding "a" to every member of a data set adds "a" to the measures of position, but does not change the measures of spread or the shape. Multiplying every member of a data set by "b" multiplies the measures of position by "b" and multiplies most measures of spread by |b|, but does not change the shape.
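The effect of adding and multiplying can be checked numerically. The sketch below uses an invented data set with a = 10 and b = -3, and uses the population standard deviation (`pstdev`) and the range as the measures of spread.

```python
from statistics import mean, median, pstdev

data = [2, 4, 4, 6, 9]   # hypothetical data, for illustration only
a, b = 10, -3            # hypothetical shift and scale factor

shifted = [x + a for x in data]   # add a to every value
scaled = [b * x for x in data]    # multiply every value by b

def data_range(values):
    return max(values) - min(values)

# Adding a: center moves by a, spread is unchanged.
print(mean(data), mean(shifted), pstdev(data), pstdev(shifted))
# Multiplying by b: center is multiplied by b, spread by |b|.
print(mean(scaled), pstdev(scaled), data_range(data), data_range(scaled))
```

Because b is negative here, the mean and median change sign while the standard deviation and range are multiplied by |b| = 3.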
The Standard Normal Distribution: The standard Normal distribution is the Normal distribution with mean μ = 0 and standard deviation σ = 1. The Normal table displays values for the standard Normal distribution.
Using Normalcdf and InvNorm (Calculator Tips): Using boundaries to find area: normalcdf(min, max, mean, SD). Using area to find boundary: invNorm(area to the left as a decimal, mean, SD). If used on the AP® exam, make sure to label each input!
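For readers working without a calculator, the same two calculations can be sketched with Python's standard library. The function names `normalcdf` and `invnorm` below are hypothetical wrappers written to mirror the calculator inputs, not calculator or library functions. The example values (mean 500, SD 100) are made up.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the Normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def invnorm(area_left, mean=0.0, sd=1.0):
    """Boundary value with the given area to its left."""
    return NormalDist(mean, sd).inv_cdf(area_left)

# Area within 1 standard deviation of the mean (standard Normal).
area = normalcdf(-1, 1)

# 90th percentile of a hypothetical Normal distribution: mean 500, SD 100.
cutoff = invnorm(0.90, mean=500, sd=100)
print(round(area, 4), round(cutoff, 1))
```

The first result is about 0.68, matching the familiar 68-95-99.7 rule, and the second is roughly 628.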
Describing an association in a scatterplot: Address the following, in context: Direction, Outliers, Form, Strength.
Interpret r: Correlation measures the strength and direction of the linear relationship between x and y. r is always between -1 and 1. Close to zero = very weak. Close to 1 or -1 = strong. Exactly 1 or -1 = perfectly straight line. Positive r = positive correlation. Negative r = negative correlation.
Interpret LSRL Slope "b": For every one-unit increase in the x variable (context), the y variable (context) is predicted to increase/decrease by ____ units (context).
Interpret LSRL y-intercept "a": When the x variable (context) is zero, the y variable (context) is predicted to be ______.
What is a Residual? Residual = y − ŷ (Actual - Predicted). A residual measures the difference between the actual y value and the y value that is predicted by the LSRL.
Interpreting a Residual Plot: If there is a leftover pattern in the residual plot, then the model used does not have the same form as the association (the model is not appropriate). If there is no leftover pattern in the residual plot, then the model is appropriate.
Interpret LSRL "ŷ": ŷ is the "estimated" or "predicted" y-value (context) for a given x-value (context).
Extrapolation: Using an LSRL to predict outside the domain of the explanatory variable. (Can lead to ridiculous conclusions if the observed association does not continue.)
Interpret LSRL "s": s = ___ is the standard deviation of the residuals. It measures the typical distance between the actual y values (context) and their predicted y values (context) in a regression setting.
Interpret r^2: ___% of the variation in y (context) is accounted for by the LSRL of y (context) on x (context). Or: ___% of the variation in y (context) is accounted for by using the linear regression model with x (context) as the explanatory variable.
Outliers and Influential Points in Regression: Any point that falls outside the pattern of the association should be considered an outlier. A point is influential if it has a big effect on a calculation, such as the correlation or equation of the least-squares regression line. Points separated in the x-direction are often influential.
Reading Computer Output for Regression: Using foot length (x) to predict height (y): y-intercept = 103.41 and slope = 2.7469; s = 7.95126 and r^2 = 0.486.
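The values read from the output define the line ŷ = 103.41 + 2.7469x. The sketch below uses them to compute a prediction, a residual, and r. The student (foot length 25, actual height 175) is invented for illustration, the function names are hypothetical, and the units are whatever the original output used, which the card does not state.

```python
from math import sqrt

INTERCEPT = 103.41   # y-intercept "a" from the output
SLOPE = 2.7469       # slope "b" from the output
R_SQUARED = 0.486    # r^2 from the output

def predict_height(foot_length):
    """Predicted height y-hat for a given foot length x."""
    return INTERCEPT + SLOPE * foot_length

def residual(actual_height, foot_length):
    """Residual = actual y minus predicted y."""
    return actual_height - predict_height(foot_length)

# Hypothetical student, not from the original data set.
predicted = predict_height(25)
res = residual(175, 25)

# r takes the sign of the slope, which is positive here.
r = sqrt(R_SQUARED) if SLOPE > 0 else -sqrt(R_SQUARED)
print(round(predicted, 4), round(res, 4), round(r, 3))
```

A positive residual means the actual height is above the line, so the LSRL underpredicted this student's height.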
SRS: An SRS (simple random sample) is a sample taken in such a way that every set of n individuals has an equal chance to be the sample actually selected.
Using a Random Digit Table to Select a Sample: Step 1: Label. Give each member of the population a numerical label with the same number of digits. Use as few digits as possible. Step 2: Randomize. Read consecutive groups of digits of the appropriate length from left to right across a line in the table. Ignore any group of digits that wasn't used as a label or that duplicates a label already in the sample. Stop when you have chosen n different labels. Your sample contains the individuals whose labels you find.
Sampling Techniques: 1. SRS - Names in a hat. 2. Stratified - Split the population into homogeneous groups, select an SRS from each group. 3. Cluster - Split the population into groups (often based on location) called clusters, and randomly select whole clusters for the sample. 4. Census - An attempt to reach the entire population. 5. Convenience - Selects individuals easiest to reach. 6. Voluntary Response - People choose themselves by responding to a general appeal.
Advantage of using a Stratified Random Sample Over an SRS: Stratified random sampling guarantees that each of the strata will be represented. When strata are chosen properly, a stratified random sample will produce better (less variable/more precise) information than an SRS of the same size.
Bias: A sampling method is biased if it consistently produces estimates that are too small or consistently produces estimates that are too large.
Experiment vs. Observational Study: A study is an experiment ONLY if researchers impose a treatment upon the experimental units. In an observational study, researchers make no attempt to influence the results and cannot conclude cause-and-effect.
Confounding: Two variables are confounded if it cannot be determined which variable is causing the change in the response variable. For example, if people who take vitamins on their own have less cancer, we cannot say for sure that the vitamins are causing the reduction in cancer. It could be other characteristics of vitamin takers, such as diet or exercise.
Why use a control group? A control group gives the researchers a comparison group to be used to evaluate the effectiveness of the treatment(s) (context). It allows the researchers to measure the effect of the treatment (context) compared to no treatment at all.
Blinding: When the subjects in an experiment don't know which treatment they are receiving, they are blind. If the people interacting with the subjects and measuring the response variable don't know which subjects received which treatments, they are blind. If both groups are blind, the study is double-blind.
Experimental Designs: CRD (Completely Randomized Design) - Units are allocated at random among all treatments. RBD (Randomized Block Design) - Units are put into homogeneous blocks and randomly assigned to treatments within each block. Matched Pairs - A form of blocking in which each subject receives both treatments in a random order, or subjects are matched in pairs with one subject in each pair receiving each treatment, determined at random.
Benefit of Blocking: Blocking helps account for the variability in the response variable (context) that is caused by the blocking variable (context). If there really is a difference in the effectiveness of the treatments, using an appropriate blocking variable will increase power (probability of finding convincing evidence that the treatments are not equally effective).
Scope of Inference: Generalizing to a Larger Population: We can generalize the results of a study to a larger population if we used a random sample from that population.
Scope of Inference: Cause-and-Effect: We can make a cause-and-effect conclusion if we randomly assign treatments to experimental units in an experiment. Otherwise, Association is NOT Causation!
Interpreting Probability: The probability of an event is the proportion of times the event would occur in a very large number of repetitions. Probability is a long-term relative frequency.
Law of Large Numbers: The Law of Large Numbers says that if we observe many repetitions of a chance process, the observed proportion of times that an event occurs approaches a single value, called the probability of that event.
Conducting a simulation: State: Ask a question about some chance process. Plan: Describe how to use a random device to simulate one trial of the process and indicate what will be recorded at the end of each trial. Do: Do many trials. Conclude: Answer the question of interest.
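The four steps can be sketched in code for one made-up question: "What is the probability of getting at least one 6 in four rolls of a fair die?" The question, the function name `simulate`, the seed, and the number of trials are all assumptions chosen for illustration.

```python
import random

def simulate(trials, seed=1):
    """Estimate P(at least one 6 in four rolls of a fair die)."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Plan: one trial = four rolls; record whether a 6 appeared.
        rolls = [rng.randint(1, 6) for _ in range(4)]
        if 6 in rolls:
            hits += 1
    # Conclude: the proportion of successful trials estimates the probability.
    return hits / trials

estimate = simulate(10_000)      # Do: many trials
exact = 1 - (5 / 6) ** 4         # exact answer, for comparison
print(round(estimate, 3), round(exact, 3))
```

Consistent with the Law of Large Numbers, the estimate settles near the exact value of about 0.518 as the number of trials grows.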
Complementary Events: Two mutually exclusive events whose union is the sample space. For example: Rain / No Rain; Draw at least one heart / Draw NO hearts.
Conditional Probability: Probability that one event occurs given that another event is already known to have occurred. (On formula sheet.)
Two Events are Independent If... Events A and B are independent if knowing that Event A has occurred (or has not occurred) doesn't change the probability that event B occurs.
Two Events are Mutually Exclusive If... Events A and B are mutually exclusive if they share no outcomes.
Interpreting Expected Value/Mean: If we were to repeat the chance process (context) many times, the average value of ____ (context) would be about ____.
Mean and Standard Deviation of a Discrete Random Variable
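No definition text accompanies this card. As a hedged sketch of the standard textbook definitions, the mean is the sum of each value times its probability, and the standard deviation is the square root of the sum of each squared deviation from the mean times its probability. The probability distribution below is invented for illustration.

```python
from math import sqrt

# Hypothetical probability distribution of X (probabilities sum to 1).
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

# Mean (expected value): sum of x * P(x).
mean_x = sum(x * p for x, p in zip(values, probs))

# Variance: sum of (x - mean)^2 * P(x); the SD is its square root.
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(values, probs))
sd_x = sqrt(var_x)
print(mean_x, var_x, round(sd_x, 4))
```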
Mean and Standard Deviation of a Sum of Two Random Variables
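No definition text accompanies this card. The standard results are that means always add, while variances (not standard deviations) add only when the two random variables are independent. The sketch below applies those rules. The function name and the numbers are hypothetical.

```python
from math import sqrt

def sum_mean_sd(mean_x, sd_x, mean_y, sd_y):
    """Mean and SD of X + Y, ASSUMING X and Y are independent."""
    mean_total = mean_x + mean_y            # means add
    sd_total = sqrt(sd_x ** 2 + sd_y ** 2)  # variances add, then take the root
    return mean_total, sd_total

# Hypothetical values: X has mean 10 and SD 3, Y has mean 20 and SD 4.
mean_total, sd_total = sum_mean_sd(10, 3, 20, 4)
print(mean_total, sd_total)
```

Note that the SD of the sum is 5, not 3 + 4 = 7: standard deviations themselves do not add.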
Binomial Setting and Random Variable: Binary? Each trial can be classified as success/failure. Independent? Trials must be independent. Number? The number of trials (n) must be fixed in advance. Success? The probability of success (p) must be the same for each trial. X = number of successes in n trials.
Binomial Distribution (Calculator Usage)
Mean and Standard Deviation of a Binomial RV
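No definition text accompanies the two binomial cards above. As a hedged sketch of the standard results: P(X = k) is (n choose k) p^k (1 − p)^(n − k), the mean is np, and the standard deviation is sqrt(np(1 − p)). The helper names `binom_pmf` and `binom_cdf` are hypothetical and are written to play the role of a calculator's "exactly k" and "at most k" binomial commands. The setting (n = 10, p = 0.5) is made up.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): add up the probabilities for 0, 1, ..., k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.5   # hypothetical setting: 10 tosses of a fair coin

p_exactly_3 = binom_pmf(3, n, p)
p_at_most_3 = binom_cdf(3, n, p)
mean_x = n * p
sd_x = sqrt(n * p * (1 - p))
print(p_exactly_3, p_at_most_3, mean_x, round(sd_x, 4))
```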
Geometric Setting and Random Variable: Arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. On each trial, the probability p of success must be the same. X = number of trials needed to achieve one success.
Parameter vs. Statistic: A parameter measures a characteristic of a population, such as a population mean μ or population proportion p. A statistic measures a characteristic of a sample, such as a sample mean x̄ or sample proportion p̂. Statistics are used to estimate parameters.
What is a sampling distribution? A sampling distribution is the distribution of a sample statistic in all possible samples of the same size. It describes the possible values of a statistic and how likely these values are. Contrast with the distribution of the population and the distribution of a sample.
What is the sampling distribution of p̂?
What is the Central Limit Theorem (CLT)? If the population distribution is not Normal, the sampling distribution of the sample mean x̄ will become more and more Normal as n increases.
Unbiased Estimator: A statistic is an unbiased estimator of a parameter if the mean of its sampling distribution equals the true value of the parameter being estimated. In other words, the sampling distribution of the statistic is centered in the right place.
4-Step Process Confidence Intervals: STATE: What parameter do you want to estimate, and at what confidence level? PLAN: Choose the appropriate inference method. Check conditions. DO: If the conditions are met, perform calculations. CONCLUDE: Interpret your interval in the context of the problem.
Interpreting a Confidence Interval: I am ___% confident that the interval from ___ to ___ captures the true ____.
Interpreting a Confidence Level (The Meaning of 95% Confidence): If many, many samples are selected and many, many confidence intervals are calculated, about __% of them will capture the true ____.
Standard Error vs. Margin of Error: The standard error of a statistic estimates how far the value of the statistic typically differs from the true value of the parameter. The margin of error estimates how far we expect the parameter to differ from the statistic, at most.
What factors affect the Margin of Error? The margin of error decreases when the sample size increases or the confidence level decreases.
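Both effects can be seen numerically. The sketch below assumes a one-sample z interval for a proportion, where the margin of error is z* times sqrt(p̂(1 − p̂)/n). The function name `margin_of_error` and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error for a one-sample z interval for a proportion."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard Normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample proportion of 0.5.
me_small_n = margin_of_error(0.5, 100)         # n = 100, 95% confidence
me_large_n = margin_of_error(0.5, 400)         # larger sample
me_90 = margin_of_error(0.5, 100, 0.90)        # lower confidence level
print(round(me_small_n, 3), round(me_large_n, 3), round(me_90, 3))
```

Quadrupling the sample size cuts the margin of error in half, because n sits under a square root.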
Inference for Means (Conditions): Random: Data from a random sample or randomized experiment. 10%: The sample must be ≤ 10% of the population. Normal/Large Sample: Population distribution is Normal or sample size is large (n ≥ 30). If n < 30, graph sample data and verify no strong skewness or outliers. Include graph!
Inference for Proportions (Conditions): Random: Data from a random sample or randomized experiment. 10%: The sample must be ≤ 10% of the population. Large Counts: At least 10 successes and failures: np̂ ≥ 10 and n(1 − p̂) ≥ 10 (for a one-sample z test for p: np₀ ≥ 10 and n(1 − p₀) ≥ 10).
Finding the Sample Size (For a given margin of error m)
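No definition text accompanies this card. One common version, assumed here, is the sample size for estimating a proportion: solve z* sqrt(p̂(1 − p̂)/n) ≤ m for n, using p̂ = 0.5 when no prior guess is available, and round up. The function name and the target margin of 0.03 are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(m, confidence=0.95, p_guess=0.5):
    """Smallest n giving a margin of error of at most m for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Rearranged from z* * sqrt(p(1 - p) / n) <= m; always round UP.
    return ceil((z_star / m) ** 2 * p_guess * (1 - p_guess))

# Hypothetical goal: margin of error of at most 3 percentage points at 95%.
n_needed = sample_size_for_proportion(0.03)
print(n_needed)
```

Using p̂ = 0.5 is the conservative choice, since p̂(1 − p̂) is largest there and so the resulting n is large enough for any true proportion.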
4-Step Process Significance Tests: STATE: What hypotheses do you want to test, and at what significance level? Define any parameters you use. PLAN: Choose the appropriate inference method. Check conditions. DO: If the conditions are met, perform calculations. Compute the test statistic and find the P-value. CONCLUDE: Make a decision about the hypotheses in the context of the problem.
Explain a P-value: Assuming that the null is true (context), there is a ___ probability of observing a statistic (context) as large as or larger than the one actually observed by chance alone.
Carrying out a Two-Sided Test from a Confidence Interval: α = 1 - confidence level. If the null hypothesis value is in the interval, then it is a plausible value that should not be rejected. If the null hypothesis value is not in the interval, then it is not a plausible value and should be rejected.
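The decision rule can be written as a small function. This is only a sketch: the function name `two_sided_decision` and the interval (0.52 to 0.60 at 95% confidence) are invented for illustration.

```python
def two_sided_decision(null_value, ci_low, ci_high, confidence_level):
    """Return (alpha, reject_h0) for a two-sided test based on an interval."""
    alpha = 1 - confidence_level
    # Reject H0 only when the null value falls outside the interval.
    reject_h0 = not (ci_low <= null_value <= ci_high)
    return alpha, reject_h0

# Hypothetical 95% confidence interval for a proportion: 0.52 to 0.60.
alpha, reject = two_sided_decision(0.50, 0.52, 0.60, 0.95)
_, reject_other = two_sided_decision(0.55, 0.52, 0.60, 0.95)
print(round(alpha, 2), reject, reject_other)
```

Here 0.50 lies outside the interval, so H0: p = 0.50 is rejected at the α = 0.05 level, while 0.55 lies inside and would not be rejected.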
Type I Error & Type II Error: Type I Error: Finding convincing evidence that Ha is true when in reality Ha is not true. (Rejecting H0 when H0 is actually true.) Type II Error: Not finding convincing evidence that Ha is true when in reality Ha is true. (Failing to reject H0 when Ha is true.)
Power: Probability of avoiding a Type II error = Probability of finding convincing evidence that Ha is true when in reality Ha is true.
Factors that Affect Power: Sample Size: To increase power, increase sample size. Significance Level α: A larger value of α increases power. Effect Size: The farther the true value is from the hypothesized value, the larger the power. Data Collection: Using blocking rather than a completely randomized design can increase power.
Paired t-test Identification Hints, H0 and Ha
Two Sample t-test Identification Hints, H0 and Ha
Chi-Square Tests (Conditions): Random: Data from a random sample(s) or randomized experiment. 10%: The sample must be ≤ 10% of the population. Large Counts: All expected counts are at least 5.
Types of Chi-Square Tests: Goodness of Fit: Use to compare the distribution of a categorical variable in one population to a hypothesized distribution. Homogeneity: Use to compare the distribution of a categorical variable for 2+ populations or treatments. Independence: Use to test the association between two categorical variables in one population.
Chi-Square Tests df and Expected Counts
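No definition text accompanies this card. As a hedged sketch of the standard results for a two-way table: each expected count is (row total × column total) / table total, and df = (rows − 1)(columns − 1). For a goodness-of-fit test, df = number of categories − 1. The observed counts and function names below are invented for illustration.

```python
def expected_counts(observed):
    """Expected counts for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2 x 2 table of observed counts.
observed = [[20, 30], [30, 20]]
expected = expected_counts(observed)
chi2 = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Large Counts condition: all expected counts are at least 5.
large_counts_ok = all(e >= 5 for row in expected for e in row)
print(expected, chi2, df, large_counts_ok)
```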
Inference for Regression (Conditions): Linear: Association between the variables is linear. Check with residual plot. Independent: Observations are independent; check the 10% condition if sampling without replacement. Normal: Responses vary Normally around the regression line for all x-values. Check with graph of residuals. Equal SD: Responses have equal SD around the regression line for all x-values. Check with residual plot. Random: Data from a random sample or randomized experiment.
Inference for Regression with Computer Output
