AP Statistics Flashcards
6648677145 | Categorical vs. Quantitative Data | Data are categorical if they fall into groups or categories and data are quantitative if they take on numerical values where it makes sense to find an average. -Use bar graphs, pie graphs, or segmented bar charts for categorical variables such as color or gender. -Use dotplots, stemplots, histograms, or boxplots for quantitative variables such as age or weight. | 0 | |
6648685788 | Marginal vs. Conditional Distributions | In a two-way table, marginal distributions consider only one variable and use the total row/column of the table only. Conditional distributions describe the distribution of one variable for a specific value of the other (one row/column inside the table). | 1 | |
6648690829 | SOCS | Shape - Skewed Left, Skewed Right, Symmetric, Uniform, Unimodal, Bimodal Outliers - Discuss them if there are obvious ones Center - Mean or Median Spread - Range, IQR, or Standard Deviation Note: Also be on the lookout for gaps, clusters or other unusual features of the data set. | 2 | |
6648692544 | Comparing Distributions | Address: Shape, Outliers, Center, Spread in context! YOU MUST USE comparison phrases like "is greater than" or "is less than" for Center & Spread | 3 | |
6648692545 | Outlier Rule | Upper Cutoff = Q3 + 1.5(IQR) Lower Cutoff = Q1 - 1.5(IQR) IQR = Q3 - Q1 | 4 | |
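A minimal Python sketch of the 1.5×IQR rule with made-up data (not part of the original card). The quartile convention below is the median-of-halves method; calculators may use a slightly different one.

```python
# Hypothetical data, chosen only to illustrate the outlier rule.
data = [2, 4, 5, 7, 8, 9, 11, 30]

def quartiles(values):
    """Median-of-halves quartiles (one common convention)."""
    s = sorted(values)
    n = len(s)
    def median(xs):
        mid = len(xs) // 2
        return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2
    return median(s[:n // 2]), median(s[(n + 1) // 2:])

q1, q3 = quartiles(data)
iqr = q3 - q1
low_cut, high_cut = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_cut or x > high_cut]
print(q1, q3, iqr, low_cut, high_cut, outliers)   # 4.5 10 5.5 -3.75 18.25 [30]
```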
6648696164 | Interpret Standard Deviation | Standard Deviation measures spread by giving the "typical" distance that the observations (context) are away from the mean (context). | 5 | |
6648696165 | How does shape affect measures of center? | In general, Skewed Left (Mean < Median) Skewed Right (Mean > Median) Fairly Symmetric (Mean ≈ Median) | 6 | |
6648697986 | Interpret a z-score | z = (value - mean) / standard deviation A z-score describes how many standard deviations a value falls away from the mean of the distribution and in what direction. The further the z-score is from zero, the more "surprising" the value is. | 7 | |
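A tiny worked example of the z-score formula, with assumed numbers (mean 500, SD 100; values are made up):

```python
# Assumed values for illustration only.
value, mean, sd = 650, 500, 100
z = (value - mean) / sd
print(z)   # 1.5 -> 650 is 1.5 standard deviations above the mean
```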
6648697987 | Percentiles | The kth percentile of a distribution is the point that has k% of the values less than that point. For example, a student who scores at the 90th percentile got a higher score than 90% of the other test takers. | 8 | |
6648700397 | Linear Transformations | Adding "a" to every member of a data set adds "a" to the measures of position, but does not change the measures of spread or the shape. Multiplying every member of a data set by "b" multiplies the measures of position by "b" and multiplies most measures of spread by |b|, but does not change the shape. | 9 | |
6648770742 | The Standard Normal Distribution | The standard Normal distribution is the Normal distribution with mean μ = 0 and standard deviation σ = 1. The Normal table displays values for the standard Normal distribution. | 10 | |
6648770743 | Using normalcdf and invNorm (Calculator Tips) | Using boundaries to find area: normalcdf(min, max, mean, SD) Using area to find a boundary: invNorm(area to the left as a decimal, mean, SD) If used on the AP® exam, make sure to label each input! | 11 | |
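For study purposes, here is a rough Python standard-library analogue of those calculator commands (the distribution and boundaries below are assumed, not part of the original card):

```python
from statistics import NormalDist

dist = NormalDist(mu=500, sigma=100)      # an assumed Normal(500, 100) model

# normalcdf(min, max, mean, SD): area between two boundaries
area = dist.cdf(650) - dist.cdf(400)

# invNorm(area to the left, mean, SD): boundary with a given area to its left
cutoff = dist.inv_cdf(0.90)               # 90th percentile

print(round(area, 4), round(cutoff, 1))
```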
6648772417 | Describing an association in a scatterplot | Address the following, in context: Direction Outliers Form Strength | 12 | |
6648772418 | Interpret r | Correlation measures the strength and direction of the linear relationship between x and y. • r is always between -1 and 1 • Close to zero = very weak • Close to 1 or -1 = strong • Exactly 1 or -1 = perfectly straight line • Positive r = positive correlation Negative r = negative correlation | 13 | |
6648774183 | Interpret LSRL Slope "b" | For every one unit change in the x variable (context) the y variable (context) is predicted to increase/decrease by ____ units (context). | 14 | |
6648774184 | Interpret LSRL y-intercept "a" | When the x variable (context) is zero, the y variable (context) is predicted to be ______. | 15 | |
6648776258 | What is a Residual? | Residual = y − ŷ (Actual - Predicted) A residual measures the difference between the actual y value and the y value that is predicted by the LSRL. | 16 | |
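A small sketch of the residual calculation, reusing the foot-length/height equation from the computer-output card later in this set and a made-up observation:

```python
# LSRL from the computer-output card: predicted height = 103.41 + 2.7469 * foot length.
foot_length, actual_height = 25.0, 168.0   # made-up observation
predicted = 103.41 + 2.7469 * foot_length  # y-hat
residual = actual_height - predicted       # actual minus predicted
print(round(predicted, 2), round(residual, 2))   # 172.08 and -4.08
```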
6648776259 | Interpreting a Residual Plot | If there is a leftover pattern in the residual plot, then the model used does not have the same form as the association (the model is not appropriate). If there is no leftover pattern in the residual plot, then the model is appropriate. | 17 | |
6648779945 | Interpret LSRL "ŷ" | ŷ is the "estimated" or "predicted" y-value (context) for a given x-value (context) | 18 | |
6648785376 | Extrapolation | Using a LSRL to predict outside the domain of the explanatory variable. (Can lead to ridiculous conclusions if the observed association does not continue) | 19 | |
6648799735 | Interpret LSRL "s" | s = ___ is the standard deviation of the residuals. It measures the typical distance between the actual y values (context) and their predicted y values (context) in a regression setting | 20 | |
6648817223 | Interpret r^2 | ___% of the variation in y (context) is accounted for by the LSRL of y (context) on x (context). Or ___% of the variation in y (context) is accounted for by using the linear regression model with x (context) as the explanatory variable. | 21 | |
6648821098 | Outliers and Influential Points in Regression | Any point that falls outside the pattern of the association should be considered an outlier. A point is influential if it has a big effect on a calculation, such as the correlation or equation of the least-squares regression line. Points separated in the x-direction are often influential. | 22 | |
6648821099 | Reading Computer Output for Regression | Using foot length (x) to predict height (y): y-intercept = 103.41 and slope = 2.7469, s = 7.95126 and r² = 0.486 | 23 | |
6648822768 | SRS | An SRS (simple random sample) is a sample taken in such a way that every set of n individuals has an equal chance to be the sample actually selected. | 24 | |
6648822769 | Using a Random Digit Table to Select a Sample | Step 1: Label. Give each member of the population a numerical label with the same number of digits. Use as few digits as possible. Step 2: Randomize. Read consecutive groups of digits of the appropriate length from left to right across a line in the table. Ignore any group of digits that wasn't used as a label or that duplicates a label already in the sample. Stop when you have chosen n different labels. Your sample contains the individuals whose labels you find. | 25 | |
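A minimal Python sketch of the label-and-read procedure, assuming a population of 45 members labeled 01-45 and a made-up row of table digits (both are assumptions for illustration):

```python
digits = "4105963377129124"   # hypothetical row of random digits
n, k = 5, 2                   # choose n = 5 labels, each k = 2 digits long

chosen = []
for i in range(0, len(digits) - k + 1, k):
    label = digits[i:i + k]
    # skip groups that aren't valid labels or that repeat a label already chosen
    if "01" <= label <= "45" and label not in chosen:
        chosen.append(label)
    if len(chosen) == n:
        break
print(chosen)   # ['41', '05', '33', '12', '24']
```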
6648826540 | Sampling Techniques | 1. SRS- Names in a hat 2. Stratified - Split the population into homogeneous groups, select an SRS from each group. 3. Cluster - Split the population into groups (often based on location) called clusters, and randomly select whole clusters for the sample. 4. Census - An attempt to reach the entire population 5. Convenience- Selects individuals easiest to reach 6. Voluntary Response - People choose themselves by responding to a general appeal. | 26 | |
6648826541 | Advantage of using a Stratified Random Sample Over an SRS | Stratified random sampling guarantees that each of the strata will be represented. When strata are chosen properly, a stratified random sample will produce better (less variable/more precise) information than an SRS of the same size. | 27 | |
6648828227 | Bias | A sampling method is biased if it consistently produces estimates that are too small or consistently produces estimates that are too large. | 28 | |
6648829319 | Experiment vs. Observational Study | A study is an experiment ONLY if researchers impose a treatment upon the experimental units. In an observational study researchers make no attempt to influence the results and cannot conclude cause- and-effect. | 29 | |
6648854863 | Confounding | Two variables are confounded if it cannot be determined which variable is causing the change in the response variable. For example, if people who take vitamins on their own have less cancer, we cannot say for sure that the vitamins are causing the reduction in cancer. It could be other characteristics of vitamin takers, such as diet or exercise. | 30 | |
6648855947 | Why use a control group? | A control group gives the researchers a comparison group to be used to evaluate the effectiveness of the treatment(s). (context) It allows the researchers to measure the effect of the treatment (context) compared to no treatment at all. | 31 | |
6648855948 | Blinding | When the subjects in an experiment don't know which treatment they are receiving, they are blind. If the people interacting with the subjects and measuring the response variable don't know which subjects received which treatments, they are blind. If both groups are blind, the study is double-blind. | 32 | |
6648857370 | Experimental Designs | CRD (Completely Randomized Design) - Units are allocated at random among all treatments RBD (Randomized Block Design) -Units are put into homogeneous blocks and randomly assigned to treatments within each block. Matched Pairs - A form of blocking in which each subject receives both treatments in a random order or subjects are matched in pairs with one subject in each pair receiving each treatment, determined at random. | 33 | |
6648860450 | Benefit of Blocking | Blocking helps account for the variability in the response variable (context) that is caused by the blocking variable (context). If there really is a difference in the effectiveness of the treatments, using an appropriate blocking variable will increase power (probability of finding convincing evidence that the treatments are not equally effective). | 34 | |
6648860451 | Scope of Inference: Generalizing to a Larger Population | We can generalize the results of a study to a larger population if we used a random sample from that population. | 35 | |
6648861983 | Scope of Inference: Cause-and-Effect | We can make a cause-and-effect conclusion if we randomly assign treatments to experimental units in an experiment. Otherwise, Association is NOT Causation! | 36 | |
6648861984 | Interpreting Probability | The probability of an event is the proportion of times the event would occur in a very large number of repetitions. Probability is a long-term relative frequency. | 37 | |
6648863499 | Law of Large Numbers | The Law of Large Numbers says that if we observe many repetitions of a chance process, the observed proportion of times that an event occurs approaches a single value, called the probability of that event. | 38 | |
6648863500 | Conducting a simulation | State: Ask a question about some chance process. Plan: Describe how to use a random device to simulate one trial of the process and indicate what will be recorded at the end of each trial. Do: Do many trials. Conclude: Answer the question of interest. | 39 | |
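A short Python sketch of the Do step for one made-up question (if 30% of customers win a prize, how unusual is it for 5 or more of 10 randomly chosen customers to win?):

```python
import random

random.seed(1)        # fixed seed so the illustration is reproducible
trials = 10_000
count_5_or_more = 0
for _ in range(trials):
    winners = sum(1 for _ in range(10) if random.random() < 0.30)  # one trial
    if winners >= 5:
        count_5_or_more += 1
print(count_5_or_more / trials)   # estimated probability, roughly 0.15
```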
6648883009 | Complementary Events | Two mutually exclusive events whose union is the sample space. For example: -Rain / No Rain -Draw at least one heart / Draw NO hearts | 40 | |
6649278637 | Conditional Probability | Probability that one event occurs given that another event is already known to have occurred: P(A|B) = P(A and B) / P(B). (on formula sheet) | 41 | |
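A quick numeric illustration of the conditional probability formula, using made-up two-way-table counts:

```python
# Hypothetical counts: 200 students, 120 play a sport, 45 play a sport AND lift weights.
p_sport = 120 / 200
p_sport_and_weights = 45 / 200
p_weights_given_sport = p_sport_and_weights / p_sport   # P(B|A) = P(A and B) / P(A)
print(round(p_weights_given_sport, 3))   # 0.375
```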
6649278638 | Two Events are Independent If... | Events A and B are independent if knowing that Event A has occurred (or has not occurred) doesn't change the probability that event B occurs. In symbols, P(B|A) = P(B), or equivalently P(A and B) = P(A)·P(B). | 42 | |
6649291649 | Two Events are Mutually Exclusive If... | Events A and B are mutually exclusive if they share no outcomes. | 43 | |
6649279716 | Interpreting Expected Value/Mean | If we were to repeat the chance process (context) many times, the average value of ____ (context) would be about ____. | 44 | |
6649315986 | Mean and Standard Deviation of a Discrete Random Variable | Mean (expected value): μX = Σ xi·pi = x1p1 + x2p2 + ... Standard deviation: σX = √( Σ (xi − μX)²·pi ), the square root of the variance. | 45 | |
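A short Python sketch of those formulas with a made-up probability model:

```python
import math

# Hypothetical model for X = number of pets.
values = [0, 1, 2, 3]
probs  = [0.3, 0.4, 0.2, 0.1]

mean = sum(x * p for x, p in zip(values, probs))                # mu_X
var  = sum((x - mean) ** 2 * p for x, p in zip(values, probs))  # sigma_X^2
sd   = math.sqrt(var)
print(round(mean, 2), round(sd, 3))   # 1.1 and about 0.943
```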
6649280723 | Mean and Standard Deviation of a Sum of Two Random Variables | Mean: μ(X+Y) = μX + μY. Standard deviation: if X and Y are independent, σ(X+Y) = √(σX² + σY²). Variances add for independent random variables; standard deviations do not. For a difference, μ(X−Y) = μX − μY and, if independent, σ(X−Y) = √(σX² + σY²) as well. | 46 | |
6649281638 | Binomial Setting and Random Variable | Binary? Each trial can be classified as success/failure Independent? Trials must be independent. Number? The number of trials (n) must be fixed in advance Success? The probability of success (p) must be the same for each trial. X = number of successes in n trials | 47 | |
6649326600 | Binomial Distribution (Calculator Usage) | P(X = k): binompdf(n, p, k). P(X ≤ k): binomcdf(n, p, k). P(X ≥ k): 1 − binomcdf(n, p, k − 1). If used on the AP® exam, make sure to label each input! | 48 | |
6649329285 | Mean and Standard Deviation of a Binomial RV | μX = np and σX = √( np(1 − p) ). | 49 | |
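A Python standard-library sketch tying the last two cards together (binompdf/binomcdf analogues plus the mean and SD), with assumed values n = 10 and p = 0.3:

```python
from math import comb, sqrt

n, p = 10, 0.3                          # assumed example values

def binom_pdf(k):                       # P(X = k), like binompdf(n, p, k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

pdf_3 = binom_pdf(3)                    # P(X = 3)
cdf_3 = sum(binom_pdf(k) for k in range(4))   # P(X <= 3), like binomcdf(n, p, 3)

mean = n * p                            # mu_X = np
sd   = sqrt(n * p * (1 - p))            # sigma_X = sqrt(np(1 - p))
print(round(pdf_3, 4), round(cdf_3, 4), mean, round(sd, 3))
```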
6649331792 | Geometric Setting and Random Variable | Arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. On each trial, the probability p of success must be the same. X = number of trials needed to achieve one success | 50 | |
6649331793 | Parameter vs. Statistic | A parameter measures a characteristic of a population, such as a population mean μ or population proportion p. A statistic measures a characteristic of a sample, such as a sample mean x̄ or sample proportion p̂. Statistics are used to estimate parameters. | 51 | |
6649333210 | What is a sampling distribution? | A sampling distribution is the distribution of a sample statistic in all possible samples of the same size. It describes the possible values of a statistic and how likely these values are. Contrast with the distribution of the population and the distribution of a sample. | 52 | |
6649334424 | What is the sampling distribution of p̂? | Center: μp̂ = p (p̂ is an unbiased estimator of p). Spread: σp̂ = √( p(1 − p)/n ), provided the 10% condition is met. Shape: approximately Normal when the Large Counts condition is met (np ≥ 10 and n(1 − p) ≥ 10). | 53 | |
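A quick simulation sketch that checks those facts, assuming p = 0.6 and samples of size n = 100 (numbers chosen only for illustration):

```python
import random
from statistics import mean, pstdev
from math import sqrt

random.seed(2)
p, n = 0.6, 100
phats = [sum(1 for _ in range(n) if random.random() < p) / n for _ in range(5000)]

print(round(mean(phats), 3))             # close to p = 0.6
print(round(pstdev(phats), 4))           # close to the theoretical SD below
print(round(sqrt(p * (1 - p) / n), 4))   # sqrt(p(1-p)/n) = 0.049
```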
6649341729 | What is the Central Limit Theorem (CLT)? | If the population distribution is not Normal, the sampling distribution of the sample mean x̄ will become more and more Normal as n increases. | 54 | |
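A small simulation sketch of the CLT idea: sample means from a strongly right-skewed population look more symmetric as n grows. The population model and sample sizes are assumptions for illustration only.

```python
import random
from statistics import mean

random.seed(3)

def sample_mean(n):
    # one sample mean from a strongly right-skewed population (mean 1)
    return mean(random.expovariate(1.0) for _ in range(n))

for n in (2, 30):
    means = [sample_mean(n) for _ in range(2000)]
    center = mean(means)
    prop_above = sum(m > center for m in means) / len(means)
    # a symmetric, Normal-looking distribution has about half its values
    # above its mean; this proportion moves toward 0.5 as n increases
    print(n, round(center, 2), round(prop_above, 3))
```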
6649342926 | Unbiased Estimator | A statistic is an unbiased estimator of a parameter if the mean of its sampling distribution equals the true value of the parameter being estimated. In other words, the sampling distribution of the statistic is centered in the right place. | 55 | |
6649358119 | 4-Step Process Confidence Intervals | STATE: What parameter do you want to estimate, and at what confidence level? PLAN: Choose the appropriate inference method. Check conditions. DO: If the conditions are met, perform calculations. CONCLUDE: Interpret your interval in the context of the problem. | 56 | |
6649360732 | Interpreting a Confidence Interval | I am ___% confident that the interval from ___ to ___ captures the true ____. | 57 | |
6649361243 | Interpreting a Confidence Level (The Meaning of 95% Confidence) | If many, many samples are selected and many, many confidence intervals are calculated, about __% of them will capture the true ____. | 58 | |
6649361244 | Standard Error vs. Margin of Error | The standard error of a statistic estimates how far the value of the statistic typically differs from the true value of the parameter. The margin of error estimates how far, at most, we expect the statistic to differ from the parameter. | 59 | |
6649362810 | What factors affect the Margin of Error? | The margin of error decreases when: -The sample size increases -The confidence level decreases | 60 | |
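A quick check of both effects using the margin-of-error formula for a proportion, z*·√(p̂(1 − p̂)/n), with the conservative p̂ = 0.5 (all numbers assumed):

```python
from math import sqrt

def margin_of_error(z_star, phat, n):
    return z_star * sqrt(phat * (1 - phat) / n)

print(round(margin_of_error(1.96, 0.5, 400), 4))    # 95% confidence, n = 400  -> 0.049
print(round(margin_of_error(1.96, 0.5, 1600), 4))   # larger n -> smaller ME   -> 0.0245
print(round(margin_of_error(1.645, 0.5, 400), 4))   # lower confidence (90%)   -> 0.0411
```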
6649362811 | Inference for Means (Conditions) | Random: Data from a random sample or randomized experiment 10%: The sample must be ≤ 10% of population Normal/Large Sample: Population distribution is Normal or sample size is large (n ≥ 30). If n < 30, graph sample data and verify no strong skewness or outliers. Include graph! | 61 | |
6649363939 | Inference for Proportions (Conditions) | Random: Data from a random sample or randomized experiment 10%: The sample must be ≤ 10% of the population Large Counts: At least 10 successes and 10 failures: np̂ ≥ 10 and n(1 − p̂) ≥ 10 (for a one-sample z test for p, use p₀: np₀ ≥ 10 and n(1 − p₀) ≥ 10) | 62 | |
6649364557 | Finding the Sample Size (For a given margin of error m) | For a proportion: solve z*·√( p̂(1 − p̂)/n ) ≤ m for n, using a guessed value of p̂ or the conservative guess p̂ = 0.5. For a mean: solve z*·σ/√n ≤ m for n, using an estimate of σ. Always round n up to the next whole number. | 63 | |
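A worked example of that calculation in Python, assuming 95% confidence (z* = 1.96), a desired margin of error of 0.03, and the conservative p̂ = 0.5:

```python
from math import ceil

z_star, m, phat = 1.96, 0.03, 0.5                   # assumed values
n = ceil((z_star / m) ** 2 * phat * (1 - phat))     # solve z*sqrt(phat(1-phat)/n) <= m
print(n)   # 1068: always round UP to guarantee the margin of error
```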
6649365107 | 4-Step Process Significance Tests | STATE: What hypotheses do you want to test, and at what significance level? Define any parameters you use. PLAN: Choose the appropriate inference method. Check conditions. DO: If the conditions are met, perform calculations. Compute the test statistic and find the P-value. CONCLUDE: Make a decision about the hypotheses in the context of the problem. | 64 | |
6649365108 | Explain a P-value | Assuming that the null hypothesis is true (context), there is a ___ probability of getting a statistic (context) as extreme as or more extreme than the one actually observed, by chance alone. | 65 | |
6649366391 | Carrying out a Two- Sided Test from a Confidence Interval | α = 1 - confidence level If the null hypothesis value is in the interval, then it is a plausible value that should not be rejected. If the null hypothesis value is not in the interval, then it is not a plausible value and should be rejected. | 66 | |
6649379186 | Type I Error & Type II Error | Type I Error: Finding convincing evidence that Ha is true when in reality Ha is not true. (Rejecting H0 when H0 is actually true). Type II Error: Not finding convincing evidence that Ha is true when in reality Ha is true. (Failing to reject H0 when Ha is true). | 67 | |
6649379187 | Power | Power: Probability of avoiding a Type II error = Probability of finding convincing evidence that Ha is true when in reality Ha is true. | 68 | |
6649380060 | Factors that Affect Power | Sample Size: To increase power, increase sample size. Significance Level α: A larger value of α increases power. Effect Size: The farther the true value is from the hypothesized value, the larger the power. Data Collection: Using blocking rather than a completely randomized design can increase power. | 69 | |
6649380819 | Paired t-test Identification Hints, H0 and Ha | Hints: one sample of differences from paired data (before/after measurements, two measurements on the same subject, or subjects matched in pairs). H0: μd = 0; Ha: μd > 0, μd < 0, or μd ≠ 0, where μd is the true mean difference. | 70 | |
6649381488 | Two Sample t-test Identification Hints, H0 and Ha | Hints: two separate, independent groups formed by random sampling or random assignment. H0: μ1 = μ2 (equivalently μ1 − μ2 = 0); Ha: μ1 > μ2, μ1 < μ2, or μ1 ≠ μ2. | 71 | |
6649382566 | Chi-Square Tests (Conditions) | Random: Data from a random sample(s) or randomized experiment 10%: The sample must be ≤ 10% of the population. Large Counts: All expected counts are at least 5. | 72 | |
6649382567 | Types of Chi-Square Tests | Goodness of Fit: Use to compare the distribution of a categorical variable in one population to a hypothesized distribution. Homogeneity: Use to compare the distribution of a categorical variable for 2+ populations or treatments. Independence: Use to test the association between two categorical variables in one population. | 73 | |
6649384081 | Chi-Square Tests df and Expected Counts | Goodness of Fit: df = (number of categories) − 1; expected count = n × (hypothesized proportion for that category). Homogeneity and Independence: df = (rows − 1)(columns − 1); expected count = (row total × column total) / table total. | 74 | |
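A short Python sketch computing expected counts, df, and the chi-square statistic for a made-up 2×3 two-way table:

```python
# Hypothetical observed counts (2 rows x 3 columns).
observed = [[30, 20, 10],
            [20, 30, 40]]

row_totals = [sum(row) for row in observed]          # [60, 90]
col_totals = [sum(col) for col in zip(*observed)]    # [50, 50, 50]
total = sum(row_totals)                              # 150

# expected count = (row total * column total) / table total
expected = [[r * c / total for c in col_totals] for r in row_totals]
df = (len(observed) - 1) * (len(observed[0]) - 1)    # (rows-1)(cols-1) = 2

chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
print(expected, df, round(chi_sq, 2))   # expected rows [20,20,20] and [30,30,30]; chi-sq about 16.67
```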
6649384909 | Inference for Regression (Conditions) | Linear: The association between the variables is linear. Check with a residual plot. Independent: Observations are independent; check the 10% condition if sampling without replacement. Normal: Responses vary Normally around the regression line for all x-values. Check with a graph of the residuals. Equal SD: The SD of the responses is the same around the regression line for all x-values. Check with a residual plot. Random: Data from a random sample or randomized experiment | 75 | |
6649385686 | Inference for Regression with Computer Output | The row for the explanatory variable gives the slope b and its standard error SEb. Test statistic: t = (b − β0)/SEb, usually with β0 = 0, and df = n − 2. Confidence interval for the slope: b ± t*·SEb with df = n − 2. The P-value shown in the output is for the two-sided test of H0: β = 0. | 76 | |