AP Statistics Flashcards

6648677145	Categorical vs. Quantitative Data	Data are categorical if they fall into groups or categories and data are quantitative if they take on numerical values where it makes sense to find an average. -Use bar graphs, pie graphs, or segmented bar charts for categorical variables such as color or gender. -Use dotplots, stemplots, histograms, or boxplots for quantitative variables such as age or weight.	0
6648685788	Marginal vs. Conditional Distributions	In a two-way table, marginal distributions consider only one variable and use the total row/column of the table only. Conditional distributions describe the distribution of one variable for a specific value of the other (one row/column inside the table).	1
6648690829	SOCS	Shape - Skewed Left, Skewed Right, Symmetric, Uniform, Unimodal, Bimodal Outliers - Discuss them if there are obvious ones Center - Mean or Median Spread - Range, IQR, or Standard Deviation Note: Also be on the lookout for gaps, clusters or other unusual features of the data set.	2
6648692544	Comparing Distributions	Address: Shape, Outliers, Center, Spread in context! YOU MUST USE comparison phrases like "is greater than" or "is less than" for Center & Spread	3
6648692545	Outlier Rule	Upper Cutoff = Q3 + 1.5(IQR) Lower Cutoff = Q1 - 1.5(IQR) IQR = Q3 - Q1	4
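A minimal Python sketch of the outlier rule above. The function names are hypothetical, and the quartiles are passed in rather than computed, because calculators and software use slightly different quartile conventions.

```python
def outlier_cutoffs(q1, q3):
    """Return (lower cutoff, upper cutoff) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """List the values that fall below the lower cutoff or above the upper cutoff."""
    lower, upper = outlier_cutoffs(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Made-up example: Q1 = 10 and Q3 = 20, so IQR = 10.
print(outlier_cutoffs(10, 20))
print(find_outliers([1, 12, 18, 40, -6], 10, 20))
```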
6648696164	Interpret Standard Deviation	Standard Deviation measures spread by giving the "typical" distance that the observations (context) are away from the mean (context).	5
6648696165	How does shape affect measures of center?	In general, Skewed Left (Mean < Median) Skewed Right (Mean > Median) Fairly Symmetric (Mean ≈ Median)	6
6648697986	Interpret a z-score	z = (value - mean) / standard deviation A z-score describes how many standard deviations a value falls away from the mean of the distribution and in what direction. The further the z-score is away from zero the more "surprising" the value of the statistic is.	7
6648697987	Percentiles	The kth percentile of a distribution is the point that has k% of the values less than that point. For example, a student who scores at the 90th percentile got a higher score than 90% of the other test takers.	8
6648700397	Linear Transformations	Adding "a" to every member of a data set adds "a" to the measures of position, but does not change the measures of spread or the shape. Multiplying every member of a data set by "b" multiplies the measures of position by "b" and multiplies most measures of spread by |b|, but does not change the shape.	9
6648770742	The Standard Normal Distribution	The standard Normal distribution is the Normal distribution with mean μ = 0 and standard deviation σ = 1. The Normal table displays values for the standard Normal distribution.	10
6648770743	Using Normalcdf and InvNorm (Calculator Tips)	Using boundaries to find area: Normalcdf (min, max, mean, SD) Using area to find boundary: Invnorm (area to the left as a decimal, mean, SD) If used on AP® exam, make sure to label each input!	11
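For checking calculator answers, the two commands above can be approximated with `statistics.NormalDist` from the Python standard library (Python 3.8+). The wrapper names `normalcdf` and `invnorm` are hypothetical and only mirror the calculator's argument order.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the Normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def invnorm(area_left, mean=0.0, sd=1.0):
    """Boundary value that has the given area to its left."""
    return NormalDist(mean, sd).inv_cdf(area_left)


print(round(normalcdf(-1, 1), 4))  # area within 1 SD of the mean
print(round(invnorm(0.975), 2))    # boundary with 97.5% of the area to its left
```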
6648772417	Describing an association in a scatterplot	Address the following, in context: Direction Outliers Form Strength	12
6648772418	Interpret r	Correlation measures the strength and direction of the linear relationship between x and y. • r is always between -1 and 1 • Close to zero = very weak • Close to 1 or -1 = strong • Exactly 1 or -1 = perfectly straight line • Positive r = positive correlation Negative r = negative correlation	13
6648774183	Interpret LSRL Slope "b"	For every one-unit increase in the x variable (context), the y variable (context) is predicted to increase/decrease by ____ units (context).	14
6648774184	Interpret LSRL y-intercept "a"	When the x variable (context) is zero, the y variable (context) is predicted to be ______.	15
6648776258	What is a Residual?	Residual = y − ŷ (Actual - Predicted) A residual measures the difference between the actual y value and the y value that is predicted by the LSRL.	16
6648776259	Interpreting a Residual Plot	If there is a leftover pattern in the residual plot, then the model used does not have the same form as the association (the model is not appropriate). If there is no leftover pattern in the residual plot, then the model is appropriate.	17
6648779945	Interpret LSRL "ŷ"	ŷ is the "estimated" or "predicted" y-value (context) for a given x-value (context).	18
6648785376	Extrapolation	Using a LSRL to predict outside the domain of the explanatory variable. (Can lead to ridiculous conclusions if the observed association does not continue)	19
6648799735	Interpret LSRL "s"	s = ___ is the standard deviation of the residuals. It measures the typical distance between the actual y values (context) and their predicted y values (context) in a regression setting	20
6648817223	Interpret r^2	___% of the variation in y (context) is accounted for by the LSRL of y (context) on x (context). Or ___% of the variation in y (context) is accounted for by using the linear regression model with x (context) as the explanatory variable.	21
6648821098	Outliers and Influential Points in Regression	Any point that falls outside the pattern of the association should be considered an outlier. A point is influential if it has a big effect on a calculation, such as the correlation or equation of the least-squares regression line. Points separated in the x-direction are often influential.	22
6648821099	Reading Computer Output for Regression	Using foot length (x) to predict height (y): y-intercept = 103.41 and slope = 2.7469; s = 7.95126 and r^2 = 0.486	23
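As a sketch of how the output above is used, the intercept and slope give the line ŷ = 103.41 + 2.7469x. The Python below plugs in a made-up foot length of 24 and a made-up actual height of 175 (neither value comes from the card) to compute a prediction and a residual.

```python
INTERCEPT = 103.41  # y-intercept "a" from the computer output
SLOPE = 2.7469      # slope "b" from the computer output


def predict_height(foot_length):
    """Predicted height (y-hat) for a given foot length x."""
    return INTERCEPT + SLOPE * foot_length


def residual(actual_height, foot_length):
    """Residual = actual y minus predicted y."""
    return actual_height - predict_height(foot_length)


print(round(predict_height(24), 4))
print(round(residual(175, 24), 4))
```

A positive residual means the line underpredicted the actual value; a negative residual means it overpredicted.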
6648822768	SRS	An SRS (simple random sample) is a sample taken in such a way that every set of n individuals has an equal chance to be the sample actually selected.	24
6648822769	Using a Random Digit Table to Select a Sample	Step 1: Label. Give each member of the population a numerical label with the same number of digits. Use as few digits as possible. Step 2: Randomize. Read consecutive groups of digits of the appropriate length from left to right across a line in the table. Ignore any group of digits that wasn't used as a label or that duplicates a label already in the sample. Stop when you have chosen n different labels. Your sample contains the individuals whose labels you find.	25
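The label-and-randomize procedure can be sketched in Python. This assumes labels run from 1 to the population size (01 to 30 in the example), and the line of digits is made up rather than taken from a real table. For population sizes such as 100, starting the labels at 0 instead would save a digit.

```python
def select_sample(digits, population_size, n):
    """Read fixed-width groups left to right, keeping valid, non-repeated labels."""
    digits = digits.replace(" ", "")
    width = len(str(population_size))  # every label has the same number of digits
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip groups that are not labels or that repeat an earlier choice.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen


# Hypothetical line of random digits; population of 30, sample of 4.
line = "19223 95034 05756 28713 96"
print(select_sample(line, 30, 4))
```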
6648826540	Sampling Techniques	1. SRS - Names in a hat 2. Stratified - Split the population into homogeneous groups, select an SRS from each group. 3. Cluster - Split the population into groups (often based on location) called clusters, and randomly select whole clusters for the sample. 4. Census - An attempt to reach the entire population 5. Convenience - Selects the individuals easiest to reach 6. Voluntary Response - People choose themselves by responding to a general appeal.	26
6648826541	Advantage of using a Stratified Random Sample Over an SRS	Stratified random sampling guarantees that each of the strata will be represented. When strata are chosen properly, a stratified random sample will produce better (less variable/more precise) information than an SRS of the same size.	27
6648828227	Bias	A sampling method is biased if it consistently produces estimates that are too small or consistently produces estimates that are too large.	28
6648829319	Experiment vs. Observational Study	A study is an experiment ONLY if researchers impose a treatment upon the experimental units. In an observational study researchers make no attempt to influence the results and cannot conclude cause-and-effect.	29
6648854863	Confounding	Two variables are confounded if it cannot be determined which variable is causing the change in the response variable. For example, if people who take vitamins on their own have less cancer, we cannot say for sure that the vitamins are causing the reduction in cancer. It could be other characteristics of vitamin takers, such as diet or exercise.	30
6648855947	Why use a control group?	A control group gives the researchers a comparison group to be used to evaluate the effectiveness of the treatment(s). (context) It allows the researchers to measure the effect of the treatment (context) compared to no treatment at all.	31
6648855948	Blinding	When the subjects in an experiment don't know which treatment they are receiving, they are blind. If the people interacting with the subjects and measuring the response variable don't know which subjects received which treatments, they are blind. If both groups are blind, the study is double-blind.	32
6648857370	Experimental Designs	CRD (Completely Randomized Design) - Units are allocated at random among all treatments. RBD (Randomized Block Design) - Units are put into homogeneous blocks and randomly assigned to treatments within each block. Matched Pairs - A form of blocking in which each subject receives both treatments in a random order or subjects are matched in pairs with one subject in each pair receiving each treatment, determined at random.	33
6648860450	Benefit of Blocking	Blocking helps account for the variability in the response variable (context) that is caused by the blocking variable (context). If there really is a difference in the effectiveness of the treatments, using an appropriate blocking variable will increase power (probability of finding convincing evidence that the treatments are not equally effective).	34
6648860451	Scope of Inference: Generalizing to a Larger Population	We can generalize the results of a study to a larger population if we used a random sample from that population.	35
6648861983	Scope of Inference: Cause-and-Effect	We can make a cause-and-effect conclusion if we randomly assign treatments to experimental units in an experiment. Otherwise, Association is NOT Causation!	36
6648861984	Interpreting Probability	The probability of an event is the proportion of times the event would occur in a very large number of repetitions. Probability is a long-term relative frequency.	37
6648863499	Law of Large Numbers	The Law of Large Numbers says that if we observe many repetitions of a chance process, the observed proportion of times that an event occurs approaches a single value, called the probability of that event.	38
6648863500	Conducting a simulation	State: Ask a question about some chance process. Plan: Describe how to use a random device to simulate one trial of the process and indicate what will be recorded at the end of each trial. Do: Do many trials. Conclude: Answer the question of interest.	39
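A short Python sketch of the State/Plan/Do/Conclude steps, using a made-up question: what is the probability of getting at least one 6 in four rolls of a fair die? The function names, the seed, and the number of trials are all illustrative choices.

```python
import random


def one_trial(rng):
    """Plan: one trial is four die rolls; record whether any roll is a 6."""
    rolls = [rng.randint(1, 6) for _ in range(4)]
    return 6 in rolls


def simulate(trials=10_000, seed=1):
    """Do: repeat many trials and return the proportion of successes."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = sum(one_trial(rng) for _ in range(trials))
    return hits / trials


# Conclude: the estimate should land near the exact value 1 - (5/6)**4.
print(simulate())
print(1 - (5 / 6) ** 4)
```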
6648883009	Complementary Events	Two mutually exclusive events whose union is the sample space. For example: -Rain / No Rain -Draw at least one heart / Draw NO hearts	40
6649278637	Conditional Probability	Probability that one event occurs given that another event is already known to have occurred. (on formula sheet)	41
6649278638	Two Events are Independent If...	Events A and B are independent if knowing that Event A has occurred (or has not occurred) doesn't change the probability that event B occurs.	42
6649291649	Two Events are Mutually Exclusive If...	Events A and B are mutually exclusive if they share no outcomes.	43
6649279716	Interpreting Expected Value/Mean	If we were to repeat the chance process (context) many times, the average value of ____ (context) would be about ____.	44
6649315986	Mean and Standard Deviation of a Discrete Random Variable		45
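For a discrete random variable X, the standard formulas are mean μ = Σ x·P(x) and standard deviation σ = sqrt(Σ (x − μ)²·P(x)). A minimal Python sketch, with hypothetical function names and a made-up distribution:

```python
from math import sqrt


def discrete_mean(values, probs):
    """Mean (expected value): sum of each value times its probability."""
    return sum(x * p for x, p in zip(values, probs))


def discrete_sd(values, probs):
    """Standard deviation: square root of the probability-weighted squared deviations."""
    mu = discrete_mean(values, probs)
    return sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))


# Made-up distribution: X is 0 or 1 with probability 0.5 each.
print(discrete_mean([0, 1], [0.5, 0.5]))
print(discrete_sd([0, 1], [0.5, 0.5]))
```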
6649280723	Mean and Standard Deviation of a Sum of Two Random Variables		46
6649281638	Binomial Setting and Random Variable	Binary? Each trial can be classified as success/failure Independent? Trials must be independent. Number? The number of trials (n) must be fixed in advance Success? The probability of success (p) must be the same for each trial. X = number of successes in n trials	47
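When the binomial conditions hold, P(X = k) = C(n, k)·p^k·(1 − p)^(n − k), with mean np and standard deviation sqrt(np(1 − p)). The Python sketch below uses hypothetical function names; `binom_pmf` and `binom_cdf` play roughly the role of the calculator's binompdf and binomcdf, but the argument order here is this sketch's own.

```python
from math import comb, sqrt


def binom_pmf(n, p, k):
    """P(X = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(n, p, k):
    """P(X <= k): probability of at most k successes in n trials."""
    return sum(binom_pmf(n, p, i) for i in range(k + 1))


def binom_mean(n, p):
    return n * p


def binom_sd(n, p):
    return sqrt(n * p * (1 - p))


# Made-up example: two tosses of a fair coin, X = number of heads.
print(binom_pmf(2, 0.5, 1))
print(binom_cdf(2, 0.5, 1))
```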
6649326600	Binomial Distribution (Calculator Usage)		48
6649329285	Mean and Standard Deviation Of a Binomial RV		49
6649331792	Geometric Setting and Random Variable	Arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. On each trial, the probability p of success must be the same. X = number of trials needed to achieve one success	50
6649331793	Parameter vs. Statistic	A parameter measures a characteristic of a population, such as a population mean μ or population proportion p. A statistic measures a characteristic of a sample, such as a sample mean x̄ or sample proportion p̂. Statistics are used to estimate parameters.	51
6649333210	What is a sampling distribution?	A sampling distribution is the distribution of a sample statistic in all possible samples of the same size. It describes the possible values of a statistic and how likely these values are. Contrast with the distribution of the population and the distribution of a sample.	52
6649334424	What is the sampling distribution of p̂?		53
6649341729	What is the Central Limit Theorem (CLT)?	If the population distribution is not Normal, the sampling distribution of the sample mean x̄ will become more and more Normal as n increases.	54
6649342926	Unbiased Estimator	A statistic is an unbiased estimator of a parameter if the mean of its sampling distribution equals the true value of the parameter being estimated. In other words, the sampling distribution of the statistic is centered in the right place.	55
6649358119	4-Step Process Confidence Intervals	STATE: What parameter do you want to estimate, and at what confidence level? PLAN: Choose the appropriate inference method. Check conditions. DO: If the conditions are met, perform calculations. CONCLUDE: Interpret your interval in the context of the problem.	56
6649360732	Interpreting a Confidence Interval	I am ___% confident that the interval from ___ to ___ captures the true ____.	57
6649361243	Interpreting a Confidence Level (The Meaning of 95% Confidence)	If many, many samples are selected and many, many confidence intervals are calculated, about __% of them will capture the true ____.	58
6649361244	Standard Error vs. Margin of Error	The standard error of a statistic estimates how far the value of the statistic typically differs from the true value of the parameter. The margin of error estimates how far we expect the parameter to differ from the statistic, at most.	59
6649362810	What factors affect the Margin of Error?	The margin of error decreases when: -The sample size increases -The confidence level decreases	60
6649362811	Inference for Means (Conditions)	Random: Data from a random sample or randomized experiment 10%: The sample must be ≤ 10% of population Normal/Large Sample: Population distribution is Normal or sample size is large (n ≥ 30). If n < 30, graph sample data and verify no strong skewness or outliers. Include graph!	61
6649363939	Inference for Proportions (Conditions)	Random: Data from a random sample or randomized experiment 10%: The sample must be ≤ 10% of population Large Counts: At least 10 successes and failures: np̂ ≥ 10 and n(1 − p̂) ≥ 10 (for a 1-sample z test for p: np0 ≥ 10 and n(1 − p0) ≥ 10)	62
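A small Python sketch of the Large Counts and 10% checks. The function names are hypothetical; pass p̂ when building a confidence interval and the hypothesized p0 when running a test.

```python
def large_counts_met(n, p, minimum=10):
    """True when both the expected successes and expected failures reach the minimum."""
    return n * p >= minimum and n * (1 - p) >= minimum


def ten_percent_met(n, population_size):
    """True when the sample is at most 10% of the population."""
    return n <= 0.10 * population_size


# Made-up examples.
print(large_counts_met(200, 0.5))  # 100 successes and 100 failures
print(large_counts_met(40, 0.1))   # only about 4 expected successes
print(ten_percent_met(200, 5000))
```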
6649364557	Finding the Sample Size (For a given margin of error m)		63
6649365107	4-Step Process Significance Tests	STATE: What hypotheses do you want to test, and at what significance level? Define any parameters you use. PLAN: Choose the appropriate inference method. Check conditions. DO: If the conditions are met, perform calculations. Compute the test statistic and find the P-value. CONCLUDE: Make a decision about the hypotheses in the context of the problem.	64
6649365108	Explain a P-value	Assuming that the null is true (context) there is a ___ probability of observing a statistic (context) as large as or larger than the one actually observed by chance alone.	65
6649366391	Carrying out a Two-Sided Test from a Confidence Interval	α = 1 - confidence level If the null hypothesis value is in the interval, then it is a plausible value that should not be rejected. If the null hypothesis value is not in the interval, then it is not a plausible value and should be rejected.	66
6649379186	Type I Error & Type II Error	Type I Error: Finding convincing evidence that Ha is true when in reality Ha is not true. (Rejecting H0 when H0 is actually true). Type II Error: Not finding convincing evidence that Ha is true when in reality Ha is true. (Failing to reject H0 when Ha is true).	67
6649379187	Power	Power: Probability of avoiding a Type II error = Probability of finding convincing evidence that Ha is true when in reality Ha is true.	68
6649380060	Factors that Affect Power	Sample Size: To increase power, increase sample size. Significance Level α: A larger value of α increases power. Effect Size: The farther the true value is from the hypothesized value, the larger the power. Data Collection: Using blocking rather than a completely randomized design can increase power.	69
6649380819	Paired t-test Identification Hints, H0 and Ha		70
6649381488	Two Sample t-test Identification Hints, H0 and Ha		71
6649382566	Chi-Square Tests (Conditions)	Random: Data from a random sample(s) or randomized experiment 10%: The sample must be ≤ 10% of the population. Large Counts: All expected counts are at least 5.	72
6649382567	Types of Chi-Square Tests	Goodness of Fit: Use to compare the distribution of a categorical variable in one population to a hypothesized distribution. Homogeneity: Use to compare the distribution of a categorical variable for 2+ populations or treatments. Independence: Use to test the association between two categorical variables in one population.	73
6649384081	Chi-Square Tests df and Expected Counts		74
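For a two-way table, the usual formulas are expected count = (row total × column total) / table total and df = (rows − 1)(columns − 1). For a goodness-of-fit test, df = number of categories − 1. A minimal Python sketch, with hypothetical function names and made-up observed counts:

```python
def expected_counts(table):
    """Expected counts for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def two_way_df(table):
    """Degrees of freedom for a test of homogeneity or independence."""
    return (len(table) - 1) * (len(table[0]) - 1)


def goodness_of_fit_df(num_categories):
    """Degrees of freedom for a goodness-of-fit test."""
    return num_categories - 1


observed = [[10, 20], [30, 40]]  # made-up counts
print(expected_counts(observed))
print(two_way_df(observed))
```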
6649384909	Inference for Regression (Conditions)	Linear: The association between the variables is linear. Check with a residual plot. Independent: Observations are independent; check the 10% condition if sampling without replacement. Normal: Responses vary Normally around the regression line for all x-values. Check with a graph of the residuals. Equal SD: Responses have the same standard deviation around the regression line for all x-values. Check with a residual plot. Random: Data come from a random sample or randomized experiment.	75
6649385686	Inference for Regression with Computer Output		76
