AP Statistics formulas, vocab, conditions, etc. Flashcards
1427272854 | Cases | Subjects or objects of statistical examination | 0 | |
1427272855 | Variables | Characteristics of a case | 1 | |
1427272856 | Steps of a Simulation | 1. Model - set up a model in which chance is the only cause of being selected 2. Repetition 3. Distribution - display the distribution of data 4. Conclusion | 2 | |
1427272857 | Uniform/Rectangular Distributions | A distribution in which all values occur equally often | 3 | |
1427272858 | Normal Distributions (basic appearance) | -Bell curve -One peak (mode) -Use MEAN to describe center -Use standard deviation SD to describe spread | 4 | |
1427272859 | Skewed distributions | -Bunching at one end, tail at other -Skewed left/right depending on which way the tail stretches -Use 5-number summary to indicate center/spread | 5 | |
1427272860 | Bimodal distributions | -2 peaks -Summarize by locating 2 peaks -Even better if you can find a variable leading to the 2 peaks | 6 | |
1427272861 | Outlier | A value that stands apart from the rest of the data; by the 1.5 x IQR rule, a value more than 1.5 times the IQR below Q1 or above Q3 | 7 | |
1427272862 | Quantitative variables | How many/how much | 8 | |
1427272863 | Categorical Variables | a variable that groups cases into categories | 9 | |
1427272864 | Histograms | -Divide number line into intervals called bins -Over each bin, construct a bar that has a height equal to the number of cases in that bin | 10 | |
1427272865 | When Histograms Work Best | -Large number of values to plot -Don't care about individual values -Want general shape of distribution -Only one distribution or a small number of distributions to compare | 11 | |
1427272866 | Relative Frequency Histogram | Shows proportions instead of counts | 12 | |
1427272867 | Stemplot/Stem-and-leaf plots | -Numbers on the left side of a line are the stems (the tens digits), and numbers on the right are the leaves -If there are more than 2 digits, the others are truncated or rounded | 13 | |
1427272868 | When Stemplots Work Best | -Single quantitative variable -Small number of values -See individual values exactly -See shape of distribution clearly -2 or more groups to compare | 14 | |
1427272869 | When Dot Plots Work Best | -Small number of values -See individual values -See shape of distribution -One group or a small number of groups to compare | 15 | |
1427272870 | Bar Charts | -Like a histogram but for categorical variables | 16 | |
1427272871 | Mean | x-bar, "average", add up all the values of x and divide by the number of values, n | 17 | |
1427272872 | Median | Divides the data into halves - the middle value | 18 | |
1427272873 | IQR, interquartile range | Q3-Q1, measure of spread | 19 | |
1427272874 | 5-number summary | Minimum, Q1, median, Q3, maximum | 20 | |
1427272875 | Boxplot | A graphical display of the 5-number summary | 21 | |
1427272876 | Deviations | Difference from the mean (X minus X-bar) | 22 | |
1427272877 | Variance | Square of the standard deviation | 23 | |
1427272878 | Recentering | -Adding the same number c to all values in the set -Shape and spread (including standard deviation) stay the same, but the distribution slides by c -Adds c to the median and mean | 24 | |
1427272879 | Rescaling a data set | -Same basic shape -Stretches or shrinks distribution -Multiplies spread by d and center by d | 25 | |
1427272880 | Resistant to outliers | Summary statistic does not change very much when an outlier is removed from the data | 26 | |
1427272881 | Sensitive to outliers | Summary statistic changes noticeably when outliers are removed | 27 | |
1427272882 | Percentiles | A value is at the kth percentile if k% of all values are less than or equal to it | 28 | |
1427272883 | Cumulative percentage plot/cumulative relative frequency plot | Displays values on the x-axis and percentiles on the y-axis | 29 | |
1427272884 | Standard Normal Distribution | Normal distribution with mean 0 and standard deviation 1, x-axis variable is the z-score | 30 | |
1427272885 | Standardizing | Recenter by subtracting mean, rescale by dividing by standard deviation | 31 | |
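1427272886 | Using Normalcdf | normalcdf(lower bound, upper bound, mean, standard deviation), or enter z-scores as the bounds and leave out the mean and standard deviation - gives the proportion of the area under the normal curve between those bounds | 32 | |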
1427272887 | Central Intervals for normal distribution | -68% within 1 standard deviation of the mean -90% within 1.645 -95% within 1.96 -99.7% (almost all) within 3 | 33 | |
1427272888 | Describing the pattern of a scatterplot | -Identify cases and variables -Describe shape (linearity, clusters, outliers) -Trend (positive/negative) -Strength (strong? weak?) -Does it vary in strength? Constant strength? -Generalize to other cases? -Explanations? Lurking variables? | 34 | |
1427272889 | Properties of least squares regression line | -Sum and mean of residuals are 0 -Contains the point of averages (x-bar, y-bar) -The standard deviation of the residuals is smaller than for any other line that goes through the point (x-bar, y-bar) -Slope b1 = r(sy/sx) | 35 | |
1427272890 | Lurking Variable | A variable that you didn't include in your analysis but that might explain the relationship between the variables you did include; a reminder that correlation does not imply causation | 36 | |
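1427272891 | Regression towards the mean | On a scatterplot, the difference between the regression line and the major axis of the elliptical cloud: the regression line is less steep than the major axis, so predicted y-values lie closer to the mean | 37 | |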
1427272892 | Potentially influential points | To judge if an outlier is potentially influential, compare the regression equation and correlation with and without the point | 38 | |
1427272893 | Exponential growth and decay | If you have a curved scatterplot, try replacing y with log y. An exponential model y = ab^x becomes linear: log y = log a + (log b)x | 39 | |
1427272894 | Power functions | If you have a curved scatterplot, also try a log-log transformation (replace x with log x and y with log y). A power model y = ax^b becomes linear: log y = log a + b log x | 40 |
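
Several of the cards above describe calculations (the 5-number summary and the 1.5 x IQR outlier rule, standardizing, normalcdf, and the least squares slope b1 = r(sy/sx)). The following is a minimal Python sketch of those calculations, not part of the original card set. All function names are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data, which is one common classroom convention; other software may use a different quartile rule.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the number of values is odd.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]  # skip the middle value when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def outliers(values):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


def z_score(x, mean, sd):
    """Standardize: recenter by subtracting the mean, rescale by dividing by the SD."""
    return (x - mean) / sd


def normal_cdf(lower, upper, mean=0.0, sd=1.0):
    """Proportion of area under a normal curve between two bounds."""
    dist = statistics.NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def least_squares_line(xs, ys):
    """Return (intercept b0, slope b1, correlation r), using b1 = r * (sy / sx)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx
    b0 = y_bar - b1 * x_bar  # the line passes through (x-bar, y-bar)
    return b0, b1, r
```

For example, `outliers([1, 2, 3, 4, 5, 6, 7, 8, 50])` flags only 50, and `normal_cdf(-1.96, 1.96)` comes out close to 0.95, matching the central-interval card.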