Statistics I Midterm Flashcards

939071271	Categorical Variables	Consist of Nominal and Ordinal Variables
939071272	Continuous Variables	Consist of Interval Scale and Ratio Scale
939071273	Nominal Variable	A type of categorical variable that has 2 levels (binary, like military vs. civilian) or 3+ levels (branches of the military)
939071274	Ordinal Variable	A type of categorical variable where categories have logical order (e.g. ranks in the navy)
939071275	Interval Scale	A type of continuous variable in which there are equal distances between intervals (e.g. questionnaire ratings)
939071276	Ratio Scale	A type of continuous variable in which there is an absolute zero point (e.g. height)
939116406	Covariance	A measure of the degree of relationship between 2 variables (week 6, slide 60). H₀: Covariance in the population = 0 COVxy = (∑[X-Xbar][Y-Ybar])/(N-1)
939116407	Correlation (r)	A measure of the degree of relationship between two variables (week 6, slide 60), effect size measure. H₀:ρ = 0 (between -1 and +1) r = COVxy/SxSy
939116408	Regression Coefficient (b)	Slope of the regression line (week 6, slide 60). Change in y for a 1 unit change in x. Line passes through (Xbar, Ybar) and (0, a). When r = 0, b = 0. Beta coefficient = regression coefficient when x and y standardized. H₀: b* = 0 b = COVxy/Sx²
939116409	Intercept (a)	Predicted y when x = 0 (week 6, slide 60). H₀: a* = 0 a = Ybar - (b)Xbar A value often not of interest
939116410	R Square (r²)	% of variability in the dependent variable (DV) that is accounted for by variability in the predictor variable (week 6, slide 60). H₀: ρ² = 0. r² = SSŷ/SSy (effect size measure). Adjusted r² is an unbiased estimator of ρ².
939116411	Standard Error of the Estimate	*Do not need to know formula* (week 6, slide 60) Measure of: -Degree to which points diverge from the regression line -Accuracy of prediction -Square root of error variance. If the standard error of the estimate is 0 = no errors of prediction/no residuals. If the standard error of the estimate is LARGE = residuals are large.
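The formulas on the covariance, correlation, slope, and intercept cards can be checked numerically. Below is a minimal sketch in Python using only the standard library; the data values and the helper name `regression_summary` are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt

def regression_summary(x, y):
    """Return COVxy, r, b, a, and r squared for paired scores x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # COVxy = sum((X - Xbar)(Y - Ybar)) / (N - 1)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Sample variances use the same N - 1 denominator
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    r = cov_xy / (sqrt(var_x) * sqrt(var_y))  # r = COVxy / (Sx * Sy)
    b = cov_xy / var_x                        # b = COVxy / Sx^2
    a = y_bar - b * x_bar                     # a = Ybar - b * Xbar
    return {"cov": cov_xy, "r": r, "b": b, "a": a, "r2": r ** 2}

# Hypothetical paired scores (not from the course)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
summary = regression_summary(x, y)
print(summary)
```

With these made-up scores the slope is 0.6 and the intercept is 2.2, so the fitted line passes through (Xbar, Ybar) = (3, 4) and (0, a) = (0, 2.2), as the regression coefficient card states.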
939116412	1-Sample Z-Test	*Do not need to know formula* Compare 1 sample mean to known population mean µ₂ (also when population SD, σ, is known) (week 5, slide 4). H₀: µ₁ = µ₂ Example: Test if ACBC scores of 15 hospitalized children are different from population mean of 50 (σ = 10).
939116413	1-sample (student) t-test	*Do not need to know formula* Compare 1 sample mean to known population mean (week 5, slide 4). Population mean µ₂ (but not σ) of comparison mean known. H₀: µ₁ = µ₂ Example: Test if PDI scores of 56 LBW infants are different from population mean of 100. Assumption: Score is normally distributed in the population
939116414	Paired sample (student) t-test (WITHIN subject t-test)	*Do not need to know formula* Compare 2 sample means from same subjects. Population parameters are NOT KNOWN. H₀: µd = 0 (µ₁-µ₂ = 0) Example: Test of change in weight (difference scores) in 17 anorexics from pre- to post-family therapy. Assumption: Difference score is normally distributed in population.
939116415	Independent-samples (Student) t-test (BETWEEN subject t-test)	*Do not need to know formula* Compare 2 sample means from different subjects. Population parameters are NOT KNOWN. H₀: µ₁ = µ₂ Example: Test if Caucasians in stereotype threat condition do worse than Caucasians in control condition on math problem. Notes: You need to use pooled variances for unequal sample sizes. Assumptions: Normality, σ1² = σ2². If σ1² ≠ σ2², use the Satterthwaite t' test.
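The cards say the formulas are not required, but seeing what "pooled variances" means in practice can help. The sketch below uses the standard textbook pooled-variance t statistic, which is not given on the cards; the function name `pooled_t` and the two small groups of scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Independent-samples t statistic using a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    s1_sq = variance(group1)  # sample variance, N - 1 denominator
    s2_sq = variance(group2)
    # Weight each group's variance by its degrees of freedom
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se_diff
    df = n1 + n2 - 2
    return t, df

# Hypothetical math scores with unequal group sizes
threat = [4, 6, 5, 7]
control = [8, 9, 7, 10, 11]
t, df = pooled_t(threat, control)
print(round(t, 3), df)
```

The pooled estimate assumes σ1² = σ2²; when that assumption is doubtful, the card's advice to switch to the Satterthwaite t' test applies instead.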
939134801	Independent Variable	What is manipulated by the experimenter, predictor variable or explanatory variables, IV
939134802	Dependent variable	What is measured, outcome variable or criterion variable, DV
939134803	Random Assignment to Conditions	(Week 1, slide 41) Is important because it ensures high internal validity. -Strength of study design -Ability to draw causal Inferences
939134804	Internal Validity	(Week 1, slide 37) A measure of how well a research study has been designed. High: We can draw strong causal inferences. Low: We cannot draw strong causal inferences.
939134805	Random SAMPLE/SELECTION of Participants	(Week 1, slide 41) It is important because it ensures high external validity. -Whether study sample/s reflect population under investigation -Ability to state if results apply to population of interest
939134806	External Validity	(Week 1, slide 40) Whether sample/s reflects population. High: Sample representative of population - results likely generalize to population Low: Sample not representative of population - results may not generalize to (unsampled populations)
939134807	Parameter	e.g. Mean favorable ratings in a POPULATION
939134808	Statistics ("guesses")	e.g. Mean favorable ratings in SAMPLE/S
939134809	Population	(Week 1, slide 44) Population characteristics = Parameters (Normally) Invisible to investigator Denoted by Greek letters (e.g.): µ (mu) = Population mean σ (sigma) = Population standard deviation ρ (rho) = correlation in the population
939134810	Sample	Sample Characteristics = Statistics Visible to investigator Denoted by Roman letters (e.g.): M or Xbar = Sample mean SD or s = Sample standard deviation r = correlation in a sample
939134811	Inferential Statistics	Making inferences about POPULATION parameters
939134812	Descriptive statistics	Describing the SAMPLE/S, no reference to population parameters
939273963	Normal Distribution	Looks symmetrical, normal, unimodal
939273964	Bimodal Distribution	Has 2 humps
939273965	Negatively Skewed	Most of the data is on the right side (highest point) and very little to no data on the left side, i.e. tail points in the negative direction. Mean < Median < Mode
939273966	Positively Skewed	Most of the data is on the left side (highest point) and very little to no data on the right side, i.e. tail points in the positive direction. Mean > Median > Mode
939273967	Platykurtic/Negative Kurtosis	Kind of flat on the top, no one peak, flattest
939273968	Leptokurtic/Positive Kurtosis	Pointier than normal, peaky, a few points, pointiest
939273969	Mesokurtic (Normal)	In between Platykurtic and Leptokurtic, a normal peak
939273970	Example of Outliers	Reaction time data
939273971	Mode	Measure of central tendency, represents most common score
939273972	Median	Measure of central tendency, represents the middle number (N is odd) or the average of two middle numbers (N is even). MEDIAN LOCATION = (N+1)/2 Unbiased, resistant estimator.
939273973	Mean	Measure of central tendency, average score Unbiased, sufficient, and efficient estimator.
939273974	Range	Measure of variability or dispersion, it is the distance from the lowest to the highest score
939273975	Variance	Measure of variability or dispersion; it is the standard deviation squared (sx²): the sum of the squared differences of X from Xbar, divided by (N-1). Doesn't have a natural interpretation because it is in squared units. Greater than the standard deviation whenever the SD is greater than 1, and less than the sum of squares whenever N > 2.
939273976	Standard Deviation	Measure of variability or dispersion; it is the square root of the variance (sx): the square root of (the sum of the squared differences of X from Xbar divided by (N-1)). Roughly the average deviation from the mean. Less than the variance (and the sum of squares) whenever the SD is greater than 1.
939273977	Skew	(Mean-Median)/SD Measure of asymmetry of distribution Positive = right-tailed Negative = left-tailed
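The central tendency, variability, and skew cards can be tried out on a small data set. This is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical and include one high outlier so that the mean is pulled above the median.

```python
from statistics import mean, median, stdev, variance

# Hypothetical scores with one high outlier (think of a slow reaction time)
scores = [2, 3, 3, 4, 5, 6, 19]

n = len(scores)
median_location = (n + 1) / 2   # position of the median in the sorted scores
m = mean(scores)                # pulled upward by the outlier
md = median(scores)             # resistant to the outlier
var = variance(scores)          # sum of squared deviations / (N - 1)
sd = stdev(scores)              # square root of the variance
score_range = max(scores) - min(scores)
skew = (m - md) / sd            # the card's formula: (Mean - Median) / SD

print(median_location, m, md, score_range, round(skew, 3))
```

Here the mean (6) is greater than the median (4), so the skew value is positive, which matches the right-tailed shape produced by the outlier.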
939273978	95% Confidence Interval	An interval computed from sample statistics. We don't know what the true (population) mean is; the interval says we have 95% confidence that the population mean lies between the lower limit x and the upper limit y.
939273979	Properties of Estimator of Population Parameters	Sufficiency, Unbiasedness, Efficiency, Resistance
939273980	Why is the mean the predominant measure of central tendency?	The mean is defined by an equation. It is influenced by outliers, so it does poorly on resistance. It is sufficient because every score takes part in computing it. It is efficient because sample means cluster tightly around the population mean: the SD of the sampling distribution of the mean is smaller than that of the median. It is unbiased (as is the median): the sample mean is a good estimate of the population mean. If you draw samples of n = 5 10,000 times, the distribution of those means will be approximately normal and the mean of those means will be very close to the population mean. The standard deviation of that distribution of sample means is the standard error.
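The "sample n = 5, 10,000 times" thought experiment on this card can be run directly. The following is a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 100 and SD of 15, and the seed is fixed only so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters and sampling design
MU, SIGMA, N, REPS = 100, 15, 5, 10_000

# Draw REPS samples of size N and keep each sample's mean
sample_means = [
    mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

mean_of_means = mean(sample_means)    # should land close to MU (unbiasedness)
sd_of_means = stdev(sample_means)     # empirical standard error
theoretical_se = SIGMA / sqrt(N)      # sigma / sqrt(n), about 6.71

print(round(mean_of_means, 2), round(sd_of_means, 2), round(theoretical_se, 2))
```

The mean of the sample means sits near the population mean, and the spread of the sample means is close to σ/√n, which is the quantity usually called the standard error of the mean.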
939273981	Sufficiency	Makes use of all data
939273982	Unbiasedness	Expected value = population parameter
939273983	Efficiency	Samples cluster tightly around parameter
939273984	Resistance	Not influenced by outliers.
939273985	Outlier	Week 1, Slide 140 Often observations > ± SDs from the mean. May reflect processes not under investigation.
939273986	Kurtosis	Week 1, slide 140 Measure of "peakedness" of distribution. Positive = pointy, negative = flat
939273987	Trimodal	3 peaks in a distribution
939581300	Mean = Median	0 skew
939581301	Mean > Median	Positive Skew
939581302	Normal Distribution / "Bell-Shaped Curve"	Unimodal (1 peak), Symmetrical (skew = 0), Mesokurtic (kurtosis = 0), Mathematically defined (do not need to know) Gaussian distribution. There is an infinite number of normal distributions corresponding to different values of µ and σ.
939581303	Standard Normal Distribution	µ = 0, σ = 1
939581304	Standard Scores	(Week 2, slide 43) = z scores = z values. Indicates how many standard deviations an observation is above or below the mean. The unit of measurement of the z-score is the standard deviation. Z score of +1 = score 1 SD above the mean Z score of +0.5 = score half a standard deviation above the mean Z score of -1 = score 1 SD below the mean Z score of -0.5 = score half a standard deviation below the mean Z score for population data = z = (X-µ)/σ Z score for sample data = z = (X-Xbar)/sx
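The sample z-score formula, z = (X - Xbar)/sx, takes only a few lines to apply. This is a minimal sketch; the helper name `z_scores` and the three scores are hypothetical.

```python
from statistics import mean, stdev

def z_scores(scores):
    """Convert raw scores to z scores using the sample mean and sample SD."""
    x_bar = mean(scores)
    s_x = stdev(scores)  # N - 1 denominator
    return [(x - x_bar) / s_x for x in scores]

# Hypothetical scores with mean 50 and sample SD 10
scores = [40, 50, 60]
z = z_scores(scores)
print(z)
```

A score of 40 is 1 SD below the mean (z = -1), 50 is at the mean (z = 0), and 60 is 1 SD above the mean (z = +1).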
939648546	Why is the normal distribution so important?	-Many variables appear normally distributed -If variable normal, can make many inferences about values of variable -Many statistical procedures assume scores are normally distributed in the population
939648547	Tests for normality	-Eyeballing -Quantile-Quantile Plots (normal sample will have a close to straight line, y=x, non-normal sample will have large deviations from a straight line) -Kolmogorov-Smirnov Test (If significance is greater than 0.05, then we can assume normality) If Normal --> Use "parametric test" If NOT normal --> Use "distribution-free" test, use transformation
939648548	T-statistics and the null hypothesis	The t-value is distributed around 0 when the null hypothesis is true. When null is true, it is unlikely to get a t value much bigger or smaller than 0.
939648549	Steps in Hypothesis Testing	-State Alternative/Research Hypothesis -State Null Hypothesis -Collect data -Construct/consult sampling distribution of a particular statistic on the assumption that H0 is true Compare obtained sample statistic to distribution above Decision: Reject H0 or Do not reject H0 based on the probability of observing a sample statistic at least as extreme as the one obtained if the null hypothesis were true (p value)
939648550	Ronald Fisher	In hypothesis testing -Sampling distributions -Hypothesis Testing & p value -Design of experiments -ANOVA
939648551	Karl Pearson	In hypothesis testing -Pearson's Correlation Coefficient (r) -Pearson's Chi Square Statistic
939648552	P value	(Week 2, slide 90) "The probability of obtaining a pattern of data at least as extreme as the one that was actually observed, given that the null hypothesis is true" -Probability -Varies between 0 and 1 -Appears in virtually every empirical study in science -Conditional probability: p(D\|H0) -Widely misunderstood -NOT the probability that the null hypothesis is true (it is NOT p(H0\|D))
939648553	If we reject H0...	We accept the alternative hypothesis (H1) (μ1 ≠ μ2)
939648554	If we Do not reject H0	Fisher - Suspend judgment
939648555	H0 is true, Reject H0	(1) Type 1 error, α (alpha)
939648556	H0 is false, Reject H0	(2) Correct Decision, 1 - β Power = 1 - β
939648557	H0 is true, Do not reject H0	(3) Correct Decision, 1 - α
939648558	H0 is false, Do not reject H0	(4) Type II Error, β
939648559	α (alpha)	Set by the experimenter (normally to 0.05) before data collection, probability of a type I error. Probability of (incorrectly) rejecting H₀ given that H₀ is true (conditional probability).
939648560	How do you find the number of significant findings when the null is true?	Multiply the number of trials/simulations by α (the significance level) to get the approximate number of significant findings.
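A short simulation shows why the count of significant findings under a true null is roughly the number of trials times the significance threshold. This sketch assumes a 2-tailed 1-sample z-test with α = 0.05 and borrows µ = 50, σ = 10, and n = 15 from the z-test card; the number of trials and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

ALPHA, TRIALS, N = 0.05, 10_000, 15
MU, SIGMA = 50, 10  # H0 is true: every sample really comes from this population

std_normal = NormalDist()  # standard normal: mu = 0, sigma = 1
significant = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU) / (SIGMA / sqrt(N))
    p = 2 * (1 - std_normal.cdf(abs(z)))  # two-tailed p value
    if p < ALPHA:
        significant += 1  # a Type I error, since H0 is true

expected = ALPHA * TRIALS  # 0.05 * 10,000 = 500
print(significant, expected)
```

Every "significant" result in this simulation is a Type I error, and the observed count should fall close to the expected 500.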
939648561	β	A function of a particular alternative hypothesis, the probability of a type II error. The probability of (incorrectly) failing to reject H₀ when a particular alternative is true (and H₀ is false) (conditional probability).
939648562	Power	Probability of correctly rejecting a false H₀ when a particular alternative hypothesis is true. Power = 1 - β, where β is the probability of a Type II error.
939648563	In "real" experiments, i.e. when the true state of the world is not known...	If we reject H₀ we do not know if we have made a Type I error or a correct decision.
939648564	1-Tailed Tests	Determined by the experimenter prior to data collection. Strong directional hypothesis (e.g. µ1 > µ2): reject H₀ only if the difference is in one particular direction. If there is a strong directional hypothesis and no reason to suspect the effect can go in the opposite direction, then a 1-tailed test can be used. Divide the 2-tailed p value by 2 to get the 1-tailed p value (when the effect is in the predicted direction).
939648565	2-Tailed Tests	Reject H₀ if difference goes in either direction. Determined by the experimenter prior to data collection. It is preferred. Ordinarily from SPSS. Represents double the 1-tailed value.
