Statistics Flashcards
Terms
4790636786 | Arcsine Transform | When the data are proportions it is usually recommended that they be transformed with the arcsine transform. This takes the original data x and converts it to the transformed data y using this formula. (Jk08-28) | 0 | |
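A minimal sketch of the arcsine transform for proportion data, assuming the usual form y = arcsin(sqrt(x)); the proportion values below are invented for illustration.

    import numpy as np

    proportions = np.array([0.02, 0.10, 0.50, 0.90, 0.98])  # hypothetical proportion data
    y = np.arcsin(np.sqrt(proportions))                     # arcsine (angular) transform
    print(y)                                                # transformed values, less compressed near 0 and 1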
4790636787 | Average Deviation | The mean absolute deviation, which measures the absolute difference between the mean and each observation. This measure of deviation is not as well defined as the standard deviation, partly because the mean is the least squares estimator of central tendency, so a measure of deviation that uses squared deviations is more comparable to the mean. (Jk08-718-734) | 1 | |
4790636788 | Bimodal Frequency Distribution | A combination of two normal distributions --there are two peaks. If you find that your data fall into this distribution you might consider whether the data actually represent two separate populations of measurements. (Jk08-481) | 2 | |
4790636789 | Central Tendency | In statistics, a __________________ (or more commonly, a measure of ____________________) is a central value or a typical value for a probability distribution. It is occasionally called an average or just the center of the distribution. (Jk08-22) | 3 | |
4790636790 | Data Reduction | Summarize trends, capture the common aspects of a set of observations such as the average, standard deviation, and correlation among variables. (Jk08-304) In data reduction, we can describe the whole frequency distribution with just two numbers --the mean and the standard deviation. (Jk08-514) | 4 | |
4790636791 | Degree of Freedom | In trying to measure variance we have to keep in mind that our estimate of the central tendency, x-bar, is probably wrong to a certain extent. We take this into account by giving up a "degree of freedom" in the sample formula. Degrees of freedom is a measure of how much precision an estimate of variation has. (Jk08-734) | 5 | |
4790636792 | x-bar | The sample mean: x-bar = (Σ x_i) / n --sum the data values and divide by the number of values in the sample. | 6 | |
4790636793 | Dispersion | We usually also want to know how closely clustered the data are around the central point or most typical value in the data. That is, how dispersed are the data values away from the center of the distribution? The minimum possible amount of dispersion is the case in which every measurement has the same value. (Jk08-718) | 7 | |
4790636794 | Frequency Distribution | An arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of values in the sample. (Wikipedia) | 8 | |
4790636795 | Histogram | A graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. (Wikipedia) | 9 | |
4790636796 | Inference | Generalize from a representative set of observations to a large universe of possible observations using hypothesis tests such as the t-test or analysis of variance. (Jk08-210) The normal distribution provides a basis for drawing inferences about the accuracy of our statistical estimates. (Jk08:515) | 10 | |
4790636797 | Interval | This is a property that is measured on a scale that does not have a true zero value. In an interval scale, the magnitude of differences between adjacent observations can be determined (unlike the adjacent items on an ordinal scale), but because the zero value on the scale is arbitrary, the scale cannot be interpreted in any absolute sense (Fahrenheit or Celsius scales). (Jk08-323) | 11 | |
4790636798 | J-Shaped Frequency Distribution | This is a kind of skewed distribution with most observations coming from the very end of the measurement scale. For example, if you count speech errors per utterance you might find that most utterances have a speech error count of zero. So in a histogram, the number of utterances with a low error count will be very high and will decrease dramatically as the number of errors per utterance increases. (Jk08-481) | 12 | |
4790636799 | Least squares estimates of central tendency | This means that if we take the difference between the mean and each value in our data set, square these differences, and add them up, we will have a smaller value than if we were to do the same thing with the median or any other estimate of the "mid-point" of the data set. (Jk08-669-674) This property is a very useful one in the derivation of statistical tests of significance. (Jk08:674) | 13 | |
4790636800 | Mean | Also referred to as the arithmetic average, this is the least squares estimate of central tendency. To calculate the mean, sum the data values and then divide by the number of values in the data set. (Jk08:660) | 14 | |
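A quick sketch of the calculation just described (sum the values, divide by how many there are); the data values are made up for illustration.

    data = [4.2, 3.9, 5.1, 4.7, 4.4]      # hypothetical measurements
    mean = sum(data) / len(data)          # sum the data values, divide by n
    print(mean)                           # 4.46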
4790636801 | Measures of Central Tendency | 1. Mode 2. Median (mid-point) 3. Mean (arithmetic average, the center of gravity of the distribution) | 15 | |
4790636802 | Mode | The most frequently occurring value in the distribution --the tip of the frequency distribution. (Jk08:645) | 16 | |
4790636803 | Nominal | Named properties --they have no meaningful order on a scale of any type. (Jk08:323) Examples: What language is being observed? What dialect? | 17 | |
4790636804 | Normal Distribution | This is an especially useful theoretical function... If this is a good description of the source of variability in our measurements, then we can model this situation by assuming that the underlying property is at the center of the frequency distribution that we observe in our measurements and that the spread of the distribution is caused by error, with bigger errors being less likely to occur than smaller errors. (Jk08-443) | 18 | |
4790636805 | Normal Distribution (2) | In this distribution, measurements tend to congregate around a typical value and values become less and less likely as they deviate further from this central value. (Jk08-461) | 19 | |
4790636806 | Normal Distribution (3) | The curve of this distribution is defined by two parameters --what the central tendency is (M, the mean) and how quickly probability goes down as you move away from the center of the distribution (s, the standard deviation). (Jk08-475) | 20 | |
4790636807 | Descriptive Properties | Each observation has these. Some of these will be qualitative and some will be quantitative-- and descriptive properties (variables) come in one of four types: (i) Nominal, (ii) Ordinal, (iii) Interval, and (iv) Ratio. (Jk08-323) | 21 | |
4790636808 | Ordinal | Orderable properties --they aren't observed on a measurable scale, but this kind of property is transitive, so that if A is less than B and B is less than C, then A is less than C. | 22 | |
4790636809 | Probability | One of the main goals of quantitative analysis is the exploration of processes that may have a basis in probability: theoretical modeling, say in information theory, or in practical contexts such as probabilistic sentence parsing. (Jk08:515) We can quantify the difference between the sample mean and the hypothesized population mean in terms of a probability. (Jk08-1173) | 23 | |
4790636810 | Probability Density Function (p.d.f.) | As probability theory is used in quite diverse applications, terminology is not uniform and sometimes confusing. This term is most often reserved for continuous random variables. (Wikipedia) For the normal distribution, this takes the familiar bell-shaped curve. (Jk08:962) | 24 | |
4790636811 | Probability Plot | This plot is a graphical technique for assessing whether or not a data set follows a given distribution such as the normal or Weibull. The data are plotted against a theoretical distribution in such a way that the points should form approximately a straight line. Departures from this straight line indicate departures from the specified distribution. (Jk08:583) | 25 | |
4790636812 | Advantages of q-q plot | (1) The sample sizes do not need to be equal. (2) Many distributional aspects can be simultaneously tested. For example, shifts in location, shifts in scale, changes in symmetry, and the presence of outliers can all be detected from this plot. (Jk08-583) | 26 | |
4790636813 | Quantile | By this, we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the point below which 30% of the data fall and above which 70% fall. (Jk08-568) | 27 | |
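A short illustration of the 0.3 quantile with NumPy; the data are invented, and np.quantile returns the point below which roughly 30% of the values fall.

    import numpy as np

    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # hypothetical data
    q30 = np.quantile(data, 0.3)                       # 0.3 (30%) quantile
    print(q30)                                         # about 30% of the values fall below this point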
4790636814 | Quantile-quantile (q-q) plot | This is a graphical technique for determining if two data sets come from populations with a common distribution. This is a plot of the quantiles of the first data set against the quantiles of the second data set. A 45-degree reference line is also plotted. If the two sets come from populations with the same distribution, the points should fall approximately along this reference line. The greater the departure from this reference line, the greater the evidence for the conclusion that the two data sets have come from populations with different distributions. (Jk08-586) | 28 | |
4790636815 | Quantitative Analysis | The four main goals of this are: (1) data reduction, (2) inference, (3) discovery of relationships, and (4) exploration of processes that may have a basis in probability. (Jk08-304) | 29 | |
4790636816 | Range | A simple, but not very useful measure of dispersion is the range of the data values. This is the difference between the maximum and minimum values in the data set. (Jk08-78) | 30 | |
4790636817 | Ratio | This is a property that we measure on a scale that does have an absolute zero value. This is called a ratio scale because ratios of these measurements are meaningful. Examples: acoustic measures --frequency, duration, frequency counts, reaction time. (Jk08-323) | 31 | |
4790636818 | Relationships Discovery | Find descriptive or causal patterns in data which may be described in multiple regression models or in factor analysis. (Jk08-515) | 32 | |
4790636819 | Root Mean Square (RMS) | The variance is the average squared deviation --the units are squared-- to get back to the original unit of measure we take the square root of the variance. This is the same as the value known as the RMS (root mean square), a measure of deviation used in acoustic phonetics (among other disciplines). (Jk08-734-755) | 33 | |
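A small sketch, on invented data, of the point made above: taking the square root of the variance (the average squared deviation) brings the measure of dispersion back to the original units.

    import numpy as np

    data = np.array([10.0, 12.0, 9.0, 11.0, 13.0])   # hypothetical measurements
    deviations = data - data.mean()                   # deviations from the mean
    variance = np.mean(deviations ** 2)               # average squared deviation (units are squared)
    rms = np.sqrt(variance)                           # root mean square, back in the original units
    print(variance, rms)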
4790636820 | Sum of The Squared Deviations | Σ (x_i - x-bar)^2 --take the difference between each data value and the mean, square these differences, and add them up. | 34 | |
4790636821 | Skewed Frequency Distribution | If measurements are taken on a scale, as we approach one end of the scale the frequency distribution is bound to be skewed because there is a limit beyond which the data values cannot go. We most often run into skewed frequency distributions when dealing with percentage data and reaction time data (where negative reactions times are not meaningful). (Jk08-475) | 35 | |
4790636822 | Standardizing a data set | We can relate the frequency distribution of our data to the normal distribution because we know the mean and standard deviation of both. The key is to be able to express any value in a data set in terms of its distance in standard deviations from the mean. This way of expressing data values, in standard deviation units, puts our data on the normal distribution --where the mean is 0 and the standard deviation is 1. (Jk08-780-803) | 36 | |
4790636823 | Transformation | One standard method that is used to make a data set fall on a more normal distribution is to transform the data from the original measurement scale and put it on a scale that is stretched or compressed in helpful ways. (Jk08-630) | 37 | |
4790636824 | Types of Distribution | Data come in a variety of shapes of frequency distributions: (a) uniform, (b) skewed, (c) bimodal, (d) normal, (e) J-shaped, (f) U-shaped. | 38 | |
4790636825 | U-shaped Frequency Distribution | A very polarized distribution of results. If you asked a number of people how strongly they supported the US invasion of Iraq, most people would be either strongly in favor or strongly opposed, with not too many in the middle. (Jk08-491) | 39 | |
4790636826 | Uniform Frequency Distribution | If every outcome is equally likely then the distribution is uniform. This happens, for example, with the six sides of a die --each one is (supposed to be) equally likely, so if you count up the number of rolls that come up "1" it should be on average 1 out of every 6 rolls. (Jk08-461) | 40 | |
4790636827 | Variance | Variance is like the mean absolute deviation except that we square the deviations before averaging them. The variance is the average squared deviation, so the units are squared. (Jk08-734) | 41 | |
4790636828 | Population Variance | σ^2 = Σ (x_i - μ)^2 / N --the average squared deviation of the population values from the population mean μ. | 42 | |
4790636829 | Sample Variance | s^2 = Σ (x_i - x-bar)^2 / (n - 1) --the sum of squared deviations from the sample mean divided by n - 1, giving up one degree of freedom because x-bar is itself estimated from the sample. | 43 | |
4790636830 | Weighted Mean | Suppose you asked someone to rate the grammaticality of a set of sentences, but you also let the person rate their confidence in those ratings --to say whether they feel very sure or not very sure at all about the rating given. These confidence values can be used as weights (Wi) in calculating the central tendency of the ratings, so that ratings given with high confidence influence the measure more than ratings given with a sense of confusion. (Jk08-695-718) | 44 | |
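A small weighted-mean sketch, assuming hypothetical grammaticality ratings and confidence weights (the numbers are invented); np.average computes Σ(w_i * x_i) / Σ(w_i).

    import numpy as np

    ratings = np.array([5, 3, 4, 2])                          # hypothetical grammaticality ratings
    confidence = np.array([0.9, 0.2, 0.8, 0.5])               # hypothetical confidence weights (Wi)
    weighted_mean = np.average(ratings, weights=confidence)   # sum(w_i * x_i) / sum(w_i)
    print(weighted_mean)                                      # high-confidence ratings pull the result more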
4790636831 | Z-scores | The data values are converted into z-scores when each data value is replaced by the distance between it and the sample mean where the distance is measured as the number of standard deviations between the data value and the mean. Z-scores always have a mean of 0 and a standard deviation of 1. (Jk08-803) | 45 | |
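A minimal sketch of converting a data set to z-scores (distance from the sample mean, measured in standard deviations), with made-up values.

    import numpy as np

    data = np.array([4.2, 3.9, 5.1, 4.7, 4.4])      # hypothetical measurements
    z = (data - data.mean()) / data.std(ddof=1)     # (x - mean) / sample standard deviation
    print(z.mean(), z.std(ddof=1))                  # approximately 0 and 1, as the card states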
4790739050 | population mean | μ (mu). Calculated by adding up all the values in a population and dividing by the number of items in that population: μ = (sum of all items) / (number of items). | 46 | |
4790777938 | t statistic | Used for validating or invalidating a null hypothesis when the sample size is small (less than about 30) and you can't use the normal distribution; it follows a t distribution rather than the normal distribution. | 47 | |
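A sketch of a one-sample t statistic using the standard formula t = (x-bar - mu0) / (s / sqrt(n)); the data and the hypothesized mean are invented, and scipy.stats.ttest_1samp is used only as a cross-check.

    import numpy as np
    from scipy import stats

    sample = np.array([5.1, 4.8, 5.4, 5.0, 4.6, 5.3])   # hypothetical small sample (n < 30)
    mu0 = 5.0                                            # hypothesized population mean
    t = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))
    t_check, p = stats.ttest_1samp(sample, mu0)          # same t statistic, plus a p-value
    print(t, t_check, p)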
4790802748 | standard deviation | A measure of how much the data vary from the mean. | 48 | |
4797961535 | confidence interval | The confidence with which you can determine that a sample mean reflects the population mean, expressed as a range between two bounds. E.g., I can say with 95% confidence that the value of the population mean is 0.568 plus or minus 0.08. | 49 | |
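A sketch of a 95% confidence interval for a mean, assuming the usual t-based interval (mean plus or minus t* times s / sqrt(n)); the sample values are made up.

    import numpy as np
    from scipy import stats

    sample = np.array([0.55, 0.61, 0.49, 0.58, 0.60, 0.53])  # hypothetical measurements
    m, s, n = sample.mean(), sample.std(ddof=1), len(sample)
    t_star = stats.t.ppf(0.975, df=n - 1)        # critical t value for 95% confidence
    half_width = t_star * s / np.sqrt(n)         # the "plus or minus" part
    print(m - half_width, m + half_width)        # lower and upper bounds of the interval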
4798191622 | type 1 error | rejecting H0 when it is actually true | 50 | |
4798193819 | type II error | failing to reject H0 when it is actually false (i.e., H1 is true). The likelihood of a type II error is beta. | 51 | |
4798197994 | POWER | 1 - beta: the probability that you reject the null hypothesis when it is false. | 52 | |
4798207648 | parts of all statistical tests | (1) hypotheses, (2) test statistic, (3) critical value | 53 | |
4798208725 | critical value | the cutoff value of the test statistic at which you reject the null hypothesis | 54 | |
4805793836 | degrees of freedom | the number of values that are free to vary given the summary statistics we are using. | 55 | |
4805815079 | t test | You estimate the standard deviation from the sample rather than knowing it for the population, so this is a different test from one based on the normal distribution. It gets more accurate as the degrees of freedom approach infinity, because the sample estimate gets closer to the population value. | 56 | |
4805912551 | two sample t test | A test with two samples, where you are often testing whether the means of the two groups differ --for example, a control group and a treatment group. Null hypothesis: μ1 - μ2 = 0. Alternative hypothesis: μ1 - μ2 ≠ 0. | 57 | |
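A minimal two-sample t test sketch with scipy, assuming invented control and treatment scores and equal variances in the two groups.

    import numpy as np
    from scipy import stats

    control = np.array([12, 15, 11, 14, 13, 12])     # hypothetical control group
    treatment = np.array([16, 18, 15, 17, 14, 19])   # hypothetical treatment group
    t, p = stats.ttest_ind(treatment, control, equal_var=True)  # H0: mu1 - mu2 = 0
    print(t, p)                                       # a small p-value suggests the group means differ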
4805925626 | causal inference | Items or people are randomly assigned to groups, so that the independent variable really is what causes the difference in the dependent variable. | 58 | |
4805948356 | nonrandom assignment | Differences in results could be explained by the independent variable or by other influences (if parents signed up the kids, that could be it). | 59 | |
4805999165 | pooled standard deviation | Used to compare two samples in a t test by combining their error; based on the assumption that the groups in the population have equal variance. | 60 | |
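A sketch of the pooled standard deviation under the equal-variance assumption, using the standard formula sp = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2)); the two samples are made up.

    import numpy as np

    a = np.array([12, 15, 11, 14, 13, 12])     # hypothetical sample 1
    b = np.array([16, 18, 15, 17, 14, 19])     # hypothetical sample 2
    n1, n2 = len(a), len(b)
    s1, s2 = a.std(ddof=1), b.std(ddof=1)
    sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # pooled standard deviation
    print(sp)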
5026128952 | factorial design | design used when you have more than one factor in a statistical analysis | 61 | |
5026135987 | factor | In an experiment, the factor (also called an independent variable) is an explanatory variable manipulated by the experimenter. Each factor has two or more levels, i.e., different values of the factor. from Stattrek.com | 62 | |
5026176724 | interaction effect | The joint effect of two factors on a dependent variable --when the effect of one factor depends on the level of the other. The main reason you perform a two-factor ANOVA is to see the interaction effect as well as the main effects of the variables. | 63 | |
5026218369 | ANOVA | Analysis of Variance --a collection of methods for comparing multiple means across different groups by comparing the variance between groups to the variance within groups. | 64 | |
5026256375 | Multiple Comparison Procedures (MCP) | What you do if you reject the null hypothesis in ANOVA: if J > 2 you don't yet know which groups differ (with only two groups, we know those two are the different ones). Used to control the family-wise type 1 error rate and determine which groups are significantly different from each other. In general, you should do planned comparisons based on theory; post-hoc tests should be reserved for exploratory situations. Planned: higher power, but limited to an a priori hypothesis made beforehand. Post hoc: can test everything (an infinity of tests), but lower power. | 65 | |
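One common multiple-comparison approach (not necessarily the specific procedure this card has in mind) is a Bonferroni correction: run the pairwise tests and compare each p-value against alpha divided by the number of tests, which controls the family-wise Type I error rate. The groups below are invented.

    from itertools import combinations
    from scipy import stats

    groups = {"A": [12, 15, 11, 14], "B": [16, 18, 15, 17], "C": [13, 12, 14, 13]}  # hypothetical groups
    alpha = 0.05
    pairs = list(combinations(groups, 2))
    for g1, g2 in pairs:
        t, p = stats.ttest_ind(groups[g1], groups[g2])
        # reject only if p beats the Bonferroni-adjusted threshold alpha / number of comparisons
        print(g1, g2, p, p < alpha / len(pairs))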
5026270518 | ANCOVA | looking at covariance and filtering out the effects of "nuisance variables" to show the relationship between the independent variable and the dependent variable. | 66 | |
5026290207 | two-factor ANOVA | Analysis of variance, also called ANOVA, is a collection of methods for comparing multiple means across different groups. It computes an F-ratio of the variance between groups over the variance within the groups, both scaled by their respective degrees of freedom: MSB/MSW. You take the variance of the means you want to compare, and you find the critical F value based on alpha (the probability at which you reject the null hypothesis) using the degrees of freedom of the numerator (v1) and the degrees of freedom of the denominator (v2). When you reject the null, the between-group variance reflects treatment plus error. It is an omnibus test --you can test all variables/everything overall. A two-factor design adds a second factor so you can also test its main effect and the interaction. | 67 | |
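A sketch of the F-ratio described above (MSB/MSW) for a simple one-way case, cross-checked against scipy.stats.f_oneway; the group data are invented.

    import numpy as np
    from scipy import stats

    groups = [np.array([12, 15, 11, 14]),     # hypothetical group 1
              np.array([16, 18, 15, 17]),     # hypothetical group 2
              np.array([13, 12, 14, 13])]     # hypothetical group 3
    grand = np.concatenate(groups).mean()                                 # grand mean
    k, ns = len(groups), [len(g) for g in groups]
    ssb = sum(n * (g.mean() - grand) ** 2 for n, g in zip(ns, groups))    # between-group sum of squares
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)                # within-group sum of squares
    msb = ssb / (k - 1)                     # numerator df v1 = k - 1
    msw = ssw / (sum(ns) - k)               # denominator df v2 = N - k
    print(msb / msw, stats.f_oneway(*groups).statistic)   # the two F values match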
5034798612 | least squares | the best-fit line --the one with the lowest sum of squared errors | 68 | |
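A minimal least-squares line fit with NumPy on made-up x/y pairs; np.polyfit with degree 1 returns the slope and intercept of the line that minimizes the sum of squared errors.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor values
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical response values
    slope, intercept = np.polyfit(x, y, 1)    # best-fit line minimizing squared error
    print(slope, intercept)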