Z Score
For industrial and technical standards, see Standardization.
For Z-values in ecology, see Z-value.
For Z-factor in high-throughput screening, see Z-factor.
In statistics, a standard score is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This conversion process is called standardizing or normalizing; however, "normalizing" can refer to many types of ratios; see normalization (statistics) for more.
Standard scores are also called z-values, z-scores, normal scores, and standardized variables; the letter "Z" is used because the normal distribution is also known as the "Z distribution".
They are most frequently used to compare a sample to a standard normal deviate (standard normal distribution, with μ=0 and σ=1), though they can be defined without assumptions of normality.
The standard score indicates how many standard deviations an observation is above or below the mean: the standard deviation is the unit of measurement of the z-score. It allows comparison of observations from different normal distributions, which is done frequently in research.
The z-score is only defined if one knows the population parameters, as in standardized testing; if one only has a sample set, then the analogous computation with sample mean and sample standard deviation yields the Student's t-statistic.
The standard score is not the same as the z-factor used in the analysis of high-throughput screening data, but is sometimes confused with it.
Formula
The standard score is
z = (x − μ) / σ
where:
x is a raw score to be standardized;
μ is the mean of the population;
σ is the standard deviation of the population.
The quantity z represents the distance between the raw score and the population mean in units of the standard deviation.
z is negative when the raw score is below the mean, positive when above.
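The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `z_score` and the example values (a population mean of 100 and a standard deviation of 15) are assumptions chosen only for demonstration.

```python
def z_score(x, mu, sigma):
    """Standard score of raw score x, given the population mean mu
    and the population standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical population: mean 100, standard deviation 15.
above = z_score(130, 100, 15)  # raw score above the mean -> positive z
below = z_score(85, 100, 15)   # raw score below the mean -> negative z
print(above, below)            # 2.0 -1.0
```

A raw score of 130 lies two standard deviations above the mean, and a raw score of 85 lies one standard deviation below it.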
Calculating z requires the population mean and the population standard deviation, not the sample mean or sample standard deviation; that is, it requires the population parameters, not the statistics of a sample drawn from the population of interest.
Knowing the true standard deviation of a population is often unrealistic, however, except in cases such as standardized testing, where the entire population is measured. When it is impossible to measure every member of a population, the standard deviation may be estimated from a random sample.
For example, it is not feasible to measure every member of the population of people who smoke cigarettes.
When a population is normally distributed, the percentile rank may be determined from the standard score and statistical tables.
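In place of a printed table, the percentile rank can be computed from the cumulative distribution function of the standard normal distribution, Φ(z) = (1 + erf(z / √2)) / 2. The sketch below assumes the population really is normally distributed; the function name `normal_percentile` is illustrative only.

```python
import math


def normal_percentile(z):
    """Percentile rank (0 to 100) of a standard score z, assuming the
    underlying population is normally distributed."""
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 100.0 * cdf


print(round(normal_percentile(0.0), 1))  # 50.0
print(round(normal_percentile(1.0), 1))  # 84.1
```

A score exactly at the mean (z = 0) sits at the 50th percentile, and a score one standard deviation above the mean sits at roughly the 84th.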
Related statistics
If the sample mean and sample standard deviation are used (rather than the population mean and standard deviation), the resulting ratio is the (single-sample) Student's t-statistic. In regression analysis, one instead uses the studentized residual, because the standard error of the estimated response varies with the values of the explanatory variables.
The T score statistic is a simple transformation of the z score, calculated using the formula
T = (z * 10) + 50
The T score has a mean of 50 and a standard deviation of 10 (Carroll & Carroll 2002, p. 56).
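The transformation is a direct rescaling, as the following short sketch shows. The function name `t_score` is an assumption made for illustration.

```python
def t_score(z):
    """Convert a standard score z to a T score (mean 50, SD 10)."""
    return z * 10 + 50


print(t_score(0))    # 50   (a score at the mean)
print(t_score(1.5))  # 65.0 (1.5 standard deviations above the mean)
print(t_score(-2))   # 30   (2 standard deviations below the mean)
```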
Applications
The z-score is most often used in the z-test in standardized testing – the analog of the Student's t-test for a population whose parameters are known, rather than estimated.
As it is very unusual to know the entire population, the t-test is much more widely used.
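As a rough illustration of the z-test, the sketch below computes a one-sample test statistic and a two-sided p-value. The text above does not give the formula; the code assumes the common textbook form z = (x̄ − μ) / (σ / √n), with μ and σ known in advance, and the function name and sample data are hypothetical.

```python
import math


def one_sample_z_test(sample, mu, sigma):
    """Return (z, two-sided p-value) for a one-sample z-test.

    Assumes the population mean mu and population standard deviation
    sigma are known, and that the sample mean is approximately normal.
    """
    n = len(sample)
    if n == 0 or sigma <= 0:
        raise ValueError("need a non-empty sample and positive sigma")
    sample_mean = sum(sample) / n
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical data: 25 observations averaging 106, tested against a
# population with known mean 100 and standard deviation 15.
z, p = one_sample_z_test([106] * 25, mu=100, sigma=15)
print(z, round(p, 4))
```

When σ is not known and must be estimated from the sample, the t-test is the appropriate tool, as noted above.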
Darby and Reissland (1981) use z-scores to understand the contributions of various subsets of data to an overall test of trend. The overall analysis concerned trends in the rate of occurrence of cancer, and the subsets comprised approximately 55 different types of cancer, together with various groupings of these types.