# What is a STATISTICAL formula in a spreadsheet?

Statistical formulas and functions are used for statistical analysis. They include measures of central tendency (mean, median, mode), dispersion (variance, standard deviation), regression analysis, hypothesis testing, and other statistical calculations.

## STATISTICAL formula usage examples

The AVERAGEIF function calculates the average of the cells in a range that meet a specific criterion. It takes three arguments: criteria_range, which is the range of cells to be evaluated against the criterion; criterion, which is the condition that the cells must meet; and [average_range], which is an optional range of cells to be averaged. If average_range is omitted, the cells in criteria_range are averaged instead. The function returns the average of the cells that meet the criterion.
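The selection logic of AVERAGEIF can be sketched in Python as follows. This is an illustration only, not the spreadsheet's implementation: the name `average_if` is hypothetical, and the criterion is passed as a Python predicate rather than a spreadsheet criterion string such as ">3".

```python
def average_if(criteria_range, criterion, average_range=None):
    """Average the cells whose matching criteria cell satisfies `criterion`."""
    # With no average_range, the criteria cells themselves are averaged.
    values = criteria_range if average_range is None else average_range
    selected = [v for c, v in zip(criteria_range, values) if criterion(c)]
    if not selected:
        raise ValueError("no cells meet the criterion")
    return sum(selected) / len(selected)


regions = ["east", "west", "east", "north"]
sales = [100, 250, 300, 50]

# Average of the sales rows whose region is "east".
east_average = average_if(regions, lambda r: r == "east", sales)
print(east_average)
```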

The AVERAGEIFS function calculates the average of a range based on multiple criteria. It takes an average range and one or more criteria ranges, along with their corresponding criteria. The function only includes values in the average range that meet all of the specified criteria. This function is useful when you need to calculate an average based on specific conditions.

The BETA.DIST function evaluates the beta distribution at a given value, returning either the cumulative probability or the probability density. It is commonly used in statistical analysis to model uncertainty and estimate probabilities. The function takes the parameters value, alpha, beta, cumulative, lower_bound, and upper_bound: alpha and beta set the shape of the distribution, cumulative selects the cumulative distribution function or the probability density function, and the bounds set the interval on which the distribution is defined.

The BETA.INV function is used to calculate the value of the inverse beta distribution function for a given probability. It is commonly used in statistical analysis and risk assessment. The function takes the probability, alpha and beta parameters, and the lower and upper bounds of the distribution as inputs. It returns the value that corresponds to the given probability in the beta distribution.

The BINOM.DIST function calculates the probability of a certain number of successes in a series of trials. It is based on the binomial distribution and takes four arguments: the number of successes, the number of trials, the probability of success in each trial, and a flag indicating whether to calculate the cumulative probability or not.
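As a rough illustration of the four arguments, the binomial probability can be computed directly from its definition. The function name `binom_dist` is hypothetical; the sketch assumes integer counts and a probability between 0 and 1.

```python
from math import comb


def binom_dist(num_successes, num_trials, prob_success, cumulative):
    """Binomial probability of exactly (or at most) num_successes successes."""

    def pmf(k):
        # Probability of exactly k successes in num_trials independent trials.
        return comb(num_trials, k) * prob_success**k * (1 - prob_success) ** (num_trials - k)

    if cumulative:
        # Cumulative mode sums the probabilities of 0..num_successes successes.
        return sum(pmf(k) for k in range(num_successes + 1))
    return pmf(num_successes)


exactly_two = binom_dist(2, 4, 0.5, False)
at_most_two = binom_dist(2, 4, 0.5, True)
print(exactly_two, at_most_two)
```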

The BINOM.INV function is used to calculate the number of successful trials required to achieve a target probability in a binomial distribution. It takes three arguments: num_trials (the number of trials or observations), prob_success (the probability of success in each trial), and target_prob (the desired target probability). The function returns the smallest value for which the cumulative binomial distribution is greater than or equal to the target probability.
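The "smallest value whose cumulative probability reaches the target" rule can be sketched as a simple loop. The name `binom_inv` is hypothetical, and the sketch ignores the input validation a spreadsheet would perform.

```python
from math import comb


def binom_inv(num_trials, prob_success, target_prob):
    """Smallest k whose cumulative binomial probability is >= target_prob."""
    running = 0.0
    for k in range(num_trials + 1):
        # Add the probability of exactly k successes to the running total.
        running += comb(num_trials, k) * prob_success**k * (1 - prob_success) ** (num_trials - k)
        if running >= target_prob:
            return k
    # Rounding can leave the total marginally below 1; fall back to the maximum.
    return num_trials


print(binom_inv(4, 0.5, 0.5))
```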

The BINOMDIST function calculates the probability of a certain number of successes in a certain number of trials, given a probability of success for each trial. It is commonly used in statistics and probability calculations. The function takes four arguments: the number of successes, the total number of trials, the probability of success for each trial, and a boolean value indicating whether to calculate the cumulative probability.

The CHIINV function calculates the inverse of the right-tailed chi-squared distribution. It is commonly used in statistical analysis to determine critical values, estimate confidence intervals, and calculate sample sizes for chi-squared tests. The function takes two arguments: the probability (a value between 0 and 1) and the degrees of freedom (a positive integer). The probability represents the area under the chi-squared distribution curve to the right of the critical value, and the degrees of freedom determine the shape of the distribution.

The CHISQ.DIST.RT function calculates the right-tailed chi-squared distribution, which is commonly used in hypothesis testing. It returns the probability that a value from the chi-squared distribution is greater than the given value. The function takes two arguments: x, which represents the value at which to evaluate the distribution, and degrees_freedom, which represents the degrees of freedom for the distribution.

The CHISQ.DIST function calculates the left-tailed chi-squared distribution, which is often used in hypothesis testing. In cumulative mode it returns the probability that a chi-squared variable is less than or equal to x; otherwise it returns the probability density at x. The 'x' parameter represents the value at which to evaluate the distribution, the 'degrees_freedom' parameter represents the degrees of freedom, and the 'cumulative' parameter determines whether to calculate the cumulative distribution or the probability density function.

The CHISQ.INV function calculates the inverse of the left-tailed chi-squared distribution. It returns the value x for which the cumulative distribution function (CDF) of the chi-squared distribution is equal to the given probability. The degrees_freedom parameter specifies the number of degrees of freedom for the chi-squared distribution.

The CHITEST function is used to perform a chi-squared test of independence or goodness-of-fit in Excel. It compares observed and expected frequencies to determine if there is a significant association or difference between variables. The function returns the probability of observing the given frequencies under the null hypothesis of independence or goodness-of-fit.

The CONFIDENCE.T function is used to calculate the width of half the confidence interval for a Student's t-distribution. It takes three arguments: alpha, which represents the significance level; standard_deviation, which is the standard deviation of the population; and size, which is the sample size. The function returns the width of half the confidence interval.

The CORREL function calculates the Pearson product-moment correlation coefficient between two datasets. It measures the strength and direction of the linear relationship between the two datasets. The function takes two arguments: data_y and data_x. Data_y represents the dependent variable, while data_x represents the independent variable. The function returns a value between -1 and 1, where -1 indicates a perfect negative correlation, 1 indicates a perfect positive correlation, and 0 indicates no correlation.
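A minimal sketch of the Pearson coefficient, written out from its definition; the name `correl` is hypothetical and the two inputs are assumed to be equal-length numeric lists with non-zero variance.

```python
from math import sqrt


def correl(data_y, data_x):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(data_x)
    mean_x = sum(data_x) / n
    mean_y = sum(data_y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y))
    sxx = sum((x - mean_x) ** 2 for x in data_x)
    syy = sum((y - mean_y) ** 2 for y in data_y)
    return sxy / sqrt(sxx * syy)


print(correl([2, 4, 6, 8], [1, 2, 3, 4]))  # perfectly linear, increasing
print(correl([8, 6, 4, 2], [1, 2, 3, 4]))  # perfectly linear, decreasing
```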

The COVAR function calculates the covariance between two sets of data. Covariance measures the relationship between two variables and indicates how changes in one variable are associated with changes in another variable. A positive covariance indicates a positive relationship, while a negative covariance indicates a negative relationship. The COVAR function is commonly used in statistical analysis and portfolio management to assess the relationship between variables and evaluate risk and return.

The COVARIANCE.P function calculates the covariance between two sets of data. It measures the relationship between the variables and indicates how changes in one variable are associated with changes in the other variable. The function returns the population covariance: it treats the data as the entire population and divides the sum of cross-products by n, the number of data pairs, rather than by n - 1 as a sample covariance does.
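The difference between a population and a sample covariance is only the divisor, as this sketch shows. The function name and the `population` flag are hypothetical and exist only for illustration.

```python
def covariance(data_y, data_x, population=True):
    """Covariance of two equal-length lists; divisor is n (population) or n - 1 (sample)."""
    n = len(data_x)
    mean_x = sum(data_x) / n
    mean_y = sum(data_y) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y))
    return cross / (n if population else n - 1)


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(covariance(ys, xs))                    # population form, divides by n
print(covariance(ys, xs, population=False))  # sample form, divides by n - 1
```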

The F.DIST function calculates the left-tailed F probability distribution, also known as the Fisher-Snedecor distribution or Snedecor's F distribution. It is used to analyze the degree of diversity between two data sets. The function takes four arguments: x, which is the input value; degrees_freedom1, which represents the degrees of freedom for the numerator; degrees_freedom2, which represents the degrees of freedom for the denominator; and cumulative, which is a logical value indicating whether to calculate the cumulative distribution function (TRUE) or the probability density function (FALSE). The function returns the probability of observing a value less than or equal to x in the F distribution.

The F.INV.RT function calculates the inverse of the right-tailed F probability distribution, also known as the Fisher-Snedecor distribution or Snedecor's F distribution. This function is commonly used in statistical analysis to compare the variances of two populations. Given the probability, the degrees of freedom for the numerator, and the degrees of freedom for the denominator, F.INV.RT returns the critical value x for which the right-tail probability of the F-distribution equals the given probability.

The F.INV function calculates the inverse of the left-tailed F probability distribution. It is also known as the Fisher-Snedecor distribution or Snedecor's F distribution. The function takes three arguments: probability, degrees_freedom1, and degrees_freedom2. The probability is the desired probability associated with the F-distribution. The degrees_freedom1 and degrees_freedom2 are the degrees of freedom for the numerator and denominator of the F-distribution, respectively.

The FISHER function returns the Fisher transformation of a specified value x, which must lie strictly between -1 and 1. The transformation, 0.5 * ln((1 + x) / (1 - x)), is typically applied to a correlation coefficient: it maps the skewed sampling distribution of the coefficient onto one that is approximately normal with a nearly constant variance. This is commonly used in statistical analysis to stabilize variances and improve the validity of statistical tests on correlations.

The FISHERINV function returns the inverse Fisher transformation of a specified value. The Fisher transformation is a mathematical function that is commonly used to stabilize the variance of a variable and improve the accuracy of statistical analyses. The inverse Fisher transformation reverses this process and converts the transformed value back to its original scale.
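The Fisher transformation is the inverse hyperbolic tangent, and its inverse is the hyperbolic tangent. A short sketch, with hypothetical function names, shows the round trip:

```python
from math import log, tanh


def fisher(x):
    """Fisher transformation, defined for -1 < x < 1."""
    if not -1 < x < 1:
        raise ValueError("x must lie strictly between -1 and 1")
    return 0.5 * log((1 + x) / (1 - x))


def fisher_inv(y):
    """Inverse Fisher transformation; maps any real y back into (-1, 1)."""
    return tanh(y)


r = 0.75
z = fisher(r)
print(z, fisher_inv(z))  # the second value recovers r up to rounding
```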

The FORECAST.LINEAR function is used to predict a future value based on existing data points. It uses linear regression to calculate the best-fit line through the data and then evaluates that line at the independent variable x. The function takes three arguments: x, the value of the independent variable at which to predict; data_y, which contains the dependent variable values; and data_x, which contains the corresponding independent variable values.

The FORECAST function calculates the expected y-value for a specified x based on a linear regression of a dataset. It takes three arguments: x, which is the value for which you want to calculate the expected y-value; data_y, which is the array or range of dependent y-values; and data_x, which is the array or range of independent x-values. The function uses the least squares method to find the best-fit line that represents the relationship between the x and y values, and then calculates the expected y-value for the specified x.
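The least-squares calculation behind FORECAST can be sketched in a few lines. The names `linear_fit` and `forecast` are hypothetical; the slope and intercept computed here correspond to what the SLOPE and INTERCEPT functions return.

```python
def linear_fit(data_y, data_x):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(data_x)
    mean_x = sum(data_x) / n
    mean_y = sum(data_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y))
    sxx = sum((x - mean_x) ** 2 for x in data_x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(x, data_y, data_x):
    """Expected y at x on the fitted line."""
    slope, intercept = linear_fit(data_y, data_x)
    return intercept + slope * x


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # points on y = 2x + 1
print(forecast(5, ys, xs))
```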

The GAMMA.DIST function calculates the gamma distribution, which is a two-parameter continuous probability distribution. It is commonly used to model waiting times, failure times, and other positive continuous variables. The function takes four arguments: x, which is the value for which we want to calculate the probability or density; alpha, which represents the shape parameter of the distribution; beta, which represents the scale parameter of the distribution; and cumulative, which is a logical value indicating whether to calculate the cumulative distribution function or the probability density function.

The GAMMADIST function is used to calculate the probability density function or cumulative distribution function of a gamma distribution. It takes four arguments: x, alpha, beta, and cumulative. The 'x' argument represents the value at which to evaluate the distribution. The 'alpha' argument represents the shape parameter of the distribution. The 'beta' argument represents the scale parameter of the distribution. The 'cumulative' argument is a logical value that determines whether to calculate the cumulative distribution function or the probability density function.

The GAMMAINV function is used to calculate the inverse of the gamma cumulative distribution. It takes three arguments: probability, alpha, and beta. The probability is the cumulative probability, between 0 and 1, for which to find the corresponding value x. The alpha and beta parameters define the shape and scale of the gamma distribution.

The INTERCEPT function is used to calculate the y-value at which the line resulting from linear regression of a dataset will intersect the y-axis (x=0). It takes two arguments: the dependent variable data (data_y) and the independent variable data (data_x). The function returns the y-intercept of the linear regression line.

The LARGE function returns the nth largest element from a data set, where n is user-defined. It is commonly used to find the top or bottom values in a dataset. The function takes two arguments: the range of data and the value of n. The range of data can be a single column or row, or a range of cells. The function returns the nth largest value from the data set.

The LOGNORM.DIST function returns the probability or cumulative probability of a value occurring in a log-normal distribution. It is commonly used in statistics and finance to model data that is positively skewed and has a long tail. The function takes three arguments: x, which is the value for which we want to calculate the probability or cumulative probability; mean, which is the mean of the natural logarithm of the variable; and standard_deviation, which is the standard deviation of that logarithm.

The LOGNORM.INV function is used to calculate the inverse of the cumulative distribution function (CDF) of a log-normal distribution. It returns the value at which the specified cumulative probability is reached. The function takes three arguments: x, the cumulative probability (between 0 and 1) at which to evaluate the inverse of the distribution; mean, the mean of the logarithm of the distribution; and standard_deviation, the standard deviation of the logarithm of the distribution.

The LOGNORMDIST function returns the value of the log-normal cumulative distribution with a given mean and standard deviation at a specified value. It is commonly used in statistics and finance to model data that follows a log-normal distribution. The function takes three arguments: x, which is the value at which to evaluate the distribution; mean, which is the mean of the natural logarithm of the variable; and standard_deviation, which is the standard deviation of that logarithm. The function returns the probability of observing a value less than or equal to x in the log-normal distribution.
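A log-normal cumulative probability is a normal cumulative probability evaluated at ln(x). The sketch below assumes that mean and standard_deviation describe ln(x), which is how Excel documents these arguments; the name `lognorm_dist` is hypothetical.

```python
from math import erf, log, sqrt


def lognorm_dist(x, mean, standard_deviation):
    """P(X <= x) when ln(X) is normal with the given mean and standard deviation."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Standardize ln(x), then apply the standard normal CDF via erf.
    z = (log(x) - mean) / standard_deviation
    return 0.5 * (1 + erf(z / sqrt(2)))


print(lognorm_dist(1, 0, 1))      # ln(1) = 0 sits at the median
print(lognorm_dist(4, 3.5, 1.2))
```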

The MARGINOFERROR function calculates the amount of random sampling error in a given range of values at a specified confidence level. It is commonly used in statistical analysis to estimate the precision of survey results or sample data. The function takes two arguments: 'range' which represents the range of values to be analyzed, and 'confidence' which specifies the desired confidence level for the estimation. The function returns the margin of error as a numeric value.

The NEGBINOMDIST function calculates the probability of drawing a certain number of failures before a certain number of successes in independent trials. It is based on the negative binomial distribution. The function takes three arguments: num_failures, num_successes, and prob_success. The num_failures argument represents the desired number of failures, the num_successes argument represents the desired number of successes, and the prob_success argument represents the probability of success in each trial.
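The negative binomial probability can be written directly from its definition: the last trial must be a success, and the failures can fall anywhere among the trials before it. The name `negbinom_dist` is hypothetical.

```python
from math import comb


def negbinom_dist(num_failures, num_successes, prob_success):
    """Probability of num_failures failures before the num_successes-th success."""
    # Arrange the failures among the first (failures + successes - 1) trials.
    arrangements = comb(num_failures + num_successes - 1, num_successes - 1)
    return arrangements * prob_success**num_successes * (1 - prob_success) ** num_failures


print(negbinom_dist(2, 3, 0.5))
```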

The NORM.DIST function is used to calculate the probability of a value occurring in a normal distribution. It takes four arguments: x (the value for which to calculate the probability), mean (the mean of the distribution), standard_deviation (the standard deviation of the distribution), and cumulative (a logical value that determines whether to calculate the cumulative probability or the probability density function). The function returns the probability as a decimal value.
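The two modes selected by the cumulative argument can be sketched with the standard library's error function. The name `norm_dist` is hypothetical; note that with cumulative set to false the result is a density, not a probability.

```python
from math import erf, exp, pi, sqrt


def norm_dist(x, mean, standard_deviation, cumulative):
    """Normal CDF at x (cumulative=True) or normal density at x (cumulative=False)."""
    z = (x - mean) / standard_deviation
    if cumulative:
        return 0.5 * (1 + erf(z / sqrt(2)))
    return exp(-0.5 * z * z) / (standard_deviation * sqrt(2 * pi))


print(norm_dist(0, 0, 1, True))   # half of the distribution lies below the mean
print(norm_dist(0, 0, 1, False))  # peak height of the standard normal curve
```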

The NORMINV function returns the value of the inverse normal distribution function for a specified probability, mean, and standard deviation: the value x at which the cumulative normal distribution reaches that probability. It is commonly used in statistical analysis to convert probabilities to Z-scores, generate random values that follow a normal distribution, and estimate percentiles in a normal distribution.

The PEARSON function calculates the Pearson product-moment correlation coefficient, which measures the linear relationship between two sets of data. It returns a value between -1 and 1, where -1 indicates a perfect negative correlation, 0 indicates no linear correlation, and 1 indicates a perfect positive correlation. The function takes two arguments: data_y and data_x, which represent the two sets of data to be analyzed.

The PERCENTILE.INC function is used to calculate the value at a given percentile in a dataset. It is an inclusive function, meaning that the percentile argument may range from 0 to 1 with both endpoints allowed. The function takes two arguments: the range of data and the percentile value. The range of data can be a single column or row, or a multi-column or multi-row range. A percentile of 0 returns the minimum value in the dataset and 1 returns the maximum value; values in between are interpolated. The function returns the value at the specified percentile.
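A sketch of the inclusive percentile, assuming the linear-interpolation rule in which percentile p maps to position p * (n - 1) in the sorted data. The name `percentile_inc` is hypothetical.

```python
def percentile_inc(data, percentile):
    """Inclusive percentile with linear interpolation between neighbouring values."""
    if not 0 <= percentile <= 1:
        raise ValueError("percentile must be between 0 and 1 inclusive")
    ordered = sorted(data)
    # Zero-based fractional position of the requested percentile.
    position = percentile * (len(ordered) - 1)
    lower = int(position)
    fraction = position - lower
    if lower + 1 >= len(ordered):
        return ordered[-1]
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])


print(percentile_inc([4, 1, 3, 2], 0.5))  # midway between the two middle values
```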

The PERCENTILE function returns the value at a given percentile of a dataset. It is useful for analyzing data distribution and identifying outliers. The function takes two arguments: 'data' represents the range of cells or array containing the dataset, and 'percentile' is a decimal value between 0 and 1 that specifies the desired percentile.

The PERCENTRANK.INC function is used to calculate the percentage rank (percentile) of a specified value in a dataset. It returns the rank as a decimal value between 0 and 1, inclusive. The function takes three arguments: 'data' represents the dataset or range of values, 'value' is the value for which the percentage rank is calculated, and 'significant_digits' (optional) specifies the number of significant digits to use in the result. If 'significant_digits' is omitted, the default value is 3.

The PERCENTRANK function returns the percentage rank (percentile) of a specified value in a dataset. It calculates the relative position of the value within the dataset, indicating how it compares to other values. The function takes three arguments: the data range, the value for which the percentage rank is calculated, and an optional argument for specifying the number of significant digits in the result. The result is a decimal value between 0 and 1, where 0 represents the lowest rank and 1 represents the highest rank.

The PERMUTATIONA function returns the number of permutations for selecting a group of objects (with replacement) from a total number of objects. It takes two arguments: 'number' represents the total number of objects, and 'number_chosen' represents the number of objects to be selected in each permutation.
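Because each position can be filled by any of the objects, the count is number raised to the power number_chosen. The sketch below, with a hypothetical function name, checks that against an explicit enumeration.

```python
from itertools import product


def permutationa(number, number_chosen):
    """Ordered selections of number_chosen items from number items, repeats allowed."""
    return number**number_chosen


# Enumerate every ordered pair drawn from three objects, with replacement.
pairs = list(product(["a", "b", "c"], repeat=2))
print(permutationa(3, 2), len(pairs))
```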

The QUARTILE.EXC function is used to calculate a quartile of a dataset using the exclusive method. It returns the value nearest to a given quartile, and quartile_number must be 1, 2, or 3; the values 0 and 4 are not accepted. A quartile_number of 1 represents the first quartile (25th percentile), 2 the second quartile (50th percentile or median), and 3 the third quartile (75th percentile).

The QUARTILE.INC function is used to calculate the quartile of a dataset. It returns the value at a specified quartile position in a range of values. The quartile position is determined by the quartile_number parameter, where 1 represents the first quartile, 2 represents the second quartile (median), and 3 represents the third quartile. The function uses the inclusive method for calculating quartiles, which also accepts a quartile_number of 0 (the minimum value) and 4 (the maximum value).

The QUARTILE function returns a value nearest to a specified quartile of a dataset. It divides the dataset into four equal parts, with each quartile representing a specific percentage of the data. The quartile_number argument specifies which quartile to calculate (e.g., 1 for the first quartile, 2 for the second quartile, etc.).

The RANK.AVG function is used to determine the rank of a specified value in a dataset. It returns the average rank if there are multiple entries of the same value. The function takes three arguments: 'value' is the value to rank, 'data' is the dataset to rank within, and 'is_ascending' is an optional argument that specifies whether the ranking should be in ascending order (TRUE) or descending order (FALSE).
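The tie handling can be sketched as follows: count the entries ranked ahead of the value, then average the block of ranks that the tied entries occupy. The name `rank_avg` is hypothetical, and `is_ascending` is made a required argument here so the sketch does not depend on any default.

```python
def rank_avg(value, data, is_ascending):
    """Rank of value in data; tied entries share the average of their ranks."""
    if is_ascending:
        ahead = sum(1 for d in data if d < value)
    else:
        ahead = sum(1 for d in data if d > value)
    ties = sum(1 for d in data if d == value)
    if ties == 0:
        raise ValueError("value not found in data")
    # Tied entries occupy ranks ahead + 1 .. ahead + ties; return their mean.
    return ahead + (ties + 1) / 2


scores = [10, 20, 20, 30]
print(rank_avg(20, scores, is_ascending=False))  # the two 20s share ranks 2 and 3
```

Returning `ahead + 1` instead of the mean would give the top rank of the tied block, which is the behaviour described for RANK.EQ.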

The RANK.EQ function returns the rank of a specified value in a dataset. If there are multiple entries of the same value, the top rank of the entries will be returned. The function takes three arguments: 'value' is the value to be ranked, 'data' is the range or array containing the dataset, and 'is_ascending' is an optional argument that specifies whether the ranking should be in ascending order (1) or descending order (0). If 'is_ascending' is omitted, the default behavior is descending order, so the largest value receives rank 1.

The RANK function returns the rank of a specified value in a dataset. It assigns a rank to each value based on its position in the dataset. The rank can be calculated in ascending or descending order, depending on the value of the optional 'is_ascending' parameter. If 'is_ascending' is set to 0 or omitted, the ranks are assigned in descending order, with the highest value receiving a rank of 1. If 'is_ascending' is set to 1, the ranks are assigned in ascending order, with the lowest value receiving a rank of 1.

The RSQ function calculates the square of the Pearson product-moment correlation coefficient. This coefficient measures the strength and direction of the linear relationship between two variables. The RSQ function is commonly used in statistical analysis to assess the goodness of fit for regression models and evaluate the predictive power of forecasting models.

The SKEW.P function calculates the skewness of a dataset that represents the entire population. Skewness is a measure of the asymmetry of the distribution of values in the dataset. The SKEW.P function uses Pearson's moment coefficient of skewness (the third standardized moment), which is suitable for datasets that represent the entire population.

The SKEW function calculates the skewness of a dataset, which describes the symmetry of that dataset about the mean. It measures the degree of asymmetry in the distribution of data points. A positive skewness indicates a longer or fatter tail on the right side of the distribution, while a negative skewness indicates a longer or fatter tail on the left side of the distribution.
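A sketch of the sample skewness, assuming the bias-adjusted formula that Excel documents for SKEW: n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations, using the sample standard deviation. The function name `skew` is hypothetical.

```python
from math import sqrt


def skew(data):
    """Sample skewness; needs at least three values that are not all equal."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divisor n - 1).
    s = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


print(skew([1, 1, 1, 10]))    # long right tail, positive result
print(skew([1, 10, 10, 10]))  # long left tail, negative result
```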

The SLOPE function calculates the slope of a linear regression line that best fits the given data points. It measures the relationship between two sets of data by determining the change in the dependent variable (data_y) for a unit change in the independent variable (data_x). The slope represents the rate of change or the steepness of the line.

The STDEV function calculates the standard deviation based on a sample. It measures the amount of variation or dispersion in a set of values. The function takes one or more arguments, which represent the values for which you want to calculate the standard deviation. The more spread out the values are, the higher the standard deviation will be.

The STDEVP function calculates the standard deviation based on an entire population. It is used to measure the amount of variation or dispersion in a dataset. The function takes multiple arguments, each representing a value in the population. It returns the standard deviation as a measure of the spread of the values around the mean.
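Python's standard `statistics` module draws the same sample-versus-population distinction, which makes the difference easy to see: the sample versions divide by n - 1 and the population versions divide by n. The mapping to STDEV, STDEVP, VAR, and VAR.P in the comments is an informal analogy, not a claim about how any spreadsheet implements them.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = stdev(values)           # analogous to STDEV, divisor n - 1
population_sd = pstdev(values)      # analogous to STDEVP, divisor n
sample_var = variance(values)       # analogous to VAR
population_var = pvariance(values)  # analogous to VAR.P

print(sample_sd, population_sd, sample_var, population_var)
```

For the same data the sample figures are always at least as large as the population figures, because the divisor is smaller.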

The STEYX function calculates the standard error of the predicted y-value for each x in the regression of a dataset. It is used to measure the accuracy of the regression model by determining how closely the predicted y-values match the actual y-values. The function takes two arguments: data_y, which represents the dependent variable values, and data_x, which represents the independent variable values. The function returns the standard error of the predicted y-values.
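The standard error of the regression can be sketched from the residual sum of squares, divided by n - 2 because two parameters (slope and intercept) are estimated. The name `steyx` is hypothetical and at least three data points are assumed.

```python
from math import sqrt


def steyx(data_y, data_x):
    """Standard error of predicted y for the least-squares line through the data."""
    n = len(data_x)
    mean_x = sum(data_x) / n
    mean_y = sum(data_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y))
    sxx = sum((x - mean_x) ** 2 for x in data_x)
    syy = sum((y - mean_y) ** 2 for y in data_y)
    # Residual sum of squares; clamp tiny negative rounding errors to zero.
    residual = max(0.0, syy - sxy * sxy / sxx)
    return sqrt(residual / (n - 2))


print(steyx([2, 4, 5, 8], [1, 2, 3, 4]))
```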

The T.DIST.2T function is used to calculate the two-tailed Student distribution for a given value x. It is commonly used in statistical analysis to determine the probability of observing a value as extreme as x, assuming a Student's t-distribution. The function takes two arguments: x, which is the value for which we want to calculate the distribution, and degrees_freedom, which represents the degrees of freedom for the distribution.

The T.DIST function is used to calculate the left-tailed Student's t-distribution. It takes three arguments: x, degrees_freedom, and cumulative. The x argument is the value at which to evaluate the distribution, degrees_freedom represents the degrees of freedom for the distribution, and cumulative is a logical value that determines whether to calculate the cumulative distribution or the probability density function. The right-tailed probability is available from T.DIST.RT.

The T.TEST function in Excel is used to calculate the probability associated with Student's t-test. It helps determine whether two samples are likely to have come from the same two underlying populations that have the same mean. The function takes four arguments: range1 and range2 are the two sets of data to be compared, tails specifies the number of distribution tails to use (1 for one-tailed test, 2 for two-tailed test), and type specifies the type of t-test to perform (1 for paired test, 2 for two-sample equal variance test, 3 for two-sample unequal variance test). The function returns the probability of observing the given difference in means under the null hypothesis.

The TTEST function is used to perform a t-test in Excel. It calculates the probability of observing the difference in means between two sets of data, assuming they come from the same population. The function takes four arguments: range1 and range2 are the two sets of data to be compared, tails specifies the number of distribution tails to use (1 for one-tailed test, 2 for two-tailed test), and type specifies the type of t-test to perform (1 for paired test, 2 for two-sample equal variance test, 3 for two-sample unequal variance test). The function returns the probability of observing the difference in means.

The VAR.P function is used to calculate the variance of a population in Excel. It takes a series of values as input and returns the variance. The variance is a measure of how spread out the values in the dataset are. The VAR.P function considers the entire population when calculating the variance, rather than just a sample.

The VAR function is used to calculate the variance of a set of values, treating them as a sample of a larger population. Variance measures how spread out the values in a dataset are from the mean. It is a measure of the variability or dispersion of the data. The VAR function takes one or more arguments, which represent the values for which you want to calculate the variance. The function returns the variance as a numeric value.

The WEIBULL.DIST function calculates the probability density or cumulative distribution function for the Weibull distribution. It takes four arguments: x (the value at which to evaluate the distribution), shape (the shape parameter of the distribution), scale (the scale parameter of the distribution), and cumulative (a logical value indicating whether to calculate the cumulative distribution function or the probability density function).
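Both modes of the Weibull distribution have closed forms, so a sketch is short. The name `weibull_dist` is hypothetical; the sketch assumes positive shape and scale and does not handle the unbounded density at x = 0 when shape is below 1.

```python
from math import exp


def weibull_dist(x, shape, scale, cumulative):
    """Weibull CDF at x (cumulative=True) or density at x (cumulative=False)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    t = (x / scale) ** shape
    if cumulative:
        return 1 - exp(-t)
    return (shape / scale) * (x / scale) ** (shape - 1) * exp(-t)


# With shape = 1 the Weibull distribution reduces to the exponential distribution.
print(weibull_dist(1, 1, 1, True), weibull_dist(1, 1, 1, False))
```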

The Z.TEST function is used to calculate the one-tailed P-value of a Z-test. It returns the probability that a sample mean would be greater than the mean observed in the data if the population mean were equal to a specified value. The function takes three arguments: 'data' represents the sample data range or array, 'value' represents the hypothesized population mean, and 'standard_deviation' (optional) represents the population standard deviation. If 'standard_deviation' is not provided, the function uses the sample standard deviation of 'data'.
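The one-tailed calculation can be sketched as the upper tail of a standard normal distribution at the z statistic. The name `z_test` is hypothetical, and in this sketch the standard deviation is always passed explicitly.

```python
from math import erf, sqrt


def z_test(data, value, standard_deviation):
    """One-tailed p-value: P(Z > z) for the sample mean against `value`."""
    n = len(data)
    sample_mean = sum(data) / n
    # z statistic: distance of the sample mean from the hypothesized mean,
    # measured in standard errors.
    z = (sample_mean - value) / (standard_deviation / sqrt(n))
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))


print(z_test([6, 6, 6, 6], 5, 2))  # sample mean one standard error above 5
```

A result close to 0 means the sample mean is well above the hypothesized value; a result of 0.5 means the two coincide.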