Descriptive Statistics
Statistics is a game of numbers. When talking about numerical variables, we have to look beyond the raw values and apply methods to make meaning out of those numbers. This is where descriptive statistical methods enter the picture and save the day, just like a knight in shining armor. Four types of measures are generally used:
- Measure of Frequency
- Measure of Position
- Measure of Central Tendency
- Measure of Dispersion
Measure of Frequency: Let’s take an example. In some utopian world, Swiggy’s Sriharsha Majety comes to you and asks you to make sense of the entries given below, which represent a random sample of the number of orders placed on Swiggy’s platform in a day. (Caution: The following data is imaginary and does not represent actual company data. Also, Majety will probably be busy making bucks rather than handing you a random task of performing statistical calculations.)
33,26,43,32,44,44,50,42,44,36,61,50,51,50,76,53,44,77,57,43,29,34,77,50,74,56,67,57,66,80,68,42,48,60,35,45,32,25,74,43,39,55,65,35,61,37,54,41,33,27
The ordered array of the same data is:
25,26,27,29,32,32,33,33,34,35,35,36,37,39,41,42,42,43,43,43,44,44,44,44,45,48,50,50,50,50,51,53,54,55,56,57,57,60,61,61,65,66,67,68,74,74,76,77,77,80
A frequency distribution divides the data into classes. The class interval width can be decided using the following formula:
Class width = (highest value - lowest value) / number of classes
The following table contains the frequency distribution of the above data:
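As a rough sketch, the frequency table for this sample can be rebuilt in Python. The choice of seven classes of width 10 starting at 20 (20-29, 30-39, ...) is an assumption, picked to be consistent with the 40–49 class used later in this article. For seven classes the formula above gives a raw width of about 7.86, which is rounded up here to a convenient 10.

```python
from collections import Counter

orders = [33, 26, 43, 32, 44, 44, 50, 42, 44, 36, 61, 50, 51, 50, 76, 53, 44,
          77, 57, 43, 29, 34, 77, 50, 74, 56, 67, 57, 66, 80, 68, 42, 48, 60,
          35, 45, 32, 25, 74, 43, 39, 55, 65, 35, 61, 37, 54, 41, 33, 27]

# Class width formula from the text, for an assumed 7 classes.
NUM_CLASSES = 7
raw_width = (max(orders) - min(orders)) / NUM_CLASSES  # (80 - 25) / 7

# Assumption: round the raw width up to 10 and start the first class at 20.
CLASS_WIDTH = 10
FIRST_LOWER = 20

def class_lower(value):
    """Return the lower limit of the class that contains `value`."""
    return FIRST_LOWER + ((value - FIRST_LOWER) // CLASS_WIDTH) * CLASS_WIDTH

counts = Counter(class_lower(v) for v in orders)

# Build rows of (class label, frequency, cumulative frequency).
freq_table = []
cumulative = 0
for lower in range(FIRST_LOWER, max(orders) + 1, CLASS_WIDTH):
    cumulative += counts[lower]
    label = f"{lower}-{lower + CLASS_WIDTH - 1}"
    freq_table.append((label, counts[lower], cumulative))

print(f"{'Class':>6} {'Freq':>5} {'Cum.':>5}")
for label, freq, cum in freq_table:
    print(f"{label:>6} {freq:>5} {cum:>5}")
```

The cumulative column is what the grouped median calculation relies on: it shows the running total of observations up to and including each class.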
Measure of Position: A measure of position describes where an observation stands in relation to the other observations in our sample. The usual measures of position are quantiles, which are not influenced by extreme values in our data.
“Quantiles are cut points dividing the range of the data into contiguous intervals with equal probabilities. Certain quantiles are particularly important. The median of a data set divides the data into two equal parts: the bottom 50% and the top 50%. Quartiles divide the data into four equal parts (25%, 50%, 75%). Percentiles divide into 100 equal parts. Deciles divide a dataset into 10 equal parts. Quintiles divide a dataset into 5 equal parts. There is always one less quantile than the number of groups created.”
For the above dataset (n = 50), the first quartile Q1 is the value at rank (n+1)/4 = 12.75. We round 12.75 to 13 and use the 13th ranked value, which is 37. The second quartile Q2 is the value at rank (n+1)/2 = 25.5, which is halfway between the 25th and 26th ranked values. Q2 is therefore the average of the values at these two ranks, (45+48)/2 = 46.5. The third quartile Q3 is the value at rank 3(n+1)/4 = 38.25. We round 38.25 to 38, so the third quartile is the 38th ranked value, which is 60.
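The rank-and-round rule used above can be sketched in a few lines of Python. The helper name quartile_by_rank is hypothetical, made up for this illustration. Library functions such as statistics.quantiles interpolate between neighbouring values instead of rounding the rank, so they can return slightly different quartiles for the same data.

```python
orders = [33, 26, 43, 32, 44, 44, 50, 42, 44, 36, 61, 50, 51, 50, 76, 53, 44,
          77, 57, 43, 29, 34, 77, 50, 74, 56, 67, 57, 66, 80, 68, 42, 48, 60,
          35, 45, 32, 25, 74, 43, 39, 55, 65, 35, 61, 37, 54, 41, 33, 27]

def quartile_by_rank(sorted_values, p):
    """Quantile via the p*(n+1) rank rule described in the text.

    Hypothetical helper: a rank ending in exactly .5 averages the two
    neighbouring values; any other fractional rank is rounded to the
    nearest whole rank.
    """
    n = len(sorted_values)
    rank = p * (n + 1)          # 1-based rank, possibly fractional
    lower = int(rank)           # whole part of the rank
    frac = rank - lower
    if frac == 0.5:
        # Halfway between two ranks: average the two values.
        return (sorted_values[lower - 1] + sorted_values[lower]) / 2
    nearest = lower + 1 if frac > 0.5 else lower
    return sorted_values[nearest - 1]   # convert 1-based rank to index

ordered = sorted(orders)
q1 = quartile_by_rank(ordered, 0.25)   # rank 12.75 -> 13th value
q2 = quartile_by_rank(ordered, 0.50)   # rank 25.5  -> average of 25th and 26th
q3 = quartile_by_rank(ordered, 0.75)   # rank 38.25 -> 38th value

print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
```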
Measure of Central Tendency: These measures describe a central value around which most observations cluster. They are the mean, median and mode. The mean is the average value, the median is the middle value of the ordered data, and the mode is the most frequent value.
For ungrouped data,
Mean = (25+26+27+…+77+80)/50 = 2465/50 = 49.3. The median is equal to the second quartile, which is 46.5 as calculated above. The mode is the most frequent value. Here there are two modes, 44 and 50, as both occur four times.
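These three values can be checked with Python's standard statistics module. This is only a quick verification sketch; multimode is used instead of mode because the sample has two equally frequent values.

```python
import statistics

orders = [33, 26, 43, 32, 44, 44, 50, 42, 44, 36, 61, 50, 51, 50, 76, 53, 44,
          77, 57, 43, 29, 34, 77, 50, 74, 56, 67, 57, 66, 80, 68, 42, 48, 60,
          35, 45, 32, 25, 74, 43, 39, 55, 65, 35, 61, 37, 54, 41, 33, 27]

mean_value = statistics.mean(orders)        # sum of values / number of values
median_value = statistics.median(orders)    # average of the two middle values (n is even)
modes = statistics.multimode(orders)        # every value tied for the highest count

print("Mean   =", mean_value)
print("Median =", median_value)
print("Modes  =", sorted(modes))
```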
For grouped data,
The mean of 49.7 calculated for grouped data is only an approximation, because we assume that all values in a class are equal to the midpoint of that class.
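The median class is 40–49: n/2 = 50/2 = 25, and this is the first class whose cumulative frequency (26) reaches 25. The cumulative frequency before this class is 14 and the class frequency is 12. Substituting into the formula Median = L + (n/2 - cf) * h / f, with the class width h taken as 49 - 40 = 9, we get
Median = 40 + (25 - 14) * 9 / 12 = 48.25
The largest frequency is 12, which belongs to class 40–49. Hence the mode lies in class 40–49. The frequencies of the preceding and following classes are 10 and 11. Substituting these values into the formula Mode = L + [(f1 - f0) / (2*f1 - f0 - f2)] * h,
Mode = 40 + [(12 - 10) / (2*12 - 10 - 11)] * 9 = 46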
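The grouped calculations can be sketched in Python as follows. The class limits and frequencies are an assumption: they are rebuilt from the sample with classes 20-29, 30-39, and so on, which matches the 40–49 class and the frequencies 10, 12 and 11 quoted in the text. The class width is taken as upper limit minus lower limit (9), following the substitution above. Using continuous class boundaries (39.5 to 49.5, width 10) is a common alternative and gives slightly different numbers.

```python
# (lower limit, upper limit, frequency) for each assumed class
classes = [(20, 29, 4), (30, 39, 10), (40, 49, 12), (50, 59, 11),
           (60, 69, 7), (70, 79, 5), (80, 89, 1)]

n = sum(f for _, _, f in classes)

# Grouped mean: every value in a class is treated as the class midpoint.
grouped_mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

# Grouped median: Median = L + (n/2 - cf) * h / f
half = n / 2
cum_before = 0                      # cumulative frequency before the median class
for lo, hi, f in classes:
    if cum_before + f >= half:      # first class whose cumulative frequency reaches n/2
        grouped_median = lo + (half - cum_before) * (hi - lo) / f
        break
    cum_before += f

# Grouped mode: Mode = L + (f1 - f0) * h / (2*f1 - f0 - f2)
idx = max(range(len(classes)), key=lambda i: classes[i][2])   # modal class index
lo, hi, f1 = classes[idx]
f0 = classes[idx - 1][2] if idx > 0 else 0                    # frequency of previous class
f2 = classes[idx + 1][2] if idx < len(classes) - 1 else 0     # frequency of next class
grouped_mode = lo + (f1 - f0) * (hi - lo) / (2 * f1 - f0 - f2)

print("Grouped mean   =", round(grouped_mean, 2))
print("Grouped median =", round(grouped_median, 2))
print("Grouped mode   =", round(grouped_mode, 2))
```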
Measure of Dispersion: Measures of dispersion describe the spread or variation in the data. They are either absolute or relative. The absolute measures of dispersion are the range, variance, standard deviation, quartile deviation and mean deviation. The relative measures of dispersion are their coefficients. The most commonly used measures of dispersion are given below:
Substituting the values of the above data into the formulas, we get
Range = 80 - 25 = 55. Variance = 222.4592 and SD = 14.92, using the sample formulas for ungrouped data.
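As a quick check, the same figures can be reproduced with Python's statistics module. The sketch assumes the usual sample formulas, in which the sum of squared deviations from the mean is divided by n - 1 rather than n.

```python
import statistics

orders = [33, 26, 43, 32, 44, 44, 50, 42, 44, 36, 61, 50, 51, 50, 76, 53, 44,
          77, 57, 43, 29, 34, 77, 50, 74, 56, 67, 57, 66, 80, 68, 42, 48, 60,
          35, 45, 32, 25, 74, 43, 39, 55, 65, 35, 61, 37, 54, 41, 33, 27]

# Range: highest value minus lowest value.
data_range = max(orders) - min(orders)

# Sample variance: sum((x - mean)^2) / (n - 1)
sample_variance = statistics.variance(orders)

# Sample standard deviation: square root of the sample variance.
sample_sd = statistics.stdev(orders)

print("Range    =", data_range)
print("Variance =", round(sample_variance, 4))
print("SD       =", round(sample_sd, 2))
```

For the population versions (dividing by n), statistics.pvariance and statistics.pstdev can be used instead.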
I hope you enjoyed reading it.