Business Analytics Part 4: Analysis of Variance (ANOVA)


As mentioned in the first tutorial, these chapters are designed to help you prepare for the SAS Business Analyst certification and other data science certifications. We will therefore focus first on the topics most relevant to those exams and then explore the vast ocean of statistical concepts.

If you have gone through the last three chapters, you have already got your feet wet with the basic concepts of data science and have a basic understanding of how it works.

In this chapter, we will focus primarily on ANOVA, which is listed as one of the topics on the SAS BA certification page and is widely used in data science.

Analysis of variance (ANOVA)

Analysis of variance, also known as ANOVA, is a statistical model, or rather a collection of models, used to test whether several groups differ significantly, usually by comparing their means. It tells you whether the differences between the means of multiple groups are statistically significant.

You can use a t-test if you are comparing two groups; the t-test is covered in our previous chapters.

ANOVA is primarily used when you need to measure differences between more than two groups. It was developed by R. A. Fisher.

In the last chapter, I used two groups preparing for SAS certification as an example to explain the t-test. If you need to compare the performance of more than two groups, you will use ANOVA.

Types of ANOVA

  1. One-way ANOVA
  2. Two-way ANOVA

Before we move further into ANOVA, let's look at two important concepts that are integral to it.

F test

It is a test whose test statistic follows an F distribution when the null hypothesis is true. In ANOVA it is used to test whether the means of several populations, assumed to have similar standard deviations, are equal.

The critical value is read from the F table and is then used to decide whether or not to reject the null hypothesis.

F values are non-negative, the distribution is asymmetric (skewed to the right), and it has two degrees-of-freedom parameters, one for the numerator and one for the denominator.

The second concept is the test statistic.
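These properties can be seen numerically. The following is a minimal sketch, using only the Python standard library, that evaluates the F density for a given pair of degrees of freedom. The function name `f_pdf` and the choice of (2, 12) degrees of freedom are illustrative assumptions, not part of the original tutorial.

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F distribution with d1 (numerator) and d2 (denominator) df."""
    if x <= 0:
        # F values cannot be negative, so the density is zero there.
        # (Returning 0 at exactly x == 0 is a simplification for this sketch.)
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)


# The mean of an F distribution is d2 / (d2 - 2), which is 1.2 for (2, 12) df.
# Points equally far on either side of the mean get different densities,
# which shows that the curve is not symmetric.
for x in (-1.0, 0.2, 1.2, 2.2):
    print(x, round(f_pdf(x, 2, 12), 4))
```

Changing `d1` and `d2` changes the shape of the curve, which is why an F table needs both a row and a column to look up a critical value.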

Test Statistic

A test statistic is a standardized value calculated from sample data while performing a hypothesis test. In an F test, the test statistic is the F ratio, which follows an F distribution when the null hypothesis is true.

Let's go through the following example to understand the F test:

Sample 1   Value - mean   Squared   Sample 2   Value - mean   Squared   Sample 3   Value - mean   Squared
12         -4.8           23.04     18         -5.6           31.36     25         -1.6           2.56
14         -2.8           7.84      19         -4.6           21.16     26         -0.6           0.36
18         1.2            1.44      21         -2.6           6.76      23         -3.6           12.96
22         5.2            27.04     29         5.4            29.16     28         1.4            1.96
18         1.2            1.44      31         7.4            54.76     31         4.4            19.36
Mean 16.8  Sum            60.8      Mean 23.6  Sum            143.2     Mean 26.6  Sum            37.2

Sum of Squares within Groups = Sum of squares Sample 1 + Sum of squares Sample 2 + Sum of squares Sample 3

Sum of Squares within Groups: 60.8 + 143.2 + 37.2 = 241.2

Now calculate the total sum of squares using all 15 values and the overall mean (22.33):

Value   Value - overall mean    Squared
12      12 - 22.33 = -10.33     106.7089
14      14 - 22.33 = -8.33      69.3889
18      18 - 22.33 = -4.33      18.7489
22      22 - 22.33 = -0.33      0.1089
18      18 - 22.33 = -4.33      18.7489
18      18 - 22.33 = -4.33      18.7489
19      19 - 22.33 = -3.33      11.0889
21      21 - 22.33 = -1.33      1.7689
29      29 - 22.33 = 6.67       44.4889
31      31 - 22.33 = 8.67       75.1689
25      25 - 22.33 = 2.67       7.1289
26      26 - 22.33 = 3.67       13.4689
23      23 - 22.33 = 0.67       0.4489
28      28 - 22.33 = 5.67       32.1489
31      31 - 22.33 = 8.67       75.1689
Mean    22.33                   Total 493.33

The total sum of squares is the sum of the between-group and within-group parts:

493.33 = Sum of Squares between Groups + 241.2

To find the Sum of Squares between Groups directly, subtract the mean of all the groups (22.33 in this case) from the mean of each group (Sample 1, Sample 2, Sample 3), square each difference, add the squares, and multiply by the number of values in each sample:

Sample 1   16.8 - 22.33 = -5.53   30.5809
Sample 2   23.6 - 22.33 = 1.27    1.6129
Sample 3   26.6 - 22.33 = 4.27    18.2329
Total                             50.4267

Total x 5 (number of values in each sample)
50.4267 x 5 = 252.13

You can also use basic algebra to get the Sum of Squares between Groups, since 493.33 - 241.2 = 252.13, but I did it manually to give you an idea of how it is done. The two routes agree, which is a useful check on the arithmetic.

Now calculate Sum of Squares between Groups / numerator degrees of freedom, where numerator degrees of freedom = number of sample groups - 1 = 3 - 1 = 2:

252.13 / 2 = 126.07

Now calculate Sum of Squares within Groups / denominator degrees of freedom, where denominator degrees of freedom = total number of values across all samples - number of samples = 15 - 3 = 12:

241.2 / (15 - 3) = 20.1

F is the ratio of these two values, the between-group result divided by the within-group result:

126.07 / 20.1 = 6.27
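The hand calculation above can be cross-checked with a few lines of Python. This is a minimal sketch using only the standard library; `one_way_anova` is an illustrative helper name, not a library function, and it works directly from the three samples listed in the example.

```python
from statistics import mean

samples = [
    [12, 14, 18, 22, 18],  # Sample 1
    [18, 19, 21, 29, 31],  # Sample 2
    [25, 26, 23, 28, 31],  # Sample 3
]


def one_way_anova(groups):
    """Return (ss_between, ss_within, ss_total, df_num, df_den, f_ratio)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Within: squared deviation of each value from its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Between: squared deviation of each group mean from the overall mean,
    # weighted by the number of values in that group.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Total: squared deviation of each value from the overall mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    df_num = len(groups) - 1
    df_den = len(all_values) - len(groups)
    f_ratio = (ss_between / df_num) / (ss_within / df_den)
    return ss_between, ss_within, ss_total, df_num, df_den, f_ratio


ss_b, ss_w, ss_t, df_num, df_den, f_ratio = one_way_anova(samples)
print(f"SS between = {ss_b:.2f}, SS within = {ss_w:.2f}, SS total = {ss_t:.2f}")
print(f"F({df_num},{df_den}) = {f_ratio:.2f}")
```

Because the script keeps full precision, the identity SS total = SS between + SS within holds up to floating-point error, which makes it a quick way to catch a slip in a manual table.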

Now we need to determine the critical value using the table below, where the numerator degrees of freedom run across the top and the denominator degrees of freedom run down the side. We need the value where column 2 and row 12 intersect.

 

F table, 5% significance level (columns for numerator degrees of freedom 1 to 5)

Denominator df   Numerator df
                 1      2      3      4      5
1                161    200    216    225    230
2                18.5   19.0   19.2   19.3   19.3
3                10.1   9.55   9.28   9.12   9.01
4                7.71   6.94   6.59   6.39   6.26
5                6.61   5.79   5.41   5.19   5.05
6                5.99   5.14   4.76   4.53   4.39
7                5.59   4.74   4.35   4.12   3.97
8                5.32   4.46   4.07   3.84   3.69
9                5.12   4.26   3.86   3.63   3.48
10               4.96   4.10   3.71   3.48   3.33
11               4.84   3.98   3.59   3.36   3.20
12               4.75   3.89   3.49   3.26   3.11
13               4.67   3.81   3.41   3.18   3.03
14               4.60   3.74   3.34   3.11   2.96
15               4.54   3.68   3.29   3.06   2.90

 

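Now we have:
F ratio with (2, 12) degrees of freedom, as calculated above
Critical value = 3.89

Because the F ratio is greater than the critical value at the 5% significance level, the null hypothesis that all the group means are equal is rejected.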
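For the special case of 2 numerator degrees of freedom, the upper tail of the F distribution has a simple closed form, so the table lookup can be reproduced without a statistics library. The sketch below relies on that special case only. The function names are illustrative, and it is not a general replacement for an F table.

```python
def f_upper_tail_df1_2(f_value: float, df_den: int) -> float:
    """P(F > f_value) for an F distribution with 2 numerator df."""
    return (1 + 2 * f_value / df_den) ** (-df_den / 2)


def f_critical_df1_2(alpha: float, df_den: int) -> float:
    """Value whose upper-tail probability is alpha, for 2 numerator df only."""
    return (df_den / 2) * (alpha ** (-2 / df_den) - 1)


critical = f_critical_df1_2(0.05, 12)
print(round(critical, 2))  # 3.89

# Sanity check: the tail probability at the critical value is alpha again.
print(round(f_upper_tail_df1_2(critical, 12), 4))  # 0.05
```

To apply the decision rule, compare the F ratio from the worked example with `critical`, or pass it to `f_upper_tail_df1_2` and check whether the result is below 0.05.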

[Figure: F test formula]

[Figure: F distribution chart]

Coming back to ANOVA, it is one of the most widely used techniques for examining data from psychological experiments.

The same calculation is available in Excel format. Please note that some of the cells don't have formulas, so you will need to calculate those manually, but most of the functions are already there.

 

Download F test calculations

Assumptions of ANOVA

The assumptions of ANOVA are very important in the data science field, and they are also required for the SAS BA exam. So now we will look at them.

  1. The dependent variable is normally distributed within each group, and the groups have the same variance.
  2. Errors are normally distributed and independent.

We will look further into the assumptions and one-way ANOVA in the next chapter.
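As a first, informal look at the equal-variance assumption, you can compare the sample variances of the groups. The sketch below does this for the three samples from the F test example, using only the Python standard library. It is not a formal test of the assumption, and the variable names are illustrative.

```python
from statistics import variance

samples = {
    "Sample 1": [12, 14, 18, 22, 18],
    "Sample 2": [18, 19, 21, 29, 31],
    "Sample 3": [25, 26, 23, 28, 31],
}

# Sample variance (n - 1 in the denominator) of each group.
variances = {name: variance(values) for name, values in samples.items()}

# Ratio of the largest to the smallest variance; a ratio far above 1
# suggests the equal-variance assumption deserves a closer look.
ratio = max(variances.values()) / min(variances.values())

for name, var in variances.items():
    print(f"{name}: variance = {var:.2f}")
print(f"largest / smallest variance = {ratio:.2f}")
```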

About akhilendra

Hi, I’m Akhilendra and I write about Product management, Business Analysis, Data Science, IT & Web. Join me on Twitter, Facebook & Linkedin
