Naive Bayes
Naive Bayes is a probabilistic classifier that returns the probability of a test point belonging to a class:
P(Ci | X) = P(X | Ci) P(Ci) / P(X)
where Ci denotes the i-th class and X denotes the feature vector of the data point.
Example:
C1, C2, i.e. C = edible or poisonous.
The feature 'cap-shape' is represented by X, and X can take the values CONVEX, FLAT, BELL, etc.
The probability of a CONVEX mushroom being edible, P(C = edible | X = CONVEX), is given by:
P(X = CONVEX | C = edible) × P(C = edible) / P(X = CONVEX)
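As a quick illustration, here is a minimal Python sketch of this rule (the function and argument names are hypothetical, chosen for this note):

    def posterior(likelihood, prior, evidence):
        # Bayes' rule: P(C | X) = P(X | C) * P(C) / P(X)
        return likelihood * prior / evidence

    # P(edible | CONVEX) from the comprehension below: (4/8) * (8/12) / (8/12)
    print(posterior(4/8, 8/12, 8/12))  # 0.5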
Comprehension - Naive Bayes with One Feature
Table 1: Mushroom Dataset
S.No | Type of mushroom | Cap shape
1 | Poisonous | Convex
2 | Edible | Convex
3 | Poisonous | Convex
4 | Edible | Convex
5 | Edible | Convex
6 | Poisonous | Convex
7 | Edible | Bell
8 | Edible | Bell
9 | Edible | Convex
10 | Poisonous | Convex
11 | Edible | Flat
12 | Edible | Bell
1. What is the probability that a randomly picked mushroom is CONVEX, i.e. what is the value of P(X = CONVEX)? = 8/12
2. What is the probability of the mushroom being CONVEX given it is edible?
P(X = CONVEX | C = edible) = 4/8
P(X = CONVEX | C = edible) means out of all the edible mushrooms, how many are CONVEX. Out of a total of 8 edible mushrooms, 4 are convex. Thus, it is 4/8.
3. What is the probability that the CONVEX mushroom is edible, P(C = edible | X = CONVEX)?
P(X = CONVEX | C = edible) × P(C = edible) / P(X = CONVEX) = (4/8) × (8/12) / (8/12) = 4/8.
4. What is the probability of the CONVEX mushroom being poisonous, P(C = poisonous | X = CONVEX)? = 4/8
5. What are the chances of a random mushroom being poisonous, i.e. P(C = poisonous)? 4/12 = 1/3
6. What are the chances of a mushroom being CONVEX given it is poisonous, i.e. P(X = CONVEX | C = poisonous)? 4/4 = 1
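The six answers above can be reproduced programmatically. Here is a minimal Python sketch, assuming Table 1 is encoded as a list of (type, cap shape) pairs (variable names are illustrative):

    data = [
        ("poisonous", "convex"), ("edible", "convex"), ("poisonous", "convex"),
        ("edible", "convex"), ("edible", "convex"), ("poisonous", "convex"),
        ("edible", "bell"), ("edible", "bell"), ("edible", "convex"),
        ("poisonous", "convex"), ("edible", "flat"), ("edible", "bell"),
    ]

    n = len(data)                                                 # 12 mushrooms
    edible = [shape for t, shape in data if t == "edible"]
    p_convex = sum(s == "convex" for _, s in data) / n            # 8/12
    p_edible = len(edible) / n                                    # 8/12
    p_convex_given_edible = edible.count("convex") / len(edible)  # 4/8

    # Bayes' rule: P(edible | convex) = P(convex | edible) * P(edible) / P(convex)
    print(p_convex_given_edible * p_edible / p_convex)            # 0.5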
Comprehension - Naive Bayes with Multiple Features
Table 2: Mushroom Dataset
S.No | Type of mushroom | Cap shape | Cap surface
1 | Poisonous | Convex | Scaly
2 | Edible | Convex | Scaly
3 | Poisonous | Convex | Smooth
4 | Edible | Convex | Smooth
5 | Edible | Convex | Fibrous
6 | Poisonous | Convex | Scaly
7 | Edible | Bell | Scaly
8 | Edible | Bell | Scaly
9 | Edible | Convex | Scaly
10 | Poisonous | Convex | Scaly
11 | Edible | Flat | Scaly
12 | Edible | Bell | Smooth
1. What is the numerator of P(C = edible | X = (CONVEX, SCALY))? = P(edible) × P(CONVEX | edible) × P(SCALY | edible)
2. What is the numerator of P(C = edible | X = (CONVEX, SMOOTH))? = P(edible) × P(CONVEX | edible) × P(SMOOTH | edible)
3. What is P(CONVEX | edible)? = 4/8
4. What is P(SMOOTH | edible)? = 2/8
5. What is P(CONVEX | poisonous)? = 1
6. What is P(SMOOTH | poisonous)? = 1/4
Prior, Posterior and Likelihood:
You have been using three terms: P(Class = edible / poisonous), P(X | Class) and P(Class | X). Bayesian classification is based on the principle that 'you combine your prior knowledge or beliefs about a population with the case-specific information to get the actual (posterior) probability'.
Prior probability: P(Class = edible) or P(Class = poisonous) is called the prior probability.
Likelihood: P(X | Class) is called the likelihood.
Posterior: P(Class = edible | X) is called the posterior.
From the table above:
1. The values of P(X | Class) × P(Class), where X = (CONVEX, SCALY), for the two classes (edible and poisonous) are respectively:
Edible: P(CONVEX | edible) × P(SCALY | edible) × P(edible) = (4/8)(5/8)(8/12) ≈ 20.8%
Poisonous: P(CONVEX | poisonous) × P(SCALY | poisonous) × P(poisonous) = (4/4)(3/4)(4/12) = 25%
2. For the (CONVEX, SCALY) mushroom:
The prior favors edible, while the posterior favors poisonous: the priors are 8/12 and 4/12 for edible and poisonous respectively, whereas the (unnormalized) posteriors are 20.8% and 25%.
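A short Python sketch that reproduces these two numerators from Table 2 (the numerator helper is hypothetical, written for this illustration):

    data = [
        ("poisonous", "convex", "scaly"),  ("edible", "convex", "scaly"),
        ("poisonous", "convex", "smooth"), ("edible", "convex", "smooth"),
        ("edible", "convex", "fibrous"),   ("poisonous", "convex", "scaly"),
        ("edible", "bell", "scaly"),       ("edible", "bell", "scaly"),
        ("edible", "convex", "scaly"),     ("poisonous", "convex", "scaly"),
        ("edible", "flat", "scaly"),       ("edible", "bell", "smooth"),
    ]

    def numerator(cls, shape, surface):
        rows = [r for r in data if r[0] == cls]
        prior = len(rows) / len(data)
        p_shape = sum(r[1] == shape for r in rows) / len(rows)
        p_surface = sum(r[2] == surface for r in rows) / len(rows)
        # Naive assumption: features are conditionally independent given the class
        return prior * p_shape * p_surface

    print(numerator("edible", "convex", "scaly"))     # (8/12)(4/8)(5/8) ~ 0.208
    print(numerator("poisonous", "convex", "scaly"))  # (4/12)(4/4)(3/4) = 0.25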
Comprehension - Naive Bayes for Spam Classification
Table 3: Email Dataset
S.No | Class | Word 1 | Word 2 | Word 3 | Word 4
1 | Spam | free | buy | limited | hurry
2 | Ham | reply | data | report | presentation
3 | Ham | report | presentation | file | end of day
4 | Spam | limited | file | buy | click
5 | Ham | meeting | timelines | limited | documents
6 | Spam | hurry | data | buy | stock
7 | Spam | limited | sex | click | viagra
8 | Ham | presentation | end of day | data | report
9 | Ham | reply | data | presentation | click
10 | Spam | free | reply | weekend | click
11 | Spam | limited | click | free | hurry
12 | Ham | meeting | end of day | weekend | data
13 | Spam | hurry | weekend | stock | offer
14 | Ham | report | presentation | file | end of day
15 | Ham | free | timelines | reply | offer
Spam Keywords: buy, free, hurry, weekend, stock, offer, viagra, sex, limited, click
Ham Keywords: reply, data, report, presentation, file, end of day, meeting, timelines, delay, documents
1. What is the prior probability of a mail being spam, P(class = spam)? = 7/15
2. What does Naive Bayes assume while classifying spam or ham mails? = That the occurrences of keywords like 'hurry', 'free', 'offer', etc. are conditionally independent of each other given the class
3. Consider an email with the feature vector X = (free, data, weekend, click). What is the likelihood, P(X | spam)?
P(X | spam) = 2/7 × 1/7 × 1/7 × 2/7 = 4/2401
4. Consider an email with the feature vector X = (free, data, weekend, click). What is the likelihood, P(X | ham)?
P(X | ham) = 1/8 × 2/8 × 1/8 × 1/8 = 2/4096
5. What is the value of P(X | Class) × P(Class) for class = spam, for X = (free, data, weekend, click)?
P(X | spam) = 4/2401 and P(class = spam) = 7/15, so the value is (7/15) × (4/2401).
6. What is the numerator of the posterior for class = ham (i.e. without division by the denominator) for the feature vector X = (free, data, weekend, click)?
Numerator of P(class = ham | X) = P(class = ham) × P(X | class = ham) = (8/15)(2/4096).
7. Which class should the point X = (free, data, weekend, click) be classified into? = SPAM
The numerators of the posteriors P(Class | X) for spam and ham are (7/15)(4/2401) and (8/15)(2/4096) respectively, and spam's is higher.
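A small Python sketch of this decision, plugging in the conditional probabilities quoted in the answers above (taken as given here, not recomputed from the table):

    # Per-word likelihoods and priors as quoted above
    p_word = {
        "spam": {"free": 2/7, "data": 1/7, "weekend": 1/7, "click": 2/7},
        "ham":  {"free": 1/8, "data": 2/8, "weekend": 1/8, "click": 1/8},
    }
    prior = {"spam": 7/15, "ham": 8/15}

    x = ["free", "data", "weekend", "click"]

    # Numerator of the posterior for each class: prior times the product of
    # per-word likelihoods (the naive conditional-independence assumption)
    scores = {}
    for c in ("spam", "ham"):
        score = prior[c]
        for w in x:
            score *= p_word[c][w]
        scores[c] = score

    print(max(scores, key=scores.get))  # spam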
Confusion Matrix (rows = predicted class, columns = actual class):
Predicted \ Actual | Spam | Ham
Spam | 440 | 20
Ham | 40 | 500
1. What is the accuracy of the model? (440 + 500) / (440 + 500 + 40 + 20) = 940/1000
2. What is the sensitivity of the model? 440 / (440 + 40) = 440/480
3. What is the specificity of the model? 500 / (500 + 20) = 500/520
4. Given that you do not want to misclassify any genuine emails, which metric should be as high as possible?
Specificity: the fraction of correctly classified ham emails is measured by specificity (the true negative rate).
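These three metrics can be checked in a few lines of Python (tp, fp, fn, tn are illustrative names for the four cells above):

    # Confusion matrix from above: rows are predicted, columns are actual
    tp, fp = 440, 20    # predicted spam: actually spam / actually ham
    fn, tn = 40, 500    # predicted ham:  actually spam / actually ham

    accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 940/1000
    sensitivity = tp / (tp + fn)                    # 440/480, true positive rate
    specificity = tn / (tn + fp)                    # 500/520, true negative rate
    print(accuracy, sensitivity, specificity)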
Comprehension - Multinomial Naive Bayes
Table 4: Document Dataset
No | Document | Class
0 | Coffee Tea Soup Coffee Coffee | Hot
1 | Coffee is hot and so is Soup and Tea | Hot
2 | Espresso is a Hot Coffee and not a Tea | Hot
3 | Coffee is neither Tea nor Soup | Hot
4 | Sprite Pepsi Cold Coffee and cold Tea | Cold
1. What is the probability of the word "Coffee" appearing in a document classified as "Hot" if we are planning to do a Multinomial Naive Bayes classification? = 6/16
The word "Coffee" appears 6 times across the documents of class Hot (d0: 3, d1: 1, d2: 1, d3: 1), and there are 16 vocabulary words altogether in the Hot documents (d0: 5, d1: 4, d2: 4, d3: 3; stop words such as "is" and "and" are not counted). Hence the probability of the word "Coffee" in class Hot is 6/16.
2. What is binarization of a feature vector?
A. Converting all non-zero word counts of a feature vector to 1 and leaving zero counts as they are
3. What is the value of P("I love cold coffee" | Hot)? = 1/24 × 7/24
Feedback:
P("I love cold coffee" | Hot) = P(cold | hot) × P(coffee | hot) by the naive conditional-independence assumption; "I" and "love" are not in the vocabulary and are ignored. With Laplace (add-one) smoothing over the 8-word vocabulary, P(cold | hot) = (0 + 1) / (16 + 8) = 1/24 and P(coffee | hot) = (6 + 1) / (16 + 8) = 7/24, so the product is 1/24 × 7/24.
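A short Python sketch of these counts, assuming the 8-word vocabulary and the add-one smoothing described in the feedback above (the p_given_hot helper is hypothetical):

    from collections import Counter

    # The 8 distinct content words across both classes; stop words such as
    # "is", "and", "a" are ignored, as in the worked answers above.
    vocab = {"coffee", "tea", "soup", "hot", "espresso", "sprite", "pepsi", "cold"}

    hot_docs = [
        "Coffee Tea Soup Coffee Coffee",
        "Coffee is hot and so is Soup and Tea",
        "Espresso is a Hot Coffee and not a Tea",
        "Coffee is neither Tea nor Soup",
    ]

    counts = Counter(w for d in hot_docs for w in d.lower().split() if w in vocab)
    total = sum(counts.values())         # 16 vocabulary words in class Hot

    print(counts["coffee"] / total)      # unsmoothed: 6/16

    # Laplace (add-one) smoothing: (count + 1) / (total + |vocab|)
    def p_given_hot(word):
        return (counts[word] + 1) / (total + len(vocab))

    print(p_given_hot("cold") * p_given_hot("coffee"))  # (1/24)(7/24)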
4. Bag A contains 3 red and 4 green balls, and bag B contains 4 red and 6 green balls. One bag is selected at random and a ball is drawn from it.
If the ball drawn is green, find the probability that the bag chosen was A.
Let E1 be the event that bag A is chosen, E2 the event that bag B is chosen, and G the event that the drawn ball is green. Then P(E1) = P(E2) = 1/2.
By hypothesis, P(G | E1) = 4/7 and P(G | E2) = 6/10.
By Bayes' theorem, P(E1 | G) = P(G | E1) × P(E1) / P(G)
= (4/7)(1/2) / [(4/7)(1/2) + (6/10)(1/2)] = (4/14) / (4/14 + 6/20) = 20/41.
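The same calculation in a few lines of Python (variable names are illustrative):

    # Bag A: 3 red + 4 green; bag B: 4 red + 6 green; each bag chosen with prob 1/2
    p_bag = 1 / 2
    p_green_A, p_green_B = 4 / 7, 6 / 10

    # Bayes' theorem: P(A | green) = P(green | A) P(A) / P(green)
    p_A_given_green = (p_green_A * p_bag) / (p_green_A * p_bag + p_green_B * p_bag)
    print(p_A_given_green)  # 20/41 ~ 0.488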