Cross Entropy (Motivation)
•
Biased coin: p(h) = 0.75
•
1st output: #h = 75, #t = 25
•
2nd output: #h = 100, #t = 0
•
Which is more likely?
•
Which is more typical?
•
Concept of cross entropy (Kullback-Leibler divergence)
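
The two questions above can be checked numerically. The following is a minimal sketch, assuming 100 independent tosses and base-2 logarithms; the function names are illustrative and not part of the slide. It compares the probability of one specific sequence with the given counts, the probability of observing those counts in any order, and the Kullback-Leibler divergence of each output's empirical distribution from the coin's true distribution.

```python
import math

P_HEADS = 0.75  # coin bias from the slide


def sequence_log2_prob(heads, tails, p=P_HEADS):
    """log2 probability of ONE specific sequence with these counts."""
    log_prob = 0.0
    if heads:
        log_prob += heads * math.log2(p)
    if tails:
        log_prob += tails * math.log2(1.0 - p)
    return log_prob


def count_log2_prob(heads, tails, p=P_HEADS):
    """log2 probability of seeing these counts in ANY order (binomial)."""
    orderings = math.comb(heads + tails, heads)
    return math.log2(orderings) + sequence_log2_prob(heads, tails, p)


def cross_entropy_bits(q, p):
    """Cross entropy H(q, p) in bits; terms with q_i = 0 contribute nothing."""
    return -sum(qi * math.log2(pi) for qi, pi in zip(q, p) if qi > 0)


def kl_divergence_bits(q, p):
    """Kullback-Leibler divergence D(q || p) in bits."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


true_dist = (P_HEADS, 1.0 - P_HEADS)
for heads, tails in [(75, 25), (100, 0)]:
    n = heads + tails
    empirical = (heads / n, tails / n)
    print(
        f"#h={heads:3d} #t={tails:3d}  "
        f"log2 P(sequence)={sequence_log2_prob(heads, tails):8.2f}  "
        f"log2 P(counts)={count_log2_prob(heads, tails):8.2f}  "
        f"D(q||p)={kl_divergence_bits(empirical, true_dist):.4f} bits"
    )
```

Under these assumptions, the all-heads output is the more likely single sequence, while the 75/25 output is the more typical one: its empirical distribution matches the coin exactly, so its divergence from the true distribution is zero. The log2 probability of a specific sequence equals minus the number of tosses times the cross entropy between the empirical and true distributions, which is what links the two questions to cross entropy.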