Cross Entropy (Motivation)
•Biased coin: p(h) = 0.75
•1st output (100 tosses): #h = 75, #t = 25
•2nd output (100 tosses): #h = 100, #t = 0
•Which is more likely?
•Which is more typical?
•Concept of cross entropy (Kullback-Leibler divergence)
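
The two questions can be checked numerically. The sketch below is one possible reading, not part of the original slide: it assumes each "output" is one specific sequence of 100 independent tosses, and the helper names (sequence_log2_prob, cross_entropy_bits, entropy_bits) are made up for illustration. It compares the log probability of each sequence and the cross entropy, in bits per toss, of each output's empirical distribution against the coin.

```python
import math

P_HEADS = 0.75  # coin bias p(h) from the slide


def sequence_log2_prob(heads, tails, p=P_HEADS):
    """log2 probability of one specific sequence with the given counts,
    assuming independent tosses (an assumption of this sketch)."""
    total = 0.0
    if heads:
        total += heads * math.log2(p)
    if tails:
        total += tails * math.log2(1 - p)
    return total


def cross_entropy_bits(heads, tails, p=P_HEADS):
    """Cross entropy (bits per toss) of the empirical distribution
    (heads/n, tails/n) measured against the coin (p, 1 - p)."""
    n = heads + tails
    return -sequence_log2_prob(heads, tails, p) / n


def entropy_bits(p=P_HEADS):
    """Entropy of the coin itself, in bits per toss."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


for name, (h, t) in {"1st": (75, 25), "2nd": (100, 0)}.items():
    print(name,
          "log2 P(sequence) =", round(sequence_log2_prob(h, t), 2),
          "| cross entropy =", round(cross_entropy_bits(h, t), 4))
print("coin entropy =", round(entropy_bits(), 4))
```

Under these assumptions the all-heads sequence is individually more likely (its log probability is larger), while the 75/25 output is the more typical one: its cross entropy per toss equals the entropy of the coin, whereas the all-heads output falls well below it.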