""" | |
Specifically, the Kullback–Leibler divergence from Q to P, denoted DKL(P‖Q), is | |
a measure of the information gained when one revises one's beliefs from the | |
prior probability distribution Q to the posterior probability distribution P. In | |
other words, it is the amount of information lost when Q is used to approximate | |
P. | |
""" | |
import numpy as np | |
from scipy.stats import entropy | |
def kl(p, q): | |
"""Kullback-Leibler divergence D(P || Q) for discrete distributions | |
Parameters | |
---------- | |
p, q : array-like, dtype=float, shape=n | |
Discrete probability distributions. | |
""" | |
p = np.asarray(p, dtype=np.float) | |
q = np.asarray(q, dtype=np.float) | |
return np.sum(np.where(p != 0, p * np.log(p / q), 0)) | |
p = [0.1, 0.9]
q = [0.1, 0.9]
assert entropy(p, q) == kl(p, q)
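
A quick usage sketch beyond the identical-distribution assert above (my own example values, not part of the original gist), showing that kl is positive for differing distributions and asymmetric in its arguments:

p2 = [0.1, 0.9]
q2 = [0.5, 0.5]
print(kl(p2, q2))  # ~0.368 nats lost when q2 is used to approximate p2
print(kl(q2, p2))  # ~0.511, illustrating that D(P || Q) != D(Q || P)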
@rodrigobdz please note that those are equivalent except for the sign: the formulation of the KL divergence with np.log(q/p) carries a leading negation, which is not the case here, so the script is correct as written (cf. Wikipedia).
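
For what it is worth, a small sketch of the equivalence described above (my own example values, not from the thread): the np.log(q/p) formulation with a leading negation gives the same value as the np.log(p/q) formulation used in the script.

import numpy as np

p = np.asarray([0.1, 0.9])
q = np.asarray([0.3, 0.7])
lhs = np.sum(p * np.log(p / q))    # formulation used in the gist
rhs = -np.sum(p * np.log(q / p))   # flipped log with a leading minus sign
assert np.isclose(lhs, rhs)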
Hi,
You mentioned p and q as discrete probabilities that you created manually, but in real-life machine learning, what values can we use? For example, if I am using a RandomForest classifier, predict_proba() gives me probability values; can I use them? If yes, would they be P or Q? And if P, where can I get Q from, and vice versa?
Unless I am mistaken, the p != 0 should be q != 0, because you can multiply by 0 but you cannot divide by 0, and in your flipped KL implementation you are dividing by q, not p.
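
To illustrate the zero-handling question (my own example values, a sketch rather than a fix): np.where still evaluates p * np.log(p / q) for every entry, so a zero in q where p is nonzero triggers a divide-by-zero RuntimeWarning and the result is inf, which matches the convention that D(P || Q) is infinite when Q assigns zero probability to an outcome P considers possible.

p3 = [0.5, 0.5]
q3 = [1.0, 0.0]
print(kl(p3, q3))  # inf, with a RuntimeWarning about division by zero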
Note that scipy.stats.entropy(pk, qk=None, base=None, axis=0) does compute KL if qk is not None. It should be np.log(q/p) instead of np.log(p/q).
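
A quick check of the scipy behavior mentioned here (my own example values): when qk is given, scipy.stats.entropy(pk, qk) returns the KL divergence sum(pk * log(pk / qk)), the same quantity as the gist's kl function.

import numpy as np
from scipy.stats import entropy

pk = np.asarray([0.1, 0.9])
qk = np.asarray([0.3, 0.7])
print(entropy(pk, qk))               # ~0.116
print(np.sum(pk * np.log(pk / qk)))  # same value, ~0.116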