Information-theoretic metrics for vectors, and for formulas given a set of models.
surprisal(P,Q) = -log Pr(P|Q)
H(P) = - sum_{s in S} Pr(s|P) * log Pr(s|P)
where the set S consists of all points in the DFS space that are fully specified with respect to the atomic propositions; that is, each point s in S constitutes a unique logical combination of all atomic propositions.
DH(P,Q) = H(Q) - H(P)
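
The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation. It makes several assumptions: formulas are represented as Python predicates over models, Pr(P) is estimated as the fraction of models in which P holds, logarithms are natural (the text does not state a base), and the toy model set MODELS and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy model set (not from the source): each model assigns a
# truth value to every atomic proposition, here "a" and "b".
MODELS = [
    {"a": True,  "b": True},
    {"a": True,  "b": False},
    {"a": True,  "b": False},
    {"a": False, "b": True},
]

def prob(p, models):
    """Pr(P), assumed here to be the fraction of models in which p holds."""
    return sum(1 for m in models if p(m)) / len(models)

def surprisal(p, q, models):
    """surprisal(P,Q) = -log Pr(P|Q), with Pr(P|Q) = Pr(P and Q) / Pr(Q)."""
    joint = prob(lambda m: p(m) and q(m), models)
    return -math.log(joint / prob(q, models))

def entropy(p, models):
    """H(P) = -sum_{s in S} Pr(s|P) * log Pr(s|P).

    Each distinct truth assignment among the models that satisfy p is
    treated as one fully specified point s; points with zero probability
    contribute nothing to the sum and are simply absent from the counter.
    """
    points = Counter(tuple(sorted(m.items())) for m in models if p(m))
    total = sum(points.values())
    return -sum((c / total) * math.log(c / total) for c in points.values())

def entropy_delta(p, q, models):
    """DH(P,Q) = H(Q) - H(P)."""
    return entropy(q, models) - entropy(p, models)

# Example formulas as predicates over a model.
a = lambda m: m["a"]
b = lambda m: m["b"]
top = lambda m: True  # the trivially true formula
```

For instance, surprisal(b, a, MODELS) evaluates -log Pr(b|a) on the toy models, and entropy_delta(a, top, MODELS) gives the entropy reduction obtained by moving from no information to knowing a. Both functions raise an error if the conditioning formula holds in no model, which this sketch does not handle.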
S(w_i+1) = -log(Pr(w_i+1|w_1...i)) = log(Pr(w_1...i)) - log(Pr(w_1...i+1)) = log(freq(w_1...i)) - log(freq(w_1...i+1))
H(w_i) = -sum_(w_1...i,w_i+1...n) Pr(w_1...i,w_i+1...n|w_1...i) * log(Pr(w_1...i,w_i+1...n|w_1...i))
DH(w_i+1) = H(w_i) - H(w_i+1)
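
The word-level definitions above can be illustrated with prefix counts over a small set of sentences. This is a hedged sketch under stated assumptions: freq(w_1...i) is taken to be the number of sentences that start with the prefix w_1...w_i, each sentence occurrence is equally likely, logarithms are natural, and the corpus SENTENCES and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the source): one tuple of words per sentence.
SENTENCES = [
    ("john", "sleeps"),
    ("john", "eats"),
    ("john", "eats"),
    ("mary", "sleeps"),
]

def freq(prefix, sentences):
    """freq(w_1...i): number of sentences that start with the given prefix."""
    prefix = tuple(prefix)
    return sum(1 for s in sentences if s[:len(prefix)] == prefix)

def word_surprisal(prefix, word, sentences):
    """S(w_i+1) = log(freq(w_1...i)) - log(freq(w_1...i+1))."""
    prefix = tuple(prefix)
    return (math.log(freq(prefix, sentences))
            - math.log(freq(prefix + (word,), sentences)))

def prefix_entropy(prefix, sentences):
    """H(w_i): entropy over the complete sentences consistent with the prefix."""
    prefix = tuple(prefix)
    counts = Counter(s for s in sentences if s[:len(prefix)] == prefix)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def word_entropy_delta(prefix, word, sentences):
    """DH(w_i+1) = H(w_i) - H(w_i+1)."""
    prefix = tuple(prefix)
    return (prefix_entropy(prefix, sentences)
            - prefix_entropy(prefix + (word,), sentences))
```

With this corpus, word_surprisal(("john",), "eats", SENTENCES) compares three sentences starting with "john" against two starting with "john eats". A continuation that never occurs has frequency zero, so math.log raises a ValueError; the sketch leaves that case unhandled.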
S(w_i+1) = -log(Pr(v(w_1...i+1)|v(w_1...i)))
where v(w_1...i) is the disjunction of all semantics consistent with the prefix w_1...w_i.
H(w_i) = - sum_(foreach s in S) Pr(s|v(w_1...i)) * log(Pr(s|v(w_1...i)))
where v(w_1...i) is the disjunction of all semantics consistent with the prefix w_1...w_i.
DH(w_i+1) = H(w_i) - H(w_i+1)
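
The semantic variants can be sketched in the same way. The example below is illustrative only and rests on several assumptions: each sentence is mapped to a single atomic proposition, v(w_1...i) is represented as the set of propositions whose disjunction forms the prefix meaning, the condition in the surprisal formula is read as the prefix meaning v(w_1...i), probabilities are fractions of models, and logarithms are natural. MODELS, SEMANTICS, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy model set: truth values for three atomic propositions.
MODELS = [
    {"sleep_j": True,  "eat_j": False, "sleep_m": True},
    {"sleep_j": False, "eat_j": True,  "sleep_m": False},
    {"sleep_j": False, "eat_j": True,  "sleep_m": True},
    {"sleep_j": False, "eat_j": False, "sleep_m": True},
]

# Hypothetical sentence-to-semantics mapping (one atomic proposition each).
SEMANTICS = {
    ("john", "sleeps"): "sleep_j",
    ("john", "eats"): "eat_j",
    ("mary", "sleeps"): "sleep_m",
}

def prefix_meaning(prefix):
    """v(w_1...i): atoms whose disjunction covers all sentences with this prefix."""
    prefix = tuple(prefix)
    return {atom for sent, atom in SEMANTICS.items()
            if sent[:len(prefix)] == prefix}

def holds(atoms, model):
    """True if the disjunction of the given atoms is true in the model."""
    return any(model[a] for a in atoms)

def semantic_surprisal(prefix, word):
    """S(w_i+1) = -log Pr(v(w_1...i+1) | v(w_1...i))."""
    before = prefix_meaning(prefix)
    after = prefix_meaning(tuple(prefix) + (word,))
    n_before = sum(1 for m in MODELS if holds(before, m))
    n_both = sum(1 for m in MODELS if holds(before, m) and holds(after, m))
    return -math.log(n_both / n_before)

def semantic_entropy(prefix):
    """H(w_i) = -sum_s Pr(s|v(w_1...i)) * log Pr(s|v(w_1...i))."""
    atoms = prefix_meaning(prefix)
    points = Counter(tuple(sorted(m.items())) for m in MODELS if holds(atoms, m))
    total = sum(points.values())
    return -sum((c / total) * math.log(c / total) for c in points.values())

def semantic_entropy_delta(prefix, word):
    """DH(w_i+1) = H(w_i) - H(w_i+1)."""
    return semantic_entropy(prefix) - semantic_entropy(tuple(prefix) + (word,))
```

Here semantic_surprisal(("john",), "eats") measures how unexpected the meaning "eat_j" is given the disjunction "sleep_j or eat_j", and semantic_entropy_delta(("john",), "eats") reports how much uncertainty about the fully specified points is removed by the word "eats".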