For a set of positive probabilities $p_i$ summing to 1, their entropy is defined as

$$H = -\sum_i p_i \log p_i$$

(For this post, log will mean log base 2, not natural log.)

This post looks at a couple of questions about computing entropy. First, are there any numerical problems computing entropy directly from the equation above? Second, imagine you don't have the $p_i$ values directly but rather counts $n_i$ that sum to $N$. If you have a large amount of data, could you compute the entropy in one pass? To apply the equation directly, you'd first need to compute $N$, then go back through the data again to compute the entropy.

To address the second question, note that

$$-\sum_i \frac{n_i}{N} \log \frac{n_i}{N} = \log N - \frac{1}{N} \sum_i n_i \log n_i$$

So you can sum the $n_i$ and the $n_i \log n_i$ in the same pass.

One of the things you learn in numerical analysis is to look carefully at subtractions. Subtracting two nearly equal numbers can result in a loss of precision. Could the numbers above be nearly equal? Maybe if the $n_i$ are ridiculously large. Not just astronomically large (astronomically large numbers like the number of particles in the universe are fine) but ridiculously large: numbers whose logarithms approach the limits of machine-representable numbers. (If we're only talking about numbers as big as the number of particles in the universe, their logs will be at most three-digit numbers.)

Now to the problem of computing the sum of the $n_i \log n_i$ terms. Could the order of the terms matter? (This also applies to the first question of the post if we look at summing the $p_i \log p_i$ terms.) In general, you'll get better accuracy summing a lot of positive numbers by sorting them and adding from smallest to largest, and worse accuracy by summing from largest to smallest. If adding a sorted list gives essentially the same result when summed in either direction, summing the list in any other order should too.

To test the methods discussed here, I used two sets of count data, one on the order of a million counts and the other on the order of a billion counts. Both data sets had approximately a power law distribution, with counts varying over seven or eight orders of magnitude. For each data set I computed the entropy four ways: two equations times two orders. That is, I both convert the counts to probabilities and use the counts directly, and I sum smallest to largest and largest to smallest.

For the smaller data set, all four methods produced the same answer to nine significant figures. For the larger data set, all four methods produced the same answer to seven significant figures. So at least for the kind of data I'm looking at, it doesn't matter how you calculate entropy, and you might as well use the one-pass algorithm to get the result faster.
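To make the four-way comparison concrete, here is a minimal Python sketch: the direct formula applied to probabilities and the one-pass formula applied to counts, each summed smallest to largest and largest to smallest. The synthetic power-law counts, the function names, and the helper `seq_sum` are my own stand-ins for illustration; they are not the data or code from the experiment described above.

```python
import numpy as np

def seq_sum(xs):
    # Plain left-to-right accumulation, so the order of the terms matters.
    # (NumPy's sum() uses pairwise summation internally, which would hide
    # most of the ordering effect.)
    total = 0.0
    for x in xs:
        total += x
    return total

def entropy_from_probs(p, smallest_first=True):
    # Direct formula: H = -sum p_i log2 p_i, with the positive terms
    # -p_i log2 p_i summed in the requested order.
    terms = np.sort(-p * np.log2(p))
    if not smallest_first:
        terms = terms[::-1]
    return seq_sum(terms)

def entropy_from_counts(n, smallest_first=True):
    # One-pass formula: H = log2 N - (1/N) sum n_i log2 n_i.
    # N and sum(n_i log2 n_i) can be accumulated together in a single pass
    # over streaming data; they're computed separately here only to make
    # the ordering of the summation explicit.
    n = np.asarray(n, dtype=float)
    N = n.sum()
    terms = np.sort(n * np.log2(n))
    if not smallest_first:
        terms = terms[::-1]
    return np.log2(N) - seq_sum(terms) / N

# Hypothetical stand-in data: roughly power-law counts, not the
# million- and billion-count data sets described in the post.
rng = np.random.default_rng(0)
counts = rng.zipf(1.5, size=100_000).astype(float)
probs = counts / counts.sum()

results = {
    "probabilities, smallest to largest": entropy_from_probs(probs, True),
    "probabilities, largest to smallest": entropy_from_probs(probs, False),
    "counts, smallest to largest": entropy_from_counts(counts, True),
    "counts, largest to smallest": entropy_from_counts(counts, False),
}
for label, H in results.items():
    print(f"{label:36s} {H:.12f}")
```

As a sanity check, `scipy.stats.entropy` ("Calculate the Shannon entropy/relative entropy of given distribution(s).") accepts unnormalized counts and a `base=2` argument, so `entropy(counts, base=2)` should agree with the values above to within floating-point noise.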