Sometimes when I'm creating summary statistics for factor variables (usually demographics), I find I need to round percentages a bit. If I round each number individually, I occasionally (and frustratingly) change the total sum. For example, suppose I've got information on how many individuals are in each of four groups:
group.totals = c(13, 39, 16, 11)
and I'd like to report the distribution as a share of the total number of individuals:
(tab = prop.table(group.totals))
## [1] 0.1646 0.4937 0.2025 0.1392
However, I only want to report two decimal places:
(rounded.tab = round(tab, 2))
## [1] 0.16 0.49 0.20 0.14
Here, the rounding process (annoyingly) changes the sum:
sum(tab)
## [1] 1
sum(rounded.tab)
## [1] 0.99
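To fix this (a bit), here's a quick function which rounds a group of numbers together. It rounds each number as usual, then adds the total rounding error back to the one element where doing so causes the smallest squared error: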
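round.group = function(vec, digits) {
  # Round each element on its own
  r.vec = round(vec, digits)
  # Total rounding error that must be added back to preserve the sum
  total.resid = sum(vec) - sum(r.vec)
  # Squared error each element would have if it absorbed the whole residual
  sq.diffs = ((r.vec + total.resid) - vec)^2
  # Put the whole residual on the element where that error is smallest
  indx = which.min(sq.diffs)
  r.vec[indx] = r.vec[indx] + total.resid
  return(r.vec)
}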
This solves some of the problems:
(group.rounded.tab = round.group(tab, 2))
## [1] 0.17 0.49 0.20 0.14
sum(group.rounded.tab)
## [1] 1
But it behaves oddly for some inputs, such as when several elements round in the same direction:
bug.vec = c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2)
round.group(bug.vec, 0)
## [1] 2 0 0 0 9 9
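Tracing the intermediate values by hand shows where this comes from. This is only a small sketch that repeats the steps inside round.group, with bug.vec redefined so it runs on its own:

```r
bug.vec = c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2)

# Every element rounds down, so the rounded values fall short of the true sum
r.vec = round(bug.vec, 0)
total.resid = sum(bug.vec) - sum(r.vec)

# Squared error for each element if it alone absorbed the residual
sq.diffs = ((r.vec + total.resid) - bug.vec)^2

# which.min returns the first of the tied candidates
indx = which.min(sq.diffs)
```

The residual here is 2, and the four 0.4s are tied for the smallest squared error, so the first element absorbs all of it and jumps from 0 to 2.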
Despite being a bit buggy, this function does well enough for my purposes. If you'd like to find a better version, or are generally interested, here's a link to a nice discussion of group rounding on Stack Overflow:
http://stackoverflow.com/questions/792460/how-to-round-floats-to-integers-while-preserving-their-sum
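One common alternative is the largest-remainder method: round everything down, then hand out the leftover units one at a time to the elements with the largest fractional parts. Here is a minimal sketch of that idea. The name round.lrm is hypothetical and not part of the function above, and the sketch assumes the sum of the inputs is (close to) a whole number of units at the chosen precision:

```r
# Hypothetical helper: largest-remainder rounding (an assumption, not the
# function defined earlier in this post).
round.lrm = function(vec, digits = 0) {
  scale = 10^digits
  scaled = vec * scale
  floors = floor(scaled)
  # Whole units still to hand out so that the total is preserved
  n.left = round(sum(scaled)) - sum(floors)
  # Fractional part that each element lost when it was rounded down
  frac = scaled - floors
  # Give one unit each to the n.left elements with the largest fractions
  bump = order(frac, decreasing = TRUE)[seq_len(n.left)]
  floors[bump] = floors[bump] + 1
  floors / scale
}

round.lrm(c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2), 0)
## [1] 1 1 0 0 9 9
round.lrm(prop.table(c(13, 39, 16, 11)), 2)
## [1] 0.17 0.49 0.20 0.14
```

Because no element moves by more than one unit, the correction is spread across several elements instead of piling onto one. Ties are broken by position, so which of several equal elements gets bumped is arbitrary.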