Friday, May 2, 2014

Function for rounding a group of numbers

Sometimes when I'm creating summary statistics for factor variables (usually demographics), I need to round percentages a bit. If I round each number individually, the rounded values occasionally (and frustratingly) no longer add up to the original total. For example, suppose I've got information on how many individuals are in each of four groups:

group.totals = c(13, 39, 16, 11)

and I'd like to report the distribution as a share of the total number of individuals:

(tab = prop.table(group.totals))
## [1] 0.1646 0.4937 0.2025 0.1392

However, I only want to report two decimal places:

(rounded.tab = round(tab, 2))
## [1] 0.16 0.49 0.20 0.14

Here, the rounding process (annoyingly) changes the sum:

sum(tab)
## [1] 1
sum(rounded.tab)
## [1] 0.99

To fix this (a bit), here's a quick function which rounds a group of numbers together:

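round.group = function(vec, digits) {
    # round every element on its own first
    r.vec = round(vec, digits)
    # amount by which the rounded values miss the (rounded) total; rounding
    # sum(vec) keeps the adjusted element on the same decimal grid
    total.resid = round(sum(vec), digits) - sum(r.vec)
    # squared error each element would have if it absorbed the whole residual
    sq.diffs = ((r.vec + total.resid) - vec)^2
    # add the residual to the element where it does the least damage
    indx = which.min(sq.diffs)
    r.vec[indx] = round(r.vec[indx] + total.resid, digits)
    return(r.vec)
}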

This solves some of the problems:

(group.rounded.tab = round.group(tab, 2))
## [1] 0.17 0.49 0.20 0.14
sum(group.rounded.tab)
## [1] 1
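To see why the first group is the one that gets bumped up, the intermediate values of round.group() can be recomputed by hand for this example. Because the proportions add up to 1, the target total here is simply 1. This is only an illustrative trace, not part of the function itself:

```r
# Recompute round.group()'s intermediate steps for the example above.
tab = prop.table(c(13, 39, 16, 11))
r.vec = round(tab, 2)              # 0.16 0.49 0.20 0.14
total.resid = 1 - sum(r.vec)       # about 0.01: one unit in the second decimal is missing
# squared distance from the exact value if each group absorbed the residual
sq.diffs = ((r.vec + total.resid) - tab)^2
round(sqrt(sq.diffs), 4)           # plain distance per group, easier to read
which.min(sq.diffs)                # 1, so the first group becomes 0.17
```

The distances come out to roughly 0.0054, 0.0063, 0.0075 and 0.0108. Adding 0.01 to the first group moves it the least from its exact value of about 0.1646.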

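But because the entire residual goes to a single element, it behaves strangely when rounding the elements one at a time leaves the total off by more than one unit in the last kept digit: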

bug.vec = c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2)
round.group(bug.vec, 0)
## [1] 2 0 0 0 9 9
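One quick way to tell whether an input will trip round.group() up is to count how many units of the last kept digit separate the individually rounded values from the rounded total. The resid.units() helper below is hypothetical, written only for illustration and not part of any package. A count of 0, 1 or -1 can be fixed with at most one adjustment. Anything larger gets piled onto a single element.

```r
# Hypothetical helper: how many units of 10^-digits separate the sum of the
# individually rounded values from the rounded total.
resid.units = function(vec, digits) {
    r.vec = round(vec, digits)
    round((round(sum(vec), digits) - sum(r.vec)) * 10^digits)
}

resid.units(prop.table(c(13, 39, 16, 11)), 2)    # 1
resid.units(c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2), 0)  # 2
```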

Despite being a bit buggy, this function does well enough for my purposes. If you'd like a better version, or are just interested in the problem, here's a nice discussion of group rounding on Stack Overflow:

http://stackoverflow.com/questions/792460/how-to-round-floats-to-integers-while-preserving-their-sum
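A commonly suggested alternative for this problem is the largest remainder (Hamilton) method. It rounds everything down, then hands out the missing units one at a time to the elements with the largest fractional parts. Below is a minimal R sketch of that idea. It is my own version, not taken from the linked thread, and the name round.largest.remainder is hypothetical. It assumes the goal is for the result to add up to the total rounded to the same number of digits.

```r
# Largest remainder (Hamilton) rounding: a sketch, not round.group().
round.largest.remainder = function(vec, digits = 0) {
    scale = 10^digits
    scaled = vec * scale
    floored = floor(scaled)
    # how many whole units are needed to reach the rounded total (never negative)
    n.short = round(sum(scaled)) - sum(floored)
    # give one extra unit to each of the n.short largest fractional parts
    remainders = scaled - floored
    bump = order(remainders, decreasing = TRUE)[seq_len(n.short)]
    floored[bump] = floored[bump] + 1
    floored / scale
}

round.largest.remainder(prop.table(c(13, 39, 16, 11)), 2)  # 0.17 0.49 0.20 0.14
round.largest.remainder(c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2), 0)
```

On bug.vec, two of the 0.4 entries are bumped up to 1 and the 9.2s stay at 9, so the result still adds up to 20. Unlike round.group(), no element ever moves more than one unit from its rounded-down value. Multiplying by 10^digits can introduce tiny floating-point errors, though, so values that sit exactly on a boundary may not be handled as expected.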
