Sometimes when I'm creating summary statistics for factor variables (usually demographics), I need to round percentages a bit. If I round each number individually, I occasionally (and frustratingly) change the total, so the shares no longer add up to 100%. For example, suppose I've got information on how many individuals are in each of four groups:

```
group.totals = c(13, 39, 16, 11)
```

and I'd like to report the distribution as a share of the total number of individuals:

```
(tab = prop.table(group.totals))
```

```
## [1] 0.1646 0.4937 0.2025 0.1392
```

However, I only want to report two decimal places:

```
(rounded.tab = round(tab, 2))
```

```
## [1] 0.16 0.49 0.20 0.14
```

Here, the rounding process (annoyingly) changes the sum:

```
sum(tab)
```

```
## [1] 1
```

```
sum(rounded.tab)
```

```
## [1] 0.99
```
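
To see where the missing 0.01 goes, it can help to look at the error each element picks up when it is rounded on its own. This is just a small sketch reusing the numbers above; `rounding.error` is a name introduced here for illustration:

```r
group.totals = c(13, 39, 16, 11)
tab = prop.table(group.totals)

# Per-element error introduced by rounding each share independently.
rounding.error = tab - round(tab, 2)
round(rounding.error, 4)

# The individual errors do not cancel; together they account for the gap
# between sum(tab) and sum(round(tab, 2)).
sum(rounding.error)
```

Three of the four shares are rounded down and only one is rounded up, so the errors add up to roughly 0.01 instead of cancelling out.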

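To fix this (a bit), here's a quick function that rounds a group of numbers together. It rounds each element as usual, then adds the leftover difference to the single element where doing so causes the smallest squared error: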

```
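round.group = function(vec, digits) {
  # Round each element independently.
  r.vec = round(vec, digits)
  # Total lost (or gained) by rounding element-wise.
  total.resid = sum(vec) - sum(r.vec)
  # Squared error each element would have if it absorbed the whole residual.
  sq.diffs = ((r.vec + total.resid) - vec)^2
  # Give the residual to the element where that error is smallest.
  indx = which.min(sq.diffs)
  r.vec[indx] = r.vec[indx] + total.resid
  return(r.vec)
}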
```

This fixes the example above:

```
(group.rounded.tab = round.group(tab, 2))
```

```
## [1] 0.17 0.49 0.20 0.14
```

```
sum(group.rounded.tab)
```

```
## [1] 1
```

However, it behaves oddly for some inputs:

```
bug.vec = c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2)
round.group(bug.vec, 0)
```

```
## [1] 2 0 0 0 9 9
```
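
The reason seems to be that the whole residual lands on a single element, however large it is. Stepping through the function body by hand for this input (same variable names as inside `round.group`) makes that visible:

```r
bug.vec = c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2)

# Every 0.4 rounds down to 0 and every 9.2 rounds down to 9.
r.vec = round(bug.vec, 0)

# About 2: two whole units went missing during rounding.
total.resid = sum(bug.vec) - sum(r.vec)

# Squared error each element would carry if it absorbed the whole residual.
sq.diffs = ((r.vec + total.resid) - bug.vec)^2

# The four 0.4 entries tie for the smallest error; which.min() takes the first.
which.min(sq.diffs)
```

So the first 0.4 absorbs both missing units and ends up as 2, even though spreading them over two elements would stay closer to the original values.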

Despite being a bit buggy, this function does well enough for my purposes. If you'd like to find a better version, or are generally interested, here's a link to a nice discussion on group rounding at Stack Overflow:

http://stackoverflow.com/questions/792460/how-to-round-floats-to-integers-while-preserving-their-sum
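
For comparison, one common alternative is the largest-remainder approach: round everything down, then hand out the missing units one at a time to the elements with the largest fractional parts. The sketch below is only an illustration, not code taken from the linked discussion. The name `round.largest.remainder` is hypothetical, and it assumes the target total is the sum of the inputs rounded to the chosen precision:

```r
# Hypothetical helper (not from the original post): largest-remainder rounding.
# Assumption: the target total is sum(vec) rounded to `digits` decimal places.
round.largest.remainder = function(vec, digits = 0) {
  scale = 10^digits
  scaled = vec * scale
  # Round every element down to a whole number of units.
  floors = floor(scaled)
  # Number of whole units still missing after flooring.
  shortfall = round(sum(scaled)) - sum(floors)
  # Rank elements by fractional part, largest first.
  ranked = order(scaled - floors, decreasing = TRUE)
  # Give one extra unit to each of the top `shortfall` elements.
  bump = ranked[seq_len(shortfall)]
  floors[bump] = floors[bump] + 1
  floors / scale
}

tab = prop.table(c(13, 39, 16, 11))
round.largest.remainder(tab, 2)

bug.vec = c(0.4, 0.4, 0.4, 0.4, 9.2, 9.2)
round.largest.remainder(bug.vec, 0)
```

On the demographic example this gives the same shares as `round.group`. On `bug.vec` the two missing units go to two different 0.4 entries, so no element moves by a full unit or more.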
