How to format floating point number with exactly 2 significant digits in bash?
This answer to the first linked question has the almost-throwaway line at the end:
See also %g for rounding to a specified number of significant digits.
So you can simply write
printf "%.2g" "$n"
(but see the section below on decimal separator and locale, and note that a non-Bash printf need not support %f and %g).
Examples:
$ printf "%.2g\n" 76543 0.0076543
7.7e+04
0.0077
Of course, you now have mantissa-exponent representation rather than pure decimal, so you'll want to convert back:
$ printf "%0.f\n" 7.7e+06
7700000
$ printf "%0.7f\n" 7.7e-06
0.0000077
Putting all this together, and wrapping it in a function:
# Function round(precision, number)
round() {
    n=$(printf "%.${1}g" "$2")
    if [ "$n" != "${n#*e}" ]
    then
        f="${n##*e-}"
        test "$n" = "$f" && f= || f=$(( ${f#0}+$1-1 ))
        printf "%0.${f}f" "$n"
    else
        printf "%s" "$n"
    fi
}
(Note: this function is written in portable (POSIX) shell, but assumes that printf handles the floating-point conversions. Bash has a built-in printf that does, so you're okay here, and the GNU implementation also works, so most GNU/Linux systems can safely use Dash.)
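A quick way to sanity-check the function is to call it directly. The sketch below repeats round() so that it runs on its own, and forces LC_ALL=C (an assumption made here only so that the output uses . as the radix character); the sample values are illustrative.

```shell
# Assumption: force the C locale so printf reads and writes '.' as the radix.
LC_ALL=C; export LC_ALL

# round(precision, number), repeated from the answer above
round() {
    n=$(printf "%.${1}g" "$2")
    if [ "$n" != "${n#*e}" ]
    then
        f="${n##*e-}"
        test "$n" = "$f" && f= || f=$(( ${f#0}+$1-1 ))
        printf "%0.${f}f" "$n"
    else
        printf "%s" "$n"
    fi
}

round 2 76543;        echo    # 77000
round 2 0.0076543;    echo    # 0.0077
round 2 0.0000076543; echo    # 0.0000077
round 3 -1234.5;      echo    # -1230
```

Negative numbers go through the same path, because the sign does not affect the test for an e- exponent.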
Test cases
radix=$(printf %.1f 0)
for i in $(seq 12 | sed -e 's/.*/dc -e "12k 1.234 10 & 6 -^*p"/e' -e "y/_._/$radix/")
do
    echo $i "->" $(round 2 $i)
done
Test results
.000012340000 -> 0.000012
.000123400000 -> 0.00012
.001234000000 -> 0.0012
.012340000000 -> 0.012
.123400000000 -> 0.12
1.234 -> 1.2
12.340 -> 12
123.400 -> 120
1234.000 -> 1200
12340.000 -> 12000
123400.000 -> 120000
1234000.000 -> 1200000
A note on decimal separator and locale
All the working above assumes that the radix character (also known as the decimal separator) is ., as in most English locales. Other locales use , instead, and some shells have a built-in printf that respects locale. In these shells, you may need to set LC_NUMERIC=C to force the use of . as radix character, or write /usr/bin/printf to prevent use of the built-in version. The latter is complicated by the fact that it (at least in some versions) seems to always parse arguments using ., but print using the current locale settings.
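As a minimal sketch of the first option, the locale can be overridden for a single printf call. The wrapper name cprintf is hypothetical; it sets LC_ALL rather than LC_NUMERIC because LC_ALL, when present in the environment, takes precedence over LC_NUMERIC.

```shell
# Hypothetical wrapper: run printf with '.' as the radix character,
# whatever the caller's locale is. The override lasts for this call only.
cprintf() {
    LC_ALL=C printf "$@"
}

cprintf '%.2g\n' 76543      # 7.7e+04
cprintf '%.1f\n' 0          # 0.0
```

The rest of the script keeps the user's locale, so only the numbers formatted through the wrapper are forced to use a point.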
TL;DR
Just copy and use the function sigf in the section A reasonably good "significant numbers" function. It is written (like all code in this answer) to work with dash.
It will give the printf approximation to the integer part of N with $sig digits.
About the decimal separator.
The first problem to solve with printf is the effect and use of the "decimal mark", which is a point in US locales and a comma in DE locales (for example). It is a problem because what works in one locale (or shell) will fail in another. Example:
$ dash -c 'printf "%2.3f\n" 12.3045'
12.305
$ ksh -c 'printf "%2.3f\n" 12.3045'
ksh: printf: 12.3045: arithmetic syntax error
ksh: printf: 12.3045: arithmetic syntax error
ksh: printf: warning: invalid argument of type f
12,000
$ ksh -c 'printf "%2.3f\n" 12,3045'
12,304
One common (and incorrect) solution is to set LC_ALL=C for the printf command. But that fixes the decimal mark to a point, which is a problem in locales where a comma (or another character) is the commonly used one.
The solution is to find out, inside the script and for the shell running it, what the locale's decimal separator is. That is quite simple:
$ printf '%1.1f' 0
0,0 # for a comma locale (or shell).
Removing zeros:
$ dec="$(IFS=0; printf '%s' $(printf '%.1f'))"; echo "$dec"
, # for a comma locale (or shell).
That value is used to change the file with the list of tests:
sed -i 's/[,.]/'"$dec"'/g' infile
That makes the test runs valid in any shell or locale automatically.
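The same idea can be applied to a single value instead of a whole file. In the sketch below the helper name to_locale is hypothetical, and the separator is extracted with parameter expansion rather than with the IFS trick; it assumes the separator is a single character that is not special to sed.

```shell
# Detect the radix character printf uses in the current shell and locale.
dec=$(printf '%.1f' 0)      # "0.0" or "0,0"
dec=${dec#0}                # drop the leading zero
dec=${dec%0}                # drop the trailing zero

# Hypothetical helper: rewrite ',' or '.' in one number to that character.
to_locale() {
    printf '%s\n' "$1" | sed "s/[,.]/$dec/g"
}

to_locale 12.3045
to_locale 12,3045
```

Both calls print the number with whatever separator the running shell expects, so the result can be passed straight to printf.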
Some basics.
It should be intuitive to cut the number to be formatted with the printf format %.*e or even %.*g. The main difference between %.*e and %.*g is how they count digits: %.*g takes the full count of significant digits, while %.*e counts only the digits after the decimal mark, so it needs the count less 1:
$ printf '%.*e %.*g' $((4-1)) 1,23456e0 4 1,23456e0
1,235e+00 1,235
That worked well for 4 significant digits.
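The digit counting of the two formats can be compared side by side. This is a small sketch that assumes the C locale (set explicitly, so . is the radix character) and a hypothetical variable sig holding the number of significant digits wanted.

```shell
# Assumption: C locale, so the arguments and output use '.' as the radix.
LC_ALL=C; export LC_ALL
sig=4

printf '%.*e\n' "$((sig-1))" 1.23456    # 1.235e+00  (digits after the mark)
printf '%.*g\n' "$sig"       1.23456    # 1.235      (all significant digits)
printf '%.*g\n' "$sig"       123456     # 1.235e+05  (%g switches to exponent form)
```

The last line shows why %g alone is not enough: once the exponent reaches the precision, the output is back in exponent notation.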
After the number has been cut to the required digits, an additional step is needed to format numbers whose exponent is not 0 (as it was above).
$ N=$(printf '%.*e' $((4-1)) 1,23456e3); echo "$N"
1,235e+03
$ printf '%4.0f' "$N"
1235
This works correctly. The count of the integer part (to the left of the decimal mark) is the exponent plus one (the value kept in $exp by the functions below). The count of decimals needed is the number of significant digits ($sig) less the number of digits already used to the left of the decimal separator:
a=$((exp<0?0:exp)) ### count of integer characters.
b=$((exp<sig?sig-exp:0)) ### count of decimal characters.
printf '%*.*f' "$a" "$b" "$N"
As the integral part for the f format has no limit, there is in fact no need to declare it explicitly, and this (simpler) code works:
a=$((exp<sig?sig-exp:0)) ### count of decimal characters.
printf '%0.*f' "$a" "$N"
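To make the arithmetic concrete, here is a sketch with a hypothetical helper decimals. It assumes that exp is the number of integer digits (the decimal exponent plus one) and that the C locale is in use.

```shell
# Assumption: C locale, so '.' is the radix character.
LC_ALL=C; export LC_ALL

# Hypothetical helper: decimals <exp> <sig>
#   exp = number of integer digits (decimal exponent + 1)
#   sig = number of significant digits wanted
decimals() {
    echo $(( $1 < $2 ? $2 - $1 : 0 ))
}

printf '%0.*f\n' "$(decimals 4 4)"  1.235e+03    # 1235
printf '%0.*f\n' "$(decimals 1 4)"  1.235e+00    # 1.235
printf '%0.*f\n' "$(decimals -2 2)" 1.2e-03      # 0.0012
```

A number with as many integer digits as significant digits gets no decimals, and each leading zero after the decimal mark adds one more.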
First trial.
A first function that could do this in a more automated way:
# Function significant (number, precision)
sig1(){
    sig=$(($2>0?$2:1))                       ### significant digits (>0)
    N=$(printf "%0.*e" "$(($sig-1))" "$1")   ### N in sci (cut to $sig digits).
    exp=$(echo "${N##*[eE+]}+1"|bc)          ### get the exponent.
    a="$((exp<sig?sig-exp:0))"               ### calc number of decimals.
    printf "%0.*f" "$a" "$N"                 ### re-format number.
}
This first attempt works with many numbers, but it fails when the number has fewer digits available than the significant count requested and the exponent is less than -4:
Number sig Result Correct?
123456789 --> 4< 123500000 >--| yes
23455 --> 4< 23460 >--| yes
23465 --> 4< 23460 >--| yes
1,2e-5 --> 6< 0,0000120000 >--| no
1,2e-15 -->15< 0,00000000000000120000000000000 >--| no
12 --> 6< 12,0000 >--| no
It will add many zeros which are not needed.
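The exponent step in sig1 relies on bc. As an alternative sketch (not part of the original function), the same value can be computed with shell arithmetic alone; the helper name exp_plus_one is hypothetical. Leading zeros are stripped first, because the shell would otherwise read a value such as 08 as invalid octal.

```shell
# Hypothetical helper: print (decimal exponent + 1) for a number written
# in printf %e notation, e.g. 1.235e+03 or 1,2e-05, without calling bc.
exp_plus_one() {
    x=${1##*[eE]}               # exponent text, e.g. "+03" or "-05"
    s=${x%%[0-9]*}              # its sign: "+", "-" or empty
    x=${x#[+-]}                 # its digits, e.g. "03"
    # strip leading zeros, keeping at least one digit
    while [ "${x#0}" != "$x" ] && [ "${#x}" -gt 1 ]; do
        x=${x#0}
    done
    case $s in
        -) echo $(( 1 - $x )) ;;
        *) echo $(( $x + 1 )) ;;
    esac
}

exp_plus_one 1.235e+03      # 4
exp_plus_one 1,2e-05        # -4
```

The helper only inspects the text after the e, so it works with either decimal separator.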
Second trial.
To solve that we need to clean N of the exponent and any trailing zeros. Then we can get the effective length of digits available and work with that:
# Function significant (number, precision)
sig2(){ local sig N exp n len a
    sig=$(($2>0?$2:1))                       ### significant digits (>0)
    N=$(printf "%+0.*e" "$(($sig-1))" "$1")  ### N in sci (cut to $sig digits).
    exp=$(echo "${N##*[eE+]}+1"|bc)          ### get the exponent.
    n=${N%%[Ee]*}                            ### remove the exponent part.
    n=${n%"${n##*[!0]}"}                     ### remove all trailing zeros
    len=$(( ${#n}-2 ))                       ### len of N (less sign and dec).
    len=$((len<sig?len:sig))                 ### select the minimum.
    a="$((exp<len?len-exp:0))"               ### use $len to count decimals.
    printf "%0.*f" "$a" "$N"                 ### re-format the number.
}
However, that is using floating point math, and "nothing is simple in floating point" (see: Why don't my numbers add up?).
printf "%.2g " 76500,00001 76500
7,7e+04 7,6e+04
However:
printf "%.2g " 75500,00001 75500
7,6e+04 7,6e+04
Why?:
printf "%.32g\n" 76500,00001e30 76500e30
7,6500000010000000001207515928855e+34
7,6499999999999999997831226199114e+34
Also, the command printf is a builtin of many shells, and what printf prints may change with the shell:
$ dash -c 'printf "%.*f" 4 123456e+25'
1234560000000000020450486779904.0000
$ ksh -c 'printf "%.*f" 4 123456e+25'
1234559999999999999886313162278,3840
$ dash ./script.sh
123456789 --> 4< 123500000 >--| yes
23455 --> 4< 23460 >--| yes
23465 --> 4< 23460 >--| yes
1.2e-5 --> 6< 0.000012 >--| yes
1.2e-15 -->15< 0.0000000000000012 >--| yes
12 --> 6< 12 >--| yes
123456e+25 --> 4< 1234999999999999958410892148736 >--| no
A reasonably good "significant numbers" function:
dec=$(IFS=0; printf '%s' $(printf '%.1f'))   ### What is the decimal separator?.
sed -i 's/[,.]/'"$dec"'/g' infile

zeros(){ # create a string of $1 zeros (for $1 positive or zero).
    printf '%.*d' $(( $1>0?$1:0 )) 0
}

# Function significant (number, precision)
sigf(){ local sig sci exp N sgn len z1 z2 b c
    sig=$(($2>0?$2:1))                   ### significant digits (>0)
    N=$(printf '%+e\n' $1)               ### use scientific format.
    exp=$(echo "${N##*[eE+]}+1"|bc)      ### find ceiling{log(N)}.
    N=${N%%[eE]*}                        ### cut after `e` or `E`.
    sgn=${N%%"${N#-}"}                   ### keep the sign (if any).
    N=${N#[+-]}                          ### remove the sign
    N=${N%[!0-9]*}${N#??}                ### remove the $dec
    N=${N#"${N%%[!0]*}"}                 ### remove all leading zeros
    N=${N%"${N##*[!0]}"}                 ### remove all trailing zeros
    len=$((${#N}<sig?${#N}:sig))         ### count of selected characters.
    N=$(printf '%0.*s' "$len" "$N")      ### use the first $len characters.
    result="$N"
    # add the decimal separator or lead zeros or trail zeros.
    if [ "$exp" -gt 0 ] && [ "$exp" -lt "$len" ]; then
        b=$(printf '%0.*s' "$exp" "$result")
        c=${result#"$b"}
        result="$b$dec$c"
    elif [ "$exp" -le 0 ]; then
        # fill front with leading zeros ($exp length).
        z1="$(zeros "$((-exp))")"
        result="0$dec$z1$result"
    elif [ "$exp" -ge "$len" ]; then
        # fill back with trailing zeros.
        z2=$(zeros "$((exp-len))")
        result="$result$z2"
    fi
    # place the sign back.
    printf '%s' "$sgn$result"
}
And the results are (note that sigf cuts the extra digits rather than rounding them: 76543 with 2 digits gives 76000):
$ dash ./script.sh
123456789 --> 4< 123400000 >--| yes
23455 --> 4< 23450 >--| yes
23465 --> 4< 23460 >--| yes
1.2e-5 --> 6< 0.000012 >--| yes
1.2e-15 -->15< 0.0000000000000012 >--| yes
12 --> 6< 12 >--| yes
123456e+25 --> 4< 1234000000000000000000000000000 >--| yes
123456e-25 --> 4< 0.00000000000000000001234 >--| yes
-12345.61234e-3 --> 4< -12.34 >--| yes
-1.234561234e-3 --> 4< -0.001234 >--| yes
76543 --> 2< 76000 >--| yes
-76543 --> 2< -76000 >--| yes
123456 --> 4< 123400 >--| yes
12345 --> 4< 12340 >--| yes
1234 --> 4< 1234 >--| yes
123.4 --> 4< 123.4 >--| yes
12.345678 --> 4< 12.34 >--| yes
1.23456789 --> 4< 1.234 >--| yes
0.1234555646 --> 4< 0.1234 >--| yes
0.0076543 --> 2< 0.0076 >--| yes
.000000123400 --> 2< 0.00000012 >--| yes
.000001234000 --> 2< 0.0000012 >--| yes
.000012340000 --> 2< 0.000012 >--| yes
.000123400000 --> 2< 0.00012 >--| yes
.001234000000 --> 2< 0.0012 >--| yes
.012340000000 --> 2< 0.012 >--| yes
.123400000000 --> 2< 0.12 >--| yes
1.234 --> 2< 1.2 >--| yes
12.340 --> 2< 12 >--| yes
123.400 --> 2< 120 >--| yes
1234.000 --> 2< 1200 >--| yes
12340.000 --> 2< 12000 >--| yes
123400.000 --> 2< 120000 >--| yes