When do you use float and when do you use double


203

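In my programming experience, I often have to decide whether to use float or double for real numbers. Sometimes I go for float and sometimes for double, but the choice feels rather subjective. If I had to defend my decision, I probably could not give sound reasons.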

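When do you use float and when do you use double? Do you always use double and only choose float when there are memory constraints? Or do you always use float unless the precision requirements force you to use double? Is there a substantial difference in the computational cost of basic arithmetic between float and double? What are the pros and cons of using float or double? And have you ever even used long double?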

-4

Usually, I use the float type when I don't need much precision, for example for money. That is wrong, but it is a bad habit I have gotten used to.

On the other hand, I use double when I need more precision, for example for complex mathematical algorithms.
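To illustrate why float is a poor fit for money, here is a minimal sketch that assumes float is an IEEE 754 32-bit type. The helper names are hypothetical. It adds one cent to a balance stored as a float in dollars, and to the same balance stored as an integer number of cents.

```c
#include <stdint.h>

/* Hypothetical helper: add one cent to a balance held as a float in dollars. */
float add_one_cent_float(float balance_dollars) {
    return balance_dollars + 0.01f;
}

/* Hypothetical helper: add one cent to a balance held as a count of cents. */
int64_t add_one_cent_int(int64_t balance_cents) {
    return balance_cents + 1;
}
```

Under that assumption, add_one_cent_float(300000.0f) returns 300000.0f unchanged, because neighbouring float values near 300,000 are 0.03125 apart and the cent is rounded away. The integer version returns 30000001 for 30000000, as expected.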

The C99 standard says this:

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
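The subset rule quoted above can be checked with a small sketch. The function names are made up for illustration, and the macros come from the standard <float.h> header. The second function relies on every float value also being a double value, so widening and narrowing back should lose nothing (NaN aside, since NaN never compares equal).

```c
#include <float.h>

/* Significand widths should never shrink from float to double to long double. */
int precision_is_ordered(void) {
    return FLT_MANT_DIG <= DBL_MANT_DIG && DBL_MANT_DIG <= LDBL_MANT_DIG;
}

/* float -> double -> float should give back the original value. */
int survives_widening(float f) {
    double d = f;          /* exact: every float is representable as a double */
    return (float)d == f;
}
```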

I never really used long double, but I don't use C/C++ so much. Usually I use dynamically typed languages like Python, where you don't have to care about the types.

For further information about Double vs Float, see this question at SO.


193

The default choice for a floating-point type should be double. This is also the type that you get with floating-point literals without a suffix or (in C) standard functions that operate on floating point numbers (e.g. exp, sin, etc.).
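A quick way to see this in C is to look at the size of the expressions involved. The sketch below uses sizeof, which does not evaluate its operand, so no math function is actually called. The function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* An unsuffixed literal has type double. */
size_t size_of_plain_literal(void)    { return sizeof(0.5); }

/* The f suffix makes the literal a float. */
size_t size_of_suffixed_literal(void) { return sizeof(0.5f); }

/* exp takes and returns double, even when given a float argument. */
size_t size_of_exp_result(void)       { return sizeof(exp(0.5f)); }

/* expf is the float variant. */
size_t size_of_expf_result(void)      { return sizeof(expf(0.5f)); }
```

The first and third functions should return sizeof(double), the other two sizeof(float). This assumes the plain <math.h> functions; the type-generic macros from <tgmath.h> behave differently.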

float should only be used if you need to operate on a lot of floating-point numbers (think on the order of thousands or more) and analysis of the algorithm has shown that the reduced range and accuracy don't pose a problem.

long double can be used if you need more range or accuracy than double, and if it provides this on your target platform.
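Whether long double actually gives you anything extra can be checked at compile time with the <float.h> macros. This is only a sketch, and the function names are invented for the example.

```c
#include <float.h>

/* Extra significand digits (in the floating-point radix, normally bits)
   that long double has over double. */
int long_double_extra_bits(void) {
    return LDBL_MANT_DIG - DBL_MANT_DIG;
}

/* Extra exponent range long double has over double. */
int long_double_extra_exponent(void) {
    return LDBL_MAX_EXP - DBL_MAX_EXP;
}
```

Both results are zero on platforms where long double is the same format as double (MSVC is one such case), and positive where it is a wider format such as the x87 80-bit extended type.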

In summary, float and long double should be reserved for use by the specialists, with double for "every-day" use.


10

Use double for all your calculations and temp variables. Use float when you need to keep an array of numbers (float[]), the precision is sufficient, and you are dealing with tens of thousands of numbers or more.

Many/most math functions or operators convert/return double, and you don't want to cast the numbers back to float for any intermediate steps.

E.g., if you have an input of 100,000 numbers from a file or a stream and need to sort them, put the numbers in a float[].
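A rough sketch of that case, using the standard qsort. The function names are hypothetical, and it assumes the values fit in float with enough precision for your purposes.

```c
#include <stdlib.h>

/* qsort comparator for float values, ascending. */
static int compare_floats(const void *a, const void *b) {
    float x = *(const float *)a;
    float y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Sort an array of floats in place. */
void sort_floats(float *values, size_t count) {
    qsort(values, count, sizeof values[0], compare_floats);
}

/* Bytes saved by storing count numbers as float instead of double. */
size_t bytes_saved_by_float(size_t count) {
    return count * (sizeof(double) - sizeof(float));
}
```

With the common 4-byte float and 8-byte double, 100,000 numbers take about 400 KB instead of 800 KB.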


44

There is rarely cause to use float instead of double in code targeting modern computers. The extra precision reduces (but does not eliminate) the chance of rounding errors or other imprecision causing problems.
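A small sketch of "reduces but does not eliminate": add 0.1 repeatedly in each type and compare the result with the exact answer. The function names are made up for illustration, and the figures mentioned afterwards assume IEEE 754 arithmetic with no extra intermediate precision.

```c
/* Sum 0.1 n times in single precision. */
float sum_tenths_float(int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += 0.1f;
    }
    return sum;
}

/* Sum 0.1 n times in double precision. */
double sum_tenths_double(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += 0.1;
    }
    return sum;
}

/* Absolute difference between a computed and an expected value. */
double abs_error(double got, double expected) {
    double diff = got - expected;
    return diff < 0 ? -diff : diff;
}
```

For n = 10 neither sum is exactly 1.0. On such a platform the float result is off by about 1.2e-7 and the double result by about 1.1e-16.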

The main reasons I can think of to use float are:

  1. You are storing large arrays of numbers and need to reduce your program's memory consumption.
  2. You are targeting a system that doesn't natively support double-precision floating point. Until recently, many graphics cards only supported single precision floating points. I'm sure there are plenty of low-power and embedded processors that have limited floating point support too.
  3. You are targeting hardware where single-precision is faster than double-precision, and your application makes heavy use of floating point arithmetic. On modern Intel CPUs I believe all floating point calculations are done in double precision, so you don't gain anything here.
  4. You are doing low-level optimization, for example using special CPU instructions that operate on multiple numbers at a time.

So, basically, double is the way to go unless you have hardware limitations or unless analysis has shown that storing double precision numbers is contributing significantly to memory usage.


3

For real world problems the sampling threshold of your data is important when answering this question. Similarly, the noise floor is also important. If either is exceeded by your data type selection, no benefit will come from increasing precision.

Most real-world samplers are limited to 24-bit ADCs. This suggests that a 32-bit float, whose significand has 24 bits of precision, should be adequate for calculations on real-world data.
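The claim can be sanity-checked with a sketch like the following, which assumes float is an IEEE 754 32-bit type with a 24-bit significand. The function name is hypothetical, and the function is only meant for values in or near the 24-bit range.

```c
#include <stdint.h>

/* Returns 1 if the integer sample survives a round trip through float.
   Intended for 24-bit sample values; inputs close to INT32_MAX are out of scope. */
int sample_fits_float(int32_t sample) {
    float as_float = (float)sample;
    return (int32_t)as_float == sample;
}
```

Every signed 24-bit sample, from -8388608 to 8388607, passes this check. The first positive integer that fails is 16777217, which is 2^24 + 1.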

Double precision comes at the cost of 2x memory. Therefore limiting the use of doubles over floats could drastically cut the memory footprint/bandwidth of running applications.


5

Some platforms (ARM Cortex-M4 and similar cores, for example) don't support double in hardware. You can always check this in the reference manual for your processor. The absence of compilation warnings or errors does not mean the code is optimal, because double can be emulated in software. That is why you may need to stick to int or float.

If that is not the case, I would use double.
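One common way to "stick to int" is fixed-point arithmetic. The sketch below is my own illustration of that idea, not something taken from a particular processor manual. The Q16.16 layout (16 integer bits, 16 fraction bits) and all names in it are assumptions chosen for the example.

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: value = raw / 65536. */
typedef int32_t q16_16;
#define Q16_ONE 65536

/* Convert a small integer to Q16.16 (no overflow checking in this sketch). */
q16_16 q16_from_int(int value) {
    return (q16_16)(value * Q16_ONE);
}

/* Multiply two Q16.16 numbers using a 64-bit intermediate. */
q16_16 q16_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * (int64_t)b) / Q16_ONE);
}
```

For example, q16_mul(q16_from_int(3), Q16_ONE / 2) gives the Q16.16 representation of 1.5 using integer instructions only.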

You can check the famous article by D. Goldberg ("What Every Computer Scientist Should Know About Floating-Point Arithmetic"). You should think twice before using floating-point arithmetic. There is a pretty big chance it is not needed at all in your particular situation.

http://perso.ens-lyon.fr/jean-michel.muller/goldberg.pdf


-3

The choice between float and double depends on the accuracy the data requires. If the answer must differ only negligibly from the true value, many decimal places are needed, which dictates the use of double. float will chop off some of the decimal places and so reduce the accuracy.
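How much is chopped off can be measured directly. This is a minimal sketch assuming IEEE 754 float and double, with a function name invented for the example.

```c
/* Absolute error introduced by storing a double value in a float. */
double float_storage_error(double value) {
    float stored = (float)value;            /* rounds to the nearest float */
    double diff = value - (double)stored;   /* widening back is exact */
    return diff < 0 ? -diff : diff;
}
```

A value such as 0.5 is stored exactly, so the error is zero. For 0.123456789 the error is small but nonzero, which matches the roughly 6 reliable decimal digits of float (FLT_DIG) against about 15 for double (DBL_DIG).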