Be careful with floats!

Let's consider a simple example in C where we sum floating-point numbers and compare the result to what we might expect. This example will illustrate how floating-point precision issues can lead to unexpected results.

We will add the floating-point number 0.1 ten times and compare the result to 1.0. Intuitively, one might expect the sum to be exactly 1.0.

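#include <stdio.h>

int main(void) {
    double sum = 0.0;

    /* Add 0.1 ten times; each addition rounds to the nearest double. */
    for (int i = 0; i < 10; i++) {
        sum += 0.1;
    }

    /* Exact comparison: true only if sum has exactly the value 1.0. */
    if (sum == 1.0) {
        printf("Sum is exactly 1.0\n");
    } else {
        /* %.17g prints enough digits to show the difference;
           %f would round the value to 1.000000. */
        printf("Sum is not exactly 1.0, it is %.17g\n", sum);
    }

    return 0;
}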

The code adds 0.1 ten times and checks whether the result equals 1.0. Because 0.1 has no finite binary expansion, the nearest double is stored instead, and each addition rounds again, so the sum need not be exactly 1.0.

On a platform with IEEE 754 doubles, the program takes the else branch: the sum is 0.99999999999999989, very close to 1.0 but not equal to it. Seeing the difference requires enough digits, for example the %.17g format; the default six decimal places of %f round the value to 1.000000.

Comparing floating point values in C can be tricky due to the way these values are represented and handled in computer systems. Here's an overview of the issues and best practices:

Why Not Compare Floating Points Directly?

  1. Precision Issues: Floating-point numbers are stored in binary with a fixed number of significant bits, so most decimal fractions cannot be represented exactly. Each value is rounded to the nearest representable number, which introduces small errors. For example, with IEEE 754 doubles the result of 0.1 + 0.2 is not exactly 0.3.

  2. Representation: Two values that are mathematically equal, or that print the same with a few decimal places, can differ in their last bits depending on how each one was computed.
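
The first point can be checked directly. The following is a minimal sketch, assuming IEEE 754 doubles; the function names `sumDiffersFromLiteral` and `printBothValues` are made up for this illustration and are not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper for illustration: returns 1 when 0.1 + 0.2 and 0.3
   are different doubles, 0 when they are the same. */
int sumDiffersFromLiteral(void) {
    double a = 0.1;
    double b = 0.2;
    double sum = a + b;
    return sum != 0.3;
}

/* Prints both values with 17 significant digits, which is enough to tell
   any two distinct doubles apart. */
void printBothValues(void) {
    double sum = 0.1 + 0.2;
    printf("0.1 + 0.2 = %.17g\n", sum);
    printf("0.3       = %.17g\n", 0.3);
}
```

With IEEE 754 doubles, `printBothValues()` shows 0.30000000000000004 for the sum and 0.29999999999999999 for the literal: the two values differ only in the last digits.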

What Can Go Wrong?

  • False Mismatches: Due to precision issues, two values that should be equal might not be the same in their binary representation. This can lead to incorrect comparison results.

  • Inaccurate Computations: Operations involving floating points can accumulate errors, leading to significant discrepancies over time.
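
To see how errors accumulate, the sketch below measures the drift of a running sum. It assumes IEEE 754 doubles; `accumulatedError` is a hypothetical helper written for this illustration.

```c
#include <math.h>

/* Hypothetical helper: adds `step` to a running total `n` times and returns
   the absolute difference between that total and n * step, where the
   product is computed with a single multiplication (one rounding). */
double accumulatedError(double step, long n) {
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        sum += step;  /* each addition may round the running total */
    }
    return fabs(sum - (double)n * step);
}
```

With IEEE 754 doubles, `accumulatedError(0.1, 10)` is about 1.1e-16, while `accumulatedError(0.1, 1000000)` is on the order of 1e-6: the drift grows with the number of additions. `accumulatedError(0.5, 10)` is 0, because 0.5 and all of its partial sums are exactly representable.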

How to Compare:

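  1. Use a Tolerance Value (Epsilon): Instead of checking for equality, check whether the absolute difference between the numbers is less than a small threshold. The threshold must suit the magnitude of the values being compared; the 1e-9 used here is only an example.

     #include <math.h>
     #define EPSILON 1e-9  /* example tolerance; choose one that fits your data */

     /* Returns 1 if a and b differ by less than EPSILON, 0 otherwise. */
     int floatCompare(double a, double b) {
         return fabs(a - b) < EPSILON;
     }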
    
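  2. Relative Comparison: When the values can be very large or very small, scale the tolerance by their magnitude. Note that this check becomes very strict when both values are close to zero, so it is often combined with an absolute tolerance.

     /* Returns 1 if a and b differ by at most EPSILON times the larger
        magnitude. Uses <math.h> and EPSILON as defined above. */
     int relativeCompare(double a, double b) {
         return fabs(a - b) <= EPSILON * fmax(fabs(a), fabs(b));
     }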
    

Corner Cases and Sample Code:

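  1. Zero Comparison: A result that should be zero often comes out as a tiny nonzero value, so comparing directly to 0.0 fails. Use an absolute tolerance here; a relative tolerance is of no use when the expected value is zero.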

     if (fabs(myFloat) < EPSILON) {
         // Treat as zero
     }
    
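  2. Large Numbers: The gap between adjacent doubles grows with magnitude. Near 1e15, neighboring IEEE 754 doubles are 0.125 apart, so a fixed epsilon of 1e-9 only matches values that are identical. Scale the epsilon with the operands, as in the relative comparison.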

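  3. Denormalized Numbers: Values smaller in magnitude than DBL_MIN (defined in <float.h>, about 2.2e-308 for an IEEE 754 double) are stored as 'denormalized' (subnormal) numbers with fewer significant bits, so they lose precision. Be cautious when comparing values close to the limits of floating-point representation.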

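  4. NaN and Infinity: Floating-point types can represent 'Not a Number' (NaN) and infinity. NaN compares unequal to every value, including itself, and subtracting two infinities of the same sign yields NaN, so a tolerance check fails for both. Test for these values first with isnan and isinf from <math.h>.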

     if (isnan(myFloat) || isinf(myFloat)) {
         // Special handling for NaN and infinity
     }
    
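  5. Accumulated Errors in Loops: Every operation can add a small rounding error, and these errors add up over many iterations. When comparing the result of an iterative process, choose a tolerance that reflects the number of operations performed.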