# A large percentage of the bugs found in software are due to improper data handling. This column analyses a few important types of ‘data bugs’ as well as some well-known historical examples.

Every program manipulates data, and mistakes or problems in this data manipulation can result in numerous bugs. Data bugs can occur for a number of reasons. One is the wrong choice of data type (e.g., using an integer instead of a float, or vice versa); another is the loss of data in conversions between data types (e.g., downcasting a float value to an integer value). Data bugs can also come from implementation constraints in handling certain values (e.g., an integer will overflow if a value is beyond the range it can hold) and from the peculiarities of language constructs that handle certain data types (e.g., bugs caused by boxing and unboxing operations in languages like Java). We'll look at some of these problems with the help of code and historical examples.

Perhaps Y2K is the most widely known example of a data-related bug, and date- and time-related problems in general seem to be common in computer systems. Consider, for example, the Year 2038 problem in UNIX systems, which is the result of having chosen a shorter data type than required. Most UNIX/Linux systems use a 32-bit signed integer (typedef'ed as time_t in C) to represent time, and this integer is incremented once every second. As we know, the range of values a 32-bit signed integer can represent is $-2^{31}$ to $2^{31} - 1$, which is −2147483648 to 2147483647. Counting from 00:00:00 UTC, January 1, 1970 (roughly when UNIX was developed), this covers the time up to 03:14:07 UTC, January 19, 2038. So, once the time counter value reaches 2147483647, it will overflow to −2147483648 at the next second.

This is known as ‘The Year 2038 Problem’. It can cause bugs in applications that use the system time for calculations. Assume that you apply for a 30-year home loan, and the application your bank uses runs on a UNIX/Linux machine and makes use of this system time. Dates after January 19, 2038, such as the end of your loan term, cannot be represented, so the calculation of your loan's interest will give wrong results. Had a wider type such as a 64-bit integer (long on 64-bit UNIX/Linux systems) been used for storing the time, this problem could have been avoided: a signed 64-bit count of seconds lasts for roughly 292 billion years. Because the root cause is the choice of data type, this is classified as a data bug.

The peculiarities of data types, and of their behaviour in computers, can also cause bugs. A good example is the behaviour of floating-point numbers, which can often be unintuitive for programmers. ‘Catastrophic cancellation’ occurs when we subtract two nearly equal floating-point numbers. Consider the values x = 0.54617 and y = 0.54601; the exact difference is d = 0.00016. However, if four-digit arithmetic with rounding is used, then x’ = 0.5462, y’ = 0.5460 and d’ = 0.0002. Hence the relative error is:

$$\frac{|d - d'|}{|d|} = \frac{0.00004}{0.00016} = 0.25$$

A relative error of 25 per cent is quite large! When a large number of computations are performed and such errors accumulate, the final error can be huge. To give a historical example, in 1984 it was discovered that the Vancouver Stock Exchange index was under-valued by 48 per cent compared to its real value, because of errors that accumulated in floating-point computations. The cause was the slow accumulation of round-off errors over time: the computation was performed using three decimal places instead of four, and truncation was used instead of rounding.

Because digital computers have to store numbers in a fixed-size memory area (say, 4 bytes for an integer), there is a range of values they cannot store; when you attempt to store values outside that range, overflow can occur. Another difficulty in storing values is that the range of values a signed integer can store is not symmetrical. For example, a signed byte (which has 8 bits of storage space) can store values in the range -128 to +127. Note the asymmetry in these two values: it is -128 and

S.G.Ganesh