OpenSource For You

A large percentage of the bugs found in software are due to improper data handling. This column analyses a few important types of ‘data bugs’ as well as some well-known historical examples.


Every program manipulates data. Mistakes in this data manipulation can result in numerous bugs. Data bugs can occur for a number of reasons: the wrong choice of data type (e.g., using an integer instead of a float, or vice versa), or the loss of data in conversions between data types (e.g., a downcast from a float value to an integer value). They can also occur because of implementation constraints in handling certain data type values (e.g., an integer overflows if the value is beyond the range it can hold), and because of the peculiarities of language constructs in handling certain data types (e.g., bugs caused by boxing and unboxing operations in languages like Java). We'll look at some of these problems with the help of coding or historical examples.
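As a small illustration of the first two causes, here is a minimal sketch in Python (chosen only for brevity; the variable names and the helper to_float32 are made up for this example). It shows a float-to-integer conversion dropping the fraction, integer division being used where a float was intended, and a double-precision value losing digits when narrowed to single precision.

```python
import struct

# Converting a float to an int silently drops the fractional part.
price = 19.99
truncated = int(price)            # 19, not 20

# Using integer arithmetic where a float was intended loses the remainder.
average_int = 7 // 2              # 3
average_float = 7 / 2             # 3.5

def to_float32(x: float) -> float:
    """Round-trip x through a 4-byte float, as a narrowing conversion would."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 0.1 stored in single precision is no longer equal to the double 0.1.
narrowed = to_float32(0.1)
print(truncated, average_int, average_float, narrowed == 0.1)
```

None of these conversions raises an error, which is why such bugs tend to surface late.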

Perhaps Y2K is the most widely known example of a data-related bug. Date- and time-related problems seem to be common in computer systems. For example, consider the year 2038 problem in UNIX systems, which is the result of having chosen a shorter data type than required. Most UNIX/Linux systems have traditionally used a 32-bit signed integer to represent time (typedef'ed as time_t in C). For each second, the integer is incremented. As we know, the range of values a 32-bit signed integer can represent is −2^31 to +2^31 − 1, which is −2147483648 to 2147483647. The non-negative part of this range represents the time from 00:00:00 UTC on January 1, 1970 (the time when UNIX was developed) to 03:14:07 UTC on January 19, 2038. So, once the time counter value reaches 2147483647, it will overflow to −2147483648.
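The arithmetic can be checked with a short sketch. Python integers do not overflow, so the wrap-around is simulated explicitly here; wrap_int32 and to_utc are helper names invented for this example, and the wrap shown is the usual two's-complement behaviour (in C itself, signed overflow is formally undefined).

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1      # 2147483647
INT32_MIN = -2**31         # -2147483648
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def wrap_int32(value: int) -> int:
    """Reduce value to the range of a 32-bit two's-complement integer."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN

def to_utc(seconds: int) -> datetime:
    """Interpret a seconds-since-epoch counter as a UTC date and time."""
    return EPOCH + timedelta(seconds=seconds)

last_good = to_utc(INT32_MAX)             # last representable second
after_tick = wrap_int32(INT32_MAX + 1)    # one second later, wrapped
wrapped = to_utc(after_tick)              # what the clock would now claim

print(last_good)    # 2038-01-19 03:14:07+00:00
print(after_tick)   # -2147483648
print(wrapped)      # 1901-12-13 20:45:52+00:00
```

A program that trusts the wrapped counter would believe the date had jumped back to December 1901.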

This is known as 'The Year 2038 Problem'. It can cause bugs in applications that use system time for calculations. Assume that you apply for a 30-year home loan, and the application your bank uses runs on a UNIX/Linux machine and makes use of this system time. The calculation of your loan's interest will give wrong results because any date beyond January 19, 2038 cannot be represented. If a 64-bit integer (such as the long type on 64-bit UNIX/Linux systems) were used for storing the time, we could have avoided this problem: the range of a 64-bit integer is more than sufficient for representing the time for thousands of years into the future. Hence, this is classified as a data bug.
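To make the loan scenario concrete, the following sketch checks whether a maturity date fits in a 32-bit and a 64-bit counter. The loan dates are hypothetical (a 30-year loan starting on January 1, 2010), and fits is a helper written only for this illustration.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60   # average Gregorian year

def fits(seconds: int, max_value: int) -> bool:
    """True if a seconds-since-epoch value fits below the given maximum."""
    return seconds <= max_value

# Hypothetical loan: starts 1 January 2010 and matures 30 years later.
maturity = datetime(2040, 1, 1, tzinfo=timezone.utc)
maturity_seconds = int(maturity.timestamp())

fits_32 = fits(maturity_seconds, INT32_MAX)   # False: past January 2038
fits_64 = fits(maturity_seconds, INT64_MAX)   # True
years_in_64_bits = INT64_MAX / SECONDS_PER_YEAR

print(maturity_seconds, fits_32, fits_64)
print(f"64-bit counter lasts roughly {years_in_64_bits:.3g} years")
```

The last line works out to a few hundred billion years, so a 64-bit counter leaves a very wide margin.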

The peculiarities of data types, and their behaviour in computers, can also cause bugs. A good example is the behaviour of floating-point numbers, which could often be unintuitive for programmers. 'Catastrophic cancellation' occurs when we subtract two nearly equal floating-point numbers. Consider the values x = 0.54617 and y = 0.54601; the exact difference in value is d = 0.00016. However, if four-digit arithmetic with rounding is used, then x' = 0.5462, y' = 0.5460 and d' = 0.0002. Hence the relative error is:

| d – d’ | / | d | = 0.25
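The same numbers can be reproduced with Python's decimal module, using a context limited to four significant digits. This is only a sketch of the calculation above; the rounding mode (round-half-even) is an assumption, since the text says only that rounding is used.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Arithmetic context that keeps only four significant digits.
four_digits = Context(prec=4, rounding=ROUND_HALF_EVEN)

x = Decimal("0.54617")
y = Decimal("0.54601")
d = x - y                                   # exact difference: 0.00016

x_r = four_digits.create_decimal(x)         # 0.5462
y_r = four_digits.create_decimal(y)         # 0.5460
d_r = four_digits.subtract(x_r, y_r)        # 0.0002

relative_error = abs(d - d_r) / abs(d)
print(x_r, y_r, d_r, relative_error)        # 0.5462 0.5460 0.0002 0.25
```

Each input is off by less than 0.01 per cent after rounding, yet the difference is off by 25 per cent, because the subtraction cancels the leading digits that were correct.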

This is quite large! When a large number of computations are performed with such an error, the final accumulated error can be huge. To give a historical example, in 1984, it was discovered that the Vancouver stock exchange was under-valued by 48 per cent, compared to its real value, because of an accumulating error in using floating-point numbers! This problem was because of the slow accumulation of round-off errors over time. The computation was performed using three decimal places instead of four; and instead of rounding, truncation was used.
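The effect of truncating instead of rounding can be seen in a toy simulation. The sketch below does not use the exchange's data or its actual formula; it applies the same made-up random changes to three copies of an index, one kept exact, one truncated to three decimal places after every update, and one rounded to three decimal places.

```python
import random
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

THREE_PLACES = Decimal("0.001")

def simulate(updates: int = 10_000, seed: int = 1):
    """Apply identical random changes to exact, truncated and rounded indexes."""
    rng = random.Random(seed)      # fixed seed so that runs are repeatable
    exact = truncated = rounded = Decimal("1000")
    for _ in range(updates):
        # Hypothetical small change to the index; not real market data.
        delta = Decimal(str(rng.uniform(-0.5, 0.5)))
        exact += delta
        truncated = (truncated + delta).quantize(THREE_PLACES, rounding=ROUND_DOWN)
        rounded = (rounded + delta).quantize(THREE_PLACES, rounding=ROUND_HALF_EVEN)
    return exact, truncated, rounded

exact, truncated, rounded = simulate()
print("lost by truncation:", exact - truncated)
print("lost by rounding:  ", exact - rounded)
```

Truncation always errs in the same direction, so each update loses about half a thousandth on average and the losses add up. Rounding errs in both directions, so its errors largely cancel.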

Because digital computers have to store numbers in a fixed-size memory area (say 4 bytes for an integer), there is a range of values they cannot store; when you attempt to store values outside the range, overflow can occur. Another difficulty in storing values is that the range of values an integer can store is not symmetrical. For example, a signed byte (it has 8 bits of storage space) can store the range of values -128 to +127. Note the asymmetry in these two values: it is -128 and

S.G.Ganesh
