A large percentage of the bugs found in software are due to improper data handling. This column analyses a few important types of ‘data bugs’ as well as some well-known historical examples.

Open Source For You - Guest Column - Joy of Programming

Every program manipulates data, and mistakes in this data manipulation can result in numerous bugs. Data bugs can occur for a number of reasons: the wrong choice of data type (e.g., using an integer instead of a float, or vice versa), or the loss of data due to conversions between data types (e.g., a narrowing conversion from a float value to an integer value). They can also occur because of implementation constraints in handling certain data type values (e.g., integers will overflow if the value is beyond the range the type can hold), and because of the peculiarities of language constructs in handling certain data types (e.g., bugs caused by boxing and unboxing operations in languages like Java). We’ll look at some of these problems with the help of coding or historical examples.
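As a small illustration of the first two causes, here is a minimal Python sketch. The variable names and the price are made up for this example; it only shows that a float-to-integer conversion discards the fraction, and that binary floating point cannot represent some decimal fractions exactly, which is why amounts of money are often kept as integer counts of the smallest unit.

```python
# Hypothetical values, chosen only to illustrate the two kinds of data bug.

price = 19.99
price_as_int = int(price)        # conversion discards the fractional part

# 0.1 and 0.2 have no exact binary representation, so the float sum is
# slightly off; the same amounts held as integer cents add up exactly.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
cents_sum_is_exact = (10 + 20 == 30)

print(price_as_int)              # 19
print(float_sum_is_exact)        # False
print(cents_sum_is_exact)        # True
```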

Perhaps Y2K is the most widely known example of a data-related bug. Date- and time-related problems seem to be common in computer systems. For example, consider the year 2038 problem in UNIX systems, which is the result of having chosen a shorter data type than required. Most UNIX/Linux systems have traditionally used a 32-bit signed integer to represent time (typedef’ed as time_t in C), holding the number of seconds elapsed since a fixed starting point; for each second, the integer is incremented. As we know, the range of values a 32-bit signed integer can represent is −2³¹ to +2³¹ − 1, which is −2147483648 to 2147483647. The non-negative part of this range represents the time from 00:00:00 January 1, 1970 (around the time when UNIX was developed) to 03:14:07 January 19, 2038 (UTC). So, once the time counter value reaches 2147483647, it will overflow to −2147483648.
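The arithmetic above can be checked with a short Python sketch. Python integers do not overflow, so the helper `wrap_int32` (a name invented for this example) simulates two's-complement wrap-around of a 32-bit signed counter. That wrap-around is an assumption about typical hardware behaviour, not something the C language guarantees.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1            # 2147483647


def wrap_int32(value):
    """Simulate storing value in a 32-bit signed (two's-complement) integer."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    """Convert a count of seconds since the epoch to a UTC date and time."""
    return EPOCH + timedelta(seconds=seconds)


last_second = to_utc(INT32_MAX)          # last moment a 32-bit counter can hold
wrapped = wrap_int32(INT32_MAX + 1)      # one more tick wraps around
after_wrap = to_utc(wrapped)             # the date the system would now report

print(last_second.isoformat())   # 2038-01-19T03:14:07+00:00
print(wrapped)                   # -2147483648
print(after_wrap.isoformat())    # 1901-12-13T20:45:52+00:00
```

In this simulation, one tick after 03:14:07 on January 19, 2038, the clock jumps back to December 1901.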

This is known as ‘The Year 2038 Problem’. It can cause bugs in applications that use the system time for calculations. Assume that you apply for a 30-year home loan, and the application your bank uses runs on a UNIX/Linux machine and makes use of this system time. If the loan matures after January 19, 2038, the calculation of your loan’s interest can give wrong results because of the Year 2038 problem. If a wider type, such as a 64-bit integer, had been used for storing the time, this problem could have been avoided: the range of a 64-bit integer is more than sufficient for representing the time for thousands of years into the future. Hence, this is classified as a data bug.
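To make the loan scenario concrete, the following Python sketch checks whether a date still fits in a 32-bit signed count of seconds, and estimates how many years a 64-bit count would cover. The loan start date and the helper `fits_in_int32` are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60   # average Gregorian year


def fits_in_int32(moment):
    """True if the moment's seconds-since-1970 value fits in 32 signed bits."""
    return int(moment.timestamp()) <= INT32_MAX


# Hypothetical 30-year loan taken out on January 1, 2010.
loan_start = datetime(2010, 1, 1, tzinfo=timezone.utc)
loan_end = loan_start.replace(year=loan_start.year + 30)

start_ok = fits_in_int32(loan_start)     # True: 2010 is before the limit
end_ok = fits_in_int32(loan_end)         # False: 2040 is past January 2038

# Rough number of years a 64-bit signed count of seconds can represent.
years_in_int64 = INT64_MAX / SECONDS_PER_YEAR

print(start_ok, end_ok)
print(f"{years_in_int64:.2e} years")
```

The maturity date of this made-up loan cannot be stored in a 32-bit counter, while a 64-bit counter covers hundreds of billions of years.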

The peculiarities of data types, and their behaviour in computers, can also cause bugs. A good example is the behaviour of floating-point numbers, which can often be unintuitive for programmers. ‘Catastrophic cancellation’ occurs when we subtract two nearly equal floating-point numbers. Consider the values x = 0.54617 and y = 0.54601; the exact difference is d = x − y = 0.00016. However, if four-digit arithmetic with rounding is used, then x’ = 0.5462, y’ = 0.5460 and d’ = x’ − y’ = 0.0002. Hence the relative error is:

| d – d’ | / | d | = 0.25
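The same calculation can be reproduced with Python's `decimal` module, which allows the working precision to be set explicitly. This is only a sketch: the context name `four_digits` is invented here, and round-half-up is assumed as the rounding rule, since the text does not say which rule is meant.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

# A context that keeps four significant digits (assumed rounding rule).
four_digits = Context(prec=4, rounding=ROUND_HALF_UP)

x = Decimal("0.54617")
y = Decimal("0.54601")

d = x - y                                    # exact difference: 0.00016

x4 = four_digits.plus(x)                     # x rounded to four digits
y4 = four_digits.plus(y)                     # y rounded to four digits
d4 = four_digits.subtract(x4, y4)            # difference in four-digit arithmetic

relative_error = abs(d - d4) / abs(d)

print(x4, y4, d4)                            # 0.5462 0.5460 0.0002
print(relative_error)                        # 0.25
```

Rounding each input changes it by less than 0.01 per cent, yet after the subtraction the result is off by 25 per cent of the true difference.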

This is quite large! When a large number of computations are performed with such errors, the final accumulated error can be huge. To give a historical example, in 1984, it was discovered that the Vancouver stock exchange was under-valued by 48 per cent, compared to its real value, because of an accumulating error in using floating-point numbers! The problem was caused by the slow accumulation of round-off errors over time: the computation was performed using three decimal places instead of four, and truncation was used instead of rounding.
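The effect of truncating instead of rounding can be sketched with a toy simulation in Python. All figures here (the starting value of 1000.000, the alternating change of 0.00037 and the 3000 updates) are invented for illustration and are not the exchange's actual data or algorithm. The true value of this toy index returns to 1000 after every pair of updates.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

THREE_PLACES = Decimal("0.001")


def run_index(updates, rounding):
    """Apply alternating tiny changes, storing three decimal places each time."""
    index = Decimal("1000.000")
    for i in range(updates):
        # Made-up change: up by 0.00037, then down by the same amount.
        change = Decimal("0.00037") if i % 2 == 0 else Decimal("-0.00037")
        index = (index + change).quantize(THREE_PLACES, rounding=rounding)
    return index


truncated = run_index(3000, ROUND_DOWN)        # digits beyond the third are cut off
rounded = run_index(3000, ROUND_HALF_EVEN)     # value is rounded to the nearest 0.001

print(truncated)   # 998.500
print(rounded)     # 1000.000
```

Truncation always errs in the same direction, so the losses add up; with rounding, the errors in this example cancel out.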

Because digital computers have to store numbers in a fixed-size memory area (say 4 bytes for an integer), there is a range of values they cannot store; when you attempt to store values outside the range, overflow can occur. Another difficulty in storing values is that the range of values an integer can store is not symmetrical. For example, a signed byte (it has 8 bits of storage space) can store the range of values -128 to +127. Note the asymmetry in these two values: it is -128 and

S.G.Ganesh
