OpenSource For You

When programmers are in a hurry to get work done, they sometimes practice ‘copy-paste programming’. In this column, we take a closer look at the consequences of this practice.


A widely known secret among programmers, who are under pressure to be ‘productive’ and churn out hundreds and thousands of lines of code every day, is to make extensive use of two commands: Ctrl+C and Ctrl+V! So just how much code is duplicated in real-world software? We don’t know about closed source software, but for open source software, a study showed that 8.7 per cent of GCC, 19 per cent of X Windows, 22.7 per cent of Linux and 29 per cent of JDK consisted of duplicate code! For any seasoned programmer, these numbers are not at all a surprise. So, instead of talking about why code duplication occurs, let’s discuss the impact or consequences of copy-paste programming. Before that, let us briefly go over the kinds of code duplicates (a.k.a. code clones) that abound.

Type 1 clones: An exact copy without modifications, except for white spaces, new lines and comments.

Type 2 clones: Syntactically identical copy, with only variable, type or function names changed.

Type 3 clones: A copy with modifications, such as statements changed, added or removed.

Apart from these three types of clones, we also have a fourth kind: code segments that semantically do the same thing, but are syntactically different. These clones cannot be detected automatically by clone analysers, but need to be found manually.
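To make the clone types above concrete, here is a minimal sketch in Python. The function `total` and its copies are invented for illustration, and `significant_tokens` is a hypothetical helper, not the algorithm of any real clone detector. It compares token streams after dropping comments and blank lines (which equates Type 1 clones) and, optionally, after masking identifier names (which equates Type 2 clones).

```python
import io
import keyword
import tokenize

ORIGINAL = """
def total(prices):
    result = 0
    for p in prices:
        result += p
    return result
"""

# Type 1: the same code; only spacing, blank lines and comments differ.
TYPE_1 = """
def total(prices):
    result  =  0   # running sum

    for p in prices:
        result += p
    return result
"""

# Type 2: the same structure; only the names were changed.
TYPE_2 = """
def add_up(items):
    acc = 0
    for x in items:
        acc += x
    return acc
"""

# Type 3: copied, then statements were added.
TYPE_3 = """
def total(prices):
    result = 0
    for p in prices:
        if p < 0:
            continue
        result += p
    return result
"""

# Fourth kind: same behaviour, different syntax.
SEMANTIC = """
def total(prices):
    return sum(prices)
"""

IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
STRUCTURAL = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def significant_tokens(source, mask_names=False):
    """Return the tokens of `source` without comments and blank lines.

    With mask_names=True every non-keyword name becomes "ID". This is a
    simplification: it does not check that names were renamed consistently.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type in STRUCTURAL:
            # Keep block structure, but not the exact whitespace used.
            result.append(tokenize.tok_name[tok.type])
        elif (mask_names and tok.type == tokenize.NAME
              and not keyword.iskeyword(tok.string)):
            result.append("ID")
        else:
            result.append(tok.string)
    return result
```

Under these assumptions, `significant_tokens(ORIGINAL) == significant_tokens(TYPE_1)`, and the Type 2 copy matches once `mask_names=True` is passed. The Type 3 and semantic copies match in neither mode, which is why they are harder to find.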

A common mistake while copying code is that programmers copy code, but forget to make the relevant changes necessary for the copied code to be used in the context into which it has been copied. For this reason, Type 1 and Type 2 clones can result in bugs.

All types of code clones are undesirable, but Type 3 clones (copy with modifications), also known as ‘inconsistent clones’, are especially prone to bugs. This is because in Type 3 clones, a code block is copied, and changes are made that are inconsistent with the original intent of the code segment. In this case, the code is syntactically correct, but semantically incorrect, resulting in bugs. For example, in Eclipse 3.2.2, the file FeatureExportWizard.java had code identical to code in PluginExportWizard.java, indicating copied code. Further, there was a statement target.appendChild(export); that was missing in FeatureExportWizard.java, which led to an incorrectly formed XML file; this problem was also filed as a bug (ID 155070).
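The following is a hypothetical Python analogue of such an inconsistent clone. It is not the Eclipse source; the function names and XML element names are invented for illustration. Both functions run without error, but the pasted copy has lost the statement that attaches the `export` element, so the document it produces is incomplete.

```python
import xml.etree.ElementTree as ET


def build_plugin_script():
    """Original: builds <project><target><export/></target></project>."""
    project = ET.Element("project")
    target = ET.SubElement(project, "target")
    export = ET.Element("export")
    target.append(export)  # attaches the export element to the target
    return project


def build_feature_script():
    """Pasted copy: identical, except that one statement went missing."""
    project = ET.Element("project")
    target = ET.SubElement(project, "target")
    export = ET.Element("export")
    # The target.append(export) call was dropped while editing the copy,
    # so `export` is created but never becomes part of the document.
    return project


def has_export(project):
    """Report whether the generated document contains target/export."""
    return project.find("target/export") is not None
```

Here `has_export(build_plugin_script())` is true while `has_export(build_feature_script())` is false, even though neither function raises an error. The defect only shows up in the output.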

There is another problem with code clones. If the original code segment has a bug, and if the code is copied, the bug propagates! For example, a defect in Mozilla (Bug ID 217604) is a code block containing a bug that was copied in 12 places! So if the same piece of code is copied in 10 different places, obviously it’s difficult to make changes to all the duplicated code segments when fixing the problem. So, most programmers fix only the clone on which a bug was raised; in all the other unfixed clones, the bug lurks, only to be discovered much later. Hence, code clones can affect the reliability of applications as well, which most programmers don’t understand or appreciate.
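A small, invented illustration of a bug that survives in an unfixed clone: assume a leap-year check was pasted into two modules, and a bug report was raised against only one of them. The function names below are hypothetical.

```python
def is_leap_year_billing(year):
    """Clone in the billing module: fixed after a bug report."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_year_reports(year):
    """Clone in the reports module: nobody raised a bug here,
    so the original defect (the missing 400-year rule) is still present."""
    return year % 4 == 0 and year % 100 != 0
```

The two copies agree for most years, for example 2023 and 2024, so the defect in the second copy stays hidden until a year such as 2000 is passed in.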

If the code is duplicated in many different places, it becomes more difficult to understand or comprehend the code. Why? The human mind can hold only a limited number of chunks or items in working memory (known as the ‘Seven plus or minus two rule’), so the amount of information we can process at a time is severely limited. Because code duplication tends to ‘bloat’ code, it increases the complexity of the software code base. Hence, the main impact of code clones is on the maintainability of the application.

How do we know which code clones are serious, and which ones to address first? A practical way is to prioritise each set of code clones, based on the following formula:

Now, how does one detect code clones? These days, real-world (open as well as closed source) software applications easily cross a million lines of code. Manually finding duplicate code segments is impossible in such large code bases, and the only practical option is to use automated clone detection tools. Given the importance of detecting code clones, it is not surprising to see a proliferation of automated tools, both commercial and open source, to detect duplicate code. For example, Simian is a commercial tool (see www.harukizaemon.com/simian) and PMD’s CPD (Copy Paste Detector) (see pmd.sourceforge.net/cpd.html) is an open source tool. However, remember that clone

S.G.Ganesh
