A Sneak Peek into the World of Garbage Collection

This is not about hygiene and sanitation. In Java, garbage refers to unreferenced objects, and memory is used more efficiently when these objects are removed. Garbage collection is the process of automatically reclaiming memory that is no longer in use at runtime.

OpenSource For You

There was a time when developers were under constant pressure to write efficient code that kept memory usage in check and cleaned up unused or idle resources. This constraint not only added to the developer’s woes but also made the code lengthy and repetitive. Java, the object-oriented language, came up with a brilliant mechanism for garbage collection. It eased the load on coders by running daemon threads that automatically reclaim memory as and when required and make it available to the program again.

Now, the question is: where does Java keep its objects? Since Java follows the OOP paradigm, almost everything a program works with is an object, and every object is stored in Heap memory. Whenever objects are created, they occupy some space in the Heap and are then referenced as per usage. From time to time, the JVM identifies objects that are no longer referenced and selects them for deletion; an object that is merely idle but still referenced is not collected. Garbage collection in Java thus means tracking down all the objects that are still in use and treating the rest as garbage.
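The idea that an object becomes garbage once nothing refers to it can be observed with a small sketch. It is only an illustration, not a description of how the JVM tracks references internally: the class name ReachabilityDemo and its helper isAlive are made up for this example, while WeakReference and System.gc() are standard Java APIs. Since System.gc() is only a request, the second line of output may differ between JVMs and runs.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; the name is not from the article.
class ReachabilityDemo {

    // True while the object behind the weak reference has not been reclaimed.
    static boolean isAlive(WeakReference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        // A weak reference does not keep its target alive on its own.
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.out.println("strongly referenced, alive = " + isAlive(weak));

        strong = null; // drop the only strong reference: the object is now unreachable
        System.gc();   // a hint to the JVM, not a guarantee that a collection runs
        System.out.println("after dropping the reference, alive = " + isAlive(weak));
    }
}
```

On many JVMs the second line reports false, but the program deliberately does not rely on it.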

Before getting into garbage collection, let’s understand the memory pool of Java, i.e., how the Heap is divided into different segments and where objects are present within the Heap at different times. To start with, the Heap is divided into the Young and Tenured (Old) generations, along with the PermGen space. The Young generation is further divided into Eden, Survivor1 and Survivor2. When created, objects reside in Eden; if garbage collection (GC) does not free up sufficient space inside Eden, the object is allocated in the Old generation instead.

When garbage is being collected from Eden, the GC walks from the roots to all reachable objects and marks them as alive; this process is called marking. After marking, all the live objects are copied from Eden to a Survivor space, after which the whole of Eden is empty and can be reused to allocate more objects. This entire process is called Mark and Copy. Beside Eden lie the two Survivor spaces, 1 and 2, one of which is always empty. It is this empty space that starts getting filled once the GC works on Eden.
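The movement of objects through Eden and the two Survivor spaces can be sketched as a toy model. This is a simplification under stated assumptions, not the JVM's implementation: the class YoungGenSketch, its fields and the tenuring threshold of 3 are all hypothetical, marking is assumed to have been done already (the caller passes in the set of live objects), and space sizes are ignored.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of Mark and Copy in the Young generation (all names are hypothetical).
class YoungGenSketch {
    static final int TENURING_THRESHOLD = 3; // assumed value, chosen for the demo

    static final class Obj {
        final String name;
        int age = 0; // number of collections survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // the occupied Survivor space
    List<Obj> to = new ArrayList<>();   // the Survivor space that is kept empty
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // new objects start in Eden
        return o;
    }

    // 'live' is the result of marking: every object reachable from the roots.
    void minorGc(Set<Obj> live) {
        copySurvivors(eden, live);
        copySurvivors(from, live);
        eden.clear(); // Eden is now completely free
        from.clear(); // so is the previously occupied Survivor space
        List<Obj> tmp = from; // swap roles: the filled space becomes 'from'
        from = to;
        to = tmp;
    }

    private void copySurvivors(List<Obj> space, Set<Obj> live) {
        for (Obj o : space) {
            if (!live.contains(o)) continue; // garbage is simply not copied
            o.age++;
            if (o.age >= TENURING_THRESHOLD) old.add(o); // mature: promote
            else to.add(o);                              // young: copy to empty Survivor
        }
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch();
        Obj a = heap.allocate("a");
        heap.allocate("b"); // never referenced again, so it is garbage
        Set<Obj> live = new HashSet<>();
        live.add(a);
        for (int cycle = 1; cycle <= TENURING_THRESHOLD; cycle++) {
            heap.minorGc(live);
            System.out.println("cycle " + cycle + ": eden=" + heap.eden.size()
                    + " survivor=" + heap.from.size() + " old=" + heap.old.size());
        }
        // cycle 1: eden=0 survivor=1 old=0
        // cycle 2: eden=0 survivor=1 old=0
        // cycle 3: eden=0 survivor=0 old=1
    }
}
```

Object "a" bounces between the Survivor spaces until its age reaches the threshold, at which point it is promoted; object "b" disappears at the first collection because it is never copied.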

The copying of live objects between the Survivor spaces is repeated several times, until some objects are considered mature enough and are promoted to the Old generation, where they reside until they become unreachable. In contrast to the Young generation, the Old generation is quite large, and GC runs here less frequently. When GC takes place in the Old generation, the following happens: a) Reachable objects are marked by setting the marked bit next to all objects accessible through GC roots; b) All unreachable objects are deleted; and c) The content of the Old space is compacted by copying live objects contiguously to the beginning of the Old space.

Next comes the Permanent Generation, a.k.a. PermGen, which existed prior to Java 8 and was used to store class metadata. From Java 8 onwards, class definitions are loaded into metaspace, which is located in native memory and does not interfere with regular Heap objects. By default, the metaspace size is limited only by the amount of native memory available to the Java process.
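To see how a running JVM reports these regions, the standard java.lang.management API can list its memory pools. The sketch below is a minimal example; the class name MemoryPoolsSketch and the helper countPools are made up here, and the pool names that get printed depend on the JVM version and the collector in use, so no particular output should be assumed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class for inspecting the JVM's memory pools.
class MemoryPoolsSketch {

    // Counts the pools of the given type (HEAP or NON_HEAP).
    static long countPools(MemoryType type) {
        long n = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            String used = (usage == null) ? "n/a" : (usage.getUsed() / 1024) + " KB";
            // getMax() returns -1 when no upper limit is defined for the pool
            String max = (usage == null || usage.getMax() < 0)
                    ? "undefined" : (usage.getMax() / 1024) + " KB";
            System.out.println(pool.getType() + " | " + pool.getName()
                    + " | used=" + used + " | max=" + max);
        }
    }
}
```

On Java 8 and later HotSpot JVMs, a pool named Metaspace usually shows up as non-heap memory, often with an undefined maximum, which matches the default behaviour described above.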

Moving on, GC is mainly categorised into three types, namely, Minor GC, Major GC and Full GC. Minor GC works on the Young generation and is triggered when the JVM is unable to allocate space for a new object. It triggers a stop-the-world pause, suspending all application threads. Major GC runs on the Old generation, while Full GC cleans the entire Heap.
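Allocation pressure as a trigger for collections can be observed, roughly, by creating many short-lived objects and reading the collection counters before and after. This is a hedged sketch: GcPressureSketch, totalCollections and churn are hypothetical names, the array size and iteration count are arbitrary, and whether (and how often) the JVM actually collects depends on the heap size and collector, so the printed number will vary.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo: create garbage and watch the collection counters move.
class GcPressureSketch {

    // Sum of collections reported by all collectors of this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    // Allocates 'iterations' arrays of 64 KB each; every array becomes garbage immediately.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] shortLived = new byte[64 * 1024];
            shortLived[0] = (byte) i;   // touch the array so the allocation is used
            checksum += shortLived[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        churn(20_000); // roughly 1.2 GB allocated in total, almost all of it short-lived
        long after = totalCollections();
        System.out.println("collections during churn: " + (after - before));
    }
}
```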

Java implements different algorithms to achieve GC and, to understand them, let’s start with some basic terminology. Marking reachable objects: GC defines some specific objects as GC roots. Examples are local variables and input parameters of currently executing methods, active threads, static fields of loaded classes, JNI references, etc. Starting from these roots, the GC follows references and marks every object it can reach as alive. This phase involves a stop-the-world pause.
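Marking is essentially a graph traversal that starts at the GC roots. The following sketch models it on a tiny hand-built object graph; the MarkSketch and Node classes are hypothetical stand-ins, and a real collector works on raw heap memory rather than on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Toy model of the marking phase (all names are hypothetical).
class MarkSketch {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        boolean marked;                            // the "marked bit"
        Node(String name) { this.name = name; }
    }

    // Marks every node reachable from the given roots.
    static void mark(Collection<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue; // already visited; this also handles cycles
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        b.refs.add(c);
        c.refs.add(a); // a cycle among reachable objects
        d.refs.add(a); // d points at a live object, but nothing points at d

        mark(Arrays.asList(a)); // 'a' plays the role of a GC root
        System.out.println("a=" + a.marked + " b=" + b.marked
                + " c=" + c.marked + " d=" + d.marked);
        // a=true b=true c=true d=false
    }
}
```

Note that d stays unmarked even though it holds a reference to a live object: what matters is whether something reachable points to it, not what it points to.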

Removing unused objects: Removal of unused objects differs between GC algorithms, but all such algorithms can be divided into three groups: sweeping, compacting and copying.

Sweeping or ‘mark and sweep’ means that after the marking phase is complete, all space occupied by unvisited objects is considered free and can thus be reused to allocate new objects.

Compacting or ‘mark-sweep-compact’ solves the main shortcoming of ‘mark and sweep’, namely the fragmented free space it leaves behind, by moving all marked (and thus alive) objects to the beginning of the memory region.

Copying or ‘mark and copy’ algorithms are very similar to ‘mark and compact’, as they too relocate all live objects. The important difference is that the target of relocation is a different memory region, which becomes the new home for survivors.
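The difference between the three groups is easiest to see on a tiny model of memory. In the sketch below, a memory region is an array of slots and the result of marking is a parallel array of booleans; the class ReclaimSketch and its methods are hypothetical, and real collectors deal with objects of varying sizes and must also update references when they move objects, which is ignored here.

```java
import java.util.Arrays;

// Toy model of sweeping, compacting and copying (all names are hypothetical).
class ReclaimSketch {

    // Sweep: free the slots of unmarked objects, leaving survivors where they are.
    static void sweep(String[] heap, boolean[] live) {
        for (int i = 0; i < heap.length; i++) {
            if (!live[i]) heap[i] = null;
        }
    }

    // Compact: slide survivors to the beginning of the same region.
    static void compact(String[] heap, boolean[] live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) heap[next++] = heap[i];
        }
        Arrays.fill(heap, next, heap.length, null); // the tail is one contiguous free block
    }

    // Copy: move survivors into a different region; the old region becomes entirely free.
    static String[] copy(String[] from, boolean[] live) {
        String[] to = new String[from.length];
        int next = 0;
        for (int i = 0; i < from.length; i++) {
            if (live[i]) to[next++] = from[i];
        }
        Arrays.fill(from, null);
        return to;
    }

    public static void main(String[] args) {
        String[] objects = {"A", "B", "C", "D", "E"};
        boolean[] live = {true, false, true, false, true};

        String[] swept = objects.clone();
        sweep(swept, live);
        System.out.println(Arrays.toString(swept));     // [A, null, C, null, E]

        String[] compacted = objects.clone();
        compact(compacted, live);
        System.out.println(Arrays.toString(compacted)); // [A, C, E, null, null]

        String[] from = objects.clone();
        String[] to = copy(from, live);
        System.out.println(Arrays.toString(to));        // [A, C, E, null, null]
        System.out.println(Arrays.toString(from));      // [null, null, null, null, null]
    }
}
```

After sweeping, the free slots are scattered between survivors; after compacting or copying, the free space is contiguous, which makes subsequent allocation simpler.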

We are done with all the basics and the terminology. Let’s now explore the different GC algorithms. Basically, these can be put in four categories, namely, Serial, Parallel, CMS and G1.

Serial GC: Here, the GC uses mark-copy for the Young generation and mark-sweep-compact for the Old generation. Both collectors are single-threaded and trigger stop-the-world pauses, stopping all application threads. This GC algorithm thus cannot take advantage of the multiple CPU cores commonly found in modern hardware.

Parallel GC: This uses mark-copy in the Young generation and mark-sweep-compact in the Old generation. Both Young and Old collections trigger stop-the-world events, stopping all application threads while garbage collection is performed. It uses multiple threads and is thus suitable for multi-core machines in cases where your primary goal is to increase throughput. High throughput is achieved through effective use of system resources.
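Which collectors a JVM is actually running can be checked from inside the program. The sketch below lists them through the standard GarbageCollectorMXBean API; the class name CollectorsSketch is made up, and the names mentioned in the comments are what HotSpot JVMs commonly report, which may differ on other JVMs or versions. The -XX:+UseSerialGC and -XX:+UseParallelGC options are HotSpot flags for selecting the two collectors described above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that reports the collectors in use.
class CollectorsSketch {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // After compiling, try for example:
        //   java -XX:+UseSerialGC   CollectorsSketch
        //   java -XX:+UseParallelGC CollectorsSketch
        // With Serial GC, HotSpot commonly reports "Copy" and "MarkSweepCompact";
        // with Parallel GC, "PS Scavenge" and "PS MarkSweep". Treat these as typical, not guaranteed.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " manages "
                    + Arrays.toString(gc.getMemoryPoolNames()));
        }
    }
}
```

Each collector is listed together with the memory pools it manages, which shows one collector covering only the Young generation spaces and another covering the Old generation as well.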

Figure 1: The memory structure of Heap

Figure 2: Marking live objects in the Heap

Figure 3: Memory status after sweep

Figure 4: Memory status after compacting

Figure 5: Memory status after copying

Figure 6: Different algorithms for garbage collection
