In this month’s column, we discuss a few more questions on machine learning and deep learning.

OpenSource For You - Sandya Mannarswamy

As we have been doing over the last couple of months, we will continue to discuss a few more interview questions in this month’s column as well, focusing on topics in machine learning. Let us start off with a simple question.

1. Let us assume that you have a neural network in which there is only an input layer (with a single input X being a number) and an output layer with a single output neuron. There are no hidden layers. You can assume any kind of output layer activation, be it linear, sigmoid, softmax, tanh or ReLU. The output needs to be 0 for X < 6 and 1 for X > 6. How would you design your neural network?
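As a hint for Question (1), here is a minimal sketch of one possible answer: a single sigmoid output neuron with output = sigmoid(wX + b), where b = -6w places the transition at X = 6 and a large w makes it sharp (w = 100 is an arbitrary illustrative choice).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 100.0, -600.0  # b = -6 * w centres the transition at X = 6

def predict(x):
    # Output of the single sigmoid neuron for input x.
    return sigmoid(w * x + b)

print(predict(5.0))   # very close to 0
print(predict(7.0))   # very close to 1
```

The larger w is, the closer this gets to a true step function; a linear activation, by contrast, cannot produce the required flat 0/1 regions.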

2. Is it possible for a classifier to have both high bias and high variance? In that case, would the training set error be greater than the test set error or vice versa? How would you recognise such a situation?
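A rough diagnostic sketch for Question (2), assuming a near-zero Bayes error: high bias shows up as a large training error, high variance as a large gap between training and validation error, and nothing prevents both from occurring at once. The threshold `tol` is an arbitrary illustrative value.

```python
def diagnose(train_err, val_err, bayes_err=0.0, tol=0.02):
    # High bias: the model cannot even fit the training data well.
    high_bias = (train_err - bayes_err) > tol
    # High variance: the model fails to generalise beyond the training data.
    high_variance = (val_err - train_err) > tol
    return high_bias, high_variance

print(diagnose(0.15, 0.30))  # both high bias and high variance
print(diagnose(0.01, 0.02))  # neither
```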

3. You are working on developing a part of a large image classification system of indoor household images taken from different cameras placed inside the house. Such a system can be part of a smart home project or for home security considerations. Since many of your customers have pet cats, your system needs to be able to recognise cats. You have designed a machine learning system that can recognise cat images from non-cat images. Note that the household cameras being used in the setup are quite cheap, and can end up producing poor quality images due to variations in lighting, camera motion, etc. Since there are zillions of cat images on the Internet, you have decided to train your system using pet images from the Internet, which are typically high quality. You have downloaded 100,000 labelled pet images from the Internet to train your cat-recognition classifier. You have also obtained 5000 home-camera generated images, which may include cat images as well. You are planning to use 2500 of these 5000 images as the validation set for tuning the hyperparameters. The remaining 2500 images would be used as the test set for final evaluation. After you have finished developing the initial model and are about to start the next step, which is the hyperparameter tuning exercise, your friends tell you that your methodology is incorrect because your training set and validation set come from different data distributions and, hence, it is likely that your chosen model would perform badly on the test set. Would you agree with them? If not, explain why. If yes, explain how you would change your methodology. Note that you cannot increase the amount of indoor household camera image data, which is only 5000 images, of which a few hundred are cat images.

4. You are designing a classifier which will analyse data relating to Bengaluru traffic fines and build a classification system that will predict whether the fine for a particular violation will be paid or not. The input data set has information on the offender, such as the name, residential street name, house number, locality PIN code, whether the person is a repeat offender, age, gender, information regarding what type of traffic violation occurred, the date of the violation, where it happened, and information regarding the fine, such as the amount, any late fee, administrative charges, etc. Note that the different variables in the input data set are of different data types, with the type of traffic violation represented in string format, the residential street number being a numerical quantity, and the amount of the fine being a floating point number. The target variable you are predicting is a binary variable, which is 1 if the fine will be paid, or 0 if it will not be paid. You have carefully analysed the data set, and have included both the age of the offender and the residential street number as features in your classifier. Remember that both of these are numeric quantities. While you are confident of your system working well, your friends suggest that using the ‘street number’ as a numeric feature is incorrect. They suggest that it has to be combined with the ‘street name’ and the combined quantity needs to be encoded as a ‘categorical’ variable. Do you agree with your friends? If yes, explain why. If not, explain why you believe their reasoning is incorrect.
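A minimal sketch of the friends’ suggestion in Question (4): combine street name and street number into one categorical value and one-hot encode it, so the model cannot impose a spurious numeric ordering on street numbers. The column names and sample values here are hypothetical.

```python
# Hypothetical records from the traffic-fine data set.
records = [
    {"street_name": "MG Road", "street_number": 12},
    {"street_name": "MG Road", "street_number": 48},
    {"street_name": "Brigade Road", "street_number": 12},
]

# Combine name and number into a single categorical value per record.
combined = [f'{r["street_name"]}#{r["street_number"]}' for r in records]

# One-hot encode: each distinct (name, number) pair gets its own column.
categories = sorted(set(combined))
one_hot = [[1 if c == cat else 0 for cat in categories] for c in combined]

print(categories)
print(one_hot)
```

Note that street number 12 on MG Road and street number 12 on Brigade Road now map to different columns, which is exactly the point of combining the two fields.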

5. Let us consider Question (3) from above. You decided to add 3000 images from the indoor cameras to the training set of 100,000 Internet pet images. You then decided to use 1000 images from the indoor cameras as the validation/development set for hyperparameter tuning. The remaining 1000 indoor camera images will be used as the test set. Given that you have only 3000 images in your training set which are actually from the application for which you are building your classifier, you still have the issue of the training and validation/test data distributions being different. If your trained model does not do well on the validation/development data set, how would you find out whether it is because of under-fitting or not being able to generalise well? Essentially, can you come up with a method of splitting your input data set so that you can find and fix the issue of the classifier under-fitting to the training data?
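The split described in Question (5) can be sketched as follows (purely illustrative, using placeholder image IDs): the indoor-camera images are the target distribution, so the dev and test sets come entirely from them, while a 3000-image slice also augments the Internet-image training set.

```python
internet_images = list(range(100_000))  # placeholder IDs: web pet images
indoor_images = list(range(5_000))      # placeholder IDs: indoor camera images

train = internet_images + indoor_images[:3000]  # 103,000 training images
dev = indoor_images[3000:4000]                  # 1,000 dev images
test = indoor_images[4000:5000]                 # 1,000 test images

print(len(train), len(dev), len(test))
```

In a real pipeline the indoor images would be shuffled before slicing so that the train/dev/test subsets are statistically similar.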

6. One way to address the issue mentioned in Question (5) above is to split the training set into two parts, with one part being used for training and the other part, known as the training-dev set, being used to evaluate whether the classifier can fit data coming from the same distribution as the training data. Note that this ‘training-dev’ split is needed only when the training data distribution and the validation/test data distribution are different. The purpose of the ‘training-dev’ data set is to help determine whether the classifier can generalise well to unseen held-out data coming from the same distribution as the training data. You are told that the training error is 4 per cent and the training-dev error is 9 per cent. You have built your classifier using an N-layer feedforward neural network. Now how would you improve the performance of the classifier on the training-dev set? Would you choose to increase the number of hidden layers or would you choose to increase the size of the training data? Explain the rationale behind your choice.
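Reading the error gaps in Question (6) with a common heuristic (not the only possible reading): since the training error and training-dev error are measured on the same distribution, the gap between them reflects variance, i.e., a failure to generalise, rather than under-fitting.

```python
train_err, training_dev_err = 0.04, 0.09

# Gap on the SAME distribution: a variance (generalisation) problem,
# which more training data or regularisation is more likely to fix
# than adding hidden layers (which would increase capacity, not reduce
# overfitting).
variance_gap = training_dev_err - train_err
print(f"variance gap: {variance_gap:.2f}")
```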

7. In last month’s column, we had discussed full-batch gradient descent, mini-batch gradient descent and stochastic gradient descent for the optimisation of the cost function of neural networks. In stochastic gradient descent and mini-batch gradient descent, the optimisation process can lead to local oscillations as it tries to move towards the minimum. This can result in the algorithm taking a larger number of steps towards convergence. One way of addressing this issue is to modify the gradient descent algorithm to include momentum. Gradient descent with momentum, ‘RMSProp’ and ‘Adam’ are all different variants which address this issue. Basically, instead of updating the parameters using only the gradients computed in the current iteration, a weighted average of the gradients from previous iterations is used for the update as well. Do you need gradient descent with momentum if you are told that your cost function is convex?
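The momentum update described above can be sketched on a toy convex quadratic f(w) = 0.5 w², whose gradient is simply w. The learning rate and the momentum coefficient beta are arbitrary illustrative values.

```python
def momentum_descent(w0, lr=0.1, beta=0.9, steps=200):
    w, v = w0, 0.0
    for _ in range(steps):
        grad = w                            # gradient of 0.5 * w**2
        v = beta * v + (1 - beta) * grad    # exponentially weighted average
        w = w - lr * v                      # update uses the averaged gradient
    return w

print(momentum_descent(5.0))  # converges towards the minimum at w = 0
```

The averaged velocity `v` damps oscillations across iterations while preserving the consistent component of the gradient direction.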

8. You are planning to consider using mini-batch gradient descent with momentum. Should you keep the momentum term larger or smaller during the initial part of the training? Explain the rationale behind your choice.

9. Consider the data set presented in Question (4) about the Bengaluru traffic fine payment prediction problem. This data set has a number of features which are on different scales. Your friends suggest that you should normalise the inputs first. Can you explain the process of input normalisation? Does it improve prediction accuracy? If yes, explain how. If not, what is the reason behind normalising the inputs to a neural network? We all know that the weights W in a feedforward neural network should not be initialised to zero. Can you explain why? Given this constraint, how would you decide the initialisation of the weights? Can you use any random values?
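As a starting point for Question (9), here is a minimal sketch of input normalisation and small random weight initialisation. The toy two-feature matrix is hypothetical; in practice the mean and standard deviation must be computed on the training set only and reused on the dev/test sets.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical features on very different scales (e.g., age, fine amount).
X_train = rng.normal(loc=[50.0, 3000.0], scale=[10.0, 800.0], size=(1000, 2))

# Normalise: subtract the per-feature mean, divide by the per-feature std.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_norm = (X_train - mu) / sigma   # each column now has mean ~0, std ~1

print(X_norm.mean(axis=0).round(6), X_norm.std(axis=0).round(6))

# Small random (non-zero) initialisation breaks the symmetry between hidden
# units; scaling by 1/sqrt(fan_in) keeps early activations from saturating.
fan_in, fan_out = 2, 4
W = rng.normal(size=(fan_in, fan_out)) * np.sqrt(1.0 / fan_in)
print(W.shape)
```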

10. You have been asked to set up a deep learning system that can recognise the faces of employees and send signals to open the entrance gate to the laboratory as an employee approaches the gate. You have heard a lot about the advantages of end-to-end deep learning. In this case, if you decide to go with the end-to-end deep learning approach, you only provide the image of the employee approaching the door, and the neural network produces an open/close signal output which can be used to drive the entrance gate. The alternative is to employ a pipeline approach, wherein the first phase identifies the face in the image and puts a bounding box around it, and the next phase does the face match recognition to check whether the approaching person is a validated employee of the company. Which approach would you choose? Explain the rationale behind your choice.

If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com.
