# CodeS­port

## In this month’s col­umn, we dis­cuss a few com­puter sci­ence in­ter­view ques­tions.

OpenSource For You - - Contents -

Over the past cou­ple of months, we have been dis­cussing the prob­lem of de­tect­ing du­pli­cate ques­tions in com­mu­nity ques­tion an­swer­ing (CQA) fo­rums us­ing the Quora Ques­tion Pair Dataset. In this month’s col­umn, we take a break from this topic and, as re­quested by our read­ers, dis­cuss a few in­ter­view ques­tions, fo­cus­ing on al­go­rithms, ma­chine learn­ing and text an­a­lyt­ics.

1. In ma­chine learn­ing, there are three classes of er­rors – namely, train­ing er­ror, val­i­da­tion er­ror and test er­ror. Can you dif­fer­en­ti­ate be­tween each of these three classes of er­rors? Dur­ing train­ing, we fo­cus on re­duc­ing the train­ing er­ror. In real world de­ploy­ment, when the clas­si­fier is em­ployed on un­seen data, how can re­duc­ing train­ing er­rors help to pre­dict bet­ter?

2. There are two clas­si­fiers you have trained for the given im­age clas­si­fi­ca­tion prob­lem. Clas­si­fier A has a train­ing er­ror of 4.2 per cent and Clas­si­fier B has a train­ing er­ror of 0.8 per cent. If you are us­ing Clas­si­fiers A and B on a held-out test data set, what can you say about the test ac­cu­ra­cies of A and B? Would B have a bet­ter test ac­cu­racy than A? Pro­vide the ra­tio­nale be­hind your an­swer.

3. Can you ex­plain the con­cept of gen­er­al­i­sa­tion in the con­text of ma­chine learn­ing? What are the mea­sures that can be used to ex­plain the gen­er­al­i­sa­tion abil­ity of learn­ing sys­tems?

4. What is meant by the Vap­nik–Cher­vo­nenkis (VC) di­men­sion in ma­chine learn­ing? Why is this an im­por­tant clas­si­fier? What is the VC di­men­sion of a lin­ear clas­si­fier?

5. Auto-en­coders are a class of un­su­per­vised deep learn­ing sys­tems that take an in­put X and pro­duce an out­put Y, which is as close to X as pos­si­ble. Au­toen­coders can be a deep multi-layer neu­ral net­work. Since they try to re­pro­duce the in­put given to them at the out­put, are they learn­ing an iden­tity func­tion? What’s the ra­tio­nale be­hind your an­swer?

6. As you know, while train­ing a deep learn­ing net­work, the op­ti­mi­sa­tion ob­jec­tive is to min­imise the loss func­tion de­fined for that clas­si­fi­ca­tion prob­lem. Given that non-con­vex op­ti­mi­sa­tion func­tions can have mul­ti­ple lo­cal min­ima, do firstorder gra­di­ent de­scent based op­ti­mi­sa­tion meth­ods en­sure that we reach a global min­i­mum? If you think they do, ex­plain how. If not, ex­plain why. 7. Dis­tri­bu­tional se­man­tics is based on the idea that se­man­tic sim­i­lar­i­ties can be char­ac­terised based on the dis­tri­bu­tional prop­er­ties of the words in a large cor­pus. This is pop­u­larly ex­pressed by the state­ment, “A word is known by the com­pany it keeps.” If two words have sim­i­lar distri­bu­tions or sim­i­lar con­texts, then they have sim­i­lar mean­ings. Can you give a cou­ple of ex­am­ples of the ap­pli­ca­tions of dis­tri­bu­tional se­man­tics in nat­u­ral lan­guage pro­cess­ing?

8. Tra­di­tion­ally, text in­puts to sta­tis­ti­cal NLP sys­tems have used one hot vec­tor rep­re­sen­ta­tion of words.

For in­stance, given a vo­cab­u­lary of size V, each word is rep­re­sented by a vec­tor of size V, where only one in­dex is set to one, cor­re­spond­ing to that word, and all the re­main­ing in­dices in the vec­tor are set to zero. This is known as the ‘one hot vec­tor rep­re­sen­ta­tion’. How­ever, of late, vec­tor space word mod­els are be­com­ing quite pop­u­lar as al­ter­na­tive in­put rep­re­sen­ta­tions. Mod­els such as Word2Vec (https://radim­re­hurek.com/gen­sim/mod­els/word2vec. html) and Glove (http://nlp.stan­ford.edu/projects/ glove/) are quite pop­u­lar these days for in­put rep­re­sen­ta­tion in sta­tis­ti­cal NLP sys­tems, as op­posed to rep­re­sent­ing words as one hot vec­tor. What is the ad­van­tage of us­ing vec­tor space word mod­els over tra­di­tional one hot vec­tor rep­re­sen­ta­tion? Are there any dis­ad­van­tages with such vec­tor space mod­els? 9. You are given a cor­pus and you have cre­ated the word em­bed­dings for your cor­pus us­ing the pop­u­lar Word2Vec tech­nique. Now you need to rep­re­sent each sen­tence in your cor­pus by means of a vec­tor. You need to com­pose the word vec­tors of the words in your sen­tence to gen­er­ate a sen­tence vec­tor

rep­re­sen­ta­tion. What are the pos­si­ble com­po­si­tional mod­els you can ap­ply to gen­er­ate the sen­tence vec­tor?

10. In in­for­ma­tion re­trieval, you are given a set of doc­u­ments and a query. You are asked to re­trieve the set of doc­u­ments that are most rel­e­vant to the query. Many of the com­mon in­for­ma­tion re­trieval tech­niques are built around the mea­sure of TF-IDF, where TF stands for the term fre­quency, i.e., the num­ber of times a query key­word ap­pears in a doc­u­ment. It is ob­vi­ous that one can com­pute the term fre­quency for the query terms, and re­turn a ranked list of doc­u­ments in which the query terms ap­pear most fre­quently. If this suf­fices to re­turn the most rel­e­vant doc­u­ments, why do we need the mea­sure of the in­verse doc­u­ment fre­quency (IDF)? Can you il­lus­trate with an ex­am­ple why IDF is needed in in­for­ma­tion re­trieval?

11. You have been given the prob­lem of sen­ti­ment anal­y­sis, where, given a sen­tence, you need to clas­sify it as bear­ing pos­i­tive, neg­a­tive or neu­tral sen­ti­ments. You have been given large quan­ti­ties of la­belled data of ex­am­ple sen­tences, with their tar­get sen­ti­ment po­lar­ity an­no­tated. Given that there can be words in the test in­put sen­tences that are not present in the train­ing set ex­am­ples, how would your ap­proach han­dle the ‘out of vo­cab­u­lary’ words in the test sen­tences? If you de­cide to just ig­nore the ‘out of vo­cab­u­lary’ words, what would be the im­pact on the test set clas­si­fi­ca­tion ac­cu­racy? Can you de­vise a bet­ter ap­proach to han­dling this is­sue?

12. In ma­chine learn­ing, the met­rics used in com­par­ing the clas­si­fier’s per­for­mance among dif­fer­ent mod­els are pre­ci­sion, re­call, area un­der the Re­ceiver Op­er­at­ing Char­ac­ter­is­tic (ROC) curve, and the con­fu­sion ma­trix. Can you ex­plain each of these terms?

13. You are asked to build a sys­tem that can do ‘parts of speech’ tag­ging for a given sen­tence. While ‘parts of speech’ tag­ging as­signs in­de­pen­dent la­bels for each of the words in the sen­tence, it uses the se­quence of words it has seen in the con­text to de­cide on the cur­rent la­bel. For ex­am­ple, con­sider the sen­tence: “Robert found the saw on the ta­ble and was tak­ing it out­side when he saw the snake.” The first oc­cur­rence of ‘saw’ in the above sen­tence is a verb phrase whereas the sec­ond oc­cur­rence of it is a noun phrase. By look­ing at the se­quence of words oc­cur­ring around the spe­cific word, a POS tag­ging sys­tem can dis­tin­guish be­tween the tags it should as­sign for each of the two oc­cur­rences of the word ‘saw’. Hence, POS tag­ging is an ex­am­ple of se­quence la­belling. Two of the im­por­tant al­go­rithms for se­quence la­belling are Hid­den Markov Mod­els and Con­di­tional Ran­dom Fields. Can you ex­plain each of them?

14. If you are asked to do se­quence la­belling with a deep neu­ral net­work, what type of neu­ral net­work would you em­ploy? Ex­plain the rea­son be­hind your choice.

15. Given an ar­ray of N in­te­gers, you are asked to find out whether the ar­ray is an arith­metic se­ries. An arith­metic se­ries is a se­quence of num­bers where there is a con­stant dif­fer­ence be­tween any two con­sec­u­tive num­bers of the se­ries. What is the com­plex­ity of your func­tion? Now, you are given a mod­i­fied prob­lem for which your task is to con­vert a given ar­ray into an arith­metic se­quence by drop­ping off those ar­ray el­e­ments that pre­vent it from be­ing an arith­metic se­quence. How would you solve this prob­lem? What is the time com­plex­ity of your func­tion?

16. You are given a list of N in­te­gers, where the num­bers can only be pos­i­tive. You are now be­ing asked to write code to cre­ate two lists of size N/2 such that the sum of the el­e­ments of the two sub-lists are equal, if at all it is pos­si­ble to cre­ate such sub-lists. Your func­tion should re­turn the sum of the sub-list el­e­ments if it is pos­si­ble to split the orig­i­nal list into two such sub-lists or re­turn zero if that is not pos­si­ble. What is the time com­plex­ity of your func­tion? (Note that you can move the el­e­ments of the list.) You are now told that the orig­i­nal list can con­tain ei­ther pos­i­tive or neg­a­tive in­te­gers and again told to solve the above prob­lem. What are the changes you would make to your ear­lier so­lu­tion?

17. Given that the same words can have dif­fer­ent mean­ings (this is known as pol­y­semy), how do in­for­ma­tion re­trieval sys­tems fig­ure out what is the spe­cific in­for­ma­tion that is be­ing searched for? For ex­am­ple, con­sider a user who searches for ‘win­dows’. How would the search en­gine de­ter­mine whether he/she is look­ing for Mi­crosoft Win­dows or glass win­dows? Blindly dis­play­ing search re­sults for both mean­ings is likely to im­pact the user ex­pe­ri­ence. If you are the de­signer of the search en­gine, ex­plain how you would ap­proach this prob­lem? Con­sider the re­lated prob­lem. When the user types in the search query as ‘au­to­mo­bile’, how does the search en­gine in­clude Web pages that talk about cars as well?

18. What is an en­sem­ble of clas­si­fiers? We find that, in many cases, an en­sem­ble of clas­si­fiers achieves a bet­ter per­for­mance com­pared to any in­di­vid­ual clas­si­fier model. Can you ex­plain the in­tu­ition be­hind the per­for­mance im­prove­ments achieved by en­sem­ble clas­si­fiers?

19. A well-known phrase used in ma­chine learn­ing and statis­tics is ‘cor­re­la­tion is not the same as cau­sa­tion’. Can you give an ex­am­ple to il­lus­trate this state­ment?

20. There are three kinds of an­a­lyt­ics – namely, pre­scrip­tive an­a­lyt­ics, de­scrip­tive an­a­lyt­ics and pre­dic­tive an­a­lyt­ics. Can you ex­plain each of these terms and how they are dif­fer­ent from each other?

If you have any favourite pro­gram­ming ques­tions/soft­ware top­ics that you would like to dis­cuss on this fo­rum, please send them to me, along with your so­lu­tions and feed­back, at sandyas­m_AT_ya­hoo_DOT_­com. Wish­ing all our read­ers happy cod­ing un­til next month!

Sandya Man­nar­swamy

### Newspapers from India

© PressReader. All rights reserved.