OpenSource For You

CodeSport

In this month’s column, we discuss a few computer science interview questions.


Over the past couple of months, we have been discussing the problem of detecting duplicate questions in community question answering (CQA) forums using the Quora Question Pair Dataset. In this month’s column, we take a break from this topic and, as requested by our readers, discuss a few interview questions, focusing on algorithms, machine learning and text analytics.

1. In machine learning, there are three classes of errors, namely training error, validation error and test error. Can you differentiate between these three classes of errors? During training, we focus on reducing the training error. In real-world deployment, the classifier is applied to unseen data; how does reducing the training error help it predict better on such data?
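To make the three data splits behind those three error classes concrete, here is a minimal sketch in plain Python (the split fractions, seed and function name are illustrative choices, not part of the question):

```python
import random

def train_val_test_split(data, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle and split a dataset into train/validation/test portions."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]               # held out until final evaluation (test error)
    val = items[n_test:n_test + n_val]  # used for model selection (validation error)
    train = items[n_test + n_val:]      # used to minimise the training error
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```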

2. You have trained two classifiers for a given image classification problem. Classifier A has a training error of 4.2 per cent and Classifier B has a training error of 0.8 per cent. If you use Classifiers A and B on a held-out test data set, what can you say about their test accuracies? Would B have a better test accuracy than A? Provide the rationale behind your answer.

3. Can you explain the concept of generalisation in the context of machine learning? What measures can be used to characterise the generalisation ability of learning systems?

4. What is meant by the Vapnik–Chervonenkis (VC) dimension in machine learning? Why is it an important property of a classifier? What is the VC dimension of a linear classifier?

5. Auto-encoders are a class of unsupervised deep learning systems that take an input X and produce an output Y that is as close to X as possible. An auto-encoder can be a deep multi-layer neural network. Since auto-encoders try to reproduce their input at the output, are they simply learning the identity function? What is the rationale behind your answer?

6. As you know, while training a deep learning network, the optimisation objective is to minimise the loss function defined for that classification problem. Given that non-convex optimisation functions can have multiple local minima, do first-order gradient descent based optimisation methods ensure that we reach a global minimum? If you think they do, explain how. If not, explain why.

7. Distributional semantics is based on the idea that semantic similarities can be characterised by the distributional properties of words in a large corpus. This is popularly expressed by the statement, “A word is known by the company it keeps.” If two words have similar distributions, or similar contexts, then they have similar meanings. Can you give a couple of examples of applications of distributional semantics in natural language processing?

8. Traditionally, text inputs to statistical NLP systems have used a one-hot vector representation of words. For instance, given a vocabulary of size V, each word is represented by a vector of size V in which only the index corresponding to that word is set to one, and all the remaining indices are set to zero. This is known as the ‘one-hot vector representation’. Of late, however, vector space word models have become popular as alternative input representations. Models such as Word2Vec (https://radimrehurek.com/gensim/models/word2vec.html) and GloVe (http://nlp.stanford.edu/projects/glove/) are widely used for input representation in statistical NLP systems, as opposed to representing words as one-hot vectors. What is the advantage of using vector space word models over the traditional one-hot vector representation? Are there any disadvantages to such vector space models?

9. You are given a corpus and you have created word embeddings for it using the popular Word2Vec technique. Now you need to represent each sentence in the corpus by a vector, composing the word vectors of the words in the sentence to generate a sentence vector representation. What are the possible compositional models you can apply to generate the sentence vector?
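One of the simplest compositional models is averaging the word vectors of the sentence. A minimal sketch, using hypothetical three-dimensional toy embeddings (illustrative values, not real Word2Vec output):

```python
def average_vectors(vectors):
    """Element-wise mean of word vectors: the simplest compositional model."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy 3-dimensional "embeddings" (hypothetical values for illustration).
embeddings = {
    "the": [0.1, 0.0, 0.2],
    "cat": [0.9, 0.3, 0.1],
    "sat": [0.2, 0.8, 0.4],
}

sentence = ["the", "cat", "sat"]
sentence_vector = average_vectors([embeddings[w] for w in sentence])
print(sentence_vector)
```

Averaging ignores word order, which is one reason more sophisticated compositional models (weighted sums, recurrent or convolutional encoders) are worth discussing in an answer.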

10. In information retrieval, you are given a set of documents and a query, and you are asked to retrieve the documents most relevant to the query. Many common information retrieval techniques are built around the TF-IDF measure, where TF stands for term frequency, i.e., the number of times a query keyword appears in a document. One can obviously compute the term frequency for the query terms and return a ranked list of the documents in which the query terms appear most frequently. If this suffices to return the most relevant documents, why do we need the inverse document frequency (IDF) measure? Can you illustrate with an example why IDF is needed in information retrieval?
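The TF and IDF components described above can be sketched in a few lines; the toy documents below are illustrative only, and the IDF formula used is the common `log(N/df)` variant:

```python
import math

def tf_idf(term, doc, docs):
    """tf: raw count of term in doc; idf: log(N / number of docs containing term)."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

docs = [
    ["the", "quick", "brown", "fox"],
    ["the", "lazy", "dog"],
    ["the", "fox", "and", "the", "dog"],
]
# "the" appears in every document, so its IDF is log(3/3) = 0: no signal at all.
print(tf_idf("the", docs[2], docs))  # 0.0
# "fox" appears in only 2 of 3 documents, so it receives a positive weight.
print(tf_idf("fox", docs[2], docs))  # log(3/2) ≈ 0.405
```

This is exactly the example the question asks for: a high-frequency stop word like "the" would dominate a pure-TF ranking, while IDF zeroes it out.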

11. You have been given the problem of sentiment analysis where, given a sentence, you need to classify it as bearing positive, negative or neutral sentiment. You have been given large quantities of labelled data: example sentences annotated with their target sentiment polarity. Given that the test input sentences can contain words that are not present in the training examples, how would your approach handle these ‘out of vocabulary’ words? If you decide to simply ignore the ‘out of vocabulary’ words, what would be the impact on the test set classification accuracy? Can you devise a better approach to handling this issue?
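One common baseline for the out-of-vocabulary issue is to map every unseen word to a special unknown token at both training and test time. A minimal sketch (the `<unk>` convention and helper names are illustrative, not part of the question):

```python
from collections import Counter

UNK = "<unk>"

def build_vocab(training_sentences, min_count=1):
    """Vocabulary built from training data only; rarer words can also be
    folded into <unk> by raising min_count."""
    counts = Counter(w for s in training_sentences for w in s)
    vocab = {w for w, c in counts.items() if c >= min_count}
    vocab.add(UNK)
    return vocab

def encode(sentence, vocab):
    """Replace any word outside the vocabulary with the unknown token."""
    return [w if w in vocab else UNK for w in sentence]

train_sents = [["great", "movie"], ["terrible", "plot"]]
vocab = build_vocab(train_sents)
print(encode(["great", "cinematography"], vocab))  # ['great', '<unk>']
```

Training with some words deliberately mapped to `<unk>` lets the model learn a sensible behaviour for the token, which is usually better than silently dropping unseen words.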

12. In machine learning, the metrics used to compare classifier performance across different models include precision, recall, the area under the Receiver Operating Characteristic (ROC) curve, and the confusion matrix. Can you explain each of these terms?
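Precision and recall both fall out of the confusion-matrix counts, so a small worked example (with hypothetical labels) is a good way to anchor an answer:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # of the items predicted positive, how many really are
recall = tp / (tp + fn)     # of the truly positive items, how many were found
print(tp, fp, fn, tn, precision, recall)
```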

13. You are asked to build a system that does ‘parts of speech’ (POS) tagging for a given sentence. While POS tagging assigns a label to each word in the sentence, it uses the sequence of words seen in the context to decide on the current label. For example, consider the sentence: “Robert found the saw on the table and was taking it outside when he saw the snake.” The first occurrence of ‘saw’ in this sentence is a noun whereas the second occurrence is a verb. By looking at the sequence of words occurring around a specific word, a POS tagging system can distinguish between the tags it should assign to each of the two occurrences of ‘saw’. Hence, POS tagging is an example of sequence labelling. Two important algorithms for sequence labelling are Hidden Markov Models and Conditional Random Fields. Can you explain each of them?

14. If you are asked to do sequence labelling with a deep neural network, what type of neural network would you employ? Explain the reason behind your choice.

15. Given an array of N integers, you are asked to find out whether the array forms an arithmetic series. An arithmetic series is a sequence of numbers with a constant difference between any two consecutive numbers. What is the complexity of your function? Now consider a modified problem: your task is to convert a given array into an arithmetic sequence by dropping those elements that prevent it from being one. How would you solve this problem? What is the time complexity of your function?
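The first part of the question admits a straightforward linear scan; a minimal sketch:

```python
def is_arithmetic(arr):
    """O(N): verify every consecutive difference equals the first one."""
    if len(arr) < 2:
        return True
    diff = arr[1] - arr[0]
    return all(arr[i + 1] - arr[i] == diff for i in range(len(arr) - 1))

print(is_arithmetic([3, 7, 11, 15]))  # True
print(is_arithmetic([3, 7, 12, 15]))  # False
```

The modified problem (dropping elements to leave the longest arithmetic subsequence) is substantially harder, which is presumably the point of the follow-up.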

16. You are given a list of N integers, where the numbers can only be positive. You are asked to write code that creates two sub-lists of size N/2 each, such that the sums of the elements of the two sub-lists are equal, if it is possible to create such sub-lists. Your function should return the sum of the sub-list elements if such a split is possible, or zero otherwise. What is the time complexity of your function? (Note that you can rearrange the elements of the list.) You are now told that the original list can contain either positive or negative integers, and are asked to solve the same problem. What changes would you make to your earlier solution?
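To make the problem concrete, here is a brute-force sketch that tries every subset of size N/2; it is exponential, O(C(N, N/2)) subsets, and is shown only as a starting point, not as the expected optimal answer:

```python
from itertools import combinations

def split_equal_halves(nums):
    """Return the common half-sum if nums can be split into two equal-size,
    equal-sum sub-lists, else 0. Brute force over all size-N/2 subsets."""
    n = len(nums)
    if n % 2 or sum(nums) % 2:
        return 0
    target = sum(nums) // 2
    for combo in combinations(range(n), n // 2):
        if sum(nums[i] for i in combo) == target:
            return target
    return 0

print(split_equal_halves([3, 1, 4, 2]))   # 5  ([3, 2] and [1, 4])
print(split_equal_halves([1, 5, 11, 5]))  # 0  (no equal-size split exists)
```

The same brute force handles negative integers unchanged, though the follow-up invites discussion: with negatives, the equal sum can itself be zero, which collides with using zero as the "not possible" return value.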

17. Given that the same word can have different meanings (this is known as polysemy), how do information retrieval systems figure out the specific information being searched for? For example, consider a user who searches for ‘windows’. How would the search engine determine whether he or she is looking for Microsoft Windows or glass windows? Blindly displaying search results for both meanings is likely to hurt the user experience. If you were the designer of the search engine, how would you approach this problem? Consider the related problem: when the user types the search query ‘automobile’, how does the search engine also include web pages that talk about cars?

18. What is an ensemble of classifiers? In many cases, an ensemble of classifiers achieves better performance than any individual classifier model. Can you explain the intuition behind the performance improvements achieved by ensemble classifiers?
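One way to see the intuition: if each classifier is right more often than not, a majority vote is right even more often. A minimal sketch of that calculation (it assumes the classifiers err independently, which real ensembles only approximate):

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p, votes for the correct answer."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# Three independent classifiers, each 70 per cent accurate, voting by majority:
print(majority_vote_accuracy(0.7, 3))  # ≈ 0.784, better than any single one
```

The gain grows with the number of voters and shrinks as their errors become correlated, which is why ensemble methods emphasise diversity among the base classifiers.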

19. A well-known phrase used in machine learning and statistics is ‘correlation is not the same as causation’. Can you give an example to illustrate this statement?

20. There are three kinds of analytics, namely prescriptive analytics, descriptive analytics and predictive analytics. Can you explain each of these terms and how they differ from each other?

If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Wishing all our readers happy coding until next month!

Sandya Mannarswamy
