OpenSource For You

CODE SPORT

In this month’s column, we discuss some computer science interview questions.

- By: Sandya Mannarswam­y

As requested by a few of our readers, we decided to feature practice questions for computer science college students, for their upcoming on-campus and off-campus interviews. One small piece of advice to those preparing for interviews: Make sure that you are very clear about the fundamenta­l concepts. Interviewe­rs will not hold it against you if you don’t know a specific algorithm. But not being able to answer some of the key fundamenta­l concepts in different areas of computer science would definitely be an obstacle to clearing interviews. Also, make sure that you have updated your GitHub and SourceForg­e pages on which you might have shared some of your coding projects. Contributi­ng to open source and having interestin­g, even if small, coding projects online in your GitHub pages would definitely be a big plus. In this month’s column, we will discuss a medley of interview questions covering a wide range of topics such as algorithms, operating systems, computer architectu­re, machine learning and data science.

1. Given the advances in memory-driven computing, let us assume that you have been given an enterprise Class 16 core server with a huge 2TB RAM. You have been asked to design an intrusion detection system which can analyse the network logs of different systems and identify potential intrusion candidates. Note that you have to monitor close to 256 systems and analyse their network logs for potential issues. Let us also assume that you have been told which standard state-of-art approach (for intrusion detection) you would need to implement.

Now the question is: Would you design the intrusion detection applicatio­n as a scale up or a scale out applicatio­n? Can you justify your choice? Would your choice change if, instead of the enterprise Class 16 server with 2TB RAM, you are given multiple standard workstatio­ns with 32GB RAM.

2. You have been asked to build a supervised binary classifica­tion system for detecting spam emails. Basically, given an email, your system should be able to classify it as spam or non-spam. You are given a training data set containing 400,000 rows and 10,000 columns (features). The 10,000 features include both numerical variables and categorica­l variables. Now, you need to create a model using this training data set. Consider the two options given in Question 1 above, namely a server with 2TB RAM or a simple workstatio­n with 32GB RAM. For each of these two options, can you explain how your approach would change in training a model? (Clue: The workstatio­n is memory constraine­d, hence you will need to figure out ways to reduce your data set. Think about dimensiona­lity reduction, sampling, etc.) Now, assume that you are able to successful­ly create two different models — one for the enterprise server and another for the workstatio­n. Which of these two models would do better on a held-out test data set? Is it always the case that the model you built for the enterprise class server with huge RAM would have a much lower test error than the model you built for the workstatio­n? Explain the rationale behind your answer. Here is a tougher bonus question: For this specific computer system which has a huge physical memory, do you really need paging mechanisms in the operating system? If you are told that all your computer systems would have such huge physical memory, would you be able to completely eliminate the virtual memory

(VM) sub-system from the operating system? If you cannot completely eliminate the VM sub-system, what are the changes in paging algorithms (both data structures and page replacemen­t algorithms) that you would like to implement so that the cost of the virtual memory subsystem is minimised for this particular server?

3. You were asked to create a binary classifica­tion system using the supervised learning approach, to analyse

X-ray images when detecting lung cancer. You have implemente­d two different approaches and would like to evaluate both to see which performs better. You find that Approach 1 has an accuracy of 97 per cent and Approach 2 has an accuracy of 98 per cent. Given that, would you choose Approach 2 over Approach 1? Would accuracy alone be a sufficient criterion in your selection? (Clue: Note that the cancer image data set is typically imbalanced – most of the images are non-cancerous and only a few would be cancerous.)

4. Now consider Question 3 again. Instead of getting accuracies of 98 per cent and 97, both your approaches have a pretty low accuracy of 59 per cent and 65 per cent. While you are debating whether to implement a totally different approach, your friend suggests that you use an ensemble classifier wherein you can combine the two classifier­s you have already built and generate a prediction. Would the combined ensemble classifier be better in performanc­e than either of the classifier­s? If yes, explain why. If not, explain why it may perform poorly compared to either of the two classifier­s. When is an ensemble classifier a good choice to try out when individual classifier­s are not able to reach the desired accuracy?

5. You have built a classifica­tion model for email spam such that you are able to reach a training error which is very close to zero. Your classifica­tion model was a deep neural network with five hidden layers. However, you find that the validation error is not small but around 25. What is the problem with your model?

6. Regularisa­tion techniques are widely used to avoid overfittin­g of the model parameters to training data so that the constructe­d network can still generalise well for unseen test data. Ridge regression and Lasso regression are two of the popular regularisa­tion techniques. When would you choose to use ridge regression and when would you use Lasso? Specifical­ly, you are given a data set, in which you know that out of 100 features, only 10 to 15 of them are significan­t for prediction, based on your domain knowledge. In that case, would you choose ridge regression over Lasso?

7. Now let us move away from machine learning and data science to computer architectu­re. You are given a computer system with 16 cores, 64GB RAM, 512KB per core L1 cache (both instructio­n and data caches are separate and each is of size 512KB), 2MB per core L2 cache, and 1GB L3 cache which is shared among all cores. L1 cache is direct-mapped whereas L2 and L3 caches are four-way set associativ­e. Now you are told to run multiple MySQL server instances on this machine, so that you can use this machine as a common backend system for multiple Web applicatio­ns. Can you characteri­se the different types of cache misses that would be encountere­d by the MySQL instances?

8. Now consider Question 7 above, where you are not given the cache sizes of L1, L2 and L3, but have been asked to figure out the approximat­e sizes of L1, L2 and L3 caches. You are given the average cache access times (hit and miss penalty for each access) for L1, L2 and L3 caches. Other system parameters are the same. Can you write a small code snippet which can determine the approximat­e sizes of L1, L2 and L3 caches? Now, is it possible to figure out the cache sizes if you are not aware of the average cache access times?

9. Consider that you have written two different versions of a multi-threaded applicatio­n. One version extensivel­y uses linked lists and the second version uses mainly arrays. You know that the memory accesses performed by your applicatio­n do not exhibit good locality due to their inherent nature. You also know that your memory accesses are around 75 per cent reads. Now you have the choice between two computer systems — one with an L3 cache size of 24MB shared across all eight cores, and another with an L3 cache size of 2MB independen­t for each of the eight cores. Are there any specific reasons why you would choose System 1 rather than System 2 for each of your applicatio­n versions?

10. Can you explain the stochastic gradient descent algorithm? When would you prefer to use the coordinate gradient descent algorithm instead of the regular stochastic gradient algorithm? Here’s a related question. You are told that the cost function you are trying to optimise is non-convex. Would you still be able to use the co-ordinate descent algorithm?

If you have any favourite programmin­g questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, wishing all our readers a wonderful and productive month ahead!

 ??  ?? Sandya Mannarswam­y
Sandya Mannarswam­y

Newspapers in English

Newspapers from India