CODE SPORT
In this month’s column, we discuss some computer science interview questions.
As requested by a few of our readers, this month we feature practice questions for computer science students preparing for their upcoming on-campus and off-campus interviews. One small piece of advice to those preparing: make sure that you are very clear about the fundamental concepts. Interviewers will not hold it against you if you don't know a specific algorithm, but not being able to answer questions on the key fundamental concepts in different areas of computer science would definitely be an obstacle to clearing interviews. Also, make sure that you have updated the GitHub and SourceForge pages on which you may have shared your coding projects. Contributing to open source and having interesting coding projects online, even small ones, is definitely a big plus. The questions below cover a wide range of topics, including algorithms, operating systems, computer architecture, machine learning and data science.
1. Given the advances in memory-driven computing, let us assume that you have been given an enterprise-class 16-core server with a huge 2TB of RAM. You have been asked to design an intrusion detection system that can analyse the network logs of different systems and identify potential intrusion candidates. Note that you have to monitor close to 256 systems and analyse their network logs for potential issues. Let us also assume that you have been told which standard state-of-the-art approach (for intrusion detection) you need to implement.
Now the question is: would you design the intrusion detection application as a scale-up or a scale-out application? Can you justify your choice? Would your choice change if, instead of the enterprise-class 16-core server with 2TB of RAM, you were given multiple standard workstations with 32GB of RAM each?
2. You have been asked to build a supervised binary classification system for detecting spam emails. Basically, given an email, your system should be able to classify it as spam or non-spam. You are given a training data set containing 400,000 rows and 10,000 columns (features). The 10,000 features include both numerical and categorical variables. Now, you need to create a model using this training data set. Consider the two options given in Question 1 above, namely a server with 2TB RAM or a simple workstation with 32GB RAM. For each of these two options, can you explain how your approach to training a model would change? (Clue: The workstation is memory constrained, hence you will need to figure out ways to reduce your data set. Think about dimensionality reduction, sampling, etc.) Now, assume that you are able to successfully create two different models, one for the enterprise server and another for the workstation. Which of these two models would do better on a held-out test data set? Is it always the case that the model you built for the enterprise-class server with huge RAM would have a much lower test error than the model you built for the workstation? Explain the rationale behind your answer. Here is a tougher bonus question: for this specific computer system, which has a huge physical memory, do you really need paging mechanisms in the operating system? If you are told that all your computer systems would have such huge physical memory, would you be able to completely eliminate the virtual memory (VM) subsystem from the operating system? If you cannot completely eliminate the VM subsystem, what changes in paging algorithms (both data structures and page replacement algorithms) would you implement so that the cost of the virtual memory subsystem is minimised for this particular server?
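For the memory-constrained workstation in this question, one illustrative way to shrink the data set is to subsample the rows and then reduce the columns with a random projection. The numpy sketch below is just one of several possible approaches (PCA, feature hashing or feature selection would work too), and the function name and sizes are assumptions made purely for the demo:

```python
import numpy as np

def reduce_dataset(X, n_components, sample_frac, seed=0):
    # Shrink a data set that will not fit in RAM: subsample the rows,
    # then reduce the columns with a Gaussian random projection
    # (a Johnson-Lindenstrauss style map that roughly preserves distances).
    rng = np.random.default_rng(seed)
    keep = rng.choice(X.shape[0], size=int(X.shape[0] * sample_frac),
                      replace=False)
    R = rng.standard_normal((X.shape[1], n_components)) / np.sqrt(n_components)
    return X[keep] @ R

# Toy stand-in for the 400,000 x 10,000 training matrix of the question.
X = np.random.default_rng(1).standard_normal((1000, 200))
X_small = reduce_dataset(X, n_components=50, sample_frac=0.1)
print(X_small.shape)  # (100, 50)
```

Note that categorical features would first need to be encoded numerically (for example, one-hot encoded) before a projection like this makes sense.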
3. You were asked to create a binary classification system, using the supervised learning approach, to analyse X-ray images for detecting lung cancer. You have implemented two different approaches and would like to evaluate both to see which performs better. You find that Approach 1 has an accuracy of 97 per cent and Approach 2 has an accuracy of 98 per cent. Given that, would you choose Approach 2 over Approach 1? Would accuracy alone be a sufficient criterion in your selection? (Clue: Note that cancer image data sets are typically imbalanced; most of the images are non-cancerous and only a few are cancerous.)
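To see why accuracy alone can mislead on an imbalanced data set, consider this small sketch (the 3 per cent cancer rate is an assumed figure for illustration): a classifier that always predicts "non-cancerous" scores 97 per cent accuracy while detecting no cancers at all.

```python
import numpy as np

# 1,000 images, only 3 per cent cancerous -- mirrors the class imbalance.
y_true = np.zeros(1000, dtype=int)
y_true[:30] = 1
y_pred = np.zeros(1000, dtype=int)   # a useless "always non-cancerous" model

accuracy = (y_pred == y_true).mean()
tp = ((y_pred == 1) & (y_true == 1)).sum()   # true positives
fn = ((y_pred == 0) & (y_true == 1)).sum()   # missed cancers
recall = tp / (tp + fn)                      # sensitivity on the cancer class

print(accuracy)  # 0.97 -- looks impressive
print(recall)    # 0.0  -- catches no cancers at all
```

This is why metrics such as precision, recall and the F1 score (or the ROC curve) matter more than raw accuracy for problems like this one.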
4. Now consider Question 3 again. Instead of getting accuracies of 98 per cent and 97 per cent, both your approaches have pretty low accuracies of 59 per cent and 65 per cent. While you are debating whether to implement a totally different approach, your friend suggests that you use an ensemble classifier, wherein you combine the two classifiers you have already built to generate a prediction. Would the combined ensemble classifier perform better than either of the individual classifiers? If yes, explain why. If not, explain why it may perform poorly compared to either of the two classifiers. When is an ensemble classifier a good choice to try out, when the individual classifiers are not able to reach the desired accuracy?
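One way to build intuition for when ensembles help is a quick simulation. The sketch below assumes three classifiers whose errors are independent, each 65 per cent accurate; majority voting then beats any single one. (With only two classifiers there is no tie-breaking majority, and with strongly correlated errors the gain can vanish entirely, which is the crux of the question.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, n)            # ground-truth binary labels

def noisy_classifier(y, acc):
    # Each prediction is independently correct with probability `acc`.
    flip = rng.random(y.size) >= acc
    return np.where(flip, 1 - y, y)

preds = [noisy_classifier(y, 0.65) for _ in range(3)]
vote = (np.sum(preds, axis=0) >= 2).astype(int)   # majority vote

print((preds[0] == y).mean())  # ~0.65 for a single classifier
print((vote == y).mean())      # ~0.72: 0.65^3 + 3 x 0.65^2 x 0.35
```

The improvement relies on each individual classifier being better than random (above 50 per cent) and on their errors being at least partly uncorrelated.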
5. You have built a classification model for email spam such that you are able to reach a training error which is very close to zero. Your classification model is a deep neural network with five hidden layers. However, you find that the validation error is not small, but around 25 per cent. What is the problem with your model?
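The symptom in this question, near-zero training error with a large validation error, can be reproduced with any model expressive enough to memorise its training set. Here is a small illustrative sketch using a 1-nearest-neighbour classifier on synthetic data with 25 per cent label noise (all the numbers here are assumptions for the demo, not taken from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, (n, 2))
    # A simple true decision rule, plus 25 per cent label noise.
    y = (x[:, 0] + x[:, 1] > 0).astype(int)
    flip = rng.random(n) < 0.25
    return x, np.where(flip, 1 - y, y)

x_tr, y_tr = make_data(500)
x_va, y_va = make_data(500)

def knn1(x):
    # 1-nearest-neighbour: the classic "memorise the training set" model.
    d = ((x[:, None, :] - x_tr[None, :, :]) ** 2).sum(-1)
    return y_tr[d.argmin(axis=1)]

print((knn1(x_tr) != y_tr).mean())  # 0.0 training error: perfect memorisation
print((knn1(x_va) != y_va).mean())  # well above zero on unseen data
```

The model has fit the noise in the training data rather than the underlying pattern, which is exactly what a deep network with excess capacity and no regularisation can do.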
6. Regularisation techniques are widely used to avoid overfitting the model parameters to the training data, so that the constructed model can still generalise well to unseen test data. Ridge regression and Lasso regression are two of the popular regularisation techniques. When would you choose ridge regression and when would you use Lasso? Specifically, you are given a data set in which you know, based on your domain knowledge, that out of 100 features only 10 to 15 are significant for prediction. In that case, would you choose ridge regression over Lasso?
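A small numpy experiment makes the ridge-versus-Lasso distinction concrete. The sketch below uses synthetic data, and the coordinate-descent Lasso solver is a bare-bones illustration rather than a production implementation: ridge shrinks all coefficients but keeps them non-zero, while Lasso sets the irrelevant ones exactly to zero, which is why Lasso is the natural choice when you know only a handful of features matter.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [4.0, -3.0, 2.0]     # only 3 of the 20 features matter
y = X @ beta_true + rng.normal(0, 0.5, n)

# Ridge: closed form (X'X + lam*I)^-1 X'y -- shrinks, but never zeroes.
lam = 5.0
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Lasso via coordinate descent with soft-thresholding -- zeroes weak features.
def lasso_cd(X, y, lam, iters=200):
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ b + X[:, j] * b[j]       # residual excluding feature j
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
    return b

lasso = lasso_cd(X, y, lam=50.0)
print((np.abs(ridge) < 1e-8).sum())  # 0: ridge keeps all 20 features
print((np.abs(lasso) < 1e-8).sum())  # most of the 17 irrelevant ones are zero
```

This automatic feature selection is also what makes Lasso attractive on the memory-constrained workstation of Question 2.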
7. Now let us move away from machine learning and data science to computer architecture. You are given a computer system with 16 cores, 64GB RAM, a 512KB per-core L1 cache (the instruction and data caches are separate, each of size 512KB), a 2MB per-core L2 cache, and a 1GB L3 cache which is shared among all the cores. The L1 cache is direct-mapped, whereas the L2 and L3 caches are four-way set-associative. Now you are told to run multiple MySQL server instances on this machine, so that you can use it as a common backend for multiple Web applications. Can you characterise the different types of cache misses that would be encountered by the MySQL instances?
8. Now consider Question 7 above, where you are not given the cache sizes of L1, L2 and L3, but have been asked to figure out the approximate sizes of L1, L2 and L3 caches. You are given the average cache access times (hit and miss penalty for each access) for L1, L2 and L3 caches. Other system parameters are the same. Can you write a small code snippet which can determine the approximate sizes of L1, L2 and L3 caches? Now, is it possible to figure out the cache sizes if you are not aware of the average cache access times?
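As a rough sketch of the kind of snippet this question asks for, the Python code below times random accesses over buffers of increasing size; the per-access time jumps when a buffer outgrows a cache level. The results are very machine dependent, and Python's interpreter and numpy overheads blur the boundaries badly, so treat this purely as a sketch of the idea; in practice you would write a pointer-chasing loop in C.

```python
import time
import numpy as np

def avg_access_ns(size_kb, trials=5):
    n = (size_kb * 1024) // 8                 # number of 8-byte elements
    rng = np.random.default_rng(0)
    buf = rng.standard_normal(n)
    idx = rng.permutation(n)                  # random order defeats the prefetcher
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        buf[idx].sum()                        # one (mostly) random read per element
        best = min(best, time.perf_counter() - t0)
    return best / n * 1e9                     # best-of-trials ns per access

sizes_kb = [64, 256, 1024, 4096, 16384, 32768]  # spans typical L1 to beyond L3
times = [avg_access_ns(s) for s in sizes_kb]
for s, t in zip(sizes_kb, times):
    print(f"{s:6d} KB : {t:6.2f} ns/access")
# Sharp jumps between consecutive sizes mark the L1/L2/L3 capacity boundaries.
```

For the second part of the question: even without the average access times, the same relative jumps in your own measured timings still reveal the approximate boundaries, since you only need to detect where the curve changes, not its absolute values.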
9. Consider that you have written two different versions of a multi-threaded application: one version extensively uses linked lists, and the second mainly uses arrays. You know that the memory accesses performed by your application do not exhibit good locality, due to their inherent nature. You also know that around 75 per cent of your memory accesses are reads. Now you have the choice between two computer systems: System 1 with a 24MB L3 cache shared across all eight cores, and System 2 with an independent 2MB L3 cache for each of the eight cores. Are there any specific reasons why you would choose System 1 rather than System 2 for each of your application versions?
10. Can you explain the stochastic gradient descent algorithm? When would you prefer to use the coordinate descent algorithm instead of the regular stochastic gradient descent algorithm? Here's a related question: you are told that the cost function you are trying to optimise is non-convex. Would you still be able to use the coordinate descent algorithm?
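As a refresher on the first part of the question, here is a minimal numpy sketch of stochastic gradient descent on a least-squares problem (synthetic data and a constant learning rate, both chosen just for the demo): each update uses the gradient of a single randomly chosen example rather than the full data set. Coordinate descent, by contrast, would update one parameter at a time while holding the others fixed, as in the Lasso solver of Question 6.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.standard_normal((n, p))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
y = X @ w_true + rng.normal(0, 0.1, n)

def sgd(X, y, lr=0.01, epochs=50):
    # Stochastic gradient descent on squared loss: one random example
    # per update, instead of the full gradient over all n rows.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5*(x_i.w - y_i)^2
            w -= lr * grad
    return w

w = sgd(X, y)
print(np.round(w, 2))   # close to w_true = [1, -2, 0.5, 3, -1.5]
```

The per-update cost is O(p) instead of O(np), which is exactly why SGD scales to data sets like the 400,000 x 10,000 matrix of Question 2.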
If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, wishing all our readers a wonderful and productive month ahead!