Of mice and Python

“It’s great for everyone to use open source, but with science it’s especially important.” Amy Boyle on Python in science

2016-01-19

Amy Boyle used to work at Washington State University in Vancouver developing code that talked to mice. Recently she moved jobs to New Relic – a company that provides tools for realtime monitoring of web applications – where she’s a software engineer. She describes herself as a coder with a love of science and cookies. We had the pleasure of catching up with her right after she gave a great talk on digital signal processing with Python.

Linux Format: So I guess we’re both ex-academics, but I’m sure your work in the ivory towers was more interesting than mine. Why don’t you start by telling us what you did at WSU?

Amy Boyle: I worked in an auditory neuroscience research lab as a software developer. So whatever needed to be done as far as code in the lab – that was my job. That involved creating a new data acquisition system, which I had to do from scratch. I had to modify the data analysis code to fit the needs of the experiment or whatever the scientists needed. I also made a lot of GUIs for existing code, too.

LXF: What tools did you use to achieve this?

AB: Mostly Matlab and Python; I did a little bit of Java with the GUI stuff. When I got to start my own project I got to choose what to use and I chose Python. There’s a lot of legacy Matlab code in the lab, but I made the choice because I was told ‘the program must do this, I don’t care how you do it’. I looked at the ecosystem of the technologies out there and what I felt was going to be good for doing science with. I noticed that there were a lot of really good Python libraries for that purpose and a pretty big scientific community in Python.

LXF: Which of these libraries do you use most frequently?

AB: If you’re going to be doing anything connected with math or science then you’re going to be using NumPy. Basically all of the other libraries are built on top of it. SciPy is another one: it has a lot of great functions and algorithms that can be re-used – a lot of the common functions that you need are found there. I just gave my talk on digital signal processing and I was pulling from the signal library in SciPy that you can just use, and it’s great.

Python is known for not being super-fast, but those libraries actually leverage Fortran, I believe; it depends on how you optimise things and what system you are running on. But they’re much faster underneath and you’re able to take advantage of lower-level languages to get things to execute in a reasonable amount of time.
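Ed’s note: as a rough illustration of the kind of ready-made routine found in SciPy’s signal library, here is a minimal sketch of a zero-phase band-pass filter. It is not code from Amy’s talk; the helper name `bandpass`, the sample rate and the test tones are all made up for the example.

```python
import numpy as np
from scipy import signal


def bandpass(x, fs, low_hz, high_hz, order=4):
    """Zero-phase Butterworth band-pass filter (hypothetical helper)."""
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = signal.butter(order, [low_hz, high_hz], btype="bandpass",
                        fs=fs, output="sos")
    # Filtering forwards and backwards cancels the phase shift.
    return signal.sosfiltfilt(sos, x)


# Made-up test signal: a slow 5 Hz drift plus a 100 Hz tone we want to keep.
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
slow = np.sin(2 * np.pi * 5 * t)
fast = np.sin(2 * np.pi * 100 * t)
filtered = bandpass(slow + fast, fs, low_hz=50.0, high_hz=200.0)
```

Away from the edges of the recording, `filtered` should closely follow the 100 Hz tone, with the slow drift removed. The heavy lifting happens inside SciPy’s compiled routines rather than in a Python loop.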

LXF: How did you get into coding in the first place?

AB: My degree was a double-major, oh wait, that’s not true – that’s what I tell people in the US, but you’re from the UK so you’ll understand what I mean when I say combined honours in computer science and physiology. I went to the University of Glasgow.

LXF: Hey, I was born in Glasgow, actually.

AB: That’s cool, er, ‘braw’ I mean. Yeah, I basically took computer science as an elective subject at high school and just loved programming. Then when it came to doing my degree I couldn’t let it go, and that sort of just carried on. I still wanted to do science research though, but I knew coding skills would be valuable there too. When I originally got a job as a research assistant I quickly just became the coder and the software developer, transitioning into that role completely. Now that I’m out of academic research I’m entirely doing the software engineering part. I love doing that part, it’s fun to be able to solve problems in smaller chunks, which very often you can’t do in science: even though you’re still solving problems it tends to be more of a long game, and sometimes it just doesn’t work out. And that can be frustrating.

LXF: Yeah, especially when you’ve got people demanding that you produce publishable research at regular intervals. At least with programming you generally know that there is a solution out there, but arriving at it might involve a modicum of banging your head off the desk.

AB: Sometimes that doesn’t work out too, but at least when you’re solving the bigger problem there are a lot more small victories on the way there. It’s reassuring.

LXF: Have you found that there are tools or modules that are specific to neuroscience?

AB: Well, there’s a lot of proprietary stuff, but one thing that’s worth mentioning is NEO (Neural Ensemble Objects). NEO supports a lot of the proprietary formats and allows collaborators to share data much more easily. It’s part of a larger consortium called Neural Ensemble. The hardest problem I think is that there’s no real standardisation. So when I was designing my data acquisition system I was wondering if there was a data format that I could just re-use and be compatible with everyone else. The answer to that turns out to be no: there’s about fifteen different formats and everyone has their own preference. So that’s why I looked into NEO, but I couldn’t quite make it fit with what we needed to do, and it didn’t really offer anything new that we did need, so I just went ahead and made my own format.

LXF: What sort of sized data sets are you working with?

AB: Usually just gigabytes – we do Basic Science, which means understanding the very fundamental things, specifically how neurons work. So we’re doing brain recordings of a single organism, and in a single session we’ll cover a period of three to eight hours. This is time series data; it’s a trace of a single neuron so you’ll end up with a baseline that goes up and down. You’ll be able to see action potential spikes in these recording windows, and you’ll just save thousands upon thousands of those into a single file. The biggest files we end up with are usually about 5GB, which is still a lot of data, but much more manageable/portable than other data sets.
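Ed’s note: to give a feel for what picking spikes out of such a trace can look like, here is a minimal threshold-based sketch using SciPy’s `find_peaks`. This is not the lab’s analysis code: the `detect_spikes` helper, the threshold, the refractory gap and the synthetic trace are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_spikes(trace, fs, threshold, refractory_s=0.001):
    """Return spike times in seconds (hypothetical helper).

    A spike is taken to be any local maximum above `threshold`, with
    peaks closer together than `refractory_s` merged into one.
    """
    min_gap = max(1, int(refractory_s * fs))
    peaks, _ = find_peaks(trace, height=threshold, distance=min_gap)
    return peaks / fs


# Synthetic one-second trace: low-level noise plus three artificial spikes.
fs = 10000.0
rng = np.random.default_rng(0)
trace = 0.05 * rng.standard_normal(int(fs))
spike_samples = np.array([1000, 4000, 7500])
trace[spike_samples] += 1.0

spike_times = detect_spikes(trace, fs, threshold=0.5)
```

Real recordings have drifting baselines and noisier spikes, so a practical detector would usually filter the trace first and derive the threshold from the noise level.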

LXF: I like your pythonic tattoo. Where’d you get it?

AB: Yeah it’s just temporary, we had a bunch of them at my work and so I bought some for the PyLadies booth, and also decided to apply one to my arm.

LXF: Are you familiar with Sage?

AB: I haven’t used it extensively, but it’s great. I used it a little bit when I was getting into Python, but then I just started using Python with the libraries that I required.

LXF: Sage is a beast [Ed – it’s a Python interface that connects several diverse science programs together]. I found that just maintaining an up-to-date install thereof was challenging enough.

AB: Plus there are, or were – I haven’t looked at it for a while – a couple of awkward syntax differences which can be pretty annoying when you’re working from example code. Sage had notebooks (interactive documents) first, though: before the IPython notebook came around there was Sage. That was a major development, and it’s sort of its own thing now: it’s not just IPython anymore – the Jupyter project encompasses the whole notebook system. So now you can use all kinds of other languages, like Julia, in the same notebook system.

LXF: How have you found being a woman in science, and how do you now find being a woman in technology?

AB: That’s an interesting question actually because at my old lab almost half of us were women, which is a little bit different from technology. And it’s a different culture there as well, I know it can vary a lot from lab to lab, and from science to science. There are a lot more women in the biological sciences, so it tends to be a better atmosphere than most. With my new job – and I don’t have the specifics – but women are definitely, by a considerable margin, in the minority in engineering. But PyLadies has been really great for me; in some ways it’s probably the reason I’m still in software development. I mean in my old job I didn’t really get to talk to other programmers, but I could always look online or go to other Python user groups. It’s just really great to be able to go and talk to other women: like it or not, the dynamics of how we’re raised and how we are means that we get more time to speak and be heard in a group of women. And it’s nice to get that kind of feedback, plus we meet regularly so there’s a social aspect to it as well, going out and hanging out and coding with people that understand where we’re coming from.

LXF: There’s no question programming is hard, so anything that makes for a more comfortable teaching and learning environment has got to be good.

AB: Definitely, I think in particular fostering an environment where you don’t have to be afraid of asking a dumb question, or you don’t feel like you’re representing your entire gender by asking a question.

LXF: Can you tell us a little about the data you gather…

AB: Well, for the signal processing that I was doing we studied mice, so we have mouse vocalisations and when we design stimuli, we’ll shift and transform those. So we’ll start with a control, and then change one tiny little factor to see if there are any changes in the mouse’s brain, to see if that factor is significant to how our brains process sound. So I actually use DSP to make those sounds sound exactly like they’re supposed to: I had to do custom calibration for the speakers and that involved getting the speaker transform and applying that filter to all outgoing stimuli, and then designing a custom digital filter and having that be something that you can run at the start of every experiment. Because of the nature of the equipment and when you’re working with ultrasonic frequencies, positioning matters so much: if you bump anything then you basically have to rerun the entire calibration. So having that calibration be an integrated part of the data acquisition system was really important, as it needed to be able to recreate how the auditory stimuli actually occurred so that the natural vocalisations aren’t super distorted.
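Ed’s note: the calibration idea described above (measure the speaker’s transfer function, then pre-filter every outgoing stimulus with its inverse) can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions, not Amy’s implementation: the “speaker” is simulated as a short FIR filter, there is no measurement noise, and every name here (`estimate_response`, `compensate`, `play_through_speaker`) is hypothetical.

```python
import numpy as np


def estimate_response(played, recorded, n_fft, eps=1e-12):
    """Estimate the speaker's frequency response H = Y / X."""
    X = np.fft.rfft(played, n_fft)
    Y = np.fft.rfft(recorded, n_fft)
    # Multiplying by conj(X) keeps the division stable where |X| is small.
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)


def compensate(stimulus, response, n_fft, floor=1e-3):
    """Pre-filter a stimulus with the inverse of the measured response."""
    S = np.fft.rfft(stimulus, n_fft)
    # Never divide by (near) zero where the speaker barely responds.
    safe = np.where(np.abs(response) < floor, floor, response)
    return np.fft.irfft(S / safe, n_fft)


# Simulated speaker: a short FIR filter standing in for real hardware.
speaker_ir = np.array([1.0, 0.5, 0.25])


def play_through_speaker(x):
    return np.convolve(x, speaker_ir)


# Calibration run: play white noise, "record" it, estimate the response.
n_fft = 4096
rng = np.random.default_rng(0)
calibration_noise = rng.standard_normal(2048)
recorded = play_through_speaker(calibration_noise)
response = estimate_response(calibration_noise, recorded, n_fft)

# Made-up stimulus: a 40 kHz tone sampled at 200 kHz.
fs = 200000.0
t = np.arange(1024) / fs
stimulus = np.sin(2 * np.pi * 40000 * t)

corrected = compensate(stimulus, response, n_fft)
heard = play_through_speaker(corrected)[:len(stimulus)]
uncorrected = play_through_speaker(stimulus)[:len(stimulus)]
```

In this idealised setup `heard` matches the intended `stimulus` almost exactly, while `uncorrected` does not. With real speakers and microphones the measurement is noisy and the inverse has to be limited more carefully, which is presumably where the custom filter design mentioned above comes in.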

LXF: Has the glacial but inexorable uptake of Python 3 had any impact on your work?

AB: When I first started to write my data acquisition system – so this is going back a couple of years – I tried to use Python 3, thinking that I should use the most up-to-date version. Unfortunately, I was using a lot of external libraries, not all of which turned out to be Python 3-compatible. I tried to work around and fix this but there was so much back-and-forth that I just ended up taking all my code, back-porting it to Python 2.7 and from there on out I never had any problems.

LXF: A lot of major projects have shifted, but I think 3-compatibility is still a problem.

AB: It definitely is an issue, and so is the other direction – there’s still a lot of people stuck on Python 2.6. So at New Relic there’s a policy that all code has to be compatible with 2.6, 2.7 and 3.3. Our whole codebase is written to work with all of these, ambidextrously if you like.
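Ed’s note: for readers wondering what code that runs unchanged on Python 2 and 3 tends to look like, here is a small sketch of two common tricks: trying the Python 3 import location first, and aliasing the text type. It is a generic illustration, not New Relic’s code, and the helpers `to_text` and `host_of` are invented for the example.

```python
import sys

PY2 = sys.version_info[0] == 2

# Modules that moved between Python 2 and 3 are imported by trying the
# Python 3 location first and falling back to the Python 2 one.
try:
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
else:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Return `value` as a text string on both Python 2 and Python 3."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


def host_of(url):
    """Extract the host name from a URL given as bytes or text."""
    return urlparse(to_text(url)).hostname
```

Calling `host_of(b"http://example.com/metrics")` gives the same text result on either major version. Libraries such as six package up many shims of this kind.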

LXF: It’s simultaneously disappointing and amusing that Python 3 has seen such lacklustre adoption over its seven-year lifespan. Many people, particularly those that don’t care about Unicode strings, don’t really see it as an upgrade. I think the core interpreter actually ends up being slower than in 2.7, due to more uniformity and fewer case-specific optimisations, and that perturbs some people.

AB: I mean there’s so much stuff that was written for 2.7 that’s going to be around for a long time yet. Some of it would be easy to port, maybe they just need to change their print statements, but a lot of people tend to adopt an ‘if it’s not broken, don’t break it’ mentality. Then I guess there are also some projects that have been abandoned by the original developer or developers and no one’s been brave enough to port them into modernity.

LXF: Tell me about your job at New Relic.

AB: I’m working on the Python team there, so I’m working on their Python agent, which is the part of the code that does the instrumentation via the Python app. So you’ll download that agent and connect it with your own app and it’s the part that gathers all the metrics, all the data and sends that up to the New Relic servers. New Relic then processes all that data and presents it to the users. So basically I’m writing the code that’s going to be crawling around other people’s apps and harvesting their data.

LXF: Are there any notable Python projects that you think deserve a mention?

AB: Well, there was this one project I was working on where the bottleneck was graphing, just displaying data. So the de facto standard for plotting data in Python is called Matplotlib: it’s great, there are tons of examples, and I used it in my talks – but it’s not the fastest. I needed my data acquisition system to be super-responsive, so I tried to speed up Matplotlib a little bit, but it was really holding up my program. But then I found this library called PyQtGraph which is really fast.

LXF: You said that as ‘Py-Cute-Graph’; is that the industry-accepted pronunciation?

AB: Ha ha, yes everyone calls that library ‘Q-Tee’, but it’s a lowercase ‘t’ and not an acronym. But whatever, I don’t actually care, I say both. Anyways, because it uses Qt’s GraphicsView framework, it’s much faster than Matplotlib, so if you want to write a desktop application and need to graph things fast, then it is fantastic.

LXF: I’ve recently become a big fan of all things Qt5, having moved all my desktops first to LXQt and now to Plasma 5. Does PyQtGraph support Qt5?

AB: Not when I was using it, and I still don’t think it does yet. I mean when I was working with it Qt5 was a lot newer than, say, Python 3, so comparatively few things could actually use it. My point of view was sort of utilitarian: ‘I’m just going to use whatever works best for me’. I’m pretty sure there weren’t even Python wrappers for Qt5 back then, so even if I wanted to use it in another project, I couldn’t. It takes a while for these technologies to trickle down and actually become usable in other projects.

LXF: I’ve dabbled with Matplotlib, and I agree that it’s really powerful and all the examples they give are great. But syntax-wise it seems really unPythonic, there’s all this weird new syntax and decorators, it’s kind of like learning another language.

AB: Yeah, that’s because it’s designed to wean people off Matlab. I’d been using Matlab prior to using Python so that was fine for me, because I was used to all the weird plotting commands. A lot of labs are still stuck using Matlab because that’s what, traditionally, everyone has used. So they designed that library that way to make for a smooth transition process. That worked out well for my lab because there’s a lot of Matlab code. I was pretty much the only coder and I thought that this is a skill that’s only going to become more important and one that I think basically everyone in a lab should learn. So I started to lead coding workshops and I primarily taught using Python. One or two of the members had a little bit of Matlab experience, so that turned out quite nicely for them, having some vaguely familiar syntax.

LXF: I’ve also dabbled with Octave, which touts itself as being a drop-in Matlab replacement, doesn’t it?

AB: Yes, I guess it goes one step further than Matplotlib in that respect. I also have only dabbled with it, and to be honest I really don’t know why more people don’t use it. I mean it’s not quite a drop-in replacement, especially if you’re working with extra toolboxes. A lot of the graphical user interface stuff I was doing in Matlab also doesn’t port across nicely. But I used Octave for a machine learning course a while ago, and it worked perfectly for that. What’s really cool is that the MEX (Matlab executable) functionality from Matlab has been recreated, so it’s easy to work with external C, C++ or Fortran functions.

LXF: So another avenue down which we can almost escape from proprietary software?

AB: Yes, exactly. I mean it’s great for everybody to use open source wherever they can, but with science it’s especially important: We’re trying to advance that knowledge and we should be sharing as much of the knowledge as possible to make that process efficient, especially when we’re using taxpayer dollars. We have a responsibility to be more efficient with that money and to avoid any duplication of effort, or, worse, any obscuring of our results and methods. These things are important.
