Static analysis development
Build more secure web apps with Tim Armstrong’s new series, which starts with the essentials of static analysis and CI/CD.
As a developer or software engineer, finding enough time to clean up technical debt and fix vulnerabilities is difficult. It generally requires that your project manager understands the risks and why prioritising the clean-up of technical debt is important, thereby ensuring that your workplace isn’t the next company to be lambasted in the media as the target of a cyberattack (or worse, for leaking PII client data in a massive security breach). Project managers as a whole have a hard time comparing the risks of an attack to the benefits of a new feature: the feature is quantifiable, while the risk of getting hacked is not (especially if you don’t have the tooling to realise that you’ve been attacked).
This tutorial covers how you can integrate static analysis into your source-code management to identify, quantify, and prevent vulnerabilities in your code while improving general code readability and maintainability. This will enable project managers to obtain insights into any extant vulnerabilities or technical debt in the code, while simultaneously helping developers and engineers write better code. We’ll be focusing on Python, but there are alternatives to any tooling used for every language. The tutorial will also be using GitLab as the source-code host and CI/CD solution. This is to make things approachable without the cost or complexity of closed source platforms.
Because this tutorial isn’t about GitLab’s built-in one-click solutions (although these can be a good place to start if you don’t have time to set up your own pipeline), the final result is a functional static analysis stage for a CI/CD pipeline, along with an understanding of what you can gain from building this into your workflow.
Linting hell
The first stage of any good pipeline is linting. This is a form of static analysis that dates back to the 70s, and is one of the most useful and versatile methods of identifying and preventing bugs in code before they reach production. Lint derives its name from the fluff that forms pill-shaped “bugs” on clothing. It should come as no surprise, then, that the goal of linting is to find bugs (which speaks for itself), stylistic errors (clean code is easier to spot flaws in during peer review), and potentially vulnerable constructs (such as code open to injection attacks).
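To make that concrete, here’s a classic bug that a linter such as Pylint flags (as W0102, dangerous-default-value) long before it reaches production; the function names are made up for illustration:

```python
# A classic lint finding: a mutable default argument is created once,
# at function-definition time, and then shared between every call.
def append_item(item, bucket=[]):
    bucket.append(item)
    return bucket

first = append_item(1)   # returns [1]
second = append_item(2)  # returns [1, 2] - and 'first' is the SAME list!

# The lint-clean fix: default to None and build the list inside the call.
def append_item_fixed(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Bugs like this pass casual testing (a single call behaves fine) but corrupt state in production, which is exactly why static checks earn their place at the front of a pipeline.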
A quick search on essentially any search engine for ‘python linter’ will bring up a handful of articles and multiple links to Pylint, so you’d be forgiven for thinking that there isn’t much happening in this space. In reality, this couldn’t be further from the truth; other projects just don’t seem to be very good at SEO. The Python linting space is well represented, with everything from PEP compliance checkers to McCabe cyclomatic complexity scanners (a fancy name for a tool that counts how many branches there are through a function).
To make life simpler, a group of developers has built a fantastic tool called Prospector that wraps a curated list of some of the best python linters and scrapes the output of each tool into a common format.
Before we get started on our pipeline, let’s give Prospector a try locally. As it’s on PyPI, this is as easy as:
pip install prospector[with_everything]==1.3.1
(use pip3 if you’re running dual-stack Python 2.x and 3.x). In the directory of any Python project, run:
prospector ./
This will bring up a few findings that should be fairly simple to fix in the code.
Constant craving
GitLab’s integrated CI/CD solution is easy to use, free for most small- and mid-sized projects, and incredibly powerful. It’s also convenient because it’s got most of what we need to get started all in one place. Of course, if you prefer GitHub then you’re not left out in the cold, as you can use GitHub Actions (or a third party like CircleCI) to achieve a similar result. The core focus of this tutorial isn’t the platform used; it’s the pipeline that we’re building and the components that we’re using.
That said, you can find the sample code for this tutorial at https://gitlab.com/plaintextnerds/web-app-security-vulnerable-code and the finished result at https://gitlab.com/plaintextnerds/web-app-security-tutorial1-lxf279, so let’s get started. The sample code repo contains a simple Django project with many common vulnerabilities. Over the course of this series you’ll identify and fix them, so while you can certainly follow along using your own project repo, it’s recommended to use the sample code.
In GitLab, you define your pipelines using a file named .gitlab-ci.yml. Without getting into too much detail about the syntax (which is incredibly extensive and flexible), there are four key elements: stages, jobs, images and scripts. In short: scripts run in containers that are encapsulated in jobs; multiple jobs can be grouped to run in parallel as a stage; stages run sequentially; and if any script in a stage returns a non-zero exit code then the pipeline stops at that stage. If you want to go further, you can define ‘workflow’ and ‘rules’ directives, and even add jobs that only run when other jobs fail.
Working with CI/CD is easy. To kick things off, fork the sample code repository in GitLab and do a:
git clone
from a terminal. Once that’s all synced, you’ll want to create a new file called .gitlab-ci.yml (a brief reminder: files that start with a . on Linux machines are hidden, so if you can’t find it after creation, check that you’ve got “show hidden files” enabled) and open it in your editor of choice.
First, add the ‘stages’ directive, where the tags used for grouping jobs are defined. The order here is the order of execution, so think about what order makes the most sense for you. Generally speaking, because static analysis is the fastest stage to run and doesn’t usually require any build stages to have completed, it’s best to put it first. For now, you just need to add the stage you’re working on, which, for lack of a better name, can be called ‘static-analysis’, resulting in a directive that looks like this:
stages:
- static-analysis
Next, you need to define the job. Reading through the GitLab-CI documentation can be a little unclear when you get started, so let’s take a look at an example first.
prospector:
  stage: static-analysis
  image:
    name: ckleemann/prospector:latest
    entrypoint:
      - ''
  before_script:
    - pip install --upgrade pip
    - pip install -r ./src/requirements.txt
  script:
    - prospector ./src
  allow_failure: true
Breaking this down then, defining a job is done by adding a named object to the root of the YAML. You can use essentially any name, so long as it’s valid YAML and not a reserved word (such as ‘stages’). In this object, you’ll need to set the ‘stage’ during which the job should run, the Docker container ‘image’ to use, and any scripts to run.
The example above uses a handy pre-built Docker container for Prospector, maintained by someone who goes by the username ckleemann on GitHub, so give them some love with a star on the repo. To ensure that the libraries used in the example code are installed, it overrides the entrypoint, defines a before_script to install the libraries listed in requirements.txt, and defines a script to run.
In both the before_script and script blocks you can define multiple commands to run line by line. So if there is other preparation to be done for your projects then this is where you add extra lines.
If you git add .gitlab-ci.yml, then git commit -m "added gitlab-ci", and finally git push that up to GitLab, it should run the pipeline for the first time. You’ll be able to monitor its progress live or come back to view the result later.
If you’ve got a lot of libraries in your requirements.txt, some libraries that take exceptionally long to install, or perhaps a private PyPI server hosting custom libraries, then it can be worth creating your own Dockerfile that rolls the before_script steps in directly, making use of the build cache to reduce your run time.
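If you do go down that route, the Dockerfile only needs to bake the dependency install into the image; the base image and paths below are assumptions carried over from the pipeline above, so adjust them to match your project:

```dockerfile
# Hypothetical Dockerfile: pre-install the project requirements so the
# pipeline's before_script no longer has to run on every job.
FROM ckleemann/prospector:latest
COPY src/requirements.txt /tmp/requirements.txt
RUN pip install --upgrade pip && \
    pip install -r /tmp/requirements.txt
```

You’d then build and push this to a registry your runners can reach, and point the job’s ‘image’ directive at it instead.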
Tuning Prospector
Out of the box, Prospector should detect a small handful of faults in the sample code, which doesn’t seem too bad and should be an easy fix. So, now that you’ve got the pipeline up and running, it’s time to enhance Prospector’s configuration, enabling some of the security modules that aren’t enabled by default and fixing a PEP257 conflict. To do this you’ll want a copy of the current default config, so run
prospector --strictness high --show-config
locally, which will dump the config into a YAML-formatted section of its output called Profile. Copy this into a new file in the root of the project called prospector-profile.yml.
There are a few issues with the default config, so open this file in your favourite editor, ready to make a few changes. The first is that Prospector essentially disables a lot of the PEP8 scanner; given that you’ll be fixing 100 per cent of PEP8 violations with another tool later on, go ahead and replace the entire disable list of the pep8 block with an empty pair of square brackets, such that it reads
disable: [ ]
The second is a conflict between D212 and D213, so in the disable block of the pep257 object add - D212. Finally, max-parents is too low for a Django project, so set that to 8, and max-line-length is a bit out of date, so setting it to around 120 brings it in line with modern screen resolutions.
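Assuming your dumped profile nests these options the same way (check the Profile output you copied, as the exact layout can vary between Prospector versions), the edits look something like this:

```yaml
pep257:
  disable:
    - D212

pylint:
  options:
    max-parents: 8
    max-line-length: 120
```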
Next, you’ll want to explicitly enable the modules you want to run by setting the run directive in each of the objects to true . For example, enabling Bandit would result in the ‘bandit’ object looking like:
bandit:
  disable: []
  enable: []
  options: {}
  run: true
A good list of modules to start with when tuning your Prospector profile is Bandit, Dodgy, McCabe, Mypy, PEP257, Pyflakes, Pylint and Pyroma. There is some overlap between the modules in this list, but each provides enough unique results that it’s worth tolerating some duplicated findings. Enable each of these modules and run Prospector locally with the command
prospector -P prospector-profile.yml ./src
and it should reveal a much higher number of findings. To roll this into your pipeline, open up your .gitlab-ci.yml and update the prospector command. At this point, your .gitlab-ci.yml should look like this:
stages:
  - static-analysis

prospector:
  stage: static-analysis
  image:
    name: ckleemann/prospector:latest
    entrypoint:
      - ''
  before_script:
    - pip install --upgrade pip
    - pip install -r ./src/requirements.txt
  script:
    - prospector -P ./prospector-profile.yml ./src
Now you’re all set to git add both prospector-profile.yml and the changes to your .gitlab-ci.yml, then git commit -m "Custom prospector profile"
and finally git push .
Back over in GitLab now, you should see the pipeline running. When it’s done you should get the same report as you got from your local machine.
Because pipelines like this one are likely to fail for a while (at least until you’ve explained all the vulnerabilities and risks you’ve found in the codebase to project management, so they can assign them appropriate priorities), you probably want to add an allow_failure: true directive to the prospector object so that it doesn’t block merge requests.
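That directive sits at the top level of the job object, alongside stage and script:

```yaml
prospector:
  # ...stage, image, before_script and script as before...
  allow_failure: true
```

With this set, a failed prospector job is marked with a warning in GitLab rather than failing the whole pipeline.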
Grep on steroids
There are, of course, cases where certain vulnerabilities slip through the gaps while tools like Bandit are being updated with the latest definitions, or where you want to enforce some custom rules. For these cases there is Semgrep, a fantastic tool for detecting and (in some cases) automatically fixing issues in your code. As the name implies, it’s somewhat comparable to grep, if grep were on some serious steroids.
Out of the box, it comes with a mind-blowing number of validation rules (including community contributed ones). But what makes it really powerful is the ability to quickly create your own rules.
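As a taste of what a custom rule looks like, here’s a minimal sketch (the rule id and message are made up for illustration) that flags any call to Python’s eval():

```yaml
rules:
  - id: python-no-eval        # hypothetical rule name
    pattern: eval(...)
    message: Avoid eval(); it can execute attacker-controlled input.
    languages: [python]
    severity: ERROR
```

The ... ellipsis is Semgrep syntax for “any arguments”, which is what lifts it beyond what plain grep can match.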
Adding Semgrep to your pipeline is pretty easy; they even provide instructions for basically every CI/CD solution on the website. To have it run in parallel with Prospector during the ‘static-analysis’ stage of your CI/CD, add the following object to your .gitlab-ci.yml:
semgrep:
  stage: static-analysis
  image: returntocorp/semgrep-agent:v1
  script: semgrep-agent
  variables:
    INPUT_CONFIG: p/security-audit p/secrets
Committing and pushing this change will trigger the pipeline again; this time you should see blocks for both Semgrep and Prospector running in parallel.
Of course, Semgrep also has some serious superpowers, namely the Semgrep App. This enables centrally managed policies, so you can maintain a common set of rules and checks across all of your organisation’s projects, as well as keeping track of findings. Using the Semgrep App and its integration with GitLab (or any other source code hosting solution, for that matter) enables you to add new rules and automatically apply fixes to problematic code during code review, which is definitely worth the money.
Hooks and crannies
So now you’ve got your pipeline running, and you’ve got a massive list of new issues to deal with; where do you start? Fortunately, a lot of the issues detected in the code can be fixed automatically, and by doing this with pre-commit hooks you can ensure that the problems don’t come back.
So what are pre-commit hooks? In Git, hooks are executables (most commonly shell scripts) that Git runs at certain points in the life-cycle, as determined by their names. As such, pre-commit hooks are those that run prior to the commit process whenever you call git commit. They can do anything from modifying the commit message to editing (or injecting) code. In this capacity, they’re commonly used for things like automatic code formatting.
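For a sense of what lives inside .git/hooks, here’s a minimal hand-rolled pre-commit hook, written in Python rather than the more usual shell; the helper names and the byte-compile check are an illustrative sketch, not something the tooling below requires:

```python
#!/usr/bin/env python3
# Sketch of a hand-written .git/hooks/pre-commit: refuse the commit if any
# staged Python file fails to byte-compile.
import py_compile
import subprocess


def staged_python_files():
    """Ask git for the paths staged for this commit, keeping only .py files."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [path for path in out.splitlines() if path.endswith(".py")]


def check(paths):
    """Return True if every file byte-compiles, printing errors as we go."""
    ok = True
    for path in paths:
        try:
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as err:
            print(err)
            ok = False
    return ok


# In the real hook file this would end with:
#     raise SystemExit(0 if check(staged_python_files()) else 1)
# so that a non-zero exit aborts the commit.
```

Hand-writing hooks like this works, but it doesn’t scale across a team, which is where the tooling below comes in.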
Precommitments
A popular Python utility for creating and managing pre-commit hooks is, unsurprisingly, called Pre-Commit. To install it, run the command:
pip install pre-commit
(once again, remembering to use pip3 if you’re running dual-stack Python 2.x and 3.x). Once it’s installed, create a file called .pre-commit-config.yaml. In this file, you define the tasks that you want to run. For now, go ahead and set it up to run Black (a popular Python code formatter), which will fix all of the formatting errors and ensure that any committed code doesn’t introduce new ones. To do this, open the .pre-commit-config.yaml file and add the following lines:
repos:
  - repo: https://github.com/psf/black
    rev: 21.5b2
    hooks:
      - id: black
        language_version: python3
Then run:
git add .pre-commit-config.yaml
followed by:
pre-commit run --all-files
On the first run, this will show that Black has “Failed”. This is because it needed to modify one or more files, which you’ll now also need to run:
git add
Finally, run:
pre-commit install
To have it automatically run every time you run:
git commit
Do note, however, that this will cause commits to fail whenever there are files that Black needed to change. To compensate, you need to run:
git commit
for a second time. Pushing this up now with:
git push
should result in a cleaner report from Prospector (with all of the PEP8 issues resolved). That’s it for this issue. In the next issue we’ll expand on this pipeline, adding automated dependency and container scanning.