Build a software analysis pipeline
In the second part of our web application security series, Tim Armstrong takes us through the essentials of software composition analysis.
Software developers the world over have a hard enough time maintaining and securing their own code, so it’s fairly common for the libraries and Docker containers used, especially in large projects, to be a few versions behind. When was the last time you actually audited 100 per cent of the code for all of the software used in any of your projects? Never, right? You don’t have time, you’re not an expert in every language, and by the time you were done you’d need to do it all again. Software composition analysis (SCA) tackles this problem by checking your dependencies for known vulnerabilities on your behalf.
In this tutorial, you’ll learn how to use a number of SCA tools to protect your code by extending the CI/CD pipeline created in the first part of this series, where we learned about static analysis and setting up a pipeline in GitLab CI. You can get a copy of where we left off by forking the repository at https://gitlab.com/plaintextnerds/web-app-security-tutorial1-lxf279, but we highly recommend picking up a copy of the previous issue and following that first if you can.
SCA tools such as Snyk, WhiteSource, Gemnasium and Dependabot scan your dependencies and containers for vulnerable versions, with the goal of either updating them for you via a pull request (PR) or notifying you of the issue. Each of them works in a slightly different way, uses different databases, and presents the data differently, so finding the right one for you takes some exploration. To this end, this tutorial looks at Dependabot, which is open source, and Snyk, which is reasonably priced and offers a free option for individuals and open source projects.
Snyk it to them!
Snyk is a hosted solution, so to get started you’re going to need to create an account. You can do this by going to https://app.snyk.io/login and selecting the identity provider of your choice. There isn’t a direct registration option, which could be an issue for people who don’t trust any of the providers listed, but the selection is pretty big so it shouldn’t be a problem.
Next, you’ll be presented with the option to select the location of your source code. On this landing page the choices are GitHub and Bitbucket, but because this tutorial is using GitLab you’ll need to click the full “list of integrations” link. From here you can select GitLab.
Because of the nature of what Snyk is doing, in order to get it working you need to give it a personal access token with API scope privileges. This is the highest level of privilege that you can grant a token in GitLab. So if you’re working with confidential code (in other words, it’s not an open source project) it’s best to set up a dedicated account for it so that Snyk is acting as its own user. This is best practice when dealing with any third-party integration, in case the API key is leaked somehow and you need to identify unauthorised modifications easily.
To create a personal access token in GitLab, go to https://gitlab.com/-/profile/personal_access_tokens (GitLab account required) in a new tab. Give the token a name in the Token Name field, check the box to grant it API scope, and click ‘Create personal access token’. Confusingly, this inserts an element containing the token into the page just above the section where you entered the details. The token is only shown once, so if you refresh or leave the page before you’ve copied it you’ll need to delete the token and recreate it.
Now that you have your token, head back over to the Snyk tab, paste it in the box and hit Save. Now you’ve got it linked you’ll need to add the project to Snyk, so go ahead and hit the button, which will take you to a page where you can select any of the projects the GitLab user has access to. Select your fork of the Web App Security tutorial code and click the ‘Add selected repositories’ button. This should find the requirements.txt file and start scanning.
Looking at the results from the scan you can see that the version of Django used has a known vulnerability: specifically, a SQL injection pathway known as CVE-2021-35042. This vulnerability was discovered between the writing of the previous tutorial and this one, which exemplifies the importance of having good SCA tools in your pipeline!
Plug the vulnerability gap
Hitting the ‘Fix this vulnerability’ button finds the smallest upgrade that resolves the vulnerability. If it’s a major version (assuming semantic versioning –
Alternatively, you could wait for Snyk to create the Merge Request for you when it next does its scheduled scan. Every time it does a scheduled scan Snyk automatically creates a Merge Request (if one doesn’t exist already) for any problems found.
By default, Snyk scans your repo once a day (and once per Merge Request). This is pretty helpful because it means that, unlike a pipeline workflow that only runs when you push, Merge Requests are still being created even if you haven’t worked on a project in a while.
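To make the idea of “the smallest upgrade that resolves the vulnerability” concrete, here’s a minimal Python sketch. It isn’t how Snyk works internally (we’re not privy to that); the function names, the release list and the advisory range are all made up for illustration, and it assumes plain numeric versions such as 2.0.4.

```python
def parse(version):
    """Turn '2.0.4' into (2, 0, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def smallest_fix(current, available, is_vulnerable):
    """Return the lowest release newer than `current` that is not vulnerable.

    Returns None when no such release exists.
    """
    candidates = [
        v for v in available
        if parse(v) > parse(current) and not is_vulnerable(v)
    ]
    return min(candidates, key=parse) if candidates else None


def is_major_upgrade(current, target):
    """Under semantic versioning, a change in the first number may break things."""
    return parse(target)[0] > parse(current)[0]


# Hypothetical advisory: every 2.0.x release before 2.0.5 is affected.
def affected(version):
    return parse("2.0.0") <= parse(version) < parse("2.0.5")


# Hypothetical list of published releases.
releases = ["2.0.3", "2.0.4", "2.0.5", "2.1.0", "3.0.0"]

fix = smallest_fix("2.0.3", releases, affected)
print(fix, "major upgrade:", is_major_upgrade("2.0.3", fix))
```

Picking the lowest safe release rather than the newest keeps the change small, which is why a fix of this kind is usually less risky to merge than a jump to the latest version.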
Hopping over to the Dependencies tab and hitting All Dependencies shows that not only is Snyk detecting the dependencies that you’ve defined, but also the dependencies of those dependencies (so-called transitive dependencies) by constructing a graph of each library’s requirements. This ensures that you’re also protected against issues that are deep in the tree. When using pip and requirements.txt this is more of a best-guess solution though, so using a locking dependency manager such as Poetry or Pipenv can improve the reliability here by providing Snyk with all the information it needs to know exactly which versions you’re using.
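If the notion of a dependency graph feels abstract, the following Python sketch shows the general idea of collecting transitive dependencies with a breadth-first walk. The graph is hand-written for illustration (it isn’t read from a package index or from Snyk), and “myapp” is a hypothetical project name.

```python
from collections import deque

# Illustrative graph: package name -> the packages it directly requires.
REQUIRES = {
    "myapp": ["django"],
    "django": ["asgiref", "sqlparse", "pytz"],
    "asgiref": [],
    "sqlparse": [],
    "pytz": [],
}


def transitive_dependencies(package, requires):
    """Breadth-first walk returning every package reachable from `package`."""
    seen = set()
    queue = deque(requires.get(package, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited, which also guards against cycles
        seen.add(name)
        queue.extend(requires.get(name, []))
    return seen


print(sorted(transitive_dependencies("myapp", REQUIRES)))
```

Although “myapp” only lists Django directly, the walk also turns up Django’s own requirements, and each of those is a place where a vulnerable version could be hiding.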
That old Dependabot
So Snyk is certainly useful, but perhaps you work at an organisation where giving a third party complete access to your source code isn’t acceptable for whatever reason. This is where Dependabot comes in, because it can run as a standalone in your CI/CD pipeline where you can keep everything isolated.
A short while after the original Dependabot was bought out by Microsoft’s GitHub, a GitLab-flavoured fork (https://gitlab.com/dependabot-gitlab/dependabot) was created by Andrejs Cunskis, which has since been sponsored by JetBrains. There are a number of supported ways to get Dependabot up and running, but in this case you’ll need the standalone mode because it has to be built into the CI/CD pipeline. To do this, first create a folder called .gitlab in the project directory. In that folder create a file called dependabot.yml.
For this project, the minimum that you’ll need to define in the dependabot.yml file is:

    version: 2
    updates:
      - package-ecosystem: "pip"
        directory: "/src"
        schedule:
          interval: "daily"

While required by the file spec, the schedule directive isn’t going to limit the run to once a day when using Dependabot in the CI/CD pipeline (despite being set to "daily").
Next, you’ll need to provide Dependabot with a personal access token with API scope. It’s exactly the same process as with Snyk: go to https://gitlab.com/-/profile/personal_access_tokens in a new tab, give the token a name in the Token Name field, check the box to grant it API scope, and click ‘Create personal access token’. This will insert an element containing the token into the page just above the section where you entered the details.
Copy this token and head back over to the project, then select Settings>CI/CD and under the Variables section click the ‘Add variable’ button. Paste the token into the Value field, set the key to SETTINGS__GITLAB_ACCESS_TOKEN and ensure that both the ‘Protect variable’ and ‘Mask variable’ boxes are checked. Then press the ‘Add variable’ button.
These checkboxes tell GitLab to redact the variable if it’s detected in CI/CD logs and to only provide it when the CI/CD pipeline is running on a protected branch. This is important because failing to protect the variable would mean that anyone who can push to your project (such as external contributors) could get hold of your token and use it to do whatever they wanted through the GitLab API as if they were you.
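As a rough mental model of what those two checkboxes do, consider the Python sketch below. It’s a conceptual illustration only, not GitLab’s implementation: the function names, the token value and the example URL are all invented for the purpose.

```python
def mask_secrets(log_text, secrets):
    """Replace every occurrence of each secret value with a placeholder."""
    for secret in secrets:
        if secret:  # skip empty values, which would otherwise match everywhere
            log_text = log_text.replace(secret, "[MASKED]")
    return log_text


def variable_available(variable_is_protected, ref_is_protected):
    """A protected variable is only handed to jobs running on protected refs."""
    return ref_is_protected or not variable_is_protected


token = "glpat-example-not-a-real-token"  # made-up value for illustration
line = f"curl --header 'PRIVATE-TOKEN: {token}' https://gitlab.example.com/api/v4/projects"

print(mask_secrets(line, [token]))
print("feature branch gets the token:", variable_available(True, False))
print("protected branch gets the token:", variable_available(True, True))
```

Masking only hides the value in job logs, so it’s the ‘Protect variable’ option that actually stops a job on an unprotected branch from receiving the token in the first place. That’s why you want both.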
Finally, you’ll need to update the .gitlab-ci.yml. To the stages section add the line - composition-analysis and then below that you’ll need to add the .dependabot-gitlab template, which is as follows:

    .dependabot-gitlab:
      stage: composition-analysis
      image:
        name: docker.io/andrcuns/dependabot-gitlab:0.4.4
        entrypoint: [""]
      variables:
        GIT_STRATEGY: none
        RAILS_ENV: production
        SETTINGS__STANDALONE: "true"
        SETTINGS__GITLAB_URL: $CI_SERVER_URL
      only:
        - main
        - merge_requests
      before_script:
        - cd /home/dependabot/app
      script:
        - bundle exec rake "dependabot:update[$CI_PROJECT_PATH,$PACKAGE_MANAGER,$DIRECTORY]"

Then add the dependabot-pip job as follows:

    dependabot-pip:
      extends: .dependabot-gitlab
      variables:
        PACKAGE_MANAGER: pip
        DIRECTORY: /src

When you’re all done the file should look something like this:

    stages:
      - static-analysis
      - composition-analysis

    .dependabot-gitlab:
      stage: composition-analysis
      image:
        name: docker.io/andrcuns/dependabot-gitlab:0.4.4
        entrypoint: [""]
      variables:
        GIT_STRATEGY: none
        RAILS_ENV: production
        SETTINGS__STANDALONE: "true"
        SETTINGS__GITLAB_URL: $CI_SERVER_URL
      only:
        - main
        - merge_requests
      before_script:
        - cd /home/dependabot/app
      script:
        - bundle exec rake "dependabot:update[$CI_PROJECT_PATH,$PACKAGE_MANAGER,$DIRECTORY]"

    dependabot-pip:
      extends: .dependabot-gitlab
      variables:
        PACKAGE_MANAGER: pip
        DIRECTORY: /src

    ...

(Where the ... stands for the static analysis jobs from the previous tutorial.)
Finally, git add the .gitlab-ci.yml and the .gitlab/dependabot.yml files, then git commit -m "Added Dependabot SCA stage" and git push the changes up to GitLab.
What’s interesting to note here is that the variables in the .gitlab-ci.yml file that point to the same values as the ones in the dependabot.yml are selectors for that configuration. This means that if you wanted to extend this to support scanning a Docker Container you would need to add a directive to the dependabot.yml like in the following code:

      - package-ecosystem: "docker"
        directory: "/"
        schedule:
          interval: "daily"

and a selector to the .gitlab-ci.yml:

    dependabot-docker:
      extends: .dependabot-gitlab
      variables:
        PACKAGE_MANAGER: docker
        DIRECTORY: /

which would then ensure that your Dockerfiles are kept up to date with the latest security patches.
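The selector relationship between the two files can be pictured with a few lines of Python. This is only a sketch of the matching idea described above, not code taken from dependabot-gitlab; the UPDATES list stands in for a parsed dependabot.yml and select_update is a name invented for the example.

```python
# Entries as they might look once dependabot.yml has been parsed
# (hand-written here so the example needs no YAML library).
UPDATES = [
    {"package-ecosystem": "pip", "directory": "/src"},
    {"package-ecosystem": "docker", "directory": "/"},
]


def select_update(updates, package_manager, directory):
    """Return the entry matching both selectors, or None if there is none."""
    for entry in updates:
        if (entry["package-ecosystem"] == package_manager
                and entry["directory"] == directory):
            return entry
    return None


# PACKAGE_MANAGER and DIRECTORY play the role of the CI job variables.
print(select_update(UPDATES, "docker", "/"))
print(select_update(UPDATES, "pip", "/"))  # nothing is configured for pip at "/"
```

The second lookup comes back empty, which is the practical point: a job whose variables don’t line up with an entry in dependabot.yml has no configuration to work from, so both values need to match.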
Dependabot or depend-on-Snyk?
Out of the box, this Dependabot implementation is going to check for dependencies with known vulnerabilities on every commit to the ‘main’ branch and any commit to a branch referenced in a Merge Request. If there’s a patch required and there isn’t an open Merge Request created by it to fix the issue, then it’ll create Merge Requests containing the minimum possible change.
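The “only if one isn’t already open” behaviour described above boils down to a simple filter. Here’s a hedged Python sketch of that logic; the function name, the package names and the version numbers are invented for illustration and don’t reflect Dependabot’s actual code.

```python
def merge_requests_to_open(needed_updates, open_merge_requests):
    """Return the updates that still need a Merge Request.

    needed_updates: dict mapping package name -> version to upgrade to
    open_merge_requests: set of (package, version) pairs already proposed
    """
    return {
        package: version
        for package, version in needed_updates.items()
        if (package, version) not in open_merge_requests
    }


# Hypothetical scan result and hypothetical state of the project.
needed = {"examplelib": "2.0.5", "otherlib": "1.4.2"}
already_open = {("examplelib", "2.0.5")}

print(merge_requests_to_open(needed, already_open))
```

Because the check is repeated on every run, the job can be triggered as often as you like without flooding the project with duplicate Merge Requests.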
It’s also possible to configure a scheduled pipeline run to trigger the Dependabot scan jobs periodically. This means that, just like Snyk, Merge Requests will still be opened to keep you up to date even if you’re not actively working on the project at the time.
No doubt as the original Dependabot gets more integrated with GitHub, it will diverge from the GitLab fork, which is likely to remain an outside project with low integration despite the maintainer now working at GitLab. This is because GitLab has been working on its own fully integrated paid solution, Gemnasium, since 2018. Unfortunately, at the time of writing GitLab doesn’t offer a free tier of Gemnasium, so if you want to use it for your open source projects you need to build everything from source and set up a pipeline similar to the Dependabot one shown in this article.
Signs of the Dependabot divergence are already starting to be visible, with the GitHub integration now being a single-click operation and the inline reports in its Pull Requests providing information about the vulnerability along with a “Compatibility score”. GitHub’s version of Dependabot and the tooling being built around it are becoming a significant threat to GitLab’s claim of being the “leading integrated product for the entire DevOps lifecycle”. GitLab’s failure to offer free access to the whole stack for open source projects could cost it a lot of marketing power.
It will be interesting to see whether or not GitLab will allow the continued work on the Dependabot fork since hiring its maintainer. Will this become a viable option for Open Source projects that want to stay out of Microsoft’s ecosystem, or is it going to wither away?
DependaNOT!
While the net result is the same for both of the solutions covered in this tutorial (a new PR created automatically that upgrades the Django version), Dependabot (both this open source GitLab version and the GitHub version) lacks the full feature set and depth of user experience offered by Snyk. Key features such as reporting, licence checking, and issue tracking that are found in Snyk (and in WhiteSource and, to an extent, GitLab’s “Ultimate” package) are currently not available in the GitLab version of Dependabot and are still not up to a competitive level in the GitHub version.
If you’re working in the financial sector (or anywhere that handles payment card details for that matter), then it’s a no-brainer. You’re actually required to have the reporting capabilities and maintain a “vulnerability management program” in order to comply with payment card industry standards such as PCI DSS. So getting hold of a pre-built compliant solution will save you a lot of time and money. Snyk’s offering really makes a lot of sense when you consider the time it takes to set up, the features it provides, and its support for the open source community. Not to mention that it has some fantastic plugins for popular IDEs to help you catch problems early on in the development life cycle.
However, if you’re working in a small company that doesn’t directly handle customer-centric elements such as credit cards or personally identifiable information, or are busy developing some kind of super-secret project that doesn’t need compliance reporting, then you can probably get by with something like the Dependabot solution provided in this tutorial.
Setting up dedicated accounts for bots like Snyk or Dependabot makes auditing changes easier. Because you know that the bot should only ever change specific files, it becomes easy to identify malicious activity.