Process and Not Tools Will Secure the Digital Supply Chain
As we return to work after the holiday season, I’m expecting my inbox to receive a flood of email from vendors touting that they can stop supply chain compromises such as the SolarWinds hack. I think anything beyond access management tooling is likely a stretch. Detecting compromised code without prior knowledge of the compromise is probably not possible, but there are other approaches that could work.
When I say detecting compromised code, I’m limiting the scope to: (i) automated detection that can be used to (ii) evaluate software before purchase or deployment. Manual approaches that require extensive human involvement can work, but they don’t scale across the hundreds or thousands of pieces of software in a typical enterprise environment. Detection after deployment by a monitoring platform is also not interesting because it’s after the fact: the bad code is already in your environment. The objective here is to make sure third-party software is safe before it gets installed. I’m also not talking about code with actual vulnerabilities in it; that’s a different problem space solved with different capabilities.
Why Tools (alone) won’t work
If you already understand the limitations of scanning tools then skip this section.
Let’s start with why automated detection won’t work. Runtime detection, which observes the running software, can catch malicious behaviour when that specific behaviour is known and occurs during the period of observation; for example, the application connects to a server that isn’t on a list of allowed servers (or connects to a known bad server). However, runtime detection is easily defeated: all the threat actor needs to do is have the malicious software delay its malicious action by several days. This is one of several techniques used to evade runtime malware detection.
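The delay trick is easy to picture in code. Here is a hypothetical illustration (the trigger date, release date, and observation window are all invented for the example) of a planted backdoor that stays dormant until long after any sandbox run has ended:

```python
import datetime

# Hypothetical illustration: a planted backdoor that stays dormant
# until a trigger date, so a short runtime observation sees nothing.
TRIGGER_DATE = datetime.date(2021, 3, 1)  # weeks after deployment

def maybe_act(today: datetime.date) -> str:
    """Return the behaviour an observer would see on a given day."""
    if today < TRIGGER_DATE:
        return "benign"      # monitoring window: nothing to see
    return "malicious"       # long after any observation ended

# A two-week sandbox run starting at release observes only benign behaviour.
release = datetime.date(2021, 2, 1)
observed = {maybe_act(release + datetime.timedelta(days=d)) for d in range(14)}
```

Every day of the two-week observation reports "benign", yet the malicious path is sitting right there waiting for the calendar.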
We could scan the code for malicious instructions, and this is where we run into a fundamental computer science issue: the halting problem. You can read my overview on it, but in short, there is no algorithm that can detect the set of all possible computer viruses. For the sake of this draft note, take it as a given that reliably scanning for malicious code is not a generally solvable problem. Sure, perfection is the enemy of good (enough), but any missed test case introduces a hole to drive a proverbial attack train through. If we’re going to rely on a code scanning tool, we want that reliance to be reasonable. In practical terms, such a scanning tool would need to evaluate every line of code against every possible input (both data and environmental, chained through every line of code that executes before it) and determine its intent. You can easily imagine such a tool producing false positives and negatives to the point of unreliability, or being a massive waste of time. The sheer number of input and output combinations for any moderately sized code base is staggeringly large.
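To make the input-space problem concrete, here is a hypothetical sketch (the trigger string and function names are invented) of code whose intent depends entirely on input the scanner cannot enumerate:

```python
import hashlib

# Hypothetical illustration: the payload fires only when the request
# hashes to a value the attacker chose in advance. A scanner would have
# to find that one input among an astronomically large input space to
# see the malicious path execute.
SECRET_DIGEST = hashlib.sha256(b"attacker-chosen-trigger").hexdigest()

def handle(request: bytes) -> str:
    if hashlib.sha256(request).hexdigest() == SECRET_DIGEST:
        return "open backdoor"   # reachable, but only via the trigger input
    return "normal response"
```

A scanner exercising even millions of random inputs will almost surely never hit the trigger, yet the malicious path is really there in the code.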
How to deliver secure source code?
What we don’t know at this point is how SolarWinds’ code was compromised. We know from SolarWinds’ SEC filing that “the illicit modification happened in their software build system, and was not visible in their source code” (quote from Yubico’s excellent post on the breach and lessons learned). The following is broad-spectrum advice on secure code management:
- If you’re writing software, here are a few things to think about. If the threat actor manipulated the code directly, then improve Access Control: limit at least write access to your source code repositories. Use a pull-request-like mechanism so that code changes must be explicitly accepted: every developer works off a copy of the same code base, and the specific edits they make are differentially incorporated into the main copy. Treat the user account that can approve commits into a code repository as a privileged account deserving of the same protections as a domain admin account.
- If the threat actor compromised a developer, then defend with access control and Explainable Changes: every change request against code needs to be linked back to a feature improvement or bug fix, and each change should be limited to implementing only what is required. Changes need to be small; cramming a lot of new functionality into a code base in a single change should be strongly discouraged and sent back for refactoring. Any change that can’t be linked back to a feature improvement or bug fix should be investigated by the product security team.
- If the threat actor modified existing code to introduce an exploitable vulnerability, then use static analysis and unit testing. This is the one scenario where scanning is beneficial. It’s also likely that such a change won’t be an explainable change: ”Hey Developer, why did you change code in file X that has nothing to do with all the other changes you made?” Unit tests that pin expected behaviour can also flag a change that quietly alters what the code does.
- If the threat actor injected malicious code into a library stored in a public package repository, then use internal private package repositories. Developers should not be pulling in libraries from arbitrary sources. You’ll need a process for vetting new libraries and making them part of your code base.
- If the threat actor injected malicious code into the build process so it wasn’t visible in the source code, then secure your development environment using Reproducible Build Environments. Reproducible Builds are a concept from Open Source software that addresses the question: how do we know the source code we see produced the executable program we use? The central practice is to give everyone clear instructions so they can reproduce the technical environment used to compile the code into a final working program. Using a reproducible build environment means that no matter who compiles the code, they will get the exact same output (the final executable software), and if they compare against the software releases provided by others, the results should be identical; any difference indicates a potential problem. How does this apply to proprietary software? Use build environments where the configuration is managed by code (Chef, Saltstack, Puppet, Ansible, etc.) and then control access to the configuration scripts the same way you control access to sensitive source code. When you compile for production, spin up a second build environment from scratch and compare its output against the primary build environment’s. If they don’t match, you’ve got an investigation on your hands.
- If the threat actor injected malicious code into the distribution package post-build, then use code signing as part of your build process, along with write-once-read-many file stores for your distribution endpoint.
- If the threat actor manipulated the release package and signed it with a stolen certificate, then secure your signing certificates by placing them in isolated systems that are kept offline until you need to sign a release. Alternatively, store them on a security key.
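Several of these controls are simple enough to sketch in code. First, the pull-request access control: a server-side check that rejects direct pushes to the main branch, forcing changes through an approval identity treated like a domain admin account. The branch name and approver set here are illustrative, not any product’s real API:

```python
# Hypothetical sketch of a server-side pre-receive check: direct pushes
# to the protected branch are rejected, so changes land only through a
# pull-request merge performed by a designated privileged identity.
PROTECTED_BRANCH = "refs/heads/main"
ALLOWED_APPROVERS = {"release-bot"}   # treat like a domain-admin account

def check_push(ref: str, pusher: str) -> bool:
    """Return True if this push should be accepted."""
    if ref != PROTECTED_BRANCH:
        return True                    # feature branches: developers may push
    return pusher in ALLOWED_APPROVERS # main: only the PR merge identity
```

In practice this logic lives in a server-side hook or the repository host’s branch-protection settings; the point is that write access to the main copy is a gate, not a default.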
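The Explainable Changes requirement can also be automated at the review gate. This sketch assumes a ticket ID convention like `PROJ-123` and an invented size threshold; both are illustrative choices, not a standard:

```python
import re

# Hypothetical sketch: require every change to reference a feature/bug
# ticket, and flag oversized changes for refactoring. Unexplainable
# changes get escalated rather than silently merged.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")
MAX_CHANGED_LINES = 400   # illustrative threshold for "small" changes

def review_gate(message: str, changed_lines: int) -> str:
    if not TICKET_RE.search(message):
        return "reject: no linked ticket, escalate to product security"
    if changed_lines > MAX_CHANGED_LINES:
        return "reject: change too large, split or refactor"
    return "accept"
```

The machine check only verifies that a link exists; a human reviewer still has to confirm the change implements only what the ticket requires.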
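For the internal package repository, the vetting process boils down to pinning approved artifacts by hash. The package name and allowlist contents below are invented for illustration:

```python
import hashlib

# Hypothetical sketch of the admission step for an internal mirror:
# only artifacts whose SHA-256 matches the reviewed allowlist are
# accepted into the private repository.
APPROVED = {
    "leftpad-1.0.tar.gz": hashlib.sha256(b"reviewed artifact bytes").hexdigest(),
}

def admit_package(name: str, artifact: bytes) -> bool:
    expected = APPROVED.get(name)
    return expected is not None and hashlib.sha256(artifact).hexdigest() == expected
```

A tampered upstream copy, or any package nobody has reviewed, simply never makes it into the mirror developers build against.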
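The reproducible-build comparison is the simplest control of all: hash the artifact from the primary build environment and from the freshly spun-up second one, and investigate any mismatch. File handling is simplified to in-memory bytes here:

```python
import hashlib

# Sketch of the comparison step between two independent build
# environments: identical inputs and environments should yield
# byte-identical artifacts, so any digest mismatch is a red flag.
def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()

def builds_match(primary: bytes, rebuilt: bytes) -> bool:
    return digest(primary) == digest(rebuilt)
```

A SUNBURST-style implant injected in only one of the two build systems would show up as a mismatch at exactly this step.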
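Finally, code signing plus a write-once-read-many store can be sketched together. Real code signing uses public-key certificates (ideally kept offline, as above); the HMAC here is only a self-contained stand-in, and all names are invented:

```python
import hashlib
import hmac

# Hypothetical sketch: sign the artifact at build time, then publish it
# into a write-once store that refuses any later overwrite. HMAC stands
# in for real public-key code signing purely to keep the example
# self-contained.
SIGNING_KEY = b"kept-offline-in-real-life"

def sign(artifact: bytes) -> str:
    return hmac.new(SIGNING_KEY, artifact, hashlib.sha256).hexdigest()

class WormStore:
    """Write once, read many: a published name can never be overwritten."""
    def __init__(self):
        self._store = {}

    def publish(self, name: str, artifact: bytes) -> bool:
        if name in self._store:
            return False   # post-build tampering attempt rejected
        self._store[name] = (artifact, sign(artifact))
        return True

    def fetch(self, name: str) -> bytes:
        artifact, signature = self._store[name]
        assert hmac.compare_digest(signature, sign(artifact))
        return artifact
```

The two controls are complementary: signing lets consumers detect tampering after the fact, and the WORM store prevents the swap from happening at the distribution endpoint in the first place.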
None of the above is bulletproof, but it has the benefit of shrinking the attack surface to something manageable and possibly achievable. You only have to monitor specific things now, not every single line of code.
On top of that, there are actual secure software development practices that focus on producing code that is free of (or at least less likely to have) actual vulnerabilities.
How to defend against supply chain attacks if you’re the buyer?
Ask your third parties if they do the things in the preceding section and contract with them accordingly. Then:
- If you’re a small organization with a single-digit security team, then limit yourself to cloud services and major software providers for on-premises software. Effectively, you’re transferring risk, because you can’t afford to review and manage everything in your environment.
- If you’re a medium-sized enterprise with a low-double-digit security team, stay true to the advice for a small team, but where you need proprietary applications inside your environment, fully document the network data flows and isolate those third-party elements with strict machine-level firewall rules (you have a software-defined network, right?). On end-user systems, restrict software installs to authorized software only; at least that way you’re not bringing random malware and backdoors into your environment.
- If you’re a large enterprise, then do all of the preceding and also look at the recommendations from Reversing Labs. I don’t agree with all of them, but there are some good ones in there if you can afford it.