One of the reasons I’m spending so much of my free time writing code (and neglecting my wife and dogs, much to their chagrin and my isolation) is that I’m trying to personalize the lessons of developing code, and developing secure code, that I preach as part of my day-to-day job.
I’ve been seeing a lot of references to “don’t trust user input”, and I’ve been trying to figure out what I’m supposed to do in managed code. What I’m really after are some code samples or some prescriptive guidelines.
Of all the resources I know of on the subject, I suspect the best guidance I’ll find is in the book 19 Deadly Sins Of Software Security: Programming Flaws and How To Fix Them (Howard, LeBlanc, Viega). I flipped through this a couple of months ago and while it seemed heavily weighted towards unmanaged code (C and C++), I seem to remember a reasonable amount of mention of managed code as well.
When I dug into the table of contents, there wasn’t any one chapter entitled “don’t trust user input”. Instead there’s titles like “Sin 1: Buffer Overruns“, “Sin 2: Format String Problems“, “Sin 3: Integer Overflows“, “Sin 4: SQL Injection“, “Sin 5: Command Injection” and “Sin 14: Improper File Access“. [I believe these are all the sins that relate to trusting user input, but I’m sure that’s hardly all the ways that trusted user input can be harmful to your code’s health!]
Sin 1: Buffer Overruns
So it looks like this is the most significant of all the Sins to consider when developing managed code. Not only does it encapsulate the kind of thinking that should be applied to other Sins, but that it’s the most prevalent issue to expect in managed code and it applies to all types of managed code applications.
While I’ve understood for years what a buffer overrun means in general, I’ve never paid too much attention to thinking through exactly how to implement protections against buffer overruns. What’s worse is, the guidance for managed code developers in this book isn’t exactly crystal-clear (at least, not to a relative novice like me):
C# enables you to perform without a net by declaring unsafe sections; however, while it provides easier interoperability with the underlying operating system and libraries written in C/C++, you can make the same mistakes you can in C/C++. If you primarily program in higher-level languages, the main action item for you is to continue to validate data passed to external libraries, or you may act as the conduit to their flaws.
So what does this mean to the managed code developer? Am I reading this right, that we should only have to worry about calls to unmanaged code, and that all managed code functions are perfectly fine as-is? Or is this also trying to say that any calls between assemblies, whether managed-managed code or managed-unmanaged code, should be equally guarded so that all passed buffers are checked?
Let’s assume for the moment that it’s the former, and that only when we’re calling into an unmanaged code (PInvoke) function do we need to worry about protecting against buffer overruns. Should we assume that every single PInvoke needs to be protected against buffer overruns, no matter what? Or should we focus instead on following external user inputs, tracing them through our code, and only put guard code in place at one or more of those chained calls, when that external input will actually intersect with a PInvoke function?
Put another way, does this advice mean we should focus on the “back end” (protecting every PInvoke), or should we focus on the “front end” (tracing external input to any PInvoke)?
I have no real appreciation for this space, and I can imagine good reasons for taking either approach. However, I also don’t relish the thought of either approach. I’d hate to have to try to trace every external input all the way through the twisty paths that it’ll often take — what a nightmare for a large codebase (what a grueling code review that’d be)! On the other hand, it seems really inefficient to have to wrap every PInvoke in some form of guard code (or worse, wrap every call to the PInvoke – thus duplicating the extra code over and over, and still leaving yourself open to overlooking one or more critical calls).
And hey — if every PInvoke should always be wrapped in anti-overrun guard code, then shouldn’t the Microsoft employee who runs PInvoke.net be aware of that, and be ensuring that such guard code is included in every PInvoke signature that’s documented on that site? Based on this reasoning, I’d have to believe that it’s not practical — or not even theoretically effective — to try to protect against buffer overruns in the PInvoke signatures.
Quick Analysis of the Rest of the “User Input” Sins
Sin 2: Format String Problems
It sounds like the only significant effect of this Sin on managed code is when reading in input from external files. The recommended “guard code” is to try to be sure you’re reading in the file you want (and not some path– or filename–spoofed substitute).
Sin 3: Integer Overflows
It sounds like the only time this is a problem in managed code is when performing calculations inside unmanaged code. If I’m reading this right, the recommended “guard code” would check that the integer values passed into the unmanaged code call are in fact integer values.
Sin 4: SQL Injection
I’m not touching any SQL databases or data access libraries, so this is irrelevant to my current investigations. If it’s relevant for you, go read everything you can on the subject — it’s a doozy.
Sin 5: Command Injection
No .NET languages are mentioned in this chapter, but I would imagine that anytime a “shell execute” type command is instantiated, this vulnerability could be present. In such cases, I would follow the same advice they give: “You can either validate everything you’re going to ship off to the external process, or you can just validate the parts that are input from untrusted sources. Either one is fine, as long as you’re thorough about it.”
Sin 14: Improper File Access
It sounds like there’s no easy “rules” to implement as guard code for this class of flaw, but rather to be hyper-vigilant anytime managed code calls System.IO.File or StreamReader methods.