DevOps status report: HackOregon 2019 season

One of my colleagues on the HackOregon project this year sent around “Nice post on infrastructure as code and what getting solid infra deploys in place can unlock” https://www.honeycomb.io/blog/treading-in-haunted-graveyards/

I felt immediately compelled to respond, saying:

Provocative thinking, and we are well on our way I’d say.

I’ve been the DevOps lead for HackOregon for three years now, and more often than not I’ve delivered 80% of the infrastructure each year – the CI/CD pipeline, the automation scripts for standardizing and migrating configuration and data into the AWS layers, and the troubleshooting and white-glove onboarding of each project’s teams where they touch the AWS infrastructure.

There are great people to work with too – on the occasions when they’ve got the bandwidth to help debug some nasty problem, or see what I’ve been too bleary-eyed to notice is getting in our way, it’s been gratifying to pair up and work these challenges through to a workable (if not always elegant) solution.

My two most important guiding principles on this project have been:

  • Get project developers productive as soon as possible – ensure they have a Continuous Deployment pipeline that gets their project into the cloud, and allows them to see that it works so they can quickly see when a future commit breaks it
  • “working > good > fast” – get something working first, make it “good” (remove the hard-coding, the quick workarounds) second, then make it automated, reusable and documented

We’re married pretty solidly to the AWS platform, and to a CloudFormation-based orchestration model.  It’s evolved (slowly) over the years, as we’ve introspected the AWS Labs EC2 reference architecture, and as I’ve pulled apart the pieces of that stack one by one and repurposed that architecture to our needs.

Getting our CloudFormation templates to a place where we can launch an entirely separate test instance of the whole stack was a huge step forward from “welp, we always gotta debug in prod”. That goal was met about a month ago, and the stack went from “mysterious and murky” to “tractably refactorable and extensible”.

Stage two was digging deep enough into the graveyard to understand how the ECS parts fit together, so that we could swap EC2 for Fargate on a container-by-container basis. That was a painful transition but ultimately paid off – we’re well on our way, and can now add containerised tasks without also having to juggle a whole lot of maintenance of the EC2 boxes that are a velocity-sapping drag on our progress.

Stage 3 has been refactoring our ECS service templates into a standardised single template used by whole families of containerised tasks, from a spray of copypasta hard-coded replicas that (a) had to be curated by hand (much like our previous years’ containerised APIs have to be maintained one at a time), and (b) buried the lede on what unique configuration was being used in each service. Any of the goofy bits you need to know ahead of deploying the next container are now obvious and all in one place, the single master.yaml.
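To make the "one template, many services" idea concrete, here is a minimal sketch in Python rather than CloudFormation YAML. It is not our actual template: the service names, paths and sizing values are all made up for illustration. The point is only that shared defaults live in one place and each service declares nothing but its differences.

```python
# Hypothetical defaults shared by every containerised service.
DEFAULTS = {
    "launch_type": "FARGATE",
    "cpu": 256,
    "memory": 512,
    "desired_count": 1,
}

# Only the per-service differences live here (names and values are invented).
SERVICES = {
    "housing-api": {"path": "/housing", "memory": 1024},
    "transport-api": {"path": "/transportation"},
}


def render_service(name, overrides, defaults=DEFAULTS):
    """Merge one service's overrides onto the shared defaults."""
    params = {**defaults, **overrides}
    params["service_name"] = name
    return params


def render_all(services=SERVICES):
    """Produce the full parameter set for every service in the family."""
    return [render_service(name, overrides) for name, overrides in services.items()]


if __name__ == "__main__":
    for params in render_all():
        print(params)
```

Reading SERVICES top to bottom shows exactly what is unusual about each container, which is the property the single master.yaml is meant to give us.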

I can’t speak for everyone, but I’ve been pretty slavish about pushing all CF changes to the repo in branches and merging when the next round of stable/working infra has been reached. There’s always room for improvement, however:

  • smaller changes are always better
  • we could afford more folks who are trained and comfortable with the complex orchestration embedded in our infrastructure-as-code
  • which would mean being able to conduct good reviews before merge-to-master
  • I’d be interested in how we can automate the validation of commit-timed-upgrades (though that would require more than a single mixed-use environment).

Next up for us are tasks like:

  • refactoring all the containers into a separate stack (out of master.yaml)
  • parameterising the domains used for ALB routing
  • separating production assets from the development/staging environment
  • separating a core infra layer from the staging vs production side-by-side assets
  • refactoring the IAM provisions in our deployment (policies and attached roles)
  • pulling in more of the coupled resources such as DNS, certs and RDS into the orchestration source-controlled code
  • monitoring and alerting for real-time application health (not just infra-delivery health)
  • deploying *versioned* assets (not just :latest which becomes hard to trace backwards) automatically and version-locking the known-good production configuration each time it stabilises
  • upgrading all the 2017 and 2018 APIs to current deployment compatibility (looking for help here!)
  • assessing orchestration tech to address gaps or limitations in our current tools (e.g. YAML vs. JSON or TOML, pre-deploy validation, CF-vs.-terraform-vs-Kubernetes)
  • better use of tagging?
  • more use of delegated IAM permissions to certain pieces of the infra?
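On the versioned-assets item above, a rough sketch of the kind of pre-deploy check I have in mind: flag any container image reference that is untagged or uses :latest. This is an assumption about how such a check could look, not something we run today, and the image names are placeholders.

```python
def image_tag(image_ref):
    """Return the tag of an image reference, or None if it has no tag."""
    # Drop any digest suffix such as repo/name@sha256:...
    name = image_ref.split("@", 1)[0]
    # Only look at the last path segment, so a registry port
    # (e.g. localhost:5000/app) is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    return None


def is_pinned(image_ref):
    """True if the reference names a digest or a tag other than 'latest'."""
    if "@sha256:" in image_ref:
        return True
    tag = image_tag(image_ref)
    return tag is not None and tag != "latest"


def unpinned(image_refs):
    """List the references that would be hard to trace backwards."""
    return [ref for ref in image_refs if not is_pinned(ref)]


if __name__ == "__main__":
    # Placeholder image names, purely for illustration.
    print(unpinned(["example/api:1.4.2", "example/api:latest", "example/worker"]))
```

A check like this could run before a deploy and fail the pipeline when unpinned() returns anything for the production configuration.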

This snapshot of where we’re at doesn’t capture the full journey of all the late nights, painful rabbit holes and miraculous epiphanies.

Mike, what books do you suggest for getting up to speed on the Lean approach?

Recent question to me from a newbie to my Lean Coffee meetup:

Hey, I was wondering if you could suggest any books for getting up to speed on the LEAN approach? I’m reading ‘The Lean Startup’ by Eric Ries, and like it a lot. Anything that you’ve really enjoyed? Thanks 🙂

Thought I’d share the thoughts I passed to David:

Hi David, I’m not much of a prose guy – unconferences, meet-ups, lean coffee and experiential training work better for me. I’ve skimmed Lean Startup and that was interesting in parts; I’ve also read the Phoenix Project (devops focus and pretty terrible plotting and characters but a weirdly compelling outlook on incremental, value-based organisations – and it’s apparently a rewrite of the plot of The Goal which I’ve heard is also good in this vein).

Neal Peterson (who’s a regular attendee and runs a parallel Lean Coffee of the North) is our resident Lean Guru. I’d reach out to him too.

If you haven’t already, I’d strongly recommend getting involved with Agile PDX – many meet-up opportunities each month, and an invite-only (only because we haven’t automated self-signup) Slack group you’d be welcome to join.

Occupied Neurons, Santa edition: lessons for software engineering

How to Be An Insanely Successful Software Manager

https://hackernoon.com/how-to-be-an-insanely-successful-software-manager-13efe08fd890

Aside from a few dubious quotes and phrasings, I believe someone channeled my life when they wrote this.  (Is that why I smelled burning sulfur recently?)  When the goal of a software org is getting the most value into customers’ hands as quickly as possible, shaving down every point of friction between “User Story” and “running in production” is an obsessive mission.

No one does phrasing better than Sterling Archer

Benefit vs Cost: How to Prioritize Your Product Roadmap

https://www.productplan.com/how-to-prioritize-product-roadmap/

I’ve been a data-driven, quantitative prioritization junkie in my Product work for years. When you want to have a repeatable, defensible, consensus-able (?) way for everyone to see what are the most valuable items in your Backlog, you ought to invest in a way to estimate Business Value just as much as you need the engineering team to estimate Effort.  Makes planning and communicating much easier in the long run.

Specific methodologies reflect the rigour and the particular ways in which an organization derives value for its customers and shareholders.  A heavily regulated industry might use “Regulatory Compliance” as a double-weighted factor in their ‘algorithm’; an internal IT team might focus .   Many teams put emphasis on Estimated Revenue Impact and Reducing Customer Churn, and I’ve personally ensured that UX (“Expected Frustration Reduction”) has a place at the table.  Numeric scales, “high-medium-low”, “S-M-L-XL” or “Y/N” can all factor in, to whatever degree of rigour is necessary to sufficiently order and prioritize your backlog – don’t overengineer a system when half as much effort will get a useful starting place for the final “sorting negotiation” among stakeholders.
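As a back-of-the-napkin illustration of that kind of ‘algorithm’, here is a small sketch: weighted benefit factors summed into a Business Value, then divided by estimated Effort. The factor names come from the paragraph above; the weights, the 0-5 scales, the backlog items and the divide-by-effort formula are all assumptions for the sake of the example, not a recommended standard.

```python
# Assumed weights: regulatory compliance counts double, everything else once.
WEIGHTS = {
    "revenue_impact": 1,
    "churn_reduction": 1,
    "frustration_reduction": 1,
    "regulatory_compliance": 2,
}

# Invented backlog; scores use an assumed 0-5 scale, effort is in story points.
BACKLOG = [
    {"name": "Audit log export", "effort": 5,
     "scores": {"regulatory_compliance": 5, "revenue_impact": 1}},
    {"name": "Faster checkout", "effort": 8,
     "scores": {"revenue_impact": 4, "churn_reduction": 3, "frustration_reduction": 4}},
    {"name": "Clearer login error", "effort": 1,
     "scores": {"frustration_reduction": 3, "churn_reduction": 1}},
]


def business_value(scores, weights=WEIGHTS):
    """Weighted sum of the benefit factors; missing factors count as zero."""
    return sum(weight * scores.get(factor, 0) for factor, weight in weights.items())


def priority(item, weights=WEIGHTS):
    """Benefit per unit of effort."""
    return business_value(item["scores"], weights) / item["effort"]


def rank(backlog):
    """Order the backlog from highest to lowest benefit-per-effort."""
    return sorted(backlog, key=priority, reverse=True)


if __name__ == "__main__":
    for item in rank(BACKLOG):
        print(f"{item['name']}: {priority(item):.2f}")
```

The output is only a starting order for the stakeholder negotiation, not a replacement for it.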

 

Introduction to ES6 Promises

http://jamesknelson.com/grokking-es6-promises-the-four-functions-you-need-to-avoid-callback-hell/

Been hearing about Promises and async/await from my engineering colleagues for ages.  Conceptually they’re a great advancement – making code more efficient, reflecting the unpredictable nature of distributed software systems, breaking the serialization bias of every new programmer yet again.

However, the truth of implementing Promises, for those who have never wrangled such code, is far more complex than I expected.  Just reading the explanations by accomplished programmers, with all their multi-layered assumptions and skipping-ahead-without-clarification, makes me feel dense in a way that doesn’t have to be true.  I’m sure if every element of the canonical use of a Promise object was explained (where it’s used, in what order, by what consumer), it would be much easier to get it to work.  I’ll keep hunting for that pedagogical example.

GraphQL vs REST Overview

https://philsturgeon.uk/api/2017/01/24/graphql-vs-rest-overview/

I’m hearing a lot of developers extol the virtues of GraphQL in their side projects (and professional work, where they have room to advocate up).  I haven’t managed a Product shipping GraphQL services yet, so I’ve been curious what the folks already implementing these are learning.

One problem this article highlights is “deprecation” – when is it time to no longer support a field or endpoint in your API?  For an endpoint, it’s easy to see how many requests you’re currently receiving; for a field, it’s trickier in a REST environment, and GraphQL supports “sparse field sets”.

The question I don’t see addressed here is: does GraphQL require every request to specify every field they’re going to obtain?  Or is there also support to request all fields (cf. SELECT * from TABLE), in which case that benefit quickly vanishes?  If only some of your requests specify which fields they’re using, and the rest just demand them all, then you still don’t know whether that field you want to deprecate is up for grabs, nor which users are still using it.  You can infer some educated guesses based on the data you do have, but it’s still down to guesswork.

(Edit: I’ve concluded that fields must be explicit in GraphQL requests)
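If fields really are explicit in every request, the deprecation question becomes a counting exercise. A rough sketch, assuming the requested field names have already been extracted from each logged request (the log format, client names and field names here are hypothetical, and nothing below parses actual GraphQL):

```python
from collections import Counter

# Hypothetical pre-processed request log: one entry per request.
REQUEST_LOG = [
    {"client": "mobile-app", "fields": ["id", "name", "email"]},
    {"client": "web-app", "fields": ["id", "name"]},
    {"client": "partner-sync", "fields": ["id", "legacy_code"]},
]


def field_usage(log):
    """Count how many requests asked for each field."""
    return Counter(field for entry in log for field in set(entry["fields"]))


def clients_using(log, field):
    """Which clients would be affected if this field went away?"""
    return sorted({entry["client"] for entry in log if field in entry["fields"]})


def unused_fields(log, schema_fields):
    """Fields in the schema that no logged request asked for."""
    used = field_usage(log)
    return sorted(field for field in schema_fields if used[field] == 0)


if __name__ == "__main__":
    schema = ["id", "name", "email", "legacy_code", "fax_number"]
    print(unused_fields(REQUEST_LOG, schema))
    print(clients_using(REQUEST_LOG, "legacy_code"))
```

It still only tells you about the traffic you logged, so a quiet field is a candidate for deprecation rather than a guarantee.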

REST is the New SOAP

https://medium.com/@pakaldebonchamp/rest-is-the-new-soap-97ff6c09896d

OK OK I get it – REST is challenging in many ways when trying to deal with the reality of API behaviours.  Thank you for writing an article that outlines specifically what your problems are.  (OTOH, wouldn’t it be nice to see an article that acknowledges the problems *and* extols the remaining virtues – see author’s own words, “I don’t doubt that some smart people out there will provide cases where REST shines”?  Or even better, talks about when to use this solution and when to use a solution that works better for a specified scenario/architecture – and not just offhandedly mention something they “heard that one time”?)

And why do I have that itchy feeling in the back of my brain that newer alternatives like GraphQL will put us in a state of complexity that we ran from the last time we did this to ourselves in the name of “giving ourselves all the tools we might ever need” (aka SOAP)?

It’s smart to select a relevant architecture for the problem space – it just makes me worry every time I watch someone put in place all sorts of “just in case” features that they have no need for now – and can’t even articulate a specific problem for which this is a great solution – but are sure there’ll be some use for in the future.  I haven’t delved deeply enough into GraphQL (obviously) but my glancing analysis made it seem much more flexible – and the last time I saw “eminently flexible” was when I saw OAuth 2 described to me as “puts all the grenades you’ll ever need in your hands and pulls the pins”.

Kitty Quinn captures my unease very well here, and quotes Allen Sherman to boot:

‘Cause they promise me miracles, magic, and hope,
But, somehow, it always turns out to be SOAP

Linus rants at the security community again – bravo

https://lkml.org/lkml/2017/11/17/767

Linus goes off on the security community who keep trying to make sweeping, under-tested, destabilizing changes to the kernel, and while his delivery leaves something to be desired, the message is welcome and apparently remains necessary.  Making radical changes that do nothing to help the system operators and users know what’s going on, or be able to control or even just report the issues, is, shall we say, frustrating.

It’s this kind of flagrant power play by security mavens that irks the rest of us to a homicidal degree. It punishes the user in the hopes that that user will push the pain uphill to the originator of the buggy code.

Except that no typical user (i.e. 99% of the computing end user population) even *recognises* that the problem is with the calling code (app, driver) rather than the OS (“computer”, “CPU”, “crap phone”) that is merely trained to enforce these extreme behaviours.

I find after a couple of decades in infosec land that this is motivated by the disregard security folks have for the end user victims of this whole tug-of-war, which seems so often to break down to “I’m sick of chasing software developers to convince them to fix their bugs, so instead let’s make the bug ‘obvious’ to the end users and then the users will chase down the software developers for me”.

Immediate kernel panic may have been an appropriate response decades ago when operators, programmers and users were closely tied in space and culture. It may even still be an appropriate posture for some mission-critical and highly-sensitive systems, if you favour “protection” over stability.

It is increasingly ridiculous to expect the user of most other systems to have any idea how to communicate what happened to the powers that be and have that turned into a fix in a viable timeframe – let alone to rely on instrumented, aggregated, anonymized crash reports being fed en masse to the few vendors who know how, let alone have the time, to request, retrieve and paw through millions of such reports looking for the few needles in haystacks.

Punish the victim and offload the *real* work of security (i.e. getting bugs fixed) to people least interested and least expert at it? Yeah, good luck.

It is entirely appropriate in an increasing number of circumstances to soften the approach and try warning the user and trusting them with a little power to make some decisions themselves (rather than arbitrarily punish them for mistakes not their own).

I love many of my colleagues in the security community dearly, and wouldn’t tell them to quit their jobs, but goddamn do we quickly forget that the options are not just “PREVENT” but also “DETECT” and “CORRECT”. I’m glad to see that Kees Cook’s followup clarifies that he’s already looking into this, and learning that such violent change to a kernel can’t be swallowed whole.

When will DevSecOps resemble DevOps?

https://www-forbes-com.cdn.ampproject.org/c/s/www.forbes.com/sites/jasonbloomberg/2017/11/20/mitigate-digital-transformation-cybersecurity-risk-with-devsecops/amp/

Another substance-free treatise on the glories of DevSecOps.

“Security is everyone’s job”, “everyone should care about security” and “we can’t just automate this job” seem to be the standard mantra, a decade on.

Which is entirely frustrating to those of us who are tired of security people pointing out the problems and then running as soon as there’s talk of the backbreaking labour of actually fixing the security issues, let alone making substantive system improvements that reduce their frequency in the future.

Hell, we even get a subheading that implies it’ll advance security goals in a CI/CD world: “The Role of Tooling in DevSecOps”. Except that there’s nothing more than a passing wave hello to Coverity (a decent static analysis vendor, but not the start nor the finish of the problem space) and more talk of people & process.

Where are the leading thinkers on secure configuration of your containers? Where’s the automated injection of tools that can enforce good security IAM and correct for the bad?

I am very tired of chasing Lucy’s football:

I’m tired of going out to DevSecOps discussions at meetups and conferences and hearing nothing that sounds like they “get” DevOps.

DevOps works in service of the customers, developers and the business in helping to streamline, reduce the friction of release and make it possible to get small changes out as fast and frequently as possible.

I’ve asked at each of those discussions, “What tools and automation can you recommend that gets security integrated into the CI/CD chain?”

And I’ve heard a number of unsatisfying answers, from “security is everyone’s job and they should be considering it before their code gets committed” all the way through to “we can’t talk about tools until we get the culture right”. Which are all just tap-dancing dodges around the basic principle: the emperor has no clothes.

If DevSecOps is nothing more than “fobbing the job off on developers” and “we don’t recommend or implement any tools in the CI/CD chain”, then you have no business jumping on the DevOps bandwagon as if you’re actively participating in the process.

If you’re reliant merely on the humans (not the technology) to improve security, and further that you’re pushing the problem onto the people *least* expert in the problem space, how can you possibly expect to help the business *accelerate* their results?

Yes I get that DevOps is more than merely tools, but if you believe Gene Kim (as I’m willing to do), it’s about three principles for which tools are an essential component:

  1. Flow (reduce the friction of delivery) and systems thinking (not kicking the can down to some other poor soul)
  2. Amplify feedback loops (make it easy and obvious to learn from mistakes)
  3. Create a culture of learning from failure.

Now, which of those does your infosec approach support?

Hell, tell me I’m wrong and you’ve got a stack of tooling integrated into your DevOps pipeline. Tell me what kinds of tools/scripts/immutable infrastructure you’ve got in that stack. I will kiss your feet to find out what the rest of us are missing!

Edit: thoughts

  • Obviously I’m glossing over some basic tools everyone should be using: linters.  Not that your out-of-the-box linter is going to directly catch any significant security issues, no – but that if you don’t even have your code following good coding standards, how the hell will your senior developers have the attention and stamina to perform high-quality, rapid code reviews when they’re getting distracted by off-pattern code constructions?
  • Further, all decent linters will accept custom rules, disabled/info-only settings to existing rules – giving you the ability to converge on an accepted baseline that all developers can agree to follow, and then slowly expand the footprint of those rules as the obvious issues get taken care of in early rounds.
  • Oh, and I stumbled across the DevSecCon series, where there are likely a number of tantalizing tidbits.
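To show what I mean by custom rules with info-only settings, here is a toy sketch of a linter rule table using Python’s standard ast module. It is not how any particular linter implements its rules; the rule IDs and severities are invented, and a real setup would use your linter’s own plugin mechanism.

```python
import ast

# Hypothetical rule table: rule id -> (banned function name, severity).
# "info" rules are reported but do not fail the build; promote them to
# "error" once the team has cleaned up the obvious cases.
RULES = {
    "SEC001": ("eval", "error"),
    "SEC002": ("exec", "info"),
}

SAMPLE = "x = eval(user_input)\nexec('print(1)')\n"


def lint(source, rules=RULES):
    """Return (line, rule_id, severity) for each call to a banned function."""
    findings = []
    # ast.parse only parses the source; nothing in it is executed.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            for rule_id, (banned, severity) in rules.items():
                if node.func.id == banned:
                    findings.append((node.lineno, rule_id, severity))
    return sorted(findings)


def should_fail_build(findings):
    """Only 'error' findings break the CI build."""
    return any(severity == "error" for _, _, severity in findings)


if __name__ == "__main__":
    for finding in lint(SAMPLE):
        print(finding)
```

Starting a rule at "info" and tightening it later is the "converge on a baseline, then expand" approach described above.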

Edit: found one!

Here’s a CI-friendly tool: Peach API Security

  • Good news: built to integrate directly into the DevOps CI pipeline, testing the OWASP Top Ten against your API.
  • Bad news: I’d love to report something good about it, but the evaluation experience is frustratingly disjointed and incomplete.  I’m guessing they don’t have a Product Manager on the job, because there are a lot of missing pieces in the sales-evaluation-and-adoption pipeline:
    • Product Details are hosted in a PDF file (rather than online, as is customary today), linked as “How to Download” but titled “How to Purchase”
    • Most “hyperlinks” in the PDF are non-functional
    • Confusing user flow to get to additional info – “Learn More” next to “How to Download” leads to a Data Sheet, the footer includes a generic “Datasheets” link that leads to a jumbled mass of overly-whitespaced links to additional documents on everything from “competitive cheatsheets” to “(randomly-selected-)industry-specific discussion” to “list of available test modules”
    • Documents have no common look-and-feel, layout, topic flow or art/branding identity (almost as if they’re generated by individuals who have no central coordination)
    • There are no browseable/downloadable evaluation guides to explain how the product works, how to configure it, what commands to use to integrate it into the various CI pipelines, how to read the output, example scripts to parse and alert on the output – lacking this, I can’t gain confidence that this tool is ready for production usage
    • No running/interrogable sample by which to observe the live behaviour (e.g. an AWS instance running against a series of APIs, whose code is hosted in public GitHub repos)
  • I know the guys at Deja Vu are better than this – their security consulting services are awesome – so I’m mystified why Peach Tech seems the forgotten stepchild.

Edit: found another!

Neuvector is fielding a “continuous container security” commercial tool.  This article is what tipped me off about them, and it happens to mention a couple of non-commercial ideas for container security that are worth checking out as well:

Edit: and an open source tool!

Zed Attack Proxy (ZAProxy), coordinated by OWASP, and hosted on GitHub.  Many automatable, scripted capabilities to search for security vulnerabilities in your web applications.

 

 

Occupied Neurons, November Kunst

What Trader Joes Figured Out About Work Culture That My Other Past Employers Haven’t

http://engage.guidespark.com/talent-and-culture/what-trader-joes-figured-out-about-work-culture-that-my-other-past-employers-havent/

Holy shit folks, I could study this like the Torah for the rest of my professional life.  Open every conversation with an open-ended question?  “There’s 1000 ways to do it right”?  Yes yes and yes.

Saving The World From Code

https://www.theatlantic.com/technology/archive/2017/09/saving-the-world-from-code/540393/

One of the most frustrating things for me, a part-time coder, is having so much difficulty following the state of things as expressed in semi-linear Code (don’t even get me started with Functional and async). When I’m trying to piece together code fragments from multiple sources, it’s nearly impossible for me to reason about the complete working model – I end up writing out a stepwise process model, or changing variable names one at a time and iterating forever until I see which piece contributes what to the whole machine.

So this piece – and the underlying theme of “software is beyond the reasoning capacity of great humans” – resonates like hell for me.

Uncle Bob and Silver Bullets

https://www.hillelwayne.com/post/uncle-bob/

There’s only so much “blame the victim” I can stand in this world, and Uncle Bob is one of the loudest offenders. Yeah we should all get better at coding, and yeah we should hold ourselves accountable when it doesn’t measure up.

But what about the interim? How’s about standing on the shoulders of giants? Or leaning on our elders? Or centralising expertise and leaving others to be good at what they’re good at?

I’m all for not being expected to master the universe before getting on with the job of getting something out into the world to learn from it. If everyone waited until they were the best at every discipline involved in the making of things…well, you can imagine how bereft the world would be.

It’s actually a good reminder to dial back the damned voice in my (and your) head telling us we’re not good enough yet. Let’s make something useful, and find out how wrong we are in someone else’s eyes, by encountering the actual evidence of feedback.

What’s the Difference between JavaScript and ECMAScript?

When Do I Know I’m Ready for Redux?

https://medium.com/dailyjs/when-do-i-know-im-ready-for-redux-f34da253c85f

One of many think-pieces about whether and when to add Redux to a React.js app, and a helpful guide for those not already steeped in the experience of doing so.

Understanding ReactJS-Component life cycle

https://medium.com/@baphemot/understanding-reactjs-component-life-cycle-823a640b3e8d

Far too abstract for a non-expert to follow – this is a documentation piece, and not even a good one at that. Re-examine this in a year, maybe it’ll make sense by then. Experts only.

Presentational and Container Components

https://medium.com/@dan_abramov/smart-and-dumb-components-7ca2f9a7c7d0

An interesting pattern to note for later, when the app I’m working on scales to the point I find myself passing props through component layers.

Optimizing React Rendering (part 1)

https://flexport.engineering/optimizing-react-rendering-part-1-9634469dca02

Optimizing?  Bah – definitely too early for optimization in my app.  Got one page working.  Let’s leave this as a breadcrumb for future Mike.

Javascript Arrow Functions for Beginners

https://codeburst.io/javascript-arrow-functions-for-beginners-926947fc0cdc

I’ll re-read this until it sinks in.  Lambda notation mystifies me, but probably I just need to implement it a hundred times or so and my brain will settle down.

WhoDidITalkTo: working ReactJS code!

You ever take a very long time to birth something small but ultimately, personally meaningful?

Me neither, but what I’m calling stage 1 of my ReactJS app is working to my liking.

WhoDidITalkTo is a personal work of love to help me remember all the wonderful encounters I have at Meetups and other such networking events.  It’s painful for me to keep forgetting the awesome conversations I’ve had with people, and have to confess I don’t remember someone who I very clearly made an impression on.  As someone with superhuman empathy, it’s crushing to see those hurt microexpressions cross their faces when they realize I’m no better than Leonard Shelby:

A little less dirty than him, usually

So I’m trying to remedy that, by giving myself a tool I can use from my phone to capture and review salient details from each new personal encounter I have at all the events I slut around to.

It’s prototype stage, and I have no dreams of monetizing this (so many startups have tried and failed to make this kind of “personal CRM lite” work), and it’s a long ways from being fully functional in the field.  Still, I’m having fun seeing just how far I can stretch my rusty front end skills *and* treat this like a real Product Management project for myself.

If you’d like to peer inside my jumbled mind, this isn’t a bad place to see for yourself:
https://github.com/MikeTheCanuck/WhoDidITalkTo/projects/1

WhoDidITalkTo prototype v1

Occupied Neurons, September release

I’ve been scratching the itch of building an app for myself that solves a Job-to-be-done: when I’m networking, I want a tool to remind myself who are the weak ties in my network I’ve talked to, and what I’ve learned about them.  I want visual refreshers (photos I may have of them) and textual reminders of topics and things an otherwise-non-porous-memory would retain about people whose company I have previously enjoyed.

Using Firebase with ReactJS

In all the research I’m doing on prototyping a front end for my app, I’ve struggled to find something that’s more than “assemble every bespoke tag, class and id by hand” but less than “spend the next six months learning AngularJS”.  Focusing on the front-end to explore my user needs, I didn’t want to get stuck developing a big-ass (and probably unnecessary) back-end stack – even just adapting some well-defined pattern – so I started to explore Firebase [which is all front-end coding with a back-end data layer – to approximate it horribly].

And with a couple more explorations of the territory, I stumbled on the ReactJS “getting started” guide via the Hello World app, and finally understood how cool it is to have a pseudo-object-oriented approach to assembling the “V” in MVC.  (Who knows – for all I know, this is just vanilla ES6 now, and I’m just that far behind the times.)

Still, it is strikingly familiar in basic construction, and with the promise of integrating a Firebase “backend” to give me a lightweight stack that will more than adequately perform for me as a single user, I’m finally willing to wade through the React Tutorial and see if that’s enough for me to piece together a working prototype.

Props vs State in React

This is one of the more striking subtleties of React – how similar props and state are, and how it appears [at least to me] that the distinction is more a convention for others to understand how to use your React code, than anything that is required by the React compiler.

 

And on the Product Side of my mental tesseract…

I’ve also been refreshing my knowledge of the Product Management practices I haven’t had an opportunity to practice lately.  Amongst which:

How does a Product Manager perform competitive analysis?

This is the clearest-eyed explanation I’ve seen yet about “understanding your competition”.  I’ve worked with too many Product Marketing folks who get spun up about the checklist war, and making sure that we have feature parity in the product, and it’s always seemed like a lot of sound and fury, signifying nothing.

Focusing on “what problems does the competition solve for *YOU* dear customer, and why are those important to your core business?” is a whole lot more genuine *and* believable to me.  I’ve never thought of this line of questioning as “competitive analysis”, just part of doing my job to suss out what I can do to help my customers.

The Equifax breach – reckless endangerment of the US citizenry

UN-fucking-believable. I was hoping that this would turn out to be a situation where at the very least, Equifax had built defense-in-depth measures to limit the amount or type of information someone *could* get if an attacker exploited one of the innumerable vulnerabilities that exist on every modern software platform.

Nope – pretty much EVERY piece of sensitive personal data they have on more than half the US adult population was exposed as a result of this attack. Everything that any reasonable check of your identity or financial fitness would use to verify someone is you. Pretty nearly all the info a malicious individual would use to impersonate you, to obtain loans in your name, or file a tax return to get a refund, or screw with your life in many other highly-damaging ways.

Some choice quotes from https://arstechnica.com/information-technology/2017/09/why-the-equifax-breach-is-very-possibly-the-worst-leak-of-personal-info-ever/:

By providing full names, Social Security numbers, birth dates, addresses, and, in some cases, driver license numbers, it provided most of the information banks, insurance companies, and other businesses use to confirm consumers are who they claim to be.

That means well more than half of all US residents who rely the most on bank loans and credit cards are now at a significantly higher risk of fraud and will remain so for years to come.

Meanwhile, in the hours immediately following the breach disclosure, the main Equifax website was displaying debug codes, which for security reasons, is something that should never happen on any production server, especially one that is a server or two away from so much sensitive data. A mistake this serious does little to instill confidence company engineers have hardened the site against future devastating attacks [editorializing:…or even that the company’s engineers have half a clue what they can do to prevent the rest of the US’ personal data from leaking – if there’s even any left in their databases to find].

The management and executives of this company should not only resign, but be brought up on charges of criminal, reckless negligence on behalf of all Americans. They (along with the other two credit reporting agencies, and dozens of grey-market data hoarders) are stewards and power brokers over our lives, central/single points of failure in an economy that is nearly all digital, and which so fragilely transacts on such thin premises of trust and explicit, positive assertions of identity.

We should not only be scared of how terribly their negligence endangers our lives for the rest of our lives, but be rationally and irrationally angry that the lobbyists and oligarchs have set up a system where these careless morons can and will walk away with a slap on the wrists, a cost-of-doing-business fine and strictures, for foreseeably ruining millions of lives and livelihoods.

What to do

I froze my credit after one of the big health insurer breaches a while back, and so far my life hasn’t been significantly inconvenienced – but the very fact that we each are forced to opt in to this measure, and insult-to-injury forced to pay for the privilege of preventing something none of us asked for, is just downright Mafia tactics.

You should probably freeze your credit too ASAP, because even if you weren’t affected this time, inevitably you were in the past or will be in the future. This brittle negligence and lack of accountability is what the US economy runs on.

ImportError: No module named ‘rest_framework_swagger’

Summary

Building our Django app locally (i.e. no Docker container wrapping it) works great. Building the same app in Docker fails. Hint: make sure you know which requirements.txt file you’re using to build the app.  (And get familiar with the -f parameter for Docker commands.)

Problem

When I first started building the Docker container, I was getting this ImportError after the container had built successfully:

ImportError: No module named 'rest_framework_swagger'

Research

The only half-useful hit on StackOverflow was this one, and it didn’t seem like it explicitly addressed my issue in Docker:

http://stackoverflow.com/questions/27369314/django-rest-framework-swagger-ui-importerror-no-module-named-rest-framework

…And The Lightning Bolt Struck

However, with enough time and desperation I finally understood that that article wasn’t wrong either.  I wasn’t using the /requirements.txt that contained all the dependencies – I was using the incomplete/abandoned /budget_proj/requirements.txt file, which lacked a key dependency.

Aside

I wasn’t watching the results of pip install closely enough – and when running docker-compose up --build multiple times, the layer of interest won’t rebuild if there are no changes to that layer’s inputs. (Plus this is a case where there’s no error message thrown, just one or two fewer pip installs – and who notices that until they’ve spent the better part of two days on the problem?)
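One way to make this kind of silent gap loud is to check, right after pip install, that the modules the app needs can actually be found. What follows is a minimal sketch, not something the project used at the time: the function name missing_modules and the REQUIRED_MODULES list are made up for illustration, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found on the import path."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Hypothetical check list; rest_framework_swagger is the module from the error above.
REQUIRED_MODULES = ["django", "rest_framework", "rest_framework_swagger"]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are importable.")
```

Run inside the container as a build step (and made to exit non-zero when the list is not empty), a check like this would presumably have surfaced the problem at build time instead of when gunicorn tried to boot a worker.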

Detailed Diagnostics

If you look closely at our project from that time, you’ll notice there are actually two copies of requirements.txt – one at the repo root and one in the /budget_proj/ folder.
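A quick way to spot this situation in any repo is to list every requirements.txt and diff their contents. This is a rough sketch under a few assumptions: the helper names are invented for illustration, the comparison is a plain line-by-line match (no understanding of version specifiers), and the sample file contents in the usage lines are hypothetical, since the actual contents of /budget_proj/requirements.txt aren’t shown in this post.

```python
from pathlib import Path


def parse_requirements(text):
    """Return the set of requirement lines, ignoring blank lines and comments."""
    lines = (line.strip() for line in text.splitlines())
    return {line for line in lines if line and not line.startswith("#")}


def find_requirements_files(repo_root):
    """Return every requirements.txt found under repo_root, sorted by path."""
    return sorted(Path(repo_root).rglob("requirements.txt"))


def only_in_first(first_text, second_text):
    """Return requirement lines present in first_text but absent from second_text."""
    return sorted(parse_requirements(first_text) - parse_requirements(second_text))


# Hypothetical contents, for illustration only.
root_file = "Django==1.10.5\ndjango-rest-swagger==2.1.1\ndjangorestframework==3.5.4\n"
subdir_file = "Django==1.10.5\ndjangorestframework==3.5.4\n"
print(only_in_first(root_file, subdir_file))
```

Note that find_requirements_files walks the whole tree, so it will also descend into a virtualenv folder if one lives inside the repo; filter the results if that gets noisy.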

Developers who are just testing Django locally will simply launch pip install -r requirements.txt from the root directory of their clone of the repo.  This is fine and good.  This is the result of the pip install -r requirements.txt when using the expected file:

$ pip install -r requirements.txt 
Collecting appdirs==1.4.0 (from -r requirements.txt (line 1))
 Using cached appdirs-1.4.0-py2.py3-none-any.whl
Collecting Django==1.10.5 (from -r requirements.txt (line 2))
 Using cached Django-1.10.5-py2.py3-none-any.whl
Collecting django-filter==1.0.1 (from -r requirements.txt (line 3))
 Using cached django_filter-1.0.1-py2.py3-none-any.whl
Collecting django-rest-swagger==2.1.1 (from -r requirements.txt (line 4))
 Using cached django_rest_swagger-2.1.1-py2.py3-none-any.whl
Collecting djangorestframework==3.5.4 (from -r requirements.txt (line 5))
 Using cached djangorestframework-3.5.4-py2.py3-none-any.whl
Requirement already satisfied: packaging==16.8 in ./budget_venv/lib/python3.5/site-packages (from -r requirements.txt (line 6))
Collecting psycopg2==2.7 (from -r requirements.txt (line 7))
 Using cached psycopg2-2.7-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting pyparsing==2.1.10 (from -r requirements.txt (line 8))
 Using cached pyparsing-2.1.10-py2.py3-none-any.whl
Collecting requests==2.13.0 (from -r requirements.txt (line 9))
 Using cached requests-2.13.0-py2.py3-none-any.whl
Requirement already satisfied: six==1.10.0 in ./budget_venv/lib/python3.5/site-packages (from -r requirements.txt (line 10))
Collecting gunicorn (from -r requirements.txt (line 12))
 Using cached gunicorn-19.7.0-py2.py3-none-any.whl
Collecting openapi-codec>=1.2.1 (from django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Collecting coreapi>=2.1.1 (from django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Collecting simplejson (from django-rest-swagger==2.1.1->-r requirements.txt (line 4))
 Using cached simplejson-3.10.0-cp35-cp35m-macosx_10_11_x86_64.whl
Collecting uritemplate (from coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
 Using cached uritemplate-3.0.0-py2.py3-none-any.whl
Collecting coreschema (from coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Collecting itypes (from coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Collecting jinja2 (from coreschema->coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
 Using cached Jinja2-2.9.5-py2.py3-none-any.whl
Collecting MarkupSafe>=0.23 (from jinja2->coreschema->coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Installing collected packages: appdirs, Django, django-filter, uritemplate, requests, MarkupSafe, jinja2, coreschema, itypes, coreapi, openapi-codec, simplejson, djangorestframework, django-rest-swagger, psycopg2, pyparsing, gunicorn
 Found existing installation: appdirs 1.4.3
 Uninstalling appdirs-1.4.3:
 Successfully uninstalled appdirs-1.4.3
 Found existing installation: pyparsing 2.2.0
 Uninstalling pyparsing-2.2.0:
 Successfully uninstalled pyparsing-2.2.0
Successfully installed Django-1.10.5 MarkupSafe-1.0 appdirs-1.4.0 coreapi-2.3.0 coreschema-0.0.4 django-filter-1.0.1 django-rest-swagger-2.1.1 djangorestframework-3.5.4 gunicorn-19.7.0 itypes-1.1.0 jinja2-2.9.5 openapi-codec-1.3.1 psycopg2-2.7 pyparsing-2.1.10 requests-2.13.0 simplejson-3.10.0 uritemplate-3.0.0

However, our Django application (and the related Docker files) lives in a subdirectory off the repo root (i.e. in the /budget_proj/ folder). I was an idiot at the time and didn’t know about the -f parameter for docker-compose, so I was convinced I had to run docker-compose from the same directory as docker-compose.yml, which meant docker-compose didn’t have access to files in the parent directory of wherever it was launched.  Apparently Docker effectively “chroots” a build to its build context (the directory it was told to build from), so it doesn’t have access to ../bin/requirements.txt, for example.
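To make that restriction concrete, here is a small illustration of the rule as I understand it: a path is only usable by the build if, once resolved, it still sits inside the directory the build was started from. This is a purely lexical sketch and not Docker’s actual implementation; the function name inside_build_context and the /repo paths are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath


def inside_build_context(context_dir, relative_path):
    """Return True if relative_path, resolved against context_dir, stays inside it."""
    context = PurePosixPath(posixpath.normpath(context_dir))
    target = PurePosixPath(
        posixpath.normpath(posixpath.join(context_dir, relative_path))
    )
    return target == context or context in target.parents


# Building from the subdirectory: the local file is reachable, the parent's is not.
print(inside_build_context("/repo/budget_proj", "requirements.txt"))
print(inside_build_context("/repo/budget_proj", "../requirements.txt"))
# Building from the repo root: files in the subdirectory are still reachable.
print(inside_build_context("/repo", "budget_proj/requirements.txt"))
```

The practical upshot is that the fix has to change which directory the build starts from (or move the file), because no amount of ../ in a path will reach outside it.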

So when docker-compose launched pip install -r requirements.txt, it could only access the copy in /budget_proj/, which gave us this result instead:

Step 12/12 : WORKDIR /code
 ---> 8626fa515a0a
Removing intermediate container 05badf699f66
Successfully built 8626fa515a0a
Recreating budgetproj_budget-service_1
Attaching to budgetproj_budget-service_1
web_1 | Running docker-entrypoint.sh...
web_1 | [2017-03-16 00:31:34 +0000] [5] [INFO] Starting gunicorn 19.7.0
web_1 | [2017-03-16 00:31:34 +0000] [5] [INFO] Listening at: http://0.0.0.0:8000 (5)
web_1 | [2017-03-16 00:31:34 +0000] [5] [INFO] Using worker: sync
web_1 | [2017-03-16 00:31:34 +0000] [8] [INFO] Booting worker with pid: 8
web_1 | [2017-03-16 00:31:35 +0000] [8] [ERROR] Exception in worker process
web_1 | Traceback (most recent call last):
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
web_1 | worker.init_process()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 126, in init_process
web_1 | self.load_wsgi()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
web_1 | self.wsgi = self.app.wsgi()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/app/base.py", line 67, in wsgi
web_1 | self.callable = self.load()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
web_1 | return self.load_wsgiapp()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
web_1 | return util.import_app(self.app_uri)
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/util.py", line 376, in import_app
web_1 | __import__(module)
web_1 | File "/code/budget_proj/wsgi.py", line 16, in <module>
web_1 | application = get_wsgi_application()
web_1 | File "/usr/local/lib/python3.5/site-packages/django/core/wsgi.py", line 13, in get_wsgi_application
web_1 | django.setup(set_prefix=False)
web_1 | File "/usr/local/lib/python3.5/site-packages/django/__init__.py", line 27, in setup
web_1 | apps.populate(settings.INSTALLED_APPS)
web_1 | File "/usr/local/lib/python3.5/site-packages/django/apps/registry.py", line 85, in populate
web_1 | app_config = AppConfig.create(entry)
web_1 | File "/usr/local/lib/python3.5/site-packages/django/apps/config.py", line 90, in create
web_1 | module = import_module(entry)
web_1 | File "/usr/local/lib/python3.5/importlib/__init__.py", line 126, in import_module
web_1 | return _bootstrap._gcd_import(name[level:], package, level)
web_1 | ImportError: No module named 'rest_framework_swagger'
web_1 | [2017-03-16 00:31:35 +0000] [8] [INFO] Worker exiting (pid: 8)
web_1 | [2017-03-16 00:31:35 +0000] [5] [INFO] Shutting down: Master
web_1 | [2017-03-16 00:31:35 +0000] [5] [INFO] Reason: Worker failed to boot.
budgetproj_web_1 exited with code 3

Coda

It has been pointed out that not only is it redundant for the project to have two requirements.txt files (I agree, and when we find the poor soul who inadvertently added the second file, they’ll be sacked…from our volunteer project ;)…

…but also that if we’re encapsulating our project’s core application in a subdirectory (called budget_proj), then logically that is where the “legit” requirements.txt file belongs – not at the project’s root, just because that’s where you normally find requirements.txt in a repo.