AWS Tutorial wrangling, effort 2: HelloWorld via CodeDeploy

I’m taking a crash course in DevOps this winter, and our instructor assigned us a trivial task: get Hello World running in S3.

I found this tutorial (tantalizingly named “Deploy a Hello World Application with AWS CodeDeploy (Windows Server)”), figured it looked close enough (and if not, it’d keep me limber and help me narrow in on what I *do* need) so I foolishly dove right in.

TL;DR I found the tutorial damnably short on explicit clarity – lots of references to other tutorials, and plenty of incomplete or vague instructions.  It seems to have been designed by someone who’s already overly familiar with AWS and didn’t realize the kinds of ambiguities they’d left behind.

I got myself all the way to Step 3 and was faced with this error – little did I know this was just the first of many IAM mysteries to solve:

Mac4Mike:aws mike$ aws iam attach-role-policy --role-name CodeDeployServiceRole --policy-arn arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole
An error occurred (AccessDenied) when calling the AttachRolePolicy operation: User: arn:aws:iam::720781686731:user/Mike is not authorized to perform: iam:AttachRolePolicy on resource: role CodeDeployServiceRole

Back the truck up, Mike

Hold on, what steps preceded this stumble?

CodeDeploy – Getting Started

Well, first I pursued the CodeDeploy Getting Started path:

  • Provision an IAM user
    • It wasn’t clear from the referring tutorials, so I learned by trial and error to create a user with CLI permissions (Access/Secret key authN), not Console permissions
  • Assign them the specified policies (to enable CodeDeploy and CloudFormation)
  • Create the specified CodeDeployServiceRole service role
    • The actual problem arose here, where I ran the command as specified in the guide

I tried this with different combinations of user context (Mike, who has all access to EC2, and Mike-CodeDeploy-cli, who has all the policies assigned in Step 2 of Getting Started) AND the --policy-arn parameter (both the Getting Started string and the one dumped out by the aws iam create-role command (arn:aws:iam::720781686731:role/CodeDeployServiceRole)).

And literally, searching on this error and variants of it, there appear to be no other people who’ve ever written about encountering this.  THAT’s a new one on me.  I’m not usually a trailblazer (even of the “how did he fuck this up *that* badly?” kind of trailblazing…)

OK, so then forget it – if the CLI + tutorial can’t be conquered, let’s try the Console-based tutorial steps.   [Note: in both places, they state it’s important that you “Make sure you are signed in to the AWS Management Console with the same account information you used in Getting Started.”  Why?  And what “account information” do they mean – the user with which you’re logged into the web console, or the user credentials you provisioned?]

I was able to edit the just-created CodeDeployServiceRole and confirm all the configurations they specified *except* where they state (in step 4), “On the Select Role Type page, with AWS Service Roles selected, next to AWS CodeDeploy, choose Select.”  Not sure what that means (which should’ve pulled me in the direction of “delete and recreate this role”), but I tried it out as-is anyway.  [The only change I had to make so far was to attach AWSCodeDeployRole.]

Reading up on the AttachRolePolicy action, it appears that --policy-arn refers to the permissions you wish to attach to the targeted role, and --role-name refers to the role getting additional permissions.  That would mean I’m definitely meant to attach the “arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole” policy to CodeDeployServiceRole.  (Still doesn’t explain why I lack the AttachRolePolicy permission in either of the IAM Users I’ve defined, nor how to add that permission.)

Instead, with no help from any online docs or discussions, I discovered that it’s possible to assign the individual permission by starting with this interface:

  • Click Create Policy
  • Select Policy Generator
  • AWS Service: Select AWS Identity and Access Management
  • Actions: Attach Role Policy
  • ARN: I tried constructing two policies with (arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole) and (arn:aws:iam::720781686731:role/CodeDeployServiceRole)
    • The first returned “AccessDenied” and with the second, the command I’m fighting with returned “InvalidInput”
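
For reference, here is a minimal sketch of the kind of policy document the Policy Generator steps above produce, assuming the permission is meant to cover the role being modified (that is what the AccessDenied message names as the resource).  The function name is made up for illustration, and the "2012-10-17" version string is an assumption about what the generator emits.

```python
import json


def attach_role_policy_permission(role_arn):
    """Build a policy document that allows iam:AttachRolePolicy on one role.

    The Resource is the role that receives the attachment, not the
    managed policy being attached.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:AttachRolePolicy",
                "Resource": role_arn,
            }
        ],
    }


policy = attach_role_policy_permission(
    "arn:aws:iam::720781686731:role/CodeDeployServiceRole"
)
print(json.dumps(policy, indent=2))
```

The printed JSON is what would be pasted into (or generated by) the console before attaching the policy to a user.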

Then I went to the Users console:

  • select the user of interest (Mike-CodeDeploy-cli in my journey)
  • click Add Permissions
  • select “Attach existing policies directly”
  • select the new Policy I just created (where type = Customer managed, so it’s relatively easy to spot among all the other “AWS managed” policies)

As I mentioned, the second construction returned this error to the command:

An error occurred (InvalidInput) when calling the AttachRolePolicy operation: ARN arn:aws:iam::720781686731:role/CodeDeployServiceRole is not valid.

Nope, wait, dammit – the ARN is the *policy*, not the *object* to which the policy grants permission…
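
A quick way to keep the two kinds of ARN straight is to split them and look at the resource part.  This is only a sketch built on the general ARN layout (arn:partition:service:region:account:resource); the helper names are mine and not part of any AWS tool.

```python
def split_arn(arn):
    """Split an ARN into its six colon-separated fields."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError("not an ARN: %r" % arn)
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts))


def iam_resource_kind(arn):
    """Return the IAM resource type ('policy', 'role', 'user', ...) of an ARN."""
    fields = split_arn(arn)
    if fields["service"] != "iam":
        raise ValueError("not an IAM ARN: %r" % arn)
    return fields["resource"].split("/", 1)[0]


for candidate in (
    "arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole",
    "arn:aws:iam::720781686731:role/CodeDeployServiceRole",
):
    print(iam_resource_kind(candidate), "<-", candidate)
```

Only an ARN whose kind is "policy" belongs after --policy-arn; the role is identified by name via --role-name.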

Here’s where I started ranting to myself…

Tried it a couple more times, still getting AccessDenied, so screw it.  [At this point I conclude AWS IAM is an immature dog’s breakfast – even with an explicit map you still end up turned in knots.]

So I just went to the Role CodeDeployServiceRole and attached both policies (I’m pretty sure I only need to attach the policy AWSCodeDeployRole but I’m adding the custom AttachRolePolicy-CodeDeployRole because f it I just need to get through this trivial exercise).

[Would it kill the folks at AWS to draw a friggin picture of how all these capabilities with their overlapping terminology are related?  Cause I don’t know about you, but I am at the end of my rope trying to keep these friggin things straight.  Instead, they have a superfluous set of fragmented documentation and tutorials, which it’s clear they’ve never usability tested end-to-end, and for which they assume way too much existing knowledge & context.]

I completed the rest of the InstanceProfile creation steps (though I had to create a second one near the end, because the console complained I was trying to create one that already existed).

CodeDeploy – create a Windows instance the “easy” way

Then of course we’re on to the fun of creating a Windows Instance in AWS.  Brave as I am, I tried it with CloudFormation.

I grabbed the CLI command and substituted the following two Parameter values in the command:

  • --template-url:

    ttp:// (for the us-west-2 region I am closest to)

  • ParameterKey=KeyPairName: MBP-2009 (for the .pem file I created a while back for use in SSH-managing all my AWS operations)

The first time I ran the command it complained:

You must specify a region. You can also configure your region by running "aws configure".

So I re-ran aws configure and filled in “us-west-2” when it prompted for “Default region name”.

Second time around, it spat out:

    "StackId": "arn:aws:cloudformation:us-west-2:720781686731:stack/CodeDeployDemoStack/d1c817d0-d93e-11e6-8ee1-503f20f2ade6"

They tell us not to proceed until this command reports “CREATE_COMPLETE”, but wow does it take a while to stop reporting “None”:

aws cloudformation describe-stacks --stack-name CodeDeployDemoStack --query "Stacks[0].StackStatus" --output text

When I went looking at the cloudformation console (blame it on lack of patience), it reported my instance(s)’ status was “ROLLBACK_COMPLETE”.  Now, I’m no AWS expert, but that doesn’t sound like a successful install to me.  I headed to the details, and of course something else went horribly wrong:

  • CREATE_FAILED – AWS::EC2::Instance – API: ec2:RunInstances Not authorized for images: [ami-7f634e4f]
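
The statuses above fall into a few buckets, which makes it easier to decide whether to keep waiting or go read the stack events.  A rough sketch, assuming a stack that is being created for the first time; the function name and the bucket labels are invented for illustration, and the status strings are the ones CloudFormation reports (success is spelled CREATE_COMPLETE).

```python
def classify_stack_status(status):
    """Bucket a CloudFormation stack status into wait / done / failed / unknown."""
    if not status or status == "None":
        # Nothing usable came back from the query.
        return "unknown"
    if status.endswith("_IN_PROGRESS"):
        return "wait"
    if status == "CREATE_COMPLETE":
        return "done"
    # CREATE_FAILED, ROLLBACK_COMPLETE, ROLLBACK_FAILED and friends.
    return "failed"


for status in ("CREATE_IN_PROGRESS", "CREATE_COMPLETE", "ROLLBACK_COMPLETE", "None"):
    print(status, "->", classify_stack_status(status))
```

Anything in the "failed" bucket means the stack events hold the real error, as with the RunInstances message above.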

CodeDeploy – create a Windows instance the “hard” way

So let’s forget the “easy” path of CloudFormation.  Try the old-fashioned way of creating a Windows instance, and see if I can make it through this:

  • Deciding among Windows server AMIs is a real blast – over 600 of them!
  • I narrowed it down to the “Windows_Server-2016-English-Full-Base-2016.12.24”
    • Nano is only available to Windows Assurance customers
    • Enterprise is way more than I’d need to serve a web page
    • Full gives you the Windows GUI to manage the server, whereas Core only includes the PowerShell (and may not even allow RDP access)
    • I wanted to see what the Manage Server GUI looks like these days, otherwise I probably would’ve tried Core
    • Note: there were three AMIs all prefixed “Windows_Server-2016-English-Full-Base”; I just chose the one with the latest date suffix (assuming it’s slightly more up-to-date with patches)
  • I used the EC2 console to get the Windows password, then installed the Microsoft Remote Desktop client for Mac to enable me to interactively log in to the instance

Next is configuring the S3 bucket:

  • There is some awfully confusing and incomplete documentation here
  • There are apparently two policies to be configured, with helpful sample policies, but it’s unclear where to go to attach them, or what steps to take to make sure this occurs
  • It’s like the author has already done this a hundred times and knows all the steps by heart, but has forgotten that as part of a tutorial, the intended audience are people like me who have little or no familiarity with the byzantine interfaces of AWS to figure out where to attach these policies [or any of the other hundred steps I’ve been through over the last few weeks]
  • I *think* I found where to attach the first policy (giving permission to the Amazon S3 Bucket) – I attached this policy template (substituting both the AWS account ID [111122223333] and bucket name [codedeploydemobucket] for the ones I’m using):
    { "Statement": [ { "Action": ["s3:PutObject"], "Effect": "Allow", "Resource": "arn:aws:s3:::codedeploydemobucket/*", "Principal": { "AWS": [ "111122223333" ] } } ] }
  • I also decided to attach the second recommended policy to the same bucket as another bucket policy:
    { "Statement": [ { "Action": ["s3:Get*", "s3:List*"], "Effect": "Allow", "Resource": "arn:aws:s3:::codedeploydemobucket/*", "Principal": { "AWS": [ "arn:aws:iam::80398EXAMPLE:role/CodeDeployDemo" ] } } ] }
  • Where did I finally attach them?  I went to the S3 console, clicked on the bucket I’m going to use (called “hacku-devops-testing”), selected the Properties button, expanded the Permissions section, and clicked the Add bucket policy button the first time.  The second time, since it would only allow me to edit the bucket policy, I tried Add more permissions – but that didn’t work, so I tried editing the damned bucket policy by hand and appending the second policy as another item in the Statement list.  After a couple of tries, I found a combination that the AWS bucket policy editor would accept, so I’m praying this is the intended combination that will allow this seductive tutorial to complete:
    {
      "Version": "2008-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::720781686731:root"
          },
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::hacku-devops-testing/*"
        },
        {
          "Action": [ … ],
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::hacku-devops-testing/*",
          "Principal": {
            "AWS": [ … ]
          }
        }
      ]
    }

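Since a bucket holds a single policy document, combining the two sample policies comes down to concatenating their Statement entries.  A minimal sketch of that merge, using the tutorial’s placeholder values quoted above rather than real account details; the function name and the default version string are assumptions.

```python
import json


def merge_bucket_policies(*policies, version="2012-10-17"):
    """Combine the Statement entries of several policies into one document."""
    statements = []
    for policy in policies:
        entries = policy["Statement"]
        if isinstance(entries, dict):
            # A lone statement may be given as an object instead of a list.
            entries = [entries]
        statements.extend(entries)
    return {"Version": version, "Statement": statements}


put_policy = {"Statement": [{
    "Action": ["s3:PutObject"], "Effect": "Allow",
    "Resource": "arn:aws:s3:::codedeploydemobucket/*",
    "Principal": {"AWS": ["111122223333"]}}]}
get_policy = {"Statement": [{
    "Action": ["s3:Get*", "s3:List*"], "Effect": "Allow",
    "Resource": "arn:aws:s3:::codedeploydemobucket/*",
    "Principal": {"AWS": ["arn:aws:iam::80398EXAMPLE:role/CodeDeployDemo"]}}]}

merged = merge_bucket_policies(put_policy, get_policy)
print(json.dumps(merged, indent=2))
```

Generating the document this way at least guarantees balanced braces before it goes anywhere near the bucket policy editor.
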
CodeDeploy – actually deploying code

I followed the remaining commands (closely – gotta watch every parameter and fill in the correct details, ugh).  But thankfully this was the trivial part.  [I guess they got the name “CodeDeploy” right – it’s far more attractive than “CodeDeployOnceYouFoundTheLostArkOfTheCovenantToDecipherIAMIntricacies”.]


Success!  Browsing to the public DNS of the EC2 instance showed me the Hello World page I’ve been trying to muster for the past three days!


This tutorial works as a demonstration of how to marshall a number of contributing parts of the AWS stack: CodeDeploy (whose “ease of deployment” benefits I can’t yet appreciate, considering how laborious and incomplete/error-prone this tutorial was), IAM (users, roles, groups, policies), S3, EC2 and AMI.

However, as a gentle introduction to a quick way to get some static HTML on an AWS endpoint, this is a terrible failure.  I attacked this over a span of three days.  Most of my challenges were in deciphering the mysteries of IAM across the various layers of AWS.

In a previous life I was a security infrastructure consultant, employed by Microsoft to help decipher and troubleshoot complex interoperable security infrastructures.  I prided myself on being able to take this down to the lowest levels and figure out *exactly* what’s going wrong.  And while I was able to find *a* working pathway through this maze, my experience here and my previous expertise tell me that AWS has a long way to go to make it easy for AWS customers to marshall all their resources for secure-by-default application deployments.  [Hell, I didn’t even try to enhance the default policies I encountered to limit the scope of what remote endpoints or roles would have access to the HTTP and SSH endpoints on my EC2 instance.  Maybe that’s a lesson for next time?]


Simplifying Vagrant-based testing: unsolved (I’m just calling it out to the universe)

I’m doing some pretty mind-numbing testing using Vagrant (yes, on Windows 10 – I like the challenge, apparently!), to make sure that I’m getting the results from changes I’m making to Ansible scripts.  Currently I’m testing the implementation of Ansible Vault, which means at each step of testing I:

  1. Vagrant destroy whatever box I just worked on
    • Which half the time means Vagrant and Virtualbox get out of sync, and I need to delete files and just vagrant init
  2. Vagrant up
    • If I just init’d a new box, then I have to go into the Vagrantfile to uncomment then edit the config.vm.synced_folder setting, so that it removes the rsync dependency (setting it to config.vm.synced_folder ".", "/vagrant", disabled: true) – otherwise, vagrant up halts when it can’t find an rsync executable
  3. Mount the VM in Virtualbox Manager (Machine, Add…, find the .vbox file), then launch the VM from VBox Mgr, log in as vagrant, and edit the /etc/ssh/sshd_config file to set all instances of PasswordAuthentication to “yes”
  4. Reboot the VM
  5. Vagrant up
  6. Run ssh-keygen -f "/home/mike/.ssh/known_hosts" -R []:2222 to clear out the previously-trusted host SSH key
  7. Run ssh-copy-id vagrant@ -p 2222 to add my user’s SSH public key to the remote system (to enable Ansible to run over SSH)

I haven’t had time yet to research how to troubleshoot or automate each of these steps, but I’ll eventually have to conquer them so that I’m not re-learning the manual steps every time I return to volunteering a little spare time to this infrastructure project.
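
As a starting point for automating step 3, here is a sketch of the sshd_config edit as a pure text transformation.  It assumes the directive appears either active or commented out with a plain yes/no value; the function name is hypothetical, and actually getting the file out of and back into the VM is left aside.

```python
import re

# Matches "PasswordAuthentication yes|no", optionally commented out.
# sshd_config keywords are case-insensitive.
_PASSWORD_AUTH = re.compile(
    r"^\s*#?\s*PasswordAuthentication\s+(yes|no)\s*$", re.IGNORECASE
)


def enable_password_auth(config_text):
    """Return config_text with every PasswordAuthentication line set to yes."""
    lines = []
    for line in config_text.splitlines():
        if _PASSWORD_AUTH.match(line):
            lines.append("PasswordAuthentication yes")
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"


sample = "Port 22\n#PasswordAuthentication yes\nPasswordAuthentication no\n"
print(enable_password_auth(sample))
```

sshd still needs a restart (or the VM a reboot, as in step 4) before the change takes effect.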

Why doesn’t chmod under Bash on Ubuntu on Windows 10 actually “take”?

I’m continuing to beat my head against a wall, attempting to test a very simple configuration change to an Ansible playbook I wrote, so that I can verify if my understanding of the use of Ansible vault is correct.

The latest problem?  Unix permissions.

Now that I’ve got SSH communications working from my Bash shell (Ubuntu on Windows 10, aka WSL), I’ve implemented changes to the playbook’s files, including creating a .vault_pass.txt file under the Bash shell and encrypting a vault.yml file using the password contained in the .vault_pass.txt.

When I run ansible-playbook role.yml --vault-password-file .vault_pass.txt, it complains of the following:

mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ansible-playbook role.yml --vault-password-file .vault_pass.txt
ERROR! Problem running vault password script /mnt/c/Users/Mike/code/Copy-ansible-role-unattended-upgrades/.vault_pass.txt ([Errno 8] Exec format error). If this is not a script, remove the executable bit from the file.

No problem, I’ve got this.  Just gotta run chmod 600 (or similar, to remove the execute bit for my user) on the .vault_pass.txt file.  [For comparison, I just tried this on the same configuration under Ubuntu – which is having a different blocking issue at present, but not related to file permissions – and the command took immediately.]  Hah, you should be so lucky:

mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ls -la .vault_pass.txt
-rwxrwxrwx 1 root root 25 Sep 26 18:38 .vault_pass.txt
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ chmod 600 .vault_pass.txt
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ls -la .vault_pass.txt
-rwxrwxrwx 1 root root 25 Sep 26 18:38 .vault_pass.txt
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ whoami
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ sudo chmod 600 .vault_pass.txt
[sudo] password for mike:
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ls -la .vault_pass.txt
-rwxrwxrwx 1 root root 25 Sep 26 18:38 .vault_pass.txt

Yes, I get that the file is owned by root, and I’m running as mike – so why doesn’t it make a difference when I run sudo chmod?  Is this a problem with files owned by root?  Is this a problem with chmod?  Is this a problem with WSL/Bash?

Lightbulb moment

I went hunting for such issues in the Microsoft repo for the Bash On Windows project, and found this issue & comment:

So I figured I’d re-examine the situation.  All my files under the ~/code folder are owned by root – even . and .., which is odd…

mike@MIKE-WIN10-SSD:~/code$ ls -la
total 68
drwxrwxrwx 2 root root 0 Sep 26 10:51 .
drwxrwxrwx 2 root root 0 Aug 16 17:00 ..
drwxrwxrwx 2 root root 0 Aug 16 16:28 ansible-role-unattended-upgrades

Then I looked at my home folder and – d’oh!

mike@MIKE-WIN10-SSD:~$ ls -la
total 24
drwxr-xr-x 2 mike mike 0 Sep 26 18:37 .
drwxr-xr-x 2 root root 0 Dec 31 1969 ..
-rw------- 1 mike mike 2452 Aug 16 22:48 .bash_history
-rw-r--r-- 1 mike mike 220 Aug 5 10:06 .bash_logout
-rw-r--r-- 1 mike mike 3637 Aug 5 10:06 .bashrc
lrwxrwxrwx 1 mike mike 22 Aug 16 12:58 code -> /mnt/c/Users/Mike/code


Now I remember: when I first sat down with this Bash On Ubuntu on Windows setup, I figured I’d save myself some trouble by using the exact same files in all my local repos – why bother duplicating the repos between Windows and Bash on Ubuntu?  So I symlinked a mount of the /code folder from my Windows user profile…and left myself a nice little landmine, it seems.

Rather than struggle with cacls.exe and try to find some magic combination that results in non-executable permissions on that file through the WSL translation layer (if at all), I just cloned the repo to a different folder (local to the Bash/Ubuntu/Win10 environment) and retried, with trivial success.
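
The lesson generalizes: a chmod that exits cleanly hasn’t necessarily changed anything, so it’s worth reading the mode back.  A small sketch of that check, plus a test for the execute bit that the Ansible error message complains about; the function names are mine, and the second helper only approximates whatever Ansible does internally.

```python
import os
import stat
import tempfile


def chmod_and_verify(path, mode):
    """Apply mode to path, then report whether the filesystem honoured it."""
    os.chmod(path, mode)
    actual = stat.S_IMODE(os.stat(path).st_mode)
    return actual == mode, actual


def has_execute_bit(path):
    """True if any of the user/group/other execute bits is set on path."""
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return bool(os.stat(path).st_mode & any_exec)


# Demo on a throwaway file on the local filesystem.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
took, actual = chmod_and_verify(demo_path, 0o600)
print("chmod took:", took, "mode is now", oct(actual))
os.remove(demo_path)
```

Run against the file under the symlinked Windows folder, the first value would presumably come back False, which is the cue to move the repo rather than fight the permissions.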

Troubleshooting another SSH blocker (networking?) in debian/jessie64

Since I ran into another wall with trying to use Ansible Vault under Bash on Ubuntu on Windows 10 (this time, chmod wouldn’t change the permissions on the .vault_pass.txt file from 777 to 600 – or any other permissions set for that matter), I went back to my Linux-based setup to try out the Ansible Vault solution I’d devised.

Here, I ended up unable to communicate with the VM using Ansible because SSH from Ubuntu to the Debian8 box had an incompatibility – to wit, when I ran ssh vagrant@ -p 2222, the command eventually timed out with the error “ssh_exchange_identification: read: Connection reset by peer”.

This is yet another piece of evidence that someone very recently (I believe between the 8.5.2 and the 8.6.0 versions of the box on Atlas) made breaking changes to the OpenSSH and/or OpenSSL configuration of the box.  One change I’ve figured out is they disabled PasswordAuthentication in the /etc/ssh/sshd_config file.

This problem?  Looks like (based on my read of articles like this one) the ssh client and server can’t agree on some cryptographic parameter.  Fun.  Cause there’s only about a million combinations of these parameters to play with.

[I also pursued ideas like the solution to this report, but currently the Debian8 box’s /etc/hosts.deny is still empty of uncommented entries.  Or the “is sshd running” idea from this report, but /var/log/auth.log definitely includes “[date] jessie sshd[366]: Server listening on port 22”.]

OK, so what’s the fastest way to isolate the set of parameters  that are being offered and demanded between the client and server?

Running the ssh client with the -vvv parameter doesn’t help much – it enumerates the “key_load_public” attempts (rsa, rsa-cert, dsa, dsa-cert, ecdsa, ecdsa-cert, ed25519, ed25519-cert), then “Enabling compatibility mode for protocol 2.0” and the SSH version “Local version string SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.1”, then fires off the “connection reset by peer” error again.  dpkg -l reports that openssh-client is “1:7.2p2-4ubuntu2.1”.

What’s the server’s version of OpenSSH?  According to dpkg -l, it’s “1:6.7p1-5+deb8u3”.  Is that right – 1.6.7?  And if so, how do I find out if there’s a cryptographic configuration incompatibility between 1.7.2 and 1.6.7?  [Certainly I can see that we have no such “connection reset by peer” issue between my Win10 Bash on Ubuntu shell, running 1.6.6p1 of openssh-client and the Debian8 box’s 1.6.7p1, so cryptographic compatibility between 1.6.6 and 1.6.7 is a reasonable assumption.]  Or better, is it possible to upgrade the Debian8 box’s openssh-server to something later than 1.6.7 – preferably (but not exclusively) 1.7.2?
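
On the “is that right?” question: Debian package versions follow the layout [epoch:]upstream[-revision], so the leading “1:” is a packaging epoch rather than a major version number, and the OpenSSH release in that string is 6.7p1.  A minimal sketch of pulling the pieces apart (the function name is made up; this is not what dpkg itself runs):

```python
def parse_debian_version(version):
    """Split a Debian package version into epoch, upstream and revision."""
    epoch, colon, rest = version.partition(":")
    if not colon:
        # No epoch present, which Debian treats as epoch 0.
        epoch, rest = "0", version
    upstream, dash, revision = rest.rpartition("-")
    if not dash:
        # No hyphen means there is no Debian revision.
        upstream, revision = rest, ""
    return {"epoch": int(epoch), "upstream": upstream, "revision": revision}


print(parse_debian_version("1:6.7p1-5+deb8u3"))
```

Comparing the upstream fields is what matters when reasoning about which OpenSSH releases are talking to each other.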

On the server, I can crawl through the /etc/ssh/sshd_config file to look for configured parameters (RSAAuthentication yes for example), but that doesn’t tell me what the OpenSSH defaults are, and doesn’t tell me what’s necessarily being asked of OpenSSL either (which might be swallowing the actual error).

Aside/Weirdness: networking

I started to pursue the idea of upgrading OpenSSH, so I ran sudo apt-get update to prepare for updating everything in the VM.  That’s when I noticed I wasn’t getting any network connectivity, as it spat back “Could not resolve ‘'” and “Could not resolve ‘'”.

Vbox Mgr indicates I’m using NAT networking (the default), which has worked for me in the past – and works fine for the same Vagrant box running on my Win10 VirtualBox/Vagrant instance (sudo apt-get update “Fetched 529 kB in 3s (142kB/s)”).  Further, the Ubuntu host for this VM has no problem reaching the network.

So I tried changing to Bridged Adapter in Vbox Manager.  Nope, no difference.  Why does the same Vagrant box work fine under Windows but not under Ubuntu?  Am I cursed?

Back to the root problem

Let me review: I’m having a problem getting Ansible to communicate with the VM over SSH.  So let’s get creative:

  • Can Ansible be coerced into talking to the target without SSH?
  • Can Ansible use password authentication instead of public key authentication for SSH?
  • Can the Ubuntu client be downgraded from 1.7.2 to 1.6.7 openssh-client?

Lightbulb moment

Of course!  The “connection reset by peer” issue isn’t a matter of deep crypto at all – unless I’m misreading this, the fact that the Ubuntu SSH client takes nearly a minute to return the “connection reset” error and the fact that the Debian VM doesn’t seem to have any IP networking ability off the host…adds up to SSH client not even connecting to the VM’s sshd?

Boy do I feel dumb.  This has nothing to do with crypto – it’s a simple layer 3 issue.

Reminds me of a lesson I learned 20 years ago, and seem to re-learn every year or three: “When you hear hooves, think horses not zebras.”
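
One cheap “horses first” check is to see whether anything on the other end ever introduces itself: an SSH server sends an identification line starting with “SSH-” as soon as a client connects, before any key exchange happens.  Here is a sketch of that probe; the function name is invented, the host you pass in is up to you, and it only reads the first chunk of data, which is an assumption that holds for a typical OpenSSH server.

```python
import socket


def read_ssh_banner(host, port, timeout=5.0):
    """Return the server's SSH identification line, or None if none arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(255)
    except OSError:
        # Refused, reset, unreachable or timed out: no sshd answered.
        return None
    banner = data.decode("ascii", errors="replace").strip()
    return banner if banner.startswith("SSH-") else None


# Example call (port 2222 is the forwarded port used throughout this post):
#   print(read_ssh_banner("<host>", 2222))
```

A None here says the conversation never reached sshd, so there’s no point comparing ciphers yet.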

Then how do we establish where the problem is – Virtualbox, Ubuntu, Debian or something else?

  • If it’s a problem in the Debian VM, then download a different Vagrant box
  • If it’s a problem in the Virtualbox setting, keep trying different network settings until one breaks through
  • If it’s a problem in the Ubuntu host, look for reasons why there’d be a block between host and VM (in either direction)

What other evidence do we have?  Well, when I run vagrant up from the Ubuntu host, it gets to “default: SSH auth method: private key” then eventually reports “Timed out while waiting for the machine to boot.  This means that Vagrant was unable to communicate with the guest machine within the configured (“config.vm.boot_timeout” value) time period.”  Makes me more suspicious of the VM.

Searching the Vagrant boxes registry, mosaicpro/html looks like it’s desktop (not locked-down server) oriented, so I tried that one.  Watched it boot, then report “default: Warning: Remote connection disconnect. Retrying…” over and over for a few minutes.  The console via Vbox Mgr looked like the Ubuntu VM was trying to configure networking (even though DHCP had offered it an address of – which must’ve been the NAT adapter, since my home network runs on 192.168.1/24).  But oddly, networking from within the client was working fine after that – ping out to my home router ( returned fine.  OK, then I’m *definitely* suspecting that Debian/jessie64 (8.6.0) box.

Vagrant/Debian downgrade anyone?

So, after all this, can I download a previous version of the Debian/jessie64 box (e.g. 8.5.2, not this troublesome 8.6.0)?  Let’s try it, using this article as basis.

(I went one step further and ran the initial command as vagrant add debian-8.5.2 – and amazingly, this variation seemed to work!)

And here are some promising results:

  • lsb_release -a reports the 8.5.2 box as “Debian GNU/Linux 8.5 (jessie)”, vs the 8.6.0 box as “Debian GNU/Linux 8.6 (jessie)”
  • A quick look at the /etc/ssh/sshd_config from the 8.5.2 box shows there is *no* insertion of the PasswordAuthentication configuration parameter (let alone setting it to “no” like in the 8.6.0 box)
  • Network connectivity from the 8.5.2 box to my home router is awesome (vs the 8.6.0 box that can’t seem to ping out of a wet paper bag)

Final Lesson

If you’re a Vagrant + Virtualbox user, stay FAR away from the 8.6.0 version of the debian/jessie64 box (unless you’re prepared to fight with these same issues I have, and probably other ‘security lockdown’ ideas that I haven’t even uncovered yet, but are almost surely there).

The Yahoo Hack: Protect Yourself, PLEASE


If you have a Yahoo account (you probably do, by these numbers), first go change the identical password on other sites (you probably re-used the password between Yahoo and some other sites)…

AND be prepared to change the answers to (and maybe even questions of, if you often use the same ones) your security questions [the ones used to help you – OR A HACKER – reset a forgotten password] on any sites with answers in common.  Please take this seriously: these responses that you’ve typed in, if accurate and used on many sites, are a great way for someone who gets your password on one site to then dig into those answers and reset your password (even one you never used elsewhere) on another site.

Focus first on your primary email address (because that’s often the most valuable – since it’s where all password resets get sent, right?), and then on your financial accounts (even those with two-factor authentication – let’s not let them drain our savings just because we were a bit lazy).

Then consider whether any of your other online accounts have real value to you if you permanently or even temporarily lost control of them. e.g. Twitter/Instagram/Tumblr/Wordpress, if you have a public presence that has helped build your reputation.

Then go get yourself a password manager (see some reviews here and here). I adopted 1Password three years ago (mostly because I prefer good UX over infinite configurability), and now I don’t care how ridiculous my random passwords are, and I intentionally provide random/hilarious (at least to me) misinformation in my security questions (because I just write these misinfos down in my password manager in the Notes field for each site).

Then reset the rest of your passwords on sites where you used the same one as your Yahoo account(s).

Sorry this was so long. But a breach like this hits lots of people and opens them up to a LOT of malicious activity across much of their digital life.  You may not be that attractive a target, but I bet your financial accounts are.

Occupied Neurons, late September edition

Modern Agile (Agile 2016 keynote)

This call out for advancement of Agile beyond 2001 and beyond the fossilization of process and “scale” is refreshing. It resonates with me in ways few other discussions of “is there Agile beyond SCRUM?” have inspired – because it provides an answer upon which we can stand up actual debate, refinement and objective experiments.

While I’m sure there are those who would wish to quibble about perfecting these new principles before committing to their underlying momentum, I for one am happy to accept this as an evolutionary stage beyond the Agile Manifesto and use it to further my teams’ and my own evolution.

Forget Technical Debt – Here’s How to Build Technical Wealth

I had the pleasure of meeting and talking with (mostly listening and learning intently on my part) Andrea Goulet at the .NET Fringe 2016 conference. Andrea is a refreshing leader in software development because she leads not only through craftsmanship but also through communication as a key tenet of success with her customers.

Andrea advances the term “software remodelling” to properly focus the work that deals with Technical Debt. Rather than approach the TD as a failing, looking at it “as a natural outgrowth of occupying and using the software” draws heavily and well on the analogy of remodelling your/a home.

Frequent Password Changes Are The Enemy of Security

After a decade or more of participating in the constant ground battle of information security, it became clear to me that the threat models and state of the art in information warfare have changed drastically; the defenses have been slow to catch up.

One of the vestigial tails of 20th-century information security is the dogmatically-prescribed “scheduled password change”.

The idea back then was that we had so few ways of knowing whether someone was exploiting an active, privileged user account, and we only had single-factor (password) authentication as a means of protecting that digital privilege on a system, that it seemed reasonable to force everyone to change passwords on a frequent, scheduled basis. So that, if an attacker somehow found your password (such as on a sticky note by your keyboard), *eventually* they would lose such access because they wouldn’t know your new password.

So many problems with this – for example:

  • Password increments – so many of us with multiple frequently-rotating passwords just tack on an incrementing number to the end of the last password when forced to change – not terribly secure, but the only tolerable defense when forced to deal with this unnecessary burden
  • APTs and password databases – most password theft these days doesn’t come from random guessing, it comes from hackers either getting access to the entire database at the server, or persistent malware on your computer/phone/tablet or public devices like wifi hardware that MITM’s your password as you send it to the server
  • Malware re-infections – changing your password is only good if it isn’t as easy to steal it *after* the change as it was *before* the change – not a lot of point in changing passwords when you can get attacked just as easily (and attackers are always coming up with new zero-days to get you)

I was one of the evil dudes who reflexively recommended this measure to every organization everywhere. I apologize for perpetuating this mythology.

Hashicorp Vault + Ansible + CD: open source infra, option 2

“How can we publish our server configuration scripts as open source code without exposing our secrets to the world?”

In my first take on this problem, I fell down the rabbit hole of Ansible’s Vault technology – a single-password-driven encryption implementation that encrypts whole files and demands they be decrypted by interactive input or static filesystem input at runtime. Not a bad first try, but feels a little brittle (to changes in the devops team, to accidental inclusion in your git commits, or to division-of-labour concerns).

There’s another technology actively being developed for the devops world, by the Hashicorp project, also (confusingly/inevitably) called Vault. [I’ll call it HVault from here on, to distinguish from Ansible Vault >> AVault.]

HVault is a technology that (at least from a cursory review of the intro) promises to solve the brittle problems above. It’s an API-driven lockbox and runtime-proxy for all manner of secrets, making it possible to store and retrieve static secrets, provision secrets to some roles/users and not others, and create limited-time-use credentials for applications that have been integrated with HVault.

Implementation Options

So for our team’s purposes, we only need to worry about static secrets so far. There are two possible ways I can see us trying to integrate this:

  1. retrieve the secrets (SSH passphrases, SSL private keys, passwords) directly and one-by-one from HVault, or
  2. retrieve just an AVault password that then unlocks all the other secrets embedded in our Ansible YAML files (using reinteractive’s pseudo-leaf indirection scheme).

(1) has the advantage of requiring one fewer technology, which is a tempting decision factor – but it comes at the expense of creating a dependency/entanglement between HVault and our Ansible code (in naming and managing the key-value pairs for each secret) and of having to find/use a runtime solution for injecting each secret into the appropriate file(s).

(2) simplifies the problem of injecting secrets at runtime to a single secret (i.e. AVault can accept a script to insert the AVault password) and enables us to use a known quantity (AVault) for managing secrets in the Ansible YAMLs, but also means that (a) those editing the “secret-storing YAMLs” will still have to have access to a copy of the AVault password, (b) we face the future burden to plan for breaking changes introduced by both AVault and HVault, and (c) all secrets will be dumped to disk in plaintext on our continuous deployment (CD) server.
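To make option (2) a bit more concrete, here’s a minimal sketch of the kind of script AVault could call to get its password (ansible-playbook’s --vault-password-file will run an executable file and read the password from its stdout). The secret path, the field name and the version-1 key/value response layout are my assumptions, not something we’ve settled on; VAULT_ADDR and VAULT_TOKEN are the same variables the vault CLI reads.

```python
"""Print the AVault password fetched from HVault (sketch, KV v1 layout assumed)."""
import json
import os
import urllib.request

# Hypothetical secret path and field name; use whatever the naming scheme ends up being.
SECRET_PATH = "secret/ansible/vault_password"
SECRET_FIELD = "value"


def build_request(addr, token, path):
    """Build a GET request against HVault's HTTP API for one secret."""
    url = "{}/v1/{}".format(addr.rstrip("/"), path.lstrip("/"))
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def extract_field(body, field):
    """Pull one field out of a KV v1 style response: {"data": {field: ...}}."""
    return json.loads(body)["data"][field]


def main():
    """Fetch the secret and print it, since Ansible reads the password from stdout."""
    request = build_request(
        os.environ["VAULT_ADDR"], os.environ["VAULT_TOKEN"], SECRET_PATH
    )
    with urllib.request.urlopen(request) as response:
        print(extract_field(response.read().decode("utf-8"), SECRET_FIELD))
```

To use it, add a final main() call, save it as an executable file and point --vault-password-file at it. Note that this only moves the bootstrapping problem: the script still needs an HVault token from somewhere.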

Thoughts on Choosing For Our Team

Personally, I favour (1) or even just using AVault alone. While the theoretical “separation of duties” potential for AVault + HVault is supposed to be more attractive to a security geek like me, this just seems like needless complexity for effectively very little gain. Teaching our volunteers (now and in the future) how to manage two secrets-protecting technologies would be more painful, and we double the risks of dealing with a breaking change (or loss of active development) for a necessary and non-trivially-integrated technology in our stack.

Further, if I had to stick with one, I’d stay “single vendor” and use AVault rather than spread us across two projects with different needs & design philosophies. Once we accept that there’s an occasional “out of band initialization” burden for setting up either vault, and that we’d likely have to share access to larger numbers of secrets with a wider set of the team than ideal, I think the day-to-day management overhead of AVault is no worse (and possibly lighter) than HVault.

Pseudo-Solution for an HVault-only Implementation

Assuming for the moment that we proceed with (1), this (I think) is the logical setup to make it work:

  • Set up an HVault instance
  • Design a naming scheme for secrets
  • Populate HVault with secrets
  • Install Consul Template as a service
  • Rewrite all secret-containing Ansible YAMLs with Consul Template templating variables (matching the HVault naming)
  • Rewrite CD scripts to pull HVault secrets and rewrite all secret-containing Ansible YAMLs
  • Populate the HVault environment variables to enable CD scripts to authenticate to HVault
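The “naming scheme” and “rewrite the YAMLs” steps above can be sketched roughly as follows. This is a stand-in built on Python’s string.Template, not Consul Template itself (which has its own templating syntax), and the path-to-placeholder mapping is a hypothetical scheme I made up for illustration. The secrets dict is whatever the CD script pulled from HVault.

```python
from string import Template


def placeholder_name(vault_path):
    """Map an HVault path like 'secret/web/db_password' to 'web_db_password'."""
    parts = vault_path.strip("/").split("/")
    return "_".join(parts[1:])  # drop the mount point ('secret')


def render(template_text, secrets_by_path):
    """Fill ${placeholder} markers in a YAML template with secret values.

    substitute() raises KeyError when a placeholder has no matching secret,
    so a deployment fails loudly instead of shipping a half-rendered file.
    """
    values = {placeholder_name(p): v for p, v in secrets_by_path.items()}
    return Template(template_text).substitute(values)


if __name__ == "__main__":
    template = "db_user: app\ndb_password: ${web_db_password}\n"
    print(render(template, {"secret/web/db_password": "example-only"}))
```

The design choice worth arguing about is the mapping: the tighter the placeholder names are tied to HVault paths, the more entangled the Ansible code becomes with the HVault layout.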

Operational Concerns

If the HVault instance is running on a server in the production infrastructure, can HVault be configured to only allow connections from other servers that require access to the HVault secrets? This would reduce the risk that knowledge of the HVault (authentication token and address as used here) would provide instant access to the secrets from anywhere on the Internet. This would be considered a defense-in-depth measure in case ip_tables and SSH protections could be circumvented to allow incoming traffic at the network level.

The HVault discussions about “flexibility” and “developer considerations” lead me to conclude that – for a volunteer team using part-time time slivers to manage an open source project’s infrastructure – HVault Cubbyhole just isn’t low-impact, fully-baked enough at this time to make it worth the extra development effort to create a full solution for our needs. While Cubbyhole addresses an interesting edge case in making on-the-wire HVault tokens less vulnerable, it doesn’t substantially mitigate (for us, at least) the bootstrapping problem, especially when it comes to a single-server HVault+deployment service setup.

Residual Security Issues

  • All this gyration with HVault is meant to help solve the problems of (a) storing all Ansible YAML-bound secrets in plaintext, (b) storing a static secret (the AVault password) in plaintext on our CD server, and (c) finding some way to keep any secrets from showing up in our github repo.
  • However, there’s still the problem of authenticating a CD process to HVault to retrieve secret(s) in the first place
  • We’re still looking to remove human intervention from standard deployments, which means persisting the authentication secret (token, directory-managed user/pass, etc) somewhere on disk (e.g. export VAULT_TOKEN=xxxx)
  • Whatever mechanism we use will ultimately be documented – either directly in our github repo, or in documentation we end up publishing for use by other infrastructure operators and those who wish to follow our advice
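If the token does end up persisted on the CD server, one small mitigation (a sketch of my own, not an HVault feature) is to keep it in a file rather than a shell profile and have the CD script refuse to use it when the file is readable by anyone but its owner. The function name and the idea of a token file are my own assumptions, and the permission check assumes a POSIX filesystem.

```python
import os
import stat


def load_token(path):
    """Read an HVault token from a file, rejecting group/world-accessible files."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:  # any permission bit set for group or others
        raise PermissionError(
            "{} is accessible by group or others (mode {:o})".format(path, mode)
        )
    with open(path) as handle:
        return handle.read().strip()
```

This doesn’t solve the bootstrapping problem; it only narrows who on the box can read the token.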


This is not the final word – these are merely my initial thoughts, and I’m looking forward to members of the team bringing their take to these technologies, comparisons and issues.  I’m bound to learn something and we’ll check back with the results.

Reading List

Intro to Hashicorp Vault:

Blog example using HVault with Chef:

Example Chef Recipe for using HVault

Ansible lookup module to retrieve secrets from HVault

Ansible modules for interacting with HVault