Ansible Vault for an open source project: adventures in simplified indirection

“How can we publish our server configuration scripts as open source code without exposing our secrets to the world?”

It seemed like a simple enough mission. There are untold numbers of open source projects publishing directly to; most large projects have secrets of one form or another. Someone must have figured out a pattern for keeping the secrets *near* the code without actually publishing them (or a key leading to them) as plaintext *in* the code, yes?

However, a cursory examination of tutorials on Ansible Vault left me with an uneasy feeling. It appears that a typical pattern for this kind of setup is to partition your secrets as variables in an Ansible Role, encrypt the variables, and unlock them at runtime with reference to a password file (~/.vault_pass.txt) [or an interactive prompt at each Ansible run *shudder*]. The encrypted content is available as an AES256 blob, and the password file… well, here’s where I get the heebie-jeebies:

  1. While AES256 is a solid algorithm, it still feels…weird to publish such files to the WORLD. Distributed password cracking is quite a thing; how ridiculous of a password would we need to have to withstand an army of bots grinding away at a static password, used to unlock the encrypted secrets? Certainly not a password that anyone would feel comfortable typing by hand every time it’s prompted.
  2. Password files need to be managed, stored, backed up and distributed/distributable among project participants. Have you ever seen the docs for PGP re: handling the master passphrase? Last time I remember looking with a friend, he showed me four places where the docs said “DON’T FORGET THE PASSPHRASE”. [Worst case, what happens if the project lead gets hit by a bus?]

I guess I was expecting some kind of secured, daemon-based query-and-response RPC server, the way Jan-Piet Mens envisioned here.


  • We have a distributed, all-volunteer team – hit-by-a-bus scenarios must be part of the plan
  • (AFAIK) We have no permanent “off-the-grid” servers – no place to stash a secret that isn’t itself backed up on the Internet – so there will have to be at least periodic bootstrapping, and multiple locations where the vault password will live

Concerns re: Lifecycle of Ansible Vault secrets:

  1. Who should be in possession of the master secret? Can this be abstracted or does anyone using it have to know its value?
  2. What about editing encrypted files? Do you have to decrypt them each time and re-encrypt, or does “ansible-vault edit” hand-wave all that for you?
    • Answer: no manual decrypt/re-encrypt cycle is needed. “ansible-vault edit” decrypts to a temporary file, opens it in your editor, then re-encrypts on save and removes the temporary file.
  3. Does Ansible Vault use per-file AES keys or a single AES key for all operations with the same password (that is, is the vault password a seed for the key or does it encrypt the key)?
    • Answer: not confirmed, but neither the source code nor the docs mention a per-file key stored alongside the data, and the encrypted contents do not appear to store an encrypted AES key, so it looks like the vault password is a seed for the key rather than something that encrypts it.
  4. Where to store the vault password if you want to integrate it into a CD pipeline?
    • Answer: --vault-password-file ~/.vault_pass.txt OR EVEN --vault-password-file ~/, where the script sends the password to stdout
  5. Does anyone have a viable scheme that doesn’t require a privileged operator to be present during every deployment (--ask-vault-pass)?
    • i.e. doesn’t that mean you’re in danger of including ~/.vault_pass.txt in your git commit at some point? If not, where does that secret live?
  6. If you incorporate LastPass into your workflow to keep a protected copy of the vault password, can *that* be incorporated into the CD pipeline somehow?
  7. Are there any prominent OSS projects that have published their infrastructure and used Ansible Vault to publish encrypted versions of their secrets?
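
For the script variant mentioned in the answer to question 4, here is a minimal sketch, assuming the CI/CD system exposes the password as an environment variable. The variable name DEPLOY_VAULT_PASSWORD and the function names are made up for illustration; the only behaviour relied on is that Ansible runs an executable password file and reads the password from its stdout.

```python
#!/usr/bin/env python3
"""Hypothetical vault password script: prints the vault password to stdout."""
import os
import sys

# Assumption: the CI/CD system injects the secret under this (made-up) name.
ENV_VAR = "DEPLOY_VAULT_PASSWORD"


def get_vault_password(environ):
    """Return the password from the given mapping, or None if absent or empty."""
    password = environ.get(ENV_VAR, "").rstrip("\r\n")
    return password or None


def main():
    password = get_vault_password(os.environ)
    if password is None:
        # Nothing goes to stdout, so no usable password is handed over.
        # A production script would probably also exit non-zero here.
        sys.stderr.write(f"{ENV_VAR} is not set\n")
        return
    print(password)


if __name__ == "__main__":
    main()
```

Saved somewhere outside the repository and marked executable, a script like this would be passed to --vault-password-file in place of the plaintext file, which keeps the password itself off the disk of the build server.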

Based on my reading of the docs and blogs, it seems like this is the proffered solution for maximum automation and maintainability:

  • Divvy up all your secrets as variables and use pseudo-leaf indirection (var files referencing prefixed variables in a separate file) as documented here.
  • Encrypt the leaf-node file(s) using a super-complex vault password
  • Store the vault password in ~/.vault_pass.txt
  • Call all ansible and ansible-playbook commands using the --vault-password-file option
  • Smart: wire up a pre-commit step in git to make sure the right files are always encrypted as documented here.
  • Back up the vault password in a password manager like LastPass (so that only necessary participants get access to that section)
  • Manually deploy the .vault_pass.txt file to your Jenkins server or other CI/CD master and give no one else access to that server/root/file.
  • Limit the number of individuals who need to edit the encrypted file(s), and make sure they list .vault_pass.txt in their .gitignore file.
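
The pre-commit step in the list above could look roughly like the following. This is a sketch under stated assumptions, not the method from the linked article: the "vault*.yml" naming convention and all function names are made up, and it reads working-tree files passed as arguments rather than the staged content. The one fact it leans on is that files written by ansible-vault begin with a "$ANSIBLE_VAULT;" header line.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit check that secret files are encrypted before commit."""
import os
import sys

# Files written by ansible-vault start with a header such as
# "$ANSIBLE_VAULT;1.1;AES256".
VAULT_HEADER = "$ANSIBLE_VAULT;"


def is_secret_file(path):
    """Assumption: secret files follow a made-up convention, vault*.yml/.yaml."""
    name = os.path.basename(path)
    return name.startswith("vault") and name.endswith((".yml", ".yaml"))


def is_encrypted(text):
    """True if the content begins with the Ansible Vault header."""
    return text.startswith(VAULT_HEADER)


def find_plaintext_secrets(paths):
    """Return the secret files among `paths` whose content is not encrypted."""
    offenders = []
    for path in paths:
        if not is_secret_file(path):
            continue
        with open(path, encoding="utf-8") as handle:
            if not is_encrypted(handle.read(len(VAULT_HEADER))):
                offenders.append(path)
    return offenders


def main(paths):
    """Report offending files on stderr; return 1 if any were found, else 0."""
    offenders = find_plaintext_secrets(paths)
    for path in offenders:
        sys.stderr.write(f"refusing to commit unencrypted secret file: {path}\n")
    return 1 if offenders else 0


if __name__ == "__main__":
    # File names to check arrive as command-line arguments.
    if main(sys.argv[1:]):
        sys.exit(1)
```

A non-zero exit from a pre-commit hook aborts the commit, so a plaintext secrets file never makes it into history in the first place.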

P.S. Next up – look into the use of HashiCorp’s Vault project.

Reading List

Ansible Vault Docs:

This is an incredibly useful article of good practices for using Ansible (and Ansible Vault) in a reasonably productive way:

Occupied Neurons, early July 2016: security edition

Who are you, really: Safer and more convenient sign-in on the web – Google I/O 2016

The Google Chrome team shared some helpful tips for web developers on making it as easy as possible for users to securely sign in to their web sites:

  • simple-if-annoying-that-we-still-have-to-use-these attributes to add to your forms to assist Password Manager apps
  • A Credential Management API that (though cryptically explained) smoothes out some of the steps in retrieving creds from the Chrome Credential Manager
  • This API also addresses some of the security threats (plaintext networks, Javascript-in-the-middle, XSS)
  • Then they discuss the FIDO UAF and U2F specs – where the U2F “security key” signs the server’s secondary challenge with a private key whose public key is already enrolled with the online identity the server is authenticating

The U2F “security key” USB dongle idea is cute and useful – it requires the user’s interaction with the button (can’t be automatically scraped by silent malware), uses public-key signatures (ECDSA in the U2F spec) to provide strong proof of possession and can’t be duplicated. But as with any physical “token”, it can be lost and it requires that physical interface (e.g. USB) that not all devices have. Smart cards and RSA tokens (the one-time key generators) never entirely caught on either, despite their laudable security laurels.

The Credential Manager API discussion reminds me of the Internet Explorer echo chamber from 10-15 years ago – Microsoft browser developers adding in all these proprietary hooks because they couldn’t imagine anyone *not* fully embracing IE as the one and only browser they would use everywhere. Disturbing to see Google slip into that same lazy arrogance – assuming that web developers will assume that their users will (a) always use Chrome and (b) be using Chrome’s Credential Manager (not an external password manager app) to store passwords.

Disappointing navel-gazing for the most part.

Google’s password-free logins may arrive on Android apps by year-end

Project Abacus creates a “Trust Score API” – an interesting concept which is intended to supplant the need for passwords or other explicit authentication demands, by taking ambient readings from sensors and user interaction patterns with their device to determine how likely it is that the current holder/user is equivalent to the identity being asserted/authenticated.

This is certainly more interesting technology, if only because it allows for the possibility that any organization/entity that wishes to set their own tolerance/threshold per-usage can do so, using different “Trust Scores” depending on how valuable the data/API/interaction is that the user is attempting. A simple lookup of a bank balance could require a lower score than making a transfer of money out of an account, for example.
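
To make the per-usage threshold idea concrete, here is a tiny sketch. It is not the Trust Score API (the talk coverage gives no details of that); the 0-100 scale, the action names and the threshold values are all made up for illustration.

```python
# Hypothetical thresholds on an assumed 0-100 trust score; the action names
# and numbers are invented, not taken from Project Abacus.
REQUIRED_SCORE = {
    "view_balance": 40,     # low-value lookup tolerates a lower score
    "transfer_money": 85,   # moving money demands much more confidence
}


def is_allowed(action, trust_score):
    """Permit the action only if the score meets that action's threshold.

    Unknown actions are denied rather than guessed at.
    """
    required = REQUIRED_SCORE.get(action)
    if required is None:
        return False
    return trust_score >= required
```

With a score of 60, the balance lookup would go through while the transfer would fall back to some explicit authentication step.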

The only trick to this is the user must allow Google to continuously measure All The Thingz from the device – listen on the microphone, watch all typing, observe all location data, see what’s in front of the camera lens. Etc. Etc. Etc.

If launched today, I suspect this would trip over most users’ “freak-out” instinct and would fail, so kudos to Google for taking it slow. They’re going to need to shore up the reputation of Android phones, their inscrutably cryptic if comprehensive permissions model and how well it’s sandboxed if they’re ever to earn widespread trust for Google to watch everything you’re doing.


Looks like Microsoft is incorporating “widely-used hacked passwords” into the set of password rules that Active Directory can enforce against users trying to establish a weak password. Hopefully this’ll be less frustrating than the “complex passwords” rules that AD and some of Microsoft’s more zealous customers like to enforce, making it nigh-impossible to know what the rules are let alone give a sentient human a chance of getting a password you might want to type 20-50 times/day. [Not that I have any PTSD from that…]

Unfortunately, they do a piss-poor job of explaining how “Smart Password Lockout” works. I’m going to take a guess how this works, and hopefully someday it’ll be spelled out. It appears they’ve got some extra smarts in the AD password authentication routine that runs at the server-side – it can effectively determine whether the bad password authentication attempt came from an already-known device or not. This means that AD is keeping a rolling cache of the “familiar environments” – likely one that ages out the older records (e.g. flushing anything older than 30 days). What’s unclear is whether they’re recording remote IP addresses, remote computer names/identities, remote IP address subnets, or some new “cookie”-like data that wasn’t traditionally sent with the authentication stream.

If this is based on Kerberos/SAML exchanges, then it’s quite possible to capture the remote identity of the computer from which the exchange occurred (at least for machines that are part of the Active Directory domain). However, if this is meant as a more general-purpose mitigation for accounts used in a more Internet-facing (not Active Directory domain) setting, then unless Active Directory has added cookie-tracking capabilities it didn’t have a decade ago, I’d imagine they’re operating strictly on the remote IP address enveloped around any authentication request (Kerberos, NTLM, Basic, Digest).

Still seems a worthwhile effort – if it allows AD to lock out attackers trying to brute-force my account from locations where no successful authentication has taken place – AND continues to allow me to proceed past the “account lockout” at the same time – this is a big win for end users, especially where AD is used in Internet-facing settings like Azure.
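
My guess at the mechanism can be sketched in a few lines. To be clear, this models the guess above, not anything Microsoft has documented: the class and method names are hypothetical, "location" stands in for whatever identifier is really recorded (IP address, device identity, something else), the failure threshold is invented, and only the 30-day ageing window comes from the example earlier in this post.

```python
from datetime import datetime, timedelta

FAMILIAR_TTL = timedelta(days=30)  # ageing window from the example above
MAX_FAILURES = 5                   # made-up lockout threshold


class LockoutTracker:
    """Tracks one account's sign-in attempts, keyed by an opaque location."""

    def __init__(self):
        self.familiar = {}  # location -> time of last successful sign-in
        self.failures = {}  # location -> failed attempts from that location

    def _is_familiar(self, location, now):
        last_ok = self.familiar.get(location)
        if last_ok is not None and now - last_ok > FAMILIAR_TTL:
            del self.familiar[location]  # age out the stale record
            return False
        return last_ok is not None

    def record_success(self, location, now):
        self.familiar[location] = now
        self.failures.pop(location, None)

    def record_failure(self, location, now):
        # Simplification: failures from familiar locations are not counted
        # at all; a real system would presumably still cap them somehow.
        if not self._is_familiar(location, now):
            self.failures[location] = self.failures.get(location, 0) + 1

    def is_locked_out(self, location, now):
        if self._is_familiar(location, now):
            return False
        return self.failures.get(location, 0) >= MAX_FAILURES
```

The point of the design is that the lockout is per location rather than per account: the attacker's location gets shut out after enough bad guesses, while a location with a recent successful sign-in keeps working.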

Useable Security tales, part the 23rd: TouchID spoof still smells in the realm of the fantastic

Saw the latest video proof of the possibility of spoofing the iPhone 5S TouchID sensor with a fingerprint replica ‘recovered’ from the iPhone. Yes, the “proof” is in the video, and congrats to the CCC who have demonstrated their mastery of fingerprint recovery over the decades. But I think we should remember to think critically about this laboratory demonstration, and what it does and doesn’t demonstrate. I’m going to focus simply on the first step, the capture of a viable fingerprint from the phone itself.

In a word, trivial – under what real-world (not Hollywood) scenario will you find a phone that is (a) clean, (b) just unlocked via passcode and (c) captured before that fingerprint has been smudged?

I don’t know about you, but in my experience this is quite a unique usage model:

(a)    Take a clean iPhone screen (no previous smudges, swipes or smears on the screen to muddy up the about-to-be-captured fingerprint)

(b)   Login via passcode on a 5S where TouchID has already been enrolled (i.e. this phone hasn’t been used in 48 hours, or it’s only *just* been rebooted and never unlocked)

(c)    Grab the phone *immediately* afterwards (before the user has a chance to touch, swipe and pinch the crap out of that “perfect” fingerprint image)

(d)   Make sure you don’t touch the screen before you capture a hi-res scan of the fingerprint image (i.e. don’t grab it too heavily as a running thief might, and definitely don’t throw it in a bag or pocket as you run away)

When will I be unlocking my 5S with a passcode?  Statistically speaking, most likely in one of the two locations where I use it most: at home, or at work.  Is it likely a thief is waiting behind the credenza for me there?  With an adult diaper and a bag of snacks (as he waits for that perfect moment to bonk me on the head)?

I’m also pretty likely to continue to use the phone – I don’t know too many people who unlock the phone and then leave it aside. So I’m very likely to pinch, swipe and tap all over that screen, given all the apps, locations and usage models I and many users have.

Finally, are we relying on a threat scenario where the thief happens to have a forensic evidence-quality bag to drop the phone into…and is he wearing rubber gloves? If Benson, Stabler or Grissom wanted to grab my phone, I’m pretty sure they’ve got other ways to get at the secrets that I happen to have stored on my phone.

Are we really accepting that this is a realistic enough scenario to warrant all the fear against a significant advancement in consumer security technologies?  Yes the industry can do better, but I hope we’re not letting perfect be the enemy of good – I’d hate to see anyone’s next business ventures all be judged on that model (and still derive the massive profits we’re all in search of).