Simple troubleshooting the usual SSH error from Ansible

After the struggles I’ve had over the last couple of days, it’s strangely reassuring to stumble on a problem I’ve actually *seen* before, and recently.  Firsthand even.

I’ve fresh-built a Debian 8.5.2 VM in Virtualbox via Vagrant.  Then I setup an Ansible inventory file to point to the box’s current vagrant ssh-config settings.  Then fired off the tried-and-true ansible connectivity test, ansible all -u vagrant -m ping.  Here’s the response:

127.0.0.1 | UNREACHABLE! => {
    "changed": false,
    "msg": "ERROR!  SSH encountered an unknown error during the connection.  We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue",
    "unreachable": true
}

Running the same command with -vvvv parameter results in a garbage heap of unformatted/concatenated debug screed, which ends with:

debug1: No more authentication methods to try.\r\nPermission denied (publickey,password).\r\n"

Simple Solution

As I’ve documented to myself already elsewhere, I need to run the following two commands:

ssh-keygen -t rsa

ssh-copy-id -p 2222 vagrant@127.0.0.1

Bingo!  Nice to get an easy win.

What I’ve learned: setting up Bash/Ubuntu/Win10 for Ansible + Vagrant + VirtualBox

My Goal: test the use of this Ansible Role from Windows 10, using a combination of Windows and Bash for Ubuntu on Windows 10 tools.  Favour the *nix tools wherever possible, for maximum compatibility with the all-Linux production environment.

Preconditions

Here is the software/shell arrangement that worked for me in my Win10 box:

  • Runs in Windows: Virtualbox, Vagrant
  • Runs in Bash/Ubuntu: Ansible (in part because of this)

In this setup, I’m using a single Virtualbox VM in default network configuration, whereby Vagrant ends up reporting the host listening on 127.0.0.1 and SSH listening on TCP port 2222.  Substitute your actual values as required.

Also note the versions of software I’m currently running:

  • Windows 10: Anniversary Update, build 14393.51
  • Ansible (*nix version in Bash/Ubuntu/Win10): 1.5.4
  • VirtualBox (Windows): 5.0.26
  • Vagrant (Windows): 1.8.1

Run the Windows tools from a Windows shell

  • C:\> vagrant up
  • (or launch a Bash shell with cbwin support:  C:\>outbash, then try running /mnt/c/…/Vagrant.exe up from the bash environment)

Start the Virtualbox VMs using Vagrant

  • Vagrant (Bash) can’t just do vagrant up where VirtualBox is installed in Windows – it depends on being able to call the VBoxManage binary
    • Q: can I trick Bash to call VBoxManage.exe from /mnt/c/Program Files/Oracle/VirtualBox?
    • If not, is it worth messing around with Vagrant (Bash)?  Or should I relent and try Vagrant (Windows), either using cbwin or just running from a different shell?
  • Vagrant (Windows) runs into the fscking rsync problem (as always)
    • Fortunately you can disable rsync if you don’t need the sync’d folders
    • Disabling the synced_folder requires editing the Vagrantfile to add this in the Vagrant.configure section:
      config.vm.synced_folder “.”, “/vagrant”, disabled: true

Setup the inventory for management

  • Find the IP’s for all managed boxes
  • Organize them (in one group or several) in the /etc/ansible/hosts file
  • Remember to specify the SSH port if non-22:
    [test-web]
    127.0.0.1 ansible_ssh_port=2222
    # 127.0.0.1 ansible_port=2222 when Ansible version > 1.9
    • While “ansible_port” is said to be the supported parameter as of Ansible 2.0, my own experience with Ansible under Bash on Windows was that ansible wouldn’t connect properly to the server until I changed the inventory configuration to use “ansible_ssh_port”, even though ansible –version reported itself as 2.1.1.0
    • Side question: is there some way to predictably force the same SSH port every time for the same box?  That way I can setup an inventory in my Bash environment and keep it stable.

Getting SSH keys on the VMs

  • (Optional: generate keys if not already) Run ssh-keygen -t rsa
  • (Optional: if you’ve destroyed and re-generated the VM with vagrant destroy/up, wipe out the existing key for the host:port combination by running the following command that is recommended when ssh-copy-id fails): ssh-keygen -f “/home/mike/.ssh/known_hosts” -R [127.0.0.1]:2222
  • Run ssh-copy-id vagrant@127.0.0.1 -p 2222 to push the public key to the target VM’s vagrant account

Connect to the VMs using Ansible to test connectivity

  • [from Windows] vagrant ssh-config will tell you the IP address and port of your current VM
  • [from Bash] ansible all -u vagrant -m ping will check basic Ansible connectivity
    • (ansible all -c local -m ping will go even more basic, testing Ansible itself)

Run the playbook

  • Run ansible-playbook [playbook_name.yml e.g. playbook.yml] -u vagrant
    • If you receive an error like “SSH encountered an unknown error” with details that include “No more authentication methods to try.  Permission denied (publickey,password).”, make sure to remember to specify the correct remote user (i.e. one that trusts your SSH key)
    • If you receive an error like “stderr: E: Could not open lock file /var/lib/dpkg/lock – open (13: Permission denied)”, make sure your remote user runs with root privilege – e.g. in the [playbook.yml], ensure sudo: true is included
  • Issue: if you receive an error like “fatal: [127.0.0.1]: UNREACHABLE! => {“changed”: false, “msg”: “Failed to connect to the host via ssh.”, “unreachable”: true}”, check that your SSH keys are trusted by the remote user you’re using (e.g. “-u vagrant” may not have the SSH keys already trusted)
  • If you wish to target a subset of servers in your inventory (e.g. using one or more groups), add the “-l” parameter and name the inventory group, IP address or hostname you wish to target
    e.g. ansible-playbook playbook.yml -u vagrant -l test-web
    or ansible-playbook playbook.yml -u vagrant -l 127.0.0.1

Protip: remote_user

If you want to stop having to add -u vagrant to all the fun ansible commands, then go to your /etc/ansible/ansible.cfg file and add remote_user = vagrant in the appropriate location.

Rabbit Hole Details for the Pedantically-Inclined

03ec8fe3bb146924423af6381eb99ea9

Great Related Lesson: know the difference between vagrant commands

  • Run vagrant ssh to connect to the VM [note: requires an SSH app installed in Windows, under this setup]
  • Run vagrant status to check what state the VM is in
  • Run vagrant reload to restart the VM
  • Run vagrant halt to stop the VM
  • Run vagrant destroy to wipe the VM

Ansible’s RSA issue when SSH’ing into a non-configured remote user

  • The following issue occurs when running ansible commands to a remote SSH target
    e.g. ansible all -m ping
  • This occurs even when the following commands succeed:
    • ansible -c local all -m ping
    • ssh vagrant@host.name [port #]
    • ssh-copy-id -p [port #] vagrant@host.name
  • Also note: prefixing with “sudo” doesn’t seem to help – just switches whose local keys you’re using
  • I spent the better part of a few hours (spaced over two days, due to rage quit) troubleshooting this situation
  • Troubleshooting this is challenging to say the least, as ansible doesn’t intelligently hint at the source of the problem, even though this must be a well-known issue
    • There’s nothing in the debug output of ssh/(openssl?) that indicates that there are no trusted SSH keys in the account of the currently-used remote user
    • Nor is it clear which remote user is being impersonated – sure, I’ll bet someone that fights with SSH & OpenSSL all day would have noticed the subtle hints, but for those of us just trying to get a job done, it’s like looking through foggy glass
  • Solution: remember to configure the remote user under which you’re connecting (i.e. a user with the correct permissions *and* who trusts the SSH keys in use)
    • Solution A: add the -u vagrant parameter
    • Solution B: specify remote_user = vagrant in the ansible.cfg file under [defaults]
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ansible-playbook role.yml -vvvv

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
<127.0.0.1> ESTABLISH CONNECTION FOR USER: mike
<127.0.0.1> REMOTE_MODULE setup
<127.0.0.1> EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/mike/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=2222', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', '127.0.0.1', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832 && echo $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832'"]
fatal: [127.0.0.1] => SSH encountered an unknown error. The output was:
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug1: Control socket "/home/mike/.ansible/cp/ansible-ssh-127.0.0.1-2222-mike" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2222.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 10000 ms remain after connect
debug3: Incorrect RSA1 identifier
debug3: Could not load "/home/mike/.ssh/id_rsa" as a RSA1 public key
debug1: identity file /home/mike/.ssh/id_rsa type 1
debug1: identity file /home/mike/.ssh/id_rsa-cert type -1
debug1: identity file /home/mike/.ssh/id_dsa type -1
debug1: identity file /home/mike/.ssh/id_dsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ecdsa type -1
debug1: identity file /home/mike/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ed25519 type -1
debug1: identity file /home/mike/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.7p1 Debian-5+deb8u1
debug1: match: OpenSSH_6.7p1 Debian-5+deb8u1 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug3: put_host_port: [127.0.0.1]:2222
debug3: load_hostkeys: loading entries for host "[127.0.0.1]:2222" from file "/home/mike/.ssh/known_hosts"
debug3: load_hostkeys: found key type ECDSA in file /home/mike/.ssh/known_hosts:2
debug3: load_hostkeys: loaded 1 keys
debug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-dss-cert-v00@openssh.com,ssh-ed25519,ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none
debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ssh-ed25519
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
debug2: kex_parse_kexinit: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: kex_parse_kexinit: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: setup hmac-sha1-etm@openssh.com
debug1: kex: server->client aes128-ctr hmac-sha1-etm@openssh.com zlib@openssh.com
debug2: mac_setup: setup hmac-sha1-etm@openssh.com
debug1: kex: client->server aes128-ctr hmac-sha1-etm@openssh.com zlib@openssh.com
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 07:f3:2f:b0:86:b5:b6:2b:d9:f5:26:71:95:6e:d9:ce
debug3: put_host_port: [127.0.0.1]:2222
debug3: put_host_port: [127.0.0.1]:2222
debug3: load_hostkeys: loading entries for host "[127.0.0.1]:2222" from file "/home/mike/.ssh/known_hosts"
debug3: load_hostkeys: found key type ECDSA in file /home/mike/.ssh/known_hosts:2
debug3: load_hostkeys: loaded 1 keys
debug1: Host '[127.0.0.1]:2222' is known and matches the ECDSA host key.
debug1: Found key in /home/mike/.ssh/known_hosts:2
debug1: ssh_ecdsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /home/mike/.ssh/id_rsa (0x7fffbdbd5b80),
debug2: key: /home/mike/.ssh/id_dsa ((nil)),
debug2: key: /home/mike/.ssh/id_ecdsa ((nil)),
debug2: key: /home/mike/.ssh/id_ed25519 ((nil)),
debug1: Authentications that can continue: publickey,password
debug3: start over, passed a different list publickey,password
debug3: preferred gssapi-with-mic,gssapi-keyex,hostbased,publickey
debug3: authmethod_lookup publickey
debug3: remaining preferred: ,gssapi-keyex,hostbased,publickey
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/mike/.ssh/id_rsa
debug3: send_pubkey_test
debug2: we sent a publickey packet, wait for reply
debug1: Authentications that can continue: publickey,password
debug1: Trying private key: /home/mike/.ssh/id_dsa
debug3: no such identity: /home/mike/.ssh/id_dsa: No such file or directory
debug1: Trying private key: /home/mike/.ssh/id_ecdsa
debug3: no such identity: /home/mike/.ssh/id_ecdsa: No such file or directory
debug1: Trying private key: /home/mike/.ssh/id_ed25519
debug3: no such identity: /home/mike/.ssh/id_ed25519: No such file or directory
debug2: we did not send a packet, disable method
debug1: No more authentication methods to try.
Permission denied (publickey,password).

TASK: [ansible-role-unattended-upgrades | add distribution-specific variables] ***
FATAL: no hosts matched or all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/mike/role.retry

127.0.0.1                  : ok=0    changed=0    unreachable=1    failed=0

Ansible’s permissions issue when trying to run non-trivial commands without sudo

  • ansible -m ping will work fine without local root permissions, making you think that you might be able to do other ansible operations without sudo
  • Haha! You would be wrong, foolish apprentice
  • Thus, the SSH keys for enabling ansible to work will have to be (a) generated for the local root user and (b) copied to the remote vagrant user
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ansible-playbook -u vagrant role.yml

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
ok: [127.0.0.1]

TASK: [ansible-role-unattended-upgrades | add distribution-specific variables] ***
ok: [127.0.0.1]

TASK: [ansible-role-unattended-upgrades | install unattended-upgrades] ********
failed: [127.0.0.1] => {"failed": true, "item": ""}
stderr: E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

msg: 'apt-get install 'unattended-upgrades' ' failed: E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?


FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/mike/role.retry

127.0.0.1                  : ok=2    changed=0    unreachable=0    failed=1

 

Articles I reviewed while doing the work outlined here

https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys–2

http://blog.publysher.nl/2013/07/infra-as-repo-using-vagrant-and-salt.html

https://github.com/devopsgroup-io/vagrant-digitalocean

https://github.com/mitchellh/vagrant/issues/4073

http://stackoverflow.com/questions/23337312/how-do-i-use-rsync-shared-folders-in-vagrant-on-windows

https://github.com/mitchellh/vagrant/issues/3230

https://www.vagrantup.com/docs/synced-folders/basic_usage.html

http://docs.ansible.com/ansible/intro_inventory.html

http://stackoverflow.com/questions/36932952/ansible-unable-to-connect-to-aws-ec2-instance

http://serverfault.com/questions/649659/ansible-try-to-ping-connection-between-localhost-and-remote-server

http://stackoverflow.com/questions/22232509/vagrant-provision-works-but-i-cannot-send-an-ad-hoc-command-with-ansible

http://stackoverflow.com/questions/21670747/what-user-will-ansible-run-my-commands-as#21680256

Bash/Ubuntu on Win10: getting *nix vagrant working with virtualbox (not)

TL;DR Getting vagrant + virtualbox running natively in Bash for Unbuntu on Windows is a no-go.  Try a hybrid Windows/WSL solution instead.

At the end of our last episode, our hero was trapped under the following paradox:

mike@MIKE-WIN10-SSD:/mnt/c/Users/Mike/VirtualBox VMs/BaseDebianServer$ vagrant up
VirtualBox is complaining that the installation is incomplete. Please
run `VBoxManage --version` to see the error message which should contain
instructions on how to fix this error.
mike@MIKE-WIN10-SSD:/mnt/c/Users/Mike/VirtualBox VMs/BaseDebianServer$ VBoxManage --version
WARNING: The character device /dev/vboxdrv does not exist.
         Please install the virtualbox-dkms package and the appropriate
         headers, most likely linux-headers-3.4.0+.

         You will not be able to start VMs until this problem is fixed.

However, the advice for installing virtualbox-dkms is merely a distraction:

mike@MIKE-WIN10-SSD:/mnt/c/Users/Mike/VirtualBox VMs/BaseDebianServer$ sudo apt-get install virtualbox-dkms
Reading package lists... Done
Building dependency tree
Reading state information... Done
virtualbox-dkms is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 44 not upgraded.

And installing linux-headers-3.4.0+ doesn’t seem to work:

mike@MIKE-WIN10-SSD:/mnt/c/Users/Mike/VirtualBox VMs/BaseDebianServer$ sudo apt-get install linux-headers-3.4.0+
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package linux-headers-3.4.0
E: Couldn't find any package by regex 'linux-headers-3.4.0'

Where to go from here?

AskUbuntu turns up this tasty lead:

http://askubuntu.com/questions/465454/problem-with-the-installation-of-virtualbox

…where VBoxManage instead indicates “…most likely linux-headers-generic”.  This aligns with my previous investigation into the version of Linux that ships with Bash on Ubuntu for Windows (‘uname -r’ returns “3.4.0+”, which I suspect is what VBoxManage appends to its “most likely” hint).

Aside

On a lark, I decided to see if I could confirm this theory from the virtualbox source code.  Since it’s Oracle, of course they had to use an “enterprise-y” repo (Trac) which provides a browseable but not searchable front-end, so I pawed through each of the .cpp files by hand on the off-chance this message was being constructed directly in VBoxManage*.cpp source:

https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Frontends/VBoxManage

It’s entirely possible the message is passed up from an imported library, or that it’s constructed from fragments that don’t explicitly include the string “most likely” in any one line of source, but I wasn’t able to find it from this branch of the virtualbox source repo.

Dead End, Take a Guess

OK, if there’s no specific indication which version of the headers must be used, and on the assumption no damage can be caused by downloading what should merely be text files, then let’s just try the linux-headers-generic and see what happens.

And the apt-get messages seem promising – especially that it selected linux-headers-3.13.* files magically without me tracking down which specific versions I needed:

mike@MIKE-WIN10-SSD:/mnt/c/Users/Mike/VirtualBox VMs/BaseDebianServer$ sudo apt-get install linux-headers-generic
[sudo] password for mike:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  linux-headers-3.13.0-92 linux-headers-3.13.0-92-generic
The following NEW packages will be installed:
  linux-headers-3.13.0-92 linux-headers-3.13.0-92-generic
  linux-headers-generic
0 upgraded, 3 newly installed, 0 to remove and 44 not upgraded.
Need to get 9,571 kB of archives.
After this operation, 77.0 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Cool, except for these lines in the script output:

Examining /etc/kernel/header_postinst.d.
run-parts: executing /etc/kernel/header_postinst.d/dkms 3.13.0-92-generic /boot/vmlinuz-3.13.0-92-generic

(why does that make me think I just overwrote something important?)

…and for reasons initially unknown to me, the run-parts script seems to go zombie.

[Aside: I’m still too polluted by my Win32 experience, so I kept trying to interrupt with Ctrl-C.  No.  Bad dog, no treat.  Instead, try Ctrl-Z (pronounced “zed”, ’cause I’m Canadian like that.]

Finding Out if Anyone Else Has Seen This

Vagrant is a pretty popular way of managing virtual machines these days, right?  Yeah.  And while I might be in the first days of the public release of Bash on Windows, there’s been an Insiders Preview going for months, and lots of people banging on the corners.

So what are the odds someone else has tried this too?

Old school: search stackoverflow.com, social.technet.microsoft.com.  No bueno – plenty of folks reporting issues on SO with Bash on Windows, but no one there has reported this vagrant problem.

New school: somehow stumbled across the github repo for BashOnWindows, and dutifully filled out as detailed an issue report as I could muster.

=== NOW HERE’S THE PART THAT BLEW MY MIND ===

A Microsoft employee responded with an intelligent and helpful reply within hours on the same day!!!

(I remember a decade ago, Microsoft’s ‘engagement’ with customers reporting real issues with new software – even when Microsoft’s external bug trackers existed – was abysmal.  You’d be lucky to get an acknowledgement inside a month, and rarely if ever would they bother to update the issue when/if it ever got dispositioned, let alone addressed.  THIS KIND OF RESPONSIVENESS IS AMAZING FROM A CORPORATION.)

Root Issue

My bad, I’d misunderstood the implications of this: WSL (Windows Subsystem for Linux), which supports the user-mode Bash on Ubuntu layer, doesn’t implement any native Linux kernel support.  It’s all user-mode support, and it’s only for non-GUI apps (i.e. things that don’t require Display:0).

Our intrepid Microsoft employee reports here that DKMS isn’t currently supported.  The fact I took it even further to try installing the linux headers was moot; /dev/vboxdrv wouldn’t be available no matter what.

Cleanup in Aisle 4

Did you happen to go down the same road as me?  [What, are you similarly touched in the head?]  If so, here’s what I did to back out of my mess:

  • Performed the lock/install package cleanup specified here
  • Did as clean an uninstall of the linux-headers-generic package as I could (running sudo apt-get –purge remove linux-headers-generic), which outputs
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages will be REMOVED:
      linux-headers-generic*
    0 upgraded, 0 newly installed, 1 to remove and 47 not upgraded.
    2 not fully installed or removed.
    After this operation, 29.7 kB disk space will be freed.
    Do you want to continue? [Y/n]

    …which leads to the same run-parts script that fails.  Cleanup the locks/install packages again…then pray not enough damage was done by run-parts (in either direction) to matter. [Boy is that a landmine waiting to go off months from now…]

  • Clean uninstall of vagrant (sudo apt-get –purge remove vagrant)…which somehow leads again to these same lines:
    ...
    Do you want to continue? [Y/n]
    (Reading database ... 64370 files and directories currently installed.)
    Removing vagrant (1.4.3-1) ...
    Purging configuration files for vagrant (1.4.3-1) ...
    Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
    Setting up linux-headers-3.13.0-92-generic (3.13.0-92.139) ...
    Examining /etc/kernel/header_postinst.d.
    run-parts: executing /etc/kernel/header_postinst.d/dkms 3.13.0-92-generic /boot/vmlinuz-3.13.0-92-generic

    Ctrl-Z, rm locks and install bits.  [This is getting old.]

  • Clean uninstall of virtualbox (sudo apt-get –purge remove virtualbox)…and once again that unkillable linux-headers setup rears its head.
  • Let’s look closer.
  • Here’s the preamble when removing the vagrant package:
    mike@MIKE-WIN10-SSD:/var/lib/dpkg/updates$ sudo apt-get --purge remove vagrant
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      bsdtar libarchive13 liblzo2-2 libnettle4 libruby1.9.1 ruby ruby-childprocess
      ruby-erubis ruby-ffi ruby-i18n ruby-log4r ruby-net-scp ruby-net-ssh
      ruby1.9.1
    Use 'apt-get autoremove' to remove them.
    The following packages will be REMOVED:
      vagrant*
    0 upgraded, 0 newly installed, 1 to remove and 47 not upgraded.
    1 not fully installed or removed.
    After this operation, 1,612 kB disk space will be freed.
    Do you want to continue? [Y/n]

    [My italics for emphasis]

  • Maybe – just MAYBE – the “not fully installed” package is linux-headers-generic, and if I could coax apt-get or dpkg to clean *that* up, we’d rid ourselves of this mess.  [*foreshadowing*  …or maybe I just need to find out how to wipe and reinstantiate Bash on Windows…]
  • First, do the suggested cleanup (sudo apt-get autoremove)
  • Then install debfoster and deborphan
  • Debfoster reports nothing interesting, but deborphan reports:
    deborphan: The status file is in an improper state.
    One or more packages are marked as half-installed, half-configured,
    unpacked, triggers-awaited or triggers-pending. Exiting.
  • This article provides a great grep for isolating the issue – here’s what it uncovered:
    Package: linux-headers-3.13.0-92-generic
    Status: install ok half-configured
    --
    Package: dialog
    Status: install ok unpacked
    --
    Package: debfoster
    Status: install ok unpacked
    --
    Package: deborphan
    Status: install ok unpacked
  • sudo dpkg –audit reports:
    The following packages are only half configured, probably due to problems
    configuring them the first time.  The configuration should be retried using
    dpkg --configure <package> or the configure menu option in dselect:
      linux-headers-3.13.0-92-generic Linux kernel headers for version 3.13.0 on 64
  • We already know “retry” isn’t the answer here…
  • sudo dpkg –configure –pending definitely kicks off the dead-end configuration of the headers…what can cause this to back out, or to remove the stuff that keeps getting triggered?
  • As I was about to uninstall Bash for Ubuntu, I (for no reason) ran exit from within the Bash shell, which showed me this new output:
    mike@MIKE-WIN10-SSD:/var/lib/dpkg$ exit
    exit
    run-parts: waitpid: Interrupted system call
    Failed to process /etc/kernel/header_postinst.d at /var/lib/dpkg/info/linux-headers-3.13.0-92-generic.postinst line 110.
    dpkg: error processing package linux-headers-3.13.0-92-generic (--configure):
    subprocess installed post-installation script returned error exit status 4
    Setting up dialog (1.2-20130928-1) …
  • After a few minutes I just killed the window
  • Restarted Bash, didn’t appear to have made any improvements on my situation.

Perhaps it’s time to finally throw in the towel.

Complete rebuild of Bash on Ubuntu

When all else fails, uninstall and reinstall.  Thankfully I hadn’t invested a ton of real work into this…

According to this comment, the following command cleans up the whole deal:

Lxrun /uninstall /full

(Coda: In case you get an error re-installing afterwards, try running this command again.  I happened to end up with error code 0x80070091 for which I could find no help, but others have reported other error codes too.)

Let’s try this again from scratch.

Hope: I discovered the cbwin project is being actively developed, to enable users of Bash on Ubuntu for Win10 to launch Windows binaries from within the bash environment.  I’ll try this for the vagrant/virtualbox combo and report back.

Update

I quickly ran into limits with cbwin in this particular setup, but seemed to have found peace with a hybrid approach.

Running *nix apps on Win10 “Anniversary update”: initial findings

Ever since Microsoft announced the “Bash on Windows” inclusion in the Anniversary update of Win10, I’ve been positively *itching* to try it out.  I spent *hours* in Git Bash, Cygwin and other workarounds inside Windows to get tools like Vagrant to work natively in Windows.

Spoiler: it never quite worked. [Aside: if anyone has any idea how to get rsync to work in Cygwin or similarly *without* the Bash shell on Windows, let’s talk.  That was the killer flaw.]

Deciphering the (hidden) installation of Bash

I downloaded the update first thing this morning and got it installed, turned on Developer Mode, then…got stumped by Hanselman’s article (above) on how exactly to get the shell/subsystem itself installed.  [Seems like something got mangled in translation, since “…and adding the Feature, run you bash and are prompted to get Ubuntu on Windows from Canonical via the Windows Store…” doesn’t make any grammatical sense, and searching the Windows Store for “ubuntu”, “bash” or “canonical” didn’t turn up anything useful.]

The Windows10 subreddit’s megathread today left incomplete instructions, and a rumour that this was only available on Win10 Pro (not Home).

Instead, it turns out that you have to navigate to legacy control panel to enable, after you’ve turned on Developer Mode (thanks MSDN Blogs):

Control Panel >> Programs >> Turn Windows features on or off, then check “Windows Subsystem for Linux (Beta)”.  Then reboot once again.

WindowsFeaturesDialog

Then fire up CMD.EXE and type “bash” to initiate the installation of “Ubuntu on Windows”:

BashUbuntuWin10Install

Now to use it!

Once installed, it’s got plenty of helpful hints built in (or else Ubuntu has gotten even easier than I remember), such as:

BashUbuntuWin10Hints

Npm, rpm, vagrant, git, ansible, virtualbox are similarly ‘hinted’.

Getting up-to-date software installed

Weirdly, Ansible 1.5.4 was installed, not from the 2.x version.  What gives?  OK, time to chase a rat through a rathole…

This article implies I could try to get a trusty-backport of ansible:
https://blogs.msdn.microsoft.com/commandline/2016/04/06/bash-on-ubuntu-on-windows-download-now-3/

Does that mean the Ubuntu on Windows is effectively an old version of Ubuntu?  How can I even figure that out?

Running ‘apt-get –version’ indicates we have apt 1.0.1ubuntu2 for amb64 compiled on Jan 12 2016.  That seems relatively recent…

Running ‘apt-cache policy ansible’ gives me the following output:

mike@MIKE-WIN10-SSD:/etc/apt$ apt-cache policy ansible 
ansible: 
  Installed: 1.5.4+dfsg-1 
  Candidate: 1.5.4+dfsg-1 
  Version table: 
 *** 1.5.4+dfsg-1 0 
        500 http://archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages 
        100 /var/lib/dpkg/status

Looking at /etc/apt/sources.list, there’s only three listed by default:

mike@MIKE-WIN10-SSD:/etc/apt$ cat sources.list 
deb http://archive.ubuntu.com/ubuntu trusty main restricted universe multiverse 
deb http://archive.ubuntu.com/ubuntu trusty-updates main restricted universe multiverse 
deb http://security.ubuntu.com/ubuntu trusty-security main restricted universe multiverse

So is there some reason why Ubuntu-on-Windows’ package manager (apt) doesn’t even list > 1.5.4 as an available installation?  ‘Cause I was previously running v2.2.0 of Ansible on native Ubuntu (just last month).

I *could* run from source in a subdirectory from my home directory – but I’m shamefully (blissfully?) unaware of the implications – are there common configuration files that might stomp on each other?  Is there common code stuffed in some dark location that is better left alone?

Or should I add the source repo mentioned here?  That seems the safest option, because then apt should manage the dependencies and not leave me with two installs of ansible (1.5.4 and 2.x).

Turns out the “Latest Releases via Apt (Ubuntu)” seems to have done well enough – now ‘ansible –version’ returns “ansible 2.1.1.0”, which appears to be latest according to https://launchpad.net/~ansible/+archive/ubuntu/ansible.

Deciphering hands-on install dependencies

Next I tried installing vagrant, which went OK, but then complained about an incomplete installation:

mike@MIKE-WIN10-SSD:/mnt/c/Users/Mike/VirtualBox VMs/BaseDebianServer$ vagrant up

VirtualBox is complaining that the installation is incomplete. Please run
`VBoxManage --version` to see the error message which should contain
instructions on how to fix this error. 

mike@MIKE-WIN10-SSD:/mnt/c/Users/Mike/VirtualBox VMs/BaseDebianServer$ VBoxManage --version 
WARNING: The character device /dev/vboxdrv does not exist. 
         Please install the virtualbox-dkms package and the appropriate 
         headers, most likely linux-headers-3.4.0+. 

         You will not be able to start VMs until this problem is fixed.

So, tried ‘sudo apt-get install linux-headers-3.4.0’ and it couldn’t find a match.  Tried ‘apt-cache search linux-headers’ and it came back with a wide array of options – 3.13, 3.16, 3.19, 4.2, 4.4 (many subversions and variants available).

Stopped me dead in my tracks – which one would be appropriate to the Ubuntu that ships as “Ubuntu for Windows” in the Win10 Anniversary Update?  Not that header files *should* interact with the operations of the OS, but on the off-chance that there’s some unexpected interaction, I’d rather be a little methodical than have to figure out how to wipe and reinstall.

Figuring out what is the equivalent “version of Ubuntu” that ships with this subsystem isn’t trivial:

  • According to /etc/issue, it’s “Ubuntu 14.04.4 LTS”.
  • What version of the Linux kernel comes with 14.04.4?
  • According to ‘uname -r’, it’s “3.4.0+”, which seems suspiciously under-specific.
  • According to /proc/version, its “Linux version 3.4.0-Microsoft (Microsoft@Microsoft.com) (gcc version 4.7 (GCC) ) #1 SMP PREEMPT Wed Dec 31  14:42:53 PST 2014”.

That’s enough for one day – custom versions of the OS should make one ponder.  Tune in next time to see what kind of destruction I can wring out of my freshly-unix-ized Windows box.

P.S. Note to self: it’s cool to get an environment running; it’s even better for it to stay up to date.  This dude did a great job of documenting his process for keeping all the packages current.