Status Report: Docker Toolbox failures on Windows 10

TL;DR

A growing body of experience building containers on Windows 10 systems (using the Docker Toolbox for Windows) indicates that Docker Toolbox for Windows (on Win10 at minimum) is not a supportable combination for our project at present.  It appears that the command in docker-compose.yml cannot be found – at least in cases where the called script is stored in a directory inside the container.

While there may be some adjustment that could be made to support both Windows and *NIX hosts in this scenario, it’s yet another incompatibility we’ve encountered (among many) that only crops up when using Windows as the host, and our team has to focus its cycles on supporting the majority of our project’s developers (who aren’t on Windows).

Problem

One of my colleagues on the project reported the following error when bringing up the Docker container from this branch in this repo.  He’s using Windows 10, Docker Toolbox for Windows:

user@DESKTOP MINGW64 /c/develop/python/team-budget/budget_proj (dockerize)
$ docker-compose up
Starting budgetproj_web_1

ERROR: for web Cannot start service web: oci runtime error: container_linux.go:247: starting container process caused "exec: \"/code/docker-entrypoint.sh\": stat /code/docker-entrypoint.sh: no such file or directory"
ERROR: Encountered errors while bringing up the project.

I explained that as far as I understand Docker engine and Docker Toolbox (which hosts the engine in a Virtualbox VM), what’s going on is that inside the container, Docker is trying to execute /code/docker-entrypoint.sh – so theoretically there should be no reason why this would behave any differently on Windows than Mac or Linux, since the runtime environment inside the Docker container shouldn’t know anything about its underlying host’s environment.  I know for sure it’s working well on Mac and Linux, even on my personal Mac that’s running the Docker Toolbox.

Investigation

Budget repo on Windows: fails

I attempted this myself with the same branch/repo using Docker Toolbox for Windows (downloaded today, running v1.13.1 of Docker engine) on Windows 10 (Anniversary update), and received effectively the same result:

...

Step 12/12 : WORKDIR /code
 ---> 37cb2ce39964
Removing intermediate container af6440be2e49
Successfully built 37cb2ce39964
WARNING: Image for service web was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating budgetproj_web_1
Attaching to budgetproj_web_1
web_1  | standard_init_linux.go:178: exec user process caused "no such file or directory"
budgetproj_web_1 exited with code 1

Budget repo on Mac: succeeds

The same commit from the same branch/repo on OS X 10.11 with Docker Toolbox for Mac:

...

Step 12/12 : WORKDIR /code
 ---> 0da697ffe35c
Removing intermediate container d4df9c99e8f9
Successfully built 0da697ffe35c
Recreating budgetproj_web_1
Attaching to budgetproj_web_1
web_1 | Operations to perform:
web_1 | Apply all migrations: admin, auth, budget_app, contenttypes, sessions
web_1 | Running migrations:
web_1 | Applying contenttypes.0001_initial... OK
web_1 | Applying auth.0001_initial... OK
web_1 | Applying admin.0001_initial... OK
web_1 | Applying admin.0002_logentry_remove_auto_add... OK
web_1 | Applying contenttypes.0002_remove_content_type_name... OK
web_1 | Applying auth.0002_alter_permission_name_max_length... OK
web_1 | Applying auth.0003_alter_user_email_max_length... OK
web_1 | Applying auth.0004_alter_user_username_opts... OK
web_1 | Applying auth.0005_alter_user_last_login_null... OK
web_1 | Applying auth.0006_require_contenttypes_0002... OK
web_1 | Applying auth.0007_alter_validators_add_error_messages... OK
web_1 | Applying auth.0008_alter_user_username_max_length... OK
web_1 | Applying budget_app.0001_initial... OK
web_1 | Applying budget_app.0002_auto_20170221_0359... OK
web_1 | Applying sessions.0001_initial... OK
web_1 | [2017-03-01 21:42:05 +0000] [10] [INFO] Starting gunicorn 19.6.0
web_1 | [2017-03-01 21:42:05 +0000] [10] [INFO] Listening at: http://0.0.0.0:8000 (10)
web_1 | [2017-03-01 21:42:05 +0000] [10] [INFO] Using worker: sync

So I tried this on a couple of other of our organization’s projects.

Housing repo on Windows: fails

Housing-17 under Docker Toolbox for Windows on Windows 10:

$ docker-compose up --build
Building web
Step 1/6 : FROM python:3.5
 ---> 4e5ed9f6613e
Step 2/6 : ENV PYTHONUNBUFFERED 1
 ---> Using cache
 ---> a62a6ae73cec
Step 3/6 : ADD ./requirements.txt /provision/
 ---> Using cache
 ---> 9f34a7d35294
Step 4/6 : WORKDIR /provision/
 ---> Using cache
 ---> 4e06b4c2249f
Step 5/6 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> e96a581fc549
Step 6/6 : WORKDIR /code/
 ---> Using cache
 ---> fc61cc36c06f
Successfully built fc61cc36c06f
Starting housing17_db_1
Starting housing17_web_1
Attaching to housing17_db_1, housing17_web_1
db_1   | LOG:  database system was shut down at 2017-03-01 19:38:08 UTC
db_1   | LOG:  MultiXact member wraparound protections are now enabled
db_1   | LOG:  database system is ready to accept connections
web_1  | standard_init_linux.go:178: exec user process caused "no such file or directory"
housing17_web_1 exited with code 1

Housing repo on Mac: succeeds

Housing-17 under Docker Toolbox for Mac on OS X 10.11:

...

Step 6/6 : WORKDIR /code/
 ---> Using cache
 ---> f79cbc2964cb
Successfully built f79cbc2964cb
Starting housing17_db_1
Starting housing17_web_1
Attaching to housing17_db_1, housing17_web_1
db_1 | LOG: database system was shut down at 2017-03-01 03:26:27 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: database system is ready to accept connections
web_1 | 
web_1 | 0 static files copied to '/code/static', 126 unmodified.
web_1 | [2017-03-01 21:50:00 +0000] [7] [INFO] Starting gunicorn 19.6.0
web_1 | [2017-03-01 21:50:00 +0000] [7] [INFO] Listening at: http://0.0.0.0:8000 (7)
web_1 | [2017-03-01 21:50:00 +0000] [7] [INFO] Using worker: sync

Emergency_response repo on Windows: fails

Emergency-response-backend under Docker Toolbox for Windows on Windows 10:

... 

---> d1dece959fab
Removing intermediate container 446d7dae0532
Step 14/14 : COPY . /code/
 ---> d12ccbed9557
Removing intermediate container 27809cb3988a
Successfully built d12ccbed9557
WARNING: Image for service web was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating emergencyresponsebackend_web_1
Attaching to emergencyresponsebackend_web_1
web_1  | standard_init_linux.go:178: exec user process caused "no such file or directory"
emergencyresponsebackend_web_1 exited with code 1

Emergency_response repo on Mac: succeeds/fails (but for an application-specific reason)

Emergency-response-backend under Docker Toolbox for Mac on OS X 10.11:

Step 14/14 : COPY . /code/
 ---> bc4a1bd8e372
Removing intermediate container e9b079ea31da
Successfully built bc4a1bd8e372
WARNING: Image for service web was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating emergencyresponsebackend_web_1
Attaching to emergencyresponsebackend_web_1
web_1 | Traceback (most recent call last):
web_1 | File "manage.py", line 10, in <module>
web_1 | execute_from_command_line(sys.argv)
web_1 | File "/usr/local/lib/python3.4/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
web_1 | utility.execute()
web_1 | File "/usr/local/lib/python3.4/site-packages/django/core/management/__init__.py", line 316, in execute
web_1 | settings.INSTALLED_APPS
web_1 | File "/usr/local/lib/python3.4/site-packages/django/conf/__init__.py", line 53, in __getattr__
web_1 | self._setup(name)
web_1 | File "/usr/local/lib/python3.4/site-packages/django/conf/__init__.py", line 41, in _setup
web_1 | self._wrapped = Settings(settings_module)
web_1 | File "/usr/local/lib/python3.4/site-packages/django/conf/__init__.py", line 97, in __init__
web_1 | mod = importlib.import_module(self.SETTINGS_MODULE)
web_1 | File "/usr/local/lib/python3.4/importlib/__init__.py", line 109, in import_module
web_1 | return _bootstrap._gcd_import(name[level:], package, level)
web_1 | File "<frozen importlib._bootstrap>", line 2254, in _gcd_import
web_1 | File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
web_1 | File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
web_1 | File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
web_1 | File "<frozen importlib._bootstrap>", line 1129, in _exec
web_1 | File "<frozen importlib._bootstrap>", line 1471, in exec_module
web_1 | File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
web_1 | File "/code/emergency_response_api/settings.py", line 16, in <module>
web_1 | from . import project_config
web_1 | ImportError: cannot import name 'project_config'
web_1 | Traceback (most recent call last):
web_1 | File "manage.py", line 10, in <module>
web_1 | execute_from_command_line(sys.argv)
web_1 | File "/usr/local/lib/python3.4/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
web_1 | utility.execute()
web_1 | File "/usr/local/lib/python3.4/site-packages/django/core/management/__init__.py", line 316, in execute
web_1 | settings.INSTALLED_APPS
web_1 | File "/usr/local/lib/python3.4/site-packages/django/conf/__init__.py", line 53, in __getattr__
web_1 | self._setup(name)
web_1 | File "/usr/local/lib/python3.4/site-packages/django/conf/__init__.py", line 41, in _setup
web_1 | self._wrapped = Settings(settings_module)
web_1 | File "/usr/local/lib/python3.4/site-packages/django/conf/__init__.py", line 97, in __init__
web_1 | mod = importlib.import_module(self.SETTINGS_MODULE)
web_1 | File "/usr/local/lib/python3.4/importlib/__init__.py", line 109, in import_module
web_1 | return _bootstrap._gcd_import(name[level:], package, level)
web_1 | File "<frozen importlib._bootstrap>", line 2254, in _gcd_import
web_1 | File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
web_1 | File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
web_1 | File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
web_1 | File "<frozen importlib._bootstrap>", line 1129, in _exec
web_1 | File "<frozen importlib._bootstrap>", line 1471, in exec_module
web_1 | File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
web_1 | File "/code/emergency_response_api/settings.py", line 16, in <module>
web_1 | from . import project_config
web_1 | ImportError: cannot import name 'project_config'
...

(Note: the runtime error in “manage.py” indicates that the command in docker-compose.yml was executed correctly, and there’s an app-specific issue in running one of the commands inside the docker-entrypoint.sh script that docker-compose.yml specifies.)

Potentially related issues re: Docker running on Windows

Absolute paths change with git bash on Windows

Docker Build Image fails with ‘NOT FOUND’ executing…

Bash script always prints command not found (perhaps a missing ‘execute’ bit, that only affects commands running in a Docker container when running on Docker Toolbox for Windows?)

 

Advertisements

Scaling the Cliffs of Insanity (aka using Ansible from a Windows controller)

cliffs-of-insanity
Dread Pirate Roberts and Princess Buttercup, out for a morning stroll

This post is just record-keeping for me to remember why I abandoned Ansible on Windows a few months back.

here-there-be-dragons
Seriously, here be dragons

Here’s what I did to enable me to use Ansible for automation

  • Since I’m using a Windows box as my primary desktop for now, I wanted to see if I could avoid running a Linux VM just to manage other *NIX boxes
  • I thought I should be able to get away with using the Git for Windows environment rather than the full Cygwin environment that Jeff Geerling documents here (spoiler alert: I couldn’t, but it’s instructive to document how far I got and where it broke down)

Install and troubleshoot Ansible on Git for Windows

  • I followed installation instructions from step (3) (including using those legacy versions of PyYAML and Jinja2)
  • I skipped step (6) since I already had SSH keys
  • I ran ansible –version, and saw errors like:
     $ ansible --version
     Traceback (most recent call last):
       File "C:/Users/Mike/code/ansible/bin/ansible", line 46, in <module>
        from ansible.utils.display import Display
       File "C:\Users\Mike\code\ansible\lib\ansible\utils\display.py", line 21, in <module>
        import fcntl
     ImportError: No module named 'fcntl'
  • Tried resolving that dependency via http://stackoverflow.com/questions/1422368/fcntl-substitute-on-windows
  • This just led to the following (which seemed a hint that I might encounter many such issues, not worth it):
     $ ansible --version
     Traceback (most recent call last):
       File "C:/Users/Mike/code/ansible/bin/ansible", line 46, in <module>
        from ansible.utils.display import Display
       File "C:\Users\Mike\code\ansible\lib\ansible\utils\display.py", line 33, in <module>
        from termios import TIOCGWINSZ
     ImportError: No module named 'termios'
  • Gave in, uninstalled Python 3.5.1 and jumped back to latest Python 2.x (2.7.1)
  • Still couldn’t get past that error, so I gave up and went to Cygwin

Install and troubleshoot Ansible on Cygwin64

Then I found this article, followed a combo of the top answer:

  • apt-cyg remove python-cryptography
  • git clone –depth 1 git://github.com/ansible/ansible
  • apt-cyg install git python-{jinja2,six,yaml}
  • wget rawgit.com/transcode-open/apt-cyg/master/apt-cyg && install apt-cyg /bin
  • That results in this output:
     $ ansible --version
     ansible 2.2.0 (devel 37737ca6c1) last updated 2016/05/11 14:27:18 (GMT -700)
       lib/ansible/modules/core:  not found - use git submodule update --init lib/ansible/modules/core
       lib/ansible/modules/extras:  not found - use git submodule update --init lib/ansible/modules/extras
       config file =
       configured module search path = ['/opt/ansible/library']
  • Then running this command from within the local ansible repo (e.g. from /opt/ansible) gets the basic modules you’ll need to get started (e.g. the “ping” module for testing)
    git submodule update –init –recursive
  • Then I decided to test ansible’s ability to connect to the target host based on the command recommended in Ansible’s installation docs:
     $ ansible all -m ping --ask-pass
     SSH password:
     192.168.1.13 | FAILED! => {
        "failed": true,
        "msg": "to use the 'ssh' connection type with passwords, you must install the sshpass program"
     }
  • OK, let’s get sshpass installed on my control device [Windows box] so that we can quickly bootstrap the SSH keys to the target…
  • Since sshpass isn’t an available package in Cygwin package directory, the only way to get sshpass is to install it from source
    •  NOTE: I discovered this the hard way, after trying every trick I could think of to make all this work without having to deal with a C compiler
  • Found this article for OS X users and followed it in my Windows environment (NOTE: I had to install the “make” and “gcccore” packages from Cygwin setup – and then having to re-run apt-cyg remove python-cryptography from the Cygwin terminal again because it gets automatically reinstalled by Cygwin setup)
  • This time when running the command I get a different error – which means I must’ve got sshpass stuffed in the right location:
     $ ansible all -m ping --ask-pass
     SSH password:
     192.168.1.13 | FAILED! => {
        "failed": true,
        "msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this.  Please add this host's fingerprint to your known_hosts file to manage this host."
     }
  • I’m guessing this means I need to obtain the target host’s SSH public key (and not that the target host refused to connect because it didn’t trust the control host).  So then I had to harvest the target host’s fingerprint
     $ ssh -o StrictHostKeyChecking=ask -l mike 192.168.1.13
     The authenticity of host '192.168.1.13 (192.168.1.13)' can't be established.
     ECDSA key fingerprint is SHA256:RqBcq8aCAgohFeiTlPeYd8hLDfTz1A25ZPlzyxlDrqI.
     Are you sure you want to continue connecting (yes/no)? yes
     Warning: Permanently added '192.168.1.13' (ECDSA) to the list of known hosts.
     mike@192.168.1.13's password:
  • This time around when running the ping command I saw this error:
     $ ansible all -m ping --ask-pass
     SSH password:
     192.168.1.13 | UNREACHABLE! => {
        "changed": false,
        "msg": "Authentication failure.",
        "unreachable": true
     }
  • I finally got the bright idea to poke around the target’s log files and see if there were any clues – sure enough, /var/log/auth.log made it clear enough to me:
     Invalid user Mike from 192.168.1.10
     input_userauth_request: invalid user Mike [preauth]
      pam_unix(sshd:auth): check pass; user unknown
      pam_unix(sshd:auth): authentication failure...
      Failed password for invalid user Mike from 192.168.1.10 port 58665 ssh2
      Connection closed by 192.168.1.10 [preauth]
  • (Boy do I hate *NIX case-sensitivity at times like this – my Windows username on the control device is “Mike” and the Linux username on the target is “mike”)
  • Tried this same command but adding the root user (ansible all -m ping –user root –ask-pass) but kept getting the same error back, and saw /var/log/auth.log reported “Failed password for root from 192.168.1.10”)
  • Re-ran su root in bash on the target just to make sure I wasn’t forgetting the password (I wasn’t)
  • Tried a basic ssh connection with that same password (to eliminate other variables):
     $ ssh root@192.168.1.13
     root@192.168.1.13's password:
     Permission denied, please try again.
     root@192.168.1.13's password:
     Permission denied, please try again.
     root@192.168.1.13's password:
     Permission denied (publickey,password).
  • Tried the same thing with my “mike” user, success:
     $ ssh mike@192.168.1.13
     mike@192.168.1.13's password:
     The programs included with the Debian GNU/Linux system are free software;
     the exact distribution terms for each program are described in the
     individual files in /usr/share/doc/*/copyright.
     Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
     permitted by applicable law.
     Last login: Sat May 14 16:10:05 2016 from 192.168.1.10
     mike@debianbox:~$
  • One more try with the ansible/SSH/password approach:
     $ ansible all -m ping --user mike --ask-pass
     SSH password:
     192.168.1.13 | UNREACHABLE! => {
        "changed": false,
        "msg": "Failed to connect to the host via ssh.",
        "unreachable": true
     }
  • So close!!  Search led to this possibility, so I re-ran with -vvvv param to get this:
     $ ansible all -m ping --user mike --ask-pass -vvvv
     No config file found; using defaults
     SSH password:
     Loaded callback minimal of type stdout, v2.0
     <192.168.1.13> ESTABLISH SSH CONNECTION FOR USER: mike
     <192.168.1.13> SSH: EXEC sshpass -d44 ssh -C -vvv -o ControlMaster=auto -o ControlPersist=60s -o User=mike -o ConnectTimeout=10 -o ControlPath=/home/Mike/.ansible/cp/ansible-ssh-%h-%p-%r 192.168.1.13 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1463273031.8-128175022355609 `" && echo "` echo $HOME/.ansible/tmp/ansible-tmp-1463273031.8-128175022355609 `" )'"'"''
     <192.168.1.13> PUT /tmp/tmp948_ec TO /home/mike/.ansible/tmp/ansible-tmp-1463273031.8-128175022355609/ping
     <192.168.1.13> SSH: EXEC sshpass -d44 sftp -b - -C -vvv -o ControlMaster=auto -o ControlPersist=60s -o User=mike -o ConnectTimeout=10 -o ControlPath=/home/Mike/.ansible/cp/ansible-ssh-%h-%p-%r '[192.168.1.13]'
     192.168.1.13 | UNREACHABLE! => {
        "changed": false,
        "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh",
        "unreachable": true
     }
  • I can definitely see the ansible-tmp-1463273031.8-128175022355609 file under ~/.ansible/tmp on the target system, so Ansible is getting authenticated and can run the initial shell commands
  • But I’m not seeing the /ping file under that directory, and I’m wondering if there’s something preventing sftp from connecting to the target host (since that’s the final command being run just before I get back the error). The “sftp” program is available on the controller though.
  • Digging around in Wireshark, I can see the SSHv2 traffic between controller and target, but after the initial key exchange I see exactly 3 encrypted packets sent from the controller (and encrypted responses from the target), and then no further communication between the two thereafter.  The only other activity I see on either system that’s unexplained is the target device ARP’ing for the local router three times, once every second, after the SSH traffic dies off
  • After the first two initial successes in getting the tmp files pushed via ssh, I’ve since only had failures “Failed to connect to the host via ssh.”  Using ProcExp.exe to verify that there is actual network traffic being sent to the target’s IP, and using Wireshark to get some idea what’s getting through and what’s not (but Wireshark is acting up and no longer showing me traffic from controller to target, only target to controller responses, so it’s getting a little nuts at this point)
  • I’ve added “PTR” records to the hosts files on both the controller and the target to resolve the IP address for each other to a defined name, but I’m still getting “failed to connect…” (even though I can confirm that the tools are using the newly-registered name, since I even tried substituting the falsified name in the ansible_hosts file for the previously-used IP address)
  • I tried the advice from here to switch to SCP if SFTP might not be working, but that didn’t help (so I emptied out the .ansible.cfg file again)
  • I don’t know where to look for log files to see exactly what errors are occurring locally, so I’m pretty much stumped at this point
  • !!!!!! I pushed my ssh key to the target host, and the very next run of ansible all -m ping succeeded!!!! 😦
  • CONCLUSION: ssh-pass doesn’t seem compatible with the Windows setup I’ve been using all weekend
  • EPILOGUE: “Failed to connect…” error is back again when running ansible from Windows – I can see successful auth in the target’s /var/log/auth.log, but even -m ping fails (e.g. ansible all -m ping -u root)

Other References

http://everythingshouldbevirtual.com/ansible-using-ansible-on-windows-via-cygwin

https://help.ubuntu.com/community/SSH/OpenSSH/Keys

Simplifying Vagrant-based testing: unsolved (I’m just calling it out to the universe)

I’m doing some pretty mind-numbing testing using Vagrant (yes, on Windows 10 – I like the challenge, apparently!), to make sure that I’m getting the results from changes I’m making to Ansible scripts.  Currently I’m testing the implementation of Ansible Vault, which means at each step of testing I:

  1. Vagrant destroy whatever box I just worked on
    • Which half the time means Vagrant and Virtualbox get out of sync, and I need to delete files and just vagrant init)
  2. Vagrant up
    • If I just init’d a new box, then I have to go into the Vagrantfile to uncomment then edit the config.vm.synced_folder setting, so that it removes the rsync dependency (setting it to config.vm.synced_folder “.”, “/vagrant”, disabled:true) – otherwise, vagrant up halts when it can’t find an rsync executable
  3. Mount the VM in Virtualbox Manager – Machine, Add…, find the .vbox file), then  launch the VM from VBox Mgr, login as vagrant, and edit the /etc/ssh/sshd_config file to set all instances of PasswordAuthentication to “yes”
  4. Reboot the VM
  5. Vagrant up
  6. Run ssh-keygen -f “/home/mike/.ssh/known_hosts” -R [127.0.0.1]:2222 to clear out the previously-trusted host SSH key
  7. Run ssh-copy-id vagrant@127.0.0.1 -p 2222 to add my user’s SSH public key to the remote system (to enable Ansible to run over SSH)

I haven’t had time yet to start researching how to troubleshoot/automate each of these steps, but which I’ll eventually have to conquer so that I’m not re-learning the manual steps every time I return to volunteering a little spare time to this infrastructure project.

Why doesn’t chmod under Bash on Ubuntu on Windows 10 actually “take”?

I’m continuing to beat my head against a wall, attempting to test a very simple configuration change to an Ansible playbook I wrote, so that I can verify if my understanding of the use of Ansible vault is correct.

The latest problem?  Unix permissions.

Now that I’ve got SSH communications working between by Bash shell (Ubuntu on Windows 10, aka WSL), I’ve implemented changes to the playbook’s files including creating a .vault_pass.txt file under the Bash shell, and encrypting a vault.yml file using the password contained in the .vault_pass.txt.

When I run ansible-playbook role.yml –vault-password-file .vault_pass.txt, it complains of the following:

mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ansible-playbook role.yml --vault-password-file .vault_pass.txt
ERROR! Problem running vault password script / m n t / c / U s e r s / M i k e / c o d e / C o p y - a n s i b l e - r o l e - u n a t t e n d e d - u p g r a d e s / . v a u l t _ p a s s . t x t ([Errno 8] Exec format error). If this is not a script, remove the executable bit from the file.

No problem, I’ve got this.  Just gotta run chmod 600 (or similarly, to remove the execute bit for my user) on the .vault_pass.txt file.  [For comparison, I just tried this on the same configuration under Ubuntu – which is having a different blocking issue at present, but not related to file permissions – and the command took immedateily.]  Hah, you should be so lucky:

mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ls -la .vault_pass.txt
-rwxrwxrwx 1 root root 25 Sep 26 18:38 .vault_pass.txt
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ chmod 600 .vault_pass.txt
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ls -la .vault_pass.txt
-rwxrwxrwx 1 root root 25 Sep 26 18:38 .vault_pass.txt
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ whoami
mike
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ sudo chmod 600 .vault_pass.txt
[sudo] password for mike:
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ls -la .vault_pass.txt
-rwxrwxrwx 1 root root 25 Sep 26 18:38 .vault_pass.txt

Yes, I get that the file is owned by root, and I’m running as mike – so why doesn’t it make a difference when I run sudo chmod?  Is this a problem with files owned by root?  Is this a problem with chmod?  Is this a problem with WSL/Bash?

Lightbulb moment

I went hunting for such issues in the Microsoft repo for the Bash On Windows project, and found this issue & comment:

https://github.com/Microsoft/BashOnWindows/issues/81#issuecomment-207553514

So I figured I re-examine the situation.  All my files under the ~/code folder are owned by root – even . and .., which is odd…

mike@MIKE-WIN10-SSD:~/code$ ls -la
total 68
drwxrwxrwx 2 root root 0 Sep 26 10:51 .
drwxrwxrwx 2 root root 0 Aug 16 17:00 ..
drwxrwxrwx 2 root root 0 Aug 16 16:28 ansible-role-unattended-upgrades

Then I looked at my home folder and – d’oh!

mike@MIKE-WIN10-SSD:~$ ls -la
total 24
drwxr-xr-x 2 mike mike 0 Sep 26 18:37 .
drwxr-xr-x 2 root root 0 Dec 31 1969 ..
-rw------- 1 mike mike 2452 Aug 16 22:48 .bash_history
-rw-r--r-- 1 mike mike 220 Aug 5 10:06 .bash_logout
-rw-r--r-- 1 mike mike 3637 Aug 5 10:06 .bashrc
lrwxrwxrwx 1 mike mike 22 Aug 16 12:58 code -> /mnt/c/Users/Mike/code

Idiot.

Now I remember: when I first sat down with this Bash On Ubuntu on Windows setup, I figured I’d save myself some trouble by using the exact same files in all my local repos – why bother duplicating the repos between Windows and Bash on Ubuntu?  So I symlinked a mount of the /code folder from my Windows user profile…and left myself a nice little landmine, it seems.

Rather than struggle with cacls.exe and try to find some magic combination that results in non-executable permissions on that file through the WSL translation layer (if at all), I just cloned the repo to a different folder (local to the Bash/Ubuntu/Win10 environment) and retried, with trivial success.

Simple troubleshooting the usual SSH error from Ansible

After the struggles I’ve had over the last couple of days, it’s strangely reassuring to stumble on a problem I’ve actually *seen* before, and recently.  Firsthand even.

I’ve fresh-built a Debian 8.5.2 VM in Virtualbox via Vagrant.  Then I setup an Ansible inventory file to point to the box’s current vagrant ssh-config settings.  Then fired off the tried-and-true ansible connectivity test, ansible all -u vagrant -m ping.  Here’s the response:

127.0.0.1 | UNREACHABLE! => {
    "changed": false,
    "msg": "ERROR!  SSH encountered an unknown error during the connection.  We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue",
    "unreachable": true
}

Running the same command with -vvvv parameter results in a garbage heap of unformatted/concatenated debug screed, which ends with:

debug1: No more authentication methods to try.\r\nPermission denied (publickey,password).\r\n"

Simple Solution

As I’ve documented to myself already elsewhere, I need to run the following two commands:

ssh-keygen -t rsa

ssh-copy-id -p 2222 vagrant@127.0.0.1

Bingo!  Nice to get an easy win.

Troubleshooting another SSH blocker (networking?) in debian/jessie64

Since I ran into another wall with trying to use Ansible Vault under Bash on Ubuntu on Windows10 (this time, chmod wouldn’t change the permissions on the .vault_pass.txt file from 755 to 600 – or any other permissions set for that matter), I went back to my Linux-based setup to try out the Ansible Vault solution I’d devised.

Here, I ended up unable to communicate with the VM using Ansible because SSH from Ubuntu to the Debian8 box had an incompatibility – to wit, when I ran ssh vagrant@127.0.0.1 -p 2222, the command eventually timed out with the error “ssh_exchange_identification: read: Connection reset by peer”.

This is yet another piece of evidence that someone very recently (I believe between the 8.5.2 and the 8.6.0 versions of the box on Atlas) made breaking changes to the OpenSSH and/or OpenSSL configuration of the box.  One change I’ve figured out is they disabled PasswordAuthentication in the /etc/ssh/sshd_config file.

This problem?  Looks like (based on my read of articles like this one) the ssh client and server can’t agree on some cryptographic parameter.  Fun.  Cause there’s only about a million combinations of these parameters to play with.

[I also pursued ideas like the solution to this report, but currently the Debian8 box’s /etc/hosts.deny is still empty of uncommented entries.  Or the “is sshd running” idea from this report, but /var/log/auth.log definitely includes “[date] jessie sshd[366]: Server listening on 0.0.0.0 port 22”.]

OK, so what’s the fastest way to isolate the set of parameters  that are being offered and demanded between the client and server?

Running the ssh client with -vvv parameter doesn’t help much – it enumerates the “key_load_public” attempts (rsa, rsa-cert, dsa, dsa-cert, ecdsa, ecdsa-cert, ed25519, ed25519-cert), then “Enabling compatibility mode for protocol 2.0” and the SSH version “Local version string SSH-2.0-OpenSSH_7.2ps Ubuntu-4ubuntu2.1”, then fires off the “connection reset by peer” error again.  Dpkg -l reports that openssh-client is “1:7.2ps-4ubuntu2.1”.

What’s the server’s version of OpenSSH?  According to dpkg -l, it’s “1;6.7p1-5+deb8u3.  Is that right – 1.6.7?  And if so, how do I find out if there’s a cryptographic configuration incompatibility between 1.7.2 and 1.6.7?  [Certainly I can see that we have no such “connection reset by peer” issue between my Win10 Bash on Ubuntu shell, running 1.6.6p1 of openssh-client and the Debian8 box’s 1.6.7p1, so cryptographic compatibility between 1.6.6 and 1.6.7 is a reasonable assumption.]  Or better, is it possible to upgrade the Debian8 box’s openssh-server to something later than 1.6.7 – preferably (but not exclusively) 1.7.2?

On the server, I can crawl through the /etc/ssh/sshd_config” file to look for configured parameters (RSAAuthentication yes for example), but that doesn’t tell me what the OpenSSH defaults are, and doesn’t tell me what’s necessarily being asked of OpenSSL either (which might be swallowing the actual error).

Aside/Weirdness: networking

I started to pursue the idea of upgrading OpenSSH, so I ran sudo apt-get update to prepare for updating everything in the VM.  That’s when I noticed I wasn’t getting any network connectivity, as it spat back “Could not resolve ‘security.debian.org'” and “Could not resolve ‘httpredir.debian.org'”.

Vbox Mgr indicates I’m using NAT networking (the default), which has worked for me in the past – and works fine for the same Vagrant box running on my Win10 VirtualBox/Vagrant instance (sudo apt-get update “Fetched 529 kB in 3s (142kB/s)”).  Further, the Ubuntu host for this VM has no problem reaching the network.

So I tried changing to Bridged Adapter in Vbox Manager.  Nope, no difference.  Why does the same Vagrant box work fine under Windows but not under Ubuntu?  Am I cursed?

Back to the root problem

Let me review: I’m having a problem getting Ansible to communicate with the VM over SSH.  So let’s get creative:

  • Can Ansible be coerced into talking to the target without SSH?
  • Can Ansible use password authentication instead of public key authentication for SSH?
  • Can the Ubuntu client be downgraded from 1.7.2 to 1.6.7 openssh-client?

Lightbulb moment

Of course!  The “connection reset by peer” issue isn’t a matter of deep crypto at all – unless I’m misreading this, the fact that the Ubuntu SSH client takes nearly a minute to return the “connection reset” error and the fact that the Debian VM doesn’t seem to have any IP networking ability off the host…adds up to SSH client not even connecting to the VM’s sshd?

Boy do I feel dumb.  This has nothing to do with crypto – it’s simple layer 3 issues.

Reminds me of a lesson I learned 20 years ago, and seem to re-learn every year or three: “When you hear hooves, think horses not zebras.”

Then how do we establish where the problem is – Virtualbox, Ubuntu, Debian or something else?

  • If it’s a problem in the Debian VM, then download a different Vagrant box
  • If it’s a problem in the Virtualbox setting, keep trying different network settings until one breaks through
  • If it’s a problem in the Ubuntu host, look for reasons why there’d be a block between 127.0.0.1 (host to VM or vice-versa)

What other evidence do we have?  Well, when I run vagrant up from the Ubuntu host, it gets to “default: SSH auth method: private key” then eventually reports “Timed out while waiting for the machine to boot.  This means that Vagrant was unable to communicate with the guest machine within the configured (“config.vm.boot_timeout” value) time period.”  Makes me more suspicious of the VM.

Searching the Vagrant boxes registry, mosaicpro/html looks like it’s desktop (not locked-down server) oriented, so I tried that one.  Watched it boot, then report “default: Warning: Remote connection disconnect. Retrying…” over and over for a few minutes.  The console via Vbox Mgr looked like the Ubuntu VM was trying to configure networking (even though DHCP had offered it an address of 10.2.0.15 – which must’ve been the NAT adapter, since my home network runs on 192.168.1/24).  But oddly, networking from within the client was working fine after that – ping out to my home router (192.168.1.1) returned fine.  OK, then I’m *definitely* suspecting that Debian/jessie64 (8.6.0) box.

Vagrant/Debian downgrade anyone?

So, after all this, can I download a previous version of the Debian/jessie64 box (e.g. 8.5.2, not this troublesome 8.6.0)?  Let’s try it, using this article as basis.

(I went one step further and ran the initial command as vagrant add debian-8.5.2 https://atlas.hashicorp.com/debian/boxes/jessie64/versions/8.5.2/providers.virtualbox.box – and amazingly, this variation seemed to work!)

And here’s some promising results:

  • lsb_release -a reports the 8.5.2 box as “Debian GNU/Linux 8.5 (jessie)”, vs the 8.6.0 box as “Debian GNU/Linux 8.6 (jessie)”
  • A quick look at the /etc/ssh/sshd_config from the 8.5.2 box shows there is *no* insertion of the PasswordAuthentication configuration parameter (let alone setting it to “no” like in the 8.6.0 box)
  • Network connectivity from the 8.5.2 box to my home router is awesome (vs the 8.6.0 box that can’t seem to ping out of a wet paper bag)

Final Lesson

If you’re a Vagrant + Virtualbox user, stay FAR away from the 8.6.0 version of the debian/jessie64 box (unless you’re prepared to fight with these same issues I have, and probably other ‘security lockdown’ ideas that I haven’t even uncovered yet, but are almost surely there).

Troubleshooting SSH blocker in the Debian/jessie64 Vagrant box

After getting Vagrant and Virtualbox to play nice together, I turned my attention back to testing my Ansible Vault configuration ideas on a Debian8 VM.

Because I’d been having continued problems connecting to the damned box, I init’d a new VM based on debian/jessie64.  Once again, however, I noticed two issues:

  1. Vagrant is no longer registering new VMs in the VirtualBox Manager (when did Vagrant stop doing this?)
  2. I’m unable to copy SSH keys to freshly-booted Debian VM – ssh-copy-id results in this response:
    mike@MIKE-WIN10-SSD:~/$ ssh-copy-id vagrant@127.0.0.1 -p 2200
    /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    Permission denied (publickey).

This article purports to have the answer to this, but even after fixing and re-checking the /etc/ssh/sshd_config multiple times, I’m still getting the same “Permission denied (publickey)” response.

After a couple of hours of tail-chasing, I finally remembered the debug flags in the ssh command (-v and -vvv).  That results in this output:

mike@MIKE-WIN10-SSD:~/code/Copy-ansible-role-unattended-upgrades$ ssh -p 2200 vagrant@127.0.0.1 -v
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2200.
debug1: Connection established.
debug1: identity file /home/mike/.ssh/id_rsa type 1
debug1: identity file /home/mike/.ssh/id_rsa-cert type -1
debug1: identity file /home/mike/.ssh/id_dsa type -1
debug1: identity file /home/mike/.ssh/id_dsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ecdsa type -1
debug1: identity file /home/mike/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ed25519 type -1
mike@MIKE-WIN10-SSD:~/code/Copy-ansible-role-unattended-upgrades$ ssh -p 2200 vagrant@127.0.0.1 -v
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2200.
debug1: Connection established.
debug1: identity file /home/mike/.ssh/id_rsa type 1
debug1: identity file /home/mike/.ssh/id_rsa-cert type -1
debug1: identity file /home/mike/.ssh/id_dsa type -1
debug1: identity file /home/mike/.ssh/id_dsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ecdsa type -1
debug1: identity file /home/mike/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ed25519 type -1
debug1: identity file /home/mike/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.7p1 Debian-5+deb8u3
debug1: match: OpenSSH_6.7p1 Debian-5+deb8u3 pat OpenSSH* compat 0x04000000
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-sha1-etm@openssh.com none
debug1: kex: client->server aes128-ctr hmac-sha1-etm@openssh.com none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA b0:b7:27:f4:0a:91:a4:37:8c:ce:35:a3:e3:fe:db:2d
debug1: Host '[127.0.0.1]:2200' is known and matches the ECDSA host key.
debug1: Found key in /home/mike/.ssh/known_hosts:4
debug1: ssh_ecdsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/mike/.ssh/id_rsa
debug1: Authentications that can continue: publickey
debug1: Trying private key: /home/mike/.ssh/id_dsa
debug1: Trying private key: /home/mike/.ssh/id_ecdsa
debug1: Trying private key: /home/mike/.ssh/id_ed25519
debug1: No more authentication methods to try.
Permission denied (publickey).

I tried regenerating keys, but that still ends with the same “Permission denied (publickey)” message.

Is it possible that the Debian box’s sshd isn’t accepting RSA keys for SSH auth?  If that were true, wouldn’t the sshd_config include “RSAAuthentication no” instead of the “RSAAuthentication yes” I’m seeing?

It’s odd – when I attempt to ssh directly, I’m getting this kind of output – this implies that the remote sshd is attempting to accept password for authentication, even though it’s acting like I haven’t typed in the correct password (I am):

mike@MIKE-WIN10-SSD:~/code/Copy-ansible-role-unattended-upgrades$ ssh vagrant@127.0.0.1 2200
vagrant@127.0.0.1's password:
Permission denied, please try again.
vagrant@127.0.0.1's password:
Permission denied, please try again.
vagrant@127.0.0.1's password:
Received disconnect from 127.0.0.1: 14:

Time for a reset.

Solution (?)

Ripped out every VM on my system.  Re-inited.  Edited Vagrantfile.  Tried/failed.  Mounted the machine in the VirtualBox Manager app (because, Vagrant’s still not registering the machine with the Manager UI). Launched the Debian box interactively from VBox Mgr.

Edited the /etc/ssh/sshd_config file to change the PasswordAuthentication setting TWICE.

Yes, TWICE.

Something, somewhere, is inserting two entries (one commented out, the other uncommented at the very end of the file) that are both set to “no”.

What.  The.  Heck.

Set them *both* to yes (left them both uncommented, just for show) and rebooted the box.

Now?

ssh-copy-id is easily able to authenticate with the vagrant password *and* copy the current RSA public key to the appropriate file:

mike@MIKE-WIN10-SSD:~/$ ssh-copy-id vagrant@127.0.0.1 -p 2222
The authenticity of host '[127.0.0.1]:2222 ([127.0.0.1]:2222)' can't be established.
ECDSA key fingerprint is b0:b7:27:f4:0a:91:a4:37:8c:ce:35:a3:e3:fe:db:2d.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
Permission denied (publickey).
mike@MIKE-WIN10-SSD:~/code/Copy-ansible-role-unattended-upgrades$ ssh-copy-id vagrant@127.0.0.1 -p 2222
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
vagrant@127.0.0.1's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh -p '2222' 'vagrant@127.0.0.1'"
and check to make sure that only the key(s) you wanted were added.

Boy, *that’s* going to be fun to remember to do every time I destroy and recreate this VM.

[Oh, and keep this article in your back pocket in case you run into a different SSH troubleshooting issue: http://askubuntu.com/questions/311558/ssh-permission-denied-publickey]

Troubleshooting E_FAIL SessionMachine between Virtualbox and Vagrant

Today I wanted to dive back into my research on Ansible, so I fired up one of my Debian VMs under Vagrant/Virtualbox.  The VM was an older image (8.5.3) so I updated to the latest (8.6.0), and then found myself troubleshooting weird connectivity issues for hours.

Eventually I got myself to this point, where the VirtualBox Manager wouldn’t even start up the VM:

debian-virtualbox-startup-issue

This error leads to the following article:

http://superuser.com/questions/785072/e-fail-0x80004005-when-running-linux-through-windows-8-virtualbox

The AppCompatFlags entry doesn’t exist, but as for another user, even though VirtualBox Manager reports it’s running the latest version (5.0.26), looking at the Downloads page tells a different story (5.1.6).  Installed that, then of course Vagrant howled loudly:

C:\Users\Mike\VirtualBox VMs\BaseDebianServer>vagrant destroy
The provider 'virtualbox' that was requested to back the machine
'default' is reporting that it isn't usable on this system. The
reason is shown below:

Vagrant has detected that you have a version of VirtualBox installed
that is not supported by this version of Vagrant. Please install one of
the supported versions listed below to use Vagrant:

4.0, 4.1, 4.2, 4.3, 5.0

A Vagrant update may also be available that adds support for the version
you specified. Please check www.vagrantup.com/downloads.html to download
the latest version.

C:\Users\Mike\VirtualBox VMs\BaseDebianServer>vagrant -v
Vagrant 1.8.1

There’s a 1.8.5 Vagrant version available, so I installed that too.

There, once again the master and the servant are back in sync:

C:\Users\Mike\VirtualBox VMs\BaseDebianServer>vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'debian/jessie64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'debian/jessie64' is up to date...
==> default: Setting the name of the VM: BaseDebianServer_default_1474924390887_22621
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
 default: Adapter 1: nat
==> default: Forwarding ports...
 default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
 default: SSH address: 127.0.0.1:2222
 default: SSH username: vagrant
 default: SSH auth method: private key
 default:
 default: Vagrant insecure key detected. Vagrant will automatically replace
 default: this with a newly generated keypair for better security.
 default:
 default: Inserting generated public key within guest...
 default: Removing insecure key from the guest if it's present...
 default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
 default: No guest additions were detected on the base box for this VM! Guest
 default: additions are required for forwarded ports, shared folders, host only
 default: networking, and more. If SSH fails on this machine, please install
 default: the guest additions and repackage the box to continue.
 default:
 default: This is not an error message; everything may continue to work properly,
 default: in which case you may ignore this message.

<editorial understatement=”on”>Based on the SuperUser question above, I’m guessing this isn’t an uncommon problem as Vagrant and Virtualbox rev their engines.</editorial>

What I’ve learned: setting up Bash/Ubuntu/Win10 for Ansible + Vagrant + VirtualBox

My Goal: test the use of this Ansible Role from Windows 10, using a combination of Windows and Bash for Ubuntu on Windows 10 tools.  Favour the *nix tools wherever possible, for maximum compatibility with the all-Linux production environment.

Preconditions

Here is the software/shell arrangement that worked for me in my Win10 box:

  • Runs in Windows: Virtualbox, Vagrant
  • Runs in Bash/Ubuntu: Ansible (in part because of this)

In this setup, I’m using a single Virtualbox VM in default network configuration, whereby Vagrant ends up reporting the host listening on 127.0.0.1 and SSH listening on TCP port 2222.  Substitute your actual values as required.

Also note the versions of software I’m currently running:

  • Windows 10: Anniversary Update, build 14393.51
  • Ansible (*nix version in Bash/Ubuntu/Win10): 1.5.4
  • VirtualBox (Windows): 5.0.26
  • Vagrant (Windows): 1.8.1

Run the Windows tools from a Windows shell

  • C:\> vagrant up
  • (or launch a Bash shell with cbwin support:  C:\>outbash, then try running /mnt/c/…/Vagrant.exe up from the bash environment)

Start the Virtualbox VMs using Vagrant

  • Vagrant (Bash) can’t just do vagrant up where VirtualBox is installed in Windows – it depends on being able to call the VBoxManage binary
    • Q: can I trick Bash to call VBoxManage.exe from /mnt/c/Program Files/Oracle/VirtualBox?
    • If not, is it worth messing around with Vagrant (Bash)?  Or should I relent and try Vagrant (Windows), either using cbwin or just running from a different shell?
  • Vagrant (Windows) runs into the fscking rsync problem (as always)
    • Fortunately you can disable rsync if you don’t need the sync’d folders
    • Disabling the synced_folder requires editing the Vagrantfile to add this in the Vagrant.configure section:
      config.vm.synced_folder “.”, “/vagrant”, disabled: true

Setup the inventory for management

  • Find the IP’s for all managed boxes
  • Organize them (in one group or several) in the /etc/ansible/hosts file
  • Remember to specify the SSH port if non-22:
    [test-web]
    127.0.0.1 ansible_ssh_port=2222
    # 127.0.0.1 ansible_port=2222 when Ansible version > 1.9
    • While “ansible_port” is said to be the supported parameter as of Ansible 2.0, my own experience with Ansible under Bash on Windows was that ansible wouldn’t connect properly to the server until I changed the inventory configuration to use “ansible_ssh_port”, even though ansible –version reported itself as 2.1.1.0
    • Side question: is there some way to predictably force the same SSH port every time for the same box?  That way I can setup an inventory in my Bash environment and keep it stable.

Getting SSH keys on the VMs

  • (Optional: generate keys if not already) Run ssh-keygen -t rsa
  • (Optional: if you’ve destroyed and re-generated the VM with vagrant destroy/up, wipe out the existing key for the host:port combination by running the following command that is recommended when ssh-copy-id fails): ssh-keygen -f “/home/mike/.ssh/known_hosts” -R [127.0.0.1]:2222
  • Run ssh-copy-id vagrant@127.0.0.1 -p 2222 to push the public key to the target VM’s vagrant account

Connect to the VMs using Ansible to test connectivity

  • [from Windows] vagrant ssh-config will tell you the IP address and port of your current VM
  • [from Bash] ansible all -u vagrant -m ping will check basic Ansible connectivity
    • (ansible all -c local -m ping will go even more basic, testing Ansible itself)

Run the playbook

  • Run ansible-playbook [playbook_name.yml e.g. playbook.yml] -u vagrant
    • If you receive an error like “SSH encountered an unknown error” with details that include “No more authentication methods to try.  Permission denied (publickey,password).”, make sure to remember to specify the correct remote user (i.e. one that trusts your SSH key)
    • If you receive an error like “stderr: E: Could not open lock file /var/lib/dpkg/lock – open (13: Permission denied)”, make sure your remote user runs with root privilege – e.g. in the [playbook.yml], ensure sudo: true is included
  • Issue: if you receive an error like “fatal: [127.0.0.1]: UNREACHABLE! => {“changed”: false, “msg”: “Failed to connect to the host via ssh.”, “unreachable”: true}”, check that your SSH keys are trusted by the remote user you’re using (e.g. “-u vagrant” may not have the SSH keys already trusted)
  • If you wish to target a subset of servers in your inventory (e.g. using one or more groups), add the “-l” parameter and name the inventory group, IP address or hostname you wish to target
    e.g. ansible-playbook playbook.yml -u vagrant -l test-web
    or ansible-playbook playbook.yml -u vagrant -l 127.0.0.1

Protip: remote_user

If you want to stop having to add -u vagrant to all the fun ansible commands, then go to your /etc/ansible/ansible.cfg file and add remote_user = vagrant in the appropriate location.

Rabbit Hole Details for the Pedantically-Inclined

03ec8fe3bb146924423af6381eb99ea9

Great Related Lesson: know the difference between vagrant commands

  • Run vagrant ssh to connect to the VM [note: requires an SSH app installed in Windows, under this setup]
  • Run vagrant status to check what state the VM is in
  • Run vagrant reload to restart the VM
  • Run vagrant halt to stop the VM
  • Run vagrant destroy to wipe the VM

Ansible’s RSA issue when SSH’ing into a non-configured remote user

  • The following issue occurs when running ansible commands to a remote SSH target
    e.g. ansible all -m ping
  • This occurs even when the following commands succeed:
    • ansible -c local all -m ping
    • ssh vagrant@host.name [port #]
    • ssh-copy-id -p [port #] vagrant@host.name
  • Also note: prefixing with “sudo” doesn’t seem to help – just switches whose local keys you’re using
  • I spent the better part of a few hours (spaced over two days, due to rage quit) troubleshooting this situation
  • Troubleshooting this is challenging to say the least, as ansible doesn’t intelligently hint at the source of the problem, even though this must be a well-known issue
    • There’s nothing in the debug output of ssh/(openssl?) that indicates that there are no trusted SSH keys in the account of the currently-used remote user
    • Nor is it clear which remote user is being impersonated – sure, I’ll bet someone that fights with SSH & OpenSSL all day would have noticed the subtle hints, but for those of us just trying to get a job done, it’s like looking through foggy glass
  • Solution: remember to configure the remote user under which you’re connecting (i.e. a user with the correct permissions *and* who trusts the SSH keys in use)
    • Solution A: add the -u vagrant parameter
    • Solution B: specify remote_user = vagrant in the ansible.cfg file under [defaults]
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ansible-playbook role.yml -vvvv

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
<127.0.0.1> ESTABLISH CONNECTION FOR USER: mike
<127.0.0.1> REMOTE_MODULE setup
<127.0.0.1> EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/mike/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=2222', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', '127.0.0.1', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832 && echo $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832'"]
fatal: [127.0.0.1] => SSH encountered an unknown error. The output was:
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug1: Control socket "/home/mike/.ansible/cp/ansible-ssh-127.0.0.1-2222-mike" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2222.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 10000 ms remain after connect
debug3: Incorrect RSA1 identifier
debug3: Could not load "/home/mike/.ssh/id_rsa" as a RSA1 public key
debug1: identity file /home/mike/.ssh/id_rsa type 1
debug1: identity file /home/mike/.ssh/id_rsa-cert type -1
debug1: identity file /home/mike/.ssh/id_dsa type -1
debug1: identity file /home/mike/.ssh/id_dsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ecdsa type -1
debug1: identity file /home/mike/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ed25519 type -1
debug1: identity file /home/mike/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.7p1 Debian-5+deb8u1
debug1: match: OpenSSH_6.7p1 Debian-5+deb8u1 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug3: put_host_port: [127.0.0.1]:2222
debug3: load_hostkeys: loading entries for host "[127.0.0.1]:2222" from file "/home/mike/.ssh/known_hosts"
debug3: load_hostkeys: found key type ECDSA in file /home/mike/.ssh/known_hosts:2
debug3: load_hostkeys: loaded 1 keys
debug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-dss-cert-v00@openssh.com,ssh-ed25519,ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none
debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ssh-ed25519
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
debug2: kex_parse_kexinit: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: kex_parse_kexinit: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: setup hmac-sha1-etm@openssh.com
debug1: kex: server->client aes128-ctr hmac-sha1-etm@openssh.com zlib@openssh.com
debug2: mac_setup: setup hmac-sha1-etm@openssh.com
debug1: kex: client->server aes128-ctr hmac-sha1-etm@openssh.com zlib@openssh.com
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 07:f3:2f:b0:86:b5:b6:2b:d9:f5:26:71:95:6e:d9:ce
debug3: put_host_port: [127.0.0.1]:2222
debug3: put_host_port: [127.0.0.1]:2222
debug3: load_hostkeys: loading entries for host "[127.0.0.1]:2222" from file "/home/mike/.ssh/known_hosts"
debug3: load_hostkeys: found key type ECDSA in file /home/mike/.ssh/known_hosts:2
debug3: load_hostkeys: loaded 1 keys
debug1: Host '[127.0.0.1]:2222' is known and matches the ECDSA host key.
debug1: Found key in /home/mike/.ssh/known_hosts:2
debug1: ssh_ecdsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /home/mike/.ssh/id_rsa (0x7fffbdbd5b80),
debug2: key: /home/mike/.ssh/id_dsa ((nil)),
debug2: key: /home/mike/.ssh/id_ecdsa ((nil)),
debug2: key: /home/mike/.ssh/id_ed25519 ((nil)),
debug1: Authentications that can continue: publickey,password
debug3: start over, passed a different list publickey,password
debug3: preferred gssapi-with-mic,gssapi-keyex,hostbased,publickey
debug3: authmethod_lookup publickey
debug3: remaining preferred: ,gssapi-keyex,hostbased,publickey
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/mike/.ssh/id_rsa
debug3: send_pubkey_test
debug2: we sent a publickey packet, wait for reply
debug1: Authentications that can continue: publickey,password
debug1: Trying private key: /home/mike/.ssh/id_dsa
debug3: no such identity: /home/mike/.ssh/id_dsa: No such file or directory
debug1: Trying private key: /home/mike/.ssh/id_ecdsa
debug3: no such identity: /home/mike/.ssh/id_ecdsa: No such file or directory
debug1: Trying private key: /home/mike/.ssh/id_ed25519
debug3: no such identity: /home/mike/.ssh/id_ed25519: No such file or directory
debug2: we did not send a packet, disable method
debug1: No more authentication methods to try.
Permission denied (publickey,password).

TASK: [ansible-role-unattended-upgrades | add distribution-specific variables] ***
FATAL: no hosts matched or all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/mike/role.retry

127.0.0.1                  : ok=0    changed=0    unreachable=1    failed=0

Ansible’s permissions issue when trying to run non-trivial commands without sudo

  • ansible -m ping will work fine without local root permissions, making you think that you might be able to do other ansible operations without sudo
  • Haha! You would be wrong, foolish apprentice
  • Thus, the SSH keys for enabling ansible to work will have to be (a) generated for the local root user and (b) copied to the remote vagrant user
mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ansible-playbook -u vagrant role.yml

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
ok: [127.0.0.1]

TASK: [ansible-role-unattended-upgrades | add distribution-specific variables] ***
ok: [127.0.0.1]

TASK: [ansible-role-unattended-upgrades | install unattended-upgrades] ********
failed: [127.0.0.1] => {"failed": true, "item": ""}
stderr: E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

msg: 'apt-get install 'unattended-upgrades' ' failed: E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?


FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/mike/role.retry

127.0.0.1                  : ok=2    changed=0    unreachable=0    failed=1

 

Articles I reviewed while doing the work outlined here

https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys–2

http://blog.publysher.nl/2013/07/infra-as-repo-using-vagrant-and-salt.html

https://github.com/devopsgroup-io/vagrant-digitalocean

https://github.com/mitchellh/vagrant/issues/4073

http://stackoverflow.com/questions/23337312/how-do-i-use-rsync-shared-folders-in-vagrant-on-windows

https://github.com/mitchellh/vagrant/issues/3230

https://www.vagrantup.com/docs/synced-folders/basic_usage.html

http://docs.ansible.com/ansible/intro_inventory.html

http://stackoverflow.com/questions/36932952/ansible-unable-to-connect-to-aws-ec2-instance

http://serverfault.com/questions/649659/ansible-try-to-ping-connection-between-localhost-and-remote-server

http://stackoverflow.com/questions/22232509/vagrant-provision-works-but-i-cannot-send-an-ad-hoc-command-with-ansible

http://stackoverflow.com/questions/21670747/what-user-will-ansible-run-my-commands-as#21680256

Windows pre-WSL: fighting with vagrant, virtualbox and Cygwin

These are notes I kept for myself while struggling to find a way to use Vagrant and Virtualbox (among other tools like git, ansible and ssh) to test and debug virtual server builds without having to dual-boot into a Linux box.
Note to DavidP: you’ll see throughout the rsync errors that made this whole thing not *quite* a functional vagrant environment.
Obligatory: “this was of course a foolish mission that many loudly warned me against”.
Obligatory: “I thought I was smarter than most, and could probably decipher the grotty issues” arrogance.
Futile: “I hope these notes help someone else somewhere down the line with a similar issue”.
Important: my work was liberally stealing great ideas from Stefan van Essen:

Basic Setup

Details of what I did to make sure I can recreate my environment later:
  • Installed Cygwin64 with tools like bash, curl, gcc-core, make, nano, openssh, openssl, python, wget (and keep uninstalling python-cryptography because it interferes with my almost-working “ansible-under-cygwin” setup)
  • Installed apt-cyg to make cygwin package management easier
  • Installed VirtualBox for Windows with default paths (leaving the VMs in %username%/VirtualBox VMs)
  • Installed Vagrant for Windows with no changes
  • Created a symlink in the cygwin home directory to the VMs for ease of access (ln -s “/cygdrive/c/Users/Mike/VirtualBox VMs” ~/VMs)

First sign of trouble, selection of build target

  • Problem: “vagrant up” runs into a couple of problems consistently, which might have something to do with me instantiating the vagrant box under Windows (where there wasn’t an available rsync program to enable Vagrant to do whatever it needs when first “up”ing a new Vagrant box)
    • There seemed to be an SSH issue: 
      ==> default: Waiting for machine to boot. This may take a few minutes...
         default: SSH address: 127.0.0.1:2222
         default: SSH username: vagrant
         default: SSH auth method: private key
         default: Warning: Remote connection disconnect. Retrying...
         default: Warning: Remote connection disconnect. Retrying...
      ==> default: Machine booted and ready!
    • Then appeared a “hangover” rsync issue:
      ==> default: Rsyncing folder: /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/ => /vagrant
      There was an error when attempting to rsync a synced folder.
      Please inspect the error message below for more info.
      Host path: /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/
      Guest path: /vagrant
      Command: rsync --verbose --archive --delete -z --copy-links --chmod=ugo=rwX --no-perms --no-owner --no-group --rsync-path sudo rsync -e ssh -p 2222 -o ControlMaster=auto -o ControlPath=C:/cygwin64/tmp/ssh.570 -o ControlPersist=10m -o StrictHostKeyChecking=no -o IdentitiesOnly=true -o UserKnownHostsFile=/dev/null -i 'C:/Users/Mike/VirtualBox VMs/BaseDebian/.vagrant/machines/default/virtualbox/private_key' --exclude .vagrant/ /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/ vagrant@127.0.0.1:/vagrant
      Error: Warning: Permanently added '[127.0.0.1]:2222' (ECDSA) to the list of known hosts.
      mm_receive_fd: no message header
      process_mux_new_session: failed to receive fd 0 from slave
      mux_client_request_session: read from master failed: Connection reset by peer
      Failed to connect to new control master
      rsync: connection unexpectedly closed (0 bytes received so far) [sender]
      rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.2]
    • I figured rather than treat this box as a pet, I’d just destroy it and try over again [good practice for breaking my old habits of trying to root cause catastrophic, but sometimes irreproducible, issues]
    • But no – these issues reappear for a freshly-init’d instance of the same vagrant box (debian/jessie64)
    • However, the next vagrant box I tried (hashicorp/precise64) (where I’ll put my dev tools) seemed close enough to fine, no red errors:
      ==> default: Machine booted and ready!
      ==> default: Checking for guest additions in VM...
         default: The guest additions on this VM do not match the installed version of
         default: VirtualBox! In most cases this is fine, but in rare cases it can
         default: prevent things such as shared folders from working properly. If you see
         default: shared folder errors, please make sure the guest additions within the
         default: virtual machine match the version of VirtualBox you have installed on
         default: your host and reload your VM.
         default:
         default: Guest Additions Version: 4.2.0
         default: VirtualBox Version: 5.0
      ==> default: Mounting shared folders...
         default: /vagrant => C:/Users/Mike/VirtualBox VMs/UbuntuDev
    • I also tried with (box-cutter/ubuntu1404-desktop) – worked great!
      ==> default: Machine booted and ready!
      ==> default: Checking for guest additions in VM...
      ==> default: Mounting shared folders...
         default: /vagrant => C:/Users/Mike/VirtualBox VMs/Ubuntu-box-cutter
    • Conclusion: using the box-cutter/ubuntu1404 box for my dev work

Configuration woes

  • Installed git (with suggested additions)
  • Installed ansible
  • Generated ssh keys, ssh-copy-id copied to mike@192.168.1.14 (debian1sttry), then ran “cp ~/.ssh/authorized_keys /root/.ssh” as root
  • Next I wanted to share a single Code folder between Windows and the Ubuntu VM:
    • 1st idea: create a symlink in the shared “vagrant” folder (C:\Users\Mike\VirtualBox VMs\Ubuntu-box-cutter) that would be addressable from inside the Ubuntu VM:
      [from Cygwin on host] ln -s /cygdrive/c/Users/Mike/Code Code
      [from Ubuntu VM] 
      vagrant@vagrant:~$ ls /vagrant/Code
      /vagrant/Code
      vagrant@vagrant:~$ cd /vagrant/Code
      bash: cd: /vagrant/Code: Not a directory
    • 2nd idea: try a native-Windows symlink, not a Cygwin version
      [from CMD on host] 
      C:\Users\Mike\VirtualBox VMs\Ubuntu-box-cutter>mklink /D Code C:\Users\Mike\Code
      symbolic link created for Code <<===>> C:\Users\Mike\Code
      [from Ubuntu VM]
      vagrant@vagrant:/vagrant$ cd /vagrant/Code
      bash: cd: /vagrant/Code: Not a directory
    • 3rd idea: create a new synced folder via vagrant
  • Meanwhile, I started out a new repo (https://github.com/MikeTheCanuck/jQuery-infra-update) to contain the suggested Ansible code (using the Git Bash shell from Git for Windows):
    • mkdir code
      cd code
      touch 50unattended-upgrades
      touch 02periodic
      git add 50unattended-upgrades
      git add 02periodic
    • [then tried git commit but got prompted for identity]
    • git config --global user.email "mikethecanuck@gmail.com"
      git config --global user.name "Mike Lonergan"
      git commit -m "First commit"
      git push origin master
    • [fun: Windows Credential manager prompted for my Github creds]
  • Now to follow the canonical git-flow http://nvie.com/posts/a-successful-git-branching-model/:
    • git checkout -b develop
      git checkout -b default-file-contents develop
      git commit -m "populated default content" code/02periodic code/50unattended-upgrades
      git push
    • [I didn’t think it worked – I didn’t see any changes on the upstream repo – so I tried over]
    • git reset
      git reset --hard
      git commit -m "Added default config file contents" -a
      git push
    • [which resulted in the warning “fatal: The current branch default-file-contents has no upstream branch.  To push the current branch and set the remote as upstream, use…(command)”, so I did]
    • git push --set-upstream origin default-file-contents
  • Result: I have two commits to the branch on my Git repo

Ansible testing environment setup

  • 1st: setup my inventory in /etc/ansible/hosts
  • 2nd: test connectivity “ansible dev -m ping -u root” [cause my target doesn’t have a vagrant account – yeah yeah, I’ll get to building a target using vagrant soon]
  • 3rd: move the files over for testing:
     ansible dev -u root -m copy -a "backup=yes src=~/code/jQuery-infra-update/code/50unattended-upgrades dest=/etc/apt/apt.conf.d/"
     ansible dev -u root -m copy -a "backup=yes src=~/code/jQuery-infra-update/code/02periodic dest=/etc/apt/apt.conf.d/"

Testing vagrant on Debian clones

  • I installed a base Debian box and ignored this error during “vagrant up” that I hoped wouldn’t be a problem:
    ==> default: Checking for guest additions in VM...
       default: No guest additions were detected on the base box for this VM! Guest
       default: additions are required for forwarded ports, shared folders, host only
       default: networking, and more. If SSH fails on this machine, please install
       default: the guest additions and repackage the box to continue.
       default:
       default: This is not an error message; everything may continue to work properly,
       default: in which case you may ignore this message.
    ==> default: Rsyncing folder: /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/ => /vagrant
    There was an error when attempting to rsync a synced folder
    Please inspect the error message below for more info.
    Host path: /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/
    Guest path: /vagrant
    Command: rsync --verbose --archive --delete -z --copy-links --chmod=ugo=rwX --no-perms --no-owner --no-group --rsync-path sudo rsync -e ssh -p 2222 -o ControlMaster=auto -o ControlPath=C:/cygwin64/tmp/ssh.275 -o ControlPersist=10m -o StrictHostKeyChecking=no -o IdentitiesOnly=true -o UserKnownHostsFile=/dev/null -i 'C:/Users/Mike/VirtualBox VMs/BaseDebian/.vagrant/machines/default/virtualbox/private_key' --exclude .vagrant/ /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/ vagrant@127.0.0.1:/vagrant
    Error: Warning: Permanently added '[127.0.0.1]:2222' (ECDSA) to the list of known hosts.
    mm_receive_fd: no message header
    process_mux_new_session: failed to receive fd 0 from slave
    mm_send_fd: sendmsg(2): Broken pipe
    mux_client_request_session: send fds failed
    rsync: connection unexpectedly closed (0 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.2]
  • Now I’m working from the BaseDebian box I’d setup, and tried to clone a couple of boxes from it, but found that they complained that the VBox Additions weren’t installed and wouldn’t allow me to proceed:
    ==> default: Checking for guest additions in VM...
        default: No guest additions were detected on the base box for this VM! Guest
        default: additions are required for forwarded ports, shared folders, host only
        default: networking, and more. If SSH fails on this machine, please install
        default: the guest additions and repackage the box to continue.
        default:
        default: This is not an error message; everything may continue to work properly,
        default: in which case you may ignore this message.
    ==> default: Mounting shared folders...
        default: /vagrant => C:/Users/Mike/VirtualBox VMs/DebianTest1
    Failed to mount folders in Linux guest. This is usually because
    the "vboxsf" file system is not available. Please verify that
    the guest additions are properly installed in the guest and
    can work properly. The command attempted was:
    
    mount -t vboxsf -o uid=`id -u vagrant`,gid=`getent group vagrant | cut -d: -f3` vagrant /vagrant
    mount -t vboxsf -o uid=`id -u vagrant`,gid=`id -g vagrant` vagrant /vagrant
    
    The error output from the last command was:
    
    stdin: is not a tty
    mount: unknown filesystem type 'vboxsf'
  • OK, so I followed this article and have pulled together these commands to try to get the Additions installed on the base box (here, I destroyed the clones and am working to repair the box from which I cloned)
     sudo apt-get install gcc
     sudo apt-get install make

    (reboot the box)

     sudo mount /dev/cdrom /media/cdrom
     cd /media/cdrom
     sudo ./VBoxLinuxAdditions.run
  • Sadly, this procedure did *not* repair the rsync issue that I was seeing originally before packaging from that box, but I tried vagrant package and clone from this box again, in case what I did happens to have solve the “unknown filesystem type ‘vboxsf'” error
  • Nope, same problem on the cloned box – “unknown filesystem type ‘vboxsf'”
    • Just in case, I tried installing the additions in the cloned box
    • They won’t install – complains about the following:
      Building the VirtualBox Guest Additions kernel modules
      The headers for the current running kernel were not found. If the following
      module compilation fails then this could be the reason.
      
      Building the main Guest Additions module ...fail!
      (Look at /var/log/vboxadd-install.log to find out what went wrong)
      Doing non-kernel setup of the Guest Additions ...done.
    • And that log file says:
      /tmp/vbox.0/Makefile.include.header:97: *** Error: unable to find the sources of your current Linux kernel. Specify KERN_DIR=<directory> and run Make again.  Stop.
      Creating user for the Guest Additions.
      Creating udev rule for the Guest Additions kernel module.
    • So I tried this article next, but that didn’t entirely help – the headers still aren’t available
    • However, instead of erroring out, the VBox Guest Additions installation said this at the same place:
      Building the VirtualBox Guest Additions kernel modules
      The headers for the current running kernel were not found. If the following
      module compilation fails then this could be the reason.
      
      Building the main Guest Additions module ...done.
      Building the shared folder support module ...done.
      Building the graphics driver module ...done.
      Doing non-kernel setup of the Guest Additions ...done.
      Starting the VirtualBox Guest AdditionsInstalling the Window System drivers
      Could not find the X.Org or XFree86 Window System, skipping.
      ...done.
    • HOLY SHIT GUYS, it actually worked:
       default: Mounting shared folders...
         default: /vagrant => C:/Users/Mike/VirtualBox VMs/DebianTest1
    • So I rebuilt the base Debian box with this procedure – sadly it didn’t address the rsync issue (that was a valiant wish), but at least I should be able to clone this box and avoid the ‘vboxsf’ issue
    • To address the rsync issue, I tried a variant on this article’s top answer – placed the cwrsync folder on my C:\ root, then copied the script to an rsync.bat file

Issues I never solved

  • Q: what’s the source of the actual problem with the rsync on my BaseDebian box?
    • TODO: Follow this article for some ideas on troubleshooting/narrowing down the problem
  • Q: what’s the source of the problem that causes my “vagrant up” to fail twice on the SSH auth?

Other references