Forklifting Chef Server

From There to Here and Here to There

Due to my client's requirements, I need to move our Chef server into a completely different hosting environment. This little journey started when I asked that we actually try to perform a restore from our Chef backups. When that didn't work, one of my associates threw a bunch of Ruby code at the problem. The solution was a little kludgy and didn't entirely do what I wanted, so I hit the boards and tried to figure out a better way. After some missteps I came across the knife-backup gem, and all was well... for now.

First Steps

Restore, they said, it'll be simple, they said... yeah, right. To be clear, we follow the Opscode guidelines at https://docs.chef.io/server_backup_restore.html: you basically tar up a bunch of files and take a database dump. The problem we had, like most people, is that while we back up the server, we hardly ever (read: never) perform a restore. It turns out that when we actually ran through the restore process, nothing worked all that well. After a few hours we were able to trace the issues to hostnames and key files. While this method will work in a crunch, I started to wonder if there was a better way. In the world of cloud and Amazon, machines are much more disposable, and in essence what I wanted to figure out was which is easier: standing up a new Chef server, or going through my highly manual Chef backup and restore.
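For reference, the tar-up-and-restore flow the docs describe looks roughly like the sketch below. The directory here is a scratch stand-in for /etc/chef-server so the commands can be tried anywhere; on a real Chef 11 server you would tar the actual config directories and dump PostgreSQL as well.

```shell
# Scratch stand-in for /etc/chef-server so this can run anywhere.
CHEF_CONFIG_DIR=$(mktemp -d)
echo "chef_server_url 'https://old-chef'" > "$CHEF_CONFIG_DIR/chef-server.rb"

# Back up: tar the config tree. On a real server you would also dump
# the database, e.g.: sudo -u opscode-pgsql pg_dumpall > chef-pgsql.sql
BACKUP=$(mktemp -d)/chef-server-backup.tgz
tar czf "$BACKUP" -C "$(dirname "$CHEF_CONFIG_DIR")" "$(basename "$CHEF_CONFIG_DIR")"

# Restore: unpack the archive onto the replacement host.
RESTORE_DIR=$(mktemp -d)
tar xzf "$BACKUP" -C "$RESTORE_DIR"
```

The painful part, as described above, is everything the tarball does not capture: hostnames baked into configs and the key files, which is exactly where our restore attempts fell over.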

Finding a Gem

I decided to see if anyone had automated the backup and restore process in Chef based on the Opscode guidelines. After some googling, I decided that a full backup and restore wasn't what I really wanted. For my purposes, and for how we work in general, it's more useful to be able to import a set of cookbooks and all of their versions. The "all of their versions" part is important: we make heavy use of environments and often have several versions of a cookbook in use at the same time. Enter knife-backup.

Really, Really You Need That Ruby

This is where I sometimes wonder what goes through a developer's mind. Chef is bundled with a specific version of Ruby that does not change. Chef 11, a very common version of Chef still in use, is pegged at Ruby 1.9. In the Chef versions I've worked with, Chef relies almost entirely on the embedded Ruby installed with it; you need to go out of your way to use the system Ruby. As a result, for better or worse, we've come to rely on the Chef-installed Ruby rather than the system interpreter for all Chef-related tasks. In fact, to keep our images clean, we don't even install a system Ruby and use only the embedded version. This is why, when I attempted to install knife-backup, the gem installed but couldn't run because of a Ruby version mismatch.
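A quick way to see the two interpreters in question (the omnibus path below is the usual default for a Chef package install; yours may differ):

```shell
/opt/chef/embedded/bin/ruby -v   # the Ruby that ships inside Chef, pinned (1.9 on Chef 11)
ruby -v                          # the system Ruby, if one is installed at all
```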

Trials and Tribulations in Ruby

Nothing is without difficulties: using knife-backup requires the ability to run two versions of Ruby at once. I recommend that you do not install two system versions of Ruby. Unless your Linux distribution has a mature way of handling multiple interpreters (Gentoo), you should keep the second interpreter in user space. On Red Hat systems I recommend letting Chef use its embedded Ruby while you use RVM to manage the Ruby needed for knife-backup.

Installing RVM

According to the RVM website, it's as easy as a curl command:

curl -sSL https://get.rvm.io | bash -s stable

However, in my world nothing is ever that straightforward. I ran the command and found I couldn't connect to the Internet. All the environments we work in are protected by proxy servers, and the proxies filter all requests so that only particular user agents get through. This means curl, the primary method of installing RVM, does not work out of the box for me. I was able to get it going by modifying the .curlrc file on the target machine as follows:

user-agent="only useragent our proxies accept"

Success! I now have RVM installed. Now on to installing Ruby. RVM works by doing two things. First, it manages symlinks and environment variables in user space to let you pick which RVM-installed Ruby interpreter you want. Second, it builds an installation of Ruby from source. To kick this off, execute rvm install <ruby version>; rvm use <ruby version>. Specifically, we want Ruby 2.1, and RVM is going to attempt to build it from the Ruby source. To compile the interpreter, RVM will try to install development tools, i.e. gcc, so make sure you have RHN access so you can install all the necessary RPMs. Executing rvm install 2.1 should take a while as things compile. Lastly, simply execute rvm use 2.1, then run ruby -v to verify the version, and we are ready to go.
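Put together, the sequence looks like this (assuming RVM is already installed and the build dependencies are reachable):

```shell
rvm install 2.1   # builds Ruby 2.1 from source; pulls in gcc and friends first
rvm use 2.1       # point this shell's symlinks and env at the new interpreter
ruby -v           # should now report ruby 2.1.x
```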

Nothing is Ever Simple

Just as we could not reach the Internet with curl, the method by which Ruby reaches out to the Internet to install gems is equally blocked. My next effort was getting Ruby to dance the same user-agent dance that curl did. I must admit I could not find an easy way to change the RubyGems user agent, although I know it is possible. Instead, I installed RVM on a machine that had unfettered Internet access, installed the version of Ruby I needed, then did a gem install knife-backup. After the install, I located all the gem dependencies for knife-backup and, from that list, performed a gem download of each needed gem. Once I had all the gem archives I tarred them up, transported them over to the Internet-restricted box, and did a local gem install. This allowed me to get knife-backup installed even though I could not get past the proxy server.
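A sketch of that offline-install dance, with placeholder hostnames; gem fetch is the stock RubyGems way to download a .gem file without installing it:

```shell
# --- on the box with unfettered Internet access ---
rvm use 2.1
gem install knife-backup
gem dependency knife-backup      # lists what the gem pulled in
gem fetch knife-backup           # downloads knife-backup-<version>.gem
# ...repeat gem fetch for each dependency listed above...
tar czf knife-backup-gems.tgz ./*.gem

# --- move the archive over and install locally on the restricted box ---
scp knife-backup-gems.tgz restricted-box:
ssh restricted-box 'tar xzf knife-backup-gems.tgz && gem install --local ./*.gem'
```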

Still, Nothing is Ever Simple

RVM took care of Ruby, and a little tar magic got the gem installed. After installing knife-backup on machines that could reach the source and destination Chef servers, I was ready to go. I executed the backup command:

knife backup export cookbooks environments nodes data_bags -D tmpbackupdir
tar cvjf backup.tbz2 tmpbackupdir
scp backup.tbz2 serverthatcantalktonewchefserver:

Once the backup archive was transferred over I untarred it and ran the restore command:

knife backup restore -D tmpbackupdir

And it failed... One other requirement of knife-backup is a Chef client >= 12.x, and Chef 12 has a new requirement for cookbooks: they must all have the name attribute defined in metadata.rb. Since the cookbooks came from a Chef 11 server, where the Chef 11 clients never checked for this, the Chef 12 client attempting the import failed a client-side check and died. Fortunately, the layout of the backup archive is <cookbookname>-version, so with a little bit of sed and find I was able to create a name attribute on the fly for all the cookbooks, with something similar to the following:

find . -mindepth 1 -maxdepth 1 -type d -exec sh -c 'echo "name \"$(basename "$1" | sed -e "s/-[0-9.]*\$//")\"" >> "$1/metadata.rb"' _ {} \;

With the name attribute now set, I ran the restore command once again. During execution I noticed a curious message about merging cookbooks, but thought nothing of it at first. After the restore I logged into the Chef UI, started to take inventory of the versions and contents of the cookbooks, and that's where I understood the merging message.

The problem displayed itself as follows: I was supposed to see every version of every cookbook that existed on the origin Chef server, as well as the data bags, nodes, and environments. While the data bags, nodes, and environments transferred fine, several cookbooks were missing versions. This was bad, considering one of the chief requirements was to transfer all cookbooks and all of their versions.

I started to page through the gem's source code to see why exactly this was occurring. After adding some debug statements so I could trace execution, I started to understand the issue: it had to do with the Chef cookbook path. Chef can only have one version of a cookbook in its path during a cookbook upload if you want it to behave as expected. The restore script was putting the base directory containing all the versions of all the cookbooks in the Chef path. When asked to upload a particular version of a cookbook, knife would see all the versions in its cookbook path, pick the highest, and just upload that one several times.

The solution was to patch the gem. What I needed to do was make a temp directory and point the cookbook path there. From there I would symlink each version of the cookbook into the temp directory and allow the restore function to perform a knife cookbook upload. The patch is below:

--- backup_restore.rb.orig   2015-01-23 12:31:20.289221254 -0500
+++ backup_restore.rb      2015-01-14 15:59:19.780221191 -0500
@@ -159,11 +159,14 @@
     def cookbooks
       ui.info "=== Restoring cookbooks ==="
       cookbooks = Dir.glob(File.join(config[:backup_dir], "cookbooks", '*'))
+      #Make tmp dir
       cookbooks.each do |cb|
+        FileUtils.rm_rf(config[:backup_dir] + "/tmp")
+        Dir.mkdir config[:backup_dir] + "/tmp"
         full_cb = File.expand_path(cb)
         cb_name = File.basename(cb)
         cookbook = cb_name.reverse.split('-',2).last.reverse
-        full_path = File.join(File.dirname(full_cb), cookbook)
+        full_path = File.join(config[:backup_dir], "tmp", cookbook)
         begin
           File.symlink(full_cb, full_path)
@@ -172,12 +175,15 @@
           cbu.name_args = [ cookbook ]
           cbu.config[:cookbook_path] = File.dirname(full_path)
           ui.info "Restoring cookbook #{cbu.name_args}"
+          ui.info "TMP=" + config[:backup_dir] + "/tmp"
           cbu.run
         rescue Net::HTTPServerException => e
           handle_error 'cookbook', cb_name, e
         ensure
           File.unlink(full_path)
         end
+      ui.info "Deleting TMP"
+      FileUtils.rm_rf(config[:backup_dir] + "/tmp")
       end
     end

After applying the patch, all versions of all cookbooks appeared correctly. Once this process was ironed out, I had a fairly easy way to sync the servers going forward, and a backup and restore strategy that works without having to know how to mess with SSL certs.

There is one caveat to all this: my major requirement was to move the cookbooks, not the nodes. However, accounting for existing nodes is very easy and requires two steps. First, when knife backup export is executed, use the nodes option to get the node information and the certs. Second, on each node, point the client at the new Chef server.
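Repointing a node mostly comes down to editing chef_server_url in client.rb and letting the node re-register. A minimal sketch, run here against a scratch copy of client.rb so it is safe to try; on a real node the file is /etc/chef/client.rb and the URLs are your own:

```shell
# Scratch copy of a node's client.rb (real nodes use /etc/chef/client.rb).
CLIENT_RB=$(mktemp)
cat > "$CLIENT_RB" <<'EOF'
log_level        :info
chef_server_url  "https://old-chef.example.com"
validation_client_name "chef-validator"
EOF

# Point the node at the new server.
sed -i 's|^chef_server_url.*|chef_server_url  "https://new-chef.example.com"|' "$CLIENT_RB"

# On a real node you would then drop the stale client key and re-run the
# client so it registers against the new server:
#   rm /etc/chef/client.pem && chef-client
```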

Conclusion

The team I work with is going to develop this method further, because it treats the cookbooks, not the Chef server itself, as the most valuable piece of information. In the world of disposable computing, this is the way it should be, allowing for rapid expansion and recovery.