Defending Against SSH Brute Force Attacks

Just Trying to Host a Website

So here I am trying to host a personal website once I figured out a little bit about Amazon in 2010. After a month or two of poking around and figuring out how to get the AMI I want running, everything looks fine. I can now self host all the pictures and videos of cats I'm willing to pay for in S3 buckets. At pennies per gigabyte this is a lot of cat video, and I am very pleased.

Little Website, Little Website Let Me In

Like all good, or at least paranoid, admins I regularly trawl all the logs on the systems I have running. I see the "normal" strange web requests on my Apache server, but I keep that pretty up to date so I'm not concerned. After looking around for a while I see failed SSH login attempts all over the place from IP addresses I don't recognize. The big bad wolf is at my door. After consulting with a colleague at work I learn about fail2ban. This is a Unix daemon that watches for events in logs and then, via iptables, bans the IP address that caused certain log entries for a set amount of time. Fail2ban also emails me when this happens so I can keep track. SSH is the only service I have issues with since the instance is otherwise locked down. I don't like to whitelist IP addresses for a cloud VM: I don't have a VPN into Amazon set up, and the IP address I administer from changes often, both because of my ISP and because I have administration scripts running from my mobile phone. Fail2ban seems like the perfect solution. I use it to protect my AWS servers and my machines at home, which have limited external access.
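
For reference, a minimal fail2ban jail for SSH looks something like the sketch below. The jail name, log path, and mail action vary by fail2ban version and distribution, so treat the exact keys as assumptions and check your own jail.conf:

[sshd]
# Watch the SSH auth log and ban offending IPs via iptables.
enabled  = true
port     = ssh
logpath  = /var/log/auth.log
maxretry = 5
# Ban for 3 hours, then let the address try again.
bantime  = 10800
# Ban, then email a report (mwl = mail with whois info and log lines).
action   = %(action_mwl)s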

Nothing Lasts Forever

The fail2ban solution worked for over three years. Occasionally I would get a persistent brute force attack, but on average I was banning about 6-7 IP addresses per day and they wouldn't be back after the fail2ban cool down period of 3 hours. Then one day in 2014 I started banning hundreds of different IP addresses per day. My inbox quickly goes to 999+ unread messages.

In short, it looks like botnets with many IPs are attacking the average person, trying to brute force a password or using one of the many SSH daemon exploits out there to gain access. As a knee jerk reaction to stem the flow and give me time to think, I give in and whitelist the IP addresses I log in from, via the Amazon web console and my home ISP router. I hate this solution, mainly because it seems like lazy thinking and I HATE MAINTAINING WHITELISTS.

What Next (Knock Knock)

What concerns me most is the lack of access to my home servers. I'm a software consultant, and access to my home network has thrown many a project a lifeline. It's almost impossible to know your external IP address from a client's site the night before if you've never been there. I remember a patch set that was submitted to Gentoo Linux a few years ago. It was an experimental change to the SSH daemon that allowed something called port knocking. The SSH server would appear to be down to the casual observer. When you wanted to connect you would reach out to your server on predefined ports in a predefined order. This communication, called knocking, is one-way. You send the packets, and it appears they are discarded. However, if you used the right combination and order to knock, SSH would then start listening on port 22 and you could log in.

The method mentioned above was a specialized case for SSH. In the intervening years a more general solution was created called knockd. This daemon enables port knocking in front of any listening service.

The Setup

First make sure you have console access to the machine, or in the case of AWS, don't save the rules until you are sure they work, so that a reboot can get you back in. The easiest way for me to set this up was to SSH into the machine I wanted to configure. I installed knockd unconfigured and went through the pre-setup checklist. This included setting up iptables to deny all incoming connections while allowing established connections to be maintained. If your rules are set correctly, your existing SSH connection should stay active but no new SSH connections can be established. For example:

iptables -I INPUT -p tcp -s 0/0 --dport ssh -m state --state ESTABLISHED -j ACCEPT
iptables -A INPUT -s 0/0 -j DROP

If your iptables fu is better than mine you can set the default policy to DROP instead of using a deny-all rule; it's neater, and I'll switch to it as soon as I clean up some strangeness in my existing iptables setup. A rough sketch of that approach is below.
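
This is a minimal sketch, not my production setup: it assumes a plain host with no other rules you care about, so adapt it before using it anywhere real.

# Keep already-established sessions (including the SSH session you are on) alive.
iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow loopback traffic.
iptables -I INPUT -i lo -j ACCEPT
# Add ACCEPT rules here for anything that should stay public, e.g. your web ports.
# Make DROP the default for anything not explicitly accepted.
iptables -P INPUT DROP

My knockd configuration (knockd reads /etc/knockd.conf by default) then looks like this: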

[options]
 logfile = /var/log/knockd.log
[openSSH]
 sequence = 7324,4566,7446,4324
 seq_timeout = 5
 command = /sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 22 -j ACCEPT
 tcpflags = syn
 cmd_timeout = 10
 stop_command = /sbin/iptables -D INPUT -s %IP% -p tcp --dport 22 -j ACCEPT
[closeSSH]
 sequence = 9999,3333,7123,6467
 seq_timeout = 5
 command = /sbin/iptables -D INPUT -s %IP% -p tcp --dport 22 -j ACCEPT
 tcpflags = syn

The above opens the SSH port to the knocking IP address when you hit ports 7324, 4566, 7446, and 4324 in order, and hard closes it when you hit ports 9999, 3333, 7123, and 6467. My only suggestion here is that you pick ports greater than 1024 and less than 65535. After this, start the knockd daemon.

Testing It

Tail the knockd log and then try to open up the SSH port using nc as follows:

nc -z myserver.com 7324 4566 7446 4324
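
If you have the knock client that ships with knockd installed on your workstation, the same sequence can be sent with it instead of nc; this is a sketch assuming the default client options:

# Send the open sequence, then SSH in within the cmd_timeout window.
knock myserver.com 7324 4566 7446 4324
ssh myserver.com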

You should see the following in the log for the open:

[2014-12-28 15:57] myclientip: openSSH: Stage 1
[2014-12-28 15:57] myclientip: openSSH: Stage 2
[2014-12-28 15:57] myclientip: openSSH: Stage 3
[2014-12-28 15:57] myclientip: openSSH: Stage 4
[2014-12-28 15:57] myclientip: openSSH: OPEN SESAME
[2014-12-28 15:57] openSSH: running command: /sbin/iptables -I INPUT 1 -s myclientip -p tcp --dport 22 -j ACCEPT

You should see the following in the log for a manual close or a timeout:

[2014-12-28 15:46] myclientip: openSSH: command timeout
[2014-12-28 15:46] openSSH: running command: /sbin/iptables -D INPUT -s myclientip -p tcp --dport 22 -j ACCEPT

Make sure you see the timeout close the port, or check iptables every once in a while to ensure you aren't leaving past IPs open due to a misconfiguration.
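
A quick way to audit this is to list the ACCEPT rules knockd has added; the grep pattern below assumes rules shaped like the ones generated by the config above:

# Any lingering per-IP ACCEPT rules for port 22 show up here.
iptables -L INPUT -n --line-numbers | grep 'dpt:22'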

Conclusion

I keep fail2ban running just in case, but the number of strange SSH login attempts has dropped to zero. The technique, for now, is very effective. Eventually I'm sure I'll need a new defense, but I'm hoping I'm good for the next three or four years.

 

Backup – Getting data into Glacier

Backups

Off site backups are often talked about and rarely done well at small to medium enterprises. This is usually due to the cost of an offsite storage facility, the complexity of backup software, and operational costs. However, offsite backups are critical to keep an enterprise running if an unfortunate event hits the physical computing facility. Often replacing hardware is a trip to Best Buy or the Apple store (or insert your favorite chain here), but without your data what good is it? AWS offers a low cost solution to this problem with only a little bit of Java and shell script knowledge. This post is split into two parts, getting the data in and getting the data out. Here we will deal with getting your data safely into Glacier.

Glacier – What it is and is not

Glacier is a cheap service specifically for bulk uploads and downloads where timeliness is not of the essence. Inventories of your vaults are taken every 24 hours, and it can take hours to get an archive back. To save yourself grief do not think of Glacier as disk space, but rather as a robot that can fetch your data from a warehouse. For disk space, S3 and EBS are what you want in the Amazon world.

Decide what to back up

This part is usually made unnecessarily difficult. The end product is a list of directories that will be fed to a shell script. You do not need everything. Typically you just need user data, transient data, and OS configs. Think of whatever the delta is from a stock OS image to what you have now, and back that delta up. If you want to back up a database, make sure you are using a dump and not relying on the file system to be consistent.
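
For MySQL, for example, a dump along these lines can be written somewhere the backup script already covers before the archive run; the credentials file and target path are assumptions for illustration:

# Dump all databases; /var is in the script's directory list, so the dump rides along.
mysqldump --defaults-extra-file=/root/.my.backup.cnf --all-databases \
  --single-transaction > /var/backups/mysql-$(date +%F).sql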

Stage/Chunk it

Here is a script that can do it:
 
#!/bin/bash
# How many CPUs we have drives both the split size and the xz parallelism.
CPUS=`cat /proc/cpuinfo | grep processor | wc -l`
DATE=`date +%m%d%Y-%H%M%S`
RETENTION=2
BACKUP_PREFIX="/home/nfs_export/backups"
BACKUPDIR="$BACKUP_PREFIX/$HOSTNAME/inflight/$DATE"
dirs=(/root /sbin /usr /var /opt /lib32 /lib64 /etc /bin /boot)
if [ ! -d "$BACKUP_PREFIX/$HOSTNAME/complete" ] ; then mkdir -p "$BACKUP_PREFIX/$HOSTNAME/complete" ; fi
mkdir -p "$BACKUPDIR"
mount /boot
let CPUS++
# Split size = (total RAM / CPU count) / 2, so the parallel xz jobs fit in memory.
SPLITSIZE=`cat /proc/meminfo | grep MemTotal: | sed -e 's/MemTotal:[^0-9]\+\([0-9]\+\).*/\1/g'`
SPLITSIZE=$(($SPLITSIZE*1024))
SPLITSIZE=$(($SPLITSIZE/$CPUS))
SPLITSIZE=$(($SPLITSIZE/2))
# Archive each directory and split the stream into chunks for parallel compression.
for dir in "${dirs[@]}"
do
  TMP=`echo $dir | sed -e 's/\//_/g'`
  echo "(cd $BACKUPDIR; tar cvpf - $dir | split -b $SPLITSIZE - backup_$TMP)"
  (cd $BACKUPDIR; tar cvpf - $dir | split -b $SPLITSIZE - backup_$TMP)
done
# Compress every chunk, one xz process per CPU.
echo "(cd $BACKUPDIR; find . -type f | xargs -n 1 -P $CPUS xz -9 -e)"
(cd $BACKUPDIR; find . -type f | xargs -n 1 -P $CPUS xz -9 -e)
# Move the finished set out of inflight and unmount /boot.
(cd $BACKUP_PREFIX/$HOSTNAME/inflight; mv $DATE ../complete)
umount /boot
# Prune anything beyond the retention count (the listing is newest first).
i=0
for completedir in `(cd $BACKUP_PREFIX/$HOSTNAME/complete; ls -c)`
do
  echo "$completedir $i $RETENTION"
  if [ $i -gt $RETENTION ] ; then
    echo "(cd $BACKUP_PREFIX/$HOSTNAME/complete; rm -rf $completedir)"
    (cd $BACKUP_PREFIX/$HOSTNAME/complete; rm -rf $completedir)
  fi
  let i++
done

This script makes a lot of decisions for you; all you really need to do is decide where you are staging your data and what directories you are backing up. It will create a multi-part archive that is reasonably efficient to produce on the hardware it is executed on.

Speaking of staging, you will need formatted disk space to hold the archive while it is being sent up to AWS. Typically you want to be able to hold a week's worth of backups on the partition. Why a week? This is breathing room to solve any backup issues without losing continuity of your backups.
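
A quick sanity check before relying on the staging area is to compare free space against a recent backup set; the paths match the BACKUP_PREFIX used in the script above:

# Free space on the staging partition vs. size of the completed backup sets.
df -h /home/nfs_export/backups
du -sh /home/nfs_export/backups/*/complete/*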

Possibly encrypt it

Take a look at the shell script. If you are extra paranoid, use gpg to encrypt each archive piece AFTER it has been compressed. Encrypted data looks random, so encrypting ahead of time would negate the ability of the compression algorithm to do its work.
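
A sketch of that step, run after the xz pass in the script and before the move to complete; the recipient key is an assumption, so substitute your own backup key and remember you need its private half to restore:

# Encrypt each compressed chunk to the backup key; produces .xz.gpg files.
# The original .xz files are left in place, so remove them once you are happy.
(cd $BACKUPDIR; find . -name '*.xz' | xargs -n 1 -P $CPUS gpg --batch --yes -e -r backups@example.com)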

Get it into Glacier

First we need to create a vault to put our stuff in. Log into your AWS console and go to the Glacier management page. Select the create a new vault option and remember the name of the vault. Then go to IAM and create a backup user, remembering to record the access and secret key. Now we are ready to start.
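
If you prefer the command line and have the AWS CLI installed and configured, a sketch of the same setup looks like this (the vault and user names are just examples for this post):

# Create the vault in the current account and a dedicated backup user with keys.
# You still need to attach a policy granting Glacier upload permissions to the user.
aws glacier create-vault --account-id - --vault-name pictures
aws iam create-user --user-name glacier-backup
aws iam create-access-key --user-name glacier-backup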

This is a custom Java program snippet based on the AWS sample code:

for (String s : args) {
    try {
        ArchiveTransferManager atm = new ArchiveTransferManager(client, credentials);
        UploadResult result = atm.upload(vaultName, "my archive " + (new Date()), new File(s));
        System.out.println(s + " Archive ID: " + result.getArchiveId());
    } catch (Exception e) {
        System.err.println("Failed to upload " + s + ": " + e.getMessage());
    }
}

The complete code and pom file to build it are included in the git repository. The pom should compile this program into a single jar that can be executed with java -jar myjar.jar. First we need to configure the program. Create a properties file named AwsCredentials.properties in the directory you will be running the java command from, containing your secret key, access key, vault name, and AWS endpoint. It should look like this:

 
 #Insert your AWS Credentials
 secretKey=***mysecretkey***
 accessKey=***myaccesskey***
 vaultname=pictures
#for the endpoint select your region
 endpointname=https://glacier.us-east-1.amazonaws.com/



Lastly, feed the java program a list of the archive files, perhaps like this:

cd stage; ls *| xargs java -jar glacier-0.0.1-SNAPSHOT.one-jar.jar



This will get all the files in the staging directory up into your vault. In 24 hours you will see an inventory report with the size and number of uploads. Remember to save the output; the archive IDs are used to retrieve the data later. Some amount of internal bookkeeping will need to be done to keep the data organized. Amazon provides a safe place for the data, not an easy way to index or find things.
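
Since losing the archive IDs means losing easy access to the data, it is worth capturing the program's output as part of the upload itself, for example:

# Keep a dated record of every "filename -> Archive ID" line the uploader prints.
cd stage; ls * | xargs java -jar glacier-0.0.1-SNAPSHOT.one-jar.jar | tee ../archive-ids-$(date +%F).log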

To be Continued….

Next post: getting it all back after said Armageddon.

Code

All code is available, excuse the bad SSL cert:

https://coveros.secureci.com/git/glenn_glacier.git

DevOps, Opensource, CI, and the Cloud

CI is a Reality and not Just for Google and Facebook

I've worked for many recognizable names and they all have time and energy invested in a software process that only shows results once a quarter. Hundreds of thousands or millions of dollars, and you can only see the results once every 3 months... maybe. Each time, the releases have been painful, long, and fraught with errors and frustration. To alleviate the pain of deployment, Continuous Integration and Continuous Delivery are the answer. Whenever the idea of a faster release schedule is introduced, usually by a business actor, most IT organizations push back. Typically the reasons are "What we did worked last time so we don't want to change", "The tools are too expensive", and "We are not Google or Facebook". As I will demonstrate, the second argument is no longer valid. Using a set of free open source tools, Continuous Integration and Continuous Delivery are easily achievable with some time and effort.

Example

The linked presentation and video example are a demonstration of open source CI.

MS PowerPoint (DevOps_Opensource_CI_and_the_Cloud.pptx)

LibreOffice Impress (DevOps_Opensource_CI_and_the_Cloud.odp)

Youtube (https://www.youtube.com/watch?v=gIxCcJAl86M)

MP4 (https://s3.amazonaws.com/aws_wordpress/Coveros+Puppet+and+CI-voiceover.mp4)

The Scenario

In order to back up my statement I needed to prove it out with a non-trivial example. For the purposes of this demonstration I chose to show changes to a Drupal site. The workflow can be characterized as follows:

  1. Developer creates tests
  2. Developer creates code
  3. Developer commits code and tests
  4. CI Framework detects changes
  5. CI Framework stands up empty VMs and databases
  6. CI Framework pushes down data, code, and configuration to the blank VMs
  7. CI Framework runs automated tests on the site and returns results

This process is completely hands off from the developer commit to the test results. No ops group needs to be contacted, no changes need to be filed, no e-mails need to be sent to the QA group alerting them to changes. The Drupal instance is multi-box, with a separate DB and web server; the example is not just an out-of-the-box Apache web server with the "It works" page.

The Tool Set

The software you use will vary from implementation to implementation; the software listed in this section is just what was needed for the demo. One size does not fit all, but with the enormous number of open source projects available there are basic tools for almost any need. Jenkins and Puppet are really the centerpieces in this particular demonstration. Jenkins is the control point: it executes the jobs and coordinates everything. Puppet is the CM database holding the information necessary to configure a VM and each application on it. Perl and sed are used for text manipulation, turning the output of the EC2 API or the Puppet REST interface into a usable format for Jenkins. EC2 itself provides the infrastructure and elastic capacity for testing.
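
The glue is mostly one-liners along these lines; this is a rough sketch rather than the demo's actual scripts, and the tag name and field position are assumptions that depend on your EC2 API tools version:

# Pull the public DNS name of a freshly launched test VM out of the
# EC2 API tools output and hand it to Jenkins as an injectable property.
ec2-describe-instances --filter "tag:Name=ci-web" \
  | grep ^INSTANCE | awk '{print "TEST_HOST=" $4}' > test_env.properties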

Path to CI and DevOps

Two words: "small steps". Depending on the development organization you are working in or with, trying to do this all at once is risky. It is easier to lay out a road map of implementing each individual technology and then integrating them. Start with OS packaging. Even if your deployment process is manual, developers and ops people can usually latch on to this idea. The two most challenging pieces are typically automated testing and hands-off deployments. One requires a change in the way the testing organization works, and the other may require a change to the policy and personnel of the operations group. However, as more and more CI pieces get implemented, these organizational changes will make more sense. In the end, the inertia built up by increased efficiency and better releases may be the best leverage.

Conclusion

The three arguments against CI are typically cost, "what we did worked last time", and "while it works for other organizations it can't work for us". From the demo you can see that CI can work and can be implemented with open source tools, negating the cost argument. While you have to invest in some (not a lot of) engineering effort, the cost of the tools themselves should be close to zero. Secondly, if your release cycle is more than 2-3 weeks from written idea to implementation, your process does not work. There may be various IT heroes who expend monumental effort to make the product function, but your development process does not work in a healthy sense, negating the "what we do works" argument. Lastly is the argument that since "we" don't have the resources of Google or Facebook, we can't possibly adopt a technology this complicated. The demo above was done by one person in about a week's worth of effort. The tools are available and they use standard technologies. All the pieces exist and are well tested, so you don't need an engineering staff the size of Google's to implement CI. That refutes the most common arguments against CI and DevOps.

Linux 3.6 and EC2

EC2 is Xen based. For the most part people are using an ancient kernel that has been back-patched all to the wazoo, with 2.6.18 and 2.6.28 being really popular. With my laptop sitting in the 3.5 series I was hoping to get my cloud VMs somewhere in that range as well. I know that Xen support went native in the kernel in the 3.x series, so it is possible.

I lined up my caffeine and started to prepare for a long series of failed boots as I tinkered with my kernel settings, moving my 2.6.28 config up into the 3.x series of kernels. Then, out of sheer whimsy, I decided to see what the Gentoo overlays had in store. Let's see if someone else can do the heavy lifting. Sure enough, the sabayon-distro overlay is just what the doctor ordered. In it there is an ebuild for the 3.6 kernel with the appropriate config.

Since I mint my own EC2 images from scratch, I have a chroot environment on a server at home to build said image. Before you embark, this blog post assumes a large body of work has already been done: specifically that you are familiar with Gentoo, that you know how layman works in Gentoo, and that you know how to gen up an EC2 Gentoo image from thin air.


chroot <path to my ec2 image> /bin/bash
layman -s sabayon-distro
emerge -av ec2-sources
eselect kernel list
eselect kernel set <number next to linux-3.6.0-ec2>
cd /usr/src/linux
#this is a hack to make sure genkernel gets the right config
cp .config /usr/share/genkernel/arch/x86_64/kernel-config
genkernel --makeopts=-j<number of cpus +1> all
#in /boot kernel-genkernel-x86_64-3.6.0-ec2 initramfs-genkernel-x86_64-3.6.0-ec2 should now exist

There are two paths to follow now: you are either upgrading an existing system, or you are creating a new AMI from scratch and uploading it. First I will cover the upgrade scenario.

Upgrade

Here is the tricky part: if you are upgrading from the 2.6.x to the 3.x series of kernels, the device names for hard drives change. You have two options: go into the inner workings of udev and make drives show up as /dev/sd[a-z]+[0-9]+, or modify grub and fstab accordingly. I went with the latter. First, back up your instance's system drive by snapshotting the currently running volume. This way you can get back to the original VM by creating a new AMI. Next, I copied the kernels up to my EC2 instance from my chroot work environment and placed them in /boot on the EC2 machine. Then I needed to move up the kernel modules.
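
The snapshot can be taken from the console, or with the EC2 API tools along these lines; the volume ID is a placeholder for whatever is attached as your root device:

# Snapshot the root volume before touching the kernel, so a bad boot is recoverable.
ec2-create-snapshot vol-xxxxxxxx -d "pre-3.6-kernel backup"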

In my chroot environment:

cd /lib/modules
tar cvf modules.tar 3.6.0-ec2
scp modules.tar root@myec2instance:/lib/modules

SSH to the EC2 instance:

cd /lib/modules
tar xvf modules.tar

I then changed all the /dev/sda[0-9] entries in /etc/fstab to /dev/xvda1 and made the same change to /boot/grub/menu.lst:

kernel /boot/kernel-genkernel-x86_64-3.6.0-ec2 root=/dev/sda1 rootfstype=ext3

changed to:

kernel /boot/kernel-genkernel-x86_64-3.6.0-ec2 root=/dev/xvda1 rootfstype=ext3
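
If you'd rather script the substitution than edit by hand, something like this sketch (run on the EC2 instance, after double checking that sda1 really is your only root-device reference) does the same thing:

# Rewrite the old device name in both fstab and the grub config.
sed -i 's#/dev/sda1#/dev/xvda1#g' /etc/fstab /boot/grub/menu.lst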

Next you reboot. If all goes well, delete your backup snapshot.

New AMI

If you want a new AMI, just make the changes to /etc/fstab and /boot/grub/menu.lst in your chroot environment and follow the procedure for beaming it all up there in one of the EC2 custom image links listed above.

In Closing

Not only do you have a way to upgrade to 3.6, but now you can make continual kernel tweaks to your EC2 instance in a relatively safe manner.


AWS updates on the cheap

Gentoo Build Server/Compute

Intro and Prep

Gentoo compiles everything. After I uploaded my Gentoo custom image I realized updating on a 634 MB, 1-virtual-CPU machine may be a bad thing. I would hit the IO limit or the CPU limit and actually have to pay money, ugh. Picking up from rich0's blog, let's assume your first instance is up and running, has been for a while, and now needs an update, and that you kept the chrooted environment you created in step 7.

Exposing your update server

Now you need to have an Apache web server set up. It does not need to be exposed to the internet, but that makes things easier. Go into the Apache document root and create a symlink into the chrooted environment, specifically to where emerge and quickpkg leave their package files. In my case, from the directory Apache serves the binhost from, I did this:

ln -s /home/binserver/bounce/home/portage/distfiles bounce-bin

Where /home/binserver/bounce is my chrooted environment and /home/portage/distfiles is my PKGDIR in make.conf. If you chose the default, I believe it would be /usr/portage/distfiles in your chroot. Next we need to expose the portage tree of the update server via rsync. First install rsync on the same box as your Apache server. Second, share the portage tree of your chroot. For me that is the following configuration in rsyncd.conf:

[bounce-bin]
path = /home/binserver/bounce/usr/portage
comment = Bin server for outpost.

This is an important point: it makes your EC2 instance sync against your update server instead of the main Gentoo tree, which keeps your generated binary packages in sync with the EC2 instance's portage tree. Your update server should now be ready to go. If you have an internet connection that allows rsync and http, make sure they are mapped to the right ports.

Lastly, chroot to the update environment, mount the special file systems and update. For me that process is:

cd <chroot env>
mount -t proc none <chroot env>/proc
mount -o bind /dev <chroot env>/dev
chroot <chroot env> /bin/bash
emerge --sync
emerge -avuNDb world
cd /usr/portage
find /usr/portage -maxdepth 2 -type d | sed -e 's/\/usr\/portage\/.*\///g' | xargs -P1 quickpkg --include-config=y
chown -R apache:apache /home/portage/distfiles

Now we are ready to update.

The EC2 image

Now you need to make your make.conf point to your update server. Go to whatsmyip.org and get your external IP. In make.conf:

PORTAGE_BINHOST="http://<external ip>/binhost/bounce-bin"
SYNC="rsync://<external ip>/gentoo-portage"

Now on the EC2 image:

emerge --sync
emerge -avGK world

This should download all your updates and install them. Hopefully you are now up to date!

Isolated Compute Server

You have two options: tar, or ssh port forwarding magic. If you choose the tar option, you are basically copying the compute server's portage tree and package directory up to the EC2 instance. If you use the ssh forwarding method, you are basically substituting localhost for the <external ip> in the make.conf example.

SSH Forwarding

This method is preferred: if you use the tar method and forget a package, you need to move another binary up to the EC2 server, whereas with ssh tunnelling you just run emerge again. In order for ssh forwarding to work, an ssh server needs to be running on the EC2 instance, the EC2 instance must have an IP addressable from your compute server (public or internal VPN), and the ssh port must be allowed through all firewalls between the compute server and the EC2 instance. Once all the prerequisites are met you can do the following:

compute server> ssh -R 873:localhost:873 -R 80:localhost:80 root@ec2instance
ec2 instance> emerge --sync
ec2 instance> emerge -avuNDGK world

Tar method

On the compute server:


cd /
tar cvf /tmp/portage.tar /usr/portage /home/portage/distfiles

Then move the tar ball up to the EC2 server.

cd /
tar xvf portage.tar
emerge -avuNDGK world

-Glenn


Welcome to “the cloud”

I have been doing configuration management for over 10 years. With the explosion of "the cloud", configuration management has moved from an exercise in careful planning and policy to an engineering effort. Most enterprises are slow to pick up on this shift in the industry. This blog and the next series of posts detail several case studies and practical examples. I hope you enjoy the ride.

-Glenn

 
