Defending Against SSH Brute Force Attacks

Just Trying to Host a Website

So here I am, trying to host a personal website once I figured out a little bit about Amazon in 2010. After a month or two of poking around and figuring out how to get the AMI I want running, everything looks fine. I can now self-host all the pictures and videos of cats I’m willing to pay for in S3 buckets. At pennies per gigabyte this is a lot of cat video, and I am very pleased.

Little Website, Little Website Let Me In

Like all good, or at least paranoid, admins I regularly trawl all the logs on the systems I have running. I see the “normal” strange web requests on my Apache server, but I keep that pretty up to date so I’m not concerned. After looking around for a while I see failed SSH login attempts all over the place from IP addresses I don’t recognize. The big bad wolf is at my door. After consulting with a colleague at work I learn about fail2ban, a Unix daemon that watches for events in logs and then, via iptables, bans the IP address behind certain log entries for a set amount of time. Fail2ban also emails me when this happens so I can keep track. SSH is the only service I have issues with since the instance is otherwise locked down. I don’t like to whitelist IP addresses for a cloud VM: I don’t have a VPN into Amazon set up, and the IP address I administer from changes often, both because of my ISP and because I have administration scripts set up on my mobile phone. Fail2ban seems like the perfect solution. I use it to protect my AWS servers and my machines at home, which have limited external access.

Nothing Lasts Forever

The fail2ban solution worked for over three years. Occasionally I would get a persistent brute force attack, but on average I was banning about 6-7 IP addresses per day, and they wouldn’t be back after the fail2ban cooldown period of 3 hours. Then one day in 2014 I started banning hundreds of different IP addresses per day. My inbox quickly went to 999+ unread messages.

In short, it looks like botnets with many IPs are attacking the average person, trying to brute force a password or using one of the many SSH daemon exploits out there to gain access. As a knee-jerk reaction to stem the flow and give me time to think, I give in and whitelist the IP addresses I can log in from through the Amazon web console and my home ISP router. I hate this solution, mainly because it seems like lazy thinking and I HATE MAINTAINING WHITELISTS.

What Next (Knock Knock)

What concerns me most is the lack of access to my home servers. I’m a software consultant, and access to my home network has thrown many a project a lifeline. It’s almost impossible to know what your external IP address will be at a client’s site the night before if you’ve never been there. I remember a patch set that was submitted to Gentoo Linux a few years ago. It was an experimental change to the SSH daemon that allowed something called port knocking. The SSH server would appear to be down to the casual observer. When you wanted to connect you would reach out to your server on predefined ports in a predefined order. This communication, called knocking, is one-way: you send the packets, and they appear to be discarded. However, if you knocked with the right combination and in the right order, SSH would then start listening on port 22 and you could log in.

The method mentioned above was a specialized case for SSH. In the intervening years a more general solution called knockd was created. This daemon enables port knocking for any listening daemon.

The Setup

First make sure you have console access to the machine, or, in the case of AWS, don’t save the rules until you are sure they work, so that a reboot can get you back in. The easiest way for me to set this up was to SSH into the machine I wanted to configure. I installed knockd unconfigured and went through the pre-setup checklist. This included setting up iptables to deny all incoming connections while allowing established connections to be maintained. If your rules are set correctly, your existing SSH connection should still be active but no new SSH connections can be established. For example:

# Keep already-established SSH sessions alive (so the session you are working from stays up)...
iptables -I INPUT -p tcp -s 0/0 --dport ssh -m state --state ESTABLISHED -j ACCEPT
# ...then drop everything else inbound.
iptables -A INPUT -s 0/0 -j DROP

If your iptables foo is better than mine, you can set the default INPUT policy to DROP instead of appending a catch-all deny rule; it’s neater, and I’ll switch to it as soon as I clean up some strangeness in my existing iptables setup.
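For reference, a minimal sketch of that default-policy approach (assuming you also want loopback traffic and already-established connections to keep working) might look like this:

iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

With iptables locked down, configure knockd (typically in /etc/knockd.conf):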

[options]
 logfile = /var/log/knockd.log
[openSSH]
 sequence = 7324,4566,7446,4324
 seq_timeout = 5
 command = /sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 22 -j ACCEPT
 tcpflags = syn
 cmd_timeout = 10
 stop_command = /sbin/iptables -D INPUT -s %IP% -p tcp --dport 22 -j ACCEPT
[closeSSH]
 sequence = 9999,3333,7123,6467
 seq_timeout = 5
 command = /sbin/iptables -D INPUT -s %IP% -p tcp --dport 22 -j ACCEPT
 tcpflags = syn

The above will open the SSH port to a specific IP address when you hit ports 7324, 4566, 7446, 4324 in order, and hard-close it when you hit ports 9999, 3333, 7123, 6467. My only suggestion here is that you pick ports > 1024 and < 65535. After this, start the knockd daemon.
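How you start it depends on your distribution; assuming the package installed an init script or systemd unit, one of the following should do it:

/etc/init.d/knockd start
# or, on systemd machines
systemctl start knockd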

Testing It

Tail the knockd log and now try to open up the ssh daemon port using nc as follows:

nc -z myserver.com 7324 4566 7446 4324
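If your knockd package also installed the companion knock client, the same sequence can likely be sent with it instead of nc:

knock myserver.com 7324 4566 7446 4324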

You should see the following in the log for the open:

[2014-12-28 15:57] myclientip: openSSH: Stage 1
[2014-12-28 15:57] myclientip: openSSH: Stage 2
[2014-12-28 15:57] myclientip: openSSH: Stage 3
[2014-12-28 15:57] myclientip: openSSH: Stage 4
[2014-12-28 15:57] myclientip: openSSH: OPEN SESAME
[2014-12-28 15:57] openSSH: running command: /sbin/iptables -I INPUT 1 -s myclientip -p tcp --dport 22 -j ACCEPT

You should see the following in the log for a manual close or a timeout:

[2014-12-28 15:46] myclientip: openSSH: command timeout
[2014-12-28 15:46] openSSH: running command: /sbin/iptables -D INPUT -s myclientip -p tcp --dport 22 -j ACCEPT

Make sure you see the timeout close the connection, or check iptables every once in a while to ensure you aren’t leaving past IPs open due to a misconfiguration.

Conclusion

I keep fail2ban running just in case, but the number of strange SSH login attempts has dropped to zero. The technique, for now, is very effective. Eventually I’m sure I’ll need a new defense, but I’m hoping I’m good for the next three or four years.


Backup – Getting data into Glacier

Backups

Offsite backups are often talked about and rarely done well in small to medium enterprises, usually because of the cost of an offsite storage facility, the complexity of backup software, and operational costs. However, offsite backups are critical to keep an enterprise running if an unfortunate event hits the physical computing facility. Replacement hardware is often just a trip to Best Buy, the Apple Store, or your favorite chain here, but without your data what good is it? AWS offers a low-cost solution to this problem with only a little bit of Java and shell script knowledge. This post is split into two parts: getting the data in and getting the data out. Here we will deal with getting your data safely into Glacier.

Glacier – What it is and is not

Glacier is a cheap service specifically for bulk uploads and downloads where timeliness is not of the essence. Inventories of your vaults are taken every 24 hours, and it can take hours to get an archive back. To save yourself grief, do not think of Glacier as disk space, but rather as a robot that can fetch your data from a warehouse. For disk space, S3 and EBS are what you want in the Amazon world.

Decide what to back up

This part is usually made unnecessarily difficult. The end product is a list of directories that will be fed to a shell script. You do not need everything. Typically you just need user data, transient data, and OS configs. Think of whatever the delta is from a stock OS image to what you have now… back that delta up. If you want to back up a database, make sure you are using a dump and not relying on the file system to be consistent.
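For example, a hypothetical MySQL dump dropped somewhere the backup script below will sweep up (here under /var; the path and options are placeholders, adjust for your own database and credentials) might look like this:

mysqldump --single-transaction --all-databases > /var/backups/mysql-$(date +%m%d%Y).sql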

Stage/Chunk it

Here is a script that can do it:
 
 #!/bin/bash
 # Parallel jobs: one per CPU core, plus one.
 CPUS=$(grep -c processor /proc/cpuinfo)
 let CPUS++

 DATE=$(date +%m%d%Y-%H%M%S)
 RETENTION=2                              # keep this many old backup sets beyond the newest
 BACKUP_PREFIX="/home/nfs_export/backups" # staging area (here, an NFS export)
 BACKUPDIR="$BACKUP_PREFIX/$HOSTNAME/inflight/$DATE"
 dirs=(/root /sbin /usr /var /opt /lib32 /lib64 /etc /bin /boot)

 mkdir -p "$BACKUP_PREFIX/$HOSTNAME/complete"
 mkdir -p "$BACKUPDIR"
 mount /boot

 # Chunk size: half of physical RAM, divided across the parallel jobs,
 # so the compression stage stays within memory.
 SPLITSIZE=$(sed -ne 's/MemTotal:[^0-9]\+\([0-9]\+\).*/\1/p' /proc/meminfo)
 SPLITSIZE=$((SPLITSIZE * 1024 / CPUS / 2))

 # Tar each directory and split the stream into chunks of $SPLITSIZE bytes.
 for dir in "${dirs[@]}"
 do
     TMP=$(echo "$dir" | sed -e 's/\//_/g')
     echo "(cd $BACKUPDIR; tar cvfp - $dir | split -b $SPLITSIZE - backup_$TMP)"
     (cd "$BACKUPDIR"; tar cvfp - "$dir" | split -b "$SPLITSIZE" - "backup_$TMP")
 done

 # Compress all chunks in parallel, then move the finished set out of "inflight".
 echo "(cd $BACKUPDIR; find . -type f | xargs -n 1 -P $CPUS xz -9 -e)"
 (cd "$BACKUPDIR"; find . -type f | xargs -n 1 -P "$CPUS" xz -9 -e)
 (cd "$BACKUP_PREFIX/$HOSTNAME/inflight"; mv "$DATE" ../complete)
 umount /boot

 # Prune old backup sets beyond the retention count (ls -c lists newest first).
 i=0
 for completedir in $(cd "$BACKUP_PREFIX/$HOSTNAME/complete"; ls -c)
 do
     echo "$completedir $i $RETENTION"
     if [ "$i" -gt "$RETENTION" ] ; then
         echo "(cd $BACKUP_PREFIX/$HOSTNAME/complete; rm -rf $completedir)"
         (cd "$BACKUP_PREFIX/$HOSTNAME/complete"; rm -rf "$completedir")
     fi
     let i++
 done

This script makes a lot of decisions for you; all you really need to do is decide where you are staging your data and what directories you are backing up. It will create a multi-part archive that is fairly efficient to produce on the hardware it is executed on.

Speaking of staging, you will need formatted disk space to hold the archive while it is being sent up to AWS. Typically you want to be able to hold a week’s worth of backups on the partition. Why a week? This is breathing room to solve any backup issues without losing continuity of your backups.

Possibly encrypt it

Take a look at the shell script. If you are extra paranoid, use gpg to encrypt each archive piece AFTER it has been compressed. Encrypted data looks random, so if you encrypt first you negate the compression algorithm’s ability to do its job.
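For example, a rough sketch of an extra step that could sit in the script right after the xz stage (the recipient address is a placeholder for your own key):

(cd $BACKUPDIR; find . -type f -name 'backup_*' | xargs -n 1 -P $CPUS gpg --encrypt --recipient backups@example.com)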

Get it into Glacier

First we need to create a vault to put our stuff in. Log into your AWS console and go to the Glacier management page. Select the option to create a new vault and remember the vault’s name. Then go to IAM and create a backup user, remembering to record its access key and secret key. Now we are ready to start.
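If you prefer the command line and already have the AWS CLI configured with those keys, creating the vault should also work with something along these lines (an account ID of “-” means the current account):

aws glacier create-vault --account-id - --vault-name pictures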

This is a custom Java program snippet based on the AWS sample code:

for (String s : args) {
    try {
        ArchiveTransferManager atm = new ArchiveTransferManager(client, credentials);
        UploadResult result = atm.upload(vaultName, "my archive " + (new Date()), new File(s));
        System.out.println(s + " Archive ID: " + result.getArchiveId());
    } catch (Exception e) {
        System.err.println("Failed to upload " + s + ": " + e);
    }
}

The complete code and the pom file to build it are included in the git repository. The pom should compile this program into a single jar that can be executed with the java -jar myjar.jar command. Before running it we need to configure the program: create a properties file named AwsCredentials.properties in the directory where you will run the java command, containing your secret key, access key, vault name, and AWS endpoint. It should look like this:

 
#Insert your AWS Credentials
secretKey=***mysecretkey***
accessKey=***myaccesskey***
vaultname=pictures
#for the endpoint, select your region
endpointname=https://glacier.us-east-1.amazonaws.com/



Lastly, feed the Java program a list of the archive files, perhaps like this:

cd stage; ls *| xargs java -jar glacier-0.0.1-SNAPSHOT.one-jar.jar



This will get all the files in the staging directory up to your vault. Within 24 hours you will see an inventory report with the size and number of uploads. Remember to save the output: the archive IDs are what you use to retrieve the data later. Some amount of internal bookkeeping will need to be done to keep the data organized; Amazon provides a safe place for the data, not an easy way to index or find things.
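One low-tech way to do that bookkeeping, for example, is to tee the uploader’s output into a dated file next to the staging directory:

cd stage; ls * | xargs java -jar glacier-0.0.1-SNAPSHOT.one-jar.jar | tee ../archive-ids-$(date +%m%d%Y).txt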

To be Continued….

Next post… getting it all back after said Armageddon.

Code

All code is available, excuse the bad SSL cert:

https://coveros.secureci.com/git/glenn_glacier.git

broadcom-sta and Gentoo

I have a Dell E6520 and I have become quite attached to my wireless. Recently I upgraded my kernel to the 3.6 series and things went almost flawlessly, except for the wireless driver. The wl.ko kernel module barfed in kernel land and did very, very bad things. I was seeing something similar to this in the logs:


general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: nls_cp437 vfat fat usb_storage uas bbswitch(O) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media tg3 libphy mei i2c_i801 lib80211_crypt_tkip wl(PO) cfg80211 lib80211 microcode acer_wmi sparse_keymap rfkill mxm_wmi pcspkr wmi ghash_clmulni_intel cryptd kvm_intel snd_hda_codec_hdmi snd_hda_codec_realtek kvm coretemp iTCO_wdt iTCO_vendor_support crc32c_intel snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd serio_raw sdhci_pci sdhci mmc_core soundcore lpc_ich psmouse joydev evdev battery ac acpi_cpufreq mperf processor ext4 crc16 jbd2 mbcache hid_generic hid_logitech_dj usbhid hid sr_mod sd_mod cdrom ahci libahci libata ehci_hcd scsi_mod usbcore usb_common i915 video button i2c_algo_bit drm_kms_helper drm i2c_core intel_agp
intel_gtt [last unloaded: nvidia]
NetworkManager[1733]: wpa_supplicant stopped
NetworkManager[1733]: (wlan0): supplicant interface state: inactive -> down
NetworkManager[1733]: (wlan0): device state change: disconnected -> unavailable (reason 'supplicant-failed') [30 20 10]
NetworkManager[1733]: (wlan0): deactivating device (reason 'supplicant-failed') [10]
systemd[1]: wpa_supplicant.service: main process exited, code=killed, status=11
kernel: CPU 0
kernel: Pid: 1737, comm: wpa_supplicant Tainted: P O 3.6.2-1-ck #1 Acer Aspire 5750G/JE50_HR
kernel: RIP: 0010:[] [] wl_cfg80211_scan+0x8c/0x480 [wl]
kernel: RSP: 0018:ffff880159eb5978 EFLAGS: 00010202
kernel: RAX: ffffffffa085f290 RBX: ffff8801580cd200 RCX: ffff8801580cd200
kernel: RDX: ffff8801580cd200 RSI: ffff88013c912000 RDI: ffff8801580cd200
kernel: RBP: ffff880159eb59b8 R08: 00000000000162c0 R09: 000000000000007c
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0084161c00000001
kernel: R13: ffff88013c912000 R14: ffff88013c912000 R15: 0000000000000000
kernel: FS: 00007f37e3427700(0000) GS:ffff88015fa00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000001bdb6e8 CR3: 000000013c839000 CR4: 00000000000407f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: Process wpa_supplicant (pid: 1737, threadinfo ffff880159eb4000, task ffff88015823a080)
kernel: Stack:
kernel: ffff880159eb5a18 ffffffffa03fd5d8 ffff880159eb59c8 ffff880159eb5a38
kernel: ffff8801580cd000 0000000000000001 ffff88013c912000 0000000000000000
kernel: ffff880159eb5a18 ffffffffa04022f5 000000000000007c 0000000000000004
kernel: Call Trace:
kernel: [] ? nl80211_pre_doit+0x318/0x3f0 [cfg80211]
kernel: [] nl80211_trigger_scan+0x485/0x610 [cfg80211]
kernel: [] genl_rcv_msg+0x298/0x2d0
kernel: [] ? genl_rcv+0x40/0x40
kernel: [] netlink_rcv_skb+0xa1/0xb0
kernel: [] genl_rcv+0x25/0x40
kernel: [] netlink_unicast+0x19d/0x220
kernel: [] netlink_sendmsg+0x30a/0x390
kernel: [] sock_sendmsg+0xda/0xf0
kernel: [] ? find_get_page+0x60/0x90
kernel: [] ? filemap_fault+0x87/0x440
kernel: [] __sys_sendmsg+0x371/0x380
kernel: [] ? handle_mm_fault+0x249/0x310
kernel: [] ? do_page_fault+0x2c4/0x580
kernel: [] ? restore_i387_xstate+0x1af/0x260
kernel: [] sys_sendmsg+0x49/0x90
kernel: [] system_call_fastpath+0x1a/0x1f
kernel: Code: 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 90 48 8b 86 48 02 00 00 48 85 c0 0f 84 6c 03 00 00 4c 8b 20 4d 85 e4 0f 84 2f 03 00 00 8b 84 24 a8 0a 00 00 4d 8b b4 24 48 06 00 00 a8 02 75 60 49
kernel: RIP [] wl_cfg80211_scan+0x8c/0x480 [wl]
kernel: RSP
kernel: ---[ end trace 62b60f7a71b18301 ]---

The solution I implemented was taken from the Gentoo bug tracker and forums. Below is the reference information.
http://forums.gentoo.org/viewtopic-t-939648-start-0.html
https://bugs.gentoo.org/show_bug.cgi?id=437898
and the solution here:
https://437898.bugs.gentoo.org/attachment.cgi?id=326502

At a high level, I created a local overlay, created my own broadcom-sta ebuild, and re-emerged the driver.

This is how I implemented the patch, step by step:

  • vi /etc/make.conf

add:
PORTDIR_OVERLAY="
/var/lib/localoverlay
$PORTDIR_OVERLAY
"

  • mkdir -p /var/lib/localoverlay/net-wireless/broadcom-sta
  • cd /var/lib/localoverlay/net-wireless/broadcom-sta
  • cp /usr/portage/net-wireless/broadcom-sta/broadcom-sta-5.100.82.112-r2.ebuild /var/lib/localoverlay/net-wireless/broadcom-sta/broadcom-sta-5.100.82.112-r3.ebuild
  • vi /var/lib/localoverlay/net-wireless/broadcom-sta/broadcom-sta-5.100.82.112-r3.ebuild
  • I manually added the patch at https://437898.bugs.gentoo.org/attachment.cgi?id=326502.
  • emerge -v broadcom-sta

– done

Enjoy!

Attack of the Zombie ssh client

Passively sitting and watching my logs I notice the following repeated thousands of times:

Dec 4 08:12:59 apache sshd[15823]: SSH: Server;Ltype: Version;Remote: <someip>-34052;Protocol: 2.0;Client: libssh-0.1
Dec 4 08:12:59 apache sshd[15823]: SSH: Server;Ltype: Kex;Remote: <someip>-34052;Enc: aes128-cbc;MAC: hmac-sha1;Comp: none [preauth]
Dec 4 08:12:59 apache sshd[15823]: SSH: Server;Ltype: Authname;Remote: <someip>-34052;Name: root [preauth]
Dec 4 08:12:59 apache sshd[15823]: Received disconnect from <someip>: 11: Bye Bye [preauth]

Seeing a thousand of anything besides cron in my log file is disconcerting, especially since I run fail2ban. After some research I found the following article: http://taint.org/2008/05/16/165301a.html

According to the article this is insidious because the attack doesn’t log a failure. It’s trying to break the host SSH key, so it aborts mid-transaction. Rather than subject myself to this, I figured I could add a fail2ban rule and block further attempts. In my /etc/fail2ban/filter.d/sshd.conf file I added the following line to the failregex:

^%(__prefix_line)sReceived disconnect from <HOST>: 11: Bye Bye \[preauth\]\s*$

It’s not perfect, but it does what I want. The downside is that if you disconnect legitimately more times than your fail2ban tolerance allows within the watch period, you will be banned; that tolerance is just the usual jail settings, sketched below. I’m OK with that limitation. One more attack down… ugh.
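For reference, a sketch of the relevant jail options in jail.conf or jail.local, with example values (the jail may be named [ssh] or [sshd] depending on your distribution and fail2ban version):

[ssh]
enabled  = true
filter   = sshd
maxretry = 5
findtime = 600
bantime  = 10800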


Random Enough Passwords

OK, as a side duty to many of the roles I fill, I wind up installing and administering countless small apps, VMs, and physical machines. I don’t want a system I created to be hacked because 1234 was a secure enough password. One side effect of this is that I must now use a password manager and back it up. God help me if I lose that file. Additionally, this has caused my internal entropy for generating passwords to drop to zero. In other words, I’m tired of thinking up random passwords. Thanks to an article here: http://blog.colovirt.com/2009/01/07/linux-generating-strong-passwords-using-randomurandom/ , now I don’t have to. To sum it up:


#!/bin/bash
PASSLEN=5
# Strip the random byte stream down to allowed password characters, cut it into
# $PASSLEN-character lines, take four candidates, and keep only those containing
# at least one special character.
cat /dev/random | tr -dc 'a-zA-Z0-9-_!@#$%^&*()_+{}|:<>?=' | fold -w $PASSLEN | head -n 4 | grep -i '[!@#$%^&*()_+{}|:<>?=]'

This script produces passwords of length 5 and gives you up to four candidates to choose from (only candidates containing at least one special character are kept). It generates a really random set of passwords, but you must generate entropy by using your system. If this is too slow for you, use /dev/urandom instead, since /dev/random blocks until enough entropy is available. This is not statistically perfect, so don’t use it for anything requiring true randomness. If you don’t know what /dev/random or /dev/urandom are, this is not the post for you.

-Glenn