Website Deface Detection Script

You’re going to get hacked on the weekend? Panic mode activated.

Today I’m going to show you how easy it is to get a “bot” up and running that watches your website and alerts you if it gets hacked (defaced). The bot will kindly notify you over Telegram.

The Story (What was the problem I was trying to solve?)

I suppose all admins and techies have had this nightmare at least once in their career: there’s an exploit raging in the wild, and the services and websites of companies in the same business are going down one by one. Every time you visit your website and it takes just a couple of seconds longer to load, you panic that you have been hacked.

We all know software updates are mandatory if you want to keep an internet-facing service up and running for any considerable amount of time. Out-of-date software, especially a well-known package, can safely be considered compromised after a couple of months. WordPress, Joomla, DotNetNuke and many other popular CMSs are prime targets for attackers.

Breaking into an instance of one of these CMSs is sometimes as easy as finding a public exploit somewhere on the internet and running it. We all know that we have to update and harden our systems, but without a proper process in place this task is usually postponed or forgotten altogether.

“Software Update” or not to “Software Update”?

As much as we know that software updates are good and sometimes necessary, they sometimes break things. I’m not talking about updating your client software or the apps on your mobile phone. In the enterprise world updates can be horrible at times. They can take your system down: what if you restart and the application never comes up again? Sometimes you don’t want to touch running software for fear of breaking it. If it is legacy software and you have to jump several major versions, some things will almost certainly stop working.

There are two sides to this story:

  • Vendors pack security updates along with feature upgrades in one package
  • Security updates sometimes require software architecture to be modified

For the above reasons, software updates sometimes break things.

Back to our main story: the fear of getting hacked. This past weekend I faced the same challenge. I had to make sure we were informed ASAP if our website got compromised. I was told to monitor the website manually by visiting it every 2 hours or so, a cumbersome task for a weekend. Since I automate anything that can be automated, or at least I try, I set about writing a tool to monitor our website and notify me if something fishy happens.

Website Deface Detection – The Theory

We want to test whether a website has been defaced. What usually happens is that the whole homepage is replaced with another page that says “You got owned”. There are several techniques and articles about deface detection. I didn’t have time to design and develop a complex machine learning system; what I needed was minimal and effective.

Now let’s see some possible algorithms and methods to determine if a website has been hacked:

1. Checksum verification

This method is mostly helpful when you’re working with static pages. You fetch the content of the webpage, compute a checksum (say, an MD5 hash) and compare it with the checksum of each subsequent response. If the page changes, the checksum changes.

The limitation of this method is that most websites are dynamic nowadays, so no two consecutive requests yield exactly the same output. The date and time shown on a page change, the ViewState variable in .NET websites changes with each request, and some CMSs change ids and CSS class names on each request.
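
To make the checksum idea concrete, here is a minimal sketch. It assumes md5sum from GNU coreutils is installed, and the helper names page_checksum and page_changed are made up for this illustration; they are not part of the final bot.

```shell
# Sketch of checksum-based change detection for a static page.
# page_checksum and page_changed are hypothetical helper names.

# Print the MD5 hash of a file (md5sum ships with GNU coreutils).
page_checksum() {
  md5sum "$1" | cut -d ' ' -f 1
}

# Succeed (exit status 0) when the two snapshots hash differently.
# $1 = baseline snapshot, $2 = current snapshot
page_changed() {
  [ "$(page_checksum "$1")" != "$(page_checksum "$2")" ]
}

# Usage:
#   if page_changed baseline/page.html tmp/page.html; then
#     echo "page changed"
#   fi
```

Keep in mind that any single changed byte flips the result, which is exactly why this only suits static pages.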

2. Diff comparison

Diffing means comparing two responses from a website and measuring the difference between them. You could count the number of differing characters, words or even HTML tags. With this method it is also important to define a threshold beyond which you send an alert. This threshold could be an ‘x’ percent difference in the number of characters or ‘y’ tags added/removed.

2.1. Simple Diff

In the Simple Diff method you count the number of characters or words that were modified.
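
As a rough illustration of a word-level simple diff, the sketch below uses only standard tools (tr, sed, diff, grep). The helper names words and change_percent are hypothetical, and the percentage is computed relative to the baseline, which is one possible definition among several.

```shell
# Put every whitespace-separated word of a file on its own line.
words() {
  tr -s '[:space:]' '\n' < "$1" | sed '/^$/d'
}

# Print the percentage of baseline words that are gone or altered in the
# current snapshot. $1 = baseline file, $2 = current file.
change_percent() {
  base_words=$(mktemp)
  cur_words=$(mktemp)
  words "$1" > "$base_words"
  words "$2" > "$cur_words"
  total=$(wc -l < "$base_words" | tr -d ' ')
  # Lines starting with "<" in diff output are baseline words that no
  # longer appear at that position in the current snapshot.
  removed=$(diff "$base_words" "$cur_words" | grep -c '^<' || true)
  rm -f "$base_words" "$cur_words"
  if [ "$total" -eq 0 ]; then
    echo 0
  else
    echo $(( removed * 100 / total ))
  fi
}
```

With a baseline of “one two three four” and a current page of “one two hacked four”, one of four baseline words has changed, so change_percent prints 25.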

2.2. DOM Tree Comparison

In the DOM Tree Comparison method you don’t simply measure the characters and words that were changed. You parse the DOM and find the tags that were modified. This approach can be more precise but is more costly to implement.
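
A real DOM comparison needs an HTML parser. As a crude stand-in that stays in the shell, the sketch below only counts opening tags per tag name and compares the two histograms. It is not a DOM parse, the function names tag_histogram and tag_changes are invented for this example, and it relies on grep -o, which GNU and BSD grep both provide.

```shell
# Print a "tag count" histogram of the opening tags in an HTML file,
# e.g. "<div 2". This is a regex approximation, not a DOM parse.
tag_histogram() {
  grep -o '<[A-Za-z][A-Za-z0-9]*' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}

# Print how many histogram lines differ between two snapshots.
# $1 = baseline file, $2 = current file
tag_changes() {
  base_tags=$(mktemp)
  cur_tags=$(mktemp)
  tag_histogram "$1" > "$base_tags"
  tag_histogram "$2" > "$cur_tags"
  changes=$(diff "$base_tags" "$cur_tags" | grep -c '^[<>]' || true)
  rm -f "$base_tags" "$cur_tags"
  echo "$changes"
}
```

A page whose text changes but whose markup stays the same scores 0 here, so this is best combined with a word-level diff rather than used on its own.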

3. Sensitive words detection

Deface pages have a lot in common: they almost certainly contain the word “hacked”. Based on this observation we can build a list of words that, if seen on a web page, alert us to a possible deface.

Combining these methods and fine-tuning the thresholds and other parameters will give us a detector with few false positives and false negatives, making our lives much easier. OK, enough theory. Let’s get our hands dirty.
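
A sensitive-word check can be as small as one grep call. In the sketch below the word list is only a guess for illustration (“hacked” and “owned” come from the examples above, “defaced” is my own addition), and contains_deface_words is a hypothetical name.

```shell
# Succeed (exit status 0) when the file contains any of the listed
# words, ignoring case. The word list is an illustrative assumption.
contains_deface_words() {
  grep -q -i -E 'hacked|owned|defaced' "$1"
}

# Usage:
#   if contains_deface_words tmp/page.html; then
#     echo "suspicious wording found"
#   fi
```

A news site that writes about hacking will trip this check, so in practice the list needs tuning per website.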

Implementing the “Website Deface Detection Bot”

Let’s review what we’re trying to achieve. We want to download a website, use the simple diff method to compare it with a clean version of the site and, if the change is over a certain threshold, have the tool notify us via Telegram.

For this, our script will use curl to download a clean baseline version of the site. Then, at a regular interval, the script will re-download the website and compare it to the baseline. If the threshold is exceeded, the Telegram bot comes into play. So for the first part we have to get a clean baseline of our website(s).

1. Generating the baseline

We put the list of URLs we want to monitor in a text file named urls.list, along with the filename of each snapshot. The snapshot files are kept in a directory named baseline.

“urls.list” will look like this:
https://www.bankofamerica.com/ boa.html
https://www.wellsfargo.com/ wf.html

* Note that “boa.html” and “wf.html” are the names of our local files and not part of the URL.

The code below reads the URLs and filenames from urls.list and uses curl to capture the websites into the baseline folder. At the time of this capture, we know that these websites are running normally. We don’t want one website to keep us waiting for minutes, so the connection timeout for each website is set to 5 seconds using the --connect-timeout switch.

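# Make sure the target directory exists before curl writes into it
mkdir -p baseline
# Each line of urls.list holds: <url> <snapshot filename>
while read -r url filename tail; do
  curl --connect-timeout 5 -o "baseline/$filename" "$url"
done < urls.list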
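
A failed download leaves a missing or empty baseline file, and every later comparison against it would be meaningless. The sketch below is one way to double-check the result; missing_baselines is a hypothetical helper, not part of the original scripts.

```shell
# Print the snapshot filename of every urls.list entry whose baseline
# file is missing or empty.
# $1 = list file (lines of "<url> <filename>"), $2 = baseline directory
missing_baselines() {
  while read -r url filename rest; do
    [ -n "$filename" ] || continue          # skip blank or malformed lines
    [ -s "$2/$filename" ] || echo "$filename"
  done < "$1"
}

# Usage:
#   missing_baselines urls.list baseline
```

If this prints anything, re-run the capture for those sites before starting the monitor.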

The directory structure will look like this:

./MonitoringBot
    ./MonitoringBot.sh
    ./urls.list
    ./baseline
        ./boa.html
        ./wf.html

2. Comparing current status of websites with our baseline

Now we have to check the websites periodically and compare the results with our baseline.

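#!/bin/bash
# Compare the current state of each site with its baseline snapshot.
counter=1
threshold=20                       # alert above this percentage of changed words
mkdir -p ./tmp ./results           # working directories must exist
lineCount=$(wc -l < ./urls.list)   # number of sites, used for progress output
while read -r url filename tail; do
  curl -s --connect-timeout 5 -o "./tmp/$filename" "$url"
  # wdiff -s appends per-file statistics, including the "% changed" figure
  wdiff -s "./tmp/$filename" "./baseline/$filename" > "./results/$filename"
  change=$(grep "% changed" "./results/$filename" | head -n 1)
  # Split the statistics line on whitespace; field 10 is the changed percentage
  spacesplitarr=( $change )
  changepercent=${spacesplitarr[10]}
  result=${changepercent//%/}
  echo "$result" > "./results/$filename"
  if [ -z "$result" ]
  then
    # No percentage means the download or the comparison failed
    echo "($counter/$lineCount): $filename could not be checked."
  else
    if [ "$result" -gt "$threshold" ]
    then
      echo "Please check $url"
      echo "Please check $url More than $threshold % change detected." | ./notify.sh
    fi
    echo "($counter/$lineCount): $filename done. $result % Change."
  fi
  counter=$((counter+1))
done < urls.list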

In the code above we fetch a current snapshot of the website, save it to “./tmp/$filename” and then compare it with our baseline. Note that we use wdiff, which compares the two files word by word. A better approach would be to parse and compare the DOM. We parse wdiff’s statistics output to extract the percentage of change between the two snapshots. The threshold is set to 20%, and if the change exceeds this amount we notify the user in the terminal and also through the “notify.sh” script, which sends a message using the Telegram bot. We also print our progress in the terminal, for example (3/15) links checked.

You can play with this threshold to find the optimal value. Websites typically show a small amount of change between requests, but 20% or 50% could be good values to start with.
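
The script picks the percentage out of the statistics line by position, which is easy to get wrong. The sketch below looks for the field right before the word “changed” instead, and treats a missing value as suspicious. It assumes the statistics line has the shape the script above relies on (for example “file1: 5 words  4 80% common  0 0% deleted  1 20% changed”); changed_percent and exceeds_threshold are hypothetical names.

```shell
# Read wdiff -s output on stdin and print the "changed" percentage of
# the first statistics line, without the % sign.
changed_percent() {
  awk '/% changed/ {
    for (i = 2; i <= NF; i++)
      if ($i == "changed") { sub(/%/, "", $(i - 1)); print $(i - 1); exit }
  }'
}

# Succeed when the percentage is above the threshold. An empty or
# non-numeric value (for example after a failed download) also counts
# as exceeding it, which is a deliberate design assumption here.
# $1 = percentage, $2 = threshold
exceeds_threshold() {
  case "$1" in
    ''|*[!0-9]*) return 0 ;;
  esac
  [ "$1" -gt "$2" ]
}

# Usage:
#   result=$(wdiff -s tmp/page.html baseline/page.html | changed_percent)
#   if exceeds_threshold "$result" 20; then echo "alert"; fi
```

Alerting on a missing value means a site that is simply down also triggers a message, which may or may not be what you want.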

3. Notifying the user via Telegram Bot

Telegram has put up a nice guide about its bots. Bots are like normal Telegram users that can send messages, receive commands and act upon them. To create a Telegram bot you have to contact @BotFather, a bot that is itself responsible for creating bots. There are lots of features available for bots, so take a look at the guide. For now we want to create a simple bot that sends messages.

3.1. Creating Telegram Bot

  • Obtaining Access Token

Contact @BotFather using your Telegram account. @BotFather will start the conversation and guide you through the process. Upon completion you will be given an access token for the HTTP API. Your bot has now been created; write down its access token.

  • Getting chat id

We need a chat id to tell the bot where to send its messages. For this we can use @get_id_bot. Start a chat with @get_id_bot and send the /my_id command. It will give you a numerical id that we will use later on.

If you want your bot to send messages to a group instead of a personal chat, add @get_id_bot to that group, send the /my_id command there and use the id provided for that group.

4. Sending notifications with notify.sh

Now that we have our access token and chat id we can use the following script:

#!/bin/bash
#
#  This script sends whatever is piped to it as a message to the specified Telegram chat
#
message=$( cat )
apiToken=
# example:
# apiToken=123456789:AbCdEfgijk1LmPQRSTu234v5Wx-yZA67BCD
userChatId=
# example:
# userChatId=123456789

sendTelegram() {
        # --data-urlencode keeps characters such as % and & in the message intact
        curl -s \
        -X POST \
        "https://api.telegram.org/bot$apiToken/sendMessage" \
        --data-urlencode "text=$message" \
        -d "chat_id=$userChatId"
}

# [[ ]] is a bash feature, hence the bash shebang above
if [[ -z "$message" ]]; then
        echo "Please pipe a message to me!"
else
        sendTelegram
fi

5. Running The Bot

To run the bot there’s a file named scheduler.sh, which runs diffsites.sh every 5 minutes.

#!/bin/bash
while true
do
 # diffsites.sh uses bash arrays, so run it with bash rather than sh
 bash diffsites.sh
 sleep 300
done

To run the code in the background use nohup. The output of the script will be written to the nohup.out file:

nohup ./scheduler.sh &

Now you can sit back and have a cup of coffee while your bot keeps monitoring the websites and alerts you as soon as it sees a change.

Final Code

The final code is available on GitHub at https://github.com/silverfoxy/DefaceDetectionBot
