A Hack By Any Other Name — Part 5

Part 4 described the technology behind the Skeptical Science website and our initial response to the hack.

Detective Del Spooner: Hansel and Gretel.
Susan Calvin: What?
Detective Del Spooner: Two kids, lost in the forest. Leave behind a trail of bread crumbs.
Susan Calvin: Why?
Detective Del Spooner: To find their way home. How the hell did you grow up without reading Hansel and Gretel?
Susan Calvin: Is that really relevant?
Detective Del Spooner: Everything I'm trying to say to you is about Hansel and Gretel. You didn't read it, I'm talking to the wall.
I, Robot (2004)

A Needle in a Field of Haystacks Made of Needles

Many websites use the Apache server program to deliver web pages and files to users, or to execute programs that create those pages on the fly.  The Apache program, like any web server, writes log files.  Those log files (commonly called the “apache logs”) normally contain a fair amount of information on every request made by every visitor to the site, from web pages to every image, JavaScript file, or other file included on a page.  This includes details about the visitor that are of great value in compiling visitor statistics.  It’s not complete, but it’s a start.

Here is one of the hacker’s entries from February 23rd:

77.247.181.165 www.skepticalscience.com - [23/Feb/2012:04:52:05 +1100] "GET /comments.php HTTP/1.1" 200 22031 "https://skepticalscience.com/" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0" "PHPSESSID=ab1a5faa88ac1878784dcfa719dca226; __utma=198451757.12232104.1329923284.1329923284.1329923284.1; __utmb=198451757.52.10.1329923284; __utmc=198451757; __utmz=198451757.1329923284.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); expanded_dir_list=%3A%3Ahome%3A7-web%3A74%3A95%3Askepticalscience.com%3Apublic%3Awww%3A%3Apics; fm_root_atual=%2Fhome%2F7-web%2F74%2F95%2Fskepticalscience.com%2Fpublic%2Fwww%2F%2F; loggedon=d41d8cd98f00b204e9800998ecf8427e; order_dir_list_by=6D; UserId=6318" 882 22405 urchindyn www.skepticalscience.com

Here are the key elements of that same entry, pulled out to make them easier for the untrained eye to find:

IP address: 77.247.181.165
Date and time: 23/Feb/2012:04:52:05 +1100
Request: GET /comments.php
Status code: 200
Bytes served: 22031
User agent: Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0
Session ID: PHPSESSID=ab1a5faa88ac1878784dcfa719dca226
User ID: UserId=6318

From this entry we can see that the hacker’s IP address (at least at that moment) was 77.247.181.165.  The time was 4:52 AM, AEDT, on February 23rd (the server resided in Sydney, Australia, at the time, so all times are AEDT).  The web page he was hitting was comments.php, which is the Skeptical Science page that lists the most recently added comments.  It’s available in the menubar at the top of every page, labeled as “Recent Comments”.  He successfully accessed the page, as demonstrated by the 200 status code, and the size of the resulting page that he was actually served was 22,031 bytes.  His browser is being reported as Firefox, version 5.0, for Windows.  I say “reported” because it is up to the browser to supply this information to the server, and as such it can be spoofed.  In fact, that’s exactly what the Tor browser does; it reports a common “user agent” (which is what this piece of data is called) rather than identifying itself as the Tor browser.

After the user agent comes a complex list of the visitor’s cookies.  Beyond his IP address, the cookies hold two very important pieces of identifying information, either of which may or may not be present in a given entry.  His session ID was ab1a5faa88ac1878784dcfa719dca226.  This is a unique, randomly generated identifier assigned to each user’s visit to the site.  It’s what lets a website tie all of your activity together, such as the fact that you’ve logged in and who you are.  It automatically expires after 30 minutes of inactivity, to be replaced by a new ID should you return.

His user ID, as you can see, was 6318.  That number corresponds to the user name “francois.”  The Skeptical Science programs don’t use the user name itself behind the scenes, in the programs and in most database tables; a number associated with each user name is used instead.  The ID number associated with the hacker’s francois user name was 6318.
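For readers who would like to see those fields pulled out mechanically, here is a minimal Python sketch. The field layout is inferred from the sample entry above rather than taken from the server's configuration, the function name `parse_entry` is my own, and the sample line is the February 23rd entry with most of its cookies trimmed for brevity.

```python
import re
from datetime import datetime

# Layout inferred from the sample entry: IP, virtual host, a dash, [timestamp],
# "request", status, bytes, "referer", "user agent", "cookies", then trailing fields.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) (?P<vhost>\S+) \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookies>[^"]*)"'
)


def parse_entry(line):
    """Split one log line into a dict of fields, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Break the cookie string into name/value pairs.
    cookies = {}
    for pair in entry["cookies"].split("; "):
        name, sep, value = pair.partition("=")
        if sep:
            cookies[name] = value
    entry["cookies"] = cookies
    # The two identifiers discussed above; either may be absent.
    entry["session_id"] = cookies.get("PHPSESSID")
    entry["user_id"] = cookies.get("UserId")
    return entry


# The entry from February 23rd, with most cookies removed to keep it short.
sample = (
    '77.247.181.165 www.skepticalscience.com - [23/Feb/2012:04:52:05 +1100] '
    '"GET /comments.php HTTP/1.1" 200 22031 "https://skepticalscience.com/" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0" '
    '"PHPSESSID=ab1a5faa88ac1878784dcfa719dca226; __utmc=198451757; UserId=6318" '
    '882 22405 urchindyn www.skepticalscience.com'
)
entry = parse_entry(sample)
print(entry["ip"], entry["time"].isoformat(), entry["path"], entry["user_id"])
```

The user agent is stored exactly as the browser reported it, so it remains only as trustworthy as the visitor wants it to be.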

These are vitally important things to know, because they let us tie the hacker’s activity together.  Because his IP address (and other data) was constantly changing, you can’t just find every entry for IP address 77.247.181.165.  And that’s only a fraction of what he changed.  Various aspects of his session (his footprints) changed over time.  Consider the following subsequent log entries, from which I’ve filtered out some confusing, intervening entries to make my point clearer:

87.225.253.174 www.skepticalscience.com - [23/Feb/2012:04:52:23 +1100] "POST /sksadmin.php?Action=Edit&UniqueIdentifier=1&TableName=topic&Search= HTTP/1.1" 200 34372 "-" "FAST Enterprise Crawler/6 (www.fastsearch.com)" "UserId=4955" 316 34780 urchindyn www.skepticalscience.com

77.247.181.163 www.skepticalscience.com - [23/Feb/2012:04:56:27 +1100] "POST /sksadmin.php?Action=Edit&UniqueIdentifier=24&TableName=topic&Search= HTTP/1.1" 200 34373 "-" "FAST Enterprise Crawler/6 (www.fastsearch.com)" "UserId=4955" 317 34781 urchindyn www.skepticalscience.com

77.247.181.163 www.skepticalscience.com - [23/Feb/2012:04:57:53 +1100] "GET / HTTP/1.1" 200 20474 "-" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0" "__utma=198451757.12232104.1329923284.1329923284.1329923284.1; __utmb=198451757.53.10.1329923284; __utmc=198451757; __utmz=198451757.1329923284.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)" 520 20912 urchindyn skepticalscience.com

77.247.181.163 www.skepticalscience.com - [23/Feb/2012:04:59:46 +1100] "GET /thread.php HTTP/1.1" 200 7994 "-" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0" "__utma=198451757.12232104.1329923284.1329923284.1329923284.1; __utmb=198451757.54.10.1329923284; __utmc=198451757; __utmz=198451757.1329923284.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); PHPSESSID=ba02044b4d80154303866f6fc0da1017" 574 8367 urchindyn skepticalscience.com

In these entries you can see that by this point his IP address had shifted to 87.225.253.174.  His user ID is now 4955.  He’s begun using a completely different user agent, one he has not bothered to mask, something called “FAST Enterprise Crawler,” which is a program used to automate the rapid download of many pages from a website.

Some minutes later (I removed many intervening “FAST Enterprise Crawler” entries for that same IP address and several others), his IP address has changed to 77.247.181.163, while his user ID remains 4955, but he has avoided getting (or maintaining) a session ID, an artifact of the Crawler program he is using.  A minute and a half after the last Crawler hit, his user agent goes back to his Firefox browser, but his previous session ID and user ID are gone.  The cookies don’t carry over from the Crawler to his browser, of course, but the IP address relates the two.  A little while after that, he’s begun a new session (ID ba02044b4d80154303866f6fc0da1017), without logging on as any user.

As you can see, tying together all of the hacker’s activity is a bit of a puzzle.  One needs to consider all of the combinations of IP addresses, session IDs, user IDs, and some other tricks to find all of the different spots in the logs that represent all of the hacker’s moves.
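One way to picture that puzzle is as a graph: any two identifiers that appear in the same log entry are linked, and the hacker's footprints are whatever can be reached from a known starting identifier. The sketch below is only an illustration of that idea and not the procedure we actually followed; the function name `link_identifiers` and the unrelated visitor at 203.0.113.9 are made up for the example, and the entries are assumed to have already been reduced to IP, session ID and user ID.

```python
from collections import defaultdict


def link_identifiers(entries, seeds):
    """Return every (kind, value) identifier reachable from the seed identifiers."""
    # Identifiers that appear together in one entry are treated as linked.
    graph = defaultdict(set)
    for e in entries:
        ids = [(k, e[k]) for k in ("ip", "session_id", "user_id") if e.get(k)]
        for a in ids:
            for b in ids:
                if a != b:
                    graph[a].add(b)
    # Walk outward from the seeds until nothing new turns up.
    found = set(seeds)
    stack = list(seeds)
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


# Reduced versions of the entries shown above, plus one unrelated (invented) visitor.
entries = [
    {"ip": "77.247.181.165", "session_id": "ab1a5faa88ac1878784dcfa719dca226", "user_id": "6318"},
    {"ip": "87.225.253.174", "session_id": None, "user_id": "4955"},
    {"ip": "77.247.181.163", "session_id": None, "user_id": "4955"},
    {"ip": "77.247.181.163", "session_id": None, "user_id": None},
    {"ip": "77.247.181.163", "session_id": "ba02044b4d80154303866f6fc0da1017", "user_id": None},
    {"ip": "203.0.113.9", "session_id": "ffff", "user_id": "1234"},
]
found = link_identifiers(entries, [("user_id", "4955")])
for kind, value in sorted(found):
    print(kind, value)
```

A caveat worth stating loudly: shared IP addresses (a Tor exit node, an office network) can link visitors who have nothing to do with one another, so anything a sweep like this turns up still has to be checked by eye.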

Beyond this, something which is not apparent to the reader, because I’ve saved you from that complication, is that these logs are intermixed with the activity of all of the visitors accessing the site at the same time as the hacker.  Between the first and last of the four log entries shown above were 10,556 such entries for various visitors, of which only 102 belonged to the hacker (and that number is only that high because he was using FAST Enterprise Crawler at the time to automate his actions).

The full log for the day of February 23rd had 1,100,905 such entries.  Finding the hacker in that is like finding a needle in a haystack of needles.

But it’s worse than that.  To use the Hansel and Gretel analogy, at least the siblings started at one end of the trail of breadcrumbs.  In our case, at the beginning of the search, we did not know when the hack took place, or how to identify any of the hacker’s activity.  The starting point could have been in March, February, January, or even earlier.  Before we could find the breadcrumb trail, we first had to wander through the forest, hoping to stumble across some recognizable fraction of it.

Or, to use the original analogy, our task was like finding a needle in a field full of haystacks made of needles.  The number of log entries for just the year of 2012 to that point was somewhere around 85 million.

 

Malvin: I can't believe it, Jim. That girl's standing over there listening and you're telling him about our back doors?
Jim Sting: [yelling] Mister Potato Head! Mister Potato Head! Back doors are not secrets!
Malvin: Yeah, but Jim, you're giving away all our best tricks!
Jim Sting: They're not tricks.
Wargames (1983)

March 26, 2012 — 2:27 PM EST — Backdoors

Even before the war room was ready, John, Doug and I were feverishly trying to puzzle out, in the darkness (well, it was dark in Australia, but mid-day outside of Boston), how the hacker got in.  Like everyone else, we were distracted by the SQL injection log files, assuming that must have been the first point of entry.  That wasted a lot of our time, but it provided the starting point for properly tracing the hack.  We talked through a lot of options.  The obvious thought was that somehow the hacker got an injection log file, stole John’s credentials, and logged in that way.  Except that wouldn’t give you access to the entire database, only to John’s special administrative functions.  The altered compilation of the forum refuted that approach.

I asked John if the hacker could have gotten hold of a database backup, but the answer to that was an instant “no”.  First, the script which created the backups put them, as any such script should, into the document tree above the publicly accessible pages.  Unlike the SQL injection log files, the backups simply weren’t accessible through a browser, even if you knew where they were.  It was not possible.  Second, the backups were also quickly moved off site and deleted, as the last phase of the backup script.  There was no way the hacker got a database backup that way, unless he had web host administration credentials, which also would give him direct access to the database and he wouldn’t need a backup.

There were only two ways a hacker could get such credentials.

If the hacker either worked at or knew someone who worked at the web host service, he could steal John’s credentials that way.  This is how Edward Snowden accomplished much of his “hack.”  He just made use of an ID that had been briefly “loaned” to him by a (since fired) friend at the NSA.  One of the most famous hackers of all time, Kevin Mitnick, claims that he solely used such social engineering to steal credentials for unauthorized access.

Yet another possibility was that the hacker had hacked into either Doug’s or John’s personal home computer, somehow stole web host administrative credentials from there, and got into the database that way.

Neither of these options seemed likely.  One also couldn’t get such web host credentials from the SQL injection log files.  They only had John’s website credentials, which only let you do things through the application itself.  Along those lines, I asked John to bump up my user capabilities to match his, so that I could see exactly what someone could do by logging in as John.

That led to the first issue we found.  There would usually be two ways of maintaining data in the system.  One would be “through the application,” through the pages of the web site itself.  But that would of course be limited to what John had programmed.  If he’d made a page to let you do something, then that was how you did it and all you could do.  If there was no page for doing something, then John’s other alternative was to access the database directly, through his web host credentials and tools.  But doing things that way was often cumbersome.

To get around that, John programmed one more thing, something that many a programmer, myself included, has done from time to time.  He gave himself a backdoor into the database, a generic administrative page that let him view and change any row in the database in any table, a “super admin panel”.  It even let one view the contents of the table, although only (like comments and forum threads) fifty rows at a time.  One couldn’t download the entire database, or even an entire table, but it made it easy to find, view and modify single table rows.  This was particularly important in the infancy of the system, when at first only he, and eventually Dan Bailey, handled all of the maintenance and user requests and everything themselves.

Something like that is a boon to a developer, but is also a horrible security risk on a public website.  Someone with John’s credentials would be able to study and corrupt the entire database.  It couldn’t be the way that the hacker stole the entire database (not 46,000 posts at 50 posts a page), but it was too dangerous to exist, let alone when we knew there was a hacker in the system.

I probably went overboard in my email to John, where I said:

“OH MY GOD THIS SHOULD NOT EXIST!!!  NO  NO NO!!!!  You need to kill this functionality. I'm sure it helps, but it can't be here or anywhere!”

John shut the super admin panel down immediately.  He also checked to see how many users had that capability, and the answer was “too many”, including a “test” ID he’d created years back, called uwatest — ID 4955.  He didn’t even remember having given it “super” capabilities.  Either way, he quickly disabled that ID.

The second security risk (after the SQL injection log files) had now been closed.

Again, this couldn’t have been the way in.  It might have helped the hacker out, but it didn’t get him into the locked room.  His way in was still unknown.

 

The Warlock: [to Matt] Why did you bring a cop to my command center?
John McClane: [laughs] Command center? It's a basement.
-- Live Free or Die Hard (2007)

March 26, 2012 — 5:12 PM EST — The War Room

While I was stomping out fires, Doug Bostrom was doing the truly important and more involved work of setting up a separate server that we could use to go through the log files and the programs to figure out what had happened.  Doug’s foremost job was to make that possible.

Doug set up a networked Linux computer, to which he copied all of the programs and the logs.  We and any others that we invited to help could connect to it to conduct research without either detection or interference by the hacker.  Doug even set up a working second copy of the entire website so that we could safely test our theories of how the intrusion occurred.

He also got a head start looking at the log files himself, which led to our first scent of the hacker’s trail.

With our war room set up, we were at last free to roll up our sleeves and start the hunt.

 

Taz 'Rat' Finch: How many languages do you speak?
Dr. Conrad Zimsky: Five, actually.
Taz 'Rat' Finch: Well, I speak one... One Zero One Zero Zero. With that I could steal your money, your secrets, your sexual fantasies, your whole life. Any country, any place, any time I want. We multitask like you breathe. I couldn't think as slow as you if I tried.
The Core (2003)

“grep sed gawk”

One more bit of technobabble is necessary to understand what transpired next.  As has been stated, the log files just for the year of 2012, up to the date the hacker released the files, amounted to more than 85 million entries.  There was no way that a human being was going to go through that and find anything.

To work with the files, the quickest, simplest tools are a set of standard Unix programs known as grep, sed and gawk (along with cut, more, cat, and some others that are too basic to mention).

“grep” lets a programmer quickly scan a file for a pattern.  For example, the following:

grep '^77\.247\.181\.165' logfile.txt

will find every log entry in logfile.txt that begins with the IP address 77.247.181.165.  The following will find all such entries, while excluding the various image and css files that just confuse the trail:

grep '^77\.247\.181\.165' logfile.txt | grep -Evi '\.gif HTTP/1.[01]|\.jpg HTTP/1.[01]|\.css HTTP/1.[01]|\.js HTTP/1.[01]|\.png HTTP/1.[01]|\.ico HTTP/1.[01]'

A programmer needs to know all about “regular expressions” to create the proper patterns, as well as being aware of important options like -v (find everything that doesn’t match the pattern).
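For anyone more at home in Python than at a Unix prompt, the same two-step filter (keep lines that start with the IP address, then drop requests for images, stylesheets and scripts) might look roughly like this. It is a sketch rather than what we ran; the function name `hacker_page_hits`, the shortened sample lines, the `/images/logo.GIF` path and the 203.0.113.9 address are all invented for the illustration.

```python
import re

# Same idea as the anchored grep pattern: the line must begin with this IP address.
STARTS_WITH_IP = re.compile(r"^77\.247\.181\.165")
# Same idea as the -v pattern: requests for static files, matched case-insensitively.
STATIC_ASSET = re.compile(r"\.(gif|jpg|css|js|png|ico) HTTP/1\.[01]", re.IGNORECASE)


def hacker_page_hits(lines):
    """Keep lines from the IP address of interest that are not static-file requests."""
    return [
        ln for ln in lines
        if STARTS_WITH_IP.search(ln) and not STATIC_ASSET.search(ln)
    ]


# Shortened, partly invented log lines for demonstration.
lines = [
    '77.247.181.165 www.skepticalscience.com - [23/Feb/2012:04:52:05 +1100] "GET /comments.php HTTP/1.1" 200 22031',
    '77.247.181.165 www.skepticalscience.com - [23/Feb/2012:04:52:06 +1100] "GET /images/logo.GIF HTTP/1.1" 200 512',
    '203.0.113.9 www.skepticalscience.com - [23/Feb/2012:04:52:07 +1100] "GET /comments.php HTTP/1.1" 200 22031',
]
hits = hacker_page_hits(lines)
for ln in hits:
    print(ln)
```

The `not` in the list comprehension plays the role of grep's `-v` option, and `re.IGNORECASE` the role of `-i`.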

“sed” lets a programmer quickly change a file, in this case to remove extraneous detail that makes it more difficult to read and focus on what matters.  For example, the following command removes many of the extraneous, confusing cookies:

sed -E 's/__utm[a-z]+=[^;"]*;? ?//g' logfile.txt >logfilecleaned.txt
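The same cleanup can be sketched in Python. The one subtlety is that the pattern has to stop at the semicolon or quote that ends each cookie; a greedy `.*` would run on to the last quote on the line and swallow the session ID and user ID along with the clutter. The function name `strip_utm_cookies` and the shortened cookie string are my own, for illustration only.

```python
import re

# Match one __utm cookie: its name, its value up to the next ';' or '"',
# and the "; " separator that follows it, if there is one.
UTM_COOKIE = re.compile(r'__utm[a-z]+=[^;"]*(?:; )?')


def strip_utm_cookies(line):
    """Remove the __utm tracking cookies from a log line, leaving other cookies alone."""
    return UTM_COOKIE.sub("", line)


# A shortened cookie field, invented for the example.
cookie_field = '"PHPSESSID=abc; __utma=1.2.3; __utmz=1.utmcsr=(direct)|utmcmd=(none); UserId=6318"'
cleaned = strip_utm_cookies(cookie_field)
print(cleaned)
```

If a `__utm` cookie happens to be the last one in the field, the separator in front of it is left behind, which is untidy but harmless for reading purposes.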

The commands must often be strung together, such as the following, which finds all activity for user ID 6304 from every daily log file in February and March, and from it makes a list of the IP addresses used:

grep -h '[" ]UserId=6304[;"]' 0[2-3][0-2][0-9]s.txt | cut -d' ' -f1 | sort | uniq >id6304ips.txt
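As a rough Python counterpart to that pipeline, the sketch below collects the distinct IP addresses seen with a given user ID. The function name `ips_for_user` is hypothetical, the log lines are cut down to the bare minimum, and 198.51.100.7 is an invented address; only 199.48.147.37 and user ID 6304 come from the real logs.

```python
import re


def ips_for_user(lines, user_id):
    """Return the sorted, de-duplicated IP addresses of lines carrying this UserId cookie."""
    # The characters on either side keep UserId=6304 from matching UserId=63041.
    pattern = re.compile(r'[" ]UserId=' + re.escape(user_id) + r'[;"]')
    # The IP address is the first space-separated field, as with cut -d' ' -f1.
    return sorted({ln.split(" ", 1)[0] for ln in lines if pattern.search(ln)})


# Minimal, partly invented lines for demonstration.
lines = [
    '199.48.147.37 www.skepticalscience.com - "GET /logs/2012-02-21.txt HTTP/1.1" 200 "UserId=6304; order_dir_list_by=1A"',
    '198.51.100.7 www.skepticalscience.com - "GET /comments.php HTTP/1.1" 200 "PHPSESSID=abc; UserId=6304"',
    '199.48.147.37 www.skepticalscience.com - "GET / HTTP/1.1" 200 "UserId=6304"',
    '203.0.113.9 www.skepticalscience.com - "GET / HTTP/1.1" 200 "UserId=63041"',
]
ips = ips_for_user(lines, "6304")
print(ips)
```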

Eventually, however, such commands become awkward and too limited.  “gawk” (or its ancestor, “awk”) does everything that grep and sed and the rest do, but with far more power.  gawk is a complete programming language/utility, which lets a programmer easily write more complex programs that can consider multiple patterns, multiple files or rows at one time, save discoveries and data in memory for later use, and more.  If “grep” is a machine gun, “gawk” is a main battle tank.

If you have quick fingers, know your regex (regular expressions), and know what it is you want to do, then scanning 85 million rows representing 90 different days chunked into 1,000 separate files becomes, if not a piece of cake, at least feasible.

 

Inspector: [looking for fingerprints on a keyboard] Dust the colon and the backslash key! Only geeks use those keys.
Antitrust (2001)

March 26, 2012 —  2:12 PM PST — Happy Hunting

At two in the afternoon in Seattle (I don’t know this, but I’m pretty sure it must have been raining), Doug sent me my own credentials and access information for making use of the war room.  He then mentally rolled up his sleeves and dove into the logs.  While I was still stumbling around, trying to figure out what was where and what I was even looking at, he perused a number of things, ultimately running a search to find all accesses of the SQL injection log files.  At 3:58 PM PST he sent John and me an e-mail with the subject line “bingo”.  He’d located the first dump of any log file, on February 21st.  This looked, then, like it was the likely beginning of the intrusion.  It also gave us an IP address with which to start.

199.48.147.37 www.skepticalscience.com - [21/Feb/2012:10:07:41 +1100] "GET /logs/2012-02-21.txt HTTP/1.1" 200 3661007 "-" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0" "PHPSESSID=dc2cbe0b2abcf0f069b234019197c8ca; __utma=198451757.686892950.1329774687.1329774687.1329774687.1; __utmb=198451757.25.10.1329774687; __utmc=198451757; __utmz=198451757.1329774687.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); UserId=6304; expanded_dir_list=%3A%3Ahome%3A7-web%3A74%3A95%3Askepticalscience.com%3Apublic%3Awww%3A%3Aimages%3A%2Fhome%2F7-web%2F74%2F95%2Fskepticalscience.com%2Fpublic%2Fwww%2F%2F; fm_root_atual=%2Fhome%2F7-web%2F74%2F95%2Fskepticalscience.com%2Fpublic%2Fwww%2F%2F; loggedon=d41d8cd98f00b204e9800998ecf8427e; order_dir_list_by=1A" 920 3664947 urchindyn www.skepticalscience.com

Doug also discovered, while looking at the long list of log file downloads the hacker had conducted through February and March, that the hacker used a different IP address every time.  A quick check revealed that they were all Tor relay nodes.  They’d give no hint as to the hacker’s true identity, and would also make piecing his activity together that much more difficult.

We also looked for 404 errors, attempts by the hacker to download files that didn’t exist.  There were a few such errors throughout the month, reflecting the hacker’s groping attempts to grab files that had either already been purged, or else didn’t yet exist.  But there were no such errors on February 21st, and none at all prior to his first, successful grab of 2012-02-21.txt.  On the day he started, he went straight and unerringly to that log file, with no guessing involved.  This confirmed, once and for all, that the log files were only a subject of, not the cause of, the hack.
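The check described above amounts to two questions: which requests touched the `/logs/` directory, and were there any failed guesses before the first success? A small Python sketch of that reasoning follows. The function names are my own, the log lines are shortened, and the `/logs/2012-03-30.txt` request is invented to give the example a 404 to find.

```python
import re

# Pull the requested path and the status code out of a log line.
REQUEST = re.compile(r'"(?:GET|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def log_file_requests(lines, prefix="/logs/"):
    """Return (path, status) for every request under the given directory, in log order."""
    hits = []
    for ln in lines:
        m = REQUEST.search(ln)
        if m and m.group("path").startswith(prefix):
            hits.append((m.group("path"), int(m.group("status"))))
    return hits


def misses_before_first_success(hits):
    """List the 404s seen before the first 200; an empty list means no guessing came first."""
    misses = []
    for path, status in hits:
        if status == 200:
            break
        if status == 404:
            misses.append(path)
    return misses


# Shortened lines: the real first grab, then an invented later miss.
lines = [
    '199.48.147.37 www.skepticalscience.com - [21/Feb/2012:10:07:41 +1100] "GET /logs/2012-02-21.txt HTTP/1.1" 200 3661007 "-"',
    '199.48.147.37 www.skepticalscience.com - [21/Feb/2012:10:09:00 +1100] "GET /comments.php HTTP/1.1" 200 22031 "-"',
    '203.0.113.9 www.skepticalscience.com - [22/Feb/2012:11:00:00 +1100] "GET /logs/2012-03-30.txt HTTP/1.1" 404 512 "-"',
]
hits = log_file_requests(lines)
print(hits)
print(misses_before_first_success(hits))
```

If the log contains no successful download at all, the second function simply returns every 404 it saw.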

But most critically, we had a starting point in the hacker’s breadcrumb trail.

Things accelerated from there, except for the fact that we all had full time jobs, and worked in three different time zones.  As John was going to bed, I was just waking up, and Doug was in a deep sleep.  That alone greatly impeded communication and teamwork.  Even during our disparate daylight hours, we had many other things to do.  I, in particular, was knee-deep in my paying project at the time.  I had no time during the day, and little energy left over after five to help in researching the hack, and what time I did put into the hack sucked energy from me for everything else.

 

Susan Calvin: [panicking] Uhh... End Program! Shutdown!
Detective Del Spooner: [clicks remote, stereo off] Doesn't feel good, does it? People's shit malfunctioning around you.
I, Robot (2004)

March 27, 2012 — 9:45 AM, AEDT — Nerves

At 9:45 AM John reported yet another SQL injection attack consisting of 49 distinct queries, originating from Jakarta, Indonesia.  It was probably harmless, especially considering that we’d already been hacked, but it certainly added to our nerves.  We’d also received a foolish, taunting e-mail, one quickly traced to someone in Houston, Texas, someone who hadn’t the desire or skill to hide his IP address.  It was obviously not from the hacker, but it demonstrated the mind-set of the people who were enjoying the fact that we’d been hacked:

Date: 26 March 2012 2:18:31 AM AEST
Name: Hacked?
Email: Hacked@gmail.com
Message: There is more to come and you will not like it.

A little later (just after midnight in Seattle) Doug also discovered that the SQL injection log file from February 21st was corrupted.  Every log file begins with entries on or moments after 00:00:00 on that day, denoting the first activity on a site that is in constant use, around the world and around the clock.  But the log file from February 21st started with a query at 12:07 PM.  Twelve hours were missing from that log file, and the explanation for why the file was incomplete was obvious.  The hacker had truncated the file to cover his tracks.
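The test Doug applied by eye is easy to state in code: take the timestamp of the first entry in a daily file and see how far it sits from midnight. This sketch assumes the timestamp has already been parsed, since the format of the injection log isn't shown here; the function names and the five-minute tolerance are arbitrary choices of mine for illustration.

```python
from datetime import datetime, timedelta


def leading_gap(first_entry_time):
    """Return how long after midnight the first entry of a daily log was written."""
    midnight = first_entry_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return first_entry_time - midnight


def looks_truncated(first_entry_time, tolerance=timedelta(minutes=5)):
    """Flag a daily log whose first entry is suspiciously late for a busy site."""
    return leading_gap(first_entry_time) > tolerance


# The February 21st file started at 12:07 PM; the seconds are not known, so 0 is assumed.
first_entry = datetime(2012, 2, 21, 12, 7, 0)
print(leading_gap(first_entry), looks_truncated(first_entry))
```

On a quiet site a late first entry would prove nothing, which is why the tolerance only makes sense for a site that is busy around the clock.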

Part 6 describes the methods used by the hacker and details his activity on February 21st.

What had the hacker done during those missing twelve hours?

To be continued...

Posted by Bob Lacatena on Thursday, 13 March, 2014


Creative Commons License The Skeptical Science website by Skeptical Science is licensed under a Creative Commons Attribution 3.0 Unported License.