"Crack Me If You Can"


Team john-users


Resources

Active Members	16
Nicks	Aleksey Cherepanov, bartavelle, Brad Tilley (team 16Crack), elijah, Frank Dittrich, groszek, guth, Isif, JimF, Matt Weir, RichRumble, samu, Sergey, smooge, Solar Designer, ukasz
Software	John the Ripper, custom scripts, 16Crack
Hardware	Average of 150 CPU cores, 4 GPUs.

Preface

The contest was fun and challenging; it helped us test some experimental John the Ripper code and identify areas for further improvement. As of this writing (August 18, 2011), we already have experimental patches implementing MSCash2 in CUDA (thanks, ukasz) and pkzip encryption cracking (thanks, JimF). We did not have those before or during the contest...

We'd like to thank KoreLogic for organizing the event. We would also like to thank all other teams who participated and made it tough for us to compete. ;-)

Resources

In addition to the 16 active team members listed above, Brandon Enright contributed four 8-core Amazon EC2 instances (32 cores total) and Michael Boman provided remote access to a quad-core machine.

Software: John the Ripper (with various patches), custom scripts, 16Crack (used by Brad only), pdfcrack (no luck), fcrackzip (no extra cracks compared to trivial shell scripts around unzip), rarcrack and crark (no luck, but JtR cracked the password instead), ElcomSoft's password recovery tools (no additional cracks).

Hardware: mostly 8-core servers (some of them busy with other tasks at the same time), but also all other kinds of machines (desktops, laptops, servers) ranging from dual-core to 12-core, plus the Amazon EC2 instances mentioned above. Three low-end to mid-range NVIDIA GPUs were used, only on phpass hashes, with john-1.7.8-allcuda-0.2 by ukasz; one ATI Radeon HD 5770 was used for real-world-ish testing of john-1.7.8-jumbo-5-opencl-1 rather than to make much progress in the contest. The number of CPU cores in use grew slowly from 0 to approximately 300 by the end of the contest (we did not prepare well, so some machines were put to use as late as 3 hours before the contest ended, and some of the servers were not appropriate to use without someone watching over them), with the average estimated at around 150.

Preparations

Two days before the contest started, we restored our file exchange server (actually an OpenVZ container) from last year's backup dump and started creating accounts for some new team members. (The scripts used to process and submit cracked passwords had to be revised slightly for the new contest, but the details were not known before the contest started, so this was done during its first hours.)

With John the Ripper being our primary tool (in fact, almost the only password cracking tool we used), and with us having access to many more CPUs than GPUs, we needed a way to manage many CPU cores efficiently. Thus, we made a customized contest-only edition of John the Ripper and wrote some scripts (unfortunately, these only became usable on the 2nd day of the contest), which made it slightly easier for us to manage multiple multi-core machines. Other changes in the contest edition of John the Ripper included a revised incremental mode and sse-intrinsics.S pre-compiled from its .c source with Intel's compiler (for optimal performance on MD5-based hashes).

We also generated new .chr files from RockYou passwords and uploaded some wordlists and rulesets to our file server, including KoreLogic's ruleset from the 2010 contest, revised to make more extensive use of the rule preprocessor in JtR and re-ordered by decreasing rule efficiency.
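For readers who have not used it: JtR's rule preprocessor expands a bracketed character range in a rule line into one rule per character, so a single line such as $[0-9]$[0-9] (append two digits) stands for 100 rules. The sketch below only emulates that expansion with bash brace expansion (the function name is hypothetical and this is not JtR code), to show how quickly a compact ruleset grows:

```bash
#!/usr/bin/env bash
# Emulate what JtR's rule preprocessor does with the rule line $[0-9]$[0-9]:
# each bracketed range becomes one rule per character, so the line expands
# into the cross product $0$0, $0$1, ..., $9$9. Bash brace expansion
# produces the same list (this is an illustration, not JtR itself).
expand_two_digit_append() {
    printf '%s\n' '$'{0..9}'$'{0..9}
}

expand_two_digit_append | wc -l    # 100 rules
```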

We definitely could have prepared a lot better.

Approach, observations, mistakes

Based on last year's experience and on password cracking experience in general, we expected to derive all sorts of patterns from cracked passwords and apply those to crack even more passwords. This is also what other well-performing teams did in these two contests.

The password-protected .zip files were cracked with shell one-liners that ran "unzip -P" with passwords read from a wordlist. Luckily, this worked. (The .zip support implemented in JtR -jumbo was limited to WinZip/AES and did not support the older pkzip encryption.) Brad Tilley (team 16Crack) was the first to crack the "defcon" password for our team.
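A minimal bash sketch of such a one-liner, written out as a function for readability; the function name and the file names in the usage line are hypothetical, and this is not the exact command the team used. It relies on unzip's -t (test) option so that nothing gets extracted, and treats only a zero exit status from unzip as a hit:

```bash
#!/usr/bin/env bash
# try_zip_passwords ARCHIVE WORDLIST   (hypothetical helper)
# Try each wordlist line as the password of ARCHIVE with "unzip -P".
# -t decrypts and CRC-checks the members without extracting them,
# -qq keeps unzip quiet. Prints the first password that works.
try_zip_passwords() {
    local archive="$1" wordlist="$2" pw
    while IFS= read -r pw || [ -n "$pw" ]; do
        # stdin from /dev/null so unzip can never consume wordlist lines
        if unzip -qq -t -P "$pw" "$archive" </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$pw"
            return 0
        fi
    done < "$wordlist"
    return 1
}

# Usage (placeholder file names):
#   try_zip_passwords challenge.zip password.lst
```

Redirecting unzip's stdin matters here because the loop itself reads the wordlist on stdin.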

The .rar file was cracked with JtR by running password.lst with --rules for several hours on an 8-core machine. RichRumble did this.

To derive patterns, we attacked the "fast" hashes first: NT and raw MD5. In fact, because we had more machines than people, two 8-core machines kept running JtR in incremental mode (for lengths up to 11) against these hashes almost until the end of the contest, even though, as far as points were concerned, this was far from the best use of resources.

The --external=DateTime mode was used on all saltless hashes when this pattern was noticed. Then more focused attacks were run with custom scripts against salted hashes (on just the date formats actually seen).
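As an illustration only (not the team's actual scripts), a focused generator for one date format might look like this. The MM-DD-YY format is an assumption chosen to match the dd-dd-dd shape that appears in the mscash2 and bf pattern counts further down; the function name is hypothetical, and impossible dates such as 02-31 are not filtered out:

```bash
#!/usr/bin/env bash
# gen_dates_mmddyy   (hypothetical helper)
# Print MM-DD-YY candidates for months 01-12, days 01-31 and two-digit
# years 00-99. The format is an assumption; reorder the printf arguments
# for YY-MM-DD or DD-MM-YY if those are the formats actually seen.
gen_dates_mmddyy() {
    local y m d
    for (( y = 0; y <= 99; y++ )); do
        for (( m = 1; m <= 12; m++ )); do
            for (( d = 1; d <= 31; d++ )); do
                printf '%02d-%02d-%02d\n' "$m" "$d" "$y"
            done
        done
    done
}
```

This yields 100 x 12 x 31 = 37,200 candidates, which could be piped into JtR's wordlist mode via --stdin.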

Similarly, the "Mississippi" and "obsessiveness" patterns were noticed and tested against various hash types (wasting time when tested against the slowest hashes, as it turned out).

Not all of our machines were online all the time, and not all people were available at all times. As a result, we had to hand out large yet non-critical jobs to team members who expected to be offline for a while. For example, this might be why we performed so well on DES (even though we did not crack the DES hashes found in the core dumps, being unsure what they were), which was otherwise not an optimal use of resources considering the low points earned per DES-based crypt hash (although the 100k bonus compensated for that somewhat).

The mscash2 and bf hashes were successfully attacked almost exclusively with incremental mode. Late in the contest (too late), we also started locking it to specific letter-digit patterns that we had seen in passwords cracked by that point. Unfortunately, we wasted lots of resources testing other patterns against these hashes: patterns seen in passwords for other hash types, but somehow not for these. It was weird (unrealistic) to find plenty of short passwords (4 to 6 characters long), yet not find any from RockYou's top 1000, nor any username-derived ones. So we kept probing for other patterns, wordlist entries, etc., but found none besides the trivial ones:

$ fgrep '$DCC2$' john.pot | cut -f2- -d: | \
	sed 's/[a-z]/l/g; s/[0-9]/d/g' | sort | uniq -c | sort -rn
    148 llllll
     64 lllllll
     61 dd-dd-dd
     55 dddddd
     20 llllld
     12 lllll
     12 lllldd
      4 llllldd
      4 llll
      3 lllllld
      2 lllddd
$ fgrep '$2a$' john.pot | cut -f2- -d: | \
	sed 's/[a-z]/l/g; s/[0-9]/d/g' | sort | uniq -c | sort -rn
    158 llllll
     54 dddddd
     51 lllllll
     44 dd-dd-dd
     17 lllldd
     14 llllld
     14 lllll
      3 llll
      2 lllddd
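The team locked incremental mode itself to such patterns. A simpler stand-in for the same idea, shown here as a hypothetical filter rather than what was actually used, is to keep only those candidates whose letter/digit mask (computed with the same mapping as the sed command above) is one of a chosen set of masks:

```bash
#!/usr/bin/env bash
# keep_masks MASK...   (hypothetical helper)
# Read candidate passwords on stdin and print only those whose mask
# (lowercase letter -> l, digit -> d, anything else unchanged, i.e. the
# same mapping as the sed pipeline above) is one of the given masks.
keep_masks() {
    LC_ALL=C awk -v masks="$*" '
        BEGIN { n = split(masks, m, " "); for (i = 1; i <= n; i++) ok[m[i]] = 1 }
        {
            s = $0
            gsub(/[a-z]/, "l", s)
            gsub(/[0-9]/, "d", s)
            if (s in ok) print
        }'
}
```

For example, printf '%s\n' abcdef abc123 12-31-99 | keep_masks llllll dd-dd-dd prints abcdef and 12-31-99; the surviving candidates could then be tested in wordlist mode.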

Judging by the phpass and bsdi hashes that we cracked, we could presumably also have found passwords built upon "pennteller" and "hate", but perhaps not much else. (KoreLogic has not yet released the plaintexts as of this writing, and we did not spend further resources cracking the hashes after the contest ended, hence the uncertainty.)

Although we did notice that cracked passwords for these hashes started with one of just a handful of letters (apart from those starting with a digit), we did not use this knowledge in any way, thinking that it was an artifact of our use of incremental mode (which tries more likely characters before less likely ones). Thus, we did not manually restrict the search to just these starting letters, which was probably a mistake. We did generate new .chr files based on already cracked passwords, which would have achieved a similar effect, especially with our revised incremental mode, but we based them on all cracked passwords (excluding only those that came from challenges) for all hash types, naively expecting patterns from other hash types to show up on the extra-slow hashes as well. And, of course, cracked passwords for all hash types combined started with all other letters as well.
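In hindsight, the skew in starting characters could have been quantified with the same kind of pipeline shown above. A sketch that counts the first character of cracked passwords for one hash type follows; the function name is hypothetical and, like the pipelines above, it assumes john.pot lines of the form hash:plaintext with no colon inside the hash:

```bash
#!/usr/bin/env bash
# first_char_counts TAG POTFILE   (hypothetical helper)
# For pot lines containing TAG (e.g. '$2a$' or '$DCC2$'), count how often
# each first character of the cracked plaintext occurs, most common first.
first_char_counts() {
    grep -F -- "$1" "$2" | cut -f2- -d: | cut -c1 | sort | uniq -c | sort -rn
}

# Usage: first_char_counts '$2a$' john.pot
```

Running it per hash type, rather than over all cracked passwords combined, is what would expose the handful of starting letters; candidates could then be restricted with a character-class filter such as grep '^[0-9abc]' (the letters here are purely illustrative).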

At the same time, we cracked many far more complicated passwords for other hash types, and even phrases of up to six words (although mostly idioms found as-is in wordlists). Some very short passphrases (up to 3 words, length 11) were even found with the revised incremental mode. We also used trivial Perl scripts to combine words from tiny wordlists into 2-, 3-, and 4-word "phrases".
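The Perl scripts themselves are not reproduced here; a rough bash equivalent of combining words from a tiny wordlist into N-word "phrases" might look like the following. The function name is hypothetical, the optional separator is an assumption (the writeup does not say how words were joined), and the output size is the wordlist size to the power N, so this only makes sense for tiny lists:

```bash
#!/usr/bin/env bash
# combine_words WORDLIST N [SEPARATOR]   (hypothetical helper)
# Print every sequence of N words from WORDLIST (repetition allowed),
# joined by SEPARATOR (default: nothing). Output size is |WORDLIST|^N.
combine_words() {
    local wordlist="$1" n="$2" sep="${3-}"
    local -a words phrases next
    local i p w
    mapfile -t words < "$wordlist"
    (( ${#words[@]} > 0 && n > 0 )) || return 0
    phrases=("${words[@]}")
    for (( i = 1; i < n; i++ )); do
        next=()
        for p in "${phrases[@]}"; do
            for w in "${words[@]}"; do
                next+=("$p$sep$w")
            done
        done
        phrases=("${next[@]}")
    done
    printf '%s\n' "${phrases[@]}"
}

# Usage: 2-, 3- and 4-word phrases separated by spaces
#   for n in 2 3 4; do combine_words tiny.lst "$n" ' '; done
```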

Note: this does not mean that passphrases are weak or a bad idea in general; it merely means that some of them contain well-known or predictable combinations of words, or too few, too common words. It also means that some hash types should not be used for password hashing. With the resources we had, we would not have been able to crack, within the 48 hours of the contest, 3-word combinations generated by pwqgen with default settings and hashed with bcrypt (known as bf in this contest): https://www.openwall.com/passwdqc/
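A back-of-the-envelope check of that claim, as a sketch: assuming passwdqc's default random=47 setting, a pwqgen passphrase carries about 47 bits, so on average half of 2^47 candidates must be tried. The aggregate rate of 200,000 bcrypt ($2a$05) computations per second used below is a purely hypothetical figure for illustration, not a measurement from the contest, and the function name is hypothetical too:

```bash
#!/usr/bin/env bash
# expected_years BITS RATE   (hypothetical helper)
# Expected time, in whole years rounded down, to find a secret drawn
# uniformly from 2^BITS possibilities at RATE candidates per second:
# on average half of the keyspace, 2^(BITS-1), has to be searched.
expected_years() {
    local bits="$1" rate="$2"
    echo $(( (1 << (bits - 1)) / rate / (365 * 24 * 3600) ))
}

# 47 bits (assumed default) at an assumed 200,000 hashes/second:
expected_years 47 200000    # prints 11
```

For comparison, 48 hours at that assumed rate covers about 3.5 x 10^10 candidates, roughly 0.025% of 2^47.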

What we liked and didn't like

Overall, the contest was great, thanks to KoreLogic and all teams.

We liked:

The scoring system. While last year's contest demonstrated that, with each cracked password equally valuable, slow and salted hashes are not worth attacking very hard, if at all, this year's contest demonstrated that they can nevertheless be attacked if the passwords are sufficiently valuable. (However, contrary to what outside observers might think, it has not demonstrated that those stronger hashes are almost as vulnerable as the weaker ones, despite the numbers of passwords cracked being comparable. The numbers are comparable only because of extremely weak passwords that a properly configured system should not allow to be set, or should at least warn the user about.)
The presence of passphrases. We missed those last year.
Additional challenges in the contest, yet not terribly important to the teams' overall scoring (otherwise this would not be a password hash cracking contest anymore).
A concern, though, was that some of the challenges could require use of non-free and closed-source tools.

Some things we found slightly disappointing were:

Weird weights for some of the hashes: no distinction between saltless and salted (semi-)fast hashes, and mscash2 being valued too high (it is actually a lot easier to attack than bf, considering its GPU-friendliness, albeit not for our CPU-focused team).
For example, the weights could be:
- bf - 100000
- mscash2 - 50000
- phpass-md5 - 12000
- md5-crypt - 10000
- md5_gen(28) - 10000
- bsdi - 5000
- des - 700
- md5_gen(12) - 700
- md5_gen(16) - 700
- mssql - 700
- oracle11 - 700
- phps - 700
- ssha - 700
- md5_gen(22) - 12
- md5_gen(23) - 12
- mysql-sha1 - 12
- raw-sha512 - 12
- raw-sha1 - 11
- md5_gen(0) - 10
- nt - 10
considering the speed of hash computation, the number of different salts in the contest hashes (for each hash type), and some special properties of these hashes (such as the length limit of des). The 60x to 70x gap between saltless and salted hashes proposed here is roughly sqrt(number of salts), which is consistent with the use of a logarithmic scale by hash computation speed.
Passwords still not being very realistic (even though KoreLogic might not agree). Username-based passwords not seen on slow hashes.
No non-ASCII passwords, or maybe we failed to find them (despite having wasted a little bit of time trying to). OK, at least this is almost realistic: such passwords are in fact very rare. So we can't really expect to have both a non-negligible number of non-ASCII passwords and realistic passwords overall.
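Regarding the square-root remark on the proposed weights above, one possible reading (an interpretation offered as a sketch, not necessarily the exact reasoning behind the table) is that the weight w grows as the square root of the cost c of attacking all hashes of a type, i.e. log w grows by half as much as log c. A salted type with S distinct salts costs S times as much as a saltless type of the same speed, which gives:

```latex
w \propto c^{1/2}
\quad\Longrightarrow\quad
\frac{w_{\text{salted}}}{w_{\text{saltless}}}
  = \left(\frac{S\,c}{c}\right)^{1/2} = \sqrt{S},
\qquad
\frac{700}{12} \approx 58,\quad \frac{700}{10} = 70
\quad\Longrightarrow\quad
S \approx 58^2 \text{ to } 70^2 \approx 3400 \text{ to } 4900.
```

The actual salt counts per hash type are not given in this writeup, so the last step only shows what this reading would imply.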

A neutral comment:

The bf and bsdi hashes could actually be even slower, to match real-world systems where these hashes are used. For example, bf is nowadays often used at $2a$08, not at $2a$05 as in the contest (which is also what JtR uses for benchmarking, for historical reasons). This would be 8 times slower. The default of 725 iterations for bsdi is in fact seen on some real-world systems, although reasonable settings are much higher. When phpass falls back to CRYPT_EXT_DES (the PHP name for this hash type), it uses these hashes at 65535 iterations (90 times slower than in the contest) when called with 8 for the $iteration_count_log2 parameter to the PasswordHash constructor, like its test program does and like some web apps that have integrated phpass do. Such changes could make the contest more realistic and would not make these hashes appear weaker than they actually are (in real-world uses). However, they could make it too hard to attack the hashes reasonably in just 48 hours, so this is not obviously a good change to make in the contest. If the change is made, then of course the weights would need to be adjusted accordingly (using a logarithmic scale). An alternative is to document the "cost" settings of variable-cost hashes used in the contest in some prominent place so that people do not draw erroneous conclusions about the hashes from the contest results.
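The cost ratios quoted above follow from simple arithmetic, sketched below with hypothetical helper names. bcrypt's cost parameter is the base-2 logarithm of its iteration count, so going from $2a$05 to $2a$08 multiplies the work by 2^3 = 8. For the extended DES-based hashes, the sketch assumes that phpass derives the iteration count as 2^min(log2 + 8, 24) - 1, which reproduces the 65535 figure for an $iteration_count_log2 of 8:

```bash
#!/usr/bin/env bash
# bcrypt_slowdown FROM TO   (hypothetical helper)
# bcrypt cost is log2 of the iteration count, so raising the cost from
# FROM to TO multiplies the work by 2^(TO-FROM).
bcrypt_slowdown() {
    echo $(( 1 << ($2 - $1) ))
}

# phpass_ext_des_count LOG2   (hypothetical helper)
# Iteration count assumed for phpass's CRYPT_EXT_DES fallback:
# 2^min(LOG2 + 8, 24) - 1.
phpass_ext_des_count() {
    local log2=$(( $1 + 8 ))
    (( log2 > 24 )) && log2=24
    echo $(( (1 << log2) - 1 ))
}

bcrypt_slowdown 5 8                            # prints 8
phpass_ext_des_count 8                         # prints 65535
echo $(( $(phpass_ext_des_count 8) / 725 ))    # prints 90 (725 = bsdi default)
```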

Thanks for reading this far (or did you just scroll down?)

Alexander