For most hashes where MD5 is used, building a proper md5 format is likely not the best bet overall. A format is not trivial. It requires maintainance and will likely requires specific enhancements to get it to perform optimally on all hardware. Likely there will need to be 'generic' C code done, then it will need code to tie it into CPU specific optimizations, such as SSE, MMX, intrisic SSE, GPU, ... ... ... This will also mean that to stay up to date, the format will require ongoing work and mainainance. However, there is one format which may reduce a lot of this maintainance work to very little. Now, that format itself will need to be kept up to date, but any formats that are built upon its internal workings. That format is $dynamic$. In this 'format', there is a scripting language, where a format developer only need to describe the actual operations properly, and the format is 'done', and working. This document will go over how to 'build' a format that uses this $dynamic$ format, how to optimize it to work faster, and how to build a 'thin' quasi format which insulates the end user from the $dynamic$ format line building. **** Introduction **** To start off with, a little background on 'how' and 'where' to build the scripts that run $dynamic$, what interanal data structures are available to be used. The 'where' which a format developer can easily build into john, is to add a new $dynamic$ format 'script', into john.ini file (john.conf). This file usually is located in the current directory where john is run out of (but the --config=file can override the default behavior). Within the john.conf, a new 'section' can be added for a md5 genercic format. The new 'section' will be set by using this section naming: [List.Generic:dynamic_NUM] You replace the NUM with the sub-format number (from 1001 to 9999). Pick a number that is not used. Within this 'section', there will be multiple lines added. These lines are primarily of the form: Type=Value The actual contents of these scripts will be addressed later. That will be the 'How', and preforming this is actually outside of the intro section. The 'Data' and runtime information is this: Inside of the $dynamic$ format, there are 2 input buffers (actually ALL data is arrays of 128 of each buffer type). There is input1 and input2 buffers. The main operations on these buffers is to clear them, and to append data, to build string which will later be md5 hashed. There are also 2 output buffers. These buffers will receive the md5 hashing from the 2 input buffers. NOTE, when the format processing is complete, the results MUST be placed into output1 buffer. This is where all of the comparison functions will check against. In the format, there is a salt (if the format is salted). There may also be a second salt value. There are also 'keys' value(s). These are the passwords being tested at this given time. There are also 8 'constant' strings which can be used within a format. A format such as md5-po has a couple of constants within it. There are also numerous optimization 'flags' which do special things when loading keys or salts, and there are numourous special 'optimization' primative functions within the format, for speedup of certain operations. **** Simple format building **** We will start out with a few simple formats, and simply 'show' how to build a straight forward script. The scripts may or may not be optimal. Later we will optimize these somewhat. When building the formats here, there will be comments interspersed, listing just what is being done, and why. we will build these formats: dynamic_1030 md5($p.$p) dynamic_1031 md5($s.md5($p).$p) dynamic_1032 md5(md5($s).md5($p).$p) [List.Generic:dynamic_1030] Expression=dynamic_1030: md5($p.$p) Func=DynamicFunc__clean_input Func=DynamicFunc__append_keys Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Test=$dynamic_1030$42b72f913c3201fc62660d512f5ac746:test1 Here is the exact same format, with some comments added, describing the sub-sections, and exactly what is being done. #first line is the section name. It MUST be of the format shown. [List.Generic:dynamic_1030] # #the next line, is a required line. It serves 2 purposes. It is output #in john, when the format 'starts'. Also, the dynamic_# art is used #to destinguish this exact format (so the command line of --sub=dynamic_1030 #would specify this and only this format) # Expression=dynamic_1030: md5($p.$p) # #This is the set of functions. This is the ONLY section of the format #where order IS important. The functions ARE handled one after the #other, from top to bottom, to perform the string operations, and md5 #operations which are needed to perform the hash of this format #The functions ARE a required part of the format. # #first step, clean the input. All work for this format is done using #only input 1 and output 1 buffers. Func=DynamicFunc__clean_input # #Step 2, append the keys. Note, the buffer is clean, so this is simply #the same as Input=keys (but required 2 steps, the clean and append keys). Func=DynamicFunc__append_keys # #Step 3, append keys again (the format is ($p.$p) or keys appended to keys. Func=DynamicFunc__append_keys # #Step 4, final step performs md5 of $p.$p This will properly leave the #results in output1 Func=DynamicFunc__crypt_md5 # #This is test string. These ARE required. You can provide more than #one. 5 or 6 are best, to make sure the format is valid. # Test=$dynamic_1030$42b72f913c3201fc62660d512f5ac746:test1 # There are also TestA= and TestU= lines. The TestA= lines are ONLY loaded # if --encoding=utf8 is NOT set on the command line (not running in utf8 mode), # and the TestU= lines are only loaded IF the --encoding=utf8 command is used. Ok, here is the second format. The format being done is md5($s.md5($p).$p) Here are a few comments about this format: 1. There is a Flag= value. This is because this is a Salted format. This REQUIRES the MGF_SALTED flag. 2. We only use input 1 and output 1. 3. There are a couple of calls to crypt (md5). The first simply gets md5($p) and puts it into output1, which will later be appeneded in base-16 format as we build our string. 4. After the first crypt (md5), we clear our input buffer, then put the salt in, append the base-16 of md5($p), and then append $p 5. Finally, and call to crypt is done, which leaves the results in output1, so the rest of the $dynamic$ format can properly compare it. [List.Generic:dynamic_1031] Expression=dynamic_1031: md5($s.md5($p).$p) Flag=MGF_SALTED Func=DynamicFunc__clean_input Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Func=DynamicFunc__clean_input Func=DynamicFunc__append_salt Func=DynamicFunc__append_from_last_output_as_base16 Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Test=$dynamic_1031$a459f60614498dbdd9a79dcc9c538749$aabbccdd:test1 Now, here is the final format: md5(md5($s).md5($p).$p) [List.Generic:dynamic_1032] Expression=dynamic_1032: md5(md5($s).md5($p).$p) Flag=MGF_SALTED Func=DynamicFunc__clean_input Func=DynamicFunc__append_salt Func=DynamicFunc__crypt_md5 Func=DynamicFunc__clean_input2 Func=DynamicFunc__append_keys2 Func=DynamicFunc__crypt2_md5 Func=DynamicFunc__clean_input Func=DynamicFunc__append_from_last_output_as_base16 Func=DynamicFunc__append_from_last_output2_to_input1_as_base16 Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Test=$dynamic_1032$042d1f15ed57929a2ac8ee4f0a924679$aabbccdd:test1 Ok, now that these have been built, here are a few 'benchmarks' listing that they are WORKING, and what speed they are working: Here is MinGW build 'x86' john_x86 -test -for=$dynamic$ -sub=dynamic_1030 Benchmarking: dynamic_1030: md5($p.$p) [128x1 (MD5_Go)]... DONE Raw: 3530K c/s john_x86 -test -for=$dynamic$ -sub=dynamic_1031 Benchmarking: dynamic_1031: md5($s.md5($p).$p) [128x1 (MD5_Go)]... DONE Many salts: 1945K c/s Only one salt: 1890K c/s john_x86 -test -for=$dynamic$ -sub=dynamic_1032 Benchmarking: dynamic_1032: md5(md5($s).md5($p).$p) [128x1 (MD5_Go)]... DONE Many salts: 1016K c/s Only one salt: 1031K c/s Here is MinGW build of SSE2 john_sse2 -test -for=$dynamic$ -sub=dynamic_1030 Benchmarking: dynamic_1030: md5($p.$p) SSE2 [SSE2 32x4 (.S)]... DONE Raw: 7250K c/s john_sse2 -test -for=$dynamic$ -sub=dynamic_1031 Benchmarking: dynamic_1031: md5($s.md5($p).$p) SSE2 [SSE2 32x4 (.S)]... DONE Many salts: 5065K c/s Only one salt: 4436K c/s john_sse2 -test -for=$dynamic$ -sub=dynamic_1032 Benchmarking: dynamic_1032: md5(md5($s).md5($p).$p) SSE2 [SSE2 32x4 (.S)]... FAILED (get_hash[0](0)) Here is some timings to check against: john_x86 -test -for=$dynamic$ -sub=dynamic_0 Benchmarking: dynamic_0: md5($p) (raw-md5) [128x1 (MD5_Go)]... DONE Raw: 4005K c/s john_sse2 -test -for=$dynamic$ -sub=dynamic_0 Benchmarking: dynamic_0: md5($p) (raw-md5) SSE2 [SSE2 32x4 (.S)]... DONE Raw: 10740K c/s **** Optimizations of prior formats **** For format 1030, the speed should be very close to that of dynamic_0. In both formats, there is only 1 call to md5(). However, we are seeing that the (1030) is slower than (0). The explanation of this, is that the (0) format has an optimization used, which we can not use in the (1030). The (1030) is likely about as optimal as it can be made in the current $dynamic$ format. The optimization for format (0) is: Flag=MGF_KEYS_INPUT What that does, is to place the keys directly into the input field, and then later, when john gets the keys back (it does this if a hash is cracked), john gets them from the input. In the (1030) format, we load the keys, into the 'keys' arrays. We then have to call a function to clean input buffer 1, and to append the keys (twice). Thus, what we have is additional memory movement, and that slows things down. However, to use the MGF_KEYS_INPUT optimization, we would have had to keep the input1 buffer prestine and ONLY put in the keys (passwords). Since we had to append the keys twice, we simply 'blew' that requirement, and thus, could NOT use it. At a later time, we will show a format WHERE we can use this optimization. For format 1031, there also appears to be no optimizations available. For 1032, there are optimizations. In this format, we notice that we have this sub expression: md5($s). Well, there is an optimization, which when it loads the input file, it converts all salts into md5($s) and uses that value instead. So, at startup time, we perform md5 hashes of all salts, but at runtime, we simply place the salt into the building string, instead of performing a MD5 on the salt. So, in the 1032, we had 3 calls to crypt. By using this optimization, we can reduce that to 2 crypts. The starting format is: md5(md5($s).md5($p).$p) This optimization makes the format 'behave' at runtime, like it is md5($s.md5($p).$p), which was format 1031. Note, after we make this optimzation, the timings will be almost identical to the 1031 timings. Also note, the Test string for 1032 and 1042 are exactly the same. These are the same formats. It is just that 1042 performs fewer crypt calls per test. Also note, in the 'original' run of SSE2, the 1032 format failed. This failure, is due to the SSE2 / MMX code only working for strings up to 54 bytes (optimization reason). The length of this string: md5($s).md5($p) is 64 bytes by itself, and we also append $p to that. Thus, our string is OVER 54 bytes in length, and thus, can not be used in SSE2 mode. We do have a couple work arounds for this, to get it working properly on SSE2 builds. We can use a flag which simply stops SSE2 dead in its tracks (and preforms all work using x86 code). This is flag MGF_NOTSSE2Safe [List.Generic:dynamic_1042] Expression=dynamic_1042: md5(md5($s).md5($p).$p) Flag=MGF_SALTED Flag=MGF_SALT_AS_HEX Flag=MGF_NOTSSE2Safe Func=DynamicFunc__clean_input Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Func=DynamicFunc__clean_input Func=DynamicFunc__append_salt Func=DynamicFunc__append_from_last_output_as_base16 Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Test=$dynamic_1042$042d1f15ed57929a2ac8ee4f0a924679$aabbccdd:test1 Once the above changes have been done, here are the speeds: john_x86 -test=5 -for=$dynamic$ -sub=dynamic_1031 Benchmarking: dynamic_1031: md5($s.md5($p).$p) [128x1 (MD5_Go)]... DONE Many salts: 2007K c/s Only one salt: 1913K c/s john_x86 -test=5 -for=$dynamic$ -sub=dynamic_1032 Benchmarking: dynamic_1032: md5(md5($s).md5($p).$p) [128x1 (MD5_Go)]... DONE Many salts: 1052K c/s Only one salt: 1030K c/s john_x86 -test=5 -for=$dynamic$ -sub=dynamic_1042 Benchmarking: dynamic_1042: md5(md5($s).md5($p).$p) [128x1 (MD5_Go)]... DONE Many salts: 1420K c/s Only one salt: 1372K c/s john_sse2 -test=5 -for=$dynamic$ -sub=dynamic_1042 Benchmarking: dynamic_1042: md5(md5($s).md5($p).$p) SSE2 [128x1 (MD5_Go)]... DONE Many salts: 1416K c/s Only one salt: 1372K c/s We can also perform even more optimizations in the format. What we do in this format, is we md5 the salt (when we first load the file). Thus the salts which john works with, are really md5($s) (same as we did in format 1042). Then we use a different flag, which puts the md5($p) into offset 32 of input1 (where we want it). Then we simply overwrite the data in input 1 with the salt (which is md5($s) in base-16 format), then force set length to 64, then append the keys, then crypt. [List.Generic:dynamic_1052] Expression=dynamic_1052: md5(md5($s).md5($p).$p) Flag=MGF_SALTED Flag=MGF_SALT_AS_HEX Flag=MGF_KEYS_BASE16_IN1_Offset32 Flag=MGF_NOTSSE2Safe Func=DynamicFunc__overwrite_salt_to_input1_no_size_fix Func=DynamicFunc__set_input_len_64 Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Test=$dynamic_1052$042d1f15ed57929a2ac8ee4f0a924679$aabbccdd:test1 Here are the benchmarks for the above format: john_x86 -test=5 -for=$dynamic$ -sub=dynamic_1052 Benchmarking: dynamic_1052: md5(md5($s).md5($p).$p) [128x1 (MD5_Go)]... DONE Many salts: 2251K c/s Only one salt: 1369K c/s john_sse2 -test=5 -for=$dynamic$ -sub=dynamic_1052 Benchmarking: dynamic_1052: md5(md5($s).md5($p).$p) SSE2 [128x1 (MD5_Go)]... DONE Many salts: 2251K c/s Only one salt: 1369K c/s Now, note the speed for 'many salts'. It is very close to the speed of (1031), actually faster. This speed is the speed john will have for a normal password cracking, where you have dozens (or hundreds, or 1000's) of password hashes to crack. To understand WHY this format is this much faster (the 'Many salts', is the normal way to benchmark the speed of a salted hash), is to understand what is happening under the hood within john's 'crypt all' loop. while (!feof(password_file)) { for (i = 0 to max_num_passwords) SetKey(i, getnextpassword(password_file)); if (salted) { while (z<salt_count) { SetSalt(salt[z]); crypt_all for (all_binaries_for_salt[z]) CheckForMatched(binary) } } } The above code is certainly not 'exact', but should show close enough, the algorithm used within john. Now, the algorithm as used within $dynamic$ will be shown (specifically for the flag MGF_KEYS_BASE16_IN1_Offset32). - SetKey() is called numerous times. This will set a 'dirty flag' for the keys inside of $dynamic$. - SetSalt() will be called. The salt handed to us is actually md5($s), since MGF_SALT_AS_HEX is set The SetSalt() calls are happening within the 'while(z<salt_count)' loop in john. - crypt_all is called. Within crypt_all, $dynamic$ knows that we want the base-16 md5($p) to be placed at offset 32 within input1. So the first call to crypt_all (for the first salt), will cause the md5($p) to be computed, and to be placed at offset 32. Then the script will overwrite the starting bytes of input1 with the 32 bytes of the salt, then the length is set to 64, then the key is appened, then a crypt, and then comparisons. - NOW, we are at the next loop within the 'while(z<salt_count)'. - Then john loads the next salt [ SetSalt() ]. - Then john calls crypt_all. At this time, there have been NO additional SetKey() calls. Thus, $dynamic$ knows that the base-16 text of md5($p) is STILL located at offset 32 of Input1. So, the format DOES NOT perform this crypt again (until new SetKey() function calls happen). - This SetSalt .. crypt_all .. compare continues until all salts are tested. However, there will be no crypt calls to md5($p) again, UNTIL the working code within john calls SetKey() again (when starting with new passwords, after all salts have been checked). Now, in the final format, we start from 1042, and do NOT turn off the sse2 code. What we do, is to turn off SSE2 when it is not valid. This will generate x86 code (generic) that runs exactly the same as in 1042 (the 2 function calls of DynamicFunc__SSEtoX86_switch_output1 and DynamicFunc__X86toSSE_switch_output1 are no-ops in x86 builds). However, in SSE mode, the first crypt will be done using SSE. Thus, as we see, the speed went from 1420k, up to almost 1800k. But note, this is NOT as fast as format 1052, for 'many' salts. [List.Generic:dynamic_1062] Expression=dynamic_1062: md5(md5($s).md5($p).$p) Flag=MGF_SALTED Flag=MGF_SALT_AS_HEX Func=DynamicFunc__clean_input Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Func=DynamicFunc__SSEtoX86_switch_output1 Func=DynamicFunc__clean_input Func=DynamicFunc__append_salt Func=DynamicFunc__append_from_last_output_as_base16 Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Func=DynamicFunc__X86toSSE_switch_output1 Test=$dynamic_1062$042d1f15ed57929a2ac8ee4f0a924679$aabbccdd:test1 john_sse2 -test=5 -for=$dynamic$ -sub=dynamic_1062 Benchmarking: dynamic_1062: md5(md5($s).md5($p).$p) SSE2 [SSE2 32x4 (.S)]... DONE Many salts: 1792K c/s Only one salt: 1715K c/s So all in all, 1032, 1042, 1052, 1062 were all equivalent (1032 was not, since it fails in SSE2 builds, but that was 'fixed' in 1042). They all run using differing sets of flags, differing sets of Function primatives, and have different runtime speeds. However, in the end, they all Now, the above format 1062 is slower than 1052. This is due to the final crypt still having to be done in x86 mode. However, in 1062, we crypt EVERY password for each salt. Thus you can see there is no speed gain between many salts, and 1 salt. Yes, the md5($p) IS done using SSE2 which is much faster, but in version 1052, when there are multiple salts, the slower md5($p) is done only 1 time per password. Now, the flag MGF_KEYS_BASE16_IN1_Offset32 (or other flags like it), CAN be used in SSE2 to get much faster behavior, however, it has to be in a format that IS SSE2 friendly. Here is an example: md5(md5($p).$s) In this format, we CAN build an SSE2 friendly format, that is VERY fast. For this test, we will set the salt length to a fixed size of 12. Here is a very easy to read, but also very far from optimal format for the above type: [List.Generic:dynamic_1033] Expression=dynamic_1033: md5(md5($p).$s) Flag=MGF_SALTED Func=DynamicFunc__clean_input Func=DynamicFunc__append_keys Func=DynamicFunc__crypt_md5 Func=DynamicFunc__clean_input Func=DynamicFunc__append_from_last_output_as_base16 Func=DynamicFunc__append_salt Func=DynamicFunc__crypt_md5 Test=$dynamic_1033$e9fb44106edf60419d26a10b5439d0c7$aabbccddeeff:test1 SaltLen=12 john_x86 -test -format=$dynamic$ -subf=dynamic_1033 Benchmarking: dynamic_1033: md5(md5($p).$s) [128x1 (MD5_Go)]... DONE Many salts: 1918K c/s Only one salt: 1889K c/s john_sse2 -test -format=$dynamic$ -subf=dynamic_1033 Benchmarking: dynamic_1033: md5(md5($p).$s) SSE2 [SSE2 32x4 (.S)]... DONE Many salts: 5479K c/s Only one salt: 4922K c/s Here is a MUCH more optimal version (1043). This version will use the flag MGF_KEYS_BASE16_IN1 to load the md5($p) into input 1, at the start of that string. That will ONLY be done, if there is a SetKeys() change. Then we simply set the input length to 32, append the salt, and call crypt. [List.Generic:dynamic_1043] Expression=dynamic_1043: md5(md5($p).$s) Flag=MGF_SALTED Flag=MGF_KEYS_BASE16_IN1 Func=DynamicFunc__set_input_len_32 Func=DynamicFunc__append_salt Func=DynamicFunc__crypt_md5 Test=$dynamic_1033$e9fb44106edf60419d26a10b5439d0c7$aabbccddeeff:test1 SaltLen=12 john_x86 -test -format=$dynamic$ -subf=dynamic_1043 Benchmarking: dynamic_1043: md5(md5($p).$s) [128x1 (MD5_Go)]... DONE Many salts: 4128K c/s Only one salt: 1890K c/s john_sse2 -test -format=$dynamic$ -subf=dynamic_1043 Benchmarking: dynamic_1043: md5(md5($p).$s) SSE2 [SSE2 32x4 (.S)]... DONE Many salts: 13096K c/s Only one salt: 4834K c/s So in this case, we see that the 'only 1 salt' speed is pretty much a wash. However, the 'many salts' speed, has gone from 1900k to 4100k for non-sse, and from 5500k to 13100k. NOTE, the above format is actually dynamic_6 (also dynamic_7) format.