Q: What is the lz77 compression? What is the terminology you used? A: The lz77 compression is a quite simple algorithm, which works in stream. You have to "copy" the uncompressed bytes from the stream, and sometime, you encounter what I've called "restart blocks" which tells two things: a backward jump, and a lenght. The jump value tells you of how many bytes you have to "rewind", and the length tells you how many bytes you have to copy from there. For example if you have the compressed stream: One Ring to rule them all, find. then you will decompress it by: One Ring to rule them all, One Ring to find them. because when you hit the , it tells you to copy 12 bytes from 27 bytes backward, and when you hit the , it tells you to copy 5 bytes from 28 bytes backward. Q: What are the differences between lz77, lzss and lzw? A: The lzw is the algorithm name implemented by PKZIP, and thus by WinZIP. But it uses basically the same scheme as the lz77 algorithm. The lzss is the name for a fixed maximum backward length window algorithm. I've named my stuff 'lz77' because it's the generic name for it. The lzw is in fact two algorithms in one: the Huffman and the lz77. Q: How is the file stored? A: From a general manner, you have a 4 bytes length to know where to stop when you are uncompressing. As you may have noticed, you have normal bytes and compressed blocks. So you can have a bitmap which tells you if the bytes are normal bytes or restart blocks. As an example, let's look the Lord of the Rings compression: |Bitmaps | Datas |--------|---------------------------|--------| |00000000| 4f 6e 65 20 52 69 6e 67 |One Ring| |00000000| 20 74 6f 20 72 75 6c 65 | to rule| |00000000| 20 74 68 65 6d 20 61 6c | them al| |00010000| 6c 2c 20<1b 90>66 69 6e 64|l, ..find| |10000000|<1b 20>2e |...| If you have a 0 in the bitmap, then you know you have to simply copy the byte you find, and if you have a 1, you have to read the two next bytes, 'unpack' them to find the backward jump and the length, then copy the corresponding bytes. As an exercise, try to find how I've 'packed' the length and the backward jump into the above array. Since you need two bytes to write a restart block, you will have to add three to the unpacked length. Here is the final form of the packed file: 0000: 31 00 00 00 00 4F 6E 65 20 52 69 6E 67 00 20 74 1 One Ring t 0010: 6F 20 72 75 6C 65 00 20 74 68 65 6D 20 61 6C 08 o rule them al. 0020: 6C 2C 20 1B 90 66 69 6E 64 01 1B 20 2E l, ..find.. .. Note how the bitmaps are stored. Q: How are the jump and length packed? A: You may find infos on the net like: O3 O2 O1 O0 L3 L2 L1 L0 | OB OA O9 O8 O7 O6 O5 O4 This means that the two infos are packed like this: the offset (jump) has the four lower bits in the four upper bits of the first byte, then the eight upper bits are directly stored into the second byte. Thus, the backward jump is stored on 12 bits. The length is stored in four bits, and they are stored in the four lower bits of the first byte. So the length may vary from 3 to 18. Q: What is the relationship between the schemes into the software and the thing you pasted above? A: It's quite easy. The computation formula is the following: decomp_length = shift(val1 & scheme.l_mask_1, scheme.l_shft_1) | shift(val2 & scheme.l_mask_2, scheme.l_shft_2); decomp_jump = shift(val1 & scheme.j_mask_1, scheme.j_shft_1) | shift(val2 & scheme.j_mask_2, scheme.j_shft_2); So, if you want to write the eight parameters for the above thingy, you will have to set them to this: l_mask_1 = 0x0f | l_mask_2 = 0x00 | j_mask_1 = 0xf0 | j_mask_2 = 0xff l_shft_1 = 0 | l_shft_2 = 0 | j_shft_1 = -4 | j_shft_2 = 4 What is he most difficult to understand is the jump value. We want to build the jump value as: OB OA O9 O8 O7 O6 O5 O4 O3 O2 O1 O0. Let's first work on the first byte. -) We have to only keep the Ox bits -) Then we have to move them to the right. The mask is here to suppress the unwanted bits. So for our case, the mask is 0xf0 since it's the four high bits. The shift is here to move the bits. A negative value means to move them to the right, and a positive means to the left. So since we have to make the bits go to the four lower bits, we have a shift of -4 For the second byte: -) We keep all the bits, but -) we have to make room for the four bits from the first byte. The mask don't have to suppress anything, so it's set to 0xff. And now, the shift is set to 4 to make enough room. Finally, the software will build the jump with those four parameters. Q: What are the others parameters? A: You have: 1iscomp overlap negative 16bits opposite This will change the behaviour of the compression/decompression algorithm. The 1iscomp will inverse the bitmap from the default. The default is, a 1 in the bitmap is uncompressed. The overlap is a boolean to tell the compressor to use the overlap trick. Some roms does use it, but is a little hard to explain here. This is the same for the negative trick. The 16bits behavior is quite specific to the FF6 PSX. Everything is coded using 16 bits words instead of 8 bits words. And thus, the compression algorithm is more difficult. As for now, it's broken. The opposite behavior is to compute the jump offset in a different way. Instead of taking it as the backward jump value, it will be an offset from the maximum backward jump possible. Q: What are the overlap and the negative tricks? A: You really want to know? *sigh* Ok, here we go. The negative is still the simplier to understand. When you are uncompressing a lz77 stream, you read backward jumps and lengths. The maximum value for a jump and a length are determined on how the two infos are packed into the restart block. For example, if you have 12 bits for the jump and 4 for the length, you will have a maximum of 0xfff for the jump and 18 for the length (18 because it's 0xf + 3) Here comes the negative trick: when you are uncompressing the first bytes, you will still be able to have a jump which is *beyond* the beginning of the decompressed file. Later, when your file will grow up, this won't be possible anymore of course, since its size will be over the maximum jump size. So, sometime, you can consider that *if* you have a negative value, it's not a fault: you just read zeros. Understood? BTW, most of the case, the negative restart will be very near the begin of the file. So if you have to copy five bytes distant of 1000 bytes from the beginning of the file, *this* is probably a fault, or a misunderstood scheme. Next, the overlap trick. It's more difficult to understand though. You have to understand the decompressor's algorithm a bit. When it hits a restart block, it computes the jump and the length, then it will begin to copy the bytes from the old decompressed datas, one byte after the other. So the trick is here: when you are copying the bytes, the algorithm will be able to reuse them immediately for the current copy, since they are already written. The restart block will use "itself". Let's have an example: If you want to compress the stream: 123123123123 you will have two solutions. Either with, either without the overlap trick. Without the overlap trick, here is the compressed stream: 123123 This is obvious hu? Now, with the overlap trick: 123 Try to take a piece of paper and to decompress those two thingy, maybe this will help you understand. Q: When do I have to active the overlap and negative tricks? A: Well, for the decompressor, it's not really necessary, because it won't crash if the flags are not enabled. But it will warn you so you will be able to know that the tricks are enabled into the lz77 stream. So if you see that those tricks are enabled, then active the flags so the compressor will be able to take care of them. Something else: you may also test if the tricks are working. Maybe they wasn't used when your original lz77 stream was created, but the decompression algorithm used by the game *maybe* is able to handle them.