Q: What is the lz77 compression? What is the terminology you used? A: The lz77 compression is a quite simple algorithm, which works in stream. You have to "copy" the uncompressed bytes from the stream, and sometime, you encounter what I've called "restart blocks" which tells two things: a backward jump, and a lenght. The jump value tells you of how many bytes you have to "rewind", and the length tells you how many bytes you have to copy from there. For example if you have the compressed stream: One Ring to rule them all, find. then you will decompress it by: One Ring to rule them all, One Ring to find them. because when you hit the , it tells you to copy 12 bytes from 27 bytes backward, and when you hit the , it tells you to copy 5 bytes from 28 bytes backward. Q: What are the differences between LZ77, LZ78, LZSS and LZW? A: The LZW is a modification of the LZ77/8 algorithms by Terry Welch. There is a sliding dictionnary instead of a sliding window. The LZ77 and LZ78 took their name from J. Ziv and A. Lempel, made in 1977 and in 1978. The LZSS is the same, with a bitmap telling the backwards pointers. So the exact name for the algorithm I wrote is 'lzss'. You can have some good informations here: http://www.image.ee.cityu.edu.hk/~loben/thesis/index.html Q: How is the file stored? A: From a general manner, you have a 4 bytes length to know where to stop when you are uncompressing. As you may have noticed, you have normal bytes and compressed blocks. So you can have a bitmap which tells you if the bytes are normal bytes or restart blocks. As an example, let's look the Lord of the Rings compression: |Bitmaps | Datas |--------|---------------------------|--------| |00000000| 4f 6e 65 20 52 69 6e 67 |One Ring| |00000000| 20 74 6f 20 72 75 6c 65 | to rule| |00000000| 20 74 68 65 6d 20 61 6c | them al| |00010000| 6c 2c 20<1b 90>66 69 6e 64|l, ..find| |10000000|<1b 20>2e |...| If you have a 0 in the bitmap, then you know you have to simply copy the byte you find, and if you have a 1, you have to read the two next bytes, 'unpack' them to find the backward jump and the length, then copy the corresponding bytes. As an exercise, try to find how I've 'packed' the length and the backward jump into the above array. Since you need two bytes to write a restart block, you will have to add three to the unpacked length. Here is the final form of the packed file: 0000: 31 00 00 00 00 4F 6E 65 20 52 69 6E 67 00 20 74 1 One Ring t 0010: 6F 20 72 75 6C 65 00 20 74 68 65 6D 20 61 6C 08 o rule them al. 0020: 6C 2C 20 1B 90 66 69 6E 64 01 1B 20 2E l, ..find.. .. Note how the bitmaps are stored. Q: How are the jump and length packed? A: You may find infos on the net like: O3 O2 O1 O0 L3 L2 L1 L0 | OB OA O9 O8 O7 O6 O5 O4 This means that the two infos are packed like this: the offset (jump) has the four lower bits in the four upper bits of the first byte, then the eight upper bits are directly stored into the second byte. Thus, the backward jump is stored on 12 bits. The length is stored in four bits, and they are stored in the four lower bits of the first byte. So the length may vary from 3 to 18. Q: What is the relationship between the schemes into the software and the thing you pasted above? A: It's quite easy. The computation formula is the following: decomp_length = shift(val1 & scheme.l_mask_1, scheme.l_shft_1) | shift(val2 & scheme.l_mask_2, scheme.l_shft_2); decomp_jump = shift(val1 & scheme.j_mask_1, scheme.j_shft_1) | shift(val2 & scheme.j_mask_2, scheme.j_shft_2); So, if you want to write the eight parameters for the above thingy, you will have to set them to this: l_mask_1 = 0x0f | l_mask_2 = 0x00 | j_mask_1 = 0xf0 | j_mask_2 = 0xff l_shft_1 = 0 | l_shft_2 = 0 | j_shft_1 = -4 | j_shft_2 = 4 What is he most difficult to understand is the jump value. We want to build the jump value as: OB OA O9 O8 O7 O6 O5 O4 O3 O2 O1 O0. Let's first work on the first byte. -) We have to only keep the Ox bits -) Then we have to move them to the right. The mask is here to suppress the unwanted bits. So for our case, the mask is 0xf0 since it's the four high bits. The shift is here to move the bits. A negative value means to move them to the right, and a positive means to the left. So since we have to make the bits go to the four lower bits, we have a shift of -4 For the second byte: -) We keep all the bits, but -) we have to make room for the four bits from the first byte. The mask don't have to suppress anything, so it's set to 0xff. And now, the shift is set to 4 to make enough room. Finally, the software will build the jump with those four parameters. Q: What are the others parameters? A: You have: 1iscomp overlap negative 16bits ptrb filling inverse onejump window This will change the behaviour of the compression/decompression algorithm. The 1iscomp will inverse the bitmap from the default. The default is, a 1 in the bitmap is uncompressed. The overlap is a boolean to tell the compressor to use the overlap trick. Some roms does use it, but is a little hard to explain here. This is the same for the negative trick. The 16bits behavior is quite specific to the FF6 PSX. Everything is coded using 16 bits words instead of 8 bits words. And thus, the compression algorithm is more difficult. As for now, it's broken. The ptrb means "pointer behavior". It can take the value 0, 1 or 2. The behavior '0' is the common one: the jump is the number of bytes to count back to find the start of the needle to copy. The '1' is to tell the jump is an offset from the maximum backward jump possible (ie from the start point of the window). The '2' is a very dumb behavior: it tells that the jump is the absolute offset. The window *won't* slide. The filling option activates the RLE behavior. The inverse is a boolean to tell to reverse the order of the bits in the bitmaps The onejump tells to add one to the final jump value. Some streams are shifted by one because by default, a jump of 0 means nothing. So it is dumb to miss one possible backward jump. The window offset tells the absolute window start for the 2th pointer behavior. Yeah, this is necessary. Coders of Metal Max are pretty crazy. Q: What are the overlap and the negative tricks? A: You really want to know? *sigh* Ok, here we go. The negative is still the simplier to understand. When you are uncompressing a lzss stream, you read backward jumps and lengths. The maximum value for a jump and a length are determined on how the two infos are packed into the restart block. For example, if you have 12 bits for the jump and 4 for the length, you will have a maximum of 0xfff for the jump and 18 for the length (18 because it's 0xf + 3) Here comes the negative trick: when you are uncompressing the first bytes, you will still be able to have a jump which is *beyond* the beginning of the decompressed file. Later, when your file will grow up, this won't be possible anymore of course, since its size will be over the maximum jump size. So, sometime, you can consider that *if* you have a negative value, it's not a fault: you just read zeros. Understood? BTW, most of the case, the negative restart will be very near the begin of the file. So if you have to copy five bytes distant of 1000 bytes from the beginning of the file, *this* is probably a fault, or a misunderstood scheme. Next, the overlap trick. It's more difficult to understand though. You have to understand the decompressor's algorithm a bit. When it hits a restart block, it computes the jump and the length, then it will begin to copy the bytes from the old decompressed datas, one byte after the other. So the trick is here: when you are copying the bytes, the algorithm will be able to reuse them immediately for the current copy, since they are already written. The restart block will use "itself". Let's have an example: If you want to compress the stream: 123123123123 you will have two solutions. Either with, either without the overlap trick. Without the overlap trick, here is the compressed stream: 123123 This is obvious hu? Now, with the overlap trick: 123 Try to take a piece of paper and to decompress those two thingy, maybe this will help you understand. Q: When do I have to active the overlap and negative tricks? A: Well, for the decompressor, it's not really necessary, because it won't crash if the flags are not enabled. But it will warn you so you will be able to know that the tricks are enabled into the lzss stream. So if you see that those tricks are enabled, then active the flags so the compressor will be able to take care of them. Something else: you may also test if the tricks are working. Maybe they wasn't used when your original lzss stream was created, but the decompression algorithm used by the game *maybe* is able to handle them. Q: What is the filling thing? A: It's an RLE like behavior. In Valkyrie Profile, when the length hit the maximum value, then it means it's a RLE block, and not a restart block. That's why you've got the eight options: fmask1 fshft1 fmask2 fshft2 vmask1 vshft1 vmask2 vshft2 Basically, they are the same as the {j,l}{mask,shft}{1,2} options, but this time you tell where to find the 'value' which has to be copied, and the 'filling', ie the number of times this value will be copied. Q: What is the 2th filling behaviour? A: Get out of here. It's a ugly ugly ugly hack. Forget it. Now!