summaryrefslogtreecommitdiff
path: root/FAQ.lz77
diff options
context:
space:
mode:
Diffstat (limited to 'FAQ.lz77')
-rw-r--r--FAQ.lz77207
1 files changed, 207 insertions, 0 deletions
diff --git a/FAQ.lz77 b/FAQ.lz77
new file mode 100644
index 0000000..7bd7e0a
--- /dev/null
+++ b/FAQ.lz77
@@ -0,0 +1,207 @@
+
+
+
+Q: What is the lz77 compression? What is the terminology you used?
+A: The lz77 compression is a quite simple algorithm, which works in stream.
+ You have to "copy" the uncompressed bytes from the stream, and sometime,
+ you encounter what I've called "restart blocks" which tells two things:
+ a backward jump, and a lenght. The jump value tells you of how many bytes
+ you have to "rewind", and the length tells you how many bytes you have to
+ copy from there. For example if you have the compressed stream:
+
+ One Ring to rule them all, <J:27,L:12>find<J:28,L:5>.
+
+ then you will decompress it by:
+
+ One Ring to rule them all, One Ring to find them.
+
+ because when you hit the <J:27,L:12>, it tells you to copy 12 bytes from
+ 27 bytes backward, and when you hit the <J:28,L:5>, it tells you to
+ copy 5 bytes from 28 bytes backward.
+
+
+
+Q: What are the differences between lz77, lzss and lzw?
+A: The lzw is the algorithm name implemented by PKZIP, and thus by WinZIP.
+ But it uses basically the same scheme as the lz77 algorithm. The lzss
+ is the name for a fixed maximum backward length window algorithm.
+ I've named my stuff 'lz77' because it's the generic name for it.
+ The lzw is in fact two algorithms in one: the Huffman and the lz77.
+
+
+
+Q: How is the file stored?
+A: From a general manner, you have a 4 bytes length to know where to stop
+ when you are uncompressing. As you may have noticed, you have normal
+ bytes and compressed blocks. So you can have a bitmap which tells you
+ if the bytes are normal bytes or restart blocks. As an example, let's
+ look the Lord of the Rings compression:
+
+ |Bitmaps | Datas
+ |--------|---------------------------|--------|
+ |00000000| 4f 6e 65 20 52 69 6e 67 |One Ring|
+ |00000000| 20 74 6f 20 72 75 6c 65 | to rule|
+ |00000000| 20 74 68 65 6d 20 61 6c | them al|
+ |00010000| 6c 2c 20<1b 90>66 69 6e 64|l, ..find|
+ |10000000|<1b 20>2e |...|
+
+ If you have a 0 in the bitmap, then you know you have to simply copy the
+ byte you find, and if you have a 1, you have to read the two next bytes,
+ 'unpack' them to find the backward jump and the length, then copy the
+ corresponding bytes. As an exercise, try to find how I've 'packed' the
+ length and the backward jump into the above array.
+
+ Since you need two bytes to write a restart block, you will have to add
+ three to the unpacked length.
+
+ Here is the final form of the packed file:
+
+0000: 31 00 00 00 00 4F 6E 65 20 52 69 6E 67 00 20 74 1 One Ring t
+0010: 6F 20 72 75 6C 65 00 20 74 68 65 6D 20 61 6C 08 o rule them al.
+0020: 6C 2C 20 1B 90 66 69 6E 64 01 1B 20 2E l, ..find.. ..
+
+ Note how the bitmaps are stored.
+
+
+
+Q: How are the jump and length packed?
+A: You may find infos on the net like:
+
+ O3 O2 O1 O0 L3 L2 L1 L0 | OB OA O9 O8 O7 O6 O5 O4
+
+ This means that the two infos are packed like this: the offset (jump) has the
+ four lower bits in the four upper bits of the first byte, then the eight upper
+ bits are directly stored into the second byte. Thus, the backward jump is
+ stored on 12 bits. The length is stored in four bits, and they are stored in
+ the four lower bits of the first byte. So the length may vary from 3 to 18.
+
+
+
+Q: What is the relationship between the schemes into the software and the
+ thing you pasted above?
+A: It's quite easy. The computation formula is the following:
+
+ decomp_length = shift(val1 & scheme.l_mask_1, scheme.l_shft_1) |
+ shift(val2 & scheme.l_mask_2, scheme.l_shft_2);
+ decomp_jump = shift(val1 & scheme.j_mask_1, scheme.j_shft_1) |
+ shift(val2 & scheme.j_mask_2, scheme.j_shft_2);
+
+
+ So, if you want to write the eight parameters for the above thingy, you
+ will have to set them to this:
+
+ l_mask_1 = 0x0f | l_mask_2 = 0x00 | j_mask_1 = 0xf0 | j_mask_2 = 0xff
+ l_shft_1 = 0 | l_shft_2 = 0 | j_shft_1 = -4 | j_shft_2 = 4
+
+ What is he most difficult to understand is the jump value. We want to build
+ the jump value as: OB OA O9 O8 O7 O6 O5 O4 O3 O2 O1 O0.
+
+ Let's first work on the first byte.
+ -) We have to only keep the Ox bits
+ -) Then we have to move them to the right.
+
+ The mask is here to suppress the unwanted bits. So for our case, the mask is
+ 0xf0 since it's the four high bits. The shift is here to move the bits. A
+ negative value means to move them to the right, and a positive means to the
+ left. So since we have to make the bits go to the four lower bits, we have
+ a shift of -4
+
+ For the second byte:
+ -) We keep all the bits, but
+ -) we have to make room for the four bits from the first byte.
+
+ The mask don't have to suppress anything, so it's set to 0xff. And now, the
+ shift is set to 4 to make enough room.
+
+ Finally, the software will build the jump with those four parameters.
+
+
+Q: What are the others parameters?
+A: You have:
+
+ 1iscomp overlap negative 16bits opposite
+
+ This will change the behaviour of the compression/decompression algorithm.
+
+ The 1iscomp will inverse the bitmap from the default. The default is, a 1 in
+ the bitmap is uncompressed.
+
+ The overlap is a boolean to tell the compressor to use the overlap trick.
+ Some roms does use it, but is a little hard to explain here.
+
+ This is the same for the negative trick.
+
+ The 16bits behavior is quite specific to the FF6 PSX. Everything is coded
+ using 16 bits words instead of 8 bits words. And thus, the compression
+ algorithm is more difficult. As for now, it's broken.
+
+ The opposite behavior is to compute the jump offset in a different way.
+ Instead of taking it as the backward jump value, it will be an offset
+ from the maximum backward jump possible.
+
+
+
+Q: What are the overlap and the negative tricks?
+A: You really want to know? *sigh*
+ Ok, here we go. The negative is still the simplier to understand. When you
+ are uncompressing a lz77 stream, you read backward jumps and lengths. The
+ maximum value for a jump and a length are determined on how the two infos
+ are packed into the restart block.
+
+ For example, if you have 12 bits for the jump and 4 for the length, you will
+ have a maximum of 0xfff for the jump and 18 for the length (18 because it's
+ 0xf + 3)
+
+ Here comes the negative trick: when you are uncompressing the first bytes,
+ you will still be able to have a jump which is *beyond* the beginning of the
+ decompressed file. Later, when your file will grow up, this won't be possible
+ anymore of course, since its size will be over the maximum jump size.
+
+ So, sometime, you can consider that *if* you have a negative value, it's not
+ a fault: you just read zeros. Understood? BTW, most of the case, the negative
+ restart will be very near the begin of the file. So if you have to copy five
+ bytes distant of 1000 bytes from the beginning of the file, *this* is
+ probably a fault, or a misunderstood scheme.
+
+
+ Next, the overlap trick. It's more difficult to understand though. You have
+ to understand the decompressor's algorithm a bit. When it hits a restart
+ block, it computes the jump and the length, then it will begin to copy the
+ bytes from the old decompressed datas, one byte after the other.
+
+ So the trick is here: when you are copying the bytes, the algorithm will
+ be able to reuse them immediately for the current copy, since they are
+ already written. The restart block will use "itself".
+
+ Let's have an example:
+
+ If you want to compress the stream:
+
+ 123123123123
+
+ you will have two solutions. Either with, either without the overlap trick.
+ Without the overlap trick, here is the compressed stream:
+
+ 123123<J:6,L:6>
+
+ This is obvious hu? Now, with the overlap trick:
+
+ 123<J:3,L:9>
+
+ Try to take a piece of paper and to decompress those two thingy, maybe this
+ will help you understand.
+
+
+
+Q: When do I have to active the overlap and negative tricks?
+A: Well, for the decompressor, it's not really necessary, because it won't
+ crash if the flags are not enabled. But it will warn you so you will be
+ able to know that the tricks are enabled into the lz77 stream. So if you
+ see that those tricks are enabled, then active the flags so the compressor
+ will be able to take care of them.
+
+ Something else: you may also test if the tricks are working. Maybe they
+ wasn't used when your original lz77 stream was created, but the decompression
+ algorithm used by the game *maybe* is able to handle them.
+
+