summaryrefslogtreecommitdiff
path: root/FAQ.lz77
blob: 7bd7e0a24a9b158dc4821fd377019af179a5ffab (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207



Q: What is the lz77 compression? What is the terminology you used?
A: The lz77 compression is a quite simple algorithm, which works in stream.
   You have to "copy" the uncompressed bytes from the stream, and sometime,
   you encounter what I've called "restart blocks" which tells two things:
   a backward jump, and a lenght. The jump value tells you of how many bytes
   you have to "rewind", and the length tells you how many bytes you have to
   copy from there. For example if you have the compressed stream:

     One Ring to rule them all, <J:27,L:12>find<J:28,L:5>.

   then you will decompress it by:

     One Ring to rule them all, One Ring to find them.

   because when you hit the <J:27,L:12>, it tells you to copy 12 bytes from
   27 bytes backward, and when you hit the <J:28,L:5>, it tells you to
   copy 5 bytes from 28 bytes backward.
   


Q: What are the differences between lz77, lzss and lzw?
A: The lzw is the algorithm name implemented by PKZIP, and thus by WinZIP.
   But it uses basically the same scheme as the lz77 algorithm. The lzss
   is the name for a fixed maximum backward length window algorithm.
   I've named my stuff 'lz77' because it's the generic name for it.
   The lzw is in fact two algorithms in one: the Huffman and the lz77.



Q: How is the file stored?
A: From a general manner, you have a 4 bytes length to know where to stop
   when you are uncompressing. As you may have noticed, you have normal
   bytes and compressed blocks. So you can have a bitmap which tells you
   if the bytes are normal bytes or restart blocks. As an example, let's
   look the Lord of the Rings compression:

    |Bitmaps | Datas
    |--------|---------------------------|--------|
    |00000000| 4f 6e 65 20 52 69 6e 67   |One Ring|
    |00000000| 20 74 6f 20 72 75 6c 65   | to rule|
    |00000000| 20 74 68 65 6d 20 61 6c   | them al|
    |00010000| 6c 2c 20<1b 90>66 69 6e 64|l, ..find|
    |10000000|<1b 20>2e                  |...|

   If you have a 0 in the bitmap, then you know you have to simply copy the
   byte you find, and if you have a 1, you have to read the two next bytes,
   'unpack' them to find the backward jump and the length, then copy the
   corresponding bytes. As an exercise, try to find how I've 'packed' the
   length and the backward jump into the above array.

   Since you need two bytes to write a restart block, you will have to add
   three to the unpacked length.

   Here is the final form of the packed file:

0000: 31 00 00 00  00 4F 6E 65  20 52 69 6E  67 00 20 74    1    One Ring  t
0010: 6F 20 72 75  6C 65 00 20  74 68 65 6D  20 61 6C 08    o rule  them al.
0020: 6C 2C 20 1B  90 66 69 6E  64 01 1B 20  2E             l, ..find.. ..

   Note how the bitmaps are stored.



Q: How are the jump and length packed?
A: You may find infos on the net like:

     O3 O2 O1 O0 L3 L2 L1 L0 | OB OA O9 O8 O7 O6 O5 O4

   This means that the two infos are packed like this: the offset (jump) has the
   four lower bits in the four upper bits of the first byte, then the eight upper
   bits are directly stored into the second byte. Thus, the backward jump is
   stored on 12 bits. The length is stored in four bits, and they are stored in
   the four lower bits of the first byte. So the length may vary from 3 to 18.



Q: What is the relationship between the schemes into the software and the
   thing you pasted above?
A: It's quite easy. The computation formula is the following:

    decomp_length = shift(val1 & scheme.l_mask_1, scheme.l_shft_1) |
		    shift(val2 & scheme.l_mask_2, scheme.l_shft_2);
    decomp_jump   = shift(val1 & scheme.j_mask_1, scheme.j_shft_1) |
		    shift(val2 & scheme.j_mask_2, scheme.j_shft_2);


   So, if you want to write the eight parameters for the above thingy, you
   will have to set them to this:

   l_mask_1 = 0x0f  |  l_mask_2 = 0x00  |  j_mask_1 = 0xf0  |  j_mask_2 = 0xff
   l_shft_1 = 0     |  l_shft_2 = 0     |  j_shft_1 = -4    |  j_shft_2 = 4

   What is he most difficult to understand is the jump value. We want to build
   the jump value as: OB OA O9 O8 O7 O6 O5 O4 O3 O2 O1 O0.

   Let's first work on the first byte.
     -) We have to only keep the Ox bits
     -) Then we have to move them to the right.

   The mask is here to suppress the unwanted bits. So for our case, the mask is
   0xf0 since it's the four high bits. The shift is here to move the bits. A
   negative value means to move them to the right, and a positive means to the
   left. So since we have to make the bits go to the four lower bits, we have
   a shift of -4

   For the second byte:
     -) We keep all the bits, but
     -) we have to make room for the four bits from the first byte.

   The mask don't have to suppress anything, so it's set to 0xff. And now, the
   shift is set to 4 to make enough room.

   Finally, the software will build the jump with those four parameters.


Q: What are the others parameters?
A: You have:

     1iscomp   overlap   negative   16bits   opposite

   This will change the behaviour of the compression/decompression algorithm.

   The 1iscomp will inverse the bitmap from the default. The default is, a 1 in
   the bitmap is uncompressed.

   The overlap is a boolean to tell the compressor to use the overlap trick.
   Some roms does use it, but is a little hard to explain here.

   This is the same for the negative trick.

   The 16bits behavior is quite specific to the FF6 PSX. Everything is coded
   using 16 bits words instead of 8 bits words. And thus, the compression
   algorithm is more difficult. As for now, it's broken.

   The opposite behavior is to compute the jump offset in a different way.
   Instead of taking it as the backward jump value, it will be an offset
   from the maximum backward jump possible.



Q: What are the overlap and the negative tricks?
A: You really want to know? *sigh*
   Ok, here we go. The negative is still the simplier to understand. When you
   are uncompressing a lz77 stream, you read backward jumps and lengths. The
   maximum value for a jump and a length are determined on how the two infos
   are packed into the restart block.
   
   For example, if you have 12 bits for the jump and 4 for the length, you will
   have a maximum of 0xfff for the jump and 18 for the length (18 because it's
   0xf + 3)
   
   Here comes the negative trick: when you are uncompressing the first bytes,
   you will still be able to have a jump which is *beyond* the beginning of the
   decompressed file. Later, when your file will grow up, this won't be possible
   anymore of course, since its size will be over the maximum jump size.
   
   So, sometime, you can consider that *if* you have a negative value, it's not
   a fault: you just read zeros. Understood? BTW, most of the case, the negative
   restart will be very near the begin of the file. So if you have to copy five
   bytes distant of 1000 bytes from the beginning of the file, *this* is
   probably a fault, or a misunderstood scheme.
   
   
   Next, the overlap trick. It's more difficult to understand though. You have
   to understand the decompressor's algorithm a bit. When it hits a restart
   block, it computes the jump and the length, then it will begin to copy the
   bytes from the old decompressed datas, one byte after the other.
   
   So the trick is here: when you are copying the bytes, the algorithm will
   be able to reuse them immediately for the current copy, since they are
   already written. The restart block will use "itself".
   
   Let's have an example:
   
   If you want to compress the stream:
   
      123123123123
   
   you will have two solutions. Either with, either without the overlap trick.
   Without the overlap trick, here is the compressed stream:
   
      123123<J:6,L:6>
   
   This is obvious hu? Now, with the overlap trick:
   
      123<J:3,L:9>
   
   Try to take a piece of paper and to decompress those two thingy, maybe this
   will help you understand.



Q: When do I have to active the overlap and negative tricks?
A: Well, for the decompressor, it's not really necessary, because it won't
   crash if the flags are not enabled. But it will warn you so you will be
   able to know that the tricks are enabled into the lz77 stream. So if you
   see that those tricks are enabled, then active the flags so the compressor
   will be able to take care of them.
   
   Something else: you may also test if the tricks are working. Maybe they
   wasn't used when your original lz77 stream was created, but the decompression
   algorithm used by the game *maybe* is able to handle them.