The type of length calculation and termination can be chosen at construction time. ---- Tricks that have been tried to improve speed ----
1) Merging Qe and mPS and doubling the lookup tables
Merge the mPS into Qe, as the sign bit (if Qe>=0 the sense of MPS is 0, if Qe<0 the sense is 1), and double the lookup tables. The first half of the lookup tables correspond to Qe>=0 (i.e. the sense of MPS is 0) and the second half to Qe<0 (i.e. the sense of MPS is 1). The nLPS lookup table is modified to incorporate the changes in the sense of MPS, by making it jump from the first to the second half and vice-versa, when a change is specified by the swicthLM lookup table. See JPEG book, section 13.2, page 225.
There is NO speed improvement in doing this, actually there is a slight decrease, probably due to the fact that often Q has to be negated. Also the fact that a brach of the type "if (bit==mPS[li])" is replaced by two simpler braches of the type "if (bit==0)" and "if (q<0)" may contribute to that.
2) Removing cT
It is possible to remove the cT counter by setting a flag bit in the high bits of the C register. This bit will be automatically shifted left whenever a renormalization shift occurs, which is equivalent to decreasing cT. When the flag bit reaches the sign bit (leftmost bit), which is equivalenet to cT==0, the byteOut() procedure is called. This test can be done efficiently with "c<0" since C is a signed quantity. Care must be taken in byteOut() to reset the bit in order to not interfere with other bits in the C register. See JPEG book, page 228.
There is NO speed improvement in doing this. I don't really know why since the number of operations whenever a renormalization occurs is decreased. Maybe it is due to the number of extra operations in the byteOut(), terminate() and getNumCodedBytes() procedures.
3) Change the convention of MPS and LPS.
Making the LPS interval be above the MPS interval (MQ coder convention is the opposite) can reduce the number of operations along the MPS path. In order to generate the same bit stream as with the MQ convention the output bytes need to be modified accordingly. The basic rule for this is that C = (C'^0xFF...FF)-A, where C is the codestream for the MQ convention and C' is the codestream generated by this other convention. Note that this affects bit-stuffing as well.
This has not been tested yet.
4) Removing normalization while loop on MPS path
Since in the MPS path Q is guaranteed to be always greater than 0x4000 (decimal 0.375) it is never necessary to do more than 1 renormalization shift. Therefore the test of the while loop, and the loop itself, can be removed.
5) Simplifying test on A register
Since A is always less than or equal to 0xFFFF, the test "(a & 0x8000)==0" can be replaced by the simplete test "a < 0x8000". This test is simpler in Java since it involves only 1 operation (although the original test can be converted to only one operation by smart Just-In-Time compilers)
This change has been integrated in the decoding procedures.
6) Speedup mode
Implemented a method that uses the speedup mode of the MQ-coder if possible. This should greately improve performance when coding long runs of MPS symbols that have high probability. However, to take advantage of this, the entropy coder implementation has to explicetely use it. The generated bit stream is the same as if no speedup mode would have been used.
Implemented but performance not tested yet.
7) Multiple-symbol coding
Since the time spent in a method call is non-negligable, coding several symbols with one method call reduces the overhead per coded symbol. The decodeSymbols() method implements this. However, to take advantage of it, the implementation of the entropy coder has to explicitely use it.
Implemented but performance not tested yet.