Talk:Range coding
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||
|
I removed my article 'Patentophobia'
It does not contain scientific information. —Preceding unsigned comment added by 74.244.80.58 (talk • contribs) 31 March 2007
The difference between arithmetic and range encoding does not exist
It is simpler than anyone may think. Arithmetic encoding reduce interval [0,1) to a shorter one and allow to pick any value from this interval. Because the value is optional and it is fraction the shortest one is picked up. The range encoder reduce interval [0,n^n) to shorter one and allow to pick any number. Since the number is optional, but it is integer the one that has longest tail of zero bits is taken. (Here n^n is size of the message raised to the power of the size of message). The details can be found in article “Anatomy of Range Encoder”. Google the name and find it. The authors of arithmetic encoders like to claim that they compute the fraction, but when you look at the code you can see that it is not a fraction, it is integer or numerator of this fraction, so while saying that they write arithmetic encoder they actually write range encoder. So, the difference does not exist. Arithmetic encoder got its name because only arithmetic operations involved into encoding and decoding (multiplication, addition, division, subtraction), so range encoder qualifies. The founder of range encoder G.N.N.Martin named known to that moment arithmetic encoder as range encoder emphasizing the idea of algorithm. The story how range encoder acquired its name is very simple it is same as American Natives are called Indians - a little mess.
Next topic
I've rewritten what was here to be grammatically correct English. Later I hope to go back and edit, to put in detail on actual range encoding, and remove the claim added by Suns which seems (as far as I can tell) to be based solely on one particular implementation of range encoding, not on range encoding itself. (Why is arithmetic coding on the page arithmetic encoding, Huffman coding on the page Huffman encoding, but range encoding on the page Range encoder?)
Range encoding and arithmetic encoding
What exactly is the difference between range and arithmetic encoding? I can't seem to see any difference in principle. This sentence from the article
- Arithmatic coding can be thought of as a form of range encoding with the range starting at zero and extending to one.
seems to confirm my suspicion that they are slightly different ways of describing the same thing. CyborgTosser (Only half the battle) 12:10, 23 Jan 2005 (UTC)
- That's the tricky thing. They are very nearly the same thing. However, the differences that there are, and the fact that range encoding has a practice that much more closely approximates its theory than arithmetic coding does, may make an important difference legally, in determining what patents do and don't cover. Range encoding, in both theory and practice, involves ranges of integers. Arithmetic coding in theory works with rational numbers between 0 and 1, but since practical considerations cause the precision to be limited to typically 32 bits, it's as if they were using 32-bit ranges of integers. There may be differences that I don't know about -- I don't pretend to be an expert -- but as far as I know, the differences that there are are mostly legal in nature. -- Antaeus Feldspar 19:01, 23 Jan 2005 (UTC)
- Yeah exactly! When implemented in computers, range and arithmetic encoders produce the same sequence of bits as far as I can see.
- -- (Someone anonymous, who left no signature.)
- As far as I can tell (having looked at both arithmetic coding and range coding from time to time), they're the same thing, just interpreted in two, different but related ways. If you've got an arithmetic coder (encoder/decoder), you've got the corresponding range coder - it's the same thing. And vice-versa, too.
- With arithmetic coding, the actual arithmetic code, which is a sequence of (coding) symbols (not the sequence of coded symbols), is taken as representing a non-negative rational number in the range [0,1). The representation does not include the initial "0.", because that's always going to be there, so is left as implicit. That rational number is itself taken as representing the range (of real numbers) in which it lies, which is taken as representing the message that's encoded as the arithmetic code. The (coding) symbols in the arithmetic code are basically taken as the digits following the "0." part of the rational number, with enough digits to specify the (single message) range it falls in. As I understand it, the range specified contains (at least) all the reals which would have, as their most significant digits after the point (the "0." part), the arithmetic code itself.
- With range coding, the actual range code is taken as representing a subset of natural numbers from a set of natural numbers. The range code itself is interpreted as being a sequence of digits, those digits being the most significant digits of all the natural numbers in the subset the range code represents, but with zero or more leading zeroes added. In other words: it's a prefix common to all the naturals in the subset, padded with leading zeroes in the most significant digit places. (Without the seemingly redundant leading zeroes, the prefix "154" could be taken as representing the subset {154 000 to 154 999} as well as the subset {154 000 000 to 154 999 999} - which subset is it supposed to represent? Prefixes "00000154" and "00154" don't have this problem.) In turn, the subset itself represents a subset (of which it is a subset), which in turn represents a message. As I understand it, one of those subsets (either the subset of naturals which start with the range code as a common prefix, or the message-representing subset that includes the first subset as a subset) is the so-called range identified by the range code.
- But given a range code, we can just as easily interpret it as the corresponding arithmetic code, and vice-versa. To reinterpret a range code as an arithmetic code, all we've got to do is interpret it as the part after the implicit "0." part of (a representation of) a rational number. To reinterpret an arithmetic code as a range code, we just interpret it as a common prefix in the range coding way. Or, to put it another way, if you've got a range code, you can find the corresponding arithmetic code by doing the following:-
- Take the range code as representing the smallest natural number in the range it represents. (This is easy. You just take the digits of the range code itself, and append as many zeroes as necessary. The resulting, appended range code will now represent a 'range' of just one, single number. That's the number we want.)
- Divide that number by the size (the cardinality) of the full set of natural numbers for all possible messages (in that range coding scheme, anyway). Or, to put it another way:-
- call the number base being used for this range coding b;
- call the number of digits in the appended range code, including leading zeroes, N;
- raise b to the power of N, calling the result d; and
- divide the number represented by that appended range code by d.
- The result is a number in the range [0,1), a suitable representation of which would be the corresponding arithmetic code. That suitable representation would be the sequence of digits following the initial "0.", with as many digits as are necessary to identify the range representing the message. If the original range code uses only as many digits as necessary, and if we use only as many digits as necessary in the corresponding arithmetic code, the two codes will be identical (and so we've gone round the houses, taking the scenic route, when we could've just taken a (very) short cut).
- We can do the same the other way round, to get from arithmetic codes to corresponding range codes, too.
- The big upshot of this is that an arithmetic encoder/decoder is also the corresponding range encoder/decoder - it's up to you which you happen to choose to interpret it as being. After all, a range encoder is something that turns messages into corresponding range codes, and an arithmetic coder is something that turns messages into corresponding arithmetic codes, and corresponding range codes and arithmetic codes are identical. So, if it's a range coder, it's an arithmetic coder, and vice-versa.
- This may have significant implications for the rumours that exist regarding arithmetic coding patents and range coding, but I am not a lawyer.
- As for what range coding actually is - arithmetic coding - unless I've badly misunderstood it, it really is just another way of interpreting exactly the same computational, algorithmic, symbol-manipulating, data-processing thing.
- --Simon G Best 17:26, 17 July 2006 (UTC)
I've now edited the article to bring it into line with the arithmetic coding article (which I've also edited). This is to correct and clarify the relationship between arithmetic coding and range coding, since they're really just two, slightly different ways of understanding the same thing. --Simon G Best 15:54, 21 July 2006 (UTC)
- I don't really like the change. It confuses to the patent issue, which, as far as I can tell, is the only reason anyone cares about range coding. The reason range coding is believed to be without patent issues is because (again, as far as I can tell) it's an older form of arithmetic coding. Thus it was prior knowledge by the time any patents could have been filed that wouldn't have expired by now. The claim that arithmetic coding is a form of range coding seems wrong to me, since range coding claims to be a specific method, a fixed procedure to relate input to output. In contrast, arithmetic coding is a family of methods and, although it initially related specific input to specific output, it now has variants which trade off compression ratio (output) for performance (speed), and which consider how the probability models should adapt. The problem with arithmetic coding then is that there are lots of variants, so, while some forms of arithmetic coding are under patent, others aren't. So relabeling a specific type of arithmetic coding as "range coding" has the benefit that people know, "This is the type of arithmetic coding that can be done without worrying about patents." Of course, this is no small feat, as patent issues have prevented the adoption of arithmetic coding in applications where, technologically speaking, it would have significantly helped. Calbaer 22:10, 21 July 2006 (UTC)
- Okay, I've read through Martin's paper, and it would seem to contradict much of what you thought, and the main beliefs that are held (at least by many who refer to range coding) about range coding in relation to arithmetic coding (though the issue of patents is (as is so often the case with patents) not straight-forward).
- The key points from the paper are:-
- Contextual modelling is not really dealt with in the paper (though there are specific examples). It does appear that Martin had context modelling in general in mind.
- Any base can be used. (It's not just binary, neither is it just base 256.)
- Renormalization is digit-at-a-time, not byte-at-a-time, as so often claimed (and often the case in implementations). (Except when bytes are digits, of course.)
- There's a postscript, with references, that indicate that arithmetic coding predates range coding.
- I'll expand on these, and related, issues, as it's really quite interesting to see how generically Martin was thinking, and what questions these issues raise.
- ===Contextual modelling===
- Martin leaves the issue of contextual modelling (or whatever you want to call it (statistical modelling, probability modelling, etc)) open. He includes some examples, but they seem to be included more for the purpose of illustrating range encoding itself. It does seem clear that he had context modelling in general in mind, as he makes statements that simply couldn't be true if he only had certain kinds of context models in mind. For example, the very first paragraph is this:-
Redundancy in a message can be thought of as consisting of contextual redundancy and alphabetic redundancy. The first is illustrated by the fact that the letter Q is nearly always followed by the letter U, the second by the fact that the letter E is far more common than the letter X. Range encoding is an algorithm for removing both sorts of redundancy.
- And, a few paragraphs down, he writes this:-
Many techniques are almost optimal in a wide variety of situations, but none are universally applicable. In contrast, range encoding may be used to remove all the redundancy that we can describe in any digitised message. It can produce encodings to any base.
- (Emphasis mine.) That simply wouldn't be true, unless he intends range coding to be used with any context model.
- ===Renormalization issues===
- Byte-at-a-time renormalization is often said to be one of the differences between range encoding and arithmetic coding. But this simply isn't in Martin's paper. Instead, he describes digit-at-a-time renormalization, and says that range encoding can be used with "any base" - that includes binary. On this issue, range encoding would seem to be as general as arithmetic coding. The stuff about byte-at-a-time renormalization as a difference would therefore seem to be something of a myth when it comes to range encoding itself, though it does happen to be done in a number of range encoders in practice.
- ===Range encoding and arithmetic coding===
- As range encoding, as defined by Martin in his paper, is actually very general (any context model may be used (which would include adaptive models, etc), any number base may be used, etc), and as it's described in quite a high-level, mathematical way (rather than a low-level implementational way), it's really difficult to see how range encoding is not the same as arithmetic coding. (He even includes "delayed digits" and bit-stuffing!)
- ===Patents===
- Since Martin refers to arithmetic coding, and as it seems arithmetic coding predates range coding, the argument that range coding must be patent-free because of its age must also apply to arithmetic coding - but it doesn't. My vague understanding of this issue is that unexpired arithmetic coding related patents cover such things as actual implementations, the uses of certain kinds of context models, and the like. It's not arithmetic coding per se that's patented. If, as really does seem to be the case from Martin's paper itself, range encoding is the same as arithmetic coding, then all those arithmetic coding related patents must also apply to range encoding. But this issue could really do with input from a patent lawyer.
- ===Finishing off===
- The paper's well worth reading - it's not long, and really does show what range encoding actually is (as it is, after all, held to be the definitive paper). But this, for now, is what I've got to contribute on the matter. Sorry if this is a bit of a long item for the talk page, but I really have managed to get 'into the groove' on range encoding :-)
- --Simon G Best 01:38, 22 July 2006 (UTC)
- I think you misunderstand what I'm saying, especially about the patent issues. The term of patent in the U.S. is 20 years from the filing date of the application. (I'm also pretty sure that no country has a longer patent period than 20 years.) That means that nothing that was considered prior art in 1985 could be patented today. This includes range coding. It also includes many forms of arithmetic coding (see article). But because range coding is more specific than arithmetic coding, implementing range coding in its original form doesn't infringe patents. Implementing arithmetic coding in its original form doesn't infringe patents either, but the water's been muddied enough that people just fear using arithmetic coding; it's hard for a judge to tell what's frivolous and for an engineer to tell what's legal. I realize the same could be argued of range coding, but, as it is a more specific application, one can be more confident in the application not being patented. However, those that want to be really safe will go with Huffman codes and the like, avoiding anything invented after software patents were established. Calbaer 05:57, 22 July 2006 (UTC)
- On the patent issues, I think we more or less agree, at least with the following:-
- Range encoding itself (as defined by Martin) is not patented. (It's been too long for any patents there might have been to still be in force.)
- Arithmetic coding itself is not patented (anymore) (for the same reasons).
- A number of more recent developments with arithmetic coding are patented. That's why it's said that arithmetic coding is (still) patent-encumbered.
- Have I understood this correctly? Are we actually in agreement on those points? (This still leaves the issue of whether or not arithmetic coding related patents apply to corresponding range encoders. But we can come back to that.)
- On the matter of whether or not arithmetic coding is range encoding, it seems we're not (yet) in agreement. Perhaps I've misunderstood what arithmetic coding is, but if, as you say, range encoding is "more specific" than arithmetic coding, then there must be at least one example of arithmetic coding (even if it's just theoretical) that isn't also range encoding. Could you refer me to one? I would find it enlightening :-)
- --Simon G Best 17:23, 22 July 2006 (UTC)
- I'm just saying that I've seen range coding used in specific contexts (a certain procedure) and arithmetic coding used in general contexts (many procedures and different associated methods, e.g., adaption, binary tree coding with contexts, m-ary coding via reduction to binary, etc.). This could be because I haven't seen range coding much at all, but I assumed it was because range coding was used in a much more limited sense. And if the term "range coding" is never used for processes currently under patent while arithmetic coding certainly is), this explains the "range coding is patent free" claim. If you can find someone who claims that range coding, as practiced by in large, violates a patent — contradicting Antaeus Feldspar's earlier comment — that would prove me wrong about this. It would also be useful to find someone who knows a bit about range coding itself, e.g., the author of the web sites you've pointed to and/or the author of the prior version of the web page. Because I'm not that someone.
- On the patent issues, I think we more or less agree, at least with the following:-
- To go back to my original point: It isn't that range coding and arithmetic coding can't be considered alike. It's that saying they're the same and patent issues are confused pretty much says that range coding isn't notable, at least not more than any other method in a random paper on source coding. Or that it's notable because people think mistaken things about it. As is, the article itself is confusing because range coding is presented as a tutorial device, while I've never seen it used as such. I know you just wanted to eliminate confusion and misperception, but I think the revision causes confusion itself.
- Anyway, I think that's all I'll have to say about the matter. Hopefully someone who knows more about this can confirm or deny what you've said about the subject. Calbaer 06:02, 23 July 2006 (UTC)
- Oh, I see. I get your point now. I was missing the obvious - range encoding is notable because it's believed to be free from arithmetic coding related patents. My 'clarification' in the article is certainly woefully inadequate. I'll edit the article to make its notability clearer (as I've left it completely obscure), though you're right that it needs more knowledgeable input.
- --Simon G Best 17:29, 23 July 2006 (UTC)
- "If you can find someone who claims that range coding, as practiced by in large, violates a patent — contradicting Antaeus Feldspar's earlier comment — that would prove me wrong about this." I'm a bit distressed, because I think this comment could easily be misread to claim that I made a comment somewhere claiming that range coding violates patents, which I have not. -- Antaeus Feldspar 13:50, 13 April 2007 (UTC)
Pseudocode/Sample code
Range coders can be very simple, nevertheless some example codelets would be nice and ease understanding. -- 89.247.44.39 (talk) 11:07, 26 March 2008 (UTC)
- maybe take it directly from Martin's DCC paper, or add at least some reference mentioning the sample? -- 89.247.44.39 (talk) 11:10, 26 March 2008 (UTC)
- not wanting to contradict you much, but from as far as I've implemented range coders (only once..) I believe that they can not be very simple. Still, source rarely hurts.. Harold Aptroot (talk) 01:12, 23 July 2008 (UTC)
- I added some source code which I think is pretty simple. It's a little weird using base 10 but I did that to align with the numerical example worked through in the article, and the source code produce the same results as the example.--98.247.148.214 (talk) 06:58, 21 April 2009 (UTC)
- You should just write out the decoder section in one chunk. The paragraph explaining the differences between the encoders and decoders is confusing — Preceding unsigned comment added by 24.63.194.183 (talk) 15:51, 18 August 2012 (UTC)