Linear feedback shift register?
Actually, algorithms based on LFSRs are very common; CRC is directly based on an LFSR. In computer science classes people talk about polynomials when describing how the input value is XORed into the accumulated value; in electronics engineering we talk about taps instead. They are the same thing, just different terminology.
CRC32 is a very common one. It's used to detect errors in Ethernet frames. That means that when I posted this answer, my PC used an LFSR-based algorithm to generate a checksum of the outgoing frame so that my router can verify that what it's transmitting isn't corrupted.
Zip and Gzip files are another example. Both use CRC for error detection. Zip uses CRC32 and Gzip uses both CRC16 and CRC32.
CRCs are basically hash functions, and they're good enough to make the internet work. That means LFSRs are fairly good hash functions. In general, good hash functions are also considered good random number generators. But with an LFSR, selecting the correct taps (polynomial) is crucial to the quality of the resulting hash or random number.
Your code is essentially toy code, since it operates on a string of ones and zeros. In the real world, LFSRs work on the bits of a byte. Each byte you push through the LFSR changes the accumulated value of the register, so that value is effectively a checksum of all the bytes you've pushed through it. There are two common ways to use that value as a random number: either keep a counter and push the sequence of counter values through the register, transforming the linear sequence 1, 2, 3, 4 into a hashed sequence like 15306, 22, 5587, 994, or feed the current value back into the register to generate the next number in a seemingly random sequence.
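To make the two modes concrete, here is a minimal Python sketch. The 16-bit tap mask 0xB400 (the maximal-length polynomial x¹⁶ + x¹⁴ + x¹³ + x¹¹ + 1) is a well-known choice, but the seed and the round count below are illustrative picks of mine, not anything canonical:

```python
def galois_step(state, taps=0xB400):
    # One shift of a 16-bit Galois LFSR; taps=0xB400 encodes the
    # maximal-length polynomial x^16 + x^14 + x^13 + x^11 + 1.
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= taps
    return state

def hash_counter(n, rounds=16):
    # Mode 1: transform the linear sequence 0, 1, 2, 3, ... into a
    # scrambled-looking one by clocking each counter value through the
    # register a fixed number of times (rounds=16 is arbitrary).
    state = n + 1              # state 0 is a fixed point, so offset past it
    for _ in range(rounds):
        state = galois_step(state)
    return state

# Mode 2: feed the register back into itself for a stream of
# seemingly random values.
state = 0xACE1                 # arbitrary nonzero seed
stream = []
for _ in range(4):
    state = galois_step(state)
    stream.append(state)
```

Because the step function is a bijection on nonzero 16-bit states, distinct counter values always hash to distinct outputs, and the feedback stream only repeats after the full period.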
It should be noted that doing this naively with a bit-fiddling LFSR is quite slow, since you have to process one bit at a time. So people have come up with ways of using pre-calculated tables to process eight bits at a time, or even 32 bits at a time. This is why you almost never see LFSR code in the wild: in most production code it masquerades as something else.
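As a sketch of the table-driven trick: a byte-at-a-time CRC32 (using the reflected Ethernet/zip polynomial 0xEDB88320), where each entry of the 256-entry table is just that byte value pushed through the bit-fiddling LFSR once:

```python
def make_crc32_table(poly=0xEDB88320):
    # Push every possible byte through the bit-at-a-time LFSR once.
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

CRC32_TABLE = make_crc32_table()

def crc32(data):
    crc = 0xFFFFFFFF                  # standard CRC32 initial value
    for byte in data:
        # One table lookup replaces eight single-bit shifts.
        crc = (crc >> 8) ^ CRC32_TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF           # standard final inversion
```

This produces the same results as the bit-at-a-time version (and as `zlib.crc32`), just eight times fewer loop iterations per byte.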
But sometimes a plain bit-twiddling LFSR can come in handy. I once wrote a Modbus driver for a PIC micro, and that protocol uses CRC16. A pre-calculated table requires 256 bytes of memory, and my CPU only had 68 bytes of RAM (I'm not kidding). So I had to use a bit-at-a-time LFSR.
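For reference, the bit-at-a-time variant needs nothing beyond the 16-bit register itself. A sketch with the standard CRC-16/Modbus parameters (reflected polynomial 0xA001, initial value 0xFFFF):

```python
def crc16_modbus(data):
    crc = 0xFFFF                          # Modbus initial value
    for byte in data:
        crc ^= byte                       # mix the next byte into the register
        for _ in range(8):                # shift out one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001 # reflected polynomial x16+x15+x2+1
            else:
                crc >>= 1
    return crc
```

On a tiny micro the same loop fits in a handful of machine instructions and uses only the register itself, which is exactly why it beat the 256-entry table in my case.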
Since I was looking for an LFSR implementation in Python, I stumbled upon this topic. I found, however, that the following was better suited to my needs:
def lfsr(seed, mask):
    result = seed
    nbits = mask.bit_length() - 1
    while True:
        result = result << 1
        xor = result >> nbits
        if xor != 0:
            result ^= mask
        yield xor, result
The above LFSR generator is based on GF(2ᵏ) modulus calculus (GF = Galois field). Having just completed an algebra course, I'm going to explain this the mathematical way.
Let's start by taking, for example, GF(2⁴), which equals {a₄x⁴ + a₃x³ + a₂x² + a₁x¹ + a₀x⁰ | a₀, a₁, ..., a₄ ∈ Z₂} (to clarify, Zₙ = {0, 1, ..., n−1} and therefore Z₂ = {0, 1}, i.e. one bit). This means that this is the set of all polynomials of degree four or lower in which each term is either present or not, with no multiples of these terms (e.g. there is no 2xᵏ). x³, x⁴ + x³, 1 and x⁴ + x³ + x² + x + 1 are all examples of members of this set.
We take this set modulo a polynomial of degree four (i.e., P(x) ∈ GF(2⁴)), e.g. P(x) = x⁴ + x¹ + x⁰. This modulus operation on the set is also denoted GF(2⁴) / P(x). For your reference, P(x) describes the 'taps' within the LFSR.
We also take a random polynomial of degree 3 or lower (so that it's not affected by our modulus, otherwise we could just as well perform the modulus operation on it directly), e.g. A₀(x) = x⁰. Every subsequent Aᵢ(x) is then calculated by multiplying the previous one by x: Aᵢ(x) = Aᵢ₋₁(x) · x mod P(x).
Since we are in a finite field, the modulus operation may have an effect, but only when the resulting Aᵢ(x) contains at least the term x⁴ (the highest term in our P(x)). Note that, since we are working with numbers in Z₂, performing the modulus operation itself is nothing more than determining whether each aᵢ becomes 0 or 1 by adding the corresponding values from P(x) and Aᵢ(x) together (i.e., 0+0=0, 0+1=1, 1+1=0, or 'xoring' the two).
Every polynomial can be written as a set of bits, for example x⁴ + x¹ + x⁰ ~ 10011. The A₀(x) can be seen as the seed. The 'times x' operation can be seen as a shift-left operation. The modulus operation can be seen as a bit-masking operation, with the mask being our P(x).
The algorithm depicted above therefore generates (an infinite stream of) valid four-bit LFSR patterns. For example, for our defined A₀(x) (= x⁰) and P(x) (= x⁴ + x¹ + x⁰), we can list the first yielded results in GF(2⁴) (note that A₀ is not yielded until the end of the first round; mathematicians generally start counting at '1'):
i    Aᵢ(x)                    'x⁴'  bit pattern
0    0x³ + 0x² + 0x¹ + 1x⁰    0     0001  (not yielded)
1    0x³ + 0x² + 1x¹ + 0x⁰    0     0010
2    0x³ + 1x² + 0x¹ + 0x⁰    0     0100
3    1x³ + 0x² + 0x¹ + 0x⁰    0     1000
4    0x³ + 0x² + 1x¹ + 1x⁰    1     0011  (first time we 'overflow')
5    0x³ + 1x² + 1x¹ + 0x⁰    0     0110
6    1x³ + 1x² + 0x¹ + 0x⁰    0     1100
7    1x³ + 0x² + 1x¹ + 1x⁰    1     1011
8    0x³ + 1x² + 0x¹ + 1x⁰    1     0101
9    1x³ + 0x² + 1x¹ + 0x⁰    0     1010
10   0x³ + 1x² + 1x¹ + 1x⁰    1     0111
11   1x³ + 1x² + 1x¹ + 0x⁰    0     1110
12   1x³ + 1x² + 1x¹ + 1x⁰    1     1111
13   1x³ + 1x² + 0x¹ + 1x⁰    1     1101
14   1x³ + 0x² + 0x¹ + 1x⁰    1     1001
15   0x³ + 0x² + 0x¹ + 1x⁰    1     0001  (same as i=0)
Note that your mask must contain a '1' at the fourth position to make sure that your LFSR generates four-bit results. Also note that a '1' must be present at the zeroth position, both so that your bit stream cannot end up in the 0000 pattern and so that the lowest bit doesn't simply go unused: a shift left always feeds a '0' into position 0, so without an x⁰ tap that bit would stay '0' forever after the first shift.
Not all P(x)'s are necessarily generators for GF(2ᵏ) (i.e., not all masks of k bits generate all 2ᵏ⁻¹ − 1 numbers). For example, x⁴ + x³ + x² + x¹ + x⁰ generates 3 groups of 5 distinct polynomials each, or "3 cycles of period 5": 0001, 0010, 0100, 1000, 1111; 0011, 0110, 1100, 0111, 1110; and 0101, 1010, 1011, 1001, 1101. Note that 0000 can never be generated, and can't generate any other number.
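This cycle structure is easy to check empirically. A small sketch (repeating the generator from this answer so the snippet runs on its own) that measures the period for a given seed and mask:

```python
def lfsr(seed, mask):
    # The generator from this answer, repeated for self-containedness.
    result = seed
    nbits = mask.bit_length() - 1
    while True:
        result = result << 1
        xor = result >> nbits
        if xor != 0:
            result ^= mask
        yield xor, result

def period(seed, mask):
    # Count shifts until the register state returns to its seed.
    for i, (_, result) in enumerate(lfsr(seed, mask), start=1):
        if result == seed:
            return i

maximal = period(0b0001, 0b10011)   # x⁴+x¹+x⁰: one maximal cycle of 2⁴ − 1 = 15
short = period(0b0001, 0b11111)     # x⁴+x³+x²+x¹+x⁰: a cycle of period 5
other = period(0b0101, 0b11111)     # another of that mask's cycles, also period 5
```

With mask 0b11111 the period you measure depends on which of the three cycles the seed lands in, whereas 0b10011 gives the full period 15 from any nonzero seed.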
Usually, the output of an LFSR is the bit that is 'shifted out', which is a '1' if the modulus operation is performed and a '0' when it isn't. LFSRs with a period of 2ᵏ⁻¹ − 1, also called pseudo-noise or PN-LFSRs, adhere to Golomb's randomness postulates, which essentially say that this output bit is 'random enough'.
Sequences of these bits therefore have their use in cryptography, for instance in the A5/1 and A5/2 mobile encryption standards, or the E0 Bluetooth standard. However, they are not as secure as one would like: the Berlekamp-Massey algorithm can be used to reverse-engineer the characteristic polynomial (the P(x)) of the LFSR. Strong encryption standards therefore use non-linear FSRs or similar non-linear functions. A related topic is the S-boxes used in AES.
Note that I have used the int.bit_length() operation, which was not implemented until Python 2.7.
If you'd only like a finite bit pattern, you could check whether the result equals the seed and then break out of your loop.
You can use my LFSR method in a for-loop (e.g. for xor, pattern in lfsr(0b001, 0b10011)) or you can repeatedly call next() on the generator returned by the method (gen.next() in Python 2), which returns a new (xor, result) pair every time.
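A minimal sketch of both usage patterns (the generator is repeated so the snippet is self-contained):

```python
def lfsr(seed, mask):
    # The generator from this answer.
    result = seed
    nbits = mask.bit_length() - 1
    while True:
        result = result << 1
        xor = result >> nbits
        if xor != 0:
            result ^= mask
        yield xor, result

# For-loop usage, breaking once the register returns to its seed:
seed, mask = 0b0001, 0b10011
patterns = []
for xor, result in lfsr(seed, mask):
    patterns.append(result)
    if result == seed:
        break                      # full period reached

# Manual stepping with next() (gen.next() on Python 2):
gen = lfsr(seed, mask)
xor, pattern = next(gen)           # first yielded pair: (0, 0b0010)
```

For the maximal mask 0b10011 the for-loop collects all 15 four-bit patterns from the table above before stopping.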