Monday, January 22, 2007

A000002: Kolakoski Sequence
1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, ...

As you may have guessed, this sequence consists entirely of 1s and 2s. Also A[1]=1, A[2]=2, and A[n] is the length of the nth run of consecutive identical terms. So since the 2nd run of terms is of length 2, A[3]=2 and A[4]!=2, i.e., A[4]=1. But since A[3]=2, the 3rd run has length 2, which makes A[5]=1 as well. And so on.

This is another fractal sequence! Apply run-length encoding to the sequence, assuming it starts with 1 and omitting value information, and we get back the original sequence. It carries a homework assignment, too - no one has proven whether the density of 1s is 1/2.

A007782 is a sequence that came out of attempts to prove just that. It is a count of the number of distinct consecutive subsequences of A000002 of length n. I'm not sure how it was generated, but when I looked it had only 25 terms and the tag "hard", which means more terms are desired. So I generated some more! There's probably a better way, but mine was: "generated by starting with a set containing '1', taking the first unused element X from that set, adding to that set the two strings resulting from interpreting X as a run-length encoding of a string starting with '1' or '2' and containing only '1's and '2's and any unadded substrings of those strings, and repeating 100000 times."

Of course, there's no real reason to just use '1' and '2'. Two symbols is more elegant than more, but how about '1' and '3'? Then we get
1, 3,3,3,1,1,1,3,3,3, 1,3,1, 3,3,3,1,1,1,3,3,3, 1,3,3,3,1, 3,3,3,1,1,1,3,3,3, 1,3,1, 3,3,3,1,1,1,3,3,3, 1, 3,3,3,1,1,1,3,3,3, 1, 3,3,3,1,1,1,3,3,3, 1,3,1, 3,3,3,1,1,1,3,3,3, 1,3,3,3,1, 3,3,3,1,1,1,3,3,3, 1,3,1, 3,3,3,1,1,1,3,3,3, 1, 3,3,3,1,1,1,3,3,3, 1,3,1, 3,3,3,1,1,1,3,3,3, 1, 3,3,3,1,1,1,3,3,3, 1,3,1, 3,3,3,1,1,1,3,3,3, 1,3,3,3,1, 3,3,3,1,1,1,3,3,3, 1,3,1, 3,3,3,1,1,1,3,3,3, 1, 3,3,3,1,1,1,3,3,3, 1, ...
Let's give these clumps names! A=1, B=333111333, C=131, and D=13331. Now what does the sequence look like? ABCBDBCBABABCBDBCBABCBABCBDBCBABA... Every second character is a B. Is there a pattern of non-Bs? ACDCAACDCACACDCAA... It's not clear from so few, but regardless, we can tell one thing. The density of '1's in this sequence is definitely lower than 1/2. One upper bound is the density of '1's in CBCBCBCBCB..., which is 5/12. This happens, of course, because 1 and 3 have the same parity. What about something like 1 and 4?
1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,4,4,4,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,4,4,4,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1,4,1,1,1,1,4,4,4,4,1,1,1,1,4,4,4,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,4,4,4,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,4,4,4,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1,4,1,1,1,1,4,4,4,4,1,1,1,1,4,4,4,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4,4,4,4,1,1,1,1,4,4,4,4,1,1,1,1, 4,1,4,1, 4, 1,4,1,4, 1,1,1,1,4,4,4,4,1,1,1,1,4,4,4,4, ...
Relabeling with A=1, B=4444111144441111, C=4141, D=4444144441, we get ABCBCDBCBCD*BCBCDBCBCD*BCBC???...
That * is a sequence that doesn't quite match the others... and later when it is expanded we suddenly get a whole string of new patterns. The difference in parity means that some of our patterns have more 4s and some have more 1s, and it quickly becomes chaotic trying to figure out if one dominates the other. It turns out we're not going to solve that unsolved problem in the space of this post!

1 Comments:

Blogger King Aardvark said...

Wow, mathematicians are weird, weird people.

11:26 AM  

Post a Comment

<< Home