Sometimes we all get bored. Sometimes we play games, but sometimes we don't have computers handily available. Those times, we have to get creative.
Thus, I challenge you all to a little contest.
I've come up with (as far as I know) a new method for encoding basic text, and I'd like to see how long it takes for people with bigger brains than mine to figure it out.
The details and a sample of encoded text are at:
The official challenge will run until the end of November, depending on interest. At that point, I'll either extend the deadline, post the algorithm, or, if nobody seems to have taken it up, do nothing.
If you have questions or want me to verify a guess, you can either use the contact page on the site, or post here.
I think your example text is possibly shorter than what most people will want. I'd request another few sentences at least.
Could you post a link to an example of this type of question with a solution so those who have never attempted something like this can have a basis from which to start solving?
#4 (edited by musicalman 2017-10-24 08:55:42)
I am not even going to attempt this to preserve my sanity. But I've posted it to Twitter. I can't say it'll actually get attention though since I'm not all that popular. Far from popular actually. But, we'll see what happens. I'll be checking back periodically to see how people get on with it.
@Aprone: I can certainly work up some more content and will post it at some point today.
@TJT1234: Not really, especially because I'm not good at the "decoding other people's algorithms" part myself.
What I will do is include additional content as Aprone requested, and add another (vague) pointer to the hints section.
Update: I've added a little more info and a second (about 10 times larger) set of text to work on.
Has anybody actively started hacking on this? If so, how is your experience so far?
This is my first attempt making a challenge like this, and while I want it to be somewhat difficult, I'd also like to avoid needless frustration for people.
I'm taking a crack at it John, for at least as long as I can spare time today to work on it. I've got a few ideas and I think I've ruled out a few things, so we'll see how it goes.
I'm tempted to explain what I've done so far on it, but at the same time that could give my competition a leg up! LOL!
I suppose I do have a question that may help me. Are you sure the original message did not contain any typos before you encoded it? I understand if this is not an applicable question to ask; for example, the words may not be spelled correctly in English, but that could be what you intended. With that in mind, maybe my alternative question should be: are you sure the original message was how you intended it before you encoded it?
I'll guarantee the accuracy of the smaller chunk of text, as I encoded and decoded it a couple different times to test various things.
I won't swear by the second, but the program that created it did correctly create the first section, so I feel pretty confident that everything was accurate.
So far, the page has registered 55 hits (45 this morning). The only activity I've seen is here, on the forum. This of course doesn't mean other places haven't been active, just that I don't know of them.
Do you want people to keep their methods private so as not to mess up other people who don't want help? If you actually want to hear the steps people are taking, and what they've tried so far, let me know and I'll happily summarize what I've done so far.
@Aprone: I'd been planning for people to work together a bit, but hadn't thought of the spoiler angle. I am pretty curious what you've come up with, but if you feel like it's telling too much, feel free to just use the contact page or shoot me an email.
John, I think I can explain what I've done so far without giving away enough to spoil anyone else's fun.
I started out by putting together a small program that would break down your example messages, to show me how often each letter was used. I also grabbed several of your longer forum posts and the text from your web page so that I'd have many example sentences for how you personally write. The program broke that down as well to show how much each letter was used.
There is a pretty typical curve to how many times each letter is used in typical English sentences, and the posts you've written were close enough to that to be considered normal. What I wanted to see was if the encoded example message followed a similar curve. It did, or at least it was a stretched out version of it.
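For anyone who wants to try the same comparison, here is a minimal sketch of a frequency counter along the lines described above. It is written in Python as an assumption (the thread does not say what the original program used), and the sample string is a placeholder, not the challenge text.

```python
from collections import Counter

def symbol_frequencies(text):
    """Return (symbol, count, share) tuples, most frequent first.

    Whitespace is ignored; upper and lower case are counted separately.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return [(sym, n, n / total) for sym, n in counts.most_common()]

# Placeholder text, NOT one of the real samples.
plain = "the quick brown fox jumps over the lazy dog"
for sym, n, share in symbol_frequencies(plain)[:5]:
    print(f"{sym!r}: {n} ({share:.1%})")
```

Running it once on known plain text and once on an encoded sample gives two ranked lists whose shapes can be compared side by side.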
This let me make a few guesses. First off, this told me that your encryption method is almost certainly a letter-for-letter substitution, rather than a hash. It also means that the letter-for-letter substitution isn't the kind where a "T" means "B" now, but now that it's been used once already the next "T" means "D". I don't recall the name for those kinds of shifting methods, but I believe I've ruled that out because of how everything still fit with the letter curve you use in normal conversation.
The lowercase a was used so much that it is clearly some sort of delimiter. For now I'm just assuming it is a space between words, though the number of them is so high that it seems to have a separate purpose that I haven't figured out yet.
The curve is stretched out to basically twice the size I see in your normal conversations, meaning everything is case sensitive. A capital "D" in your encrypted message is not the same thing as a lower case "d" if things were to get translated back into the original secret message.
If the letter-for-letter substitution only used twice as many encoded letters for the sake of recreating the upper and lower case of the original message, then I wouldn't expect to see the list double like this. It's entirely possible that your encoded example messages contain most of the 26 letters of the alphabet in lower case, but it's not likely it contains almost all 26 letters in upper case too. So this leads me to believe that you may be doing some randomizing when you encode a message. What I mean by this is, the original message could be "rrrrr", but your system may have 2 separate letters that an "r" can become, so at random it will decide which to use. Those 5 "r"'s could be encoded to be something like "tNNtt", where the "t" and the "N" are interchangeable and both turn back into an "r" when the message is decoded. It's my current theory anyway.
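To make that theory concrete, here is a toy sketch of a substitution where one plain letter can become either of two code symbols. The table is entirely made up for illustration (only the "r" row comes from the example above), and this shows the guess, not the challenge's confirmed scheme.

```python
import random

# Hypothetical table: each plain letter may become one of several code symbols.
HOMOPHONES = {"r": ["t", "N"], "e": ["I", "4"]}
# Reverse lookup: every code symbol points back to exactly one plain letter.
REVERSE = {code: plain for plain, codes in HOMOPHONES.items() for code in codes}

def encode(message, rng=random):
    """Replace each letter with a randomly chosen symbol from its row."""
    return "".join(rng.choice(HOMOPHONES.get(ch, [ch])) for ch in message)

def decode(coded):
    """Map every code symbol back to its plain letter."""
    return "".join(REVERSE.get(ch, ch) for ch in coded)

print(decode("tNNtt"))  # prints rrrrr
```

The toy table only covers "r" and "e", so it is only meaningful for messages made of those letters. The point is that decoding stays unambiguous even though encoding is random, while the frequency of "r" gets split across two symbols.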
I also listed all of the words in your encoded example sentences to look for patterns. For this I was assuming that the letter "a" separates words. I found 2 pairs of words that are identical except one has 1 extra letter at its end. The game was on trying to figure out what those words could be, taking into consideration my charts of letter frequency to help me guess what the most likely letters were in some of the spots. For example, the letter "e" is the most common letter you normally use (as it basically is for us all), and the coded letter "I" is the most used. This doesn't mean "I" is "e", but it could mean "I" will be one of your most used letters, "A I O T E". When I look at an encoded word, "HG33r1I" for example, I know I'm looking for a 7 letter word that can become an 8 letter word by adding 1 letter to the end. None of the letters in the word match any others, except for the double letters "33".
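A rough sketch of the word-pair search described above, assuming "a" really is the word separator. The demo string is a placeholder built around the one quoted code word; the extra symbols in it are invented.

```python
def extension_pairs(coded, delimiter="a"):
    """Find (word, longer) pairs where longer is word plus one trailing symbol."""
    words = {w for w in coded.split(delimiter) if w}
    return sorted(
        (w, longer)
        for w in words
        for longer in words
        if len(longer) == len(w) + 1 and longer.startswith(w)
    )

# Placeholder input, NOT the real sample; "Q", "X" and "Z" are invented.
demo = "HG33r1IaXaHG33r1IQaZ"
print(extension_pairs(demo))  # prints [('HG33r1I', 'HG33r1IQ')]
```

In plain English such pairs are often a singular and its plural, which is what makes them attractive starting points.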
For a while I found myself wanting that word to be "pattern". It seemed like the words pattern and patterns might both show up in your encrypted examples, all of the letters were unique, and the double letter was in the right spot. Seemed like a good start. The "3" is the 6th most used letter in your example, which makes this plausible since the letter "t" is your second most used in normal conversation. The "H", "G", and "1" matched up similarly well with "p", "a", and "n". "r" for "e" and "I" for "n" were a bit too far off of expected for me to be comfortable however. Don't get me wrong, they were close, but just far enough away to make me start to question my approach.
I've ultimately set the above plan aside because an odd bump in the letter graph reminded me that I'm possibly dealing with 2 code letters equaling the same decoded letter. This means that my list of code letter frequency can't accurately be relied on until I figure out how those add up. If I'm on the right track with this then it may mean the order of these will be different. I'm not sure of a good way to explain what I mean, or at least not one I can type out quickly enough now that I see I'm very low on time.
I've been running errands for most of the day and it's time for work soon. If I'm not too tired I may take another look at this after work, before I go to sleep. Hopefully I'm not going in entirely the wrong direction, but if I am hopefully another participant will jump in and show me where I've taken a wrong turn. haha!
Your post details exactly why I'd be horrible at figuring something like this out myself. Writing up a pattern matching program is something I just hadn't considered at all. I'm really glad you seem to be enjoying this. A couple repeated hints ahead, so if you'd rather avoid potential spoilers stop here.
We're not dealing with encryption here at all, so you're right to disregard Caesar cipher-style shifts and hashing.
Also, remember that the first example was encoded and decoded (more than once) entirely by hand, and that I came up with the algorithm the day I posted the challenge.
I don't want to say more than that for fear of tipping you off to what exactly you've got right or wrong so far, but I'm seriously impressed.
Continuing from Aprone's observations...
So, if we run with the idea that a is a word delimiter, then there are 3 single-letter words in the first sample. That's kinda weird if we're talking standard English, but there is an English writing system with loads of single-letter words. Not that this is helping me much, heh.
Well I had a chance to look at this puzzle again last night, and I tried a few more things. I first checked to see if both messages happened to be lengths that were divisible by the same number. If they had been it would most likely have been a coincidence since my previous tests seem to have ruled out hashing-type encryption, but I wanted to check.
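The divisibility check is quick to reproduce. A small sketch, with made-up lengths standing in for the real sample lengths:

```python
from math import gcd

def common_block_sizes(len_a, len_b):
    """List every block size greater than 1 that divides both lengths."""
    g = gcd(len_a, len_b)
    return [d for d in range(2, g + 1) if g % d == 0]

# Hypothetical lengths, not measured from the actual samples.
print(common_block_sizes(48, 180))  # prints [2, 3, 4, 6, 12]
```

An empty list means the two lengths share no block size, which would argue against any fixed-width block scheme.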
I attempted to pair each "a" with the letter directly following it, thinking that could solve a few odd things about the letter frequencies I found the other day. My thought was that if each "a" was paired up with another symbol, it could represent not only spaces but also punctuation marks. First off it would explain why there are too many "a"s (or at least too many for what I'd expect on a normal letter-for-letter exchange). Second, your individual writing style uses quotes, brackets, and other punctuation more than the average poster. This wasn't intended to be either a compliment or an insult, but rather just an observation, hehe. Since you use a variety of punctuation marks in normal conversation, it stands to reason that you'd want your text encoding technique to preserve those as well as the text of the message. Lastly, it would remove a lot of the single letter "words" in your encoded examples.
So for now I'm looking at the puzzle from this angle, though it isn't looking nearly as promising as I'd hoped.
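Here is a minimal sketch of that pairing idea: treat each "a" plus the symbol after it as one two-character token, then count the tokens. The input string is a placeholder, and whether "a" actually works this way is only a guess at this point.

```python
from collections import Counter

def tokenize(coded, marker="a"):
    """Split text into tokens, gluing each marker to the symbol that follows it."""
    tokens, i = [], 0
    while i < len(coded):
        if coded[i] == marker and i + 1 < len(coded):
            tokens.append(coded[i:i + 2])  # marker plus the next symbol
            i += 2
        else:
            tokens.append(coded[i])  # ordinary symbol, or a trailing marker
            i += 1
    return tokens

# Placeholder input, NOT the real sample.
tokens = tokenize("HGaX3aa1a")
print(tokens)  # prints ['H', 'G', 'aX', '3', 'aa', '1', 'a']
print(Counter(t for t in tokens if len(t) == 2))
```

If the pairs stood for spaces and punctuation, one pair (the space) should dominate the count and a handful of others should be rare.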
CAE, have you had any luck yet? I can tell I'm looking for another guy to chime in with some sort of revelation I've overlooked.
#16 (edited by CAE_Jones 2017-10-26 17:24:42)
I tried writing it out on an index card and wound up writing the second one backward, so I think that might be the opposite of progress.
Unless the first word is index. But that seems like it'd be very slow to code by hand.
[edit]Nope, wrong. If you squint _really_ hard, the one I tried might give you "index of us should?", and I had to cheat 3 separate times to get it that coherent. So unless you count pruning hypotheses, no progress here.[/edit]
There's not much I can say without giving way too much away. Again though, don't think encryption. Seriously. Don't think encryption, don't think hashing. The first sentence takes less than 10 minutes to encode by hand without a calculator of any sort (I could have used paper just as easily as a computer).
If you go down any form of encryption road, you're overcomplicating things.
Well I've just finished a few hours of working on this again. Hannibal and Kane got pulled into helping, and the 3 of us tossed ideas back and forth on vent for a long time. It was very helpful having people there to come up with new ideas and to shoot down ones that were doomed to failure, but I just couldn't see it yet.
I know you've said that this isn't a shifted cipher, but I still keep going back to this being a letter-for-letter swap. We went through so many ideas that I couldn't even begin to remember them all to write about them here.
I believe my previous theory about each letter being randomly assigned one of 2 possible encoded symbols is wrong. Hannibal (or was it Kane?) found a very elegant way to disprove that idea.
The current theory is that "t" is used as a line break, which caused a few other plans to be destroyed.
I used a program to generate 488 double-letter words of the correct length to match up with one from the puzzle (which appears twice). Each of the 488 was then fit into the puzzle, and using the letters it filled in, possible 12-letter words were searched to see whether they would also fit. The idea was that any letter substitution would have to work where the 7-letter word is present, as well as for the one 12-letter word. The resulting list was actually very large, including tons and tons of potential pairings. If the double-letter portion of the 7-letter word is a consonant then the surrounding letters would be vowels. There are only very few possible words with double vowels in that position, and those would be surrounded by consonants. These 2 rules really narrowed down the list of pairs considerably.
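A sketch of how that kind of search can be set up, assuming a plain one-to-one substitution. The tiny vocabulary is a stand-in for a real word list, and the function names are my own, not taken from the program described above.

```python
def letter_pattern(word):
    """Number each symbol by first appearance, e.g. 'pattern' -> (0, 1, 2, 2, 3, 4, 5)."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word)

def candidates(code_word, vocabulary):
    """Plain words whose repetition pattern matches the code word's."""
    target = letter_pattern(code_word)
    return [w for w in vocabulary if letter_pattern(w) == target]

def consistent(guesses):
    """True if all (code_word, plain_word) guesses fit one one-to-one mapping."""
    forward, backward = {}, {}
    for code, plain in guesses:
        if len(code) != len(plain):
            return False
        for c, p in zip(code, plain):
            if forward.setdefault(c, p) != p or backward.setdefault(p, c) != c:
                return False
    return True

# Stand-in vocabulary; a real run would load a full dictionary file.
vocab = ["pattern", "letters", "village", "message"]
print(candidates("HG33r1I", vocab))  # prints ['pattern', 'village']
```

The `consistent` check is what lets a 7-letter guess and a 12-letter guess be tested together: any symbol they share has to decode to the same letter in both.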
I hit a brick wall once I further narrowed down the list, taking into consideration that the 7th and 10th positions in the 12-letter word are the same symbol. That gave me only 4 results. None of the 4 pairings worked when put into the puzzle. It would create words elsewhere that are just not possible. What really got under my collar was the fact that "pattern" showed up in that final 4! From the very beginning clues keep pushing me back to the word "pattern", and then I convince myself I'm going in the wrong direction. Once again, "pattern" rears its ugly head. If we pretend "pattern" would fit nicely into the puzzle, the paired word would be "repatriation", which refers to returning someone to their home country. I wondered if the original message may have been spy-themed and could talk about a captured spy being sent back to where they came. Oh well.
For now I think I'm going to have to take a break from working on this. I have a lot of projects I need to be working on, but it's been fun! Thanks for the puzzle John, and I hope to hear that others make more progress and figure this thing out.
You're getting closer, that's for sure. I did say this isn't a shift. I didn't say there wasn't a character-to-character mapping (not that there is or isn't, but I wanted to clarify my meaning).
I find the "t is a line break" theory to be really interesting on a couple of levels, but until it's solved or the challenge ends, I'm keeping those to myself.
I'm really glad you guys are having fun with this, and will admit to wondering if you didn't come up with the answer and dismiss it during that chat session.
Some quick stats:
So far the challenge page has received 72 hits, and the only known activity is here on the forum (surprisingly I haven't received a single email via the site's contact page).
I've finally changed my direction entirely. I was about to drift off to nap-land on the sofa when an idea popped into my head. I got up and checked the frequency graph from earlier, and sure enough my hunch was correct and I was looking at it wrong. When zoomed out, it does very much resemble the normal English frequency graph (stretched to roughly double the length), but when I take the time to zoom in it is clearly wrong.
The frequencies of the symbols in the encoded messages jump by 7 each time, or close enough to 7 that I'm saying that's what's happening. Less than a 0.5% deviation cannot be coincidence, especially not on such a relatively small piece of sample text.
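One possible reading of "jump by 7" is that the distinct symbol counts, sorted from highest to lowest, sit a fixed step apart. That reading is an assumption on my part, and this small sketch only shows how to check it; the input is synthetic.

```python
from collections import Counter

def count_gaps(text):
    """Differences between consecutive distinct symbol counts, highest first."""
    counts = sorted(set(Counter(text).values()), reverse=True)
    return [high - low for high, low in zip(counts, counts[1:])]

# Synthetic input built to have counts 21, 14 and 7; NOT the real sample.
demo = "x" * 21 + "y" * 14 + "z" * 7
print(count_gaps(demo))  # prints [7, 7]
```

Natural-language text normally gives irregular gaps, so a run of near-identical gaps would point at something mechanical in the encoding.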
I don't really have time to explore this idea further, but it's given me a new direction to go once I have time to return to this.
#21 (edited by CAE_Jones 2017-10-27 16:04:56)
The idea of t as a line break or punctuation made me realize that both samples start with b, so I went back to check, and it also seems that every t-delineated chunk also starts with not just b, but bu. I think it is more likely punctuation than a line break, since the string "tabu" shows up in the middle of the long sample.
Of course, it could be a coincidence and every sentence starts with the same two letters (in which case, th and wh make the most sense).
Given that our sentence-starting words in this model are buZ and bui, that implies that b is probably w, since we have "who" and "why", but the only 3-letter th words are "the" and "thy", and I'd be even more impressed if the trick here is "thy".
That kinda kills my hypothesis about the single-letter words, though. Are there any common sentence-starting pairs that don't have h as the second letter, and form 3-letter words?
This is, of course, all hanging on the idea that character substitution is what we're at.
Sentence starters, under the a=space, t=fullstop model:
buZ, bui4I, buiR
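For anyone who wants to re-run that check on the updated samples, here is a minimal sketch of the a=space, t=fullstop split. The demo string is stitched together from the starters listed above plus invented filler, so it is not real sample text.

```python
def sentence_starters(coded, space="a", stop="t"):
    """Return the first word of every stop-delimited chunk."""
    starters = []
    for chunk in coded.split(stop):
        words = [w for w in chunk.split(space) if w]
        if words:
            starters.append(words[0])
    return starters

# Placeholder input: the quoted starters with invented words after them.
demo = "buZaXtabui4IaKtabuiR"
starters = sentence_starters(demo)
print(starters)  # prints ['buZ', 'bui4I', 'buiR']
print(all(s.startswith("bu") for s in starters))  # prints True
```

If a chunk ever turns up that does not start with "bu", the tag-or-prefix idea needs another look.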
[edit2]Also, all our single-letter words are capitalized. You could even count the exclamation point as a capitalized 1. I only found U, K, X, and !, IIRC. This could be a coincidence, but I'm leaning toward doubtful.[/edit2]
[edit3]If we go with bu being a symbol or tag, instead of letters, then we also have an isolated Z as the first word of the shorter sample. Which would imply that the uppercase loners theory is more likely.[/edit3]
Review the larger sample again. Your post above is missing something.
I'm considering posting another sample to the site (and maybe making them downloadable). If I do, keep an eye on this thread at the start of November.
Does anybody have a preference as to the way in which samples are delivered (in-page only, downloadable, both)?
I just went to put together a third sample of text, and discovered to my horror that my encoder had a small glitch that was causing it to omit a character. Since this character never appeared in the section I'd done by hand, it looked like everything was fine when I reversed the process.
Let this be a warning to everyone: do not develop software hastily. It will come back to bite you.
I've repaired the encoder and re-processed both the existing long sample and my new one, which is about 2.5 times longer. Because of the nature of the glitch, a sizeable portion of the messages have changed slightly. The underlying scheme has not, however, and the changes should be easily noticeable and may even give you some clues.
I'm very sorry for the mixup; hopefully having the longer text and ability to cross-reference against the old example will make up for the loss of time.
All the examples are on the site at:
As I add more text, I've been considering creating a zip file of "resources", including files for each sample. If anybody would like to see this, let me know.