The idea then is to compare the compressibility of amateur creative writing with that of experts. To accomplish this, I took 95 of the top 100 most downloaded works from Project Gutenberg. I figure that these count as very creative works given that they’re still popular now, ~100 years later. For amateur writing, I downloaded 107 fanfiction novels listed as “extraordinary” from

I then selected the strongest open source text compression algorithm, as ranked by Matt Mahoney’s compression benchmark: paq8pxd. I ran each work through the strongest level of compression, and then compared the ratio of compressed to uncompressed space for each work.


I must admit a certain amount of disappointment that we weren’t able to distinguish between literature and fanfiction by compressibility. That would have been pretty neat.

So, what does this failure mean? There are at least six hypotheses that get a boost based on this evidence:

  • Creativity and compression are unrelated.
  • A view of humans as compressors is wrong.
  • Human compression algorithms (the mind) and machine compression algorithms are distinct to the point where one cannot act as a proxy for the other.
  • Compression algorithms are still too crude to detect subtle differences.
  • Fanfiction is as creative as literature.

– Robb Seaton, Creativity, Fan Fiction, and Compression
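
For anyone who wants to see what that measurement amounts to, here is a rough sketch of a compressed-to-uncompressed ratio. It is not the setup from the article: it assumes Python's built-in `lzma` module as a stand-in for paq8pxd (a different and much stronger compressor), and the function names `compression_ratio` and `mean_ratio` are made up for illustration.

```python
import lzma


def compression_ratio(text: str) -> float:
    """Compressed size divided by uncompressed size; lower means more compressible.

    Assumption: lzma at its highest preset stands in for paq8pxd here.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty text")
    compressed = lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)
    return len(compressed) / len(raw)


def mean_ratio(texts) -> float:
    """Average compression ratio over a collection of works."""
    ratios = [compression_ratio(t) for t in texts]
    return sum(ratios) / len(ratios)
```

Comparing `mean_ratio` for one group of texts against another would give the kind of literature-versus-fanfiction comparison the quote describes, though a weaker compressor could easily rank works differently than paq8pxd does.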

After reading the article I understand better what he was doing, and I think I agree with him that it’s quite likely computer compression algorithms are too crude to rank the compressibility of fiction in the same way that the human brain does.

But he also completely neglected these hypotheses:

  • The most popular works of literature on Gutenberg are not the most creative or ‘expertly’ creative works of English literature, i.e.
  • Popularity (let alone popularity on Gutenberg), by itself, is not a direct predictor for how mentally challenging, surprising or unpredictable a work of literature is.

Even if one accepts that creativity, at its most basic level, is exactly the same as unpredictability, he’s taken for granted that creativity (=unpredictability) is what people value the most in literature. This is far from proven! In fact, he started his previous post with research suggesting that what people find visually beautiful is the most predictable, i.e. symmetrical, computer-generated composite ‘averaged’ images of faces, cars etc.

I’d be interested to see what science and computer analysis could tell us about what qualities are linked to the popularity of literature, both classic and modern. I suspect that intuitively few people would guess a 1:1 relationship with unpredictability or with any alternative definition of creativity. I would predict that unpredictability would be a factor, but I’d never imagine it to be a predictive one. A predictive relationship would mean that the very most popular (or respected) works of classic literature would also be the very least predictable. From my experiences with the so-called literary canon, as well as with bestsellers, I would find that very hard to believe.

Reblogging for commentary
