The idea then is to compare the compressibility of amateur creative writing with that of experts. To accomplish this, I took 95 of the top 100 most downloaded works from Project Gutenberg. I figure that these count as very creative works given that they’re still popular now, ~100 years later. For amateur writing, I downloaded 107 fanfiction novels listed as “extraordinary” from

I then selected the strongest open source text compression algorithm, as ranked by Matt Mahoney’s compression benchmark: paq8pxd. I ran each work through the strongest level of compression, and then compared the ratio of compressed to uncompressed size for each work.
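To make the measurement concrete, here is a minimal sketch of the ratio calculation. It is not the setup I actually ran: it swaps in Python's built-in `lzma` module as a stand-in for paq8pxd (a much weaker compressor), and the function names and the idea of a folder of `.txt` files are assumptions made purely for illustration.

```python
import lzma
from pathlib import Path
from typing import Dict


def compression_ratio(data: bytes) -> float:
    """Return compressed size / uncompressed size (lower = more compressible)."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    # Stand-in compressor: lzma at its strongest built-in preset, NOT paq8pxd.
    compressed = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
    return len(compressed) / len(data)


def ratios_for_directory(directory: str) -> Dict[str, float]:
    """Compute the ratio for every .txt file in a (hypothetical) corpus folder."""
    return {
        path.name: compression_ratio(path.read_bytes())
        for path in sorted(Path(directory).glob("*.txt"))
    }
```

Calling `ratios_for_directory` once on a folder of Gutenberg texts and once on a folder of fanfiction would give two sets of ratios to compare. The absolute numbers would differ from paq8pxd's, since a stronger compressor squeezes out more redundancy.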


I must admit a certain amount of disappointment that we weren’t able to distinguish between literature and fanfiction by compressibility. That would have been pretty neat.
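One way to pin down what "distinguish" means here is a permutation test on the two sets of ratios: shuffle the group labels many times and see how often a gap in mean ratio at least as large as the observed one shows up by chance. The sketch below is an assumption about how such a check could be done, not a record of the comparison behind this post, and `permutation_p_value` is a name made up for the example.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    a: Sequence[float],
    b: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test on the difference in mean ratio."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Reassign works to the two groups at random, keeping group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

A large p-value would be consistent with the result above: the compression ratios alone give no reliable way to tell the two groups apart.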

So, what does this failure mean? There are at least six hypotheses that get a boost based on this evidence:

  • Creativity and compression are unrelated.
  • A view of humans as compressors is wrong.
  • Human compression algorithms (the mind) and machine compression algorithms are distinct to the point where one cannot act as a proxy for the other.
  • Compression algorithms are still too crude to detect subtle differences.
  • Fanfiction is as creative as literature.

