The whole idea for writing computer code to generate haiku struck me one day while I was at work.
Ecologists use a lot of statistical programs, and one of the most popular is R, which is more than a statistical program because it is also a programming language. I've been learning R recently to work with ecological data and analyses. A few weeks ago, I began a project where I wrote R code to parse text. We had species descriptions for over 9,000 plant species in South Africa, and I started learning how to use regular expressions to extract usable bits of data from the paragraph descriptions (for example, the height of the plants or the months when they flower).
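For the curious, here is a minimal sketch in R of that kind of text parsing. The description string, the patterns, and the variable names are all invented for illustration; they are not taken from the actual project.

```r
# Hypothetical species description, made up for this example
desc <- "Shrub to 1.5 m tall. Leaves narrow. Flowers white, Aug. to Oct."

# Height: a number (optionally with decimals) followed by a unit, cm or m
height_pattern <- "[0-9]+(\\.[0-9]+)? ?(cm|m)\\b"
height <- regmatches(desc, regexpr(height_pattern, desc, perl = TRUE))

# Strip the unit to get a plain number
height_value <- as.numeric(sub(" ?(cm|m)$", "", height))

# Flowering months: any three-letter month abbreviation at the start of a word
# (month.abb is a constant built into R: "Jan", "Feb", ...)
month_pattern <- paste0("\\b(", paste(month.abb, collapse = "|"), ")")
flowering <- regmatches(desc, gregexpr(month_pattern, desc, perl = TRUE))[[1]]

print(height)
print(flowering)
```

Patterns like these only cover the phrasing they were written for, so each one would need adjusting for other ways of writing heights or months.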
While I was learning how to manipulate text in this way, my mind wandered to imagining other possibilities for this type of code. And that's how the idea was born.
Honestly, I might never have followed up on the idea if I hadn't shared it with my friend Jared, poet and editor of the online magazine The Jivin' Ladybug. Jared got really, really excited, which got me even more excited, so I went home after talking with him and stayed up late the next two nights working on code.
To get syllable counts for words, I downloaded a file of 185,000 words with hyphenation marks from the Moby Project, which is a public-domain lexicon. I was very grateful to find something I could freely use to count syllables easily.
And that was all I needed in order to generate random-word haiku.
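Counting syllables from hyphenation marks can be sketched in a few lines of R. This is a simplified illustration, not the real script: the entries are made up, and it assumes the syllable breaks are marked with a plain "-", which may not be the character the downloaded file actually uses.

```r
# Hypothetical hyphenated entries; "-" as the break mark is an assumption
entries <- c("hai-ku", "pro-gram-ming", "glaze")

count_syllables <- function(entry, mark = "-") {
  # The number of syllables is the number of pieces left after
  # splitting each entry at the hyphenation mark
  lengths(strsplit(entry, mark, fixed = TRUE))
}

# Build a small lookup table of plain words and their syllable counts
lexicon <- data.frame(
  word = gsub("-", "", entries, fixed = TRUE),
  syllables = count_syllables(entries),
  stringsAsFactors = FALSE
)

print(lexicon)
```

A word with no marks at all counts as one syllable, because splitting it leaves a single piece.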
The code I wrote starts by selecting a random word and checking its syllable count. If adding that word to the first line won't push the line's syllable count over 5, it adds the word to the line. It repeats this until the first line has exactly 5 syllables. Then it does the same thing for the second line (with 7 syllables instead of 5) and for the third line (5 syllables again).
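The steps above can be sketched in R roughly as follows. This is a pared-down illustration rather than the full script: the tiny lexicon and the function names make_line and make_haiku are stand-ins invented for the example.

```r
# Tiny stand-in lexicon with syllable counts, invented for illustration
lexicon <- data.frame(
  word = c("glaze", "hinder", "jabbing", "Pliny",
           "hypostyle", "smallholding", "urushiol"),
  syllables = c(1, 2, 2, 2, 3, 3, 4),
  stringsAsFactors = FALSE
)

make_line <- function(target, lexicon) {
  line <- character(0)
  total <- 0
  while (total < target) {
    i <- sample(nrow(lexicon), 1)                 # pick a random word
    if (total + lexicon$syllables[i] <= target) { # keep it only if it fits
      line <- c(line, lexicon$word[i])
      total <- total + lexicon$syllables[i]
    }
  }
  paste(line, collapse = " ")
}

make_haiku <- function(lexicon) {
  # Three lines with 5, 7, and 5 syllables
  vapply(c(5, 7, 5), make_line, character(1), lexicon = lexicon)
}

cat(make_haiku(lexicon), sep = "\n")
```

Words that would overshoot the line are simply thrown away and another is drawn. The loop relies on the lexicon containing at least one one-syllable word, so that any remaining gap can always be filled.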
Here is another random-word haiku, the second one that the program wrote:
urushiol glaze
smallholding jabbing Pliny
hypostyle hinder
For those of you who are familiar with R or other programming languages, the code used to generate random-word haiku is after the cut. For the rest of you, all that junk after 'Read More' is just the stuff that tells the computer to do what I described a couple of paragraphs above.