Researching erectile dysfunction (ED) in 2019 reveals that the condition is becoming more relevant to younger generations as it becomes humanized and better understood. As a result, new startups like Hims and Roman have begun to target a younger generation of men. If this health condition is gaining so much traction online, what is hidden beneath the text?
We decided to analyze a group of articles about ED and compare them to another text. For the analysis we collected text from 22 different articles on ED and compared it to Robert Louis Stevenson's The Strange Case of Dr. Jekyll and Mr. Hyde. To perform the analysis we used Voyant Tools, an open source web application made by Stéfan Sinclair and Geoffrey Rockwell. Voyant Tools is a text analysis application that can surface statistical characteristics of a text and provide a variety of visualizations based on the data.
Stevenson's text was chosen because it is an influential piece of literature and is highly distinct from a blog post or an online article. Another, more discreet reason for analyzing Jekyll and Hyde is its coincidental thematic overlap with erectile dysfunction. Jekyll and Hyde represent a dual personality, much like how some people separate their sexuality from their everyday life. Effectively they create a dual persona: someone who is for the bedroom and someone who is for the rest of the world.
Performing the Analysis
Before we can analyze the text we must prepare a corpus. A corpus is roughly defined as a body of structured text meant for statistical analysis. We looked only at the top-ranking articles on ED in Google and keyword tracking tools, and copied the text into a .txt file. Then we cleaned up the data to remove anything that was not part of the primary text.
We used 22 different articles, all from different web pages and authors, to build a corpus of 25,777 words. Jekyll and Hyde has 25,722 words, roughly equal to the ED corpus, making it an ideal comparison based on text size.
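The corpus preparation described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the process we followed by hand or the one Voyant Tools uses: the function names (`clean_text`, `tokenize`, `build_corpus`), the whitespace cleanup, and the simple regex tokenizer are all hypothetical, and the two sample strings stand in for real article text. Word counts produced this way would likely differ a little from Voyant's.

```python
import re

def clean_text(raw):
    """Collapse runs of whitespace so pasted article text becomes one tidy string."""
    return re.sub(r"\s+", " ", raw).strip()

def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def build_corpus(articles):
    """Join the cleaned articles into a single corpus string, one article per line."""
    return "\n".join(clean_text(article) for article in articles)

# Two made-up snippets standing in for copied article text (assumption).
articles = [
    "Erectile dysfunction (ED) is  common.\n\n",
    "ED can affect men of any age.",
]
corpus = build_corpus(articles)
word_count = len(tokenize(corpus))
print(word_count)
```

In practice each article would be read from its own .txt file, and the resulting word count is what lets you check that two corpora are close enough in size to compare.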
Voyant Tools' visualizations let us see the data as a variety of graphs, shapes, and other forms. For this analysis we will look specifically at the Knots visualization of both corpora. The Knots visualization lets you see each occurrence of repeated keywords in the text. The more the line bends, the more the keyword is repeated. A long stretch of line without a bend means there are no occurrences of the keyword in that stretch. Depending on the frequency of certain keywords, different patterns emerge in the Knots visualization.
Looking at this visualization we see that the ED corpus looks like a flower and the Jekyll and Hyde corpus looks like a blob with ears on a leash. But wait, turn the image and now the ED corpus looks like a purple tree being struck by lightning. Turn the image for the Jekyll and Hyde corpus and it looks like an industrial crane moving a pink blob, or maybe it looks more like Hyde himself with a cane.
Use the sliders in the embedded visualization above to move the image around. What do you see when you move the knots around?
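The bending behavior described earlier, where the line turns at each occurrence of a keyword and runs straight where the keyword is absent, can be illustrated with a toy model. To be clear, this is not Voyant's actual Knots algorithm, which we have not inspected. The function name `knot_path`, the fixed step length, and the 30 degree turn are all assumptions chosen for illustration.

```python
import math

def knot_path(tokens, keyword, step=1.0, turn_degrees=30.0):
    """Walk through the tokens, moving one step per token and
    turning by a fixed angle each time the keyword appears (toy model)."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for token in tokens:
        if token == keyword:
            # Each occurrence of the keyword bends the line.
            heading += math.radians(turn_degrees)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        points.append((x, y))
    return points

# A made-up sentence standing in for a corpus (assumption).
tokens = "ed is common and ed is treatable".split()
path = knot_path(tokens, "ed")

# With no occurrences of the keyword, the path never bends.
straight = knot_path(["no", "match", "here"], "ed")
print(len(path), straight[-1])
```

A keyword that appears constantly makes the path curl in on itself, while a rare keyword leaves long straight runs, which is roughly why the two corpora produce such different shapes.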
With Voyant Tools we can look specifically at the summary statistics for each corpus. These statistics alone tell us a good deal about a text. The Strange Case of Dr. Jekyll and Mr. Hyde has 25,722 words and 3,974 unique word forms, for a vocabulary density of 0.154. The average sentence is 22.1 words long, and the most frequent words in the corpus are: said (130); utterson (128); mr (122); hyde (96); jekyll (82); man (77); lawyer (67); poole (61); like (59); sir (59).
Comparing this with the ED corpus, we can see right away that every category of the summary differs. The ED corpus has 25,777 total words, only 3,377 of which are unique word forms. The vocabulary density is 0.131, with 18.7 words per sentence. The most frequent words are: erectile (275); dysfunction (266); ed (243); blood (241); men (183); penis (164); erection (121); cause (90); doctor (88); sexual (87).
These stats reveal that the ED corpus repeats specific words much more frequently, has a lower vocabulary density, and uses fewer words per sentence than the 19th-century Jekyll and Hyde corpus. In other words, a larger share of the 19th-century text is made up of unique words.
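The vocabulary density figures above are consistent with dividing the number of unique word forms by the total word count, so that is the definition assumed here. The sketch below checks the reported numbers and shows how a similar summary could be computed for any text. The function names, the tiny stopword list, and the sample sentence are assumptions for illustration. Voyant's own tokenizer and stopword list are more elaborate, so its counts will not match this sketch exactly.

```python
import re
from collections import Counter

# Deliberately tiny, assumed stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it", "that"}

def vocabulary_density(unique_forms, total_words):
    """Unique word forms divided by total words (assumed definition)."""
    return unique_forms / total_words

def summarize(text, top_n=3):
    """Return total words, unique forms, density, and top non-stopword terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    unique_forms = set(tokens)
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    density = len(unique_forms) / len(tokens) if tokens else 0.0
    return {
        "total_words": len(tokens),
        "unique_forms": len(unique_forms),
        "density": density,
        "top_words": counts.most_common(top_n),
    }

# Figures reported in the two corpus summaries.
jekyll_density = round(vocabulary_density(3974, 25722), 3)
ed_density = round(vocabulary_density(3377, 25777), 3)
print(jekyll_density, ed_density)  # 0.154 0.131

# A made-up sample sentence, not taken from either corpus (assumption).
sample = "The doctor said the blood flow is the cause. The doctor said it is common."
summary = summarize(sample)
print(summary)
```

The gap between 0.154 and 0.131 looks small, but at roughly 25,700 words each it amounts to about 600 more unique word forms in Jekyll and Hyde.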
What to make of this data?
For now, we could claim that text density, measured here as vocabulary density, is related to readability. Reading the Jekyll and Hyde corpus requires more mental focus than reading the ED corpus does. This is most likely due to the subtly different syntax of the 19th century. Text from that era is still readable, but it takes more of my mental capacity to read a 19th-century text than a contemporary online article. With the statistics from Voyant Tools we can loosely correlate text density with my own attempt at gauging the readability of each text.
Before we attempt to make other claims, let's look at the purpose of these texts to see whether any further relationships can be uncovered.
We know the average blog post or article is typically written to answer search queries, and is intended to expand the reader's knowledge of a specific topic or keyword. Blog posts and articles also generate traffic for websites, which can then be monetized in various ways. If they are written with good copywriting, they can even influence someone to take action somewhere on the website. Specific, repeated keywords are also an important convention for online articles, ensuring that they can be found in search engines.
Literature of the 19th century relied on none of the above conventions. Stevenson wrote as he saw fit to complete the literary work in the most authentic and creative way possible. If anything, classic literary texts usually attempt to evoke self-reflection and make you think beyond yourself. The usual course of action for readers in the 19th century was to talk about the book or buy more books, both of which benefited the author.
Both online articles and literary texts seem to have a similar purpose: sustained attention. The blog article wants you to keep reading more of what is on the site in order to monetize you, and the literary text wants to capture your imagination so that you continue to read more of the author's work. With both texts sharing the same purpose, we would have to look at more data on the performance of each text to know anything more conclusive.
Generalizing the Data
At the present moment we could generalize the data and claim that text density relates to readability, and that online articles tend to favor readability so they can be consumed quickly. Older literary texts are a bit denser, with a focus on being evocative so the reader can lose themselves in the text.
There is likely denser text relating to ED on the internet, and probably less dense literary text, either of which would force our generalization to change. This analysis reveals that we need more samples to see a bigger picture. Our conclusion can only be conjecture until we know more about other texts. We would also benefit from additional data on the performance of each text to further understand its purpose.
In the end there is still much to learn from comparing contemporary online articles with classic literature. There is even more work to be done comparing like-minded articles against each other, and other combinations of text. More could also be said about my personal gauge of how readable a text is. If I had been reading 19th-century text regularly, I might have claimed that there is no difference in readability. However, I would probably have had to disclose the fact that I was regularly reading text from that era.
So what is actually found within the online text of ED? A drawing of a flower... Coincidence? Absolutely. Is it uncanny? It sure is.
Literary health is good mental health
Reading not only keeps you thinking about the world around you, it is also good for mental health. For those who struggle with ED in any way, we hope that this data-driven literary nod has taken your mind somewhere more enlightening. After all, ED is a complex condition with both physical and psychological components to diagnose. Reading could be a way to help ease the psychological component of ED. Regardless of who you become, we hope that you stay well-read and seek medical help if and when you need it.
Sinclair, Stéfan and Geoffrey Rockwell. “Knots.” Voyant Tools. 2019. Web. 8 Jul 2019. <http://voyant-tools.org/>
Sinclair, Stéfan and Geoffrey Rockwell. “Summary.” Voyant Tools. 2019. Web. 8 Jul 2019. <http://voyant-tools.org/>
Jekyll and Hyde Corpus Summary
ED Corpus Summary
Jekyll and Hyde Corpus
Erectile Dysfunction Corpus
If you end up having to take medication, let's hope that your ED prescription turns you into a good version of yourself and not the dastardly Mr. Hyde.