Is a picture worth a thousand words? A data-driven approach
Spoiler alert — No, it’s not!
I get the point, it is just an English-language adage, but come on, isn’t it disturbing? A thousand words is a lot.
I feel that unless you already know more about the picture (that is, you have more words about it) and have some context for it, it is not worth anywhere near 1,000 words.
So, if a picture is not worth a thousand words, how many words is it worth, exactly?
Recently, I found Conceptual Captions, a new dataset with 3.3M images annotated with captions.
Conceptual Caption images and their raw descriptions are harvested from the web, and therefore represent a wider variety of styles. More precisely, the raw descriptions are harvested from the Alt-text HTML attribute associated with web images. — https://ai.google.com/research/ConceptualCaptions
Here are two examples of images with their corresponding captions from the dataset.
I cleaned and processed the Conceptual Captions Dataset to count the number of words for every caption. This is how the data looked after processing.
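The counting step can be sketched in a few lines of Python. This is only an illustration, not the pipeline actually used for this post: it assumes the captions arrive as tab-separated rows with the caption in the first column, it treats a word as any whitespace-separated token, and the sample rows and the function names `count_words` and `caption_word_counts` are made up for the example.

```python
import csv
import io

# Hypothetical sample in an assumed "caption<TAB>url" layout; not real dataset rows.
SAMPLE_TSV = (
    "a dog runs across the beach at sunset\thttps://example.com/1.jpg\n"
    "portrait of a smiling woman\thttps://example.com/2.jpg\n"
    "  city skyline   at night \thttps://example.com/3.jpg\n"
)


def count_words(caption):
    """Count whitespace-separated tokens in a caption."""
    return len(caption.split())


def caption_word_counts(tsv_file):
    """Return the word count of the first column of each non-empty row."""
    # QUOTE_NONE keeps quotation marks inside captions as ordinary characters.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [count_words(row[0]) for row in reader if row and row[0].strip()]


word_counts = caption_word_counts(io.StringIO(SAMPLE_TSV))
print(word_counts)
```

For a real file, the `io.StringIO` object could be swapped for an opened file handle. Splitting on whitespace is a simplifying choice: punctuation attached to a word is counted as part of that word.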
I created a histogram in Tableau. Following are the results.
As we can see, the histogram is positively skewed. Around 89% of the 3.32M captions are between 5 and 15 words long.
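The same kind of histogram, and the share of captions that fall inside a given length range, can also be computed outside Tableau. The sketch below is a rough stand-in that uses a small invented list of word counts instead of the real data, so its numbers do not reproduce the 89% figure; `histogram` and `share_in_range` are hypothetical helper names.

```python
from collections import Counter

# Invented word counts standing in for the real per-caption counts.
word_counts = [3, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 15, 18, 24, 50]


def histogram(counts):
    """Map each caption length to the number of captions with that length."""
    return dict(sorted(Counter(counts).items()))


def share_in_range(counts, low, high):
    """Fraction of captions whose length is between low and high, inclusive."""
    if not counts:
        return 0.0
    inside = sum(1 for n in counts if low <= n <= high)
    return inside / len(counts)


hist = histogram(word_counts)

# Text rendering of the histogram: one '#' per caption of that length.
for length, freq in hist.items():
    print(f"{length:>3} {'#' * freq}")

share = share_in_range(word_counts, 5, 15)
print(f"{share:.0%} of the sample captions have 5 to 15 words")
```

A long tail of bars to the right of the main cluster is what a positively skewed distribution looks like in this rendering.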
Here are the major statistics at a glance. Across the 3.32M images, caption lengths ranged from 3 to 50 words. The average caption length was 9.95 words, approximately 10.
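Summary statistics like these take only a few lines with Python's standard `statistics` module. The example below is a minimal sketch on invented word counts, so its output differs from the figures reported here; `summarize` is a hypothetical helper name. It also reports the median, since a mean above the median is one quick sign of positive skew.

```python
import statistics

# Invented word counts standing in for the real per-caption counts.
word_counts = [3, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 15, 18, 24, 50]


def summarize(counts):
    """Return the basic descriptive statistics of a list of caption lengths."""
    return {
        "captions": len(counts),
        "min": min(counts),
        "max": max(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
    }


summary = summarize(word_counts)
for name, value in summary.items():
    print(f"{name}: {value}")
```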
So, I feel that unless you have more context for an image, it is only worth around 10 words. In other words, to make sense of a random image, you need about 10 words on average. That is a saving of 990 words.
Code/Viz on GitHub: @kanishk307