
Word Cloud Visualization

Word Cloud Tutorial

Introduction

A word cloud is an unorthodox visualization method that can be very useful for highlighting projects, expressing ideas, summarizing documents and creating text art in general.

The term word cloud is often used interchangeably with text cloud and tag cloud.

It is an underutilized but highly effective tool, and Python offers a great library for creating word clouds in a couple of minutes.

This visualization method, technology or art form can be useful in a variety of fields for people with professions such as:

  • Scientists (research papers)
  • Entrepreneurs (presentations, brainstorms, conceptualization)
  • Professionals (meetings, presentations, milestone preparation)
  • Consultants (pitching, branding, reports, research)
  • Students (thesis, assignments, projects)
  • Job seekers (recruitment, personal branding)
  • Bloggers (quality visual material, highlight images)
  • Media and Social Media (Communicating news and ideas)
  • Personal use cases (anniversary messages, gifts, gift ideas etc.)

I don’t remember a time when I saw a word cloud and my analytical mind wasn’t greatly intrigued. Word clouds are great attention catchers, and they catch attention in a useful, high-quality way.

In this tutorial we will demonstrate word cloud examples and try to answer questions like: “How to create word clouds with Python” and “How to save word clouds as image files”.

1- Preparation and Libraries

Preparation for a word cloud project in Python is fairly simple. We need the libraries we will use and the data the word cloud will be based on.

Libraries: we will use the wordcloud library and two names it exports in particular. The library can be installed with this command:

  • pip install wordcloud (for further details refer to: Installing pip packages)
  • WordCloud: Class that will be used for generating word clouds.
  • STOPWORDS: Set of common English words such as this, that, the, of, have, has, am, are, there etc. that we usually want to exclude. These words rarely provide much insight when analyzed in isolation.

Data: Needs to be in string format. WordCloud will analyze it and generate a word cloud based on the frequencies of the words in the text string. More frequent words are treated as more significant and are given a larger font size in the word cloud.
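To make the idea of frequency-based sizing concrete, here is a minimal sketch of the counting step using only the standard library. It is not how wordcloud is implemented internally; the SAMPLE_STOPWORDS set, the word_frequencies helper and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# Illustrative stand-in for the library's much longer STOPWORDS set.
SAMPLE_STOPWORDS = {"the", "of", "a", "and", "is", "are", "in"}

def word_frequencies(text, stopwords=SAMPLE_STOPWORDS):
    """Count how often each non-stop word appears in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

sample = "The cloud is a cloud of words and the words are in the cloud"
freqs = word_frequencies(sample)

# "cloud" appears 3 times and "words" twice, so in a word cloud
# "cloud" would be drawn larger than "words".
print(freqs.most_common(2))
```

The more often a word survives the stop word filter, the higher its count, and a higher count translates into a larger font size.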

Let’s import the Python libraries and modules already and get to work.

from wordcloud import WordCloud, STOPWORDS

2- Example 1: Word Cloud Basics with Python

We already imported the libraries, now we need data in string format. Let’s create a light example with a minimalistic string first.

Here are some of the words that come to my mind right now:

Getting some data ready:
text="Coke, Twitter, Jeep, deer, trip, read, snickers"

It doesn’t take a novel to create a word cloud. Just some cool or relevant words and that’ll still do.

We can also assign the stop word collection to a variable so we can pass it to WordCloud.

STOPWORDS is a handy collection that includes most of the stop words that make sense to exclude in a project like this. You can simply print it out and see what’s in it.

stopwords = set(STOPWORDS)

Python sets are mutable, unordered, unindexed collections of unique values (frozenset is the immutable variant). Unlike lists and tuples they have no index, but membership checks are very fast and duplicates are discarded automatically, which makes them ideal for a collection of stop words.
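Because a set can be modified after it is created, you can extend the stop word collection with words of your own. The sketch below uses a tiny hypothetical base_stopwords set in place of STOPWORDS so it runs without any third-party library; with the real library you would start from set(STOPWORDS) instead.

```python
# Hypothetical stand-in for wordcloud.STOPWORDS, kept tiny for readability.
base_stopwords = {"the", "of", "and", "this", "that"}

stopwords = set(base_stopwords)      # copy, so the original stays untouched
stopwords.add("said")                # add a single custom stop word
stopwords.update({"also", "the"})    # "the" is already present, so it is not duplicated

print(len(base_stopwords), len(stopwords))
print("said" in stopwords)           # membership checks are fast on sets
```

Copying with set(...) first means the additions never leak back into the shared collection.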

WordCloud utilization:

We are now ready to use WordCloud and generate a word cloud with it. There are many options that come in handy when using this class and we will demonstrate some of them here.

Let’s create a white background word cloud with stopwords as created before. Let’s also make its dimensions 800×400 pixels and minimum font size in the cloud 6. 

The mask parameter can be used to create word clouds with arbitrary shapes such as clouds, faces, hands, circles, ellipses, cars etc. We’ll leave that for now.

cloud = WordCloud(background_color="white", max_words=200, mask=None, 
        stopwords=stopwords, min_font_size=6, width=800, height=400)
 
cloud.generate(text)

Limited amount of words but it still looks great.

Saving Word Clouds:

It’s super easy to save word clouds as images. All you have to do is call the to_file() method on your cloud and pass a file name including its extension, such as cloud.png or output.png.

cloud.to_file("Desktop/output3.png")
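A relative path like the one above only works if a Desktop folder exists under the current working directory. As a hedged sketch, the hypothetical helper below builds the target path with pathlib and creates the folder first. The name save_cloud and the default folder (the user's Desktop) are assumptions for illustration, and the helper only relies on the cloud object having a to_file() method.

```python
from pathlib import Path

def save_cloud(cloud, filename, folder=None):
    """Save `cloud` as an image inside `folder`, creating the folder if needed."""
    # Assumption: default to the user's Desktop when no folder is given.
    folder = Path(folder) if folder is not None else Path.home() / "Desktop"
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / filename
    cloud.to_file(str(target))  # the file extension decides the image format
    return target

# Usage with the cloud generated earlier:
# save_cloud(cloud, "output3.png")
```

Returning the final path makes it easy to print or log where the image ended up.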

3- Example 2: Word Cloud from a Startup Story

Time is scarce in modern times but I’m still lucky enough to stumble upon very cool reading material every now and then and I treasure those reading opportunities.

The Innovation Stack was a bulls-eye hit for me and I think every coder, engineer, scientist and entrepreneur can benefit from it. It’s also known as a more humorous and well written version of Zero to One, another startup / venture capital classic.

So in this book, the author shares a notable venture capital presentation he had with his partner Jack Dorsey (also founder of Twitter) when they were looking for capital for Square.

They did something cool which helped them and greatly impressed the VC audience. They listed 140 reasons Square will fail. Let’s visualize those reasons with a word cloud (or tag cloud) using Python and see what kind of visual we get.

The book’s full name is The Innovation Stack: Building an Unbeatable Business One Crazy Idea at a Time by Jim McKelvey.

We recreated a word cloud project with most of those reasons (100+) derived from the book itself and visualized the word cloud using Python and its wordcloud library.

Below are the word cloud code and the resulting visualization. The reasons themselves are listed at the bottom of this page.
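The tutorial does not show how the reasons were turned into a text string, so the following is only one plausible way to do it, shown with a handful of entries copied from the list at the bottom. The variable names reasons and reasons_text are assumptions. Underscores are word characters for typical tokenizers, so a reason like bad_math should stay together as one token, and reasons that appear more than once get a higher count.

```python
from collections import Counter

# A few entries copied from the list at the end of this tutorial.
reasons = ["bad_math", "Twitter", "boring_idea", "Facebook", "no_demand",
           "debit_cards", "Google", "debit_cards", "Facebook", "Google"]

# One long string, which is the input format generate() expects.
reasons_text = " ".join(reasons)

# Repeated reasons have higher counts and would be drawn larger.
counts = Counter(reasons)
print(counts["Facebook"], counts["bad_math"])
```

A string built this way can then be passed to generate() in the same way as the text in Example 1.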

140 Reasons to Fail Word Cloud:
# b is the text string holding the reasons (see the list at the bottom)
cloud = WordCloud(background_color="white", max_words=200, mask=None,
                  stopwords=stopwords, width=800, height=600, colormap="tab20",
                  min_font_size=8, max_font_size=125)

cloud.generate(b)
cloud.to_file("Desktop/output6.png")

Additionally we are using the colormap parameter here. This is a fantastic addition because you can simply name one of matplotlib’s colormaps and WordCloud will pick the color of each word from that palette. Viridis and Inferno are my personal favourites but tab20 suits this word cloud really well.

You might want to be careful while choosing a colormap: if the palette includes white, white words won’t be visible on a white background (the same applies to any other background color that matches a color in the palette).
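If no colormap fits your background, WordCloud also accepts a color_func parameter that, when given, is used instead of the colormap. The function below is a small sketch that only returns fairly dark colors so that no word disappears on a white background. The name dark_color_func and the chosen saturation and lightness ranges are arbitrary assumptions.

```python
import random

def dark_color_func(word, font_size, position, orientation,
                    random_state=None, **kwargs):
    """Return a random hue with low lightness so words stay readable on white."""
    rng = random_state if random_state is not None else random.Random()
    hue = rng.randint(0, 359)
    lightness = rng.randint(20, 45)   # stay well away from white
    return f"hsl({hue}, 80%, {lightness}%)"

# Usage with the library:
# cloud = WordCloud(background_color="white", color_func=dark_color_func)
print(dark_color_func("Twitter", 40, (0, 0), None, random_state=random.Random(1)))
```

For a dark background the same idea works in reverse by raising the lightness range.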

140 reasons for failure for Square word cloud

No wonder this amazing concept of listing possible failures impressed the VC crowd enormously and opened new doors for the Square founders.

It’s proactive, thought-provoking, beneficial, safe, smart, intelligent, entertaining and visually aesthetic.

Imagine seeing a word cloud full of critical thoughts after seeing 100s of boring presentations full of baseless growth forecasts and self-praise.

This joint list by Jack Dorsey and Jim McKelvey is relevant from startup culture, venture capital and payment systems perspectives.

ps: At the bottom you can find the fullish list if you are interested. We couldn’t find every single reason from the VC slide but it’s somewhat close to complete.

4- Example 3: Word Cloud for HolyPython.com

It’s an exciting opportunity to visualize some of the topics we have been putting out there for coding enthusiasts, professionals, students, engineers, scientists and entrepreneurs.

stopwords = set(STOPWORDS)
holypython = "...our website's keywords..."
cloud = WordCloud(background_color="white", max_words=200, mask=None, 
               stopwords=stopwords, min_font_size=8,
               width=800, height=600, colormap="inferno")
 
cloud.generate(holypython)

Wordcloud Summary

In this visualization tutorial we discussed the concept of word clouds and their use cases as well as benefits.

We also learned how to use the wordcloud library in Python to create word clouds of different sizes, colors and shapes.

Furthermore, we introduced a couple of interesting word cloud examples that will hopefully inspire you to create your own word clouds. What do you think? Do you think word clouds are useful as well? They are beautiful for sure but do you also see use cases for them?

Where in your private life or business projects could you incorporate word clouds to better express an idea or message you’re trying to communicate?

Thanks so much for visiting our visualization tutorials!

Reasons why Square might fail

  1. bad_math,
  2. Twitter,
  3. boring_idea,
  4. Facebook,
  5. no_demand,
  6. stupid_directors,
  7. wrong_investors,
  8. bank_rejections,
  9. chargebacks,
  10. debit-cards,
  11. hire-wrong-folks,
  12. hardware_manufacturing,
  13. restrictive_partnership,
  14. association_vetoes,
  15. Jacks_midlife_crisis,
  16. malware,
  17. governmental_regulation,
  18. bad_customer_service,
  19. upstart_competitors,
  20. lack_of_focus,
  21. Google,
  22. complexity,
  23. TPP_approval,
  24. managerial_deadlock,
  25. data_inundation,
  26. crappy_culture,
  27. users_reject_electronic_payments,
  28. no_plan,
  29. Heteroskedasticity,
  30. no_office_space,
  31. users_reject_electronic_payments,
  32. managerial_deadlock,
  33. hardware_manufacturing,
  34. Mint.com,
  35. weak_partners,
  36. toxic_corporate_culture,
  37. crappy_culture,
  38. jack_kills_jim,
  39. data_inundation,
  40. foreign_hackers,
  41. RFID,
  42. robot_uprising,
  43. no_users,
  44. government_regulation,
  45. insider_theft,
  46. debit_cards,
  47. craigslist,
  48. fraud,
  49. AML_reporting,
  50. bank_rejections,
  51. regulation_Z,
  52. Can’t_ACH,
  53. currency_devaluation,

  54. debit_cards,
  55. jilted_VC_copycats,
  56. long_tail_risk,
  57. lawsuits,
  58. errant_risk_model,
  59. ISOs_undercut_us,
  60. data_theft,
  61. contractual_mistakes,
  62. regulation_E,
  63. can’t_scale,
  64. can’t_hire_engineers,
  65. indecisiveness,
  66. hire_wrong_folks,
  67. pushers_gamblers_and_whores,
  68. payment_inexperience,
  69. Facebook,
  70. Inflation,
  71. bad_name,
  72. phishing,
  73. grow_too_fast,
  74. Google,
  75. no_revenue_model,
  76. API_attacks,
  77. Paypal_attacks,
  78. panic,
  79. no_user_data,
  80. Diseconomies_of_scale,
  81. no_network_effect,
  82. copycat_competitors,
  83. Zoe_revolts,
  84. shiny_distractions,
  85. no_debit_support,
  86. no_cash,
  87. marketing_costs,
  88. jims_new_wife,
  89. user_apathy,
  90. KYC,
  91. bad_reputation,
  92. interchange_regulations,
  93. bust_out,
  94. can’t_settle_funds,
  95. unreachable_decision_makers,
  96. weak_user_identity,
  97. Telecoms,
  98. expensive_hardware,
  99. ugly_hardware,
  100. new_payment_network,
  101. AMEX_rejection,
  102. FirstData,
  103. Restrictive_Partnerships,
  104. Inflated_Valuation,
  105. Apple_revolts,
  106. fail_PCI_audit