Style pivot table to create heat matrix

Learn how to highlight the most valuable cells in a Pandas pivot table that summarizes information on billionaires by country and industry.

Even though Hong Kong is small compared to other countries, its billionaires in Real Estate have accumulated wealth amounting to 132 billion dollars.

They are only surpassed by the United States, which has 192 billion dollars in Real Estate.

Heat matrix displaying billionaire wealth across various countries and industries, with notable highlights on Hong Kong and the USA.
Heat matrix of global billionaire wealth distribution

How can we build such a heat matrix from a long-format dataset?

Data

Each row represents a billionaire, and the columns represent their attributes.

The dataset is a subset of the original billionaires dataset from Kaggle.

import pandas as pd
df = pd.read_csv('data.csv')
Preview of the billionaire dataset showing names, countries, industries, and net worth.
Dataset showing billionaire attributes

Questions

  1. What is a Pandas pivot table, and how is it used in data analysis?
  2. How can you summarize the total worth by country and category?
  3. How do you style a DataFrame to create a heat matrix?
  4. Why is formatting a heat matrix crucial for optimal readability?
  5. What insights can be derived from analyzing the heat matrix?

Methodology

Pivot Table to Summarize Categories

To summarize the data into categories, we can use the pivot_table function, setting its parameters as follows:

  • index: the categorical column whose categories will be uniquely represented by the rows of the resulting table.
  • columns: the categorical column whose categories will be uniquely represented by the columns of the resulting table.
  • values: the numerical column upon which a mathematical operation will be applied.
  • aggfunc: the mathematical operation to apply to the values.
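To make the four parameters concrete, here is a minimal sketch on a tiny made-up table. The rows, the labels 'A' and 'B', and the variable names toy and summary are invented for illustration and are not part of the billionaires dataset.

```python
import pandas as pd

# Made-up toy rows (not taken from the real dataset); worth in arbitrary units.
toy = pd.DataFrame({
    'country':    ['A', 'A', 'B', 'B', 'B'],
    'category':   ['Tech', 'Tech', 'Tech', 'Retail', 'Retail'],
    'finalWorth': [10, 20, 5, 7, 3],
})

summary = toy.pivot_table(
    index='category',     # one row per unique category
    columns='country',    # one column per unique country
    values='finalWorth',  # the numbers to aggregate
    aggfunc='sum',        # how to combine the numbers in each cell
)
print(summary)
```

Country A has no Retail rows, so that cell comes out as a missing value (NaN) rather than 0. The same thing happens in the real table, which is why missing values are filled in a later step.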

Using our dataset, we apply the pivot_table function to answer the following question: What is the total worth of billionaires by country and category?

dfr = (df
 .pivot_table(
     index='category', columns='country',
     values='finalWorth', aggfunc='sum'
     )
)
Basic pivot table summarizing billionaire wealth by country and industry before applying any visual styling.
Initial pivot table of billionaire wealth
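For intuition, a pivot table with aggfunc='sum' should give the same numbers as grouping by both columns, summing, and then moving one grouping level into the columns. The sketch below checks this on made-up rows; the data and the names toy, pivoted and grouped are assumptions for illustration only.

```python
import pandas as pd

# Made-up toy rows (not taken from the real dataset).
toy = pd.DataFrame({
    'country':    ['A', 'B', 'A', 'C', 'B', 'C'],
    'category':   ['Tech', 'Tech', 'Retail', 'Retail', 'Tech', 'Tech'],
    'finalWorth': [4, 6, 1, 9, 2, 8],
})

# Route 1: the pivot table used in this article.
pivoted = toy.pivot_table(
    index='category', columns='country',
    values='finalWorth', aggfunc='sum',
)

# Route 2: group, sum, then spread the countries across the columns.
grouped = (toy
 .groupby(['category', 'country'])['finalWorth']
 .sum()
 .unstack('country')
)

print(pivoted)
print(grouped)
```

Both routes produce one row per category and one column per country, with NaN where a combination has no rows. pivot_table is simply the more compact way to write it.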

Having summarized the data, we can start to analyze it. However, the table does not visually highlight which countries have the highest total worth.

Let’s address this by coloring the table cells with a gradient color scale.

Heat Matrix with Background Gradient

dfr.style.background_gradient()
Initial styling attempt of the heat matrix with a background gradient, emphasizing the wealthier categories.
Early heat matrix version with gradient colors

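Hmm… now the largest values stand out, but the overall pattern is still very hard to discern: by default the gradient is computed column by column (axis=0), missing combinations show up as NaN, and the raw numbers are long and unformatted.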

Formatting the Heat Matrix for Optimal Readability

To make the heat matrix more appealing, we will:

  1. Fill missing values with 0.
  2. Divide the values by 1,000 so the totals read in billions of dollars (assuming finalWorth is recorded in millions).
  3. Format the numbers to include commas and omit decimal places.
  4. Reapply the gradient row by row (axis=1) with the ‘Greens’ color scheme to symbolize dollar banknotes.

(dfr
 .fillna(0)
 .div(1_000)
 .style
     .format(precision=0, thousands=',')
     .background_gradient(cmap='Greens', axis=1)
)
Finalized heat matrix with enhanced readability, showing billionaire wealth by country and industry, with a clear visual distinction through a green gradient.
Enhanced heat matrix for wealth visualization
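The axis=1 argument changes what each color means. As a rough mental model (an approximation of the default behavior of background_gradient, not its actual implementation), each value is rescaled between the minimum and maximum along the chosen axis before being mapped to a color. The sketch below reproduces that rescaling with plain pandas on made-up numbers; the table toy and the helper min_max are hypothetical.

```python
import pandas as pd

# Made-up totals; rows are categories, columns are countries.
toy = pd.DataFrame(
    {'A': [300.0, 40.0], 'B': [100.0, 20.0], 'C': [0.0, 30.0]},
    index=['X', 'Y'],
)

def min_max(values):
    """Rescale a Series to the 0-1 range (0 = lightest, 1 = darkest)."""
    return (values - values.min()) / (values.max() - values.min())

# axis=0: each column is scaled on its own (categories compared within a country).
by_column = toy.apply(min_max, axis=0)

# axis=1: each row is scaled on its own (countries compared within a category).
by_row = toy.apply(min_max, axis=1)

print(by_column)
print(by_row)
```

With row-wise scaling, the darkest cell in each row marks the country that leads that category, which is how a small country can stand out in a single industry even when its overall totals are modest.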

Most net worth is concentrated in the USA, although it’s remarkable that China’s manufacturers have accumulated four times the wealth of their USA counterparts.
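Observations like these can also be checked numerically instead of by eye. Here is a minimal sketch, assuming a pivot table shaped like the one above (categories as rows, countries as columns). All numbers and the country labels 'A', 'B' and 'C' are invented placeholders, not the real totals.

```python
import pandas as pd

# Invented placeholder totals; rows are categories, columns are countries.
toy = pd.DataFrame(
    {'A': [192.0, 100.0], 'B': [30.0, 400.0], 'C': [132.0, 10.0]},
    index=['Real Estate', 'Manufacturing'],
)

# Country with the largest total in each category (the darkest cell per row).
top_country = toy.idxmax(axis=1)

# How many times larger country B's manufacturing total is than country A's.
ratio = toy.loc['Manufacturing', 'B'] / toy.loc['Manufacturing', 'A']

print(top_country)
print(ratio)
```

Running the same two lines on the real pivot table (after filling missing values) would confirm or correct what the colors suggest.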

Which other insights can you spot? Share your thoughts in the comments!

Conclusions

  1. Pandas Pivot Table: Use df.pivot_table to summarize and analyze data patterns efficiently.
  2. Summarizing Categories: Identify categorical variables for analysis and apply mathematical operations to numerical variables.
  3. Creating a Heat Matrix: Employ df.style.background_gradient for visual differentiation of values.
  4. Optimal Formatting: Adjust units with div and simplify numbers using df.style.format.
  5. Deriving Insights: Compare the cells within each row or column against the color scale to spot where wealth is concentrated.

I’d love to hear your thoughts to further enhance our future articles.

What did you find most interesting or valuable in this piece?

Are there specific topics or aspects you’d like us to delve deeper into in our upcoming publications?

Your feedback is crucial for us to provide better content that aligns with your needs and interests.

Thank you for your attention and support!
