Introduction to Charts and Graphs
Dashboard widgets, commonly known as charts and graphs, play a crucial role in enhancing the clarity and understandability of data. This chapter delves into the classification of charts and types of visualizations used in data analysis, offering insights into their appropriate use. Understanding the different types of charts and when to use each type can significantly improve the clarity and impact of your visualizations. For example, bar charts are effective for comparing values across different categories, while line charts are ideal for illustrating trends over time. Data visualization serves various important roles in data analysis, helping to convey insights in a clear and impactful manner. Some common roles for data visualization include:
- Change Over Time: Line charts and area charts are commonly used to visualize trends and changes in data over time. These visualizations make it easy to identify patterns and fluctuations in data.
- Part-to-Whole: Pie charts and stacked bar charts are examples of charts for illustrating how individual parts contribute to a whole. They are effective in showing proportions and percentages within a dataset.
- Data Distribution: Data distribution charts provide insights into the spread and central tendency of a dataset. Histograms and box plots are a few examples used to visualize the distribution of data.
- Comparison: Comparison charts help you to identify differences and trends across groups. Bar/Column charts and grouped bar/column charts are ideal for comparing values between different groups or categories.
- Relationships: Relationship charts help identify correlations and patterns in data. Scatter plots and bubble charts are examples used to visualize relationships between variables.
- Geographical Data: Map widgets are generally used to visualize geographical data. These visualizations help identify trends in data related to specific locations.
Bar Chart and Column Chart (Grouped or Stacked)
A bar/column chart is one of the most common types of data visualization and represents data values with rectangular bars. The length of each bar corresponds to the data value, so the differing lengths make it easy to compare dimension values. Bar/column charts are an ideal option when you want to compare a measure across dimension values or categories, for example, comparing sales figures across different departments. They can also show a trend over discrete points in time, such as a quarterly price change. Additionally, stacked bar/column charts illustrate part-to-whole relationships by showing each category’s contribution to a total, either as absolute values or as a stacked percentage contribution, and bar/column charts can visualize frequency distributions.
Figure: Bar Chart and Column Chart
When to Use a Bar/Column Chart
- A bar or column chart can be used to showcase comparisons between different categories or groups, trends over time, part-to-whole relationships, and frequency distributions.
- Example: Bar or column charts can easily showcase the comparison of sales figures across different product groups or countries.
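The stacked percentage contribution mentioned above can be computed before the data reaches the chart. The following is a minimal sketch, not a product API: the function name `percent_stack` and the sales figures are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical sales figures per department, split by channel (illustrative data).
sales = {
    "Electronics": {"Online": 60, "Offline": 40},
    "Clothing": {"Online": 30, "Offline": 90},
}


def percent_stack(groups):
    """Convert each group's segment values into percentages of that group's total."""
    result = {}
    for group, segments in groups.items():
        total = sum(segments.values())
        result[group] = {
            # Guard against an empty group so we never divide by zero.
            name: (100.0 * value / total if total else 0.0)
            for name, value in segments.items()
        }
    return result


shares = percent_stack(sales)
for department, segments in shares.items():
    print(department, segments)
```

Each bar then has the same total height (100%), which makes the relative contribution of the segments comparable across categories even when the absolute totals differ.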
Do’s and Don’ts
Do’s:
- Use bar/column charts to compare values between different categories or groups.
- Use clear and concise labels for the x-axis (categories or groups) and y-axis (values) to ensure the chart is easy to understand.
- Use different colors or patterns to differentiate between bars or columns when comparing multiple categories or groups.
- Use a bar/column chart when you have categorical data (data that can be divided into distinct categories).
- Use a bar/column chart when you want to compare values between different categories or groups.
- Use a bar/column chart when you have discrete data points (e.g., months, product categories).
Don’ts:
- Don’t use a bar or column chart for continuous data (use a line chart instead).
- Don’t overcrowd the chart with too many bars or columns.
Line Chart
A line chart is a graphical representation of the continuous progression of data points over a time period. The chart connects the data points, each of which corresponds to a specific value of a variable, with a simple line, hence the name line chart. A line chart has a numerical axis for the measured values (quantities, percentages, scores, or any measurable data) and a categorical axis for the time period or other sequentially ordered categories, such as months or stages of a process. A line chart helps you track, illustrate, and emphasize the continuity and progression of values as they change. The direction and slope of the line convey valuable insights: upward trends indicate increasing values, downward trends show decreases, and flat lines suggest stability or no change in the data. For example, line charts are widely used in the financial sector to track stock prices, market trends, and economic indicators over time.
The line chart above displays the profit trend for the calendar years 2022 and 2023. According to the trend line, profit peaks in January 2023 and declines sharply as May 2023 approaches.
When to Use a Line Chart
- Showing Trends Over Time: Line charts are most commonly used to show trends in data over time. They are particularly effective for displaying continuous data, where the data points are connected to show the progression over time.
- Comparing Trends: Line charts can also be used to compare trends between different categories or groups. For example, you can use a line chart to compare the sales trends of different products or the performance of different departments over time.
- Highlighting Patterns: Line charts can help highlight patterns or anomalies in data. By connecting the data points, line charts make it easier to see trends, cycles, or irregularities in the data.
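The way a line's direction is read (up, down, or flat) can be expressed as a small calculation on consecutive data points. This is a minimal sketch under stated assumptions: the function `describe_trend`, its `tolerance` parameter, and the profit values are hypothetical and not part of any charting tool.

```python
def describe_trend(values, tolerance=0.0):
    """Label each step between consecutive points as 'up', 'down', or 'flat'."""
    labels = []
    for previous, current in zip(values, values[1:]):
        change = current - previous
        if change > tolerance:
            labels.append("up")      # the line rises between the two points
        elif change < -tolerance:
            labels.append("down")    # the line falls between the two points
        else:
            labels.append("flat")    # no meaningful change
    return labels


monthly_profit = [120, 150, 150, 90]  # hypothetical monthly profit values
print(describe_trend(monthly_profit))  # ['up', 'flat', 'down']
```

A non-zero `tolerance` treats small fluctuations as flat, which can help when the data is noisy.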
Do’s and Don’ts
Do’s:
- Use a line chart to show trends over time or to compare trends between different categories or groups.
- Use clear and concise labels for the x-axis (time or categories) and y-axis (values) to ensure the chart is easy to understand.
- Use different line styles or colors to distinguish the lines in a multi-line chart.
Don’ts:
- Don’t use a line chart for categorical data (use a bar chart or column chart instead).
- Don’t overcrowd the chart with too many lines or data points.
Pie, Rose and Donut Chart
Pie/Rose/Donut charts are circular graphs that show percentages or proportions of a whole. It is simple to see the relative proportions of the various categories since each slice of the pie represents a portion of the whole.
The donut and rose charts above illustrate the relative proportion of sales profit contributed by each product in the market. The rose chart helps you identify the contribution by the proportion size. The pie chart showcases the percentage contribution of each product in the market to the total sales profit.
When to Use a Pie/Rose/Donut Chart
- Showing Proportions: Pie/Rose/Donut charts are most commonly used to show the relative proportions or percentages of different categories within a dataset. Each slice represents a relative proportion or percentage, making it easy to see which categories are larger or smaller.
- Comparing Categories: While bar charts are typically used for comparing values between different categories, pie/rose/donut charts can also be used for rough comparisons. However, bar charts are generally more effective for precise comparisons.
- Visualizing Percentages: Pie/Rose/Donut charts are ideal for visualizing percentages, as the size of each slice is proportional to the percentage it represents. This makes it easy to see the relative importance of each category.
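The proportionality between a slice and the percentage it represents can be sketched as follows. The function `slice_angles` and the product figures are hypothetical; the sketch only shows how each value maps to a share of 100% and of the 360 degrees of the circle.

```python
def slice_angles(values):
    """Return (percentage, angle in degrees) for each category of a pie chart."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("pie chart values must sum to a positive number")
    return {
        name: (100.0 * value / total, 360.0 * value / total)
        for name, value in values.items()
    }


profit_by_product = {"Tea": 50, "Coffee": 30, "Juice": 20}  # hypothetical data
for name, (percent, angle) in slice_angles(profit_by_product).items():
    print(f"{name}: {percent:.1f}% of the whole, {angle:.1f} degrees")
```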
Do’s and Don’ts
Do’s:
- Use a pie/rose/donut chart to show the proportions or percentages of different categories within a dataset.
- Use clear and concise labels for each slice of the pie/rose/donut to ensure the chart is easy to understand.
- Use different colors or patterns to differentiate between slices, especially when comparing multiple categories.
- Use a legend or data labels to provide additional information about each category.
- Use a pie/rose/donut chart when you have a small number of categories (fewer than 10) to ensure that the chart is easy to read.
Don’ts:
- Don’t use a pie/rose/donut chart for large datasets or when the categories have similar proportions.
- Don’t use a pie/rose/donut chart when precise comparisons between categories are necessary, as bar charts are generally more effective for this purpose.
- Don’t use a pie/rose/donut chart to compare changes over time or to show trends.
Scatter Chart
Scatter charts, also known as scatter plots, are used to display the relationship between two variables. Each point on the chart represents a single data point, with the x-axis representing one variable and the y-axis representing the other.
The above scatter chart showcases the contribution of values on two variables, cost and revenue, plotted against the profit generated by the product category along the two axes. The chart helps to identify any potential correlations or patterns between cost, revenue, and profit, aiding in strategic decision-making for the product categories.
When to Use a Scatter Chart
- Showing Relationships: Scatter charts are most commonly used to show the relationship between two variables. They can help identify patterns, trends, and correlations in the data.
- Identifying Outliers: Scatter charts can also be used to identify outliers or anomalies in the data. Outliers are data points that are significantly different from the rest of the data and can provide valuable insights into the dataset.
- Comparison: Scatter charts can be used to compare groups of data points. By using different colors or shapes for each group, you can visually compare the distributions of the groups.
- Showing Clusters: Scatter charts can be used to show clusters or groups of data points. This can be useful for identifying patterns or subgroups within the data.
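A correlation seen in a scatter chart can be backed by a number. The sketch below computes the Pearson correlation coefficient with the standard library only; the function name `pearson_r` and the cost and revenue values are hypothetical. Note that this coefficient measures linear relationships only, so a value near zero does not rule out a non-linear pattern.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


cost = [10, 20, 30, 40]      # hypothetical cost values (x-axis)
revenue = [25, 45, 65, 85]   # hypothetical revenue values (y-axis)
print(round(pearson_r(cost, revenue), 3))
```

Values close to 1 or -1 suggest a strong positive or negative linear relationship, while values near 0 suggest little linear relationship.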
Do’s and Don’ts
Do’s:
- Use a scatter chart when you want to show the relationship between two variables.
- Use a scatter chart when you want to identify outliers or anomalies in the data.
- Use a scatter chart when you want to compare groups of data points.
- Use clear and concise labels for the x-axis and y-axis to ensure the chart is easy to understand.
- Use a legend to provide additional information about each data point or group.
Don’ts:
- Don’t use a scatter chart for categorical data (use a bar chart or column chart instead).
- Don’t overcrowd the chart with too many data points.
- Don’t use a scatter chart when there is no clear relationship between the variables, as it may not provide meaningful insights.
Box Plot
A box plot, also known as a box-and-whisker plot, is a graphical representation of the distribution of a dataset. It displays a five-value summary of the dataset (lower value, first quartile, median, third quartile, and upper value), providing insights into the spread and central tendency of the data. A box plot consists of the following components:
- Lower (Minimum): The lowest value in the dataset that is not considered an outlier. This is the bottom end of the whisker.
- First Quartile (Q1): The value below which 25% of the data fall. It is the lower edge of the box.
- Median (Q2): The middle value of the dataset when it is ordered from smallest to largest. It is represented by a line inside the box.
- Third Quartile (Q3): The value below which 75% of the data fall. It is the upper edge of the box.
- Upper (Maximum): The highest value in the dataset that is not considered an outlier. This is the top end of the whisker.
The box plot chart above shows the quantity of the items ordered in each category, through both online and offline sales channels. It helps you compare the differences in order quantity across the available product categories.
When to Use a Box Plot Chart
- Showing Distribution: Box plots are most commonly used to show the distribution of a dataset. The line inside the box represents the median, while the whiskers extend to the minimum and maximum values, excluding outliers.
- Identifying Outliers: Box plots can help identify outliers in the data. Outliers are data points that fall outside the whiskers of the box plot and may indicate errors in the data.
- Comparison: Box plots can be used to compare the distributions of two or more groups. By plotting the box plots for each group side by side, you can visually compare their distributions.
- Skewness: Box plots can also be used to visualize skewness in the data. A skewed distribution will have a longer tail on one side of the box plot, indicating that the data is not symmetrically distributed.
- Variability: Box plots provide a visual representation of the variability of the data. The length of the box and the whiskers can give you an idea of how spread out the data is.
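The five components of a box plot can be computed directly from the data. The sketch below makes two assumptions that the text does not specify: outliers are defined with the common 1.5 × IQR rule, and quartiles use the "inclusive" method of Python's `statistics.quantiles`. Charting tools differ on both choices, so their numbers may not match exactly. The function name `box_plot_summary` and the sample data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Compute whisker ends, quartiles, and outliers for a box plot."""
    data = sorted(values)
    # Quartile conventions vary between tools; "inclusive" is one common choice.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: points beyond 1.5 * IQR from the box count as outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower": inside[0],    # bottom end of the whisker
        "q1": q1,              # lower edge of the box
        "median": median,      # line inside the box
        "q3": q3,              # upper edge of the box
        "upper": inside[-1],   # top end of the whisker
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


order_quantities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(box_plot_summary(order_quantities))
```

In this sample, the value 30 lies above the upper fence, so the upper whisker ends at 9 and 30 is reported as an outlier.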
Do’s and Don’ts
Do’s:
- Use a box plot when you want to visualize the distribution of a dataset and identify outliers.
- Use a box plot when you want to compare the distributions of two or more groups in the dataset, and include legends for easy identification.
- Use a box plot when you want to visualize skewness or variability in the data.
- Label the axes clearly to indicate the variables being represented.
Don’ts:
- Don’t use box plots when the data is not suitable for a box plot representation, such as categorical data that cannot be ordered.
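- Don’t rely on a box plot alone when the detailed shape of the distribution matters, for example with multimodal data, because the summary hides details such as multiple peaks.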
Area and Stacked Area Chart
An area chart is a type of chart that combines features of a line chart and a bar chart. The area between the line and the x-axis is filled with color or shading. Area charts are commonly used to represent cumulative totals over time and to show the composition of a whole.
The stacked area chart shows the cumulative revenue from the sales channels (online and offline) across the years 2022 and 2023. This visualization helps to track the overall revenue trend over time and compare the contribution of each sales channel to the total revenue.
When to Use an Area/Stacked Area Chart
- Showing Trends Over Time: Area charts are most commonly used to show trends over time, similar to line charts.
- Comparing Categories: Area charts can be used to compare the trends of different categories or groups.
- Showing Cumulative Total Values: Area charts are effective for highlighting cumulative totals over time.
- Part to Whole: Area charts can also be used to visualize the composition of a whole.
- Showing Patterns: Area charts can help identify patterns or trends in the data.
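In a stacked area chart, each series is drawn on top of the running total of the series below it. The sketch below shows that stacking step; the function `stack_series` and the revenue figures are hypothetical and only illustrate how the lower and upper boundary of each layer are derived.

```python
def stack_series(series):
    """Return (bottom, top) boundaries for each series when stacked in order."""
    if not series:
        return {}
    length = len(next(iter(series.values())))
    baseline = [0.0] * length
    layers = {}
    for name, values in series.items():
        if len(values) != length:
            raise ValueError("all series need the same number of points")
        # The top of this layer is the previous top plus this series' values.
        top = [b + v for b, v in zip(baseline, values)]
        layers[name] = (baseline, top)
        baseline = top  # the next series starts where this one ends
    return layers


revenue = {
    "Online": [10, 20, 30],   # hypothetical revenue per period
    "Offline": [5, 5, 10],
}
for channel, (bottom, top) in stack_series(revenue).items():
    print(channel, bottom, top)
```

The top boundary of the last layer is the overall total at each point, which is why a stacked area chart shows both the total trend and each contribution.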
Do’s and Don’ts
Do’s:
- Use an area chart when you want to show trends over time and highlight cumulative totals.
- Use an area chart when you want to compare the trends of different categories or groups.
- Use an area chart when you want to visualize the composition of a whole.
- Use a legend or data labels to provide additional information about each category or group.
Don’ts:
- Don’t use an area chart when you have a small number of data points.
- Don’t use an area chart when the data points are not evenly spaced or when there are gaps in the data.
- Don’t overcrowd the chart with too many categories or groups.
Bubble Chart
A bubble chart is a type of chart that displays three dimensions of data: the x-axis value, the y-axis value, and the size of the bubble. Each bubble represents a data point, with its position on the x-axis and y-axis indicating the values of two variables and its size indicating the value of a third variable.
The above bubble chart helps you compare three dimensions of data: quantity, product category, and order priority (high, medium, and low), which is used for grouping.
When to Use a Bubble Chart
- Comparing Three Variables: Bubble charts are most commonly used to compare three variables at once.
- Showing Relationships: Bubble charts can help visualize relationships between variables by comparing the positions and sizes of the bubbles.
- Visualizing Patterns: Bubble charts can help visualize patterns or trends in the data.
- Comparing Groups: Bubble charts can also be used to compare groups of data points.
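How the third variable maps to bubble size matters for honest comparison. A widely used convention, assumed here because the text does not specify one, is to make the bubble's area (not its radius) proportional to the value, so the radius grows with the square root of the value. The function `bubble_radii` and its `max_radius` parameter are hypothetical.

```python
import math


def bubble_radii(values, max_radius=40.0):
    """Scale values to radii so that bubble AREA is proportional to the value."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("at least one value must be positive")
    # Negative values are clamped to zero because an area cannot be negative.
    return [max_radius * math.sqrt(max(v, 0) / largest) for v in values]


quantities = [25, 100]  # hypothetical third-variable values
print(bubble_radii(quantities))  # [20.0, 40.0]
```

Here a value four times larger gets a radius only twice as large, so its area is four times larger. Scaling the radius directly would exaggerate the differences.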
Do’s and Don’ts
Do’s:
- Use a bubble chart when you want to compare three variables at once.
- Use a bubble chart when you want to visualize relationships between variables.
- Use a bubble chart when you want to visualize patterns in the data.
- Use a bubble chart when you want to compare groups of data points.
- Use a legend to provide additional information about each bubble or group.
Don’ts:
- Don’t use a bubble chart when you don’t have a three-way relationship between the data points.
- Don’t use a bubble chart when you have a large number of bubbles or when the bubbles overlap.
- Don’t use a bubble chart when precise comparisons between variables are necessary.
- Don’t overcrowd the chart with too many bubbles or groups.
Alluvial Chart
An alluvial chart is a type of chart that visualizes the flow of data between multiple categories. It is particularly useful for showing how data transitions from one state to another over time or across different stages. Each category is represented by a column, and the flow of data between categories is represented by lines that connect the columns.
The above alluvial chart shows how the profits from different product categories contribute to the total profit across product types. Each flow represents the profit generated by a specific product category within a particular product type (Jar and Packet). The width of the flows indicates the relative contribution of each category’s profit to the total profit of the corresponding product type. This alluvial chart helps identify which product categories are the most profitable within each product type and how they impact the overall profitability of the product types.
When to Use an Alluvial Chart
- Showing Data Flow: Alluvial charts are most commonly used to show the flow of data between different categories or groups.
- Showing Data Transitions: Alluvial charts can help visualize how data transitions from one state to another over time or across different stages.
- Comparison: Alluvial charts can be used to compare the flow of data between different categories or groups by comparing the widths of the lines.
- Showing Relationships: Alluvial charts can also be used to analyze relationships between categories or groups.
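The flows of an alluvial chart are typically derived by summing a value for every pair of connected categories. The sketch below shows that aggregation; the function `build_flows` and the records are hypothetical, with "Jar" and "Packet" borrowed from the product types mentioned above.

```python
from collections import defaultdict


def build_flows(records):
    """Sum the value of every (source, target) pair; each sum is one flow's width."""
    flows = defaultdict(float)
    for source, target, value in records:
        flows[(source, target)] += value
    return dict(flows)


# Hypothetical (product category, product type, profit) records.
records = [
    ("Snacks", "Jar", 30.0),
    ("Snacks", "Packet", 20.0),
    ("Drinks", "Jar", 10.0),
    ("Snacks", "Jar", 5.0),
]
flows = build_flows(records)
for (source, target), width in flows.items():
    print(f"{source} -> {target}: {width}")
```

The number of distinct pairs grows quickly with the number of categories, which is one reason alluvial charts become hard to read when there are many groups.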
Do’s and Don’ts
Do’s:
- Use an alluvial chart when you want to visualize the flow of data between different categories or groups.
- Use an alluvial chart when you want to visualize transitions between states or stages over time.
- Use an alluvial chart when you want to compare the flow of data between different categories or groups.
- Use an alluvial chart when you want to highlight patterns or trends in the data flow.
- Use an alluvial chart when you want to analyze relationships between categories or groups.
Don’ts:
- Don’t use an alluvial chart when there are too many categories or groups.
- Don’t overcrowd the chart with too many categories or groups.
- Don’t use an alluvial chart for precise comparisons between categories or groups.
Treemap
A treemap is a type of chart that displays hierarchical data using nested rectangles, with smaller rectangles nested within larger ones to show the relationship between outer and inner rectangles. Each rectangle in the chart represents a category, and its size corresponds to the value that the category contributes to the whole.
The above treemap visualizes orders based on their priority, product type, and category, with the size of each rectangle indicating the number of total orders. It provides a hierarchical view of order data, starting with the highest level of order priority and drilling down to product types and categories.
When to Use a Treemap Chart
- Showing Hierarchical Data: Treemaps are most commonly used to visualize hierarchical data structures.
- Comparison: Treemaps can be used to compare the sizes of different categories within a hierarchy. The size of each rectangle represents a quantitative value, allowing you to easily compare the relative sizes of categories.
- Distribution: Treemaps can also be used to analyze the proportions of a whole. By comparing the sizes of the rectangles, you can see how each category contributes to the total.
- Showing Part-to-Whole Relationships: Treemaps can effectively show part-to-whole relationships within a hierarchy. Each rectangle represents a part of the whole, and the entire treemap represents the complete data set.
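The size of each rectangle in a treemap follows from rolling values up the hierarchy. The sketch below assumes the hierarchy is given as nested dictionaries whose leaves are numbers; the functions `node_total` and `child_shares` and the order counts are hypothetical. It computes totals and shares only, not the rectangle layout itself.

```python
def node_total(node):
    """Total value of a node: a number for a leaf, or the sum of its children."""
    if isinstance(node, dict):
        return sum(node_total(child) for child in node.values())
    return node


def child_shares(node):
    """Fraction of the parent's total that each direct child contributes."""
    total = node_total(node)
    if total <= 0:
        raise ValueError("the node total must be positive")
    return {name: node_total(child) / total for name, child in node.items()}


# Hypothetical order counts: order priority -> product type -> number of orders.
orders = {
    "High": {"Jar": 10, "Packet": 30},
    "Low": {"Jar": 60},
}
print(node_total(orders))            # 100
print(child_shares(orders))          # {'High': 0.4, 'Low': 0.6}
print(child_shares(orders["High"]))  # {'Jar': 0.25, 'Packet': 0.75}
```

Each share is the fraction of the parent rectangle's area that the child rectangle occupies.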
Do’s and Don’ts
Do’s:
- Use a treemap when you have hierarchical data and want to visualize it in a compact manner.
- Use a treemap when you want to compare the sizes of different categories within a hierarchy.
- Use a treemap when you want to understand how each category contributes to the total.
- Use a treemap when you want to highlight and show part-to-whole relationships.
Don’ts:
- Don’t use a treemap when precise comparisons between categories are necessary.
- Don’t use a treemap when you don’t have hierarchical data, as it may not provide meaningful insights.
- Don’t use a treemap with too many colors or shades, as this can make it difficult to interpret and compare the data.
- Don’t overcrowd the treemap with too much information, as this can reduce readability and effectiveness.
Word Cloud
A word cloud chart is a graphical representation of text data in which the size of each word signifies its frequency or significance. Words that are repeated more often in the dataset (column) are displayed in a larger font size, while less common words are smaller. Word cloud charts are useful for quickly identifying the most prominent words in the dataset.
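The relationship between word frequency and font size can be sketched as follows. This is an illustration only: the function `word_sizes`, its parameters, and the linear scaling between `min_size` and `max_size` are assumptions, and real word cloud tools usually also remove common stop words, which this sketch does not.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=48, top_n=50):
    """Map the most frequent words to font sizes, scaled linearly by count."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]   # count of the most frequent word
    lowest = counts[-1][1]   # count of the least frequent word that is kept
    span = highest - lowest
    sizes = {}
    for word, count in counts:
        # If every word is equally frequent, give them all the largest size.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


feedback = "good good good bad bad slow"  # hypothetical text column
print(word_sizes(feedback, min_size=10, max_size=40))
```

Limiting the result with `top_n` keeps the cloud readable when the text contains many distinct words.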
When to Use a Word Cloud Chart
- Identifying Keywords: Word cloud charts can be used to identify the frequency of keywords or terms in a dataset.
- Visualizing Textual Data: Word cloud charts are effective for visualizing textual data in a concise and intuitive way.
- Analyzing Sentiment: Word cloud charts can be used to analyze sentiment in text data by including positive and negative words in the dataset.
- Analyzing Trends: Word cloud charts can help analyze trends or patterns in text data.
Do’s and Don’ts
Do’s:
- Use a word cloud chart when you want to quickly identify the keyword frequency in a data set.
- Use a word cloud chart when you want to visualize textual data in an appealing way.
- Use a word cloud chart when you want to analyze sentiment or highlight trends in text data.
Don’ts:
- Don’t use a word cloud chart with a large number of words.
- Don’t use a word cloud chart just to compare the frequency of words/text in the data set as it may not provide a meaningful comparison.
Pareto Chart
A Pareto chart is a combination of a bar chart and a line graph that helps identify the most significant factors in a data set. It displays individual values in descending order as bars and the cumulative total as a line. The chart follows the Pareto principle, also known as the 80/20 rule, which suggests that roughly 80% of effects come from about 20% of causes.
- Bar Chart Component: The bars represent individual factors, sorted from the most significant to the least significant. The height of each bar corresponds to the value of the factor.
- Line Graph Component: The line graph shows the cumulative percentage of the total value. This line helps visualize the cumulative impact of factors as they are added from left to right.
The Pareto chart above illustrates sales data by focusing on the profit made by various sales categories. The bars show each category’s contribution, ordered from highest to lowest to highlight the most important contributors. The line shows the cumulative profit across the categories, demonstrating the cumulative effect of sales categories on profits. The chart makes it easier to determine which sales categories are the most profitable and how they affect overall profitability.
When to Use a Pareto Chart
- Analyzing Causes: Pareto charts are often used in quality control and process improvement to analyze the causes of problems or defects.
- Prioritizing Issues: Pareto charts can help prioritize issues or tasks based on their impact.
- Visualizing Cumulative Impact: The line graph in a Pareto chart shows the cumulative impact of each factor.
- Monitoring Progress: Pareto charts can be used to monitor progress over time.
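The two components of a Pareto chart, the sorted bars and the cumulative line, come from one small calculation. The sketch below is illustrative only: the function `pareto_table` and the defect counts are hypothetical.

```python
def pareto_table(values):
    """Return (name, value, cumulative percentage) rows sorted by value, descending."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    total = sum(value for _, value in ordered)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    rows = []
    running = 0
    for name, value in ordered:
        running += value  # cumulative total from left to right
        rows.append((name, value, 100.0 * running / total))
    return rows


defects = {"Scratches": 30, "Dents": 50, "Misprints": 20}  # hypothetical counts
for name, value, cumulative in pareto_table(defects):
    print(f"{name}: {value} (cumulative {cumulative:.0f}%)")
```

The second column drives the bar heights and the third column drives the line, which always ends at 100%.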
Do’s and Don’ts
Do’s:
- Use a Pareto chart to identify the most significant factors contributing to a specific outcome.
- Sort the bars in descending order to highlight the most important factors.
- Include a line graph to show the cumulative total of the factors.
- Use different colors or patterns for bars to distinguish between categories.
- Label each bar and the line for clarity.
Don’ts:
- Don’t use a Pareto chart for data that cannot be sorted in descending order.
- Don’t include too many categories in the chart, as it can make it difficult to interpret.
- Don’t use a Pareto chart if the data does not follow the 80/20 rule.
Conclusion
Understanding the different types of charts and when to use each type can significantly improve the impact and effectiveness of your visualizations. Whether you’re comparing values, illustrating trends, showing proportions, visualizing data distribution, or identifying relationships, choosing the right chart can make all the difference in effectively conveying insights. This chapter has explored the classification of charts and types of visualizations used in data analysis, providing valuable insights into their appropriate use.