Data Visualization

Learning outcomes:

  • Describe some ways to share your query results with others
  • List some visualization options with their pros/cons
  • Demonstrate how to create an appropriate set of data visualizations

Would you like to download my PowerPoint to follow along?

  • Why we visualize data
    • People do better with pictures
    • Lists of data and numbers don't mean much to most people
    • A picture is worth a thousand words
    • Databases are collections of data, so all the data visualization options can be used for databases
    • Reports (next week) are commonly used and will frequently have images and summaries
    • Analytics and business intelligence are common and will use a lot of visuals
  • Visualizations
    • Showing database relationships and how the schema works
    • Showing portions of the data and query results
    • Visualize query or database performance
    • Types of visualizations: charts, maps, graphs, live widgets, and dashboards
    • Auto and manual visualizations - is the visualization made by a person, or generated automatically by the program/database/tool?
    • AI-based visualizations - AI is being used for a lot of art and visualizations, but the caveats and warnings for all AI still apply here, including ethics
  • Example: Charts and Graphs
    • "Chart" is the overall term for data visuals
    • Charts are used to show large amounts of data in a way that makes patterns easier to see
    • All graphs are charts, but not all charts are graphs
    • Graphs are usually used to show raw data trends over time
    • Line and bar graphs are two really common options
  • Example: Dashboards and Widgets
    • Dashboards are how we can visualize what's going on with information
    • Widgets are the things that make up the dashboard
    • Each widget is designed to show some piece of info
    • Widgets should be kept to key metrics; don't add a widget unless you are sure of its value
    • Widgets can have different refresh rates depending on how important it is to have up to date data
  • How to tell if it's a good visualization
    • Scalability - How well it works with larger amounts of data
    • Readability - How well you can understand what the visualization is trying to communicate
    • Usability - Is this data something we need, and does the visualization illustrate it well?
    • Interactivity - Is this visualization one we can change on the fly, or is it a static visual?
    • Aesthetics - How pretty is it?
    • Accessibility - This can be tough to get right because a wide variety of things can make something inaccessible. Some questions we can ask are:
      • To whom is it accessible?
      • Under what conditions?
      • For which tasks?
    • LibGuide for evaluating data visualizations
    • Example of a good visualization from Information is Beautiful
  • What makes a bad visualization?
    • Visualizations that mislead the viewer, either on purpose or by accident
    • Hiding relevant data, or misrepresenting data by changing things like scale, proportion, or where the chart starts and ends. This is usually an attempt to lead you to a false conclusion
    • Showing too much data and confusing the viewer, either obviously (like a lot of 3D graphs) or more subtly, to give the impression of well thought out analysis when in reality it's hiding things
    • Lack of context, labels, or any way to tell what the visualization is about and why it was made
    • Using the right data, but presenting it in confusing ways so the viewer thinks you're saying one thing when you really mean another
    • Examples of bad visualizations
  • Example: Averages
    • One common issue with visualizations is showing data in ways that misrepresent it, whether by accident or willfully
    • For example, if the average salary is about €2,000 (pretend that's good), you might think "yay, people make good money". But if you look at the numbers and see that 9 of 10 people make €1,000 and one person makes €10,000, it's actually not great: the average (€1,900 for those ten people) misrepresents the affluence of the area
    • Let's say we're looking at response time: the average can make it look faster than it really is, and the median can make it seem closer to reasonable, but neither shows the transactions that were very fast only because they failed
    • Averages can be useful, but you need to make sure they are accurately representing your data
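The salary example above can be checked with a few lines of Python. This is only a sketch using the ten salaries from the example (nine at €1,000 and one at €10,000); the variable names are made up for illustration. The mean of those ten values works out to €1,900, close to the "about €2,000" in the example, while the median is €1,000, which is what most people in the group actually earn.

```python
from statistics import mean, median

# Salaries from the example: nine people earn 1,000 and one earns 10,000 (euros).
salaries = [1000] * 9 + [10000]

average_salary = mean(salaries)    # pulled upward by the single high earner
typical_salary = median(salaries)  # the middle of the sorted list

print("mean:  ", average_salary)
print("median:", typical_salary)
```

Showing both numbers side by side (or plotting every salary) makes the outlier visible instead of letting it hide inside a single average.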
  • Example: Percentiles
    • Percentiles will show more accurate information in some cases because they include how much of the data is represented by the visualization
    • "To calculate the 10th percentile, let's say we have 10,000 values. We take all of the values, order them from smallest to largest, and identify the 1001st value (where 1000 or 10% of the values are below it), which will be our 10th percentile" from this blog on percentiles vs averages
    • Having this insight can allow a company to figure out there are problems faster, and respond faster
    • Companies don't like lots of tickets or long wait times on tickets; predicting issues and seeing them quickly can mitigate both
    • Another article on percentiles vs averages
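The method in the quote above (sort the values, then pick the one with 10% of the values below it) can be sketched in Python. The function name `percentile_by_rank` and the response-time numbers are hypothetical, chosen only for illustration. Statistics libraries often interpolate between neighbouring values, so they may return slightly different answers than this simple rank-based version.

```python
def percentile_by_rank(values, p):
    """Return the p-th percentile using the sort-and-count method.

    Sort the values, then return the one at the position where
    p% of the values come before it.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Number of values that should sit before the chosen one.
    count_below = int(p * len(ordered) // 100)
    # Clamp so that p = 100 returns the largest value instead of running off the end.
    index = min(count_below, len(ordered) - 1)
    return ordered[index]


# 10,000 values: the 10th percentile is the 1001st value in sorted order.
values = list(range(1, 10001))
tenth = percentile_by_rank(values, 10)

# Hypothetical response times in milliseconds: 95 fast requests, 5 very slow ones.
response_times_ms = [100] * 95 + [2000] * 5
average_ms = sum(response_times_ms) / len(response_times_ms)
p50_ms = percentile_by_rank(response_times_ms, 50)
p95_ms = percentile_by_rank(response_times_ms, 95)

print("10th percentile of 1..10000:", tenth)
print("average:", average_ms, "p50:", p50_ms, "p95:", p95_ms)
```

In the made-up response-time data, the average is 195 ms and the 50th percentile is 100 ms, so both look healthy. The 95th percentile is 2000 ms, which exposes the slow requests that a dashboard showing only the average would hide.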
  • Data visualizations and accessibility
    • Data visualizations can be tough to make accessible because of everything from labelling issues, to colour or colour contrast, to lack of alt text
    • Accessibility should be baked into what you're doing, not seen as an afterthought
    • Keeping your visualizations simple can help because they are easier to describe and to offer alternatives for
    • Being mindful of colours and contrast is helpful for both text and images. If someone can't see colours, does your visualization still convey your meaning? If not, can you change it so it does?
    • Think about offering different formats so that it's easier for everyone to understand what you're trying to share
    • Accessibility article on data visualizations from Harvard
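One way to check the colour contrast point above is to compute a contrast ratio between two colours. The sketch below assumes the relative luminance and contrast ratio formulas from the WCAG 2.x guidelines. The function names and the sample colours are made up for illustration. WCAG suggests at least 4.5:1 for normal text and 3:1 for graphical elements, so treat those thresholds as a starting point rather than a full accessibility check.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_a, colour_b):
    """Contrast ratio between two colours, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(colour_a)
    lum_b = relative_luminance(colour_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


white = (255, 255, 255)
black = (0, 0, 0)
light_grey = (200, 200, 200)  # hypothetical colour for a chart line or label

print("black on white:", round(contrast_ratio(black, white), 2))
print("light grey on white:", round(contrast_ratio(light_grey, white), 2))
```

A light grey line on a white background falls well below 3:1, so it would be hard to see for many viewers. Contrast ratio alone does not cover colour blindness, so it still helps to ask whether the chart works without colour at all (for example by adding labels or patterns).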
  • Examples of tools

Suggested Activities and Discussion Topics:
