Introduction to Data

Learning outcomes:

  • Describe the difference between primary and foreign keys and explain how to use them
  • List some criteria for picking good datasets, including what makes a dataset good or bad
  • Describe what data is

  • What is data
    • Data is facts
    • Data is values
    • Data can be collected
  • What is information
    • Information is what we can get out of the data
    • If data is a fact, information tells us why that fact is important
      • Data: The sky is blue
      • Information: Blue skies indicate clear weather, so rain is unlikely
    • If data is a value, information is what we can do with that value
      • Data: The temperature is 90°F
      • Information: Not a good day to wear a sweater
    • If data is collected, information is what we can use the collection for
  • Data vs Information
    • Given the collection of data below, can you tell if I am referring to King Charles or Ozzy Osbourne?
      • Identifies as Male
      • Born in 1948
      • Raised in the UK
      • Married twice
      • Wealthy and famous
      • Has lived in a castle
    • If data is a collection of facts, information is the result of analyzing those facts to get something useful from them
    • You can use information to make choices and decisions within a context
  • Collect your data
    • Data can be collected manually or automatically
    • Data collection methods
    • When data is collected, it's important to pay attention to the format
      • Use the same units throughout: Celsius vs Fahrenheit, or meters vs feet
      • Make sure to collect the same fields for every item (if it's books, collect title AND author for each one, not title OR author)
      • Make sure the data type, such as text, integer, or float, is consistent
    • Data accuracy is important! Don't fudge or make things up. A blank is better than an inaccurate value
    • Measure all data in the same way (if you use a ruler make sure you are consistent with where you consider the "start")
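
The points above about consistent units and blanks can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `to_celsius`, the unit codes "C" and "F", and the sample readings are all made up for this example and are not part of the lesson.

```python
from typing import Optional


def to_celsius(value: Optional[float], unit: str) -> Optional[float]:
    """Convert one temperature reading to Celsius (hypothetical helper)."""
    if value is None:
        # A missing reading stays blank instead of being guessed.
        return None
    unit = unit.strip().upper()
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    # Refuse unknown units rather than storing a number nobody can interpret.
    raise ValueError(f"unknown unit: {unit!r}")


# Made-up readings collected in mixed units, including one blank.
readings = [(90.0, "F"), (32.0, "C"), (None, "F")]
normalized = [to_celsius(value, unit) for value, unit in readings]
```

After this step every stored number means the same thing, and the blank reading is still visibly blank.
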
  • Examples of how to collect data well
    • Data collection methods in business
    • Make sure you have a clear plan that everyone collecting data can see. It should identify what you need and how it's being measured, in clear, hard-to-misinterpret language
    • Decide if you are collecting qualitative data (open-ended responses such as "I feel good today") or quantitative data (numbers such as the year you were born)
    • Have a clear system! Write procedures and test the people doing the collecting BEFORE sending them out
  • Structure your data
    • Make sure it's in the same order so you don't have confusion of things like title vs author vs editor
    • If the data starts in something like a spreadsheet, rows vs columns is important!
    • Data labels are important. Make sure each row, column, or other grouping is clearly labeled
    • You may want to specify the type of data expected, such as text, single character, integer, or float.
  • What is a database
    • A database is an organized collection of data
    • Databases store and organize data so you can get to it more easily
    • Databases are good for larger amounts of data
    • Some examples of where a database might be useful
      • Customer information
      • Product or store information
      • Item collection information (Pokémon, Movies, Books)
      • Patient records
      • Student records
  • Why we have databases
    • Databases let us store more data more easily than other formats. They also let us organize the data in lots of ways
    • Organized data lets us run reports and queries more easily, so we can answer questions about our data, such as: What is the customer's email address? How many units did they buy last month? Last year? How much product did we sell last month? Last year?
    • Databases let us store and process our data, and also let multiple other people store and process data at the same time
    • A centralized location for data ensures everyone is using the same data and has access to it, which allows for better business practices
  • How to get data into a database
    • You can import a number of different types of files including text, but CSV files are commonly used
    • Imports can also happen from other programs including spreadsheet programs or other database programs
    • If you're using SQL, you can use the CREATE TABLE command to set up a table and the INSERT command to get your data into it
    • There are also some graphical options depending on the database type you're using
    • You can also write a script to enter in your data
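
As a rough sketch of the options above, the script below reads CSV data and loads it into a database with CREATE TABLE and INSERT. It assumes SQLite through Python's built-in `sqlite3` module, and the `books` table, its columns, and the two sample rows are made up for illustration. A real import would read from a file on disk rather than from a string.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; normally this would come from a file.
csv_text = """title,author,year
Dune,Frank Herbert,1965
Emma,Jane Austen,1815
"""

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# CREATE TABLE defines the columns and the type of data each one expects.
conn.execute(
    "CREATE TABLE books (title TEXT NOT NULL, author TEXT NOT NULL, year INTEGER)"
)

# Read the CSV rows and convert the year from text to an integer.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(r["title"], r["author"], int(r["year"])) for r in reader]

# INSERT adds the rows; the ? placeholders keep the values separate from the SQL.
conn.executemany("INSERT INTO books (title, author, year) VALUES (?, ?, ?)", rows)
conn.commit()

# Once the data is loaded, queries can answer questions about it.
book_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
oldest = conn.execute("SELECT title FROM books ORDER BY year LIMIT 1").fetchone()[0]
```
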
  • How databases look different on different systems
    • Databases can be accessed through either a graphical user interface (GUI) or the command line
    • The system you're using will affect how the data is stored and where it's being used
  • Difference between how the data is stored and how it's accessed
    • You can have many different front end options, and each may have different steps, like switching between MS Word and Google Docs
    • Dashboards are also commonly used for people who need to see but not change the data
    • SQL Dashboards
    • A database may save the information on your computer. Where and how depends on which database is installed, which can be different from the front end used
    • Databases can also be on servers and in the cloud. The front end would connect to those for viewing
    • Generally only a few people may change the data, but many more will be able to see the data
  • Examples of front ends
    • You can build your own, but a lot of people use ready-made options
    • Your front end can be as simple or complex as you like. Some are free, some are not.
    • Some database front end options can be very complex and include data analytics and visualizations, examples include Tableau, MS Power BI, Oracle Analytics Cloud and AIMMS
  • Data Dictionary
    • A data dictionary is an explanation of the data saved in our database
    • Should be centralized so everyone using the database is seeing the same definitions
    • Clarity can be an issue
    • Names listed aren't always obvious, and descriptions can be lacking
    • Active dictionaries are created within the database and updated automatically. Passive dictionaries are kept separately and must be updated manually
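
To make the idea concrete, here is a small sketch of a data dictionary built for one table. It assumes SQLite, where `PRAGMA table_info` reports the column names and types the database itself tracks (loosely like the "active" part), while the hand-written descriptions play the "passive" part. The `customers` table and the description text are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, "
    "email TEXT NOT NULL, "
    "signup_year INTEGER)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(customers)").fetchall()

# Hand-written descriptions; these must be kept up to date manually.
descriptions = {
    "customer_id": "Unique number assigned to each customer",
    "email": "Contact email address",
    "signup_year": "Four-digit year the customer first registered",
}

# Combine what the database knows with the human-written explanations.
data_dictionary = [
    {
        "column": name,
        "type": col_type,
        "required": bool(notnull),
        "description": descriptions.get(name, ""),
    }
    for _cid, name, col_type, notnull, _default, _pk in columns
]
```

A column with an empty description in the result is a quick signal that the written definitions have fallen behind the table.
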
  • Keys
    • Primary Key
      • Unique identifier for each record
      • Used to be numeric only
      • You're NOT supposed to use real data as the key (e.g., Social Security numbers), but people do
    • Foreign Key
      • Identifier that connects to a primary key, linking the data in the tables
      • Normally points to a primary key
      • Some databases name both the foreign key AND the primary key "ID", which makes them hard to tell apart (BAD, BAD)
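
The relationship between the two kinds of key can be sketched with two small tables. This example assumes SQLite through Python's `sqlite3` module, and the `authors` and `books` tables, their column names, and the sample rows are made up for illustration. Note that the key columns are named `author_id` and `book_id` rather than a bare `ID`, so it stays clear which is which.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is turned on.
conn.execute("PRAGMA foreign_keys = ON")

# author_id is the primary key: a unique identifier for each author record.
conn.execute(
    "CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# books.author_id is a foreign key pointing at the authors primary key.
conn.execute(
    "CREATE TABLE books ("
    "book_id INTEGER PRIMARY KEY, "
    "title TEXT NOT NULL, "
    "author_id INTEGER NOT NULL REFERENCES authors(author_id))"
)

conn.execute("INSERT INTO authors (author_id, name) VALUES (1, 'Jane Austen')")
conn.execute("INSERT INTO books (book_id, title, author_id) VALUES (10, 'Emma', 1)")

# The foreign key lets us link the two tables back together.
row = conn.execute(
    "SELECT books.title, authors.name "
    "FROM books JOIN authors ON books.author_id = authors.author_id"
).fetchone()

# A book that points at an author who does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO books (book_id, title, author_id) VALUES (11, 'Orphan', 99)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```
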

Suggested Activities and Discussion Topics:
