Query Optimization

Learning outcomes:

  • Describe a query plan
  • Describe an index and when to use one

Would you like to download my PowerPoint to follow along?

  • Query plans and times
    • Each query has a time that it takes to run
    • Each query has a plan, this is the steps the database is using to figure out the results to give you for the query
    • Each query can have different amounts of time needed to run based on how it's put together
    • Some databases will come with a built in way to see the plan for your query, this will likely include things like operator attributes and types
    • Query execution plans in SQL
  • What is Query Optimization
    • Query optimization is a way to make a query more efficient
    • Query optimization is really only useful for much more complex queries and large sets of data
    • When we look at performance, and simple queries, there isn't much to be done, they are what they are
    • You can change the queries slightly to improve time, but database tuning might not be enough
    • Some databases come with build in optimizers, however! do not trust them blindly! Make sure it actually works and does what it says on the tin
  • Why we need optimization
    • In the real world we have very large sets of data and very complex queries
    • These datasets are not 10,000 records, they can be millions of records and that will make every second or even tenth of a second count if we have to go through each record
    • Very complex queries have a lot of moving parts and can be put together in many ways to solve our problems
    • SQL query optimization — How to determine when and if it's needed
  • Ok, but where do we start
    • Indexes: Front line defense for improvement
    • What is an Index?
      • An index is a way to copy some columns of the data so that we can get the information faster instead of having to sort through the whole table
      • You can also think of it as a map to the table, or shortcuts to the data. Like a cheat code of a hidden level in Mario to get to the last world
      • Indexes usually have a way to refer back to the original data using a key or direct link
      • Think of a library index
  • Index basics
    • How can we tell when an index might be useful?
      • If you look at your query plan and see a scan that goes over each record in sequence that might be a good place to have an index
    • Indexes can be created on any column in a database
    • Technically you can create indexes on multiple columns
    • There is also a concept of a partial index or filtered index, where you can include part of the rows as well
      • For example, a partial index could be used when you need to track a shipment that is in the warehouse, but not ordered or shipped or delivered so you can focus your queries more
    • Use The Index, Luke: A guide to database performance for developers
  • What kind of changes and improvements can we expect?
    • Index creation time is dependent on the size of the table, this could be a few seconds or several
    • If your query is using the index, there is a direct relationship between how long it takes for the index to be created and how much time you can save using them
    • Indexes can take a lot of space
    • Moves a sequential search to a tree search
  • Warning: Multi-column indexes
    • If you need a multi-column index, you should consider revising your schema instead
    • Multi-column indexes can increase the time it takes to add new rows to the database
    • If there isn't a lot of changes or writing happening to the database this might not be an issue
    • If we continue the library index card example, this would be the cards sorted by author, and once we got the author, it was then sorted by title
  • Types of indexes
    • Different types of indexes may have both pros and cons
    • Different types of indexes will use different data structures internally
    • Mostly the databases will make this choice for you, you don't have to pick which one is used
    • Some examples of types include clustered index, hash index, bitmap index and filtered index
    • There are a lot more types, some are only available on some systems or in some databases (such as bitmap only available on Oracle)
    • What Are the Types of Indexes in a Relational Database?
  • Index implementation
    • Indexes will use data structures, the different databases will likely have different names for each data structure
    • Some popular ways to implement indexes are hashes and trees (self-balanced binary search trees and B+ trees specifically)
    • Most databases will create an index based on a primary key
    • Some databases might also create an index for foreign keys
    • There is a concept of a covering index, where what you need is actually in the index and you don't need to go find the record (The library card has the info, you don't need the book)
    • There are no standards for how indexes are created
  • How to tell if it is an improvement
    • The cost of your query and alternative plans for your query is a hard problem to solve
    • At this point in time, hardware isn't usually the issue, disk space is cheap, and adding more servers isn't always the answer
      • You can't bake a loaf of bread faster with more ovens
    • There can be clear metrics such as "users complained about load times less this month" or "Users complained A LOT about how slow their shopping was"
    • How long the query takes to run can be variable enough it may not be the best indicator of improvements
    • Consider other metrics to also look for such as CPU use, or I/O
    • Another metric you could consider is how understandable your query and plan is to future people
  • Other considerations
    • One thing to consider is the amount of time saved by what you're doing, is this a onetime cost to save a lot of time later, or is the a large cost upfront for a thing you don't do very often
    • Another thing to think about is "what is reasonable" Is this query actually possible to run in the requested time, or is this request not in the bounds of reality
    • Is the slowness of the query something outside of your control? Is there another option? Such as running reports off hours, or creating less reports because they are being sent to the cabinet of no return anyway?

Suggested Activities and Discussion Topics:

Would you like to see some more classes? Click here