Query optimization is the process of making a query run more efficiently.
It is mostly useful for complex queries and large sets of data.
For simple queries there isn't much to be done about performance; they are what they are.
You can change a query slightly to improve its time, but database tuning alone might not be enough.
Some databases come with built-in optimizers. However, do not trust them blindly! Make sure the optimizer actually works and does what it says on the tin.
Why we need optimization
In the real world we have very large sets of data and very complex queries.
These datasets are not 10,000 records; they can be millions of records, and at that scale every second, or even every tenth of a second, counts if we have to go through each record.
Very complex queries have a lot of moving parts and can be put together in many different ways to solve the same problem.
An index is a copy of selected columns of a table, organized so that we can find information quickly instead of having to sort through the whole table.
You can also think of it as a map of the table, or a shortcut to the data, like a cheat code for a hidden level in Mario that takes you straight to the last world.
Indexes usually have a way to refer back to the original data, using a key or a direct link.
Think of a library's card catalog.
Index basics
How can we tell when an index might be useful?
If your query plan shows a scan that goes over each record in sequence, that is a good candidate for an index.
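As a sketch of what this looks like in practice (assuming SQLite and a hypothetical orders table), you can compare the plan for the same query before and after adding an index:

```python
import sqlite3

# Hypothetical orders table; plan output format is SQLite's EXPLAIN QUERY PLAN.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row describes the step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT total FROM orders WHERE customer = 'alice'"
before = plan(query)  # a sequential SCAN over every record

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(query)   # a SEARCH using the new index
```

The exact wording of the plan varies by database and version, but the shift from a scan of every row to an indexed search is the signal to look for.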
In most databases, indexes can be created on almost any column of a table.
You can also create an index on multiple columns at once (a composite index).
There is also the concept of a partial index (or filtered index), which includes only a subset of the rows.
For example, a partial index could be used when you need to track shipments that are in the warehouse, but not those already ordered, shipped, or delivered, so your queries can be more focused.
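A sketch of that shipments example, assuming SQLite's partial-index syntax and a hypothetical shipments table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (id INTEGER PRIMARY KEY, status TEXT, warehouse TEXT)")

# Only rows still in the warehouse get indexed, so the index stays small
# even when most shipments have long since been delivered.
conn.execute("""
    CREATE INDEX idx_in_warehouse ON shipments (warehouse)
    WHERE status = 'in_warehouse'
""")

# A query whose WHERE clause matches the index's condition can use it.
steps = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM shipments"
    " WHERE status = 'in_warehouse' AND warehouse = 'east'")]
```

PostgreSQL supports the same idea under the name partial index, and SQL Server under filtered index; the syntax differs slightly between them.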
What kind of changes and improvements can we expect?
Index creation time depends on the size of the table; it could take a few seconds, or considerably longer on a very large table.
If your query actually uses the index, that one-time creation cost is traded against the time you save on every query that uses the index afterward.
Indexes are built on data structures, and different databases will likely use different names for the same structure.
Some popular ways to implement indexes are hash tables and trees (self-balancing binary search trees and B+ trees specifically).
Most databases will automatically create an index on the primary key.
Some databases might also create indexes for foreign keys.
There is also the concept of a covering index, where everything your query needs is already in the index, so you don't need to go find the record (the library card has the info; you don't need the book).
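A sketch of the library-card idea, assuming SQLite, where the plan explicitly reports when an index covers the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, author TEXT, title TEXT, body TEXT)")

# The index holds both the column we search on (author) and the column we
# select (title), so the query never has to touch the books table itself.
conn.execute("CREATE INDEX idx_author_title ON books (author, title)")

steps = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM books WHERE author = 'Tolkien'")]
# SQLite reports this step as a search using a COVERING INDEX.
```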
There is no standard for how indexes are created; syntax and behavior vary from database to database.
How to tell if it is an improvement
Estimating the cost of your query, and of the alternative plans for it, is a hard problem.
At this point in time hardware isn't usually the issue; disk space is cheap, and adding more servers isn't always the answer.
You can't bake a single loaf of bread faster by adding more ovens.
Some metrics are clear, such as "users complained about load times less this month"; others are fuzzier, like "users complained A LOT about how slow their shopping was."
How long the query takes to run can be variable enough that it may not be the best indicator of improvement.
Consider other metrics as well, such as CPU use or I/O.
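One way to tame that runtime variability is to time the query several times and compare medians rather than single runs; a minimal sketch in Python with SQLite:

```python
import sqlite3
import statistics
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(100_000)))

def median_runtime(sql, runs=9):
    # Time several executions; the median is less noisy than any single run,
    # which can be skewed by caches or other load on the machine.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

median_s = median_runtime("SELECT COUNT(*) FROM t WHERE n % 7 = 0")
```

Many databases also expose their own counters, such as pages read or CPU time consumed, which can be steadier signals than wall-clock time.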
Another metric you could consider is how understandable your query and its plan will be to the people who come after you.
Other considerations
One thing to consider is the amount of time saved by what you're doing: is this a one-time cost that saves a lot of time later, or a large cost up front for something you don't do very often?
Another thing to think about is what is reasonable: is this query actually possible to run in the requested time, or is the request outside the bounds of reality?
Is the slowness of the query something outside of your control? Is there another option, such as running reports off-hours, or creating fewer reports because they are being sent to the cabinet of no return anyway?