Using Explain function of MySQL to optimize query
When database queries are slow a good approach is to use EXPLAIN to get the execution plan of the queries. But the result is not always obvious. In this post I show you how I use EXPLAIN to solve a real performance issue.
Like so often, the feature was developed with few hundreds lines in the database. After few times in production, the table has 2.5M rows and weigth 3.6Go on the disk. There is minimum 5 queries on the page (could be more because one query is in for loop, depending of the usage) and PHP code timeout because querying database takes too much time.
I didn't try to solve this with the full table, I extract data starting in certain point of time and end up with 1M rows. It's enough to have execution time trouble.
So my starting point is: 1 table, 1M rows (~1Go on disk), 5 queries and it tooks 1950 ms in total.
Here are the queries
SELECT DISTINCT (l0_.id_job_execution) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution > 81168 AND l0_.source_guid = 'SHIFT' ORDER BY l0_.id_job_execution ASC; SELECT COUNT(l0_.id) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution = 81169 AND l0_.source_guid = 'SHIFT' AND (l0_.geo_status = 'NULL' OR l0_.geo_status IS NULL OR l0_.geo_status = 'pending') AND l0_.status <> 'Deleted'; SELECT COUNT(l0_.id) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution IN ('81169') AND l0_.id_job_execution > 81168 AND l0_.source_guid = 'SHIFT' ORDER BY l0_.shift_id ASC, l0_.id_job_execution ASC; SELECT id FROM ( SELECT id, ROW_NUMBER() OVER (partition by pro_guid ORDER BY id_job_execution DESC) AS rang FROM lib_events WHERE id_job_execution IN ('81169') AND id_job_execution > 81168 AND source_guid = 'SHIFT' ) temp WHERE rang = 1; SELECT * FROM lib_events l0_ WHERE l0_.id IN (?) LIMIT 1000;
I simplified last query, * is list of 34 columns and IN clause contains result of forth query so a lot more than 1000 id.
The first query is simple but tooks 1000ms to execute. Running explain give this result
> EXPLAIN SELECT DISTINCT (l0_.id_job_execution) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution > 81168 AND l0_.source_guid = 'SHIFT' ORDER BY l0_.id_job_execution ASC \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: range possible_keys: liberal_event_id_job key: liberal_event_id_job key_len: 5 ref: NULL rows: 182228 Extra: Using index condition; Using where
Second query took 31ms.
> EXPLAIN SELECT COUNT(l0_.id) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution = 81169 AND l0_.source_guid = 'SHIFT' AND (l0_.geo_status = 'NULL' OR l0_.geo_status IS NULL OR l0_.geo_status = 'pending') AND l0_.status <> 'Deleted' \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: range possible_keys: liberal_event_id_job,liberal_event_status,liberal_event_geo_status key: liberal_event_geo_status key_len: 303 ref: NULL rows: 46502 Extra: Using index condition; Using where
Third query took 140ms
> EXPLAIN SELECT COUNT(l0_.id) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution IN ('81169') AND l0_.id_job_execution > 81168 AND l0_.source_guid = 'SHIFT' ORDER BY l0_.shift_id ASC, l0_.id_job_execution ASC \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: ref possible_keys: liberal_event_id_job key: liberal_event_id_job key_len: 5 ref: const rows: 175692 Extra: Using index condition; Using where
Fourth query took 217ms
> EXPLAIN SELECT id FROM ( SELECT id, ROW_NUMBER() OVER (partition by pro_guid ORDER BY id_job_execution DESC) AS rang FROM lib_events WHERE id_job_execution IN ('81169') AND id_job_execution > 81168 AND source_guid = 'SHIFT' ) temp WHERE rang = 1 \G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 175692 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DERIVED table: lib_events type: ref possible_keys: liberal_event_id_job key: liberal_event_id_job key_len: 5 ref: const rows: 175692 Extra: Using index condition; Using where; Using temporary
Last query took 212ms
EXPLAIN SELECT * FROM lib_events l0_ WHERE l0_.id IN (?) LIMIT 1000 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 175692 Extra: Using where
We can see that all queries use an index to run, it's the content of key line (6th) in the explain result. Some queries have multiple index choice as show in possible_index line (5th).
The first query is very simple but it tooks 1000ms, that a lot of time for a simple query. We can't improve ORDER BY clause but we could improve WHERE clause. We could create a new index containing both id_job_execution and source_guid fields, so WHERE clause will be fully covered by one index.
> CREATE INDEX IF NOT EXISTS test_source_job ON `lib_events` (`source_guid`, `id_job_execution`); Query OK, 0 rows affected (2.973 sec) Records: 0 Duplicates: 0 Warnings: 0 > EXPLAIN SELECT DISTINCT (l0_.id_job_execution) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution > 81168 AND l0_.source_guid = 'SHIFT' ORDER BY l0_.id_job_execution ASC \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: range possible_keys: liberal_event_id_job,test_source_job key: test_source_job key_len: 53 ref: NULL rows: 179312 Extra: Using where; Using index
New explain show that the index is used (line key). Rows line (9th) shows MySQL is working with little less rows from 182 228 to 179 312. Extra column line Using index instead of Using index condition, it's mean WHERE clause is fully covered by index, it was our goal. In fact execution time of this query just drop to 32ms. Wow it's a big win.
The 2nd query didn't benefit from this new index, running explain show the new index is in possible_keys list but it's still use the same index. Let's try adding another index that will cover all fields from the WHERE clause like we do in previous query.
> CREATE INDEX IF NOT EXISTS test_source_job_geo_status ON `lib_events` (`source_guid`, `id_job_execution`, `geo_status`, `status`); Query OK, 0 rows affected (3.793 sec) Records: 0 Duplicates: 0 Warnings: 0 > EXPLAIN SELECT COUNT(l0_.id) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution = 81169 AND l0_.source_guid = 'SHIFT' AND (l0_.geo_status = 'NULL' OR l0_.geo_status IS NULL OR l0_.geo_status = 'pending') AND l0_.status <> 'Deleted' \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: range possible_keys: liberal_event_id_job,liberal_event_status,liberal_event_geo_status,test_source_job,test_source_job_geo_status key: test_source_job_geo_status key_len: 659 ref: NULL rows: 6 Extra: Using where; Using index
Explain show the same evolution as first query with use of the new index. Execution time drop from 31 to 4ms.
Third query should benefit from the first index, let's use explain to control that.
> EXPLAIN SELECT COUNT(l0_.id) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution IN ('81169') AND l0_.id_job_execution > 81168 AND l0_.source_guid = 'SHIFT' ORDER BY l0_.shift_id ASC, l0_.id_job_execution ASC \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: index_merge possible_keys: liberal_event_id_job,test_source_job,test_source_job_geo_status key: liberal_event_id_job,test_source_job key_len: 5,53 ref: NULL rows: 87846 Extra: Using intersect(liberal_event_id_job,test_source_job); Using where
We see the new index is used in combination with another index liberal_event_id_job. That's strange because this index has the field id_job_execution that is present in our new index and execution time is the same. So we have no gain. This query should be faster, let's take a look at the query. Oh wait!, is there an ORDER BY clause and only a COUNT() in the SELECT clause? ORDER BY is known to be slow, it adds a lot of work on the server. But in the case of this request, it's completely useless.
Let's try again without ORDER BY clause.
> EXPLAIN SELECT COUNT(l0_.id) AS sclr_0 FROM lib_events l0_ WHERE l0_.id_job_execution IN ('81169') AND l0_.id_job_execution > 81168 AND l0_.source_guid = 'SHIFT' \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: ref possible_keys: liberal_event_id_job,test_source_job,test_source_job_geo_status key: test_source_job key_len: 53 ref: const,const rows: 179312 Extra: Using where; Using index
Ok this time only our new index is use and execution time drop from 140ms to 23ms.
Last 2 queries took 200ms each and they are working together. The fourth query help us find ids that will be used with IN in 5th query. The fifth query is really simple, it should not take that long to execute. Looking at the explain, we show the use of PRIMARY KEY as index and it works on 179 312 rows, nothing mutch to do here. But the query contains a LIMIT to return only 1000 rows. So what happend if we move the LIMIT in the 4th query?
> EXPLAIN SELECT * FROM lib_events l0_ WHERE l0_.id IN (?) \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: l0_ type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 1000 Extra: Using where
As expected EXPLAIN shows the query work with 1000 rows only and execution time drop from 200ms to 10ms. It's a big win, also it's save time in PHP program because 4th query return a smaller dataset.
Adding the limit to the 4th query change nothing, it's not faster nor slower.
I tried many thing to optimize the 4th request like using CTE that is suppose to be more optimize but nothing changed. And finally after a good night, I was wondering if the field in PARTITION BY should not be part of the index. So let's try adding another index covering the WHERE clause but also PARTITION BY.
> CREATE INDEX IF NOT EXISTS test_source_job_guid ON `lib_events` (`source_guid`, `id_job_execution`, `pro_guid`); Query OK, 0 rows affected (4.839 sec) Records: 0 Duplicates: 0 Warnings: 0 > EXPLAIN SELECT id FROM ( SELECT id, ROW_NUMBER() OVER (partition by pro_guid ORDER BY id_job_execution DESC) AS rang FROM lib_events WHERE id_job_execution IN ('81169') AND source_guid = 'SHIFT' ) temp WHERE rang = 1 LIMIT 0, 1000 \G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 199362 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DERIVED table: lib_events type: ref possible_keys: liberal_event_id_job,test_source_job,test_source_job_geo_status,test_source_job_guid key: test_source_job_guid key_len: 53 ref: const,const rows: 199362 Extra: Using where; Using index; Using temporary
Like with other we can see in Extra line that now the query use index and not condition on index. And yes it's faster execution time drop from 200ms to 90ms. PARTITION BY benefits from the index.
Indexes took time to be updated when data are modified in the table and also take space on disk. It's a good idea to not have too many indexes. In our case first index on fields source_guid and id_job_execution is duplicate by the last index on fields source_guid, id_job_execution and pro_guid. MySQL can use an index even if it's contains more columns. We can remove the first index and use EXPLAIN to control that MySQL use the last index instead and perfomance still the same.
I hope this post help you understand use of EXPLAIN function that is a powerfull tool in MySQL but often ignored or misused by people. Sometimes the problem is not with missing index but inside the query like the 3rd query. In that case reading carrefully EXPLAIN result can help us see there is something wrong and make us focus on the query.
Add a comment