Find the sample data here:
https://gist.github.com/reinink/30210fd48ef0435f475ed9d13270b09e
select | |
first_name, | |
last_name | |
from | |
users | |
left join | |
companies on companies.id = users.company_id | |
where ( | |
companies.name like 'TERM%' or | |
first_name like 'TERM%' or | |
last_name like 'TERM%' | |
) |
Find the sample data here:
https://gist.github.com/reinink/30210fd48ef0435f475ed9d13270b09e
Companies | |
+------------+-----------------+------+-----+---------+----------------+ | |
| Field | Type | Null | Key | Default | Extra | | |
+------------+-----------------+------+-----+---------+----------------+ | |
| id | bigint unsigned | NO | PRI | NULL | auto_increment | | |
| name | varchar(255) | NO | MUL | NULL | | | |
+------------+-----------------+------+-----+---------+----------------+ | |
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | |
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression | | |
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | |
| companies | 0 | PRIMARY | 1 | id | A | 10106 | NULL | NULL | | BTREE | | | YES | NULL | | |
| companies | 1 | companies_name_index | 1 | name | A | 8624 | NULL | NULL | | BTREE | | | YES | NULL | | |
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | |
Users | |
+------------+-----------------+------+-----+---------+----------------+ | |
| Field | Type | Null | Key | Default | Extra | | |
+------------+-----------------+------+-----+---------+----------------+ | |
| id | bigint unsigned | NO | PRI | NULL | auto_increment | | |
| company_id | bigint unsigned | NO | MUL | NULL | | | |
| first_name | varchar(255) | NO | MUL | NULL | | | |
| last_name | varchar(255) | NO | MUL | NULL | | | |
+------------+-----------------+------+-----+---------+----------------+ | |
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | |
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression | | |
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | |
| users | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | | YES | NULL | | |
| users | 1 | users_company_id_foreign | 1 | company_id | A | 0 | NULL | NULL | | BTREE | | | YES | NULL | | |
| users | 1 | users_first_name_index | 1 | first_name | A | 0 | NULL | NULL | | BTREE | | | YES | NULL | | |
| users | 1 | users_last_name_index | 1 | last_name | A | 0 | NULL | NULL | | BTREE | | | YES | NULL | | |
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ |
So, just an update here, for anyone who is interested. 😂
I managed to get this query working in Laravel, as a SINGLE query, using the union
approach, where it supports searching multiple queries. 🎊
The following is my final User
model scope.
Brace yourselves. 😬
public function scopeSearch($query, string $term = null)
{
collect(str_getcsv($term, ' ', '"'))->filter()->each(function ($term) use ($query) {
$term = $term.'%';
$query->whereIn('id', function ($query) use ($term) {
$query->select('id')
->from(function ($query) use ($term) {
$query->select('users.id')
->from('users')
->where('users.first_name', 'like', $term)
->orWhere('users.last_name', 'like', $term)
->union(
$query->newQuery()
->select('users.id')
->from('users')
->join('companies', 'users.company_id', '=', 'companies.id')
->where('companies.name', 'like', $term)
);
}, 'matches');
});
});
}
And the generated query (for the search "bill gates microsoft"):
SELECT
*
FROM
`users`
WHERE
`id` in(
SELECT
`id` FROM ((
SELECT
`users`.`id` FROM `users`
WHERE
`users`.`first_name` LIKE 'bill%'
OR `users`.`last_name` LIKE 'bill%')
UNION (
SELECT
`users`.`id` FROM `users`
INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
WHERE
`companies`.`name` LIKE 'bill%')) AS `matches`)
AND `id` in(
SELECT
`id` FROM ((
SELECT
`users`.`id` FROM `users`
WHERE
`users`.`first_name` LIKE 'gates%'
OR `users`.`last_name` LIKE 'gates%')
UNION (
SELECT
`users`.`id` FROM `users`
INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
WHERE
`companies`.`name` LIKE 'gates%')) AS `matches`)
AND `id` in(
SELECT
`id` FROM ((
SELECT
`users`.`id` FROM `users`
WHERE
`users`.`first_name` LIKE 'microsoft%'
OR `users`.`last_name` LIKE 'microsoft%')
UNION (
SELECT
`users`.`id` FROM `users`
INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
WHERE
`companies`.`name` LIKE 'microsoft%')) AS `matches`)
ORDER BY
`last_name` ASC, `first_name` ASC
LIMIT 15 OFFSET 0
Overall, the final results are amazing.
Running it for three terms (ie. "bill gates microsoft"), against 1 million users, and 100,000 companies results in query times between 3-6ms.
The trick? Err..tricks?
First, using a union (as folks have suggested) allows the query builder to run each query (the users match and companies match) independently of each other, and therefore use all the available indexes. This final query now successfully uses the users_first_name_index
, users_last_name_index
and companies_name_index
.
Second, and this was the piece I was missing before, you must run this union as a derived table, not a normal (correlated/dependent) subquery.
Without the derived table, all of subqueries are "dependent":
Adding the derived table makes all the problems go away:
I would have thought that the sub queries would only be dependent if you actually had some type of dependency between the two. For example, inner.id = outer.id
, or something like that. But I guess the MySQL query planner doesn't look at the conditions to see if these two queries are actually dependent.
The only outstanding issue I have still is running a 1-2 character search. For example, searching for "b", or even "bi" is rather slow, taking upwards of 800ms on a million rows. The previous approaches that don't use the indexes are faster in those cases. I suspect that this is because the derived tables end up becoming massive, and that just takes computation time. One simple solution is to just not perform the search until you have at least 3 characters, which seems to be the sweet spot.
And, now for a quick sales pitch. 😂
I plan to cover all this in detail in my upcoming Eloquent Performance Patterns course. If that sounds interesting to you, be sure to join my mailing list on that website. 🤟
@reinink - interested in the course - the above is very close to an issue I want to solve - so wondering if the course will cover it - querying unrelated data at the same time and returning not just the ID
of the model, but the table reference it is derived from, in the return?
The purpose is to allow users to just have one search box to find any one model in the search. So say you have 4 tables - COMPANY, SERVICE, PRODUCT, and DEVICE. I want to design a query that queries for a typeahead "meta" search of the description or name (or some other column) in each table, and returns the ID
and the model type
(i.e. table reference) so that if someone selected that, it would take them to that particular record. Right now these are 4 different searches on my application - in other words, the person needs to know they are looking for a DEVICE, and then search the DEVICE table.
I think this can be done with Unions - just including an AS
column (from an SO post):
(SELECT content, title, 'msg' as type FROM messages WHERE content LIKE '%" .
$keyword . "%' OR title LIKE '%" . $keyword ."%')
UNION
(SELECT content, title, 'topic' as type FROM topics WHERE content LIKE '%" .
$keyword . "%' OR title LIKE '%" . $keyword ."%')
UNION
(SELECT content, title, 'comment' as type FROM comments WHERE content LIKE '%" .
$keyword . "%' OR title LIKE '%" . $keyword ."%')";
In my case the tables have different columns in each table, so instead of say content, title
above, say the table has description, name
in one of the tables. As per your post above, however, I suspect this is not performant at all . . . and thus have not implemented it.
In any event, I am on your email list and very interested in this course and really appreciate the above gist and all the comments on it.
@mikeblas Thanks for sharing your thoughts. You'll notice that I did try this query, but it's still giving me 10ms+ (or worse) performance, where running two queries got me <1ms per query.