(This post is part of the learning sql series.)
SQL wildcard filtering (with exercises)
These notes follow along with Sams Teach Yourself SQL in 10 Minutes. If you would like to follow along with the examples, create a database and execute this sql script. Small exercises are scattered throughout this article and should be accomplished before you move on to the next section. Review problems are provided at the bottom of article.
Using wildcard filtering
Topics covered
- The
LIKE
operator - The
%
wildcard - The
_
wildcard - Testing LIKE with the MySQL Shell
- Escape characters
- Review questions
What is wildcard filtering?
Wildcard filtering is a mechanism for comparing text in a column against a search pattern.
Example
Observe the following query:
SELECT prod_id, prod_name FROM Products WHERE prod_name LIKE '%bean bag%';
This query will retrieve the prod_id
and prod_name
fields from a table named
Products
if the prod_name
contains the string 'bean bag'
. How this works
will be explained in the article.
Why do we need wildcard filtering?
Without wildcard filtering, we would only be able to filter data against precisely known values.
Examples of not precisely known values
- You don't know how to spell someone's last name, but you know what letters it starts with
- You don't know the name of a song but you know the lyrics of the chorus
- You want to search the descriptions of a product catalog in case you don't know the exact product name
The LIKE
operator
The LIKE
operator allows wildcards to be used in search clauses. It is used in
place of other operators like =
, !=
, etc.
Example 1
SELECT prod_id, prod_name FROM Products WHERE prod_name LIKE 'Fish%';
The %
wildcard
Within a search string (which is the search pattern provided to the LIKE
operator), the %
wildcard means "Match 0 or more occurrences of any
character".
Example 2
Let's take a look at all the product names available in the Products table
SELECT prod_name FROM Products;
Result:
+---------------------+ | prod_name | +---------------------+ | Fish bean bag toy | | Bird bean bag toy | | Rabbit bean bag toy | | 8 inch teddy bear | | 12 inch teddy bear | | 18 inch teddy bear | | Raggedy Ann | | King doll | | Queen doll | +---------------------+ 9 rows in set (0.00 sec)
With this result in mind, can you guess what the result of the following query would be?
SELECT prod_id, prod_name FROM Products WHERE prod_name LIKE 'Fish%';
The query translates to:
Select the columns prod_id and prod_name from the Products table. Return the rows whose prod_name begins with the string 'Fish' and is followed by 0 or more occurrences of any character.
The result of the query:
+---------+-------------------+ | prod_id | prod_name | +---------+-------------------+ | BNBG01 | Fish bean bag toy | +---------+-------------------+
Example 3
What would happen if our search string does not contain any wildcards? In other words, what would be the result of the following query?
SELECT prod_id, prod_name FROM Products WHERE prod_name LIKE 'toy';
From the listing of product names, we see that the string 'toy'
is contained
in the following prod_names:
'Fish bean bag toy'
'Bird bean bag toy'
'Rabbit bean bag toy'
What is the result of the query?
Empty set (0.00 sec)
Exercise 1: Can you explain this behavior?
Exercise 2: Can you fix the query so that the correct results are returned?
Example 4
Observe the following query and its result.
SELECT prod_id, prod_name FROM Products WHERE prod_name LIKE '%Fish%';
Result:
+---------+-------------------+ | prod_id | prod_name | +---------+-------------------+ | BNBG01 | Fish bean bag toy | +---------+-------------------+ 1 row in set (0.00 sec)
We see that the result of the query is equal to the result of the query when the
search string was 'Fish%'
as opposed to '%Fish%'
.
Exercise 3: Can we claim that '%my search string%'
will always be identical
to 'my search string%'
? Why or why not?
Exercise 4: Is LIKE case sensitive for the RDBMS you're using? Create a query that verifies this.
The _
wildcard
The underscore wildcard _
matches a single character in a search string.
Example 5
SELECT prod_name FROM Products WHERE prod_name LIKE '%bean b_g%';
Results:
+---------------------+ | prod_name | +---------------------+ | Fish bean bag toy | | Bird bean bag toy | | Rabbit bean bag toy | +---------------------+ 3 rows in set (0.00 sec)
The query translates to:
Select the column prod_name from the Products table. Return the rows whose prod_name begins with any number of occurences of any character, followed by 'bean b', followed by a single instance of any character, followed by 'g', followed by any number of occurrences of any character.
Exercise 5: Does the _
wildcard match 0 characters? Create a query
that verifies this.
Exercise 6: Create a query that returns the prod_id and prod_name for teddy bear products less than 10 inches.
Testing LIKE with the MySQL Shell
Recall that we can perform computations in MySQL without a table for the sake of testing.
Example 6
SELECT 5 > 2;
Result:
+-------+ | 5 > 2 | +-------+ | 1 | +-------+ 1 row in set (0.00 sec)
When performing computations that involve a yes/no answer, such as "is 5 greater than 2?", a $1$ as the result means "yes", whereas a $0$ means "no".
We can use this as a way to experiment with the LIKE
operator.
Example 7
SELECT 'happy' LIKE 'h_ppy';
Result:
+----------------------+ | 'happy' LIKE 'h_ppy' | +----------------------+ | 1 | +----------------------+ 1 row in set (0.00 sec)
Example 8
SELECT 'hoppy' LIKE 'h_ppy';
Result:
+----------------------+ | 'hoppy' LIKE 'h_ppy' | +----------------------+ | 1 | +----------------------+ 1 row in set (0.00 sec)
Example 9
SELECT 'hppy' LIKE 'h_ppy';
Result:
+---------------------+ | 'hppy' LIKE 'h_ppy' | +---------------------+ | 0 | +---------------------+ 1 row in set (0.00 sec)
Exercise 7: Create SELECT
statements that test the following strings against
the search string 'j%_uery'
:
'jquery'
' jquery'
(note the leading space)'jquery '
(note the trailing space)'jQuery'
'JoeQuery'
'joeQuery'
Exercise 8: Is the search string 'j%_uery'
equivalent to the search
string 'j%uery'
? Why or why not?
Escape characters
What if we want to create a search pattern that involves searching for an actual underscore? Or an actual percent sign? Observe the following query.
Example 10
Suppose we want to create a search string that will match strings that begin
with my_
.
SELECT 'my_variable_name' LIKE 'my_%';
Result:
+--------------------------------+ | 'my_variable_name' LIKE 'my_%' | +--------------------------------+ | 1 | +--------------------------------+
This correctly matches. However, an important concept when working with queries, especially string based search queries, is the idea of a false positive.
False positives
If a string we intend to match the search pattern does in fact match, that's a great start. However, search patterns have two jobs:
- Accept strings intended to match the pattern
- Reject strings that do not match the pattern
So even though the search pattern 'my_%'
does match strings that begin with
my_
, it actually matches more than it should. Based upon our intentions, we
would expect the following queries to return 0, but they actually return 1.
Example 11
SELECT 'my-variable_name' LIKE 'my_%';
Result:
+--------------------------------+ | 'my-variable_name' LIKE 'my_%' | +--------------------------------+ | 1 | +--------------------------------+ 1 row in set (0.00 sec)
Example 12
SELECT 'myvariable_name' LIKE 'my_%';
Result:
+-------------------------------+ | 'myvariable_name' LIKE 'my_%' | +-------------------------------+ | 1 | +-------------------------------+ 1 row in set (0.00 sec)
This behavior is caused by the underscore character being interpreted as the
underscore wildcard. The underscore wildcard will match any single character. If
we want a search pattern to include a search for an actual underscore character,
we must prepend the escape character, a backslash (\
).
Example 13
SELECT 'my_variable_name' LIKE 'my\_%';
Result:
+---------------------------------+ | 'my_variable_name' LIKE 'my\_%' | +---------------------------------+ | 1 | +---------------------------------+ 1 row in set (0.00 sec)
Example 14
SELECT 'my-variable_name' LIKE 'my\_%';
Result:
+---------------------------------+ | 'my-variable_name' LIKE 'my\_%' | +---------------------------------+ | 0 | +---------------------------------+ 1 row in set (0.00 sec)
Example 15
SELECT 'myvariable_name' LIKE 'my\_%';
Result:
+--------------------------------+ | 'myvariable_name' LIKE 'my\_%' | +--------------------------------+ | 0 | +--------------------------------+
The string 'my_variable_name'
still matches as intended, but now the strings
my-variable_name
and myvariable_name
do not match, which is the desired
behavior.
Exercise 9: Explain why false positives are harmful. Provide an example scenario where search result returning a false positive could be catastrophic.
Exercise 10: Create a search pattern that matches the following strings:
'45.23%'
'9.45%'
'15.99%'
Create queries that verify the search does, in fact, match.
Exercise 11: What are some limitations of the search pattern created above? What are example false positives the search pattern would match?
Review problems
The following review questions require the database provided at the beginning of the article.
Problem 1
Translate to English what each of the following queries do:
Query 1)
SELECT x,y FROM t WHERE x LIKE 'COSC';
Query 2)
SELECT x FROM t WHERE z LIKE 'Hello%';
Query 3)
SELECT x FROM t WHERE y LIKE '%deals';
Query 4)
SELECT x FROM t WHERE y LIKE '%php%';
Query 5)
SELECT y FROM t WHERE x LIKE 'h_ppy';
Query 6)
SELECT y FROM t WHERE x LIKE '%h_ppy%';
Query 7)
SELECT x,y FROM t WHERE x LIKE 'wikipedia.%';
Query 8)
SELECT x,y FROM t WHERE z LIKE '_._\% APR';
Query 9)
SELECT z FROM t WHERE z LIKE '\_\_%';
Query 10)
SELECT x,y FROM t WHERE x LIKE 'I % SQL_';
Problem 2
Write a query that returns the contact and email addresses of the customers whose zip codes begin with the number 4. Only include contacts who have provided emails.
Problem 3
Write a query that returns the name, city, and state of the vendors that exclusively sell bears.
Problem 4
Write a query that returns the name and address of customers whose address begins with exactly 3 numbers.
Problem 5
Write a query that returns the id, name, and address of Vendors whose address ends in "Street"
Problem 6
Write a query that returns the id and product name of products that are dolls according to their description.