(guest@joequery.me)~ $ |

SQL wildcard filtering (with exercises)

(This post is part of the learning sql series.)

These notes follow along with Sams Teach Yourself SQL in 10 Minutes. If you would like to follow along with the examples, create a database and execute this sql script. Small exercises are scattered throughout this article and should be accomplished before you move on to the next section. Review problems are provided at the bottom of article.

Using wildcard filtering

Topics covered

  • The LIKE operator
  • The % wildcard
  • The _ wildcard
  • Testing LIKE with the MySQL Shell
  • Escape characters
  • Review questions

What is wildcard filtering?

Wildcard filtering is a mechanism for comparing text in a column against a search pattern.

Example

Observe the following query:

SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE '%bean bag%';

This query will retrieve the prod_id and prod_name fields from a table named Products if the prod_name contains the string 'bean bag'. How this works will be explained in the article.

Why do we need wildcard filtering?

Without wildcard filtering, we would only be able to filter data against precisely known values.

Examples of not precisely known values

  • You don't know how to spell someone's last name, but you know what letters it starts with
  • You don't know the name of a song but you know the lyrics of the chorus
  • You want to search the descriptions of a product catalog in case you don't know the exact product name

The LIKE operator

The LIKE operator allows wildcards to be used in search clauses. It is used in place of other operators like =, !=, etc.

Example 1

SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE 'Fish%';

The % wildcard

Within a search string (which is the search pattern provided to the LIKE operator), the % wildcard means "Match 0 or more occurrences of any character".

Example 2

Let's take a look at all the product names available in the Products table

SELECT prod_name
FROM Products;

Result:

+---------------------+
| prod_name           |
+---------------------+
| Fish bean bag toy   |
| Bird bean bag toy   |
| Rabbit bean bag toy |
| 8 inch teddy bear   |
| 12 inch teddy bear  |
| 18 inch teddy bear  |
| Raggedy Ann         |
| King doll           |
| Queen doll          |
+---------------------+
9 rows in set (0.00 sec)

With this result in mind, can you guess what the result of the following query would be?

SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE 'Fish%';

The query translates to:

Select the columns prod_id and prod_name from the Products table. Return the rows whose prod_name begins with the string 'Fish' and is followed by 0 or more occurrences of any character.

The result of the query:

+---------+-------------------+
| prod_id | prod_name         |
+---------+-------------------+
| BNBG01  | Fish bean bag toy |
+---------+-------------------+

Example 3

What would happen if our search string does not contain any wildcards? In other words, what would be the result of the following query?

SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE 'toy';

From the listing of product names, we see that the string 'toy' is contained in the following prod_names:

  • 'Fish bean bag toy'
  • 'Bird bean bag toy'
  • 'Rabbit bean bag toy'

What is the result of the query?

Empty set (0.00 sec)

Exercise 1: Can you explain this behavior?

Exercise 2: Can you fix the query so that the correct results are returned?

Example 4

Observe the following query and its result.

SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE '%Fish%';

Result:

+---------+-------------------+
| prod_id | prod_name         |
+---------+-------------------+
| BNBG01  | Fish bean bag toy |
+---------+-------------------+
1 row in set (0.00 sec)

We see that the result of the query is equal to the result of the query when the search string was 'Fish%' as opposed to '%Fish%'.

Exercise 3: Can we claim that '%my search string%' will always be identical to 'my search string%'? Why or why not?

Exercise 4: Is LIKE case sensitive for the RDBMS you're using? Create a query that verifies this.

The _ wildcard

The underscore wildcard _ matches a single character in a search string.

Example 5

SELECT prod_name
FROM Products
WHERE prod_name LIKE '%bean b_g%';

Results:

+---------------------+
| prod_name           |
+---------------------+
| Fish bean bag toy   |
| Bird bean bag toy   |
| Rabbit bean bag toy |
+---------------------+
3 rows in set (0.00 sec)

The query translates to:

Select the column prod_name from the Products table. Return the rows whose prod_name begins with any number of occurences of any character, followed by 'bean b', followed by a single instance of any character, followed by 'g', followed by any number of occurrences of any character.

Exercise 5: Does the _ wildcard match 0 characters? Create a query that verifies this.

Exercise 6: Create a query that returns the prod_id and prod_name for teddy bear products less than 10 inches.

Testing LIKE with the MySQL Shell

Recall that we can perform computations in MySQL without a table for the sake of testing.

Example 6

SELECT 5 > 2;

Result:

+-------+
| 5 > 2 |
+-------+
|     1 |
+-------+
1 row in set (0.00 sec)

When performing computations that involve a yes/no answer, such as "is 5 greater than 2?", a $1$ as the result means "yes", whereas a $0$ means "no".

We can use this as a way to experiment with the LIKE operator.

Example 7

SELECT 'happy' LIKE 'h_ppy';

Result:

+----------------------+
| 'happy' LIKE 'h_ppy' |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)

Example 8

SELECT 'hoppy' LIKE 'h_ppy';

Result:

+----------------------+
| 'hoppy' LIKE 'h_ppy' |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)

Example 9

SELECT 'hppy' LIKE 'h_ppy';

Result:

+---------------------+
| 'hppy' LIKE 'h_ppy' |
+---------------------+
|                   0 |
+---------------------+
1 row in set (0.00 sec)

Exercise 7: Create SELECT statements that test the following strings against the search string 'j%_uery':

  • 'jquery'
  • ' jquery' (note the leading space)
  • 'jquery ' (note the trailing space)
  • 'jQuery'
  • 'JoeQuery'
  • 'joeQuery'

Exercise 8: Is the search string 'j%_uery' equivalent to the search string 'j%uery'? Why or why not?

Escape characters

What if we want to create a search pattern that involves searching for an actual underscore? Or an actual percent sign? Observe the following query.

Example 10

Suppose we want to create a search string that will match strings that begin with my_.

SELECT 'my_variable_name' LIKE 'my_%';

Result:

+--------------------------------+
| 'my_variable_name' LIKE 'my_%' |
+--------------------------------+
|                              1 |
+--------------------------------+

This correctly matches. However, an important concept when working with queries, especially string based search queries, is the idea of a false positive.

False positives

If a string we intend to match the search pattern does in fact match, that's a great start. However, search patterns have two jobs:

  1. Accept strings intended to match the pattern
  2. Reject strings that do not match the pattern

So even though the search pattern 'my_%' does match strings that begin with my_, it actually matches more than it should. Based upon our intentions, we would expect the following queries to return 0, but they actually return 1.

Example 11

SELECT 'my-variable_name' LIKE 'my_%';

Result:

+--------------------------------+
| 'my-variable_name' LIKE 'my_%' |
+--------------------------------+
|                              1 |
+--------------------------------+
1 row in set (0.00 sec)

Example 12

SELECT 'myvariable_name' LIKE 'my_%';

Result:

+-------------------------------+
| 'myvariable_name' LIKE 'my_%' |
+-------------------------------+
|                             1 |
+-------------------------------+
1 row in set (0.00 sec)

This behavior is caused by the underscore character being interpreted as the underscore wildcard. The underscore wildcard will match any single character. If we want a search pattern to include a search for an actual underscore character, we must prepend the escape character, a backslash (\).

Example 13

SELECT 'my_variable_name' LIKE 'my\_%';

Result:

+---------------------------------+
| 'my_variable_name' LIKE 'my\_%' |
+---------------------------------+
|                               1 |
+---------------------------------+
1 row in set (0.00 sec)

Example 14

SELECT 'my-variable_name' LIKE 'my\_%';

Result:

+---------------------------------+
| 'my-variable_name' LIKE 'my\_%' |
+---------------------------------+
|                               0 |
+---------------------------------+
1 row in set (0.00 sec)

Example 15

SELECT 'myvariable_name' LIKE 'my\_%';

Result:

+--------------------------------+
| 'myvariable_name' LIKE 'my\_%' |
+--------------------------------+
|                              0 |
+--------------------------------+

The string 'my_variable_name' still matches as intended, but now the strings my-variable_name and myvariable_name do not match, which is the desired behavior.

Exercise 9: Explain why false positives are harmful. Provide an example scenario where search result returning a false positive could be catastrophic.

Exercise 10: Create a search pattern that matches the following strings:

  • '45.23%'
  • '9.45%'
  • '15.99%'

Create queries that verify the search does, in fact, match.

Exercise 11: What are some limitations of the search pattern created above? What are example false positives the search pattern would match?

Review problems

The following review questions require the database provided at the beginning of the article.

Problem 1

Translate to English what each of the following queries do:

Query 1)

SELECT x,y
FROM t
WHERE x LIKE 'COSC';

Query 2)

SELECT x
FROM t
WHERE z LIKE 'Hello%';

Query 3)

SELECT x
FROM t
WHERE y LIKE '%deals';

Query 4)

SELECT x
FROM t
WHERE y LIKE '%php%';

Query 5)

SELECT y
FROM t
WHERE x LIKE 'h_ppy';

Query 6)

SELECT y
FROM t
WHERE x LIKE '%h_ppy%';

Query 7)

SELECT x,y
FROM t
WHERE x LIKE 'wikipedia.%';

Query 8)

SELECT x,y
FROM t
WHERE z LIKE '_._\% APR';

Query 9)

SELECT z
FROM t
WHERE z LIKE '\_\_%';

Query 10)

SELECT x,y
FROM t
WHERE x LIKE 'I % SQL_';

Problem 2

Write a query that returns the contact and email addresses of the customers whose zip codes begin with the number 4. Only include contacts who have provided emails.

Problem 3

Write a query that returns the name, city, and state of the vendors that exclusively sell bears.

Problem 4

Write a query that returns the name and address of customers whose address begins with exactly 3 numbers.

Problem 5

Write a query that returns the id, name, and address of Vendors whose address ends in "Street"

Problem 6

Write a query that returns the id and product name of products that are dolls according to their description.

Tagged as databases

(This post is part of the learning sql series.)

Date published - December 28, 2015