(guest@joequery.me)~ $ |

SQL calculated fields (with exercises)

(This post is part of the learning sql series.)

These notes follow along with Sams Teach Yourself SQL in 10 Minutes. If you would like to follow along with the examples, create a database and execute this sql script. Small exercises are scattered throughout this article and should be accomplished before you move on to the next section. Review problems are provided at the bottom of article.

Calculated Fields

Topics covered

  • What are calculated fields?
  • Why do we need calculated fields?
  • The need for aliases
  • Creating field aliases
  • Review problems

What are calculated fields?

Calculated fields are calculations, conversions, or format manipulations on a field value.

Example 1

Observe the following plain query.

SELECT prod_price
FROM Products
ORDER BY prod_price
LIMIT 5;

Results:

+------------+
| prod_price |
+------------+
|       3.49 |
|       3.49 |
|       3.49 |
|       4.99 |
|       5.99 |
+------------+

This query returns the 5 lowest product prices. What if we wanted the product prices to be formatted with a dollar sign in front? We can create a calculated field using the MySQL CONCAT function to accomplish this.

SELECT CONCAT('$', prod_price)
FROM Products
ORDER BY prod_price
LIMIT 5;

Results:

+-------------------------+
| CONCAT('$', prod_price) |
+-------------------------+
| $3.49                   |
| $3.49                   |
| $3.49                   |
| $4.99                   |
| $5.99                   |
+-------------------------+

We have successfully manipulated the data in the query result without altering the data in the database itself, which is exactly the purpose of a calculated field.

Why do we need calculated fields?

The ability to manipulate field values in query results without altering the data in the database itself provides many significant benefits. Some examples include:

  1. Data can be stored with the appropriate data type
  2. Formatting/conversion is generally faster through the RDBMS in comparison to the application layer (i.e. PHP, Python)

Exercise 1: Provide 3 examples of data in every day life that may be stored in one format but is commonly displayed in other formats.

Data can be stored with the appropriate data type

If we want to display a prod_price of 3.49 with a leading dollar sign, the alternative to using the CONCAT function without resorting to the application layer would be to store the string '$3.49'.

If we stored this string in place of the float 3.49 we will lose the ability to perform arithmetic operations on the value. This means no adding (thus no order total functionality) and no sorting of prices.

If we stored this string in its own column, say prod_price_currency.

+---------+---------+---------------------+------------+---------------------+
| prod_id | vend_id | prod_name           | prod_price | prod_price_currency |
+---------+---------+---------------------+------------+---------------------+
| BNBG01  | DLL01   | Fish bean bag toy   |       3.49 |               $3.49 |
+---------+---------+---------------------+------------+---------------------+

While this may seem like a decent compromise, this approach introduces data duplication, which is extremely dangerous. Any adjustments to the prod_price would need to be reflected in the prod_price_currency column or the data will be invalid. Suppose the price of a product goes up but the string version is not updated.

+---------+---------+---------------------+------------+---------------------+
| prod_id | vend_id | prod_name           | prod_price | prod_price_currency |
+---------+---------+---------------------+------------+---------------------+
| BNBG01  | DLL01   | Fish bean bag toy   |       9.49 |               $3.49 |
+---------+---------+---------------------+------------+---------------------+

The '$3.49' price will be displayed to the user, but their actual total when it comes time to pay would be 9.49.

RDBMS formatting faster than application layer formatting

An RDBMs such as MySQL is specifically designed for applying calculation or formatting functions to large amounts of data. For example, it's faster to query MySQL for the average of a certain column as opposed to retrieving the column for each row and using a foreach loop in PHP to calculate the average.

List of MySQL Functions and Operators

The complete list of functions and operators is documented in the Function and Operator Reference section of the MySQL documentation.

Note that not all of the functions provided are used for calculated fields. Some functions, such as AVG(), are aggregation functions which we will cover soon.

Example function - ROUND()

Visit the Function and Operator Reference and visit the reference page for the ROUND() function. The function header according to the documentation is:

ROUND(X), ROUND(X,D)

Rounds the argument X to D decimal places. The rounding algorithm depends on the data type of X. D defaults to 0 if not specified. D can be negative to cause D digits left of the decimal point of the value X to become zero.

Remember that we can use SELECT statements that do not interact with the database to test out various functions and operators. Let's test out the ROUND() function.

Example 2

SELECT ROUND(123.4567, 2);

Result:

+---------------------+
| ROUND(123.4567, 2)  |
+---------------------+
| 123.46              |
+---------------------+
1 row in set (0.00 sec)

Example 3

SELECT ROUND(123.4544, 2);

Result:

+---------------------+
| ROUND(123.4544, 2)  |
+---------------------+
| 123.45              |
+---------------------+
1 row in set (0.00 sec)

Exercise 2: Use the function documentation quoted above to explain why ROUND(123.4567, 2) and ROUND(123.4544, 2) return different values.

Exercise 3: Use the function documentation for the FORMAT() function to write a db-less SELECT query that formats the number $123123456$ into the nicely formatted number $123,123,456$.

Exercise 4: Write a db-less SELECT query in the mysql shell that concatenates the strings 'Joe', ' is', and ' stinky'.

Example 4

Suppose we want to see what the product prices would look like if they were discounted at 20% off. To calculate the new total of a product discounted at 20% off, we multiply the initial price by .8. This is because $$100\% - 20\% = 80\% = .8$$

A query that accomplishes this is

SELECT prod_name, prod_price*.8
FROM Products;

Result:

+---------------------+---------------+
| prod_name           | prod_price*.8 |
+---------------------+---------------+
| Fish bean bag toy   |         2.792 |
| Bird bean bag toy   |         2.792 |
| Rabbit bean bag toy |         2.792 |
| 8 inch teddy bear   |         4.792 |
| 12 inch teddy bear  |         7.192 |
| 18 inch teddy bear  |         9.592 |
| Raggedy Ann         |         3.992 |
| King doll           |         7.592 |
| Queen doll          |         7.592 |
+---------------------+---------------+

Exercise 5: Modify the query above to round the discounted price to 2 decimal places.

Exercise 6: Modify the query you just created in the previous exercise to prepend a dollar sign before the product price.

The need for aliases

Let's examine a simple SELECT query that retrieves some fields from a table.

Example 5

SELECT vend_id, vend_name, vend_state
FROM Vendors;

Result:

+---------+-----------------+------------+
| vend_id | vend_name       | vend_state |
+---------+-----------------+------------+
| BRE02   | Bear Emporium   | OH         |
| BRS01   | Bears R Us      | MI         |
| DLL01   | Doll House Inc. | CA         |
| FNG01   | Fun and Games   | NULL       |
| FRB01   | Furball Inc.    | NY         |
| JTS01   | Jouets et ours  | NULL       |
+---------+-----------------+------------+
6 rows in set (0.00 sec)

Let's take a closer look at the header that comes along with the results, which is something we have not paid much attention to so far in this series.

+---------+-----------------+------------+
| vend_id | vend_name       | vend_state |
+---------+-----------------+------------+

These header values correspond to the field names we requested in the SELECT statement. This may seem blatantly obvious, but it's worth pointing out. Why? Because application layers rely on these field names.

Example 6

For example, consider a simple PHP script that will print out the vendor id, vendor name, and vendor state for each vendor in the Vendors table.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
<?php

$db = mysqli_connect("localhost","db_user","db_password","sams_sql");
$sql = "SELECT vend_id, vend_name, vend_state FROM Vendors";
$result = mysqli_query($db, $sql);

?>

<h1>Vendor information</h1>

<table>
    <tr>
        <th>Vendor ID</th>
        <th>Vendor Name</th>
        <th>Vendor State</th>
    </tr>

    <?php while($vendor_info = mysqli_fetch_assoc($result)): ?>
    <tr>
        <td><?= $vendor_info['vend_id'];?></td>
        <td><?= $vendor_info['vend_name'];?></td>
        <td><?= $vendor_info['vend_state'];?></td>
    </tr>
    <?php endwhile; ?>
</table>

raw source

Result:

Observe how the PHP source code used the field names from the SELECT query on Line 4 as keys of the $vendor_info associative array on lines 18-24. This is by design.

Example 7

Now let's revisit a previous example where we queried the Product table to examine the prices if they were discounted 20%. As a refresher,

SELECT prod_name, prod_price*.8
FROM Products;

Result:

+---------------------+---------------+
| prod_name           | prod_price*.8 |
+---------------------+---------------+
| Fish bean bag toy   |         2.792 |
| Bird bean bag toy   |         2.792 |
| Rabbit bean bag toy |         2.792 |
| 8 inch teddy bear   |         4.792 |
| 12 inch teddy bear  |         7.192 |
| 18 inch teddy bear  |         9.592 |
| Raggedy Ann         |         3.992 |
| King doll           |         7.592 |
| Queen doll          |         7.592 |
+---------------------+---------------+

Let's alter the vendors PHP file above to list out the id, name, and discounted price for each product.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
<?php

$db = mysqli_connect("localhost","db_user","db_password","sams_sql");
$sql = "SELECT prod_id, prod_name, prod_price*.8 FROM Products";
$result = mysqli_query($db, $sql);

?>

<h1>Vendor information</h1>

<table>
    <tr>
        <th>Product ID</th>
        <th>Product Name</th>
        <th>Product Price</th>
    </tr>

    <?php while($product_info = mysqli_fetch_assoc($result)): ?>
    <tr>
        <td><?= $product_info['prod_id'];?></td>
        <td><?= $product_info['prod_name'];?></td>
        <td><?= $product_info['prod_price'];?></td>
    </tr>
    <?php endwhile; ?>
</table>

raw source

Result:

The product prices are missing! Why? Let's take a closer look at the header that came back from the query we performed earlier:

+---------------------+---------------+
| prod_name           | prod_price*.8 |
+---------------------+---------------+

The field name for the discounted product price is not prod_price. It's prod_price*.8. To verify this claim, observe this modified php file.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
<?php

$db = mysqli_connect("localhost","db_user","db_password","sams_sql");
$sql = "SELECT prod_id, prod_name, prod_price*.8 FROM Products";
$result = mysqli_query($db, $sql);

?>

<h1>Vendor information</h1>

<table>
    <tr>
        <th>Product ID</th>
        <th>Product Name</th>
        <th>Product Price</th>
    </tr>

    <?php while($product_info = mysqli_fetch_assoc($result)): ?>
    <tr>
        <td><?= $product_info['prod_id'];?></td>
        <td><?= $product_info['prod_name'];?></td>
        <td><?= $product_info['prod_price*.8'];?></td>
    </tr>
    <?php endwhile; ?>
</table>

raw source

Result:

Now the correct values are being output, but we had to use 'prod_price.*8' as the key in the $product_info associative array. This is problematic for three reasons:

  1. It violates the DRY (don't repeat yourself) principle.
  2. It provides no context
  3. It's disgusting.

Let's elaborate on these three points.

#1: if the discount changed to 75%, we would have to update all occurrences of prod_price*.8 to prod_price*.75. We essentially have the logic of the calculated field as associative array keys themselves. Ideally we should only have to change the calculated field in the query.

#2: The only reason why you as the reader know that prod_price.*8 is a discounted product price is because I provided the context to you in this article. To other developers looking at the source code for the first time, prod_price.*8 on its own provides no meaning or context. (And remember, you become an other developer to your own source code after a few weeks of not looking at it!!!). Currently you cannot directly discern why the query multiplies the product price by .8. It just does.

#3: I believe this is pretty clear. My fingers cry having to type 'prod_price*.8' as an array key.

Exercise 7: Concoct a scenario similar to the prod_price*.8 example above that demonstrates why having the logic of calculated fields as array keys is annoying and problematic.

Creating aliases

If using calculated fields as associative array keys is such a poor practice, what can we do instead? SQL provides column aliases to alleviate this problem.

Example 8

We'll take our good friend the product discount example and finally create the query the correct way!

SELECT prod_id, prod_name, prod_price*.8 AS discount_price
FROM Products;

Result:

+---------+---------------------+----------------+
| prod_id | prod_name           | discount_price |
+---------+---------------------+----------------+
| BNBG01  | Fish bean bag toy   |          2.792 |
| BNBG02  | Bird bean bag toy   |          2.792 |
| BNBG03  | Rabbit bean bag toy |          2.792 |
| BR01    | 8 inch teddy bear   |          4.792 |
| BR02    | 12 inch teddy bear  |          7.192 |
| BR03    | 18 inch teddy bear  |          9.592 |
| RGAN01  | Raggedy Ann         |          3.992 |
| RYL01   | King doll           |          7.592 |
| RYL02   | Queen doll          |          7.592 |
+---------+---------------------+----------------+

Much better!

The syntax for creating a field aliases is

SELECT col1 AS alias1, col2 AS alias2
FROM table;

While field aliases are extremely useful for calculated fields, they can be used for normal field names as well.

Example 9

SELECT
    prod_id AS id,
    prod_name,
    prod_price*.8 AS discount_price
FROM Products;

Results:

+--------+---------------------+----------------+
| id     | prod_name           | discount_price |
+--------+---------------------+----------------+
| BNBG01 | Fish bean bag toy   |          2.792 |
| BNBG02 | Bird bean bag toy   |          2.792 |
| BNBG03 | Rabbit bean bag toy |          2.792 |
| BR01   | 8 inch teddy bear   |          4.792 |
| BR02   | 12 inch teddy bear  |          7.192 |
| BR03   | 18 inch teddy bear  |          9.592 |
| RGAN01 | Raggedy Ann         |          3.992 |
| RYL01  | King doll           |          7.592 |
| RYL02  | Queen doll          |          7.592 |
+--------+---------------------+----------------+

The query created two field aliases. The prod_id field is aliased as id and the calculated field prod_price*.8 is aliased as discount_price.

Exercise 8: Write a db-less SELECT query in the mysql shell that uses a calculated field returning the greatest of three numbers using the GREATEST() function. Ensure the calculated field has an appropriately named alias.

Review problems

The following review questions require the database provided at the beginning of the article.

Problem 1

Retrieve the product names and product prices for products less than 6 dollars. Return the product name and price in a single, appropriately named field.

Problem 2

Write a db-less SELECT query in the mysql shell that demonstrates each of the following MySQL Functions. For each function, provide a scenario where the function might be useful.

Problem 3

Write a query that retrieves the id, name, and price of products whose name contains the string 'toy' or the sting 'bear'. The results should be sorted by product price with the highest price first. Ensure the price is formatted to include the dollar sign and displays two decimal points. Format the product name to be all lower cased.

Problem 4

An online toy store is selling the toys in the Products table at a 20% off discounted price. In order to provide the best deals for the customers, the store will take \$1 off the price instead of 20% if the \$1 off saves the customer more than the 20% off.

For example, if a product is priced at \$6.50, 20% off brings the price down to \$5.20, a savings of \$1.30, so the 20% discount should be what's used. However, if a product is priced at \$3.50, 20% off brings the price down to \$2.80, a savings of only \$0.70, so the \$1 off discount should be used instead.

Write a query that retrieves the id, name, and the properly formatted discounted price according to the discount rules above.

Tagged as databases

(This post is part of the learning sql series.)

Date published - January 11, 2016