(This post is part of the learning-sql series.)
These notes follow along with Sams Teach Yourself SQL in 10 Minutes. If you would like to follow along with the examples, create a database and execute this sql script. Small exercises are scattered throughout this article and should be accomplished before you move on to the next section. Review problems are provided at the bottom of article.
- What are aggregation functions?
- Why do we need aggregation functions?
- Aggregation function examples
- Review problems
What are aggregation functions?
As discussed in the previous notes, calculated fields in SQL are calculations performed on a field (or group of fields) for an individual row. In contrast, an aggregation function is a function performed on multiple rows.
Adding a dollar sign to each price in a Product table is an example of a calculated field. is applied.
SELECT CONCAT('$', prod_price) FROM Products ORDER BY prod_price LIMIT 5;
+-------------------------+ | CONCAT('$', prod_price) | +-------------------------+ | $3.49 | | $3.49 | | $3.49 | | $4.99 | | $5.99 | +-------------------------+
For each row that returns from the query, the value of the field
passed to the
CONCAT() function. The
CONCAT() function takes values as
arguments. A value or multiple values go in, an individual value comes out.
In contrast, aggregation functions do not take individual values as arguments. Aggregation functions take in a set of values as arguments. A set of values go in, an individual value comes out.
Why do we need aggregation functions?
Humans constantly query for aggregate data in their day to day lives. Examples include
- What will the high temperature for today be?
- What's the cheapest toy in the store?
- How many students of the high school are in 10th grade?
What makes these real-life queries aggregate? The distinction between aggregate data and non-aggregate data is whether the individual components of the data are significant.
As a way to demonstrate the distinction between aggregate and non-aggregate data, suppose our boss asks us "How many orders were placed yesterday?"
Option 1 - non aggregate answer
In order to provide a non-aggregate answer to this question, we must list out each individual component of data. In the context of "How many orders were placed yesterday?", this means listing out all the orders that were placed yesterday. So imagine you pull up the order sheet from the day before and reply with
"Company A ordered at 9AM, Company B ordered at 9:12AM, Company C ordered at 10:15AM, Company D ordered at 11:30AM, Company E ordered at 11:33AM, Company F ordered at noon, Company G ordered at 12:18PM, Company H ordered at 2:45PM, Company I ordered at 2:53PM, Company J ordered at 3:02PM, Company K ordered at 3:09PM."
Option 2 - aggregate answer
In order to provide an aggregate answer to this question, we must process the data as a whole and provide an individual result that does not include the individual components of the data. This could look like opening up the order sheet from the day before, internally counting the orders that took place, and replying with
In this case, the aggregate answer best answers the question at hand since the individual orders themselves were not a point of interest.
Exercise 1: Provide 5 real world examples of aggregate data and 5 examples of non-aggregate data.
Aggregation function reference
To view a list of all aggregation functions available in MySQL, visit the Aggregate Function Reference section of the official MySQL documentation.
Aggregation function examples
For example, to calculate the average price of all of the products in the
Product table, we use the
AVG() aggregation function.
SELECT AVG(prod_price) AS avg_price FROM Products;
+-----------+ | avg_price | +-----------+ | 6.823333 | +-----------+ 1 row in set (0.00 sec)
AVG() function takes in the set of
prod_price values from all rows in
the Product table. The function returns the average of this set of values.
Suppose we wanted to figure out the cost of buying all the products below \$5 in the product table.
First let's examine what products meet this criteria.
SELECT prod_name, prod_price FROM Products WHERE prod_price < 5;
+---------------------+------------+ | prod_name | prod_price | +---------------------+------------+ | Fish bean bag toy | 3.49 | | Bird bean bag toy | 3.49 | | Rabbit bean bag toy | 3.49 | | Raggedy Ann | 4.99 | +---------------------+------------+ 4 rows in set (0.01 sec)
If we were to purchase all of these products, the total would be $$3.49 + 3.49 + 3.49 + 4.99 = 15.46$$
Aggregation functions can work with filtering via
WHERE clauses. To directly
query mysql the sum of the product prices under \$5, we perform the following:
SELECT SUM(prod_price) as total_price FROM Products WHERE prod_price < 5;
+-------------+ | total_price | +-------------+ | 15.46 | +-------------+ 1 row in set (0.00 sec)
Since aggregation functions return single values, you can treat the return value of an aggregation function as any other standard value. This includes passing the value to functions.
SELECT CONCAT('$', SUM(prod_price)) as total_price FROM Products WHERE prod_price < 5;
+-------------+ | total_price | +-------------+ | $15.46 | +-------------+
Exercise 2: Rewrite Example 2 to provide the same result without using the
AVG() function. Use the MySQL Aggregation Function Reference.
The following review questions require the database provided at the beginning of the article.
Write a query that answers each of the following questions. Be sure to include the query and the result of the query for each answer.
How many bean bag toys are in the Products table?
How many customers have yet to provide their email?
What is the average price of products that are 12 inches long?
How much would it cost to order 100 units of the most expensive product?
How much would it cost to order 10 quantities of the 3 most expensive products?
How many vendors are incorporated?