Learning C (Part 2): Types and Expressions

(This post is part of the learning c series.)

In this article, I will note what I feel is the non-obvious information about C types, operators, and expressions.

Data Types

char: a single byte. Holds a single character.
int: an integer
float: single-precision floating point
double: double-precision floating point

Qualifiers

The qualifiers short or long can be applied to an integer.

short and int must be at least 16 bits in size, but int may be larger than short depending upon hardware and implementation.
Also, short <= int.
long is at least 32 bits. Also, int <= long.

Usage can be seen below:

#include <stdio.h>
int main(){
    short int x = 10;
    // int can actually be omitted here
    short y = 10;

    // zu is size_t type specifier
    printf("%zu\n", sizeof x);
    printf("%zu\n", sizeof y);

    long int lx = 10;
    long ly = 10;

    printf("%zu\n", sizeof lx);
    printf("%zu\n", sizeof ly);
    return 0;
}

Note that sizeof reports sizes in bytes. On my machine, sizeof x is 2 (16 bits) and sizeof lx is 8 (64 bits).

The qualifiers signed or unsigned can be used with a char or any integer type.

unsigned numbers are always ≥ 0.
unsigned numbers obey the laws of arithmetic modulo $2^n$, where n is the number of bits in the type.
For example, for an 8-bit unsigned char type, the possible values are 0 to 255.
For an 8-bit signed char type, the possible values are -128 to 127 (on a two's complement machine).
long double specifies extended-precision floating point.
- float <= double <= long double
- These three types may have distinct sizes or share one; the choice is implementation-defined.

Here's an example experimenting with these qualifiers.

#include <stdio.h>
int main(){
    // --------------------------------------------------
    // Messing with size qualifiers
    // --------------------------------------------------
    short sx = 10;
    int ix = 10;
    long lx = 10;

    char *fmt1 = "short: %zu, int: %zu, long: %zu\n";
    printf(fmt1, sizeof sx, sizeof ix, sizeof lx);

    float fy = 10.0;
    double dy = 10.0;
    long double ldy = 10.0;

    char *fmt2 = "float: %zu, double: %zu, long double: %zu\n";
    printf(fmt2, sizeof fy, sizeof dy, sizeof ldy);

    // --------------------------------------------------
    // Messing with signed/unsigned qualifiers
    // --------------------------------------------------
    unsigned char c = -1;
    unsigned long lc = -1;
    printf("c: %u, lc: %lu\n", c, lc);

    return 0;
}

Output:

short: 2, int: 4, long: 8
float: 4, double: 8, long double: 16
c: 255, lc: 18446744073709551615

Constants and literals

int - 1234
long - 123456789L
unsigned - 123U
unsigned long - 123UL

#include <stdio.h>
int main(){
    int theint = 123;
    printf("%d\n", theint);

    long thelong = 123456789L;
    printf("%ld\n", thelong);

    unsigned theuint = 123U;
    printf("%u\n", theuint);

    unsigned long theulong = 123456789UL;
    printf("%lu\n", theulong);

    double thedouble = 123e-2;
    printf("%f\n", thedouble);

    int thehex = 0xA88BCF;
    printf("%x\n", thehex);

    char thechar = 'A'; // single, not double quotes
    printf("c as char: %c\n", thechar);
    printf("c as int: %d\n", thechar);
    return 0;
}

Output:

123
123456789
123
123456789
1.230000
a88bcf
c as char: A
c as int: 65

String literals have a null character ('\0') at the end.

#include <stdio.h>
void print_str(char *s);

int main(){
    char *mystr = "Hello";
    print_str(mystr);
    return 0;
}

void print_str(char *s){
    char c;
    // The extra parentheses mark the assignment as intentional.
    while((c = *s++))
        printf("%c\n", c);

    if(c == '\0')
        // '\0' is not a printable character, so we need to print something
        // ourselves if we want to see it.
        printf("\\0\n");
}

Output:

H
e
l
l
o
\0

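Note: 'x' != "x". The first is a single character constant; the second is a string literal holding 'x' followed by '\0'.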

Enums for symbol comparison

Enums are a great way to use symbols to represent constant, related data without the use of a define. Enums allow for semantic, readable comparisons with symbols when their value doesn't necessarily matter.

These concepts are described in the following example:

#include <stdio.h>
enum category { DOG, HUMAN, CAT };
struct mammal {
    char *name;
    enum category c;
};

void find_the_mammal(struct mammal *mammals, enum category mammal_type, size_t n);
int main(){
    struct mammal mammals[3];
    mammals[0] = (struct mammal){"Joseph", HUMAN};
    mammals[1] = (struct mammal){"Odin", CAT};
    mammals[2] = (struct mammal){"LUCY", DOG};

    find_the_mammal(mammals, HUMAN, 3);
    find_the_mammal(mammals, CAT, 3);
    find_the_mammal(mammals, DOG, 3);
    return 0;
}

void find_the_mammal(struct mammal *mammals, enum category mammal, size_t n){
    size_t i; // same type as n, so the loop comparison is not signed vs. unsigned
    for(i=0; i<n; i++){
        if(mammals[i].c == mammal)
            printf("%s\n", mammals[i].name);
    }
}

Output:

Joseph
Odin
LUCY

Implicit and explicit type conversions

When performing an operation where the items being operated on are of different types, they are converted to a common type before the operation occurs.

For example, take the following simple program:

#include <stdio.h>
int main(){
    int x = 10;
    double y = 20.0;
    printf("%f\n", x+y);
    return 0;
}

Output:

30.000000

The integer value of x was implicitly converted to a double when added to the double y. This conversion was "temporary" in that the variable x was not affected at all. x is still an int after the printf() statement has finished.

In general, implicit conversions such as the one above convert the narrower type to the wider one. Going from an int to a double preserves the value (on typical machines a double can represent every 32-bit int exactly), whereas going from a floating-point type to an int discards the fractional part via truncation!

#include <stdio.h>
#include <math.h>
int main(){
    // You need a grade average of 70 after rounding to graduate
    int graduation_threshold = 70;
    float your_grade = 69.9999;
    float your_grade_rounded = round(your_grade);
    printf("Your rounded grade as a double: %.2f\n", your_grade_rounded);

    int your_grade_as_int = your_grade;
    int your_grade_as_int_rounded = round(your_grade_as_int);

    if(your_grade_as_int_rounded >= graduation_threshold){
        printf("You graduated. Congrats!\n");
    }
    else{
        printf("You failed! Oh no!\n");
    }
    printf("Your final grade: %d\n", your_grade_as_int_rounded);

    return 0;
}

Output:

Your rounded grade as a double: 70.00
You failed! Oh no!
Your final grade: 69

Chars are just numbers!

#include <stdio.h>
int main(){
    char c = 'A';
    c++;
    printf("%c\n", c);
    return 0;
}

On an ASCII system, this prints B.

Implicit conversions

Suppose you have an operator that takes two numeric operands. C specifies the following rules for converting them, applied in order (the wording below follows K&R, second edition, which describes the rules of ANSI C):

1. First, if either operand is long double, the other is converted to long double.
2. Otherwise, if either operand is double, the other is converted to double.
3. Otherwise, if either operand is float, the other is converted to float.
4. Otherwise, the integral promotions are performed on both operands; then, if either operand is unsigned long int, the other is converted to unsigned long int.
5. Otherwise, if one operand is long int and the other is unsigned int, the effect depends on whether a long int can represent all values of an unsigned int; if so, the unsigned int operand is converted to long int; if not, both are converted to unsigned long int.
6. Otherwise, if one operand is long int, the other is converted to long int.
7. Otherwise, if either operand is unsigned int, the other is converted to unsigned int.
8. Otherwise, both operands have type int.

Comparisons between signed and unsigned values are machine-dependent, because the result depends on the sizes of the integer types involved.

Note: "integral promotion" is the process by which a char, short, or enum value is converted to either int or unsigned int when used in an expression. If an int can represent all values of the original type, the value is converted to int; otherwise, it is converted to unsigned int.

Rules 1-3, and 6-8 were demonstrated by example 5 above.

The rules concerning unsigned integers are much more interesting, and they have very important implications! Consider a machine with a 32-bit int and a 64-bit long, along with the following example:

#include <stdio.h>
int main(){
    // On my computer, this prints 32 and 64. It may be different on your
    // computer!
    printf("sizeof unsigned int: %zu\n", sizeof(unsigned int)*8);
    printf("sizeof long: %zu\n", sizeof(long)*8);

    // Note that long/short, when by themselves, act as shorthand for
    // "long int" and "short int" respectively.
    printf("sizeof long int: %zu\n", sizeof(long int)*8);

    if(-1L < 1U){
        printf("-1 < 1. Duh!\n");
    }
    else{
        printf("-1 > 1?! WTF?!\n");
    }

    if(-1L < 1UL){
        printf("-1 < 1. Duh!\n");
    }
    else{
        printf("-1 > 1?! WTF?!\n");
    }

    return 0;
}

Output:

sizeof unsigned int: 32
sizeof long: 64
sizeof long int: 64
-1 < 1. Duh!
-1 > 1?! WTF?!

Let's explain this behavior using the rules listed above.

-1L < 1U

In the first comparison, we are comparing a long with an unsigned int. Rule 5 tells us what we should expect when an unsigned int and a long are operands together:

Otherwise, if one operand is long int and the other is unsigned int, the effect depends on whether a long int can represent all values of an unsigned int; if so, the unsigned int operand is converted to long int; if not, both are converted to unsigned long int.

So the question becomes: Can a 64-bit long represent all the values of a 32-bit unsigned int? A long has to be able to represent both positive and negative numbers, whereas an unsigned int only has to represent non-negative numbers. So if the largest possible long value is greater than or equal to the largest possible unsigned int value, the unsigned int will be converted to long.

Let's ask <limits.h> whether a 64 bit long can represent all the values of a 32 bit unsigned int.

#include <stdio.h>
#include <limits.h>

// This is by definition
#define UINT_MIN 0

int main(){
    printf("UINT_MIN: %u\n", UINT_MIN); 
    printf("UINT_MAX: %u\n", UINT_MAX);
    printf("LONG_MAX: %ld\n", LONG_MAX);
    printf("LONG_MIN: %ld\n", LONG_MIN);

    if(LONG_MIN <= UINT_MIN && LONG_MAX >= UINT_MAX){
        printf("`long` can represent all values of `unsigned int`\n");
    }
    return 0;
}

Output:

UINT_MIN: 0
UINT_MAX: 4294967295
LONG_MAX: 9223372036854775807
LONG_MIN: -9223372036854775808
`long` can represent all values of `unsigned int`

So the answer is yes, a long can represent all values of an unsigned int, and thus the comparison -1L < 1U will cause the unsigned int operand to be converted to a long. Let's see what the result of that conversion would be.

#include <stdio.h>
int main(){
    long l = (long) 1U;
    printf("%ld\n", l);
    return 0;
}

This prints 1: the value is unchanged by the conversion.

This behavior isn't anything unexpected! But now, it's time for the fun part.

-1L > 1UL

Rule 4 in the conversion steps says

Otherwise, the integral promotions are performed on both operands; then, if either operand is unsigned long int, the other is converted to unsigned long int.

So during the evaluation of the expression -1L < 1UL, integral promotion is performed first. Remember, integral promotion just takes the smaller integer types, such as char and short, and converts them to int or unsigned int. Since both operands here are already long types, integral promotion changes nothing in this example.

The second part of Rule 4, though, states that if either operand is an unsigned long, the other operand is converted to an unsigned long. Thus, in the expression -1L < 1UL, the operand -1L will be converted to an unsigned long. Let's see what converting a negative long to an unsigned long looks like.

#include <stdio.h>
#include <limits.h>
int main(){
    unsigned long ul = (unsigned long) -1L;
    printf("%lu\n", ul);

    // Compare
    printf("%lu\n", ULONG_MAX);
    return 0;
}

Output:

18446744073709551615
18446744073709551615

The C standard actually mandates that converting a negative long value -n to unsigned long yields ULONG_MAX - n + 1 on any standards-compliant compiler. The rule itself is not machine-dependent, although the value of ULONG_MAX is. Thus, converting a negative long to an unsigned long yields a VERY big number. The same holds for unsigned int, with UINT_MAX in place of ULONG_MAX.

So, for the comparison -1L < 1UL, Rule 4 states that -1L is converted to an unsigned long, and we have seen that this conversion produces a very large number. So now it makes sense that the program reports -1L > 1UL: the largest unsigned long is certainly larger than the number 1!

Explicit conversions

As you may have noticed in previous examples, we can explicitly convert data via a "cast". You simply prefix an expression with (thetype), and that expression will be converted to that type.

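Consider this example, which produces a compile-time warning due to a format specifier mismatch. Passing an int where %f expects a double is undefined behavior, so the printed value is meaningless: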

#include <stdio.h>
int main(){
    int i = 15;
    printf("The value: %.2f\n", i);
    return 0;
}

Compiler warning and output:

ex13.c: In function ‘main’:
ex13.c:4:5: warning: format ‘%f’ expects argument of type ‘double’, but argument
2 has type ‘int’ [-Wformat]

The value: 0.00

Instead of creating another variable to facilitate an implicit int to float conversion, we can explicitly convert the data ourselves.

#include <stdio.h>
int main(){
    int i = 15;
    printf("The value: %.2f\n", (float)i);
    return 0;
}

Output:

The value: 15.00
