A while ago I watched a really interesting talk about the history of character encodings (https://www.youtube.com/watch?v=gd5uJ7Nlvvo), and I was intrigued by the fact that ASCII characters aren't positioned randomly in the table: they're placed so that their bit patterns are convenient for the kinds of operations you'd typically do on them (go to 15:00 in the talk for the details).
Today I tried to implement some of the functions in ctype.h to see if they would be faster than the ones I see most often. So for example, instead of writing this:
int to_digit(char c) { return (c - '0'); }
I wrote this:
int to_digit(char c) { return (c & 0x0F); }
And instead of this:
bool is_digit(char c) { return (c >= '0') && (c <= '9'); }
I wrote this:
bool is_digit(char c) { return (c ^ 0x30) < 0xA; }
I benchmarked these and found out that most are actually faster, while some of them are slower. This is the complete code for the functions:
/* Most obvious implementation */
char to_upper_v1(char c) { return (c - 32); }
char to_lower_v1(char c) { return (c + 32); }
int  to_digit_v1(char c) { return (c - '0'); }
char to_char_v1(int i)   { return (char) (i + '0'); }
bool is_digit_v1(char c) { return (c >= '0') && (c <= '9'); }
bool is_alpha_v1(char c) { return ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')); }
bool is_lower_v1(char c) { return (c >= 'a' && c <= 'z'); }
bool is_upper_v1(char c) { return (c >= 'A' && c <= 'Z'); }

/* Less obvious implementation */
char to_upper_v2(char c) { return (c & 0xDF); }
char to_lower_v2(char c) { return (c | 0x20); }
int  to_digit_v2(char c) { return (c & 0x0F); }
char to_char_v2(int i)   { return (char) (i | 0x30); }
bool is_digit_v2(char c) { return (c ^ 0x30) < 0xA; }
bool is_alpha_v2(char c) { return ((c ^ 0x40) - 1) < 0x5B; }
bool is_lower_v2(char c) { return ((c ^ 0x60) - 1U) < 0x1A; }
bool is_upper_v2(char c) { return ((c ^ 0x40) - 1U) < 0x1A; }
These are the results (a billion calls, compiled with MSVC on -O2):
to_upper_v1 =  59.19 sec    to_upper_v2 = 56.17 sec  (+ 5.08 %)
is_digit_v1 =  71.47 sec    is_digit_v2 = 56.91 sec  (+20.37 %)
is_alpha_v1 = 122.46 sec    is_alpha_v2 = 83.24 sec  (+32.02 %)
is_lower_v1 =  65.07 sec    is_lower_v2 = 83.24 sec  (-27.9 %)
is_upper_v1 =  65.88 sec    is_upper_v2 = 82.04 sec  (-27.9 %)
(Note I didn't include the results for to_lower, to_digit and to_char because they're identical to the ones for to_upper.)
I'm still kind of a noob programmer and this is my first time benchmarking code, so I might have made a mistake somewhere, but looking at the instruction counts these results seem to make sense: the faster functions are the ones that compile to the fewest instructions.
I know this isn't anything groundbreaking, but I thought it could be interesting, and it's something that isn't looked at very often.
EDIT: I realized as soon as I turned off my PC that I had only checked that these work on printable ASCII characters, so maybe they're not as useful as I thought. But hey, that doesn't mean there can't be a better way to write those functions that works on all characters; and if there can't be, well, it's been a good exercise.