Confusion Regarding Network Byte Order

As I understand it, network byte order is big-endian, while most consumer PCs are little-endian, so a conversion must take place. I'm confused as to how this conversion happens. Consider:

char msg1[2] = {0xBE, 0xEF};
write(server_fd, msg1, sizeof(msg1));
// -----------------------------------
char recv_buf[2] = {0};
read(client_fd, recv_buf, sizeof(recv_buf));
// recv_buf = { 0xBE, 0xEF };

u16 msg2 = 0xBEEF;
write(server_fd, &msg2, sizeof(msg2));
// -----------------------------------
char recv_buf[2] = {0};
read(client_fd, recv_buf, sizeof(recv_buf));
// recv_buf = { 0xEF, 0xBE };

For msg1 no conversion is done; however, for msg2 it is. If write() sends things out as a series of bytes, i.e. it does not know that the pointer points to a multi-byte value, why the difference? I'm very confused.

If you memcpy &msg2 into a 2-char array, you will see {0xEF, 0xBE}. That's how little endian works.

The least significant byte, 0xEF, is stored first, and the next most significant byte, 0xBE, is stored after it.
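
If you want to see this concretely, here is a minimal standalone sketch (using stdint.h types instead of the thread's u16/u8 aliases) that prints the in-memory byte order of 0xBEEF:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint16_t msg2 = 0xBEEF;
    uint8_t bytes[2];
    memcpy(bytes, &msg2, sizeof(msg2)); // copy the raw object representation

    // On a little-endian machine this prints "EF BE":
    // the least significant byte sits at the lowest address.
    printf("%02X %02X\n", bytes[0], bytes[1]);
    return 0;
}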

That data must be sent in "network byte order" (which is big-endian) and that a conversion "must take place" is the biggest lie network tutorials are selling you on the internet.

Data sent over the network DOES NOT CARE what endianness it is in. It is just bytes. Whatever you send, you will receive in the same order.

The only place where endianness matters is in IP packet headers, so if you're writing network drivers or special firewall rules, then yes, you sometimes care about endianness. But for regular API usage you DO NOT CARE which endianness you use, so simply use whatever is simpler for you.

If you know that your code will always run on little-endian machines (which is probably close to 100% of your actual use cases), then you simply read and write it the same way and that's it. It'll work.

u16 send_msg = 0xBEEF;
write(server_fd, &send_msg, sizeof(send_msg));

...

u16 recv_msg;
read(client_fd, &recv_msg, sizeof(recv_msg));

and recv_msg will be 0xBEEF. Simple as that.

Alternatively, you can serialize variables to bytes in an endian-independent way, so they will be read and written the same regardless of whether the sender is big-endian and the receiver little-endian, or the other way around. This code will work the same in all situations:

u16 send_msg = 0xBEEF;
u8 bytes[2];
bytes[0] = (u8)(send_msg >> 0);
bytes[1] = (u8)(send_msg >> 8);
write(server_fd, bytes, sizeof(bytes));

...

u8 bytes[2];
read(client_fd, bytes, sizeof(bytes));
u16 recv_msg = (u16)((bytes[0] << 0) | (bytes[1] << 8));

Another way to think about all of this is to pretend that write/read is just memcpy; then exactly the same logic applies. With your example:

u16 msg2 = 0xBEEF;
char recv_buf[2];
memcpy(recv_buf, &msg2, sizeof(msg2)); // replace write/read with memcpy
// now recv_buf = { 0xEF, 0xBE };

No networking involved; exactly the same bytes are copied around.


Thank you Mārtiņš. Things make much more sense to me now. To be pedantic, you said IP headers are where endianness matters. However, I think TCP/UDP headers too, e.g. the port number (in reality, any layer from Transport downwards).


Replying to mmozeiko (#26888)

Yes, that's the same thing. By IP packet I mean all kinds of IP protocols, which includes TCP packets, UDP packets, and more.
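
For a concrete example of that: the port number you pass through the regular sockets API ends up in the TCP/UDP header, which is why it goes through htons(). A minimal sketch (make_addr is just an illustrative helper name):

#include <string.h>
#include <stdint.h>
#include <sys/socket.h>  // AF_INET
#include <netinet/in.h>  // sockaddr_in, htons, htonl, INADDR_ANY

struct sockaddr_in make_addr(uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(port);        // host -> network (big-endian)
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // same conversion for addresses
    return addr;
}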


Replying to scott-mccloud (#26890)

I've run into the same thought ("what did my professor mean when he said network order is big-endian?") when writing a websocket server.

If you control both the client and the server, then there's no reason to do a double conversion.

I do this, for example:

server:

int main()
{
    if (!little_endian()) {
        log_error("This server should only be run on little endian systems!\n");
        return(1);
    }
    
    ...

    write(socket, &msg_bytes, msg_size);
}
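
little_endian() isn't a standard function; here is one way such a check might be implemented, assuming it simply reports the CPU's byte order:

#include <stdint.h>

// Possible implementation of the little_endian() check used above:
// on a little-endian CPU the first byte in memory of 0x0001 is 0x01.
static int little_endian(void)
{
    uint16_t probe = 0x0001;
    return *(uint8_t *)&probe == 0x01;
}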

client (web browser):

function deserializer_u32(d) {
    // true means little endian
    const value = d.view.getUint32(d.offset, true);
    d.offset += 4;
    return value;
}
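
The matching server-side write isn't shown; one way it could look, with serialize_u32 as a hypothetical helper that emits each value explicitly as little-endian bytes so it lines up with the client's getUint32(d.offset, true):

#include <stdint.h>

// Hypothetical helper: write a u32 as little-endian bytes regardless
// of the server's CPU, matching the client's getUint32(offset, true).
static void serialize_u32(uint8_t *out, uint32_t value)
{
    out[0] = (uint8_t)(value >>  0);
    out[1] = (uint8_t)(value >>  8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}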