I think you are misunderstanding what "nonblocking" means for a socket. It is a flag that tells the OS whether the recv function should wait for data or return immediately. It is not about the server informing the client whether it should block or not. Each side needs to set its own socket to nonblocking mode.
In basic form there are three ways to deal with sockets in an asynchronous (non-blocking) way:
1) polling mechanism - you set sockets to nonblocking mode and then simply call recv in a loop. As you can imagine, this wastes CPU time if there is nothing else to do in the loop.
2) using select - do not use nonblocking mode, but check which sockets are ready to be read/written (with timeout 0) with the "select" call, and if there are any, you can safely read/write to them and the recv/send calls won't block.
3) multithreading - do not use nonblocking mode, but just call recv from a separate thread. Then you only need to deal with correctly passing the received data back to your main thread or other worker threads.
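The three approaches above can be sketched roughly like this. This is only a minimal illustration in Python, whose socket module mirrors the BSD/Winsock calls (setblocking, select, recv); the function names, the retry counts and the use of socket.socketpair() as a stand-in for a real client/server connection are all my assumptions for the demo.

```python
import queue
import select
import socket
import threading
import time


def recv_by_polling(sock, attempts=1000, pause=0.001):
    """1) Nonblocking socket: call recv in a loop until data shows up."""
    sock.setblocking(False)
    for _ in range(attempts):
        try:
            return sock.recv(4096)
        except BlockingIOError:
            # Nothing to read yet; a real loop would do other work here.
            time.sleep(pause)
    return None


def recv_with_select(sock, attempts=1000, pause=0.001):
    """2) Blocking socket: select with timeout 0 says when recv is safe."""
    for _ in range(attempts):
        readable, _, _ = select.select([sock], [], [], 0)
        if readable:
            return sock.recv(4096)
        time.sleep(pause)
    return None


def recv_in_thread(sock, out_queue):
    """3) Blocking recv on its own thread; data goes back through a queue."""
    def worker():
        out_queue.put(sock.recv(4096))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def demo():
    """Run each approach once over a local socket pair."""
    results = []
    for approach in ("poll", "select", "thread"):
        sender, receiver = socket.socketpair()
        try:
            if approach == "thread":
                inbox = queue.Queue()
                recv_in_thread(receiver, inbox)
                sender.sendall(b"hello")
                results.append(inbox.get(timeout=5))
            else:
                sender.sendall(b"hello")
                fn = recv_by_polling if approach == "poll" else recv_with_select
                results.append(fn(receiver))
        finally:
            sender.close()
            receiver.close()
    return results
```

Note that in the threaded version the queue is what takes care of handing the data to another thread safely.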
You can see basic forms of these approaches in this article (includes example code):
https://unixism.net/2019/04/linux...cations-performance-introduction/ It also shows the POSIX-specific approach using fork (not relevant to Windows).
Once you want to do something more advanced, none of the above approaches will do, for performance reasons. They won't scale to a large number of clients or a large amount of data. Then you need to use
IOCP (I/O Completion Ports) on Windows, or
epoll on Linux, or
kqueue on BSD. These are all mechanisms for efficiently receiving data from a large number of sockets and dispatching it to threads for processing. On newer Linux kernels there is now an alternative called
io_uring. Here is a basic
tutorial on it.
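To give a feeling for the shape of such an event loop, here is a small sketch using Python's selectors module. selectors.DefaultSelector picks epoll on Linux and kqueue on BSD/macOS, but on Windows it falls back to plain select, not IOCP, so treat this only as an illustration of the readiness-based style (IOCP and io_uring are completion-based and look different). The function name and the socket pairs standing in for real clients are assumptions for the demo.

```python
import selectors
import socket


def collect_with_selector(n_clients=3):
    """Register many sockets once, then let the OS report which are readable."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_clients)]
    try:
        for i, (client, server_side) in enumerate(pairs):
            server_side.setblocking(False)
            # 'data' is free-form; here it just remembers the client index.
            sel.register(server_side, selectors.EVENT_READ, data=i)
            client.sendall(f"msg {i}".encode())

        received = {}
        while len(received) < n_clients:
            events = sel.select(timeout=5)
            if not events:
                break  # nothing arrived in time; give up instead of spinning
            for key, _mask in events:
                received[key.data] = key.fileobj.recv(4096)
        return received
    finally:
        sel.close()
        for client, server_side in pairs:
            client.close()
            server_side.close()
```

The point is that one call waits on all sockets at once and returns only the ones that have something to read, instead of you asking each socket in turn.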
Sending structures over a socket is a completely OK way to do it. I have worked on multiple successful commercial products that do this. If you are worried about big-endian vs little-endian, you should ask yourself - do you really want to support a big-endian platform? Which platform is that? There is not much live hardware out there that is big-endian. But yeah, if you need to handle it, you'll need to take extra care. Still, memcpy'ing structures will work; all you need to do is reverse the bytes of each integer field on the receiving side (or the sending side - up to you).
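Here is a small sketch of what that byte reversal means in practice, using Python's struct module to imitate the raw bytes of a struct. The struct layout (a 32-bit id followed by a 16-bit port) and the function names are made up for illustration. The thing to notice is that the swap is done per field, not over the whole buffer.

```python
import struct

# Hypothetical wire struct: { uint32_t id; uint16_t port; } with no padding.
FIELDS = "IH"


def send_as_little_endian(ident, port):
    """Bytes as a little-endian sender would memcpy them onto the wire."""
    return struct.pack("<" + FIELDS, ident, port)


def naive_big_endian_read(raw):
    """What a big-endian receiver sees if it memcpy's without swapping."""
    return struct.unpack(">" + FIELDS, raw)


def fixed_big_endian_read(raw):
    """Reverse each integer field separately, then read as big-endian."""
    swapped = raw[0:4][::-1] + raw[4:6][::-1]
    return struct.unpack(">" + FIELDS, swapped)
```

For example, sending id 0x11223344 and port 0x5566 gives the naive reader 0x44332211 and 0x6655, while the per-field swap recovers the original values.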
As for padding/packing of structs - again, the question is whether you need to support something that behaves so differently. You just need to know what the compiler does on the platforms you compile for, and how to make the compiler do what you want. It's all doable.
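To see how much padding can matter, here is a tiny sketch that compares the size of the same two fields with and without alignment, again using Python's struct module as a stand-in for a C struct. The "@" prefix follows the native C compiler's alignment rules, while "=" uses no padding at all (similar in effect to asking a C compiler to pack the struct, for example with #pragma pack(1)). The field choice and function name are assumptions for illustration.

```python
import struct


def layout_sizes():
    """Sizes of { uint8_t kind; uint32_t value; } with and without padding."""
    native = struct.calcsize("@BI")  # native alignment, compiler-dependent
    packed = struct.calcsize("=BI")  # no padding: always 1 + 4 bytes
    return native, packed
```

On common platforms the native size comes out larger than the packed one because the compiler inserts padding before the 32-bit field, which is exactly why both sides must agree on the layout.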
The most common alternative is to do custom serialization directly on bytes. Don't send numbers as text strings. For example, if you have two integers to send, you always write 4 and 4 bytes as little-endian (if you care about endianness). Then the receiving side will always read 4 and 4 bytes to get the message. This is basically part of defining the protocol for how the two parties will talk to each other. It is a similar concept to how file formats indicate what they contain - for example, how a bmp or png file tells you its width and height. It has a header that describes the data coming after the header.
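A minimal sketch of such a protocol could look like this. The header layout (1-byte message type plus 4-byte payload length), the message type value and all function names are invented for the example; socket.socketpair() stands in for a real connection.

```python
import socket
import struct

# Hypothetical protocol: every message starts with this little-endian header.
HEADER = struct.Struct("<BI")    # message type (1 byte), payload length (4 bytes)
TWO_INTS = struct.Struct("<ii")  # payload: two signed 32-bit integers
MSG_TWO_INTS = 1


def recv_exact(sock, n):
    """recv may return fewer bytes than asked for, so loop until we have n."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def send_two_ints(sock, a, b):
    payload = TWO_INTS.pack(a, b)
    sock.sendall(HEADER.pack(MSG_TWO_INTS, len(payload)) + payload)


def recv_message(sock):
    """Read the fixed-size header first, then exactly the payload it announces."""
    msg_type, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    payload = recv_exact(sock, length)
    if msg_type == MSG_TWO_INTS:
        return TWO_INTS.unpack(payload)
    raise ValueError(f"unknown message type {msg_type}")


def roundtrip(a, b):
    left, right = socket.socketpair()
    try:
        send_two_ints(left, a, b)
        return recv_message(right)
    finally:
        left.close()
        right.close()
```

Because the byte order and sizes are fixed by the format strings, the same bytes mean the same thing regardless of the CPU or compiler on either side.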