Mārtiņš Možeiko

Recent Activity

&wcap update with many cool new features & improvements - https://github.com/mmozeiko/wcap

  • Record application-local audio for window capture (Windows 10 "20H1" and up)
    Captures audio output from a single process only; no other system or process audio is included.
  • Option to disable yellow capture border (Windows 11 only)
  • Option to disable rounded window corners for window capture (Windows 11 only)
  • Resize performance is significantly improved when limiting max width/height
    For example, resizing 3840x2160 to 1280x720 on an RTX 2080 improved from ~0.6 to 0.2 msec per frame. On Intel UHD 630 it improved from ~30 to 8 msec. The larger the resize ratio, the larger the performance improvement.
  • Option to do gamma-correct resize
    Resize can now happen in linear space, which better preserves brightness. This mostly helps with bright colors in front of a dark background. For 3D game/movie/photo capture it should look better, but it will look worse when capturing screen content with dark text on a light background, so it is not always an improvement. See the attached image for a comparison: notice how much dimmer the smaller stars are in the old version, while the gamma-correct resize preserves their brightness. View the image at 100% scale; otherwise Discord or the browser may show wrong colors in the resize comparison.
  • Option to improve color conversion
    Adjusts the RGB to YUV conversion so that the brightness of the output better matches the original input color. This can be a huge improvement for images with high color contrast, especially max-red and max-blue colors. See the attached image, where the new conversion shows significantly fewer dark areas around the edges of colored text. Also notice how much closer in brightness the regions with alternating red & blue lines are. For content with smoother color changes, like 3D games/movies/photos, enabling this option makes almost no visible difference. The improvement is not completely free: for a 3840x2160 frame on an RTX 2080 it costs ~0.19 msec vs 0.08 msec for the regular conversion. On Intel UHD 630 it is ~9 vs 6 msec.
  • Fixed a bad color shift (0.5px horizontally) in RGB to YUV color conversion
    The conversion is now more accurate, for both the improved and the default YUV conversion.
  • Executable size increased by 6.5 KB, from 57856 to 64512 bytes; only 1 KB left until 64 KB 😦

Another &derpnet example: tunneling TCP traffic over a DERP relay:

  • similar to SSH tunneling, but using just the DerpNet API
  • server connects to a TCP port on localhost and forwards all traffic from it to the remote derpnet_proxy
  • client listens on a local TCP port and forwards all traffic to the remote derpnet_proxy
  • the example is limited to a single connection to the server, but the same idea can be extended to support multiple connections
    https://github.com/mmozeiko/derpnet#derpnet_proxy

&derpnet ported to JavaScript - https://github.com/mmozeiko/derpnet/blob/main/web

  • implements exactly the same protocol as the C code, with the same encryption guarantees
  • now you can communicate between browser and native code "without running" any extra servers!
  • JS uses a WebSocket to talk to the Tailscale DERP server, which relays messages between your peers
  • comes with two examples: one showing basic usage to send & receive messages, and one for exchanging files in both directions (both upload to and download from the browser are supported)

&derpnet - simple end-to-end encrypted network library in C for Windows.

  • single header library to provide encrypted communication between peers
  • uses Tailscale DERP relays for low-bandwidth communication, even when both peers are behind NAT
  • comes with 3 examples showing simple message exchange, file sharing, or chat between users
    https://github.com/mmozeiko/derpnet

stb_image_resize2.h: https://github.com/nothings/stb/blob/master/stb_image_resize2.h
Much faster and better than stb_image_resize.h.
Major performance increases (can be >1000x faster in degenerate cases). Now includes SIMD optimizations for SSE2, AVX, NEON and WASM.
Bit-identical deterministic results between scalar and SIMD code, across architectures (x86, arm, wasm) and across all compilers (msvc, clang, gcc).

WASAPI wrapper example that offers DirectSound-like lock/unlock buffer functionality (but simpler): https://gist.github.com/mmozeiko/5a5b168e61aff4c1eaec0381da62808f#file-win32_wasapi-h

The header has just 4 functions: start/stop and lock/unlock. Every frame you fill in the buffer between the lock and unlock calls. No need to worry about callbacks or multithreading in the audio code that mixes your samples. The buffer to fill is a "magic ringbuffer" allocated with a virtual memory mapping trick: it gives you one contiguous array to fill, so there is no need to handle a split in the middle when wrapping back to the start, as dsound requires. It wraps around automatically. Internally the wrapper creates a large ringbuffer that you can fill as much as you want every frame. This means audio does not drop out or glitch when the frame time stutters and a frame takes longer than expected. This works by running a background thread that submits samples from the ringbuffer to WASAPI using the smallest buffer size. And lock/unlock lets you overwrite the portion of the ringbuffer that has not been submitted yet, so you get lower latency even if you have prepared more samples than necessary.

See the example.c file at the top of the gist for example code.

Toy project to mount tags from a local git repository as a "virtual" folder using Windows Projected File System: https://github.com/mmozeiko/gitprj

This lets you browse the tree of every commit referenced by a tag without running git commands to check out between commits.

"hello triangle" using WebGPU on Windows in C: https://gist.github.com/mmozeiko/4c68b91faff8b7026e8c5e44ff810b62
Draws the same triangle as my other d3d11/opengl gists. Just a single file to compile, no other dependencies (other than the WebGPU implementation itself), simple code without abstractions.

upng.h - uncompressed png writer & reader, with optimizations for x64 and arm64: https://gist.github.com/mmozeiko/e66f6d23e101b1b9c37cb3d9d10727f5?ts=4
Standalone header file with two functions to use, no memory allocations, no runtime dependencies.
In case you need to create valid png files really, really fast. Supports all 8-bit and 16-bit png pixel formats & any image size (32-bit width/height), as long as it fits in memory.
Can create an 8k BGRA8 png file (256MB) in 23 msec on a Ryzen 5950X, which means it's running at 11 GB/s.
Compare with libpng: 170 msec (uncompressed png, 1.5 GB/s) or 2150 msec (compressed png, 120 MB/s).

Minor TwitchNotify improvement: the code now refreshes user stream status when the websocket reconnects. This happens, for example, after the computer resumes from sleep/standby. Previously the UI did not reflect changes in user stream status during that time, so after resuming the PC you saw stale info, potentially showing users' streams as live when they were not, or the opposite.
Also removed the nested user context menu; it's simpler to have everything in one list without sub-menus. &twitchnotify
https://github.com/mmozeiko/TwitchNotify

Alternatively, use fwidth for automatic "zoom" calculations; no need to manually scale anything, just pass correct vertex/texcoord coordinates: https://www.shadertoy.com/view/csX3RH

&wcap update: it can now encode video to 10-bit HEVC (main10 profile). In theory this increases image quality, as there is more resolution for color values in the conversion to YUV. But it depends on how the GPU implements the encoder, and in practice it will be very hard to see any difference. From my non-scientific tests with Nvidia, it seems to encode better: on some content it produces ~10% smaller video files than 8-bit HEVC.
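
For a rough sense of the extra resolution, assume the common limited ("TV") range of BT.709 (an assumption; the post does not say which range wcap uses). The $n$-bit luma code for a normalized value $E'_Y \in [0, 1]$ is

$$
D_Y = \mathrm{round}\!\left( (219\,E'_Y + 16) \cdot 2^{\,n-8} \right)
$$

so 8-bit video uses codes 16 to 235 (219 steps over the whole range) while 10-bit uses 64 to 940 (876 steps), i.e. quantization four times finer. Chroma gets the same factor (224 vs 896 steps). Whether that is visible in the final video still depends on the encoder.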

Not all GPUs support encoding to 10-bit HEVC. Check your GPU support here:
Nvidia: https://en.wikipedia.org/wiki/Nvidia_NVENC#Versions or https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new
Intel: https://en.wikipedia.org/wiki/Intel_Quick_Sync_Video#Hardware_decoding_and_encoding

Get it on GitHub: https://github.com/mmozeiko/wcap

&TwitchNotify can now download the followed user list from your Twitch account: set the username in the .ini file and choose to download it automatically on startup, or do it manually from the popup menu. https://github.com/mmozeiko/TwitchNotify/

TwitchNotify update: https://github.com/mmozeiko/TwitchNotify/ Much better websocket connection code. It no longer needs to disconnect to reload users; the websocket connection stays permanent. Also, Twitch notifies that a stream is live but sometimes takes up to 10 seconds to actually report the game/stream name, so TwitchNotify now shows a notification with just the user name, then tries to get the game/stream name later and updates the notification once it succeeds. It also shows viewer count in the user list. &twitchnotify

New TwitchNotify codebase: https://github.com/mmozeiko/TwitchNotify/ It now monitors Twitch user live status via websocket instead of the previous 1-minute polling, so you get notifications instantly without any delay. It also now uses Windows 10 toast notifications, with action buttons in the notification popup itself: either open the mpv video player, or open the browser page. All this with a nice 34KB .exe and ~3MB runtime memory usage. CC @DreamerSleeper &twitchnotify

New &wcap feature: you can choose whether the discrete or the integrated GPU is used for encoding. By default it selects the discrete one. Using the integrated one may be useful for laptop users with Nvidia Optimus when recording low framerate / low complexity screen capture, as it uses less power, which means less heat & better battery life. I recommend using the iGPU only on Skylake or newer Intel CPUs, as pre-Skylake CPUs have pretty poor encoding performance & quality. The new version also includes a minor fix for bad handling of minimized window capture. In that case no new frames are captured, which previously messed up timestamps or hung the recording (if audio capture was also enabled). Now it properly produces a discontinuity in the video stream. https://github.com/mmozeiko/wcap
