handmade.network » Forums » Learning DB Performance/Behavior
Floresy
Tom
20 posts

None

#11523 Learning DB Performance/Behavior
3 weeks, 6 days ago

So despite a SQL database being something I with day-to-day at my job, i've come to realize its a huge black box for me. I was wondering if folks on here have any resources for learning about the internals of such dbs in a similar way to understanding C code in terms of what instructions it breaks down to (and how costly those instructios are), how it access memory and penalties, etc. Are there even simpler parts you can break things down to with SQL dbs? My ignorance of this subject is pretty big, I mean I can guess that there's disk access going on in response to running a command and that it does something a little smarter than just scan the whole disk for every last place your data might be, but after that...?

0xdedededede.....
mmozeiko
Mārtiņš Možeiko
1280 posts
1 project
#11526 Learning DB Performance/Behavior
3 weeks, 6 days ago Edited by Mārtiņš Možeiko on March 29, 2017, 8:22 p.m.

If you want actual source code, then SQLite is best learning material for this.
It also has a good documentation available on architecture and internals: https://www.sqlite.org/arch.html
Take a look there, you'll see various technologies/algorithms that are pretty much the same also in other SQL databases. And check out other topics:
https://www.sqlite.org/atomiccommit.html
https://www.sqlite.org/malloc.html
https://www.sqlite.org/isolation.html
https://www.sqlite.org/optoverview.html
https://www.sqlite.org/queryplanner-ng.html
https://www.sqlite.org/opcode.html
ratchetfreak
251 posts
#11533 Learning DB Performance/Behavior
3 weeks, 5 days ago

At it's core DBs work with the constraint that not all data can fit in ram so it has to page in and out the data it needs from disk.

I'm not too familiar with DB code myself but My gut tells me that the primary optimization target is to minimize the amount of data that needs to be read into ram (at least until you can install a massive SSD RAID) or written out back to disk. They will also be working with fixed sized file blocks rather than simple flat files because then they don't have to rely on the OS file IO caching doing the right thing and not eating up too much ram.

Minimizing reads can be done by sorting the table or adding index tables so the DB can find where the data is stored based on the constraints in the WHERE clause in sub linear time (you don't want things to scale linearly for the big tables when they don't have to).

Those index tables will likely be K-ary trees with each node taking up an entire block in the hard drive.

Keeping the variable length data and the fixed length data separate lets the fixed length be easily traversable (and then you can use a offset to the variable length data in the fixed length struct).

Making changes to a sorted list without doing a O(n) update each time can be done by soft-deleting records and keeping a second smaller array for the insertions. Then during daily/weekly maintenance the full table can be traversed, compacting out the deleted records and merging in the smaller array.

Floresy
Tom
20 posts

None

#11539 Learning DB Performance/Behavior
3 weeks, 5 days ago

@mmozeiko Thanks! These look pretty awesome

@ratchetfreak Yeah, moving/transforming a small amount of data is always better than more, I imagine that holds true at all levels of hardware. What caused me to realize that I didn't have the mental tools to understand querying a db was my coworker explaining how a query he had written got "optimized" by having a certain select reversed. I'd imagine this could be something similar to reading from a linear array backwards vs forwards, thanks to a little game dev experience I know about cache behavior at the cpu end but I have no clue as to what the reasoning is for a db.

0xdedededede.....
ratchetfreak
251 posts
#11547 Learning DB Performance/Behavior
3 weeks, 4 days ago

Floresy:


@ratchetfreak Yeah, moving/transforming a small amount of data is always better than more, I imagine that holds true at all levels of hardware. What caused me to realize that I didn't have the mental tools to understand querying a db was my coworker explaining how a query he had written got "optimized" by having a certain select reversed. I'd imagine this could be something similar to reading from a linear array backwards vs forwards, thanks to a little game dev experience I know about cache behavior at the cpu end but I have no clue as to what the reasoning is for a db.


If he's talking about "SELECT A B" vs. "SELECT B A" then that shouldn't matter. It may perhaps change some cpu cache behavior but I doubt it's significant when taking into account the surrounding processing.

The only way that could matter significantly is if the query interpreter uses the order of SELECT fields in some way for the execution and frankly I'd say that's a bug.

Not including some columns of a table can have significant impact (when a table is split up internally to separate variable length fields from fixed length fields and you don't SELECT any varchar column.)
msmshazan
Shazan Shums
116 posts

Some day I will make quality software.
Programming FTW.

#11548 Learning DB Performance/Behavior
3 weeks, 4 days ago

I was thinking is it possible to use sqlite asynchronously (inside game loop).

Just being curious on programming....
mmozeiko
Mārtiņš Možeiko
1280 posts
1 project
#11551 Learning DB Performance/Behavior
3 weeks, 4 days ago Edited by Mārtiņš Možeiko on March 31, 2017, 5:36 p.m.

Yes, it is possible. You can use sqlite asynchronously by putting query execution on separate thread. Just make game loop to issue queries on thread, and after they finish post back the result. Very similar how HandmadeHero did async asset loading.
msmshazan
Shazan Shums
116 posts

Some day I will make quality software.
Programming FTW.

#11555 Learning DB Performance/Behavior
3 weeks, 4 days ago

Is it possible without multi threading. Because I haven't done multi threading before.

Just being curious on programming....
mmozeiko
Mārtiņš Možeiko
1280 posts
1 project
#11556 Learning DB Performance/Behavior
3 weeks, 4 days ago

No, sqlite library doesn't provide async API.
MandleBro
Jack Mott
96 posts
1 project

Web Developer by day, game hobbyist by night. Fond of C and F#

#11559 Learning DB Performance/Behavior
3 weeks, 3 days ago

The database course professor at Rice University highly recommends this book:
https://www.amazon.com/Database-S...s-Complete-Book-2nd/dp/0131873253

Also the PostgreSQL source code looks really clean and well organized, so it might be a good place to poke around.