handmade.network » Forums » Learning DB Performance/Behavior
Floresy
Tom
21 posts

None

#11523 Learning DB Performance/Behavior
5 months, 4 weeks ago

So despite a SQL database being something I with day-to-day at my job, i've come to realize its a huge black box for me. I was wondering if folks on here have any resources for learning about the internals of such dbs in a similar way to understanding C code in terms of what instructions it breaks down to (and how costly those instructios are), how it access memory and penalties, etc. Are there even simpler parts you can break things down to with SQL dbs? My ignorance of this subject is pretty big, I mean I can guess that there's disk access going on in response to running a command and that it does something a little smarter than just scan the whole disk for every last place your data might be, but after that...?

0xdedededede.....
mmozeiko
Mārtiņš Možeiko
1477 posts
1 project
#11526 Learning DB Performance/Behavior
5 months, 4 weeks ago Edited by Mārtiņš Možeiko on March 29, 2017, 8:22 p.m.

If you want actual source code, then SQLite is best learning material for this.
It also has a good documentation available on architecture and internals: https://www.sqlite.org/arch.html
Take a look there, you'll see various technologies/algorithms that are pretty much the same also in other SQL databases. And check out other topics:
https://www.sqlite.org/atomiccommit.html
https://www.sqlite.org/malloc.html
https://www.sqlite.org/isolation.html
https://www.sqlite.org/optoverview.html
https://www.sqlite.org/queryplanner-ng.html
https://www.sqlite.org/opcode.html
ratchetfreak
298 posts
#11533 Learning DB Performance/Behavior
5 months, 4 weeks ago

At it's core DBs work with the constraint that not all data can fit in ram so it has to page in and out the data it needs from disk.

I'm not too familiar with DB code myself but My gut tells me that the primary optimization target is to minimize the amount of data that needs to be read into ram (at least until you can install a massive SSD RAID) or written out back to disk. They will also be working with fixed sized file blocks rather than simple flat files because then they don't have to rely on the OS file IO caching doing the right thing and not eating up too much ram.

Minimizing reads can be done by sorting the table or adding index tables so the DB can find where the data is stored based on the constraints in the WHERE clause in sub linear time (you don't want things to scale linearly for the big tables when they don't have to).

Those index tables will likely be K-ary trees with each node taking up an entire block in the hard drive.

Keeping the variable length data and the fixed length data separate lets the fixed length be easily traversable (and then you can use a offset to the variable length data in the fixed length struct).

Making changes to a sorted list without doing a O(n) update each time can be done by soft-deleting records and keeping a second smaller array for the insertions. Then during daily/weekly maintenance the full table can be traversed, compacting out the deleted records and merging in the smaller array.

Floresy
Tom
21 posts

None

#11539 Learning DB Performance/Behavior
5 months, 3 weeks ago

@mmozeiko Thanks! These look pretty awesome

@ratchetfreak Yeah, moving/transforming a small amount of data is always better than more, I imagine that holds true at all levels of hardware. What caused me to realize that I didn't have the mental tools to understand querying a db was my coworker explaining how a query he had written got "optimized" by having a certain select reversed. I'd imagine this could be something similar to reading from a linear array backwards vs forwards, thanks to a little game dev experience I know about cache behavior at the cpu end but I have no clue as to what the reasoning is for a db.

0xdedededede.....
ratchetfreak
298 posts
#11547 Learning DB Performance/Behavior
5 months, 3 weeks ago

Floresy


@ratchetfreak Yeah, moving/transforming a small amount of data is always better than more, I imagine that holds true at all levels of hardware. What caused me to realize that I didn't have the mental tools to understand querying a db was my coworker explaining how a query he had written got "optimized" by having a certain select reversed. I'd imagine this could be something similar to reading from a linear array backwards vs forwards, thanks to a little game dev experience I know about cache behavior at the cpu end but I have no clue as to what the reasoning is for a db.


If he's talking about "SELECT A B" vs. "SELECT B A" then that shouldn't matter. It may perhaps change some cpu cache behavior but I doubt it's significant when taking into account the surrounding processing.

The only way that could matter significantly is if the query interpreter uses the order of SELECT fields in some way for the execution and frankly I'd say that's a bug.

Not including some columns of a table can have significant impact (when a table is split up internally to separate variable length fields from fixed length fields and you don't SELECT any varchar column.)
msmshazan
Shazan Shums
162 posts

Some day I will make quality software.
Programming FTW.

#11548 Learning DB Performance/Behavior
5 months, 3 weeks ago

I was thinking is it possible to use sqlite asynchronously (inside game loop).

Just being curious on programming....
mmozeiko
Mārtiņš Možeiko
1477 posts
1 project
#11551 Learning DB Performance/Behavior
5 months, 3 weeks ago Edited by Mārtiņš Možeiko on March 31, 2017, 5:36 p.m.

Yes, it is possible. You can use sqlite asynchronously by putting query execution on separate thread. Just make game loop to issue queries on thread, and after they finish post back the result. Very similar how HandmadeHero did async asset loading.
msmshazan
Shazan Shums
162 posts

Some day I will make quality software.
Programming FTW.

#11555 Learning DB Performance/Behavior
5 months, 3 weeks ago

Is it possible without multi threading. Because I haven't done multi threading before.

Just being curious on programming....
mmozeiko
Mārtiņš Možeiko
1477 posts
1 project
#11556 Learning DB Performance/Behavior
5 months, 3 weeks ago

No, sqlite library doesn't provide async API.
MandleBro
Jack Mott
98 posts
1 project

Web Developer by day, game hobbyist by night. Fond of C and F#

#11559 Learning DB Performance/Behavior
5 months, 3 weeks ago

The database course professor at Rice University highly recommends this book:
https://www.amazon.com/Database-S...s-Complete-Book-2nd/dp/0131873253

Also the PostgreSQL source code looks really clean and well organized, so it might be a good place to poke around.
msmshazan
Shazan Shums
162 posts

Some day I will make quality software.
Programming FTW.

#12073 Learning DB Performance/Behavior
3 months, 3 weeks ago Edited by Shazan Shums on June 1, 2017, 11:58 a.m.

Hi, I'am working on a app using sqlite and was wondering how would you protect the app from Sql Injection. Is there a specific way of coding sqlite api to prevent sql injection.

Just being curious on programming....
ratchetfreak
298 posts
#12074 Learning DB Performance/Behavior
3 months, 3 weeks ago

msmshazan
Hi, I'am working on a app using sqlite and was wondering how would you protect the app from Sql Injection.Is there a specific way of coding sqlite api to prevent sql injection.


Don't build sql query strings at runtime. Instead use parametered queries. That way there is no way for the input to change what the query does.
msmshazan
Shazan Shums
162 posts

Some day I will make quality software.
Programming FTW.

#12075 Learning DB Performance/Behavior
3 months, 3 weeks ago

What do you mean by parameterized queries can you provide an example.

Just being curious on programming....
ratchetfreak
298 posts
#12076 Learning DB Performance/Behavior
3 months, 3 weeks ago

1
2
3
4
5
6
7
8
sqlite3_stmt* statement;
sqlite3_prepare_v2(db, "select name from accounts where id = ?1", -1, &statement, null);
sqlite3_bind_text(statement, 1, id_from_input, free);

sqlite3_step(statement);
//grab data out of statement

sqlite3_finalize(statement);


the "?1" in the query string is the parameter with "bind" I set the value of it to "id_from_input" which is then used as if the query string was build as sprintf(buff, "select name from accounts where id = \"%s\"", id_from_input); but without any chance that a quote character inside id_from_input could change the meaning of the query.
msmshazan
Shazan Shums
162 posts

Some day I will make quality software.
Programming FTW.

#12086 Learning DB Performance/Behavior
3 months, 3 weeks ago

I seem to have a problem making Parameterized query.

Here's the function I implemented.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
int CreateTable(sqlite3* Database,const char* Tablename){
    const char* Sql ="CREATE TABLE ?1 ( bar TEXT );";
    sqlite3_stmt *CreateStatement;
    const char *Errormessage;
    int Result= sqlite3_prepare(Database, Sql,-1,&CreateStatement,NULL);
    if(SQLITE_OK != Result) {
        // TODO: Correct error handling
        Errormessage= sqlite3_errstr(Result);
        fprintf(stderr, "Error creating table (%i): %s\n", Result,Errormessage);
        
    }else{
        Result= sqlite3_bind_text(CreateStatement, 1, Tablename, -1,SQLITE_TRANSIENT);
        if(Result != SQLITE_OK){
            // TODO: Correct error handling
            Errormessage= sqlite3_errstr(Result);
            fprintf(stderr, "Error Binding text (%i): %s\n", Result,Errormessage);
          
        }else{
            Result=  sqlite3_step(CreateStatement);
            if(SQLITE_OK != Result) {
                // TODO: Correct error handling
                Errormessage= sqlite3_errstr(Result);
                fprintf(stderr, "Error stepping sql (%i): %s\n", Result,Errormessage);
            }else{
            }
        }
        }
    sqlite3_finalize(CreateStatement);
    
    return Result;
}


But when i run it it say's sql logic error.

Just being curious on programming....