Learning DB Performance/Behavior

Tom

#11523

March 29, 2017

So despite a SQL database being something I with day-to-day at my job, i've come to realize its a huge black box for me. I was wondering if folks on here have any resources for learning about the internals of such dbs in a similar way to understanding C code in terms of what instructions it breaks down to (and how costly those instructios are), how it access memory and penalties, etc. Are there even simpler parts you can break things down to with SQL dbs? My ignorance of this subject is pretty big, I mean I can guess that there's disk access going on in response to running a command and that it does something a little smarter than just scan the whole disk for every last place your data might be, but after that...?

Mārtiņš Možeiko

#11526

March 29, 2017

If you want actual source code, then SQLite is best learning material for this.
It also has a good documentation available on architecture and internals: https://www.sqlite.org/arch.html
Take a look there, you'll see various technologies/algorithms that are pretty much the same also in other SQL databases. And check out other topics:
https://www.sqlite.org/atomiccommit.html
https://www.sqlite.org/malloc.html
https://www.sqlite.org/isolation.html
https://www.sqlite.org/optoverview.html
https://www.sqlite.org/queryplanner-ng.html
https://www.sqlite.org/opcode.html

Edited by Mārtiņš Možeiko on March 29, 2017, 8:22pm

ratchetfreak

#11533

March 30, 2017

At it's core DBs work with the constraint that not all data can fit in ram so it has to page in and out the data it needs from disk.

I'm not too familiar with DB code myself but My gut tells me that the primary optimization target is to minimize the amount of data that needs to be read into ram (at least until you can install a massive SSD RAID) or written out back to disk. They will also be working with fixed sized file blocks rather than simple flat files because then they don't have to rely on the OS file IO caching doing the right thing and not eating up too much ram.

Minimizing reads can be done by sorting the table or adding index tables so the DB can find where the data is stored based on the constraints in the WHERE clause in sub linear time (you don't want things to scale linearly for the big tables when they don't have to).

Those index tables will likely be K-ary trees with each node taking up an entire block in the hard drive.

Keeping the variable length data and the fixed length data separate lets the fixed length be easily traversable (and then you can use a offset to the variable length data in the fixed length struct).

Making changes to a sorted list without doing a O(n) update each time can be done by soft-deleting records and keeping a second smaller array for the insertions. Then during daily/weekly maintenance the full table can be traversed, compacting out the deleted records and merging in the smaller array.

Tom

#11539

March 30, 2017

@mmozeiko Thanks! These look pretty awesome

@ratchetfreak Yeah, moving/transforming a small amount of data is always better than more, I imagine that holds true at all levels of hardware. What caused me to realize that I didn't have the mental tools to understand querying a db was my coworker explaining how a query he had written got "optimized" by having a certain select reversed. I'd imagine this could be something similar to reading from a linear array backwards vs forwards, thanks to a little game dev experience I know about cache behavior at the cpu end but I have no clue as to what the reasoning is for a db.

ratchetfreak

#11547

March 31, 2017

Floresy

@ratchetfreak Yeah, moving/transforming a small amount of data is always better than more, I imagine that holds true at all levels of hardware. What caused me to realize that I didn't have the mental tools to understand querying a db was my coworker explaining how a query he had written got "optimized" by having a certain select reversed. I'd imagine this could be something similar to reading from a linear array backwards vs forwards, thanks to a little game dev experience I know about cache behavior at the cpu end but I have no clue as to what the reasoning is for a db.

If he's talking about "SELECT A B" vs. "SELECT B A" then that shouldn't matter. It may perhaps change some cpu cache behavior but I doubt it's significant when taking into account the surrounding processing.

The only way that could matter significantly is if the query interpreter uses the order of SELECT fields in some way for the execution and frankly I'd say that's a bug.

Not including some columns of a table can have significant impact (when a table is split up internally to separate variable length fields from fixed length fields and you don't SELECT any varchar column.)

Shazan

#11548

March 31, 2017

I was thinking is it possible to use sqlite asynchronously (inside game loop).

Mārtiņš Možeiko

#11551

March 31, 2017

Yes, it is possible. You can use sqlite asynchronously by putting query execution on separate thread. Just make game loop to issue queries on thread, and after they finish post back the result. Very similar how HandmadeHero did async asset loading.

Edited by Mārtiņš Možeiko on March 31, 2017, 5:36pm

Shazan

#11555

April 1, 2017

Is it possible without multi threading. Because I haven't done multi threading before.

Mārtiņš Možeiko

#11556

April 1, 2017

No, sqlite library doesn't provide async API.

Jack Mott

#11559

April 1, 2017

The database course professor at Rice University highly recommends this book:
https://www.amazon.com/Database-S...s-Complete-Book-2nd/dp/0131873253

Also the PostgreSQL source code looks really clean and well organized, so it might be a good place to poke around.

Shazan

#12073

June 1, 2017

Hi, I'am working on a app using sqlite and was wondering how would you protect the app from Sql Injection. Is there a specific way of coding sqlite api to prevent sql injection.

Edited by Shazan on June 1, 2017, 11:58am

ratchetfreak

#12074

June 1, 2017

msmshazan
Hi, I'am working on a app using sqlite and was wondering how would you protect the app from Sql Injection.Is there a specific way of coding sqlite api to prevent sql injection.

Don't build sql query strings at runtime. Instead use parametered queries. That way there is no way for the input to change what the query does.

Shazan

#12075

June 1, 2017

What do you mean by parameterized queries can you provide an example.

ratchetfreak

#12076

June 1, 2017

sqlite3_stmt* statement;
sqlite3_prepare_v2(db, "select name from accounts where id = ?1", -1, &statement, null);
sqlite3_bind_text(statement, 1, id_from_input, free);

sqlite3_step(statement);
//grab data out of statement

sqlite3_finalize(statement);

the "?1" in the query string is the parameter with "bind" I set the value of it to "id_from_input" which is then used as if the query string was build as sprintf(buff, "select name from accounts where id = \"%s\"", id_from_input); but without any chance that a quote character inside id_from_input could change the meaning of the query.

Shazan

#12086

June 2, 2017

I seem to have a problem making Parameterized query.

Here's the function I implemented.

int CreateTable(sqlite3* Database,const char* Tablename){
    const char* Sql ="CREATE TABLE ?1 ( bar TEXT );";
    sqlite3_stmt *CreateStatement;
    const char *Errormessage;
    int Result= sqlite3_prepare(Database, Sql,-1,&CreateStatement,NULL);
    if(SQLITE_OK != Result) {
        // TODO: Correct error handling
        Errormessage= sqlite3_errstr(Result);
        fprintf(stderr, "Error creating table (%i): %s\n", Result,Errormessage);
        
    }else{
        Result= sqlite3_bind_text(CreateStatement, 1, Tablename, -1,SQLITE_TRANSIENT);
        if(Result != SQLITE_OK){
            // TODO: Correct error handling
            Errormessage= sqlite3_errstr(Result);
            fprintf(stderr, "Error Binding text (%i): %s\n", Result,Errormessage);
          
        }else{
            Result=  sqlite3_step(CreateStatement);
            if(SQLITE_OK != Result) {
                // TODO: Correct error handling
                Errormessage= sqlite3_errstr(Result);
                fprintf(stderr, "Error stepping sql (%i): %s\n", Result,Errormessage);
            }else{
            }
        }
        }
    sqlite3_finalize(CreateStatement);
    
    return Result;
}

But when i run it it say's sql logic error.