
Need Help: Configuration File Formats

Hey everyone,

So I've been working on my project Const Port for a little while now, and I just released a new version today. In the next version I'm aiming to add some configuration options that the user can customize. However, I have not yet decided on the best way to go about this. I'd like to save all of the configuration variables in a way that can be easily edited with a text editor while also being easy to parse and write in code. I've enjoyed watching Jonathan Blow's programming streams and I like the way his files look. However, trying to conceptualize how the code for parsing those files is set up, I feel like there is a lot of room for ambiguity and problems if people miss a space or something. Since Sublime Text is my text editor of choice, I also feel that going down a similar path might not be a bad option: it uses JSON for all of its configuration files. I can see a lot of trade-offs with this option as well, though.

So here's my question: What sort of configuration file formats do you think work best? Do you have any examples of code used to serialize/deserialize this format in C/C++?

I'm open to all suggestions or comments about this subject. On a slightly off-topic note, if you have any suggestions for options you'd like to see in Const Port, please post them on the project's forum thread.

Thanks!
I like YAML files. They have capabilities similar to JSON, but there is no need to type extra symbols like commas, quotes, curly brackets, etc.

Simplified JSON is also OK. JSMN can parse it with non-strict mode enabled.

There's also TOML which is something like extended INI files.

Edited by Mārtiņš Možeiko on
I personally prefer formats like the ones in Jonathan Blow's stream or .INI files. I don't understand the ambiguity problem. If you just treat any whitespace as an indicator of a new identifier it would be very obvious to the user. If your file looks like:
name value
name value
name value

The user would know not to write namevalue. As long as you don't assign any significance to the amount of whitespace it should be fine.
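
To make that concrete, here is a minimal sketch in C of splitting one line on whitespace. The function name split_name_value is made up for illustration, and the sketch assumes one "name value" pair per line with the line buffer modified in place. Any run of spaces or tabs separates the two parts, so the amount of whitespace never matters.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: splits "name value" into two strings inside `line`.
// Any run of whitespace separates name from value; the amount is irrelevant.
// Returns false if the line has no name or no value (e.g. "namevalue").
static bool split_name_value(char *line, char **name, char **value)
{
    char *p = line;

    while (*p && isspace((unsigned char)*p)) p++;    // skip leading whitespace
    if (*p == '\0') return false;                    // blank line
    *name = p;

    while (*p && !isspace((unsigned char)*p)) p++;   // find the end of the name
    if (*p == '\0') return false;                    // a name with no value
    *p++ = '\0';                                     // terminate the name

    while (*p && isspace((unsigned char)*p)) p++;    // skip the separator
    if (*p == '\0') return false;                    // only whitespace followed
    *value = p;

    // Trim trailing whitespace (including the newline) from the value.
    char *end = p + strlen(p);
    while (end > p && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return true;
}
```

Everything after the first separator is kept as the value, so a value may itself contain spaces in this sketch.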
mmozeiko
I like YAML files. They have capabilities similar to JSON, but there is no need to type extra symbols like commas, quotes, curly brackets, etc.

Simplified JSON is also OK. JSMN can parse it with non-strict mode enabled.

There's also TOML which is something like extended INI files.


I think YAML files are fine but I don't enjoy the significant indentation very much.

I've actually used JSMN before for an embedded application and it worked fairly well. The only thing I would be worried about is that if the JSON is malformed, the JSMN parser just returns an error for the whole file, which wouldn't be very helpful for the user if they missed a comma somewhere. Ideally I'd like a way to get a character index or line number where the syntax went astray that I could show to the user. Even better would be a way to throw away malformed lines while preserving the data in the rest of the file.
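
To illustrate the kind of behavior I mean, here is a rough sketch in C. It assumes a simple line-oriented "name value" format rather than JSON, and the types Config and ConfigEntry, the function parse_config, and the fixed sizes are all made up for the example. Malformed lines are recorded by line number and skipped, while the rest of the file is still loaded.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64   // arbitrary limits for the sketch
#define MAX_BAD     64

typedef struct { char name[32]; char value[96]; } ConfigEntry;

typedef struct {
    ConfigEntry entries[MAX_ENTRIES];
    int count;
    int bad_lines[MAX_BAD];   // 1-based numbers of the lines that were skipped
    int bad_count;
} Config;

// Parses "name value" lines. Blank lines and lines starting with '#' are
// ignored. Names longer than 31 characters are not handled in this sketch.
static void parse_config(const char *text, Config *cfg)
{
    int line_no = 0;
    cfg->count = 0;
    cfg->bad_count = 0;

    while (*text) {
        size_t len = strcspn(text, "\n");      // length of the current line
        char line[160];
        bool ok = len < sizeof line;           // overlong lines count as bad
        line_no++;

        if (ok) {
            memcpy(line, text, len);
            line[len] = '\0';
            const char *s = line + strspn(line, " \t\r");
            if (*s != '\0' && *s != '#') {     // neither blank nor a comment
                ConfigEntry e;
                ok = cfg->count < MAX_ENTRIES &&
                     sscanf(s, "%31s %95[^\r\n]", e.name, e.value) == 2;
                if (ok) cfg->entries[cfg->count++] = e;
            }
        }
        if (!ok && cfg->bad_count < MAX_BAD)
            cfg->bad_lines[cfg->bad_count++] = line_no;

        text += len;
        if (*text == '\n') text++;             // step over the line break
    }
}
```

After parsing, the program could print each entry of bad_lines to the user and carry on with whatever settings did load.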

TOML looks nice but unless I'm missing something it's just a definition? There's no library for parsing it?
empulse
I personally prefer formats like the ones in Jonathan Blow's stream or .INI files. I don't understand the ambiguity problem. If you just treat any whitespace as an indicator of a new identifier it would be very obvious to the user. If your file looks like:
name value
name value
name value

The user would know not to write namevalue. As long as you don't assign any significance to the amount of whitespace it should be fine.


Yeah, I guess whitespace is an easily solvable example. What I worry about is what happens when you want to define nested elements and other things. For example, maybe I have an item with a name and two values, or multiple key-value pairs under another name. Do we put each pair on a separate line with significant indentation:
name value
  name value
  name value
  name value


Or do we do some sort of inline syntax:
name1 value1 name2 value2 name3 value3


What happens when I want the value to be a string with spaces or special characters? ('\n', '\t', etc.)

I think all of these problems can be solved but the fact that they have to be solved makes it hard for the user to know what they can and can't type. I think there's some benefit to going with a predefined language that has a decent amount of documentation and user base. However, whether or not JSON or YAML are commonly understood languages I can't really say. So maybe the benefit isn't as large as I'd like to hope.

It might be important to mention also that I'm not really looking for a fully featured serialization language but I would like some support for strings, grouped key-value pairs, and possibly arrays. There's also a possibility I might be adding options that use regular expressions so string formats that require less escaping would be better.
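
As a sketch of what the string handling could look like in C: the function read_quoted below is hypothetical, and the choice of double quotes with only four escapes (\n, \t, \\ and \") is just an assumption for the example, not something any existing format mandates. Unknown escapes and unterminated strings are rejected so the user gets an error instead of a silently wrong value.

```c
#include <stddef.h>

// Hypothetical helper: reads a double-quoted string that starts at src[0]
// and writes the unescaped text into out.
// Returns a pointer just past the closing quote, or NULL on error
// (no opening quote, unterminated string, unknown escape, out too small).
static const char *read_quoted(const char *src, char *out, size_t out_size)
{
    size_t n = 0;

    if (*src != '"' || out_size == 0) return NULL;
    src++;                                        // skip the opening quote

    while (*src != '"') {
        char c = *src++;
        if (c == '\0' || c == '\n') return NULL;  // string was never closed
        if (c == '\\') {
            switch (*src++) {
                case 'n':  c = '\n'; break;
                case 't':  c = '\t'; break;
                case '\\': c = '\\'; break;
                case '"':  c = '"';  break;
                default:   return NULL;           // unknown escape sequence
            }
        }
        if (n + 1 >= out_size) return NULL;       // keep room for the '\0'
        out[n++] = c;
    }
    out[n] = '\0';
    return src + 1;                               // skip the closing quote
}
```

The downside is that every backslash in a regular expression has to be doubled. A second, unescaped string form would avoid that, at the cost of one more rule for the user to learn.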

Edited by Taylor Robbins on
ProfessorSil
TOML looks nice but unless I'm missing something it's just a definition? There's no library for parsing it?
Scroll down :)
https://github.com/toml-lang/toml#user-content-implementations
From your text I'm not sure if you are aware, so I might be stating the obvious: What you describe in terms of syntax ambiguity is also possible in JSON etc. It's not a matter of what configuration language you use, but rather how good the configuration language library is at parsing files conforming to language standards and formatting styles.

With that being said, I would recommend using your own configuration format and writing a generic token parser and a syntax checker for it. But I admit I'm biased: I don't like to use JSON, XML, etc. for anything that is supposed to work offline.
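
As a rough idea of what I mean by a generic token parser, here is a small sketch in C. The token set (words, quoted strings, braces, '#' comments) and all of the names are assumptions made for this example, not an existing format. Each token carries its line number so a syntax checker built on top can report where a problem is.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_WORD, TOK_STRING, TOK_OPEN, TOK_CLOSE, TOK_END, TOK_ERROR } TokenType;

typedef struct {
    TokenType   type;
    const char *start;    // points into the source text, not null-terminated
    size_t      length;
    int         line;     // 1-based line on which the token starts
} Token;

typedef struct { const char *at; int line; } Lexer;

static Token next_token(Lexer *lx)
{
    // Skip whitespace and '#' comments while counting lines.
    for (;;) {
        if (*lx->at == '\n') { lx->line++; lx->at++; }
        else if (isspace((unsigned char)*lx->at)) lx->at++;
        else if (*lx->at == '#') { while (*lx->at && *lx->at != '\n') lx->at++; }
        else break;
    }

    Token t = { TOK_END, lx->at, 0, lx->line };
    char c = *lx->at;
    if (c == '\0') return t;

    if (c == '{' || c == '}') {                   // group delimiters
        t.type = (c == '{') ? TOK_OPEN : TOK_CLOSE;
        t.length = 1;
        lx->at++;
        return t;
    }
    if (c == '"') {                               // quoted string, no escapes here
        const char *p = lx->at + 1;
        while (*p && *p != '"' && *p != '\n') p++;
        if (*p != '"') { t.type = TOK_ERROR; lx->at = p; return t; }
        t.type = TOK_STRING;
        t.start = lx->at + 1;
        t.length = (size_t)(p - t.start);
        lx->at = p + 1;
        return t;
    }

    // Anything else is a bare word that runs up to the next delimiter.
    const char *p = lx->at;
    while (*p && !isspace((unsigned char)*p) &&
           *p != '{' && *p != '}' && *p != '"' && *p != '#') p++;
    t.type = TOK_WORD;
    t.length = (size_t)(p - lx->at);
    lx->at = p;
    return t;
}
```

A syntax checker would then call next_token in a loop, starting from a Lexer initialized with the file contents and line 1, and check that the token sequence matches the grammar you decided on.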