[Resolved] Editor silently devours HTML tags.

I should be able to type e.g. < div > without any spaces in the editor, and it should properly escape it (i.e. "& lt ;div& gt ;" w/o spaces) such that it displays in my post as plain text.
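A minimal sketch of the escaping being asked for. Python is used here only for illustration, since the forum's implementation language isn't stated, and `escape_html` is a hypothetical name. The key detail is that `&` has to be replaced before `<` and `>`.

```python
def escape_html(text: str) -> str:
    # Replace '&' first: doing it after '<' would turn the '&' in the
    # freshly inserted "&lt;" into "&amp;lt;" and double-escape it.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text

print(escape_html("<div>"))  # &lt;div&gt;
```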

At the moment, they seem to be silently stripped and are nowhere to be found in the output source.

Edit: Apparently when I type "& l t ;" into the editor without spaces, it shows up as "<". Again, this should be escaped so that I can type ampersands without them being evaluated as HTML entities.
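The ampersand case can be shown with Python's standard `html` module, using `html.unescape` as a rough stand-in for what a browser displays. An escaper that only handles `<` and `>` leaves a typed `&lt;` untouched, so it renders as `<`; escaping `&` as well keeps it literal. The `naive_escape` helper is hypothetical.

```python
import html

def naive_escape(text: str) -> str:
    # Hypothetical buggy escaper: handles '<' and '>' but forgets '&'.
    return text.replace("<", "&lt;").replace(">", "&gt;")

typed = "&lt;"  # the four characters the user actually typed

# html.unescape approximates what the browser shows for the emitted HTML.
print(html.unescape(naive_escape(typed)))              # entity evaluated: shows "<"
print(html.unescape(html.escape(typed, quote=False)))  # shows "&lt;" as typed
```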

Edit 2: Furthermore, when I edit this post, "& l t ;" (w/o spaces) gets evaluated to "<" and that's what shows up in the editor. When I edit a post, I expect to see the exact same text that I entered into the editor when I originally posted.
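One common way to get this round-trip behaviour (an assumption about how a forum could work, not a description of this one) is to store the raw text exactly as typed, derive the HTML from it, and fill the editor from the stored source. `Post`, `create_post`, `render` and `text_for_editor` are hypothetical names; `render` here simply escapes everything.

```python
import html
from dataclasses import dataclass

def render(source: str) -> str:
    # Stand-in for the real parser/sanitiser: escape everything.
    return html.escape(source, quote=False)

@dataclass
class Post:
    source: str    # exactly what the user typed; never rewritten
    rendered: str  # derived HTML shown to readers

def create_post(text: str) -> Post:
    return Post(source=text, rendered=render(text))

def text_for_editor(post: Post) -> str:
    # The editor is filled from the stored source, never from the rendered
    # HTML, so a typed "&lt;" comes back as "&lt;", not "<".
    return post.source

post = create_post("&lt; and <div>")
print(post.rendered)
print(text_for_editor(post))
```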

I know these are basically all the same bug, I'm just trying to be explicit to help the developer fix and verify the behavior.

Edited by Jeroen van Rijn on Reason: Resolved


Tags like div are stripped on purpose. I'll have a think about escaping them as plain text outside of code tags. Right now anything not on the whitelist gets nixed, but I can see an update turning that into escaping instead of stripping.
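Escaping instead of stripping could look roughly like this: escape the whole input, then restore only exact, attribute-free tags from a whitelist. The tag set `{b, i, code}` and the `sanitise` function are hypothetical, and the sketch does not check that tags are balanced.

```python
import html
import re

ALLOWED_TAGS = {"b", "i", "code"}  # hypothetical whitelist, for illustration only

# Matches an escaped bare tag such as "&lt;b&gt;" or "&lt;/code&gt;" (no attributes).
_ESCAPED_TAG = re.compile(r"&lt;(/?)([a-zA-Z][a-zA-Z0-9]*)&gt;")

def sanitise(text: str) -> str:
    escaped = html.escape(text, quote=False)

    def restore(m):
        slash, name = m.group(1), m.group(2).lower()
        if name in ALLOWED_TAGS:
            return f"<{slash}{name}>"
        return m.group(0)  # not whitelisted: leave it escaped so it shows as text

    return _ESCAPED_TAG.sub(restore, escaped)

print(sanitise("<b>hi</b> <div>"))  # <b>hi</b> &lt;div&gt;
```

Because only the exact strings `<b>`, `</b>` and so on are ever restored, anything else, including a `<div>` or a typed `&lt;`, ends up as visible text.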

I don't understand the use case of wanting to write < div > outside of a code tag, but that's another story.
The use case is me talking about HTML/XML or anything else that coincidentally contains "special" symbols in conversation as I was doing above. Having to use a big, thought-interrupting code block to type a 5-character symbol is rather silly.

If I were posting actual sizable code blocks, I would of course use a code block.

Having a blacklist that eats things without telling the user what the blacklist contains makes for a rather confusing experience. Especially for someone who may not understand how filtering works or what its purpose is.

Edited by Dan on
dbechrd
The use case is me talking about HTML/XML or anything else that coincidentally contains "special" symbols in conversation as I was doing above. Having to use a big, thought-interrupting code block to type a 5-character symbol is rather silly.

If I were posting actual sizable code blocks, I would of course use a code block.


Fair enough :) I just erred on the side of caution during development. I don't mind reevaluating the decisions made at the time and updating the code if need be.

I'll roll it in with the other parser updates. It'll probably be addressed by mid next week, at which time I'll reparse existing posts so people don't have to edit the posts to get the updated output.
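Reparsing existing posts is straightforward if each post keeps its original source. A minimal sketch, assuming a hypothetical per-post record of which renderer version produced its cached HTML, so only stale posts are re-rendered after an update. `StoredPost`, `RENDERER_VERSION` and `reparse_stale` are illustrative names.

```python
import html
from dataclasses import dataclass

RENDERER_VERSION = 2  # hypothetical: bumped whenever parser/sanitiser output changes

def render(source: str) -> str:
    # Stand-in for the updated parser: escape everything.
    return html.escape(source, quote=False)

@dataclass
class StoredPost:
    source: str         # original text as typed
    rendered: str       # cached HTML
    rendered_with: int  # renderer version that produced `rendered`

def reparse_stale(posts: list) -> int:
    """Re-render posts produced by an older renderer; return how many changed."""
    updated = 0
    for post in posts:
        if post.rendered_with < RENDERER_VERSION:
            post.rendered = render(post.source)
            post.rendered_with = RENDERER_VERSION
            updated += 1
    return updated

posts = [StoredPost("<div>", "", 1), StoredPost("hello", "hello", 2)]
print(reparse_stale(posts), posts[0].rendered)  # 1 &lt;div&gt;
```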

The reason it'll take a bit longer is that the interaction between the parser, the sanitiser and the code highlighter is interesting, to say the least. I need to be certain that a) sanitisation doesn't break legit output and b) it doesn't open up XSS, style hacks and the like. A tool can only partially help in making that call.
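Part of that call can be automated with a small regression check that runs known attack strings through the sanitiser and flags any raw markup in the output. It only catches the payloads someone thought to list. The sanitiser here is a hypothetical escape-then-restore whitelist (`b`, `i`, `code`, no attributes); `leaks_markup` and `PAYLOADS` are likewise hypothetical.

```python
import html
import re

ALLOWED_TAGS = {"b", "i", "code"}  # hypothetical whitelist
_ESCAPED_TAG = re.compile(r"&lt;(/?)([a-zA-Z][a-zA-Z0-9]*)&gt;")

def sanitise(text: str) -> str:
    # Escape everything, then restore only bare, attribute-free whitelisted tags.
    escaped = html.escape(text, quote=False)

    def restore(m):
        slash, name = m.group(1), m.group(2).lower()
        return f"<{slash}{name}>" if name in ALLOWED_TAGS else m.group(0)

    return _ESCAPED_TAG.sub(restore, escaped)

def leaks_markup(output: str) -> bool:
    # Remove the exact tags the sanitiser may emit; any '<' or '>' left
    # over means some raw markup slipped through.
    remainder = re.sub(r"</?(?:b|i|code)>", "", output)
    return "<" in remainder or ">" in remainder

PAYLOADS = [
    "<script>alert(1)</script>",
    '<b onclick="alert(1)">click</b>',
    "<img src=x onerror=alert(1)>",
    "<<b>>",
    "<style>body{display:none}</style>",
]

for payload in PAYLOADS:
    out = sanitise(payload)
    print("LEAK" if leaks_markup(out) else "ok  ", repr(out))
```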
dbechrd
The use case is me talking about HTML/XML or anything else that coincidentally contains "special" symbols in conversation as I was doing above. Having to use a big, thought-interrupting code block to type a 5-character symbol is rather silly.

If I were posting actual sizable code blocks, I would of course use a code block.

Having a blacklist that eats things without telling the user what the blacklist contains makes for a rather confusing experience. Especially for someone who may not understand how filtering works or what its purpose is.


On the contrary, it's a whitelist of allowed tags. The rest gets eaten. A blacklist is the reverse, where tags on a certain list get eaten and the rest gets passed through.
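The difference matters for tags nobody anticipated. In this sketch (regex-based purely to contrast the two policies; it is not a robust HTML parser, and the function names are hypothetical), a whitelist allowing only `b` drops an `<svg onload=...>` tag, while a blacklist that only bans `script` lets it straight through.

```python
import re

TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def filter_whitelist(text: str, allowed: set) -> str:
    # Keep a tag only if its name is explicitly allowed; eat everything else.
    return TAG.sub(lambda m: m.group(0) if m.group(2).lower() in allowed else "", text)

def filter_blacklist(text: str, banned: set) -> str:
    # Eat a tag only if its name is explicitly banned; pass everything else.
    return TAG.sub(lambda m: "" if m.group(2).lower() in banned else m.group(0), text)

payload = '<b>hi</b><svg onload="alert(1)">'
print(filter_whitelist(payload, {"b"}))       # <b>hi</b>
print(filter_blacklist(payload, {"script"}))  # <b>hi</b><svg onload="alert(1)">
```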
Kelimion

Fair enough :) I just erred on the side of caution during development. I don't mind reevaluating the decisions made at the time and updating the code if need be.

Trust me when I say I *much* prefer slight inconveniences over gaping security holes. This approach was the right one initially, but we both agree it can be improved. That is the way to do things, for sure, and exactly why this feedback section exists.

Kelimion

On the contrary, it's a whitelist of allowed tags. The rest gets eaten. A blacklist is the reverse, where tags on a certain list get eaten and the rest gets passed through.

Misread "whitelist" as "blacklist", my bad. Whitelist is the way to go!
This is relevant not only for HTML tags, but also for C++ code. If I type #include < string > in a sentence, I want it to be shown with the < and > brackets, not silently dropped.
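Stripping also eats comparison operators in ordinary prose. A naive `<...>` stripper (hypothetical, for illustration) removes the `<string>` and also everything between a `<` and the next `>`, whereas escaping keeps the line intact:

```python
import html
import re

def strip_tags(text: str) -> str:
    # Naive stripping: deletes anything from '<' up to the next '>'.
    return re.sub(r"<[^>]*>", "", text)

def escape(text: str) -> str:
    # Escaping keeps every character visible as plain text.
    return html.escape(text, quote=False)

line = "#include <string> then if (a < b && c > d)"
print(strip_tags(line))  # #include  then if (a  d)
print(escape(line))
```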

Edited by Mārtiņš Možeiko on
mmozeiko
This is relevant not only for HTML tags, but also for C++ code. If I type #include < string > in a sentence, I want it to be shown with the < and > brackets, not silently dropped.


Let's see: <test> & so on <!-- more testing -->
a = b << 2;

<test> & so on <!-- more testing -->
a = b << 2;