[Resolved] Editor silently devours HTML tags.

Dan

#6624

April 29, 2016

I should be able to type e.g. < div > without any spaces in the editor, and it should properly escape it (i.e. "& lt ;div& gt ;" w/o spaces) such that it displays in my post as plain text.

At the moment, they seem to be silently stripped and are nowhere to be found in the output source.

Edit: Apparently when I type "& l t ;" into the editor without spaces, it shows up as "<". Again, this should be escaped so that I can type ampersands without them being evaluated as HTML entities.

Edit 2: Furthermore, when I edit this post, "& l t ;" (w/o spaces) gets evaluated to "<" and that's what shows up in the editor. When I edit a post, I expect to see the exact same text that I entered into the editor when I originally posted.

I know these are basically all the same bug, I'm just trying to be explicit to help the developer fix and verify the behavior.
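
As an illustration of the behaviour requested above, here is a minimal sketch in Python. It assumes the forum stores the raw text exactly as typed and escapes it only when rendering; the function names `escape_for_display` and `text_for_editor` are hypothetical and do not come from the site's actual code.

```python
import html

def escape_for_display(raw: str) -> str:
    """Escape user text so a browser shows it literally instead of parsing it."""
    # html.escape turns & into &amp;, < into &lt; and > into &gt;
    # (and quotes into entities when quote=True).
    return html.escape(raw, quote=True)

def text_for_editor(stored_raw: str) -> str:
    """Assumption: the raw source is stored untouched, so editing shows it as typed."""
    return stored_raw

typed = "<div> and &lt;"
rendered = escape_for_display(typed)
print(rendered)                # what goes into the page source
print(text_for_editor(typed))  # what the editor shows when the post is reopened
```

With this split, a typed ampersand is escaped like any other character, so an entity typed by the user is displayed as text rather than evaluated.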

Edited by Jeroen van Rijn on August 7, 2016, 4:01pm Reason: Resolved

Jeroen van Rijn

#6637

April 29, 2016

dbechrd
I should be able to type e.g. < div > without any spaces in the editor, and it should properly escape it (i.e. "& lt ;div& gt ;" w/o spaces) such that it displays in my post as plain text.

At the moment, they seem to be silently stripped and are nowhere to be found in the output source.

Edit: Apparently when I type "& l t ;" into the editor without spaces, it shows up as "<". Again, this should be escaped so that I can type ampersands without them being evaluated as HTML entities.

Edit 2: Furthermore, when I edit this post, "& l t ;" (w/o spaces) gets evaluated to "<" and that's what shows up in the editor. When I edit a post, I expect to see the exact same text that I entered into the editor when I originally posted.

I know these are basically all the same bug, I'm just trying to be explicit to help the developer fix and verify the behavior.

Tags like div are stripped on purpose. I'll have a think about escaping them as plain text outside of code tags. Right now anything not on the whitelist gets nixed, but I can see an update turning that into escaping instead of stripping.

I don't understand the use case of wanting to write < div > outside of a code tag, but that's another story.
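
The difference between stripping and escaping tags that are not on the whitelist could be sketched roughly as follows. This is not the site's parser: the `ALLOWED_TAGS` set, the `sanitise` function and its `strip` flag are all hypothetical, and only bare tags without attributes are recognised, so anything else is treated as plain text and escaped.

```python
import html
import re

# Hypothetical whitelist; the real list of allowed tags is not given in this thread.
ALLOWED_TAGS = {"b", "i", "u", "code"}

# Matches only bare opening or closing tags such as <b> or </code>.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\s*>")

def sanitise(raw: str, strip: bool = False) -> str:
    """Keep whitelisted tags; strip or escape everything else."""
    out = []
    pos = 0
    for m in TAG_RE.finditer(raw):
        # Text between tags is always escaped.
        out.append(html.escape(raw[pos:m.start()], quote=False))
        if m.group(1).lower() in ALLOWED_TAGS:
            out.append(m.group(0))                            # allowed: pass through
        elif not strip:
            out.append(html.escape(m.group(0), quote=False))  # escape as plain text
        # strip=True: the unknown tag is dropped silently
        pos = m.end()
    out.append(html.escape(raw[pos:], quote=False))
    return "".join(out)

print(sanitise("<b>bold</b> <div>", strip=True))
print(sanitise("<b>bold</b> <div>"))
print(sanitise("a = b << 2;"))
```

Because tags carrying attributes never match the pattern, they fall through to the escaping path instead of reaching the output as markup, which keeps the sketch on the cautious side.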

Dan

#6642

April 29, 2016

The use case is me talking about HTML/XML or anything else that coincidentally contains "special" symbols in conversation as I was doing above. Having to use a big, thought-interrupting code block to type a 5-character symbol is rather silly.

If I were posting actual sizable code blocks, I would of course use a code block.

Having a blacklist that eats things without telling the user what the blacklist contains makes for a rather confusing experience. Especially for someone who may not understand how filtering works or what its purpose is.

Edited by Dan on April 29, 2016, 6:50pm

Jeroen van Rijn

#6643

April 29, 2016

dbechrd
The use case is me talking about HTML/XML or anything else that coincidentally contains "special" symbols in conversation as I was doing above. Having to use a big, thought-interrupting code block to type a 5-character symbol is rather silly.

If I were posting actual sizable code blocks, I would of course use a code block.

Fair enough :) I just erred on the side of caution during development. I don't mind reevaluating the decisions made at the time and updating the code if need be.

I'll roll it in with the other parser updates. It'll probably be addressed by mid next week, at which time I'll reparse existing posts so people don't have to edit the posts to get the updated output.

The reason it'll take a bit longer is that the interaction between the parser, the sanitiser and the code highlighter is interesting, to say the least. I need to be certain that a) sanitisation doesn't break legit output and b) it doesn't open up XSS or style hacks and the like. A tool can only partially help in making that call.

Jeroen van Rijn

#6645

April 29, 2016

dbechrd
The use case is me talking about HTML/XML or anything else that coincidentally contains "special" symbols in conversation as I was doing above. Having to use a big, thought-interrupting code block to type a 5-character symbol is rather silly.

If I were posting actual sizable code blocks, I would of course use a code block.

Having a blacklist that eats things without telling the user what the blacklist contains makes for a rather confusing experience. Especially for someone who may not understand how filtering works or what its purpose is.

On the contrary, it's a whitelist of allowed tags. The rest gets eaten. A blacklist is the reverse, where tags on a certain list get eaten and the rest gets passed through.
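
To make the distinction concrete, here is a small sketch of the two policies as described above. Both tag sets are hypothetical examples, not the forum's actual lists.

```python
ALLOWED = {"b", "i", "code"}     # hypothetical whitelist: only these pass
FORBIDDEN = {"script", "style"}  # hypothetical blacklist: only these are eaten

def passes_whitelist(tag: str) -> bool:
    """Whitelist policy: a tag passes only if it is explicitly allowed."""
    return tag.lower() in ALLOWED

def passes_blacklist(tag: str) -> bool:
    """Blacklist policy: a tag passes unless it is explicitly forbidden."""
    return tag.lower() not in FORBIDDEN

# A tag neither list mentions, such as div, is treated differently by each policy.
for tag in ("b", "div", "script"):
    print(tag, passes_whitelist(tag), passes_blacklist(tag))
```

The unknown tag is the interesting case: the whitelist rejects it by default, while the blacklist lets it through.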

Dan

#6649

April 29, 2016

Kelimion

Fair enough :) I just erred on the side of caution during development. I don't mind reevaluating the decisions made at the time and updating the code if need be.

Trust me when I say I *much* prefer slight inconveniences over gaping security holes. This approach was the right one initially, but we both agree it can be improved. That is the way to do things, for sure, and exactly why this feedback section exists.

Kelimion

On the contrary, it's a whitelist of allowed tags. The rest gets eaten. A blacklist is the reverse, where tags on a certain list get eaten and the rest gets passed through.

I misread "whitelist" as "blacklist", my bad. Whitelist is the way to go!

Mārtiņš Možeiko

#7170

June 7, 2016

This is relevant not only for HTML tags, but also for C++ code. If I type #include < string > in a sentence, I want it to be shown with the < and > brackets, not silently dropped.

Edited by Mārtiņš Možeiko on June 7, 2016, 5:48pm

Jeroen van Rijn

#7921

August 7, 2016

mmozeiko
This is relevant not only for HTML tags, but also for C++ code. If I type #include < string > in a sentence, I want it to be shown with the < and > brackets, not silently dropped.

Let's see: <test> & so on 
a = b << 2;

<test> & so on <!-- more testing -->
a = b << 2;