`find_emph_char` returns 0 if the specified character is not found in
the current line, but 0 is also the result when there is an empty
cell. This patch adds logic to work around this ambiguity.
See uranusjr/macdown#321
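
The ambiguity can be sketched as follows. This is only an illustration:
`find_char` and `find_char_checked` are hypothetical stand-ins, not the
functions touched by the patch, and the check shown is one possible way
to tell "not found" apart from "found at offset 0".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a search that uses 0 to mean "not found". */
static size_t
find_char(const char *data, size_t size, char c)
{
    size_t i;

    for (i = 0; i < size; i++)
        if (data[i] == c)
            return i;
    return 0; /* ambiguous: also the result for a match at offset 0 */
}

/* Disambiguates a 0 result by inspecting the first byte directly.
 * Returns 1 and stores the offset in *pos if c was found, 0 otherwise. */
static int
find_char_checked(const char *data, size_t size, char c, size_t *pos)
{
    size_t i = find_char(data, size, c);

    assert(pos != NULL);
    if (i == 0 && (size == 0 || data[0] != c))
        return 0; /* genuinely not found */
    *pos = i;
    return 1;
}
```

With this shape, a line that starts with the delimiter (an empty cell)
reports a match at offset 0 instead of being treated as a miss.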
When the `HOEDOWN_EXT_UNDERLINE` extension was enabled, underlined spans
would actually be passed verbatim to the output buffer. This was because
the `active_char` was _only_ set when the `emphasis`, `double_emphasis`, or
`triple_emphasis` handlers were registered. As a result, no active char
was found in the input buffer, so everything was passed through
verbatim.
This patch fixes this by also registering the `active_char` if the
underline handler is registered. I also added a simple regression test.
I personally don't use this extension, but I encountered this bug over
the course of writing bindings for Rust.
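
A minimal sketch of the idea, using simplified stand-ins: the
`callbacks` struct, `register_emphasis_chars`, and the `CHAR_*` values
are hypothetical names for illustration, not the library's actual
definitions. The only point is that the underline handler has to be part
of the condition that marks `*` and `_` as active.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the renderer callbacks and the
 * parser's active-character table. */
typedef int (*span_cb)(void);

struct callbacks {
    span_cb emphasis, double_emphasis, triple_emphasis, underline;
};

enum { CHAR_NONE = 0, CHAR_EMPHASIS = 1 };

/* Placeholder handler used only to mark a callback as registered. */
static int noop_span(void) { return 1; }

static void
register_emphasis_chars(unsigned char active[256], const struct callbacks *cb)
{
    assert(cb != NULL);

    /* Including `underline` here is the fix: without it, a renderer that
     * registers only the underline handler leaves '_' inactive, and the
     * span is copied through verbatim. */
    if (cb->emphasis || cb->double_emphasis ||
        cb->triple_emphasis || cb->underline) {
        active['*'] = CHAR_EMPHASIS;
        active['_'] = CHAR_EMPHASIS;
    }
}
```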
The loop performs two jobs: finding the first tab stop and counting the
number of characters before it. To count the characters before the
tab stop, it counts all bytes that are not UTF-8 continuation bytes.
The current form of the loop doesn't check the first character, but
does check the character one past the end of the range. Since these are
both usually non-continuation characters, it does the right thing by
accident. However, it reads the byte at index `size`, which is out of
bounds and might be uninitialized for strings that are not
null-terminated.
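
A bounds-safe version of such a loop might look like the sketch below.
`scan_to_tab` is a hypothetical helper written for illustration, not the
code from this patch; it checks `i < size` before every read, so index
`size` is never touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scans data[0..size) for the first tab. Returns
 * the byte offset of the tab (or size if there is none) and stores the
 * number of characters seen before it in *chars. */
static size_t
scan_to_tab(const uint8_t *data, size_t size, size_t *chars)
{
    size_t i = 0, count = 0;

    assert(data != NULL || size == 0);
    assert(chars != NULL);

    /* The bound is tested first, so data[size] is never read. */
    while (i < size && data[i] != '\t') {
        /* Continuation bytes have the form 10xxxxxx; any other byte
         * starts a character. */
        if ((data[i] & 0xC0) != 0x80)
            count++;
        i++;
    }

    *chars = count;
    return i;
}
```

For the five bytes `a`, `0xC3`, `0xA9`, tab, `b`, this returns offset 3
and counts two characters, because `0xA9` is a continuation byte.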
Before this patch, a header like "# *A*" was displayed as
"<li>&lt;em&gt;A&lt;/em&gt;</li>" in the TOC. The error was caused by
`toc_header` doing the HTML escaping. In the normal HTML renderer, the
escaping is done by the `normal_text` hook. This patch uses the same
handling to fix the issue.
The original use of the renderer was for benchmarking,
to measure the time spent exclusively on parsing. But setting
the callbacks to NULL actually disabled parsing. No-op callbacks
should be used instead.
Previously, expand_tabs() counted every *byte* rather than every *character*
to determine the proper tab stops. This change fixes that by skipping over
UTF-8 continuation bytes (bytes of the form 10xxxxxx), which are the only
kind of byte in valid UTF-8 that does not start a character.
This code does not skip over combining characters, so each of those is
still counted as a character of its own.
Only one parameter (`content`) can have rendered output.
The `table` callback contradicts this rule, so it has to be split
into `table_header` and `table_body`.
This also helps keep the HTML renderer code DRY and readable.