The loop performs two jobs: finding the first tabstop and counting the number
of characters before it. To count the number of characters before the
tabstop, it counts all bytes that are not UTF-8 continuation bytes.
The current form of the loop doesn't check the first byte, but
checks the byte just past the range's end. Since these are both
usually non-continuation bytes, it does the right thing accidentally.
However, it accesses the character range at index `size`, which is
out of bounds and might be uninitialized for strings that are not
null-terminated.
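
A minimal sketch of a loop that does both jobs without reading index `size` might look like the following. The name `scan_to_tab` and its signature are hypothetical and chosen only for illustration; this is not the library's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only).
 * Scans line[0..size) for the first tab character. Returns the index of
 * the tab, or `size` if there is none, and stores in *chars the number of
 * UTF-8 characters that precede it. Never reads line[size]. */
static size_t
scan_to_tab(const uint8_t *line, size_t size, size_t *chars)
{
    size_t i = 0, count = 0;

    /* The bounds check comes first, so line[i] is only read when i < size. */
    while (i < size && line[i] != '\t') {
        /* Count every byte that is not a continuation byte (10xxxxxx). */
        if ((line[i] & 0xC0) != 0x80)
            count++;
        i++;
    }

    *chars = count;
    return i;
}
```

Because every byte in the range is inspected, including the first, the count no longer depends on what happens to lie past the end of the buffer.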
Before this patch, a header like "# *A*" was displayed as
"<li><em>A</em></li>" in the TOC. The error was caused by
toc_header doing the HTML escaping. In the normal HTML renderer, the escaping
is done by the normal_text hook. This patch uses the same handling to
fix the issue.
Previously, expand_tabs() counted every *byte* rather than every *character*
to determine the proper tab stops. This change fixes that by skipping over
UTF-8 continuation bytes (bytes of the form 10xxxxxx), which are the only
types of bytes in valid UTF-8 that do not start a character.
This code does not skip over combining characters.
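
The idea can be sketched roughly as follows. The function name `expand_tabs_sketch`, its buffer-based signature, and the tab width of 4 are assumptions made for this example; the real expand_tabs() may differ in all of these.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (illustration only), assuming a tab width of 4.
 * Copies in[0..size) to out, replacing each tab with spaces up to the next
 * tab stop. Columns are counted per UTF-8 character, not per byte.
 * Writes at most out_cap bytes and returns the number of bytes the full
 * expansion needs. */
static size_t
expand_tabs_sketch(char *out, size_t out_cap, const char *in, size_t size)
{
    size_t col = 0, n = 0;

    for (size_t i = 0; i < size; i++) {
        unsigned char c = (unsigned char)in[i];

        if (c == '\t') {
            /* Emit at least one space, then pad to a multiple of 4. */
            do {
                if (n < out_cap)
                    out[n] = ' ';
                n++;
                col++;
            } while (col % 4 != 0);
        } else {
            if (n < out_cap)
                out[n] = (char)c;
            n++;
            /* Continuation bytes (10xxxxxx) do not start a character,
             * so they do not advance the column. */
            if ((c & 0xC0) != 0x80)
                col++;
        }
    }

    return n;
}
```

For the input "é\tb" (bytes C3 A9 09 62), the "é" occupies one column, so the tab expands to three spaces; counting bytes instead would yield only two. A base letter followed by a combining mark still counts as two columns in this sketch.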
Only one parameter (`content`) can have rendered output.
The `table` callback contradicts this rule, so it has to be split
into `table_header` and `table_body`.
This also helps keep the HTML renderer code DRY and readable.
As long as there's a valid matching ] at the end, and there's no
codespan inside the link, things should render exactly the same
as before. Please review this.
This fixes bugs for edge cases of the pattern:
opening backtick ending [any] backtick [any] ending
Examples: *`**`* and __`__`__
Range handling in 48a19b3 and e7bcbe3 is also changed to match the
correct behavior.