8000 Escape terminator characters in filter arguments by depau · Pull Request #25 · kkroening/ffmpeg-python · GitHub
[go: up one dir, main page]

Skip to content

Escape terminator characters in filter arguments #25

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 12, 2017

Conversation

depau
Copy link
Collaborator
@depau depau commented Jul 10, 2017

Self explanatory. I lost an hour trying to understand why it wouldn't trim videos correctly when I specified the timestamps as HH:MM:SS.fraction. Damnit.

ffmpeg/nodes.py Outdated
for k in keys:
if not isinstance(seq[k], basestring):
continue
for ch in "[]=;:,":
Copy link
Owner
@kkroening kkroening Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also escape \ chars, e.g. \ => \\\\\\\\ (see comment above)

@kkroening
Copy link
Owner

Cool - I was wondering when we would run into this. It's probably going to need some additional tweaking as more edge cases are uncovered but it's a good start.

Gotta figure out why tests are failing though. I'll take a look

ffmpeg/nodes.py Outdated
if not isinstance(seq[k], basestring):
continue
for ch in "[]=;:,":
seq[k] = seq[k].replace(ch, "\\\\"+ch)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "\\"+ch?

ffmpeg/nodes.py Outdated
kwarg_params = ['{}={}'.format(k, self._kwargs[k]) for k in sorted(self._kwargs)]

# Helper function to escape uncomfortable characters
def escape_chars(seq, keys):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pull this out to a global function we can put a test around it:

def _escape_chars(text):
    """Helper function to escape uncomfortable characters."""
    for ch in '\\[]=;:,':
        text = text.replace(ch, '\\' + ch)
    return text

# in test_ffmpeg.py:
def test_escape_chars():
    assert ffmpeg.nodes._escape_chars('a[b]c=d;e,\\f') == 'a\\[b\\]c\\=d\\;e\\,\\\\f'

Then FilterNode._get_filter becomes..

def _get_filter(self):
    params_text = self._name
    _args = [_escape_chars(str(x)) for x in self._args]
    _kwargs = {k: _escape_chars(str(v)) for k, v in self._kwargs.items()}
    ...

Copy link
Owner
@kkroening kkroening Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I ran a few scenarios against ffmpeg using drawtext, and ffmpeg's string unescaping is super bizarre. Some chars only need a single backslash before them, some require two backslashes, and some require eight backslashes.

Here's an example that draws the string a\b:

ffmpeg -i ffmpeg/tests/sample_data/dummy.mp4 -filter_complex '[0]drawtext=x=0:y=100:text=a\\\\\\\\b:fontfile=/Library/Fonts/Arial.ttf[1]' -map '[1]' out.mp4

Yup, eight backslashes.

I found this stackoverflow article talking about it:

Special character escapes are like violence: if they're not solving your problem, you're not using enough.
ffmpeg -i test.mpg -vf drawtext=text="It\\\\\'s so easy"

Ahaha.

Anyways I'm gonna try to see if there's any method to the madness and whether there's a way to completely correctly escape. It's nice to have this done in a tool like ffmpeg-python so that hopefully nobody else has to deal with ffmpeg's insanity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG this is crazy. In fact I did notice that I only needed one backslash when running it on the shell, and two with python, which resulted in 4 backslashes.

I'm currently running my code in my testing branch and I'm keeping it for the moment as it works for timestamps (which are basically the only thing that needs escaping in our project, though we could avoid it using SS.fraction only).

Let me know if I should commit the change you proposed or if you're going to do it some other way.

The text is being parsed a couple times, so you not only have to escape the quote, you also have to escape the slash escaping the quote. Twice.

If what the StackOverflow article says is right, we should do a few real-world tests with ffmpeg, find out how many backslashes are required for a particular filter or character, then make an intelligent escaper out of these numbers. The escaper would escape characters in trim arguments with r'\\', text argument of drawtext with r'\\\, etc., or fallback to some default.

Copy link
Owner
@kkroening kkroening Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my science results:

`'`: 7 backslashes in drawtext text param, 3 in fontfile; 3 in param names; 1 in filter name
`\`: 7 backslashes in drawtext text param, 3 in fontfile; 3 in param names; 1 in filter name
`%`: 4 backslashes in drawtext text param; unknown elsewhere
`:`: 2 backslashes
`,`: 1 backslash
`[`: 1 backslash
`]`: 1 backslash
`=`: 1 backslash
newline: 0 backslash; use raw newline char

I was thinking something like this:

args = [escape_chars(str(x, '=:')) for x in self._args]
kwargs = {}
for k, v in self._kwargs.items():
    k = escape_chars(k, '\\\'=:')
    v = escape_chars(str(v), '\\\'=:')
    kwargs[k] = v

arg_params = ['{}'.format(arg) for arg in args]
kwarg_params = ['{}={}'.format(k, kwargs[k]) for k in sorted(kwargs)]
params = ':'.join(arg_params + kwarg_params)

params_text = self._name
if params:
    params_text += '={}'.format(':'.join(params))
return escape_chars(params_text, '\\\'[],;')

And then another escape_chars for rendering the text param of drawtext. I think this produces similar results to what's above. But need more science

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's any way to encode unicode chars in ffmpeg args. You'd think there would be, since people might want to render Chinese text over a video

@kkroening
Copy link
Owner
kkroening commented Jul 11, 2017

Alright so as mentioned above, there are a lot of weird ffmpeg string-escaping quirks. I tried a bunch of scenarios to try to figure out how ffmpeg unescapes strings and still don't fully understand it. Looks like there are multiple layers of escaping that need to happen. I think I'm close to figuring out the quirks but have to wrap up for now and try to get to it tomorrow.

But overall the PR looks mostly good as a starting point aside from failing tests.

@kkroening
Copy link
Owner

Ok I updated the string escaping to be pretty robust.

I added the drawtext filter since its text arg has slightly more complicated string escaping that other filter args.

The string escaping actually makes sense to me now. The reason why there are so many backslashes is that \ and ' characters are escape/unescaped numerous times as the command-line is constructed/destructed. It's actually intentional by the ffmpeg designers so that tools like ffmpeg-python can programmatically escape things in a well-defined manner. It's a pain to do by hand, but libraries like this can automate it and make it a non-issue.

The tests are table-driven to make sure all of the characters are escaped as desired, both in the drawtext text arg and normal args. This is nearly a direct translation of the science table I posted earlier with the character -> expected backslash count:

    expected_backslash_counts = {
        'x': 0,
        '\'': 7,
        '\\': 7,
        '%': 4,
        ':': 2,
        ',': 1,
        '[': 1,
        ']': 1,
        '=': 2,
        '\n': 0,
    }
    for ch, expected_backslash_count in expected_backslash_counts.items():
        expected = '{}{}'.format('\\' * expected_backslash_count, ch)
        actual = _get_drawtext_text_repr(ch)
        assert expected == actual

So basically with these tests, we should be pretty confident that we escape all things correctly in all circumstances. And if there are any extra edge cases, we can just add it to the above test table and then see the tests pass once we fix the code.

@kkroening
Copy link
Owner
kkroening commented Jul 12, 2017

Here it is in action:

(ffmpeg
    .input('in.mp4')
    .drawtext(
        u'some quotes: \'yay!\'\na newline\nand more stuff :,\\\'\nthe end',
        x=0,
        y=100,
        fontfile='LiberationSans.ttf', box=1
    )
    .output('out.mp4')
    .overwrite_output()
    .run()
)

Args:

['-i',
 'in.mp4',
 '-filter_complex',
 "[0]drawtext=box=1:fontfile=LiberationSans.ttf:text=some quotes\\\\: \\\\\\\\\\\\\\'yay!\\\\\\\\\\\\\\'\na newline\nand more stuff \\\\:\\,\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'\nthe end:y=100[v0]",
 '-map',
 '[v0]',
 'out.mp4']

Output:
out0001

Unicode characters render as squares though - haven't been able to figure that out. But can save that for another time - at least those chars don't cause problems other than rendering inappropriately.

EDIT: oh actually it does work with unicode, aside from a really obscure one I tested.

@kkroening
Copy link
Owner

I'm gonna merge this so I can pull it into the other branches. Let me know if you find any issues with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0