-
Notifications
You must be signed in to change notification settings - Fork 919
Escape terminator characters in filter arguments #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ffmpeg/nodes.py
Outdated
for k in keys: | ||
if not isinstance(seq[k], basestring): | ||
continue | ||
for ch in "[]=;:,": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also escape \
chars, e.g. \
=> \\\\\\\\
(see comment above)
Cool - I was wondering when we would run into this. It's probably going to need some additional tweaking as more edge cases are uncovered but it's a good start. Gotta figure out why tests are failing though. I'll take a look |
ffmpeg/nodes.py
Outdated
if not isinstance(seq[k], basestring): | ||
continue | ||
for ch in "[]=;:,": | ||
seq[k] = seq[k].replace(ch, "\\\\"+ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be "\\"+ch
?
ffmpeg/nodes.py
Outdated
kwarg_params = ['{}={}'.format(k, self._kwargs[k]) for k in sorted(self._kwargs)] | ||
|
||
# Helper function to escape uncomfortable characters | ||
def escape_chars(seq, keys): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pull this out to a global function we can put a test around it:
def _escape_chars(text):
"""Helper function to escape uncomfortable characters."""
for ch in '\\[]=;:,':
text = text.replace(ch, '\\' + ch)
return text
# in test_ffmpeg.py:
def test_escape_chars():
assert ffmpeg.nodes._escape_chars('a[b]c=d;e,\\f') == 'a\\[b\\]c\\=d\\;e\\,\\\\f'
Then FilterNode._get_filter
becomes..
def _get_filter(self):
params_text = self._name
_args = [_escape_chars(str(x)) for x in self._args]
_kwargs = {k: _escape_chars(str(v)) for k, v in self._kwargs.items()}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I ran a few scenarios against ffmpeg
using drawtext
, and ffmpeg
's string unescaping is super bizarre. Some chars only need a single backslash before them, some require two backslashes, and some require eight backslashes.
Here's an example that draws the string a\b
:
ffmpeg -i ffmpeg/tests/sample_data/dummy.mp4 -filter_complex '[0]drawtext=x=0:y=100:text=a\\\\\\\\b:fontfile=/Library/Fonts/Arial.ttf[1]' -map '[1]' out.mp4
Yup, eight backslashes.
I found this stackoverflow article talking about it:
Special character escapes are like violence: if they're not solving your problem, you're not using enough.
ffmpeg -i test.mpg -vf drawtext=text="It\\\\\'s so easy"
Ahaha.
Anyways I'm gonna try to see if there's any method to the madness and whether there's a way to completely correctly escape. It's nice to have this done in a tool like ffmpeg-python
so that hopefully nobody else has to deal with ffmpeg's insanity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OMG this is crazy. In fact I did notice that I only needed one backslash when running it on the shell, and two with python, which resulted in 4 backslashes.
I'm currently running my code in my testing branch and I'm keeping it for the moment as it works for timestamps (which are basically the only thing that needs escaping in our project, though we could avoid it using SS.fraction
only).
Let me know if I should commit the change you proposed or if you're going to do it some other way.
The text is being parsed a couple times, so you not only have to escape the quote, you also have to escape the slash escaping the quote. Twice.
If what the StackOverflow article says is right, we should do a few real-world tests with ffmpeg, find out how many backslashes are required for a particular filter or character, then make an intelligent escaper out of these numbers. The escaper would escape characters in trim
arguments with r'\\'
, text
argument of drawtext
with r'\\\
, etc., or fallback to some default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my science results:
`'`: 7 backslashes in drawtext text param, 3 in fontfile; 3 in param names; 1 in filter name
`\`: 7 backslashes in drawtext text param, 3 in fontfile; 3 in param names; 1 in filter name
`%`: 4 backslashes in drawtext text param; unknown elsewhere
`:`: 2 backslashes
`,`: 1 backslash
`[`: 1 backslash
`]`: 1 backslash
`=`: 1 backslash
newline: 0 backslash; use raw newline char
I was thinking something like this:
args = [escape_chars(str(x, '=:')) for x in self._args]
kwargs = {}
for k, v in self._kwargs.items():
k = escape_chars(k, '\\\'=:')
v = escape_chars(str(v), '\\\'=:')
kwargs[k] = v
arg_params = ['{}'.format(arg) for arg in args]
kwarg_params = ['{}={}'.format(k, kwargs[k]) for k in sorted(kwargs)]
params = ':'.join(arg_params + kwarg_params)
params_text = self._name
if params:
params_text += '={}'.format(':'.join(params))
return escape_chars(params_text, '\\\'[],;')
And then another escape_chars
for rendering the text
param of drawtext
. I think this produces similar results to what's above. But need more science
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's any way to encode unicode chars in ffmpeg args. You'd think there would be, since people might want to render Chinese text over a video
Alright so as mentioned above, there are a lot of weird ffmpeg string-escaping quirks. I tried a bunch of scenarios to try to figure out how ffmpeg unescapes strings and still don't fully understand it. Looks like there are multiple layers of escaping that need to happen. I think I'm close to figuring out the quirks but have to wrap up for now and try to get to it tomorrow. But overall the PR looks mostly good as a starting point aside from failing tests. |
Ok I updated the string escaping to be pretty robust. I added the The string escaping actually makes sense to me now. The reason why there are so many backslashes is that The tests are table-driven to make sure all of the characters are escaped as desired, both in the
So basically with these tests, we should be pretty confident that we escape all things correctly in all circumstances. And if there are any extra edge cases, we can just add it to the above test table and then see the tests pass once we fix the code. |
Here it is in action:
Args:
Unicode characters render as squares though - haven't been able to figure that out. But can save that for another time - at least those chars don't cause problems other than rendering inappropriately. EDIT: oh actually it does work with unicode, aside from a really obscure one I tested. |
I'm gonna merge this so I can pull it into the other branches. Let me know if you find any issues with it. |
Self explanatory. I lost an hour trying to understand why it wouldn't trim videos correctly when I specified the timestamps as
HH:MM:SS.fraction
. Damnit.