Opened at 2009-04-17T13:32:35Z
Last modified at 2014-12-02T19:41:19Z
#682 assigned defect
FTP frontend should support Unicode filenames encoded as UTF-8
| Reported by: | arthur | Owned by: | francois |
|---|---|---|---|
| Priority: | major | Milestone: | soon |
| Component: | code-frontend-ftp-sftp | Version: | 1.3.0 |
| Keywords: | i18n unicode ftpd names twisted | Cc: | amontero@… |
| Launchpad Bug: | | | |
Description (last modified by amontero)
Using ncftp to put a file whose name contains an accented é, I get the following message:
[Requested action not taken: internal server error]
In the server-side logs:
2009-04-17 15:22:07+0200 [ProtocolWrapper,3,127.0.0.1] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/twisted/internet/tcp.py", line 362, in doRead
    return self.protocol.dataReceived(data)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/policies.py", line 72, in dataReceived
    self.wrappedProtocol.dataReceived(data)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/basic.py", line 231, in dataReceived
    why = self.lineReceived(line)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 698, in lineReceived
    d = defer.maybeDeferred(self.processCommand, cmd, *args)
--- <exception caught here> ---
  File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 106, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 729, in processCommand
    return method(*params)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 1079, in ftp_STOR
    d = self.shell.openForWriting(newsegs)
  File "/usr/lib/python2.5/site-packages/allmydata/frontends/ftpd.py", line 255, in openForWriting
    path = [unicode(p) for p in path]
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 21: ordinal not in range(128)
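The last frame points at the cause: in Python 2, calling unicode(p) on a byte string decodes it with the default ASCII codec, so any byte at or above 0x80 raises. A minimal Python 3 sketch of the same failure, using an explicit ASCII decode; the path segment is a made-up example, not the file name from the report:

```python
# Python 3 sketch. In Python 2, unicode(p) on a byte string applied the
# ASCII codec implicitly; here the codec is named explicitly.
segment = b"caf\xe0"  # hypothetical path segment containing the byte 0xe0

try:
    segment.decode("ascii")
    error = None
except UnicodeDecodeError as e:
    error = e

# Prints the same kind of message as the traceback above.
print(error)
```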
Change History (26)
comment:1 Changed at 2009-04-23T10:05:58Z by francois
- Milestone changed from undecided to 1.5.0
- Owner changed from nobody to francois
comment:2 Changed at 2009-04-23T10:06:02Z by francois
- Status changed from new to assigned
comment:3 Changed at 2009-06-30T17:16:35Z by zooko
- Milestone changed from 1.5.0 to eventually
comment:4 Changed at 2009-07-11T11:28:04Z by warner
- Component changed from unknown to code-frontend
- Description modified (diff)
reformatted description slightly
comment:5 Changed at 2009-11-23T02:37:56Z by davidsarah
- Keywords i18n unicode added
comment:6 Changed at 2010-01-15T02:39:15Z by davidsarah
- Keywords ftp added
comment:7 follow-up: ↓ 9 Changed at 2010-01-15T02:41:01Z by davidsarah
See RFC 2640 for FTP internationalization.
comment:8 Changed at 2010-02-07T16:43:48Z by davidsarah
- Keywords ftpd added; ftp removed
comment:9 in reply to: ↑ 7 ; follow-up: ↓ 16 Changed at 2010-06-15T23:41:39Z by davidsarah
Replying to davidsarah:
See RFC 2640 for FTP internationalization.
Summary:
- include UTF8 in the response to a FEAT request;
- use UTF-8;
- reject filenames that are not valid UTF-8.
Admirably simple :-)
(See also #1076 about normalization, but that will probably be done in the dirnode interface rather than in frontends.)
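The second and third points of that summary could look roughly like the following. This is only a sketch in Python 3, not the ftpd.py code; decode_path is a hypothetical helper name, and raising ValueError is an assumption about how a rejection would be signalled.

```python
def decode_path(segments):
    """Decode raw path segments as UTF-8, rejecting anything invalid."""
    decoded = []
    for raw in segments:
        try:
            # "strict" makes invalid UTF-8 raise instead of being replaced.
            decoded.append(raw.decode("utf-8", "strict"))
        except UnicodeDecodeError:
            raise ValueError("filename is not valid UTF-8: %r" % (raw,))
    return decoded


# A valid UTF-8 name decodes; a lone 0xe0 byte is rejected.
print(decode_path([b"docs", b"caf\xc3\xa9.txt"]))
```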
comment:10 Changed at 2010-06-15T23:50:30Z by davidsarah
Hmm, judging by the exception message ("'ascii' codec can't decode byte 0xe0"), ncftp was trying to use ISO-Latin-1 rather than UTF-8. But at least it would be possible for clients to do the right thing, so I still think we should implement RFC 2640.
comment:11 Changed at 2010-06-15T23:57:51Z by davidsarah
Actually 'é' is 0xE9 in ISO-Latin-1, so I don't know what encoding this was (but not UTF-8).
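For reference, a short Python 3 check of what 'é' looks like on the wire in the encodings mentioned so far, including the decomposed (NFD) form relevant to #1076. None of them contains the byte 0xE0, and a lone 0xE0 is not valid UTF-8.

```python
import unicodedata

composed = "\u00e9"                                  # é as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + combining acute accent

latin1_bytes = composed.encode("latin-1")  # b'\xe9'
utf8_nfc = composed.encode("utf-8")        # b'\xc3\xa9'
utf8_nfd = decomposed.encode("utf-8")      # b'e\xcc\x81'

for name, data in [("latin-1", latin1_bytes),
                   ("utf-8 (NFC)", utf8_nfc),
                   ("utf-8 (NFD)", utf8_nfd)]:
    print(name, data, b"\xe0" in data)
```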
comment:12 Changed at 2010-06-15T23:59:14Z by davidsarah
- Description modified (diff)
comment:13 Changed at 2010-06-16T00:04:51Z by davidsarah
- Summary changed from FTP frontend refuses accents to FTP frontend should support Unicode filenames
comment:14 Changed at 2010-06-16T16:45:50Z by zooko
With the new improved pyutil-1.7.9 you get this handy-dandy script called "try_decoding":
HACL:~/playground/pyutil/bothw$ python -c 'open("d","wb").write(chr(0xe0))'
HACL:~/playground/pyutil/bothw$ try_decoding d -t é
HACL:~/playground/pyutil/bothw$
Oh hey there are no encodings known to Python 2.6.1 which would decode 0xe0 to é!
Here are all the things that all the encodings would decode 0xe0 to:
HACL Zooko-Ofsimplegeos-MacBook-Pro:~/playground/pyutil/bothw$ try_decoding d
charmap : à
cp037 : \
cp1006 : ﻓ
cp1026 : ü
cp1140 : \
cp1250 : ŕ
cp1251 : а
cp1252 : à
cp1253 : ΰ
cp1254 : à
cp1255 : א
cp1256 : à
cp1257 : ą
cp1258 : à
cp424 : \
cp437 : α
cp500 : \
cp737 : ω
cp775 : Ó
cp850 : Ó
cp852 : Ó
cp855 : Я
cp857 : Ó
cp860 : α
cp861 : α
cp862 : α
cp863 : α
cp864 : ـ
cp865 : α
cp866 : р
cp869 : ζ
cp874 : เ
cp875 : \
hp_roman8 : Á
iso8859_1 : à
iso8859_10 : ā
iso8859_11 : เ
iso8859_13 : ą
iso8859_14 : à
iso8859_15 : à
iso8859_16 : à
iso8859_2 : ŕ
iso8859_3 : à
iso8859_4 : ā
iso8859_5 : р
iso8859_6 : ـ
iso8859_7 : ΰ
iso8859_8 : א
iso8859_9 : à
koi8_r : Ю
koi8_u : Ю
latin_1 : à
mac_arabic : ـ
mac_centeuro : ŗ
mac_croatian : –
mac_cyrillic : а
mac_farsi : ـ
mac_greek : ύ
mac_iceland : ý
mac_latin2 : ŗ
mac_roman : ‡
mac_romanian : ‡
mac_turkish : ‡
palmos : à
ptcp154 : а
raw_unicode_escape : à
rot_13 : à
tis_620 : เ
unicode_escape : à
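A rough Python 3 approximation of that kind of search, for anyone without pyutil installed. This is not the try_decoding implementation; try_all_decodings is a hypothetical name, and the set of codecs depends on the Python version, so the listing will not match the one above exactly.

```python
import encodings.aliases


def try_all_decodings(data):
    """Return {codec_name: decoded_text} for every codec that accepts data."""
    results = {}
    for codec in sorted(set(encodings.aliases.aliases.values())):
        try:
            results[codec] = data.decode(codec)
        except (UnicodeError, LookupError):
            # Invalid for this codec, codec missing on this platform,
            # or not a text encoding at all (e.g. rot_13 in Python 3).
            continue
    return results


results = try_all_decodings(b"\xe0")
# Codecs (if any) that would turn the byte 0xe0 into é.
matches = [codec for codec, text in results.items() if text == "\u00e9"]
print(matches)
```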
comment:15 Changed at 2010-06-17T21:56:18Z by davidsarah
#1089 discusses the use of non-UTF-8 encodings by FTP and SFTP clients.
comment:16 in reply to: ↑ 9 Changed at 2010-06-21T01:30:51Z by davidsarah
Replying to davidsarah:
Summary:
- include UTF8 in the response to a FEAT request; [...]
Twisted's FTP implementation does not currently implement FEAT. However it is implemented in such a way that it's relatively easy to monkey-patch it to do so, and no more ugly than monkey-patching always is. Something like (untested):
def ftp_FEAT(self, arg=None):
    if not (hasattr(self, 'shell') and hasattr(self.shell, 'feat') and
            hasattr(self, 'sendLine')):
        log.msg("Assumption needed to monkey-patch FEAT support in Twisted "
                "does not hold", level=log.WEIRD)
        return defer.fail(ftp.CmdNotImplementedError('FEAT'))
    if arg is not None:
        return defer.fail(ftp.CmdSyntaxError('FEAT does not take any argument'))
    d = defer.maybeDeferred(self.shell.feat)
    def _reply(features):
        self.sendLine('211- Featuretastic!')
        for f in features:
            self.sendLine(' ' + f)
        return ftp.SYS_STATUS_OR_HELP_REPLY
    d.addCallback(_reply)
    return d

if not hasattr(ftp.FTP, 'ftp_FEAT'):
    ftp.FTP.ftp_FEAT = ftp_FEAT

class Handler...
    def feat(self):
        if self.encoding_is_utf8():
            return ['UTF8']
        else:
            return []
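Independent of Twisted, the framing of a FEAT reply as described in RFC 2389 (a "211-" opening line, one feature per line with a leading space, then a final "211" line) can be sketched as below. format_feat_reply is a hypothetical helper, and the descriptive text on the first and last lines is arbitrary.

```python
def format_feat_reply(features):
    """Build the lines of a multi-line 211 FEAT reply."""
    lines = ["211-Features:"]
    # Each feature line starts with a single space.
    lines.extend(" " + feature for feature in features)
    lines.append("211 End")
    return lines


for line in format_feat_reply(["UTF8"]):
    print(line)
```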
comment:17 Changed at 2010-06-21T01:52:26Z by davidsarah
- Milestone changed from eventually to soon
comment:18 Changed at 2010-06-21T03:14:26Z by davidsarah
- Keywords names added
comment:19 Changed at 2010-06-21T21:17:32Z by zooko
I opened http://twistedmatrix.com/trac/ticket/4515 (support the FTP FEAT request).
comment:20 Changed at 2010-06-21T21:17:43Z by zooko
- Keywords twisted added
comment:21 Changed at 2011-02-02T23:50:46Z by davidsarah
- Summary changed from FTP frontend should support Unicode filenames to FTP frontend should support Unicode filenames encoded as UTF-8
comment:22 Changed at 2012-12-28T06:32:01Z by zooko
Twisted #4515 has been closed.
comment:23 Changed at 2012-12-28T23:52:45Z by davidsarah
Unfortunately the fix for that ticket isn't sufficient, because
adiroiban wrote in http://twistedmatrix.com/trac/ticket/4515#comment:13:
I don't plan to add IFTPShell.FEATURES in this patch since without UTF-8 support there will be nothing to export. Beside UTF-8 all other features (SIZE, MDTM, ect) are tied to the protocol.FTP implementation.
With this change, it is possible to declare support for UTF-8 by monkeypatching twisted.protocols.ftp.FTP.FEATURES, but that depends on an implementation detail, which is what we were trying to avoid. (Granted, it's a slightly less ugly monkeypatch.)
I don't know why adiroiban ignored me when I pointed out that the goal of that ticket could be achieved in a simpler way that would have been sufficient. Maybe I should have argued the case more strenuously.
comment:24 Changed at 2012-12-29T00:01:34Z by davidsarah
Sigh, and it doesn't have a conformant implementation of OPTS:
def ftp_OPTS(self, option):
    """
    Handle OPTS command.
    http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00
    """
    return self.reply(OPTS_NOT_IMPLEMENTED, option)
http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00 says:
2. UTF-8 Option
The user issues the OPTS UTF-8 command to indicate its willingness to send and receive UTF-8 encoded pathnames over the control connection. Prior to sending this command, the user should not transmit UTF-8 encoded pathnames.
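The per-connection state that this draft implies could be tracked roughly as follows. This is a sketch, not Twisted's API: PathnameEncodingState and handle_opts are hypothetical names, and the reply codes (200 on success, 501 for an unrecognised option) are an assumption based on RFC 2389 that should be checked before relying on it.

```python
class PathnameEncodingState:
    """Tracks whether the client has opted in to UTF-8 pathnames."""

    def __init__(self):
        # Per the draft, UTF-8 pathnames are not expected before OPTS UTF-8.
        self.utf8_enabled = False

    def handle_opts(self, option):
        """Return a reply code for an OPTS command argument (assumed codes)."""
        if option.strip().upper() == "UTF-8":
            self.utf8_enabled = True
            return 200
        return 501


state = PathnameEncodingState()
print(state.utf8_enabled, state.handle_opts("UTF-8"), state.utf8_enabled)
```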
comment:25 Changed at 2013-07-27T12:56:12Z by amontero
- Cc amontero@… added
- Description modified (diff)
comment:26 Changed at 2014-12-02T19:41:19Z by warner
- Component changed from code-frontend to code-frontend-ftp-sftp

This is definitely the same sort of encoding issue as in #534. I'll try to have a look at it.