id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
2138	file formatting conventions for text files in our source repo	zooko	daira	"This makes it so that emacs knows the intended character encoding, BOM, end-of-line markers, standard line-width, and tabs-vs-spaces policy for these files.

This is also a form of documentation. It means that you should put only utf-8-encoded things into text files, only utf-8-encoded things into source code files (and actually you should put only ASCII-encoded things there, except possibly in comments or docstrings!), and that you should line-wrap everything at 77 columns.

It also specifies that text files should start with a ""utf-8 BOM"". (Brian questions the point of this, and my answer is that it adds information and doesn't hurt. Whether that information will ever be useful is an open question.)
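
For readers unsure what the BOM looks like on disk, here is a minimal sketch (Python 3, unlike the Python 2 script appended below). The helper name `has_utf8_bom` and the `sample` bytes are made up for illustration and are not part of the patch.

```python
import codecs

# The utf-8 BOM is U+FEFF encoded as utf-8: the three bytes EF BB BF.
BOM = codecs.BOM_UTF8

def has_utf8_bom(data):
    '''Return True if the byte string starts with the utf-8 BOM.'''
    return data.startswith(BOM)

# Hypothetical file contents: a BOM followed by one line of text.
sample = BOM + 'hello\n'.encode('utf-8')

# The utf-8-sig codec strips a leading BOM when decoding;
# the plain utf-8 codec keeps it as the character U+FEFF.
decoded_sig = sample.decode('utf-8-sig')
decoded_plain = sample.decode('utf-8')
```

A reader that decodes with plain utf-8 sees one extra U+FEFF character at the start of the file, which is why the script below strips it from the first line before rewriting.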

It also specifies that text files should have unix-style end-of-line markers (i.e. '\n'), not windows-style or old-macos-style.

For Python source code files, it also specifies that you should not insert tab characters (so you should use spaces for Python block structure).
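
The conventions above could be checked mechanically. The following is only a sketch under stated assumptions (Python 3; the function name `check_conventions`, the constant `MAX_WIDTH`, and the message strings are hypothetical and not part of the patch). It counts code points as columns, which is only an approximation of what emacs displays.

```python
import codecs

MAX_WIDTH = 77  # the fill-column named in this ticket

def check_conventions(data, is_python=False):
    '''Return a list of problems found in the raw bytes of one file.'''
    problems = []
    if not data.startswith(codecs.BOM_UTF8):
        problems.append('missing utf-8 BOM')
    try:
        # utf-8-sig drops the BOM if it is there and accepts its absence.
        text = data.decode('utf-8-sig')
    except UnicodeDecodeError:
        problems.append('not valid utf-8')
        return problems
    if '\r' in text:
        problems.append('non-unix end-of-line marker')
    for lineno, line in enumerate(text.split('\n'), 1):
        # Do not count a stray carriage return as a column.
        if len(line.rstrip('\r')) > MAX_WIDTH:
            problems.append('line %d is wider than %d columns'
                            % (lineno, MAX_WIDTH))
        if is_python and '\t' in line:
            problems.append('line %d contains a tab character' % lineno)
    return problems
```

For example, `check_conventions(open(path, 'rb').read(), is_python=path.endswith('.py'))` returns an empty list for a conforming file; the file must be opened in binary mode so that end-of-line markers are not translated before the check sees them.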

I generated this patch by writing and running the following script, and then reading the resulting diff to make sure it was correct. I then undid the changes that the script had done to the files inside the ""setuptools-0.6c16dev4.egg"" directory before committing the patch.

------- begin appended script::
{{{
#!/usr/bin/env python
# -*- coding: utf-8-with-signature-unix; fill-column: 77 -*-

import os

magic_header_line_comment_prefix = {
    '.py': u""# "",
    '.rst': u"".. "",
    }

def format():
    for dirpath, dirnames, filenames in os.walk('.'):
        for filename in filenames:
            ext = os.path.splitext(filename)[-1]
            if ext in ('.py', '.rst'):
                fname = os.path.join(dirpath, filename)
                info = open(fname, 'rU')
                formattedlines = [ line.decode('utf-8') for line in info ]
                info.close()

                if len(formattedlines) == 0:
                    continue

                # binary mode, so that '\n' is never translated to '\r\n' on Windows
                outfo = open(fname, 'wb')
                outfo.write(u""\ufeff"".encode('utf-8'))

                commentsign = magic_header_line_comment_prefix[ext]

                firstline = formattedlines.pop(0)
                while firstline.startswith(u""\ufeff""):
                    firstline = firstline[len(u""\ufeff""):]
                if firstline.startswith(u""#!""):
                    outfo.write(firstline.encode('utf-8'))
                    outfo.write((commentsign+u""-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n"").encode('utf-8'))
                    if ext == '.py':
                        outfo.write((commentsign+u""-*- indent-tabs-mode: nil -*-\n"").encode('utf-8'))
                else:
                    outfo.write((commentsign+u""-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n"").encode('utf-8'))
                    if ext == '.py':
                        outfo.write((commentsign+u""-*- indent-tabs-mode: nil -*-\n"").encode('utf-8'))
                    if (firstline.strip().startswith(commentsign)) and (""-*-"" in firstline) and (""coding:"" in firstline):
                        print ""warning there was already a coding line %r in %r""  % (firstline, fname)
                    else:
                        outfo.write(firstline.encode('utf-8'))

                for l in formattedlines:
                    if (l.strip().startswith(commentsign)) and (""-*-"" in l) and (""coding:"" in l):
                        print ""warning there was already a coding line %r in %r""  % (l, fname)
                    else:
                        outfo.write(l.encode('utf-8'))
                outfo.close()

if __name__ == '__main__':
    format()
}}}"	enhancement	new	normal	undecided	unknown	1.10.0