Version 4 (modified by itamarst, at 2020-07-08T15:20:15Z)

--

Porting to Python 3

This is still a proposal at this stage.

Motivation

  • Make code behave the same on Python 2 and Python 3, insofar as one can. For example, map() should behave the same on both (i.e. lazy, as on Python 3).
  • Reduce errors by relying on existing Python 2 behavior and tests as the reference, as well as on manual review.
  • Try to reduce grunt work.

How to choose a module to port

TBD. Probably core abstractions first, then the remaining modules in topological order of the dependency graph.
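As a rough illustration of the topological traversal idea, here is a minimal sketch using the standard library's graphlib (Python 3.9+). The import graph and the `pkg.*` module names are made up for the example, and `porting_order` is a hypothetical helper, not existing project tooling.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical import graph: each module maps to the modules it imports.
# These names are placeholders, not the project's real dependency graph.
IMPORTS = {
    "pkg.util.base32": set(),
    "pkg.util.hashing": {"pkg.util.base32"},
    "pkg.storage": {"pkg.util.hashing", "pkg.util.base32"},
    "pkg.client": {"pkg.storage"},
}


def porting_order(imports):
    """Return modules ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(imports).static_order())


order = porting_order(IMPORTS)
```

Porting in this order means that by the time a module is ported, everything it imports has already been ported.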

Assume for now we've picked a module.

The porting process, big picture

For a module M, there is also a corresponding module T, the unit tests for M. If the tests for M are embedded in a module that tests multiple modules, step one is to split off the tests so that there is a module T that only tests M.

Then:

  1. Update T to run on both Python 2 and Python 3 (see below for what that looks like).
  2. Run T's tests on Python 2. They should still pass! If they don’t, something broke.
  3. Port the code module M.
  4. Now run T's tests on Python 3.
  5. Fix any problems caught by the tests.
  6. Add both M and T to allmydata/util/_python3.py.
  7. TODO: Not yet possible, but once the ratchet infrastructure is in place, update the should-be-passing-on-Python-3 tests list to include the tests in T plus any other newly passing tests, so that future development doesn't regress Python 3 support.
  8. Submit for code review.
  9. Check the coverage report. If there are uncovered lines, see if you can add tests, or at least file a separate ticket for adding coverage.
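Step 6 amounts to recording the ported modules in one place. A minimal sketch of what such a registry could look like follows. The list names `PORTED_MODULES` and `PORTED_TEST_MODULES`, the helper `is_ported`, and the example module names are assumptions for illustration, not necessarily the actual contents of allmydata/util/_python3.py.

```python
# Hypothetical registry of ported modules (all names are illustrative only).
PORTED_MODULES = [
    "allmydata.util.example",        # the code module M
]

PORTED_TEST_MODULES = [
    "allmydata.test.test_example",   # the test module T
]


def is_ported(module_name):
    """Return True if the dotted module name is recorded as ported."""
    return module_name in PORTED_MODULES or module_name in PORTED_TEST_MODULES
```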

Porting a specific Python file

First, add explicit byte or unicode annotations for strings where needed.
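As a small illustration of what explicit annotations look like, the sketch below marks one constant as bytes and one as text. The constant names and the `frame` helper are hypothetical. The last part shows why the distinction matters: Python 3 refuses to concatenate bytes and text, whereas Python 2 would coerce silently.

```python
# Bytes: data that goes over the wire or onto disk.
HEADER = b"example-header\n"

# Text: human-readable strings.
GREETING = u"hello"


def frame(payload):
    """Prefix a bytes payload with the bytes header."""
    return HEADER + payload


framed = frame(b"data")

# On Python 3, mixing bytes and text raises TypeError instead of coercing.
try:
    HEADER + GREETING
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True
```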

Second, run futurize --write --both-stages --all-imports path/to/file.py.

Third, replace the from builtins import * variant, if any, with:

from future.utils import PY2
if PY2:
    from builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, int, list, object, range, str, max, min  # noqa: F401

When things get complicated

In practice, the methodology above is somewhat idealized: a sufficiently important module might have multiple test files, and might not be easily splittable.

This is where the test ratchet comes in. The test ratchet ensures that once a specific test is marked as passing in Python 3, it can't stop passing on Python 3. As a result, progress in porting need not involve a module being fully ported in one PR, or all tests being made to pass.
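The ratchet check itself is simple to state. The following is a minimal sketch under assumed inputs, not the project's actual ratchet infrastructure: `check_ratchet` and the test names are hypothetical, and test results are assumed to be available as a mapping from test name to pass/fail.

```python
def check_ratchet(expected_passing, results):
    """Compare one test run with the recorded list of passing tests.

    expected_passing: set of test names recorded as passing on Python 3.
    results: dict mapping test name to True (passed) or False (failed).
    A recorded test that is missing from results counts as a regression.
    Returns (regressions, newly_passing), both as sorted lists.
    """
    regressions = sorted(
        name for name in expected_passing if not results.get(name, False)
    )
    newly_passing = sorted(
        name for name, passed in results.items()
        if passed and name not in expected_passing
    )
    return regressions, newly_passing


# Hypothetical test names, for illustration only.
expected = {"test_example.Foo.test_a", "test_example.Foo.test_b"}
run = {
    "test_example.Foo.test_a": True,
    "test_example.Foo.test_b": False,  # was passing, now fails
    "test_example.Foo.test_c": True,   # passes but is not yet recorded
}
regressions, newly_passing = check_ratchet(expected, run)
```

A non-empty `regressions` list would fail the build, while `newly_passing` lists the tests that should be added to the recorded list in the same PR.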

With the ratchet in place, complex modules can be ported over multiple PRs by growing the list of passing tests in each PR, and only marking the module as fully ported in the final PR.

A note on the import block from the third step above: it adds builtins that match Python 3's semantics. The # noqa: F401 comment keeps flake8/pyflakes from complaining about unused imports. We import names that are not used yet so that people changing the code later don't have to manually check whether map() is old style or new style.

Fourth, manually review the code. futurize is nice, but it definitely doesn't catch everything, and it sometimes makes wrong decisions.

In particular:

  • map(), filter(), etc. are now lazy.
  • dict.keys() and friends now return a view of the underlying data, rather than a list with a copy.
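The two behaviors listed above are easy to trip over during review. This short sketch, run on Python 3, shows what each one looks like in practice; the variable names are only for illustration.

```python
numbers = [1, 2, 3]

# map() returns a one-shot iterator, not a list.
doubled = map(lambda n: n * 2, numbers)
first_pass = list(doubled)    # consumes the iterator
second_pass = list(doubled)   # empty: the iterator is already exhausted

# Code that indexes, calls len(), or iterates twice needs an explicit list.
doubled_list = list(map(lambda n: n * 2, numbers))

# dict.keys() returns a live view, so it reflects later changes to the dict.
d = {"a": 1}
keys_view = d.keys()
keys_snapshot = list(d.keys())  # an independent copy, like Python 2's keys()
d["b"] = 2
```

After this runs, `keys_view` contains both "a" and "b", while `keys_snapshot` still contains only "a". When reviewing ported code, look for places that relied on the old list behavior and wrap them in list() where a copy is really needed.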

Fifth, add a note to the module docstring saying it was ported to Python 3.