Why do languages like Java use hierarchical package names, while Python does not?

Python doesn't do this because you end up with a problem -- who owns the "com" package that almost everything else is a subpackage of? Python's method of establishing package hierarchy (through the filesystem hierarchy) does not play well with this convention at all. Java can get away with it because package hierarchy is defined by the structure of the string literals fed to the 'package' statement, so there doesn't need to be an explicit "com" package anywhere.

There's also the question of what to do if you want to publicly release a package but don't own a domain name that's suitable for bodging into the package name, or if you end up changing (or losing) your domain name for some reason. (Do later updates need a different package name? How do you know that com.nifty_consultants.nifty_utility is a newer version of com.joe_blow_software.nifty_utility? Or, conversely, how do you know that it's not a newer version? If you miss your domain renewal and the name gets snatched by a domain camper, and someone else buys the name from them, and they want to publicly release software packages, should they then use the same name that you had already used?)

Domain names and software package names, it seems to me, address two entirely different problems, and have entirely different complicating factors. I personally dislike Java's convention because (IMHO) it violates separation of concerns. Avoiding namespace collisions is nice and all, but I hate the thought of my software's namespace being defined by (and dependent on) the marketing department's interaction with some third-party bureaucracy.

To clarify my point further, in response to JeeBee's comment: In Python, a package is a directory containing an __init__.py file (and presumably one or more module files). A package hierarchy requires that each higher-level package be a full, legitimate package. If two packages (especially from different vendors, but even not-directly-related packages from the same vendor) share a top-level package name, whether that name is 'com' or 'web' or 'utils' or whatever, each one must provide an __init__.py for that top-level package. We must also assume that these packages are likely to be installed in the same place in the directory tree, i.e. site-packages/[pkg]/[subpkg]. The filesystem thus enforces that there is only one [pkg]/__init__.py -- so which one wins? There is not (and cannot be) a general-case correct answer to that question. Nor can we reasonably merge the two files together. Since we can't know what another package might need to do in that __init__.py, subpackages sharing a top-level package cannot be assumed to work when both are installed unless they are specifically written to be compatible with each other (at least in this one file). This would be a distribution nightmare and would pretty much invalidate the entire point of nesting packages. This is not specific to reverse-domain-name package hierarchies, though they provide the most obvious bad example and (IMO) are philosophically questionable -- it's really the practical issue of shared top-level packages, rather than the philosophical questions, that are my main concern here.

(On the other hand, a single large package using subpackages to better organize itself is a great idea, since those subpackages are specifically designed to work and live together. This is not so common in Python, though, because a single conceptual package doesn't tend to require a large enough number of files to need the extra layer of organization.)

If Guido himself announced that the reverse domain convention ought to be followed, it wouldn't be adopted, unless there were significant changes to the implementation of import in python.

Consider: python searches an import path at run-time with a fail-fast algorithm; java searches a path with an exhaustive algorithm both at compile-time and run-time. Go ahead, try arranging your directories like this:

folder_on_path/
    com/
        __init__.py
        domain1/
            module.py
            __init__.py


other_folder_on_path/
    com/
        __init__.py
        domain2/
            module.py
            __init__.py

Then try:

from com.domain1 import module
from com.domain2 import module

Exactly one of those statements will succeed. Why? Because either folder_on_path or other_folder_on_path comes higher on the search path. When python sees from com. it grabs the first com package it can. If that happens to contain domain1, then the first import will succeed; if not, it throws an ImportError and gives up. Why? Because import must occur at runtime, potentially at any point in the flow of the code (although most often at the beginning). Nobody wants an exhaustive tree-walk at that point to verify that there's no possible match. It assumes that if it finds a package named com, it is the com package.

Moreover, python doesn't distinguish between the following statements:

from com import domain1
from com.domain1 import module
from com.domain1.module import variable

The concept of verifying that com is the com is going to be different in each case. In java, you really only have to deal with the second case, and that can be accomplished by walking through the file system (I guess an advantage of naming classes and files the same). In python, if you tried to accomplish import with nothing but file system assistance, the first case could (almost) be transparently the same (init.py wouldn't run), the second case could be accomplished, but you would lose the initial running of module.py, but the third case is entirely unattainable. The code has to execute for variable to be available. And this is another main point: import does more than resolve namespaces, it executes code.

Now, you could get away with this if every python package ever distributed required an installation process that searched for the com folder, and then the domain, and so on and so on, but this makes packaging considerably harder, destroys drag-and-drop capability, and makes packaging and all-out nuisance.

"What are the reasons you'd prefer one over the other?"

Python's style is simpler. Java's style allows same-name products from different organizations.

"Do those reasons apply across the languages?"

Yes. You can easily have top level Python packages named "com", "org", "mil", "net", "edu" and "gov" and put your packages as subpackages in these.

Edit. You have some complexity when you do this, because everyone has to cooperate and not pollute these top-level packages with their own cruft.

Python didn't start doing that because the namespace collision -- as a practical matter -- turn out to be rather rare.

Java started out doing that because the folks who developed Java foresaw lots of people cluelessly choosing the same name for their packages and needing to sort out the collisions and ownership issues.

Java folks didn't foresee the Open Source community picking weird off-the-wall unique names to avoid name collisions. Everyone who writes an xml parser, interestingly, doesn't call it "parser". They seem to call it "Saxon" or "Xalan" or something completely strange.

Why do languages like Java use hierarchical package names, while Python does not?

Tags:

Python

Java

Packages

Related

Recent Posts