Setting default/empty attributes for user classes in __init__
To understand the importance (or not) of initializing attributes in __init__, let's take a modified version of your class MyClass as an example. The purpose of the class is to compute the grade for a subject, given the student name and score. You may follow along in a Python interpreter.
>>> class MyClass:
...     def __init__(self, name, score):
...         self.name = name
...         self.score = score
...         self.grade = None
...
...     def results(self, subject=None):
...         if self.score >= 70:
...             self.grade = 'A'
...         elif 50 <= self.score < 70:
...             self.grade = 'B'
...         else:
...             self.grade = 'C'
...         return self.grade
This class requires two arguments, name and score, which must be provided to create an instance. Without them, the object x cannot be instantiated and a TypeError is raised:
>>> x = MyClass()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() missing 2 required positional arguments: 'name' and 'score'
At this point, we understand that we must provide, as a minimum, the name of the student and a score for a subject. The grade is not important right now because it will be computed later on, in the results method. So we just set self.grade = None and do not make it a parameter of __init__. Let's create an instance (object) of the class:
>>> x = MyClass(name='John', score=70)
>>> x
<__main__.MyClass object at 0x000002491F0AE898>
The <__main__.MyClass object at 0x000002491F0AE898> output confirms that the object x was successfully created at the given memory location. Python also provides some useful built-in attributes for inspecting an object. One of them is __dict__, which holds the instance's attributes:
>>> x.__dict__
{'name': 'John', 'score': 70, 'grade': None}
This gives a dict view of all the initial attributes and their values. Notice that grade has a None value, as assigned in __init__.
Let's take a moment to understand what __init__ does. There are many answers and online resources that explain this method, but I'll summarize:
Besides __init__, Python has another special method called __new__(). When you create an object like this, x = MyClass(name='John', score=70), Python internally calls __new__() first to create a new instance of the class MyClass and then calls __init__ to initialize the attributes name and score. Of course, if Python does not find values for the required arguments during these internal calls, it raises an error, as we've seen above. In other words, __init__ initializes the attributes. You can even call it again on an existing object to assign new initial values for name and score (rarely done in real code, but useful for demonstration):
>>> x.__init__(name='Tim', score=50)
>>> x.__dict__
{'name': 'Tim', 'score': 50, 'grade': None}
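To make the order of those two calls visible, here is a minimal sketch. The class name Traced and its calls log are hypothetical and exist only for this demonstration; they are not part of the original class.

```python
class Traced:
    """Hypothetical class that records the order of the construction hooks."""

    calls = []  # class-level log, shared on purpose for this demonstration

    def __new__(cls, *args, **kwargs):
        cls.calls.append('__new__')
        # object.__new__ takes only the class and returns the bare instance
        return super().__new__(cls)

    def __init__(self, name, score):
        # The instance has no 'calls' of its own, so this appends to the class log
        self.calls.append('__init__')
        self.name = name
        self.score = score


t = Traced(name='John', score=70)
print(Traced.calls)  # ['__new__', '__init__']
```

The log shows that the instance exists before __init__ runs; __init__ only fills it in.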
It is also possible to access individual attributes, as shown below. x.grade prints nothing because its value is None, which the interpreter does not echo.
>>> x.name
'Tim'
>>> x.score
50
>>> x.grade
>>>
In the results method, you will notice that subject is a parameter with a default value of None. Its scope is this method only. For the purposes of demonstration, I define subject as a parameter of this method, but it could have been initialized in __init__ too. But what if I try to access it through my object:
>>> x.subject
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute 'subject'
Python raises an AttributeError when it cannot find an attribute on the object, that is, neither on the instance nor on its class. If you do not initialize attributes in __init__, you may hit this error when you access a name that is actually only local to a method of the class. In this example, defining subject inside __init__ would have avoided the confusion, and it would have been perfectly normal to do so, as it is not required for any computation either.
Now, let's call results and see what we get:
>>> x.results()
'B'
>>> x.__dict__
{'name': 'Tim', 'score': 50, 'grade': 'B'}
This returns the grade for the score, and when we view the attributes again, grade has also been updated. Right from the start, we had a clear view of the initial attributes and how their values have changed.
But what about subject? If I want to know how much Tim scored in Math and what the grade was, I can easily access the score and the grade as we've seen before, but how do I know the subject? Since the subject variable is local to the results method, we could just return its value. Change the return statement in the results method:
def results(self, subject=None):
    #<---code--->
    return self.grade, subject
Let's call results() again. We get a tuple with the grade and subject, as expected.
>>> x.results(subject='Math')
('B', 'Math')
To access the values in the tuple, let's assign them to variables. In Python, it is possible to assign values from a collection to multiple variables in a single statement, provided that the number of variables equals the length of the collection. Here, the length is just two, so we can have two variables on the left-hand side of the assignment:
>>> grade, subject = x.results(subject='Math')
>>> subject
'Math'
So, there we have it, though it needed a few extra lines of code to get the subject. It would be more intuitive to access everything with just the dot operator, as x.<attribute>, but this is just an example and you could try it with subject initialized in __init__.
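For readers who want to try that variant, here is one possible sketch. The class name SubjectResult is hypothetical, and storing subject as an optional keyword argument is an assumption about how you might wire it in, not the only way.

```python
class SubjectResult:
    """Hypothetical variant of MyClass that keeps the subject on the instance."""

    def __init__(self, name, score, subject=None):
        self.name = name
        self.score = score
        self.subject = subject   # known up front, so store it here
        self.grade = None        # computed later by results()

    def results(self):
        if self.score >= 70:
            self.grade = 'A'
        elif self.score >= 50:
            self.grade = 'B'
        else:
            self.grade = 'C'
        return self.grade


r = SubjectResult(name='Tim', score=50, subject='Math')
r.results()
print(r.subject, r.grade)  # Math B
```

Now x.subject style access works like any other attribute, and __dict__ shows the subject alongside the name, score and grade.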
Next, suppose there are several students (say 3) and we want the names, scores and grades for Math. Except for the subject, all of these must be some sort of collection data type, like a list, that can store all the names, scores and grades. We could just initialize like this:
>>> x = MyClass(name=['John', 'Tom', 'Sean'], score=[70, 55, 40])
>>> x.name
['John', 'Tom', 'Sean']
>>> x.score
[70, 55, 40]
This seems fine at first sight, but when you (or some other programmer) take another look at the initialization of name, score and grade in __init__, there is no way to tell that they need a collection data type. The variables are also named in the singular, which suggests that each holds just one value. The aim of programmers should be to make the intent as clear as possible, by way of descriptive variable naming, type declarations, code comments and so on. With this in mind, let's change the attribute declarations in __init__. Before we settle for a well-behaved, well-defined declaration, we must take care of how we declare default arguments.
Edit: Problems with mutable default arguments:
Now, there are some 'gotchas' that we must be aware of while declaring default args. Consider the following declaration, which initializes names and appends a random name on object creation. Recall that lists are mutable objects in Python.
#Not recommended
class MyClass:
    def __init__(self, names=[]):
        self.names = names
        self.names.append('Random_name')
Let's see what happens when we create objects from this class:
>>> x = MyClass()
>>> x.names
['Random_name']
>>> y = MyClass()
>>> y.names
['Random_name', 'Random_name']
The list continues to grow with every new object creation. The reason is that default values are evaluated only once, when the function is defined, not each time __init__ is called. Every call that omits names therefore receives the same list object and appends to whatever the previous calls left in it. You can verify this yourself, as the id of the list remains the same for every object created:
>>> id(x.names)
2513077313800
>>> id(y.names)
2513077313800
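Another way to see the shared list is to look at the function's __defaults__ attribute, where Python stores the default values. The sketch below uses a hypothetical class named Roster that reproduces the pitfall on purpose.

```python
class Roster:
    """Hypothetical class reproducing the mutable-default pitfall."""

    def __init__(self, names=[]):   # deliberately wrong: one list for every call
        self.names = names
        self.names.append('Random_name')


default_before = Roster.__init__.__defaults__[0]   # the single shared list
a = Roster()
b = Roster()

print(a.names is b.names)                                  # True
print(Roster.__init__.__defaults__[0] is default_before)   # True
print(Roster.__init__.__defaults__)                        # (['Random_name', 'Random_name'],)
```

Both instances point at the very list stored on the function, which is why the appended names pile up.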
So, what is the correct way of defining default args while also being explicit about the data type the attribute supports? The safest option is to set default args to None and initialize to an empty list when the arg value is None. The following is a recommended way to declare default args:
#Recommended
>>> class MyClass:
...     def __init__(self, names=None):
...         self.names = names if names else []
...         self.names.append('Random_name')
Let's examine the behavior:
>>> x = MyClass()
>>> x.names
['Random_name']
>>> y = MyClass()
>>> y.names
['Random_name']
Now, this behavior is what we are looking for. The object does not "carry over" old baggage and re-initializes to an empty list whenever no values are passed to names. If we pass some valid names (as a list, of course) to the names arg for the y object, Random_name will simply be appended to this list. And again, the x object's values will not be affected:
>>> y = MyClass(names=['Viky','Sam'])
>>> y.names
['Viky', 'Sam', 'Random_name']
>>> x.names
['Random_name']
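One subtlety worth knowing: the `names if names else []` check above treats every falsy value, including an empty list passed by the caller, as "nothing was passed". A stricter variant tests for None explicitly. The two class names below are hypothetical, and whether the difference matters depends on whether callers expect the object to keep using the list they handed in.

```python
class TruthyDefault:
    def __init__(self, names=None):
        self.names = names if names else []               # replaces any falsy value


class NoneDefault:
    def __init__(self, names=None):
        self.names = names if names is not None else []   # replaces only None


shared = []                      # an empty list the caller wants to keep using
t = TruthyDefault(names=shared)
n = NoneDefault(names=shared)

print(t.names is shared)   # False: a fresh list was substituted
print(n.names is shared)   # True: the caller's list is kept
```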
Perhaps the simplest explanation of this concept can be found on the Effbot website. If you'd like to read some excellent answers, see “Least Astonishment” and the Mutable Default Argument.
Based on the brief discussion on default args, our class declarations will be modified to:
class MyClass:
    def __init__(self, names=None, scores=None):
        self.names = names if names else []
        self.scores = scores if scores else []
        self.grades = []
    #<---code------>
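This makes more sense: all the variables have plural names and are initialized to empty lists when no values are passed. After re-creating x with the new class, we get similar results as before:
>>> x = MyClass(names=['John', 'Tom', 'Sean'], scores=[70, 55, 40])
>>> x.names
['John', 'Tom', 'Sean']
>>> x.grades
[]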
grades is an empty list, making it clear that the grades will be computed for multiple students when results() is called. Therefore, our results method should also be modified. The comparisons should now be between the threshold numbers (70, 50 etc.) and the items in the self.scores list, and as each comparison is made the self.grades list should be updated with the individual grade. Change the results method to:
def results(self, subject=None):
    #Grade calculator
    for i in self.scores:
        if i >= 70:
            self.grades.append('A')
        elif 50 <= i < 70:
            self.grades.append('B')
        else:
            self.grades.append('C')
    return self.grades, subject
We should now get the grades as a list when we call results():
>>> x.results(subject='Math')
(['A', 'B', 'C'], 'Math')
>>> x.grades
['A', 'B', 'C']
>>> x.names
['John', 'Tom', 'Sean']
>>> x.scores
[70, 55, 40]
This looks good, but if the lists were large, figuring out whose score or grade belongs to whom would be an absolute nightmare. This is where it is important to initialize the attributes with a data type that stores all of these items in a way that keeps them easily accessible and clearly shows their relationships. The best choice here is a dictionary.
We can have a dictionary with names and scores defined initially, and the results function should put everything together into a new dictionary that has all the scores, grades etc. We should also comment the code properly and explicitly define args in the method wherever possible. Lastly, we may not require self.grades in __init__ anymore because, as you will see, the grades are not appended to a list but explicitly assigned. This is totally dependent upon the requirements of the problem.
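If the data already lives in two parallel lists, as in the earlier example, one common way to get the dictionary is to pair the lists with zip. This is only a sketch; note that zip stops at the shorter list, so it assumes both lists have the same length and order.

```python
names = ['John', 'Tom', 'Sean']
scores = [70, 55, 40]

# zip pairs items by position; dict() turns the pairs into name -> score
names_scores = dict(zip(names, scores))

print(names_scores)          # {'John': 70, 'Tom': 55, 'Sean': 40}
print(names_scores['Tom'])   # 55
```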
The final code:
class MyClass:
    """A class that computes the final results for students"""

    def __init__(self, names_scores=None):
        """initialize student names and scores
        :param names_scores: accepts key/value pairs of names/scores
        E.g.: {'John': 70}"""
        self.names_scores = names_scores if names_scores else {}

    def results(self, subject=None):
        """Assign grades and collect final results into a dictionary.
        :param subject: name of the subject the scores belong to"""
        # Build a new dict so names_scores keeps the raw scores and
        # results() can safely be called more than once.
        self._final_results = {}
        for key, value in self.names_scores.items():
            if value >= 70:
                self._final_results[key] = [value, subject, 'A']
            elif 50 <= value < 70:
                self._final_results[key] = [value, subject, 'B']
            else:
                self._final_results[key] = [value, subject, 'C']
        return self._final_results
Please note that _final_results is just an internal attribute that stores the graded results; names_scores itself is left untouched, so results() can be called repeatedly. It is deliberately not a parameter with a {} default, since that would reintroduce the mutable default problem described above. The purpose is to return a more meaningful variable from the function that clearly informs the intent. The _ at the beginning of the name indicates that it is internal, as per convention.
Let's give this a final run:
>>> x = MyClass(names_scores={'John':70, 'Tom':50, 'Sean':40})
>>> x.results(subject='Math')
{'John': [70, 'Math', 'A'],
'Tom': [50, 'Math', 'B'],
'Sean': [40, 'Math', 'C']}
This gives a much clearer view of the results for each student. It is now easy to access the grades/scores for any student:
>>> y = x.results(subject='Math')
>>> y['John']
[70, 'Math', 'A']
Conclusion:
The final code needed some extra work, but it was worth it. The output is more precise and gives clear information about each student's results. The code is more readable and clearly informs the reader about the intent of creating the class, methods and variables. The following are the key takeaways from this discussion:
- The variables (attributes) that are expected to be shared amongst class methods should be defined in __init__. In our example, names, scores and possibly subject were required by results(). These attributes could be shared by another method, say average, that computes the average of the scores.
- The attributes should be initialized with the appropriate data type. This should be decided beforehand, before venturing into a class-based design for a problem.
- Care must be taken while declaring attributes with default args. A mutable default is created once and shared by every call, so if __init__ mutates it, the change leaks into every later instance. It is safest to declare default args as None and re-initialize to an empty mutable collection whenever the value is None.
- The attribute names should be unambiguous and follow PEP 8 guidelines.
- Some variables should be initialized within the scope of the class method only. These could be, for example, internal variables that are required for computations or variables that don't need to be shared with other methods.
- Another compelling reason to define variables in __init__ is to avoid possible AttributeErrors that may occur due to accessing unnamed/out-of-scope attributes. The __dict__ attribute provides a view of the attributes initialized here. While assigning values to attributes (positional args) on class instantiation, the attribute names should be explicitly defined. For instance:
x = MyClass('John', 70) #not explicit
x = MyClass(name='John', score=70) #explicit
Finally, the aim should be to communicate the intent as clearly as possible with comments. The class, its methods and attributes should be well commented. For all attributes, a short description along with an example is quite useful for a new programmer who encounters your class and its attributes for the first time.
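As an illustration of the first takeaway, here is a rough sketch of two methods sharing one attribute that is set in __init__. The class name ScoreBook and the highest method are hypothetical additions for the example; only average is mentioned in the discussion above.

```python
class ScoreBook:
    """Hypothetical class whose methods share one attribute set in __init__."""

    def __init__(self, names_scores=None):
        # Mapping such as {'John': 70}; None means "start empty"
        self.names_scores = dict(names_scores) if names_scores is not None else {}

    def average(self):
        """Return the mean score, or None when there are no scores yet."""
        if not self.names_scores:
            return None
        return sum(self.names_scores.values()) / len(self.names_scores)

    def highest(self):
        """Return the name with the highest score, or None when empty."""
        if not self.names_scores:
            return None
        return max(self.names_scores, key=self.names_scores.get)


book = ScoreBook(names_scores={'John': 70, 'Tom': 50, 'Sean': 40})
print(book.average())   # roughly 53.33
print(book.highest())   # John
```

Because names_scores always exists after __init__, neither method has to guard against a missing attribute, only against an empty mapping.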
Following considerable research and discussions with experienced programmers, please see below what I believe is the most Pythonic solution to this question. I have included the updated code first and then a narrative:
class MyClass:
    def __init__(self, df):
        self.df = df
        self._results = None

    @property
    def results(self):
        if self._results is None:
            raise Exception('results have not been generated yet')
        return self._results

    def generate_results(self, df_results):
        #Imagine some calculations here or something
        self._results = df_results
Description of what I learnt, changed and why:
- All instance attributes should be included in the __init__ (initialiser) method. This is to ensure readability and aid debugging.
- The first issue is that you cannot create private attributes in Python. Everything is public, so any partially initialised attribute (such as results being set to None) can be accessed. The convention for indicating a private attribute is to place a leading underscore at the front, so in this case I changed self.results to self._results.
- Keep in mind this is only convention, and self._results can still be directly accessed. However, this is the Pythonic way to handle what are pseudo-private attributes.
- The second issue is having a partly initialised attribute which is set to None. As this is set to None, as @jferard below explains, we have now lost a fail-fast hint and have added a layer of obfuscation for debugging the code.
- To resolve this we add a getter method. This can be seen above as the function results(), which has the @property decorator above it.
- This is a function that, when invoked, checks whether self._results is None. If so, it raises an exception (the fail-fast hint); otherwise it returns the object. The @property decorator changes the invocation style from a function to an attribute, so all the user has to use on an instance of MyClass is .results, just like any other attribute.
- (I changed the name of the method that sets the results to generate_results() to avoid confusion and free up .results for the getter method.)
- If you then have other methods within the class that need to use self._results, but only when properly assigned, you can use self.results, and that way the fail-fast hint is baked in as above.
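A short usage sketch of this pattern follows. The class is renamed GuardedResults to keep the example self-contained, and raising RuntimeError instead of a bare Exception is my own assumption about what a more specific error might look like.

```python
class GuardedResults:
    """Hypothetical stand-in for the class above, kept self-contained."""

    def __init__(self, df):
        self.df = df
        self._results = None

    @property
    def results(self):
        if self._results is None:
            # RuntimeError is an assumption; the answer raises a bare Exception
            raise RuntimeError('results requested before generate_results()')
        return self._results

    def generate_results(self, df_results):
        self._results = df_results


obj = GuardedResults('df')
try:
    obj.results                 # accessed too early, so the getter raises
except RuntimeError as exc:
    early_error = str(exc)

obj.generate_results('r')
print(early_error)   # results requested before generate_results()
print(obj.results)   # r
```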
I recommend also reading @jferard's answer to this question. He goes into depth about the problems and some of the solutions. The reason I added my answer is that I think for a lot of cases the above is all you need (and the Pythonic way of doing it).
I think you should avoid both solutions, simply because you should avoid creating uninitialized or partially initialized objects, except in one case I will outline later.
Look at two slightly modified versions of your class, with a setter and a getter:
class MyClass1:
    def __init__(self, df):
        self.df = df
        self.results = None

    def set_results(self, df_results):
        self.results = df_results

    def get_results(self):
        return self.results
And
class MyClass2:
    def __init__(self, df):
        self.df = df

    def set_results(self, df_results):
        self.results = df_results

    def get_results(self):
        return self.results
The only difference between MyClass1 and MyClass2 is that the first one initializes results in the constructor while the second does it in set_results. Here comes the user of your class (usually you, but not always). Everyone knows you can't trust the user (even if it's you):
MyClass1("df").get_results()
# returns None
Or
MyClass2("df").get_results()
# Traceback (most recent call last):
# ...
# AttributeError: 'MyClass2' object has no attribute 'results'
You might think that the first case is better because it does not fail, but I do not agree. I would like the program to fail fast in this case, rather than go through a long debugging session to find out what happened. Hence, the first part of the first answer is: do not set the uninitialized fields to None, because you lose a fail-fast hint.
But that's not the whole answer. Whichever version you choose, you have an issue: the object was used and it shouldn't have been, because it was not fully initialized. You can add a docstring to get_results: """Always use set_results **BEFORE** this method""". Unfortunately the user doesn't read docstrings either.
You have two main reasons for uninitialized fields in your object: 1. you don't know (for now) the value of the field; 2. you want to avoid an expensive operation (computation, file access, network, ...), aka "lazy initialization". Both situations occur in the real world, and they collide with the need to use only fully initialized objects.
Happily, there is a well documented solution to this problem: Design Patterns, and more precisely Creational patterns. In your case, the Factory pattern or the Builder pattern might be the answer. E.g.:
class MyClassBuilder:
    def __init__(self, df):
        self._df = df # df is known immediately
        # GIVE A DEFAULT VALUE TO OTHER FIELDS to avoid the possibility of a partially uninitialized object.
        # The default value should be either:
        # * a value passed as a parameter of the constructor ;
        # * a sensible value (eg. an empty list, 0, etc.)

    def results(self, df_results):
        self._results = df_results
        return self # for fluent style

    ... other field initializers

    def build(self):
        return MyClass(self._df, self._results, ...)

class MyClass:
    def __init__(self, df, results, ...):
        self.df = df
        self.results = results
        ...

    def get_results(self):
        return self.results

    ... other getters
(You can use a Factory too, but I find the Builder more flexible). Let's give a second chance to the user:
>>> b = MyClassBuilder("df").build()
Traceback (most recent call last):
...
AttributeError: 'MyClassBuilder' object has no attribute '_results'
>>> b = MyClassBuilder("df")
>>> b.results("r")
... other fields initialization
>>> x = b.build()
>>> x
<__main__.MyClass object at ...>
>>> x.get_results()
'r'
The advantages are clear:
- It's easier to detect and fix a creation failure than a late use failure;
- You do not release into the wild an uninitialized (and thus potentially damaging) version of your object.
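The Builder above elides the other fields, so it cannot be run as written. A stripped-down sketch with only the two fields shown may help; the names Report and ReportBuilder are hypothetical and stand in for MyClass and MyClassBuilder.

```python
class Report:
    """Hypothetical fully initialized object; both fields are required."""

    def __init__(self, df, results):
        self.df = df
        self.results = results


class ReportBuilder:
    def __init__(self, df):
        self._df = df            # known immediately

    def results(self, df_results):
        self._results = df_results
        return self              # fluent style

    def build(self):
        # Reading self._results raises AttributeError if results() was skipped
        return Report(self._df, self._results)


try:
    ReportBuilder('df').build()          # results() never called
except AttributeError as exc:
    build_error = exc

report = ReportBuilder('df').results('r').build()
print(type(build_error).__name__)  # AttributeError
print(report.results)              # r
```

The failure happens at build time, before any half-initialized Report exists.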
The presence of uninitialized fields in the Builder is not a contradiction: those fields are uninitialized by design, because the Builder's role is to initialize them. (Actually, those fields are some kind of foreign fields to the Builder.) This is the case I was talking about in my introduction. They should, in my mind, be set to a default value (if one exists) or left uninitialized to raise an exception if you try to create an incomplete object.
Second part of my answer: use a Creational pattern to ensure the object is correctly initialized.
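For the lazy initialization case mentioned earlier, the standard library offers another option that is not a creational pattern: functools.cached_property (Python 3.8+). The sketch below is only an illustration; the class name Analysis, the compute_count counter and the doubling "computation" are all made up for the example.

```python
from functools import cached_property


class Analysis:
    """Hypothetical class; compute_count exists only to show the laziness."""

    def __init__(self, df):
        self.df = df
        self.compute_count = 0

    @cached_property
    def results(self):
        # Stand-in for an expensive computation, file read, or network call
        self.compute_count += 1
        return [value * 2 for value in self.df]


a = Analysis([1, 2, 3])
print(a.compute_count)   # 0: nothing computed yet
print(a.results)         # [2, 4, 6]
print(a.results)         # [2, 4, 6], served from the cache
print(a.compute_count)   # 1
```

The object is usable from the moment it is created, and the expensive step runs at most once, on first access.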
Side note: I'm very suspicious when I see a class with getters and setters. My rule of thumb is: always try to separate them because when they meet, objects become unstable.