Python Slots Speed

4/11/2022


In Python, an object does not by default reserve a fixed amount of memory for its attributes when it is created. Declaring __slots__ reduces wasted space, and can speed up attribute access, by allocating room for a fixed set of attributes.

__slots__ provides a special mechanism to reduce the size of objects. It is especially useful if you need to allocate thousands of objects that would otherwise take up a lot of memory. It is not very common, but you may find it useful someday. Note, however, that it has some side effects (e.g. pickle may not work) and that Python 3 introduced a memory optimisation for instance dictionaries (see http://www.python.org/dev/peps/pep-0412/), so __slots__ may be needed less often than before.

The main idea is as follows. As you may know, by default every instance of a user-defined class in Python carries a dynamic dictionary that allows adding attributes. You can see __slots__ as the static version that does not allow additional attributes.

Quick Example

Here is the slots syntax, using the __slots__ class attribute:
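A minimal sketch of such a declaration. The class name `Point` and its attributes `x` and `y` are assumptions chosen for illustration, not names taken from the original listing:

```python
class Point:
    """Hypothetical example class; only the names in __slots__ may be set."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(p.x, p.y)
print(Point.__slots__)
```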

The traditional version would be as follows:
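A sketch of a class written the traditional way, without __slots__. The name `PlainPoint` and its attributes are assumptions for illustration:

```python
class PlainPoint:
    """Hypothetical class without __slots__; attributes live in a per-instance dict."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


q = PlainPoint(1, 2)
print(vars(q))  # the per-instance attribute dictionary
```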

This means that for every instance you’ll have an instance of a dict. Now, for some people this might seem way too much space for just a couple of attributes.

Unfortunately there is a side effect to slots. They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies. This is bad, because the control freaks should be abusing the metaclasses and the static typing weenies should be abusing decorators, since in Python, there should be only one obvious way of doing something.

Making CPython smart enough to handle saving space without __slots__ is a major undertaking, which is probably why it is not on the list of changes for P3k (yet).

I'd like to see some elaboration on the "static typing"/decorator point, sans pejoratives. Quoting absent third parties is unhelpful. __slots__ doesn't address the same issues as static typing. For example, in C++, it is not the declaration of a member variable that is being restricted, it is the assignment of an unintended type (compiler enforced) to that variable. I'm not condoning the use of __slots__, just interested in the conversation. Thanks! – hiwaylon Nov 28 '11 at 17:54

Each Python object has a __dict__ attribute, which is a dictionary containing all its other attributes. For example, when you type self.attr, Python is effectively doing self.__dict__['attr']. As you can imagine, using a dictionary to store attributes takes some extra space and some extra time to access.
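A small sketch of that relationship, using a hypothetical `Account` class. It is a simplification: real attribute lookup also consults the class and any descriptors, so the equivalence only holds for plain instance attributes.

```python
class Account:
    """Hypothetical class used only to inspect the instance dictionary."""

    def __init__(self, owner):
        self.owner = owner


acct = Account("alice")
# A plain instance attribute is stored in, and read from, the instance dict.
print(acct.__dict__["owner"] is acct.owner)
# Writing to the dict is visible through normal attribute access.
acct.__dict__["balance"] = 10
print(acct.balance)
```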

However, when you use __slots__, any object created for that class won’t have a __dict__ attribute. Instead, all attribute access is done directly via pointers.

So if you want a C-style structure rather than a full-fledged class, you can use __slots__ to compact the size of the objects and reduce attribute access time. A good example is a Point class containing attributes x and y. If you are going to have a lot of points, you can try using __slots__ to conserve some memory.
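One rough way to see the saving is to compare shallow sizes with `sys.getsizeof`. This is a sketch under assumptions: `SlotPoint`, `DictPoint` and `shallow_size` are hypothetical names, the measurement ignores the attribute values themselves, and the exact byte counts depend on the Python version.

```python
import sys


class SlotPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class DictPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def shallow_size(obj):
    """Size of the instance plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


slot_size = shallow_size(SlotPoint(1.0, 2.0))
dict_size = shallow_size(DictPoint(1.0, 2.0))
print("with __slots__:", slot_size, "bytes")
print("with __dict__: ", dict_size, "bytes")
```

For a more realistic figure, measuring a large list of instances with `tracemalloc` would account for allocator overhead as well.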

No, an instance of a class with __slots__ defined is not like a C-style structure. There is a class-level dictionary mapping attribute names to indexes, otherwise the following would not be possible: class A(object): __slots__ = "value", followed by a = A(); setattr(a, 'value', 1). I really think this answer should be clarified (I can do that if you want). Also, I'm not certain that instance.__hidden_attributes[instance.__class__[attrname]] is faster than instance.__dict__[attrname]. – tzot Oct 15 '11 at 13:56

Slots are very useful for library calls to eliminate the "named method dispatch" when making function calls. This is mentioned in the SWIG documentation. For high-performance libraries that want to reduce the overhead of commonly called functions, using slots is much faster.

Now this may not be directly related to the OP's question. It is related more to building extensions than to using the slots syntax on an object. But it does help complete the picture of how slots are used and some of the reasoning behind them.

By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.

This class variable can be assigned a string, iterable, or sequence of strings with variable names used by instances. If defined in a new-style class, __slots__ reserves space for the declared variables and prevents the automatic creation of __dict__ and __weakref__ for each instance. New in version 2.2.

Notes on using __slots__

Without a __dict__ variable, instances cannot be assigned new variables not listed in the __slots__ definition. Attempts to assign to an unlisted variable name raise AttributeError. If dynamic assignment of new variables is desired, then add '__dict__' to the sequence of strings in the __slots__ declaration. Changed in version 2.3: Previously, adding '__dict__' to the __slots__ declaration would not enable the assignment of new attributes not specifically listed in the sequence of instance variable names.
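A short sketch of both behaviours in current Python 3. The class names `Strict` and `Flexible` are hypothetical:

```python
class Strict:
    __slots__ = ("a",)


class Flexible:
    # Listing '__dict__' restores dynamic attribute assignment.
    __slots__ = ("a", "__dict__")


s = Strict()
s.a = 1
try:
    s.b = 2  # 'b' is not listed in __slots__
except AttributeError as exc:
    print("Strict:", exc)

f = Flexible()
f.a = 1  # stored in the slot
f.b = 2  # stored in the instance __dict__
print("Flexible:", f.__dict__)
```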

Without a __weakref__ variable for each instance, classes defining __slots__ do not support weak references to their instances. If weak reference support is needed, then add '__weakref__' to the sequence of strings in the __slots__ declaration. Changed in version 2.3: Previously, adding '__weakref__' to the __slots__ declaration would not enable support for weak references.
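The weak reference restriction can be checked with the standard `weakref` module. A minimal sketch with hypothetical class names `NoRef` and `WithRef`:

```python
import weakref


class NoRef:
    __slots__ = ("value",)


class WithRef:
    __slots__ = ("value", "__weakref__")


try:
    weakref.ref(NoRef())
except TypeError as exc:
    print("NoRef:", exc)

obj = WithRef()
ref = weakref.ref(obj)
print("WithRef:", ref() is obj)
```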

__slots__ are implemented at the class level by creating descriptors (3.4.2) for each variable name. As a result, class attributes cannot be used to set default values for instance variables defined by __slots__; otherwise, the class attribute would overwrite the descriptor assignment.
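The descriptors can be inspected directly. The sketch below uses hypothetical classes `Item` and `Broken`; note that current Python 3 goes further than the text above describes and rejects a class whose slot name collides with a class attribute.

```python
class Item:
    __slots__ = ("name",)


# Each slot name appears on the class as a descriptor object.
print(type(Item.__dict__["name"]).__name__)

try:
    class Broken:
        __slots__ = ("name",)
        name = "default"  # collides with the slot descriptor
except ValueError as exc:
    print("Broken:", exc)
```

A common way to get defaults is to assign them in `__init__` instead.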

If a class defines a slot also defined in a base class, the instance variable defined by the base class slot is inaccessible (except by retrieving its descriptor directly from the base class). This renders the meaning of the program undefined. In the future, a check may be added to prevent this.

Warning

The effect of a __slots__ declaration is limited to the class where it is defined. In other words, subclasses will have a __dict__ (unless they also define __slots__).
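A sketch of how this plays out with inheritance. The class names `Base`, `Child` and `SlimChild` are hypothetical:

```python
class Base:
    __slots__ = ("x",)


class Child(Base):
    pass  # no __slots__ here, so instances get a __dict__ again


class SlimChild(Base):
    __slots__ = ("y",)  # only the additional names are listed


print(hasattr(Base(), "__dict__"))
print(hasattr(Child(), "__dict__"))
print(hasattr(SlimChild(), "__dict__"))
```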

__slots__ do not work for classes derived from "variable-length" built-in types such as long, str and tuple.
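A sketch of the restriction as it behaves in current CPython 3, using `tuple` as the variable-length base. The class names `Tagged` and `Pair` are hypothetical. An empty __slots__ is still accepted, which is useful for suppressing the per-instance dictionary.

```python
try:
    class Tagged(tuple):
        __slots__ = ("tag",)  # non-empty slots on a variable-length type
except TypeError as exc:
    print("rejected:", exc)


class Pair(tuple):
    __slots__ = ()  # empty __slots__ is allowed; it just suppresses __dict__


print(hasattr(Pair((1, 2)), "__dict__"))
```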

Any non-string iterable may be assigned to __slots__. Mappings may also be used; however, in the future, special meaning may be assigned to the values corresponding to each key.

For every instance of any class, attributes are stored in a dictionary.

If you have lots and lots and looooots of instances, and you want to save some memory, you can use __slots__. The basic idea is that when you define the __slots__ class attribute, the listed attributes get just enough space, with none wasted.

Here is the previous example using __slots__:
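A sketch of what such a class could look like. `MyClass` and its attribute names are hypothetical stand-ins, not the names of the original listing:

```python
class MyClass:
    """Hypothetical stand-in for the class discussed in the text."""

    __slots__ = ("name", "identifier")

    def __init__(self, name, identifier):
        self.name = name
        self.identifier = identifier


instance = MyClass("example", 1)
print(instance.name, instance.identifier)
print(hasattr(instance, "__dict__"))
```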

Now, one side effect of __slots__ is that, whenever you define the __slots__ class attribute, the __dict__ attribute of every instance is gone! That is no surprise, because it is the reason to use __slots__ in the first place: to get rid of the __dict__ in every instance and save some memory, remember?

Can't bind attributes to the instance any more…

Another side effect is that, as there is no __dict__, there is no way to add, at runtime, any attributes to your instance:

    # This would work if there were no __slots__ defined...
    >>> instance.new_attr = 10
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'myClass' object has no attribute 'new_attr'
    >>>

Read only attributes?

Another one is that, if there is some kind of collision between a slot and a class attribute, then the class attribute will overwrite the slot and, as there is no __dict__, the class attribute will be read-only. (This describes Python 2; Python 3 refuses to create such a class and raises ValueError.)

However, if you want to have a __dict__, you can always add the '__dict__' value to __slots__, and all these little side effects will go away.

But what if I wanted to add the '__dict__' value to __slots__ at runtime?

Sorry dude, but no can do.

reference: http://mypythonnotes.wordpress.com/2008/09/04/__slots__/

Post-History:
PEP:	266
Title:	Optimizing Global Variable/Attribute Access
Author:	skip at pobox.com (Skip Montanaro)
Status:	Withdrawn
Type:	Standards Track
Created:	13-Aug-2001
Python-Version:	2.3

The bindings for most global variables and attributes of other modules typically never change during the execution of a Python program, but because of Python's dynamic nature, code which accesses such global objects must run through a full lookup each time the object is needed. This PEP proposes a mechanism that allows code that accesses most global objects to treat them as local objects and places the burden of updating references on the code that changes the name bindings of such objects.

Consider the workhorse function sre_compile._compile. It is the internal compilation function for the sre module. It consists almost entirely of a loop over the elements of the pattern being compiled, comparing opcodes with known constant values and appending tokens to an output list. Most of the comparisons are with constants imported from the sre_constants module. This means there are lots of LOAD_GLOBAL bytecodes in the compiled output of this module. Just by reading the code it's apparent that the author intended LITERAL, NOT_LITERAL, OPCODES and many other symbols to be constants. Still, each time they are involved in an expression, they must be looked up anew.

Most global accesses are actually to objects that are 'almost constants'. This includes global variables in the current module as well as the attributes of other imported modules. Since they rarely change, it seems reasonable to place the burden of updating references to such objects on the code that changes the name bindings. If sre_constants.LITERAL is changed to refer to another object, perhaps it would be worthwhile for the code that modifies the sre_constants module dict to correct any active references to that object. By doing so, in many cases global variables and the attributes of many objects could be cached as local variables. If the bindings between the names given to the objects and the objects themselves change rarely, the cost of keeping track of such objects should be low and the potential payoff fairly large.

In an attempt to gauge the effect of this proposal, I modified the Pystone benchmark program included in the Python distribution to cache global functions. Its main function, Proc0, makes calls to ten different functions inside its for loop. In addition, Func2 calls Func1 repeatedly inside a loop. If local copies of these 11 global identifiers are made before the functions' loops are entered, performance on this particular benchmark improves by about two percent (from 5561 pystones to 5685 on my laptop). It gives some indication that performance would be improved by caching most global variable access. Note also that the pystone benchmark makes essentially no accesses of global module attributes, an anticipated area of improvement for this PEP.

I propose that the Python virtual machine be modified to include TRACK_OBJECT and UNTRACK_OBJECT opcodes. TRACK_OBJECT would associate a global name or attribute of a global name with a slot in the local variable array and perform an initial lookup of the associated object to fill in the slot with a valid value. The association it creates would be noted by the code responsible for changing the name-to-object binding to cause the associated local variable to be updated. The UNTRACK_OBJECT opcode would delete any association between the name and the local variable slot.

Operation of this code in threaded programs will be no different than in unthreaded programs. If you need to lock an object to access it, you would have had to do that before TRACK_OBJECT would have been executed and retain that lock until after you stop using it.

FIXME: I suspect I need more here.

Global variables and attributes rarely change. For example, once a function imports the math module, the binding between the name math and the module it refers to isn't likely to change. Similarly, if the function that uses the math module refers to its sin attribute, it's unlikely to change. Still, every time the module wants to call the math.sin function, it must first execute a pair of instructions:

    LOAD_GLOBAL     math
    LOAD_ATTR       sin
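In current CPython the pair in question is LOAD_GLOBAL followed by LOAD_ATTR, and the standard `dis` module can confirm it. A minimal sketch, assuming a hypothetical helper `get_sin`; other opcodes in the listing vary between interpreter versions:

```python
import dis
import math


def get_sin():
    # Accessing math.sin needs a global lookup, then an attribute lookup.
    return math.sin


ops = [ins.opname for ins in dis.get_instructions(get_sin)]
print(ops)
```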

If the client module always assumed that math.sin was a local constant and it was the responsibility of 'external forces' outside the function to keep the reference correct, we might have code like this:

    TRACK_OBJECT       math.sin
    ...
    LOAD_FAST          math.sin
    ...
    UNTRACK_OBJECT     math.sin

If the LOAD_FAST was in a loop the payoff in reduced global loads and attribute lookups could be significant.

This technique could, in theory, be applied to any global variable access or attribute lookup. Consider this code:

    l = []
    for i in range(10):
        l.append(math.sin(i))
    return l

Even though l is a local variable, you still pay the cost of loading l.append ten times in the loop. The compiler (or an optimizer) could recognize that both math.sin and l.append are being called in the loop and decide to generate the tracked local code, avoiding it for the builtin range() function because it's only called once during loop setup. Performance issues related to accessing local variables make tracking l.append less attractive than tracking globals such as math.sin.

According to a post to python-dev by Marc-Andre Lemburg [1], LOAD_GLOBAL opcodes account for over 7% of all instructions executed by the Python virtual machine. This can be a very expensive instruction, at least relative to a LOAD_FAST instruction, which is a simple array index and requires no extra function calls by the virtual machine. I believe many LOAD_GLOBAL instructions and LOAD_GLOBAL/LOAD_ATTR pairs could be converted to LOAD_FAST instructions.

Code that uses global variables heavily often resorts to various tricks to avoid global variable and attribute lookup. The aforementioned sre_compile._compile function caches the append method of the growing output list. Many people commonly abuse functions' default argument feature to cache global variable lookups. Both of these schemes are hackish and rarely address all the available opportunities for optimization. (For example, sre_compile._compile does not cache the two globals that it uses most frequently: the builtin len function and the global OPCODES array that it imports from sre_constants.py.)
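Both tricks can be sketched in a few lines. The function names `sines_plain` and `sines_cached` and the `_sin` parameter are hypothetical, chosen only to illustrate the pattern:

```python
import math


def sines_plain(n):
    out = []
    for i in range(n):
        out.append(math.sin(i))  # global + attribute lookup on every pass
    return out


def sines_cached(n, _sin=math.sin):
    # Default-argument trick: math.sin is looked up once, at definition time.
    out = []
    append = out.append  # cache the bound method in a local as well
    for i in range(n):
        append(_sin(i))
    return out


print(sines_plain(3) == sines_cached(3))
```

The cached version keeps using the original function even if math.sin is rebound later, which is exactly the staleness problem the tracking proposal tries to address.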

What about threads? What if `math.sin` changes while in cache?

I believe the global interpreter lock will protect values from being corrupted. In any case, the situation would be no worse than it is today. If one thread modified math.sin after another thread had already executed LOAD_GLOBAL math, but before it executed LOAD_ATTR sin, the client thread would see the old value of math.sin.

The idea is this. I use a multi-attribute load below as an example, not because it would happen very often, but because by demonstrating the recursive nature with an extra call hopefully it will become clearer what I have in mind. Suppose a function defined in module foo wants to access spam.eggs.ham and that spam is a module imported at the module level in foo:

    import spam
    ...
    def somefunc():
        ...
        x = spam.eggs.ham

Upon entry to somefunc, a TRACK_GLOBAL instruction will be executed:

    TRACK_GLOBAL spam.eggs.ham n

spam.eggs.ham is a string literal stored in the function's constants array. n is a fastlocals index. &fastlocals[n] is a reference to slot n in the executing frame's fastlocals array, the location in which the spam.eggs.ham reference will be stored. Here's what I envision happening:

1. The TRACK_GLOBAL instruction locates the object referred to by the name spam and finds it in its module scope. It then executes a C function like:

    _PyObject_TrackName(m, "spam.eggs.ham", &fastlocals[n])

   where m is the module object with an attribute spam.

2. The module object strips the leading spam. and stores the necessary information (eggs.ham and &fastlocals[n]) in case its binding for the name eggs changes. It then locates the object referred to by the key eggs in its dict and recursively calls:

    _PyObject_TrackName(eggs, "eggs.ham", &fastlocals[n])

3. The eggs object strips the leading eggs., stores the (ham, &fastlocals[n]) info, locates the object in its namespace called ham and calls _PyObject_TrackName once again:

    _PyObject_TrackName(ham, "ham", &fastlocals[n])

4. The ham object strips the leading string (no '.' this time, but that's a minor point), sees that the result is empty, then uses its own value (self, probably) to update the location it was handed:

    Py_XDECREF(&fastlocals[n]);
    &fastlocals[n] = self;
    Py_INCREF(&fastlocals[n]);

   At this point, each object involved in resolving spam.eggs.ham knows which entry in its namespace needs to be tracked and what location to update if that name changes. Furthermore, if the one name it is tracking in its local storage changes, it can call _PyObject_TrackName using the new object once the change has been made. At the bottom end of the food chain, the last object will always strip a name, see the empty string and know that its value should be stuffed into the location it's been passed.

5. When the object referred to by the dotted expression spam.eggs.ham is going to go out of scope, an UNTRACK_GLOBAL spam.eggs.ham n instruction is executed. It has the effect of deleting all the tracking information that TRACK_GLOBAL established.

The tracking operation may seem expensive, but recall that the objects being tracked are assumed to be 'almost constant', so the setup cost will be traded off against hopefully multiple local instead of global loads. For globals with attributes the tracking setup cost grows but is offset by avoiding the extra LOAD_ATTR cost. The TRACK_GLOBAL instruction needs to perform a PyDict_GetItemString for the first name in the chain to determine where the top-level object resides. Each object in the chain has to store a string and an address somewhere, probably in a dict that uses storage locations as keys (e.g. the &fastlocals[n]) and strings as values. (This dict could possibly be a central dict of dicts whose keys are object addresses instead of a per-object dict.) It shouldn't be the other way around because multiple active frames may want to track spam.eggs.ham, but only one frame will want to associate that name with one of its fast locals slots.
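The recursive tracking scheme can be imitated in pure Python to make the data flow concrete. This is only a sketch under assumptions, not the PEP's implementation: the `Namespace` class, its `track` and `untrack` methods and the `_fill` helper are hypothetical, a plain list stands in for the frame's fastlocals array, and each object keeps its own watcher table rather than a central registry.

```python
class Namespace:
    """Hypothetical namespace that pushes rebinding updates into tracked slots."""

    def __init__(self, **attrs):
        # name -> list of (remaining dotted path, slot list, slot index)
        object.__setattr__(self, "_watch", {})
        self.__dict__.update(attrs)

    def track(self, dotted, slots, index):
        head, _, rest = dotted.partition(".")
        self._watch.setdefault(head, []).append((rest, slots, index))
        _fill(getattr(self, head), rest, slots, index)

    def untrack(self, dotted, slots, index):
        head, _, rest = dotted.partition(".")
        self._watch[head] = [
            entry for entry in self._watch.get(head, [])
            if not (entry[0] == rest and entry[1] is slots and entry[2] == index)
        ]
        child = self.__dict__.get(head)
        if rest and isinstance(child, Namespace):
            child.untrack(rest, slots, index)

    def __setattr__(self, name, value):
        entries = list(self._watch.get(name, ()))
        old = self.__dict__.get(name)
        for rest, slots, index in entries:
            if rest and isinstance(old, Namespace):
                old.untrack(rest, slots, index)  # drop links through the old object
        object.__setattr__(self, name, value)
        for rest, slots, index in entries:
            _fill(value, rest, slots, index)  # refresh every tracked slot


def _fill(obj, rest, slots, index):
    """Store obj in the slot once the dotted path is exhausted, else recurse."""
    if not rest:
        slots[index] = obj
    elif isinstance(obj, Namespace):
        obj.track(rest, slots, index)
    else:
        for part in rest.split("."):
            obj = getattr(obj, part)
        slots[index] = obj


module = Namespace(spam=Namespace(eggs=Namespace(ham="ham v1")))
fastlocals = [None]                      # stand-in for a frame's fast locals
module.track("spam.eggs.ham", fastlocals, 0)
print(fastlocals[0])
module.spam.eggs.ham = "ham v2"          # rebinding the leaf updates the slot
print(fastlocals[0])
module.spam.eggs = Namespace(ham="ham v3")  # rebinding mid-chain re-resolves
print(fastlocals[0])
```

Rebinding spam.eggs to an object without a ham attribute raises AttributeError from inside the assignment in this sketch, which is one concrete form of the missing-attribute problem.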

Threading

What about this (dumb) code?:

It's not clear from a static analysis of the code what the lock is protecting. (You can't tell at compile-time that threads are even involved, can you?) Would or should it affect attempts to track l.append or math.sin in the fill_l function?
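A sketch of the kind of code in question, assuming a module-level list `l` guarded by a `threading.Lock` and the `fill_l` function named above. The `consume_l` function, the `count` and `expected` parameters and the driver lines are hypothetical additions so the example runs to completion:

```python
import math
import threading

l = []
lock = threading.Lock()


def fill_l(count=1000):
    for i in range(count):
        with lock:
            l.append(math.sin(i))  # both l.append and math.sin are looked up here


def consume_l(expected):
    consumed = []
    while len(consumed) < expected:
        with lock:
            if l:
                consumed.append(l.pop())
    return consumed


results = []
producer = threading.Thread(target=fill_l, args=(100,))
consumer = threading.Thread(target=lambda: results.extend(consume_l(100)))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(len(results))
```

Nothing in the source text of fill_l tells a compiler that the lock exists to protect l, which is the difficulty the question raises.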

If we annotate the code with mythical track_object and untrack_object builtins (I'm not proposing this, just illustrating where stuff would go!), we get:

Is that correct both with and without threads (or at least equally incorrect with and without threads)?

Nested Scopes

The presence of nested scopes will affect where TRACK_GLOBAL finds a global variable, but shouldn't affect anything after that. (I think.)

Missing Attributes

Suppose I am tracking the object referred to by spam.eggs.ham and spam.eggs is rebound to an object that does not have a ham attribute. It's clear this will be an AttributeError if the programmer attempts to resolve spam.eggs.ham in the current Python virtual machine, but suppose the programmer has anticipated this case:

You can't raise an AttributeError when the tracking information is recalculated. If it does not raise AttributeError and instead lets the tracking stand, it may be setting the programmer up for a very subtle error.
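Defensive code of the kind described might look like the sketch below. `SimpleNamespace` objects stand in for the module spam and its eggs attribute, and the `describe` function and the bacon fallback are assumptions for illustration:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the module `spam` and its attribute `eggs`.
spam = SimpleNamespace(eggs=SimpleNamespace(ham="ham value"))


def describe():
    # The programmer checks before touching an attribute that may be missing.
    if hasattr(spam.eggs, "ham"):
        return spam.eggs.ham
    elif hasattr(spam.eggs, "bacon"):
        return spam.eggs.bacon
    return None


print(describe())
# Rebind spam.eggs to an object that has no `ham` attribute.
spam.eggs = SimpleNamespace(bacon="bacon value")
print(describe())
```

With ordinary lookups the second call falls through to the bacon branch. A tracker that eagerly re-resolved spam.eggs.ham at the moment of rebinding would have to fail or keep a stale value at a point where the programmer never asked for ham.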

One solution to this problem would be to track the shortest possible root of each dotted expression the function refers to directly. In the above example, spam.eggs would be tracked, but spam.eggs.ham and spam.eggs.bacon would not.

Who does the dirty work?

In the Questions section I postulated the existence of a _PyObject_TrackName function. While the API is fairly easy to specify, the implementation behind-the-scenes is not so obvious. A central dictionary could be used to track the name/location mappings, but it appears that all setattr functions might need to be modified to accommodate this new functionality.

If all types used the PyObject_GenericSetAttr function to set attributes that would localize the update code somewhat. They don't however (which is not too surprising), so it seems that all getattrfunc and getattrofunc functions will have to be updated. In addition, this would place an absolute requirement on C extension module authors to call some function when an attribute changes value (PyObject_TrackUpdate?).

Finally, it's quite possible that some attributes will be set by side effect and not by any direct call to a setattr method of some sort. Consider a device interface module that has an interrupt routine that copies the contents of a device register into a slot in the object's struct whenever it changes. In these situations, more extensive modifications would have to be made by the module author. To identify such situations at compile time would be impossible. I think an extra slot could be added to PyTypeObjects to indicate if an object's code is safe for global tracking. It would have a default value of 0 (Py_TRACKING_NOT_SAFE). If an extension module author has implemented the necessary tracking support, that field could be initialized to 1 (Py_TRACKING_SAFE). _PyObject_TrackName could check that field and issue a warning if it is asked to track an object that the author has not explicitly said was safe for tracking.

Jeremy Hylton has an alternate proposal on the table [2]. His proposal seeks to create a hybrid dictionary/list object for use in global name lookups that would make global variable access look more like local variable access. While there is no C code available to examine, the Python implementation given in his proposal still appears to require dictionary key lookup. It doesn't appear that his proposal could speed local variable attribute lookup, which might be worthwhile in some situations if potential performance burdens could be addressed.

I don't believe there will be any serious issues of backward compatibility. Obviously, Python bytecode that contains TRACK_OBJECT opcodes could not be executed by earlier versions of the interpreter, but breakage at the bytecode level is often assumed between versions.

TBD. This is where I need help. I believe there should be either a central name/location registry or the code that modifies object attributes should be modified, but I'm not sure the best way to go about this. If you look at the code that implements the STORE_GLOBAL and STORE_ATTR opcodes, it seems likely that some changes will be required to PyDict_SetItem and PyObject_SetAttr or their String variants. Ideally, there'd be a fairly central place to localize these changes. If you begin considering tracking attributes of local variables you get into issues of modifying STORE_FAST as well, which could be a problem, since the name bindings for local variables are changed much more frequently. (I think an optimizer could avoid inserting the tracking code for the attributes for any local variables where the variable's name binding changes.)

I believe (though I have no code to prove it at this point), that implementing TRACK_OBJECT will generally not be much more expensive than a single LOAD_GLOBAL instruction or a LOAD_GLOBAL/LOAD_ATTR pair. An optimizer should be able to avoid converting LOAD_GLOBAL and LOAD_GLOBAL/LOAD_ATTR to the new scheme unless the object access occurred within a loop. Further down the line, a register-oriented replacement for the current Python virtual machine [3] could conceivably eliminate most of the LOAD_FAST instructions as well.

The number of tracked objects should be relatively small. All active frames of all active threads could conceivably be tracking objects, but this seems small compared to the number of functions defined in a given application.

[1]	https://mail.python.org/pipermail/python-dev/2000-July/007609.html

[2]	http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobalsPEP

[3]	http://www.musi-cal.com/~skip/python/rattlesnake20010813.tar.gz

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/master/pep-0266.txt
