Ticket #198 (closed enhancement: fixed)

Opened 10 months ago

Last modified 10 months ago

poor encoding performance

Reported by: akaihola Owned by: nick
Priority: major Milestone: 0.2
Component: Encoder Version: 0.1
Keywords: Cc:

Description (last modified by thijs) (diff)

While stress-testing PyAMF with our worst-case data, a data structure of ~1500 objects took 19 seconds to encode in AMF3.

Running the test through cProfile we saw over a million calls to the __cmp__() method of our custom classes.

Check the related callgrind data on the ProfilingResults page.

Attachments

ticket198-encoder-optimization.diff (7.0 KB) - added by akaihola 10 months ago.
two optimizations to the encoder, description below
class_definition_reverse_lookup.diff (6.6 KB) - added by nick 10 months ago.
Patch that stores a reverse lookup for class definitions in AMF3
type_map_cache.diff (1.1 KB) - added by akaihola 10 months ago.
performance patch: caches encoder functions keyed by element class, all tests pass
encode-int.diff (1.0 KB) - added by akaihola 10 months ago.
Performance patch: replace BufferedByteStream with str in _encode_int()
caching-class-alias.diff (1.8 KB) - added by nick 10 months ago.
Based on your work for _writeElementFunc, I have implemented a cache for ClassAlias

Change History

Changed 10 months ago by akaihola

two optimizations to the encoder, description below

  Changed 10 months ago by akaihola

The __cmp__() calls were largely caused by a == comparison which could be replaced by an is comparison without ill effects.

Another cause for the calls were calls to self.classes.index() in the amf3.Context class. To eliminate them, we added a reverse mapping from class_def objects to their AMF reference numbers. Non-hashable objects are pickled with cPickle.dumps() and wrapped in a private class to prevent collisions.

Even though hashing and pickling do add some overhead, it proved to be vastly less than that caused by the list.index() and object.__cmp__() calls. This change dropped the encoding time from 19 to 4.5 seconds on our test data.

A side effect from the hash/pickle trick is that new-style classes must be defined on the module level and not inside functions or other classes. Is this a too severe restriction?

  Changed 10 months ago by nick

  • status changed from new to accepted

No attempt at optimisation has been attempted yet, we want to get a stable code base/test suite before we do. That being said I think that obvious areas of the code where things are a bit stupid (like using .index) we can correct.

I don't think that using cPickle is the right way to go here, apart from the reason stated, a quicker optimisation would be to use the id function and store a reverse lookup that way. This should give similar results as seen in the patch above, if not better.

I will attach a patch that does just that.

Changed 10 months ago by nick

Patch that stores a reverse lookup for class definitions in AMF3

  Changed 10 months ago by akaihola

Ah, I must have messed up something when I tried using id(). Your patch works and is of course a bit faster than mine, too.

Do you have an estimate of when optimization will be looked into, and what kind of solutions might be used? If it's low in your priorities, we might consider spending some time with it because we need to squeeze response times as low as possible with large datasets.

After your patch the profiling diagram looks very even, so there's no single obvious slug left. I also tried to run the encoding with psyco, but it didn't affect the performance at all. I suppose that's because the functions are very short and simple so most of the time is spent in calling functions.

One spot with a moderate effect to the performance with our dataset is BaseEncoder._writeElementFunc(), which iterates the list of possible data types to find the correct encoding function, but I can't think of clean optimizations. Complex tricks like re-ordering the type map on the fly based on type frequencies or doing some kind of caching might help.

Another expensive spot is the StringIOProxy._get_len() in util.py.

  Changed 10 months ago by akaihola

I posted some related callgrind data on the ProfilingResults page.

  Changed 10 months ago by nick

  Changed 10 months ago by akaihola

I just quickly experimented with an idea: caching _writeElementFunc() return values keyed by data.__class__. It reduced the cost of _writeElementFunc() from 15% down to 1% and sped up our test run from 2.65 to 2.27 secs. Patch below, all tests pass.

Changed 10 months ago by akaihola

performance patch: caches encoder functions keyed by element class, all tests pass

follow-up: ↓ 12   Changed 10 months ago by akaihola

Oh my, now adding an attachment to a ticket damages the ticket just like on the PerformanceResults? wiki page last night. I hope adding another comment will fix it. Something badly wrong with this trac.

  Changed 10 months ago by akaihola

ProfilingResults I meant of course, and yes, the comment fixed the ticket.

  Changed 10 months ago by akaihola

Another dirty trick: replace BufferedByteStream with a string in _encode_int(). Effect: cost of _writeInteger() down from 46.6% to 17.6%. Execution time down from 2.27 to 1.16 secs. This was with the type map cache patch and the patch below on top of [1053].

Changed 10 months ago by akaihola

Performance patch: replace BufferedByteStream with str in _encode_int()

  Changed 10 months ago by akaihola

Attachment broke the ticket again, reviving with this comment.

Changed 10 months ago by nick

Based on your work for _writeElementFunc, I have implemented a cache for ClassAlias

  Changed 10 months ago by nick

The attachment does break some unit tests, but I have looked into it and many unit tests surrounding class management are grouped together so there are snippets like this:

pyamf.register_class(Spam, 'abc.xyz')

x = Spam({'spam': 'eggs'})

# do some testing here .. blabla

pyamf.unregister_class(Spam)

pyamf.register_class(Spam, 'abc.xyz', more_args_here)

# test something slightly different

pyamf.unregister_class(Spam)

So essentially registering/unregistering classes using the same encoder instance which is not a reflection of what will happen in the real world ... certainly during one encode process.

in reply to: ↑ 7   Changed 10 months ago by thijs

Replying to akaihola:

Oh my, now adding an attachment to a ticket damages the ticket just like on the PerformanceResults? wiki page last night. I hope adding another comment will fix it. Something badly wrong with this trac.

I updated the trac trunk to r6538 (one we used till now was r6513) so I hope they fixed the attachment issues;

U    trac/attachment.py
U    trac/ticket/default_workflow.py
U    trac/timeline/web_ui.py
Updated to revision 6538.

  Changed 10 months ago by thijs

fyi: ecarter screwed up in r6513 which was reverted again in r6535 by cboos so I'm convinced the attachment issues are fixed now.

  Changed 10 months ago by thijs

  • description modified (diff)

  Changed 10 months ago by nick

  • status changed from accepted to closed
  • resolution set to fixed

Fixed in r1068.

Note: See TracTickets for help on using tickets.