Daniel Medeiros


PhD Student @ KTH
Data Engineer / DevOps


The Search Dilemma - Part 3

Note: The approach described in this post was tested more than a year ago, and we (my master’s advisor and I) decided to drop it because the actual implementation ran into too many technical barriers (e.g. real-time profiling with low overhead). Most of what follows are insights I had later.

In the last post, we discussed a first approach to the hot function model: whenever a thread entered that zone, it would be promoted to the faster cores and, when exiting, demoted back to the slower cores. The first results did not show any improvement in quality of service. Why?
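To make the idea concrete, here is a minimal sketch of what that promote/demote pattern could look like on Linux. It is an approximation, not the original code: it assumes a big.LITTLE layout where CPUs 4-7 are the big cores and 0-3 the LITTLE ones (both illustrative), it relies on the `taskset` utility, and it pins the whole JVM rather than a single thread, since plain Java cannot obtain the native thread id without a native call.

```java
// Minimal sketch of the promote/demote idea around a hot function.
// Assumptions: Linux with the `taskset` utility, CPUs 4-7 = big cores,
// CPUs 0-3 = LITTLE cores (illustrative, not from the original setup).
import java.io.IOException;

public class HotZone {

    // Pin all threads of this JVM to the given CPU list. A per-thread
    // version would need the native thread id (gettid), which plain Java
    // does not expose - exactly the kind of OS interfacing discussed below.
    static void pinProcess(String cpuList) {
        long pid = ProcessHandle.current().pid();
        try {
            new ProcessBuilder("taskset", "-a", "-p", "-c", cpuList, Long.toString(pid))
                    .inheritIO()
                    .start()
                    .waitFor();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // The "hot function": promoted to big cores on entry, demoted on exit.
    static void hotFunction() {
        pinProcess("4-7");          // promote
        try {
            // ... latency-critical search work would go here ...
        } finally {
            pinProcess("0-3");      // demote
        }
    }
}
```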

At the time, we concluded that the main reason was the overhead of the application code that interfaces with the operating system to promote/demote threads - plus the fact that threads may take some time to migrate from one core to another. The problem with migrating all of a process’s threads is simply that everything happens so fast that the OS cannot handle all those requests in real time. To test this point, we ran an experiment: we developed a scheduling heuristic for the case where the number of physical/processing cores equals the number of threads and compared three configurations: the ‘pure’ version, the ‘instrumented’ version and the ‘scheduled’ version. The first takes no modifications; in the second, all the scheduling code is compiled into the bytecode; our version is the ‘scheduled’ one, where a Java agent acts as the scheduler, overriding the ARM board’s default one. Results?
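For context, a Java agent of that kind is wired in through the java.lang.instrument API: the JVM calls its premain method before the application’s main, and a registered ClassFileTransformer sees every class as it is loaded. The skeleton below is only a hedged illustration of the mechanism; the class name org/example/SearchService is a placeholder, and the actual bytecode rewriting (usually done with a library such as ASM or Javassist) is elided.

```java
// Skeleton of a scheduling Java agent. The transformer runs at class-loading
// time; the runtime overhead comes from the promote/demote calls it injects
// and from the agent's own scheduling logic running alongside the application.
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class SchedulerAgent {
    // Invoked by the JVM before main() when started with -javaagent:agent.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                // Only the class containing the hot function is of interest.
                if (!"org/example/SearchService".equals(className)) {
                    return null; // leave every other class untouched
                }
                // Here the bytecode would be rewritten so that the hot method
                // calls the promote/demote routine on entry and exit. Those
                // injected calls are the source of the overhead measured above.
                return null;
            }
        });
    }
}
```

The agent jar needs a Premain-Class entry in its manifest and is attached with the -javaagent flag, so it sits between the JVM and the application for the whole run.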

Since lower is better, the pure version clearly beats the other two. Maybe the heuristic is not good enough? Perhaps, but the difference between the instrumented version and the scheduled one makes it clear that introducing a Java agent raises the processing overhead considerably. A sensible approach to reducing this overhead is simply to decrease the number of requests that are considered for promotion. How do we do that? We thought of two ways at the time:

Another experiment partially showed that decreasing the number of promotions/demotions increases the actual service time. But while both of those approaches could be implemented to some degree, the main issue is that a good heuristic for situations where the number of threads is higher than the number of processors - and Elasticsearch falls in this category - is pretty much unknown, and some studies on core sharing and core usage would be necessary. Finally, there is the fact that some requests may take longer not only because of the processing time itself: disk operations (for searching a database, for example) may also take a good chunk of the total service time. Hence, full real-time profiling is necessary, and the Java Virtual Machine makes it unlikely due to its sandboxed model (I like to emphasize this because of the number of issues I had with the JVM/OpenJDK at the time).
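As an illustration of the general idea of promoting fewer requests - and only as an assumption, not one of the two options we actually considered - here is a sketch where a query is promoted only after it has already been running longer than a threshold, so short queries never pay the migration cost at all. The threshold and names are made up.

```java
import java.util.concurrent.*;

public class LazyPromoter {
    private static final long PROMOTE_AFTER_MS = 2; // arbitrary threshold

    // Run the query on a worker; promote only if it is still running after
    // the threshold, so short queries finish on the slow cores untouched.
    public static void handle(Runnable query) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> result = worker.submit(query);
        try {
            result.get(PROMOTE_AFTER_MS, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slowQuery) {
            // promote: pin the worker to the big cores (see the taskset sketch)
            result.get(); // let the promoted query finish
            // demote: pin back to the LITTLE cores
        } finally {
            worker.shutdown();
        }
    }
}
```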

But instead of looking only at service time, I think this problem could be better handled from the energy point of view. If you migrate to the big cores only what is strictly necessary and keep the largest share of queries running on the slower cores, you would pretty much save energy (and money) without violating the deadline. In the end, instead of using heterogeneous cores, we decided to move to variable-frequency processors and a simpler search application - Xapian (thankfully written in C++) - and shifted to investigating the energy approach.
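For reference, the variable-frequency direction boils down to driving Linux cpufreq instead of migrating threads: keep the cores at a low clock by default and raise it only when a query risks missing its deadline. The sketch below uses the standard sysfs knobs; it needs root, the userspace governor is not available on every platform, and the frequency values are illustrative rather than measurements from our setup.

```java
// Minimal sketch of per-core frequency scaling via Linux cpufreq sysfs files.
import java.io.IOException;
import java.nio.file.*;

public class CpuFreq {
    static void write(int cpu, String file, String value) throws IOException {
        Path p = Paths.get("/sys/devices/system/cpu/cpu" + cpu + "/cpufreq/" + file);
        Files.writeString(p, value);
    }

    public static void main(String[] args) throws IOException {
        // Run at a low frequency most of the time to save energy...
        write(0, "scaling_governor", "userspace");
        write(0, "scaling_setspeed", "800000");   // kHz, illustrative value
        // ...and raise the clock only when a deadline is at risk.
        write(0, "scaling_setspeed", "2000000");  // kHz, illustrative value
    }
}
```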
