The Search Dilemma - Part 5
This really took some time, but it’s here! My paper “Profile-guided frequency scaling for Latency-Critical Search Workloads” was accepted and presented at CCGrid’21, pending only indexing at ACM/IEEE digital libraries. Thus, I am leaving the Preprint version here as I am not sure of the policies in regard of sharing the indexed paper.
A brief explanation of the paper is below.
This paper is essentially a trimmed version of my master’s thesis. After working with Xapian, we found too many limitations with this approach (not enough time to apply the amount of modifications required) and decided to go back to working with Elasticsearch.
The main insight of the paper (and my thesis) is about understanding the application’s behavior for every type of request. While our original min-max approach didn’t work, we found that running most requests at a intermediate state and promoting only the ‘heavy’ queries to the maximum frequency is the optimal way to save energy. When the requests are out of the so called ‘hot function’, they will go to the lowest energy state available.
The main question is: how do we determine if a query is considered ‘heavy’ enough to be promoted to the highest frequency? As we aren’t doing any lexical evaluation of the queries (since we already know that service time is proportional to query length), we resorted to a threshold: if the query stays at the hot function for a time longer the threshold, we promote it. Of course, anyone can see that there is some fragility in this approach. The main one is about how to determine this threshold value, and another one is “what if the requests distribution changes?”. The answer for the formar question is pretty straight-forward: experimentation and find what works. For the second, I’ll discuss later.
There are few downsides to our approach. First, we compromise a part of the service time in exchange for saving energy consumption - while our application met the deadlines, this may not be true for all applications. Second, the entire idea only makes sense if there’s an actual hot function to be profiled within the application - some heavily optimized applications may not be able to use this approach. Finally, this work was only possible because we were using the Java Virtual Machine’s resources to instrument Elasticsearch on-the-fly. Instrumenting binary applications on function-level is probably a lot harder.
Most of this work was done in an insight, as we tried a lot of things that didn’t work - and as my time limit for the master’s was coming near, I had to publish what I had. Very well, if I had more time, these are the main ways I think that this work could be further improved:
- Mathematical Framework: While this work is empirically proven to work, a theoretical framework would be nice to have. Will this problem work for a ‘four-state’ policy? Or a ‘thousand-state’ one? If you have a great range of frequencies to choose and also a great number of requests types, the framework becomes a lot useful.
- Automatic Profiling: Sometimes, the hot function changes - in Elasticsearch, the indexing function can be hotter than the searching function sometimes. I personally believe that this approach would incur in a lot of overhead, hence the best approach for this idea would be sampling the application every X minutes (or seconds) and act accordingly to the hot function.
- Variable Threshold: If your request pattern changes, the threshold will need to change as well. I believe a good formulation for this idea would be using Reinforcement Learning.
- Multi-architectural Systems: We only use a DVFS/DFS system here, and AMP (GPUs, FPGA) ones should be taken in account as well. As seen in older parts of this series, transferring data can be costly for the service time, hence the filtering of even more heavy queries should be done cautiously.