Have you ever thought what if a search engine (like Google, Bing or Yahoo) took hours to answer your search queries? Well, neither do I. But I presume that most people would be angry and just stop using them. This assumption is corroborated by a 2009 study that revealed that a delay of 2 seconds in delivering search results may impact companies’ revenue in over 4% per user; in other words, slow answers equals to less cash flow.
Big companies have many ways to address this (quality-of-service) issue and make this response time faster: the most obvious of them is simply deploying faster processors, more memory caches and upgrading network speed for distributed computing. However, this approach is not really the most efficient as there are financial (deploying more servers cost money) and spatial (your datacenter has limited space) constraints. Jeff Dean shows some manners to circumvent these constraints and maximize the system’s efficiency while guaranteeing the same quality-of-service for all users. I’ll discuss one of them here.
First of all, what is ‘search’? On the user perspective, it’s possible to define it as being the process where for a given ‘kw‘ keyword – with variable length l – we obtain a ranked set of ‘S‘ answers from our search-engine database; the set of keywords may be also known as ‘query’. From the given definition, we can infer that a search engine has 3 different applications that act concurrently: the one who searches for new content (known as crawling), the one who make this content readable for our engine (indexing) and the one who scores the content according to our interest (ranking). For more technical details, Larry Page’s paper (Anatomy of a Search Engine) goes way further in explaining those processes.
In most of the cases, ranking is the serviceable application for users, and their perception of the service may heavily depend on how faster or slow ranking is both computed and returned as a result. As a corollary, optimizing computational resources usage for maximum efficiency while maintaining a consistent quality of service is economically critical for server operators. However, having only powerful processors at the datacenter is not enough, as it is empirically shown by  and Figure 1 that the relationship between energy consumption and the related number of operations per second is almost exponential.
[ Fig 1 ]
Now, let’s assume that we have two types of systems: the brawn-core systems and the wimpy-core systems, where the former has a high single-core performance while the latter is more power-efficient and has lower single-core performance. Let’s also assume that, for stochastic reasons, not all queries are equal. With those assumptions, the conclusion that different queries take different times to be processed is also reasonable. That’s the root of our efficiency problem.
If we set a soft deadline of 2000 milliseconds, as proposed by the study described at the beginning of this text, we’ll notice that part of those queries may take longer processing time than 2000 milliseconds on the wimpy-core systems, while significantly less time on the brawny-core systems – which is pretty safe to assume. So, if we can detect which queries will take longer than our defined deadline and move then to a brawny-core, we (i) improve the quality-of-service, as the user-sent request will not break the deadline, and (ii) also improve energy efficiency as only
The big point is that not all search queries are equal.
The Part 2 of this series will be related to my own solution for this problem, which is more about software profiling and instrumenting. Hope to see you there!
Author’s note: This is part of my in-progress master’s thesis. Except when referenced, graphs and results were created by me. Please do not use any of the content listed here without my explicit consent.
 Energy study.
 Tail at Scale.
 Anatomy of a Search Engine
 18-447-S18-L01-S1, James C. Hoe, CMU/ECE/CALCM, 2018