From times to times, I like to reproduce papers that will ease my life at the Geological Survey of Brazil. However, I usually stumble exactly in reproducibility issues – lack of the implementation code, parameters or even the shown dataset – and that’s exactly what happened when I tried to reproduce the paper “Towards the automated analysis of regional aeromagnetic data“, from Holden et al.
Note: The approach described by this topic was tested more than 1 year ago, and we (me and my master’s advisor) decided to drop it as the actual implementation found too many technical barriers (e.g. real-time profiling with low overhead). Mostly of what follows here are some insights I had later.
On the last post, we discussed a first approach to the hot function model: whenever a thread accessed that zone, it would be promoted to the faster cores and, when exiting, demoted to slower cores. The first result did not show any improvements on quality of service. Why?
One of the works the Geophysicist usually do at the Geological Survey of Brazil is the correlation between geophysics and geology for geological mapping. This task is done through the use of gammaspectrometric data – and, since its penetration (skin depth) is very low, the data is deeply correlated to the surface geology. By assigning colors to Potassium, Thorium and Uranium values, it is possible to generate what is known as ‘ternary map’ – pretty much a geological map. As this map is usually shown in RGB, the range of colors is about 16 million – and this is a issue for interpretation. In this post, we discuss a method to reduce it to 27 colors.
Note: Me and my coworkers are (very slowly) working on this! Check on Github! We hope to have some proof-of-concept in a few months time.
Since I joined the Geological Survey of Brazil, I’ve had some issues with geophysical data processing softwares – mainly for those who are designed for grav/mag/gamma datasets. The softwares lacks either usability (UX Experience) or lacks performance. The GeoFront (a wordplay with the ‘Neon Genesis Evangelion’ series) comes to solve both of this problem while providing a new model for data processing.
One of the most interesting things I’ve come across during my masters years is how applications behave. Obviously, some applications are more prone to code optimization than others, and those applications will most likely be composed of some cpu-intensive functions that may eventually turn into bottlenecks if put in a heavy-load production environment. One of my hypothesis to maintain quality of service and reduce energy consumption consisted in analyzing that specific hot function and monitoring threads – a thread in a core would eventually have its operating frequency upgraded while executing that function and, after it exits the hot function, the operating frequency would be degraded. The assumption is that some functions does not need to execute as faster as the hot function, and hence is consuming more energy.
Have you ever thought what if a search engine (like Google, Bing or Yahoo) took hours to answer your search queries? Well, neither do I. But I presume that most people would be angry and just stop using them. This assumption is corroborated by a 2009 study[R1] that revealed that a delay of 2 seconds in delivering search results may impact companies’ revenue in over 4% per user; in other words, slow answers equals to less cash flow.
Big companies have many ways to address this (quality-of-service) issue and make this response time faster: the most obvious of them is simply deploying faster processors, more memory caches and upgrading network speed for distributed computing. However, this approach is not really the most efficient as there are financial (deploying more servers cost money) and spatial (your datacenter has limited space) constraints. Jeff Dean[R2] shows some manners to circumvent these constraints and maximize the system’s efficiency while guaranteeing the same quality-of-service for all users. I’ll discuss one of them here.