The trading systems running on exchanges and other trading venues are commonly written in C/C++. Exchanges quote round-trip times through the exchange in microseconds, measured from when an order message is received from a market participant to when the fill message is transmitted back to that same participant. These figures do not include any network latency to or from the exchange. The leading exchange is able to fill orders in under 200 microseconds.
Application developers use the gettimeofday() system call to obtain the current time from the system in microseconds. As speeds increase, developers are challenged to write faster and more efficient code. Calls to gettimeofday() often remain in production code so latencies can be measured at different times of the day, particularly the peak periods of 9:30-10:00am and 3:45-4:00pm. However, constantly querying the system clock for the current time carries overhead of its own.
Developers have started to look for more precise timing methods. Linux provides access to the HPET (High Precision Event Timer), which can return the current time in nanoseconds rather than microseconds. Using the HPET, we can measure the overhead of calls to gettimeofday().
A C program was written and compiled under Linux to run tests on both Intel and Power based systems. The code was compiled without optimizations, using the same options on both platforms. The program makes 100 gettimeofday() calls and averages the elapsed time to estimate the overhead of obtaining the current time. Calls to the HPET achieve nanosecond granularity, but they too carry overhead; averaging over 100 calls should minimize its effect. This measurement was repeated 10,000 times to see how much variability exists between the systems.
Intel Setup: Dual-socket 2.4 GHz Nehalem. RHEL 6.2.
Power Setup: Dual-socket 3.7 GHz 6-core IBM POWER7. RHEL 6.2.
Run 1: LPAR with 4 dedicated CPUs
Run 2: LPAR with 12 shared CPUs in a pool
Code Block used for testing
for(i=0; i < NUM_RUNS; i++)
Summary (all times in nanoseconds, lower is better)
Graph of Results
1) There was significant jitter in the first few results of any given run. The first 10 results were omitted from the test results, as the program seemed to "settle down" to more consistent timings after that.
2) The Intel system exhibited significantly more jitter and more extreme outliers during testing.
3) Contrary to the expected result, the average time for the timer call was lower in the LPAR with shared CPU resources than in the one with dedicated resources. However, the standard deviation was lower with dedicated CPUs, suggesting less jitter.
Frequent calls to gettimeofday() can have a measurable impact on system performance. Developers need to be wary of making too many calls to the timer and should use them only when necessary, as the overhead quickly adds up. In addition, the Intel platform showed some significant outliers that could cause slowdowns: while a call typically takes less than 100 nanoseconds, an Intel-based system can occasionally take close to 3 microseconds. This test can only highlight the outliers; it cannot determine their cause. The IBM Power platform provided faster and more consistent access to the timer.
Director of Engineering
Integration Systems, LLC