APC Australia

Write your own benchmark apps

Benchmarking is a key part of any technology evaluation. Darren Yates explains how to write your own test apps.


Many of you may remember ‘PC User’ magazine, the predecessor of our sister publication, TechLife. Back in 2005, PC User, like many other publications, was using benchmark apps from a major US publisher to carry out product reviews, when we received word those apps would no longer be developed. That left us in a bit of a pickle. However, it also provided an opportunity. Instead of looking for another benchmark suite filled with other people’s ideas of what makes a decent test, I figured why not use the then-ten years’ experience I’d built up reviewing PC products and develop a benchmark suite that fits our readers’ needs?

A couple of months later, the first of PC User’s UserBench benchmark tests was born and we began using it in the magazine. Development continued and new releases were produced for another seven years. To our knowledge, UserBench was the first PC benchmark suite developed in Australia.

You don’t necessarily need to be a professional programmer to find opportunities to put your skills into practice in your workplace or business. Understanding the problem and what is needed (broadly called ‘Systems Analysis’), along with building the solution (‘Systems Design’), are skills you may be able to grow through understanding your own job or career (‘Domain Knowledge’).

Play around with PC gear long enough and you’ll eventually run into benchmark testing, either evaluating your own gear or reading the results of others. But learning how to build and run even the simplest performance tests is a skill that can go way beyond CPUs and motherboards.

PRECISION TIMING

Any software benchmark test is about repeatable testing of system performance. Essentially, you’re looking to time how long it takes for a device to perform some given task, whether it’s a game demo, processing video, Javascript, HTML or whatever, without having to resort to a stopwatch. That starts with understanding how to code precision timing. You want to be able to independently and automatically measure the time it takes for your benchmark process to complete. You can choose whatever app you like to form the basis of your benchmark process, but you want the timing precise and accurate.

Java has a number of different ways to measure time, depending on your application. For situations where sub-second resolution is required, the most common option is the System.currentTimeMillis() method (http://tinyurl.com/hgwzymp). It returns a ‘long’ (a signed 64-bit, eight-byte integer) containing the current millisecond count since the Unix epoch (12am on 1 January 1970).

However, there’s a very good argument that it’s the wrong option for benchmarking. The reason is that currentTimeMillis() forms the basis of what’s described as Java’s ‘time-of-day’ clock, which is taken from the operating system. Since operating system time is frequently adjusted for accuracy (no PC clock is perfect), the currentTimeMillis() value also gets adjusted, being Java’s representation of absolute time. Furthermore, according to the Java docs, you can’t guarantee the precision or ‘granularity’ of currentTimeMillis() will be better than 10 milliseconds, the common update time for many operating systems. These factors can affect the accuracy of timing systems.

The better alternative in this instance is to use System.nanoTime(). The original Java developers created nanoTime() to provide a precision timer that measures ‘elapsed’ time rather than ‘absolute’ time. That means it’s never updated or adjusted to account for the correct time of day; it just measures time that has passed, making it a more suitable option. According to the Java docs, nanoTime() returns the nanoseconds since some arbitrary time origin as a ‘long’ datatype, similar to currentTimeMillis().
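To see the difference between the two clocks, here’s a minimal stand-alone sketch that simply reads both; the class name and printed labels are only for illustration:

// Minimal sketch: reading Java's two clocks side by side.
public class ClockDemo {
    public static void main(String[] args) {
        // Wall-clock ('time-of-day') reading: milliseconds since the Unix epoch.
        long wallClockMs = System.currentTimeMillis();

        // Elapsed-time reading: nanoseconds since an arbitrary origin.
        // Only the difference between two nanoTime() readings is meaningful.
        long elapsedNs = System.nanoTime();

        System.out.println("currentTimeMillis(): " + wallClockMs + " ms since the epoch");
        System.out.println("nanoTime():          " + elapsedNs + " ns (arbitrary origin)");
    }
}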

HOW TO MEASURE INTERVAL TIME

The process for measuring time elapsed or time over an interval is pretty easy to learn. You start by taking the current nanoTime() reading and storing it in a ‘long’ variable: long timeStart = System.nanoTime();

You then carry out whatever task you want your benchmark to perform and take a new nanoTime() reading at the end: long timeFinish = System.nanoTime();

Subtract the former from the latter and you’ll have the precise time your process took in nanoseconds: long timeRun = timeFinish - timeStart;

Now, if your test process lasts for seconds or longer, having precision to nine decimal places is probably overkill. However, Oracle says the actual resolution of nanoTime() is only guaranteed to be at least as good as currentTimeMillis(). In practice, we always found the nearest millisecond good enough for most PC-based benchmarking requirements, so here, we divide the result by one million to get from nanoseconds (10 to the power of 9 per second) to milliseconds (10 to the power of 3 per second). We’ll use the short-cut form here: timeRun /= 1000000;

WHEN TO LAUNCH THE TIMING CODE

In order to achieve the most accurate test timing, you should always sample the current system nanoTime() as the last thing you do before launching your benchmark test routine. Further, you should take the ‘finished’ time sample as the first thing you do after the test code completes. This will give you the timing that represents the actual task you’re testing.
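Putting those fragments together, a complete run might look like the following sketch; the doWork() method here is a hypothetical placeholder for whatever task your benchmark actually exercises:

public class IntervalTimer {

    // Hypothetical placeholder workload - substitute your own benchmark task.
    static void doWork() {
        double sum = 0;
        for (int i = 1; i <= 10_000_000; i++) {
            sum += Math.sqrt(i);
        }
        System.out.println("checksum: " + sum);
    }

    public static void main(String[] args) {
        // Sample the clock as the very last step before the task...
        long timeStart = System.nanoTime();

        doWork();

        // ...and again as the very first step after it completes.
        long timeFinish = System.nanoTime();

        long timeRun = timeFinish - timeStart; // elapsed nanoseconds
        timeRun /= 1000000;                    // convert to milliseconds

        System.out.println("Run time: " + timeRun + " ms");
    }
}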

HOW TO BENCHMARK THE RIGHT WAY

Over the years, I’ve read the odd few people mocking the need for benchmark tests, decrying the ‘speeds and feeds’ concepts of reviews as a waste of time. Nonsense – the more information you have about a system, a process or a product, the more informed a decision you can make about whether to buy it, replace it or skip it.

If there’s ever a golden rule in benchmark testing, it’s this – change only one thing at a time. Normally, benchmark testing is all about trying to gain comparisons – how does one motherboard compare with another, or one graphics card with another? If you’re testing a new graphics card, you don’t go changing the motherboard, the CPU and RAM at the same time unless it’s unavoidable, for example, testing AMD versus Intel performance – and even then, you change as little as possible. If you’re testing motherboards from the same class, you don’t go changing the CPU, graphics card or RAM for the sake of it. You want any change registered in your ‘before’ and ‘after’ tests to be the result of the one item you’re testing and that item only. Otherwise, how do you know which component caused the change?

We might be talking about PC components here, but as a programmer, you could be asked to test the run-time performance of an external system, say, the time for a share transaction to be sent from a share-trading system and received by a share-processing server. Understanding how a system works, the parts of the system that vary and which parts of that system can be locked down can make a huge difference to the overall accuracy of your measurements.

KNOW WHAT YOU’RE TESTING

It’s equally important to understand what it is you’re testing – the ‘problem domain’. If you’re testing a smartphone for browser Javascript performance, you don’t necessarily want the device downloading your daily BitTorrents at the same time – unless you’re specifically testing for that condition.

Understanding how different devices work is key. For example, on a PC, you want to shut down all other apps – the only app running should be your benchmark test. But on a smartphone, the memory model is completely different, so rather than killing other apps, you want the phone in an app-stable state. You still don’t want it downloading updates or anything else, but ensuring the device is in a ‘steady state’ is important, particularly for repeatability. Again, having this ‘domain knowledge’ makes a difference.

“If there’s ever a golden rule in benchmark testing, it’s this – change only one thing at a time. ”

MULTIPLE TEST RUNS

For whatever reason, you are not going to get the exact same result every time you run a benchmark test – for example, system processes can pop up for unexplained reasons at different times. That means it can be dangerous to simply run a test once, grab your run-time result and be on your way. At a minimum, you should run tests three times. There are various options you can then choose from for creating the final result – average all three results; always take the slowest, fastest or middle result; or drop the one that’s furthest away from the other two (the ‘outlier’) and average the remaining two.
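As a rough illustration of that last option, here’s a sketch that runs a placeholder task three times, drops the outlier and averages the remaining pair; the doWork() workload and class name are invented for the example:

import java.util.Arrays;

public class MultiRun {

    // Hypothetical placeholder workload.
    static void doWork() {
        double sum = 0;
        for (int i = 1; i <= 10_000_000; i++) sum += Math.sqrt(i);
    }

    public static void main(String[] args) {
        long[] runsMs = new long[3];

        // Run the same test three times, timing each run in milliseconds.
        for (int i = 0; i < 3; i++) {
            long start = System.nanoTime();
            doWork();
            runsMs[i] = (System.nanoTime() - start) / 1000000;
        }

        // Sort the results, then drop whichever end value sits furthest
        // from the middle run (the outlier) and average the other two.
        Arrays.sort(runsMs);
        long middle = runsMs[1];
        long result;
        if (runsMs[2] - middle > middle - runsMs[0]) {
            result = (runsMs[0] + middle) / 2; // drop the slowest run
        } else {
            result = (middle + runsMs[2]) / 2; // drop the fastest run
        }
        System.out.println(Arrays.toString(runsMs) + " -> final result " + result + " ms");
    }
}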

REFERENCE POINT

Having the raw run-time in milliseconds from your test runs is excellent, but in terms of telling the story of performance to another stakeholder, it may not necessarily help. As we’ve said, benchmark testing is almost always about comparison – whether it’s comparing two devices or products, or testing single devices and keeping a long-running record. Unless you have multiple devices to test, comparing against a known reference can also give you an excellent starting point to discuss that performance.

A reference point is a known marker on which your benchmark has been tested and the result recorded. For example, UserBench Encode HD had a reference point set against a 2GHz Intel Pentium 4 desktop PC – yep, that’s ancient history dug up from the sands of Egypt kind of stuff today, but a well-enough-known PC standard nonetheless. We first ran the benchmark test against this system on multiple occasions and set the averaged run-time as a reference point of 10.000. That run-time then became the reference point against which new devices under test were compared. If a PC scored 84.07 on UserBench Encode HD, for example, we knew immediately that system was 8.407 times faster than the 2GHz Pentium 4 reference on that test.

Here’s how it works from a coding perspective: you run your test on your reference device and it takes, say, 30 seconds to complete – this now becomes your base reference point, which you set as 10.000, and that 30 seconds becomes the baseline scaler.

You can now run the test on a new device you’re reviewing and say it takes 20 seconds. Divide your 30-second reference by the new 20-second time and you get 1.5. That tells you the new device is 50% faster than the reference. But as we’ve set a base reference point of 10.000, we can multiply the result by 10 to give a score of 15.000. Whatever you set your base reference score to is really up to you – we chose 10.000 for UserBench Encode HD because we wanted to differentiate it from the various benchmark sub-test component scores, which were referenced to 1.000. As another example, GeekBench 3.0 chose 2500.00 as its reference score against an Intel Core i5-2520M CPU.

If you now test a second device later without access to the first, it’s not a problem. Say a new device gets a run-time result of 15 seconds – we compare that against the base reference score and end up with a score of 20.000, meaning this new device is twice as fast as the reference. But we can also compare this new device with the previous one – dividing the 20-second run-time of the previous device by the 15-second time here gives us 1.333, meaning this new device is 33.3% faster than the previous unit.
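In code, the scoring arithmetic might look something like the following sketch, using the worked example’s figures of a 30-second reference run and a 10.000 base score; the constant and method names are illustrative, not taken from UserBench itself:

public class ReferenceScore {

    // Example figures from the text: the reference machine took 30 seconds,
    // and that run-time is assigned a base score of 10.000.
    static final double REFERENCE_TIME_MS = 30000.0;
    static final double BASE_SCORE = 10.000;

    // A shorter run-time means a faster device, so it earns a higher score.
    static double score(double runTimeMs) {
        return REFERENCE_TIME_MS / runTimeMs * BASE_SCORE;
    }

    public static void main(String[] args) {
        System.out.println(score(30000)); // 10.0 - the reference itself
        System.out.println(score(20000)); // 15.0 - 50% faster than the reference
        System.out.println(score(15000)); // 20.0 - twice as fast as the reference
    }
}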

BENCHMARK EXAMPLE

To bring this all together, I’ve coded up a very simple little benchmark app called ‘Simple Pi Benchmark’. It takes the Riemann zeta function and runs it through one billion iterations. Those iterations are timed using both System.currentTimeMillis() and System.nanoTime() to show the differences these two timing methods can give in practice. The benchmark runs the Riemann zeta function in a separate thread and can be aborted at any time.

On my ageing 3.2GHz Intel Core i5 2300 desktop PC, the test finishes in (more or less) 12.655 seconds – we’ve set that as the reference score of 10.000. If you run the test on your system and get a final score of 20, your system is twice as fast as mine.
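The actual project files are on the APC website (see below), but as a rough idea of the shape of the calculation, here’s a hypothetical, single-threaded sketch: it sums the first billion terms of the zeta function at s = 2 and recovers pi from the identity pi²/6 = 1/1² + 1/2² + 1/3² + …, timing the loop with both clocks as discussed earlier. The class and method names are illustrative and won’t match the downloadable project, which runs the calculation in its own thread:

public class SimplePiSketch {

    // Hypothetical stand-in for the benchmark's pi calculation:
    // sum the first billion terms of zeta(2) = 1/1^2 + 1/2^2 + ...
    // and recover pi via pi^2/6 = zeta(2).
    static double calculatePi(long iterations) {
        double sum = 0.0;
        for (long n = 1; n <= iterations; n++) {
            sum += 1.0 / ((double) n * n);
        }
        return Math.sqrt(6.0 * sum);
    }

    public static void main(String[] args) {
        long startMs = System.currentTimeMillis();
        long startNs = System.nanoTime();

        double pi = calculatePi(1000000000L);

        long elapsedNs = System.nanoTime() - startNs;
        long elapsedMs = System.currentTimeMillis() - startMs;

        System.out.println("pi is roughly " + pi);
        System.out.println("nanoTime():          " + elapsedNs / 1000000 + " ms");
        System.out.println("currentTimeMillis(): " + elapsedMs + " ms");
    }
}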

GETTING THE SOURCE CODE

You’ll find the Simple Pi Benchmark source project files on our website at http://apcmag.com/magstuff. If you haven’t already, download and install the NetBeans IDE and Java SE Software Development Kit (SDK) bundle from Oracle’s website (http://tinyurl.com/apc429-bundle). Next, grab the downloaded source file and unzip the outer file only, launch NetBeans, select File, Import Project, From ZIP and choose the inner ‘PcBenchmark’ zip file. Run it. If you’ve been following this series for a while, the code should be fairly easy to understand.

ALL ABOUT COMPARISON­S

Performing a benchmark test in isolation isn’t going to tell you much. Almost always, you want to compare the results with another device or product to better understand the two. Particularly in PC hardware, having a known reference point provides some perspective that raw scores may not, allowing you to more quickly gauge comparative performance. The same goes for whatever you’re testing – starting with and comparing against a known reference can make it easier to explain performance differences.

It’s a system that’s easy to implement in code and has served me well for years.

IMAGE CAPTIONS

Simple Pi Benchmark uses a Core i5 2300 CPU to set a 10.000 reference score.
We took the easy option of NetBeans’ Swing GUI Builder to create the UI.
Oracle’s Java docs are the best source for references on Java commands.
Targeted benchmarks can be used to test individual components.
UserBench Image 2 timed different image filters on a high-resolution image.
UserBench Encode HD tested audio and video encoding times.
JetStream, like all benchmarks, uses on-board system timers to score tests.
Our Simple Pi Benchmark’s calculatePi() method runs in its own thread.
GeekBench 3.0 uses Intel’s Core i5 2520M CPU to set a reference 2500 score.
Our Video Converter GUI app also has test timing functions.
