Archive for the ‘technology’ Category
Making the iPhone go faster
We need things to go fast. Really fast. So when I came across Ryan Block’s old post about the iPhone’s vector floating-point coprocessor, I was encouraged. But how to access this coprocessor? I was excited to find some examples in Matthias Grundmann and Wolfgang Engel’s Google Code project, vfpmathlibrary.
They’re just getting started and only have 4×4 matrix operations coded up so far. Hopefully we can collaborate with these guys to expand the library and do some performance testing on different applications. More to come on that.
Mono: now with SIMD
I just came across Miguel de Icaza’s post about recent Mono performance enhancements. I’m very happy to see all of the innovation in the C# language by Microsoft (see C# 4.0), but the .NET runtime isn’t getting any faster, unless you’re a dynamic language guy.
However, the Mono runtime is getting faster. They’ve got static compilation, a brand new JIT compiler, and a framework extension so you can easily take advantage of the SIMD (vector) operations that have been available on Intel CPUs for a decade.
A simple way to think about it is that, for instance, you could use the Vector4f class (4 floats) and perform a floating-point vector operation on all four values at once, which is up to four times faster than doing them one at a time. I can’t wait to work with the new Mono.
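To make the "four floats at once" idea concrete, here is a minimal sketch in plain Java. It is not the Mono.Simd API: the Vec4f class below is a hypothetical stand-in that only spells out the four scalar additions a single 4-wide SIMD add would replace.

```java
// Illustrative only: this hypothetical Vec4f is NOT Mono.Simd's Vector4f.
// It models what one 4-wide SIMD add does, written as scalar operations.
final class Vec4f {
    final float x, y, z, w;

    Vec4f(float x, float y, float z, float w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    // A SIMD unit can perform these four additions in one instruction;
    // scalar code has to issue four separate adds.
    Vec4f add(Vec4f o) {
        return new Vec4f(x + o.x, y + o.y, z + o.z, w + o.w);
    }

    public static void main(String[] args) {
        Vec4f a = new Vec4f(1f, 2f, 3f, 4f);
        Vec4f b = new Vec4f(10f, 20f, 30f, 40f);
        Vec4f c = a.add(b);
        System.out.println(c.x + " " + c.y + " " + c.z + " " + c.w);
    }
}
```

The "four times" figure is a best case: it assumes the work really is four independent float operations and that nothing else, such as memory access, is the bottleneck.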
Android performance 3: iPhone comparison
The internet is just incredible. Within 30 minutes of logging onto the #iphonedev IRC channel on freenode, I got timing results for the iPhone on the simple loop benchmark from my last post. Thanks to ‘august’ for the help.
Here’s the benchmark converted into Objective-C:
// Time the allocation and the fill loop together.
NSDate *start = [NSDate date];
int arr[8*320*480];
for (int i = 0; i < (8*320*480); i++)
    arr[i] = i;
NSDate *end = [NSDate date];
// timeIntervalSinceDate: returns seconds.
NSLog(@"%g", [end timeIntervalSinceDate:start]);
Results:
- iPhone (2.1 firmware, Objective-C): 9.5 milliseconds
And, from last time:
- G1 (R29 firmware): 922 milliseconds.
- G1 (R29 firmware): Loop only. 520 milliseconds.
Conclusions:
Objective-C kills the Java implementation on Android. It’s almost 100 times faster (922 ms versus 9.5 ms). Note that I’m unsure if the memory allocation is included in the timing, so a more conservative statement is that Objective-C can run a tight loop about 50 times faster than the Dalvik VM (520 ms versus 9.5 ms). It’s also true that real applications aren’t full of tight loops, and a real Android application won’t be 50 times slower than an iPhone counterpart. Nevertheless, all else being equal, it will be slower, and potentially a lot slower.
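For anyone who wants to check the arithmetic behind the "100 times" and "50 times" claims, here is a small sketch that divides the timings reported above. The SpeedRatio class and its ratio helper are names I made up for illustration; the three numbers are the ones from the results lists.

```java
// Hypothetical helper for checking the speed ratios quoted in the post.
final class SpeedRatio {
    // How many times longer slowMs is than fastMs.
    static double ratio(double slowMs, double fastMs) {
        return slowMs / fastMs;
    }

    public static void main(String[] args) {
        double iphoneMs = 9.5;    // iPhone, Objective-C
        double g1TotalMs = 922.0; // G1, allocation plus loop
        double g1LoopMs = 520.0;  // G1, loop only

        System.out.printf("with allocation: %.1fx%n", ratio(g1TotalMs, iphoneMs));
        System.out.printf("loop only:       %.1fx%n", ratio(g1LoopMs, iphoneMs));
    }
}
```

The two ratios come out at roughly 97 and 55, which is where the rounded figures come from.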
For now, we’re sadly going to put our Android development on hold and switch to iPhone, and keep an eye out for performance improvements.
Android performance 2: Loop speed and the Dalvik VM
Let’s run a simple benchmark on the G1.
I had noticed that Android was running on some Java virtual machine called Dalvik, but hadn’t given it much attention otherwise. It turns out to be pretty important, after all. As far as I can tell, Google decided it would be a good idea to favor a small memory footprint over speed. Here’s the benchmark I just performed:
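// Time the allocation and the fill loop together.
long start = android.os.SystemClock.uptimeMillis();
// The array must be as large as the loop bound (8*320*480).
int[] image = new int[8*320*480];
for (int i = 0; i < (8*320*480); i++) {
    image[i] = i;
}
long end = android.os.SystemClock.uptimeMillis();
long elapsed = end - start;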
So, how long did it take?
- G1 (R29 firmware): 922 milliseconds.
- G1 (R29 firmware): Loop only. 520 milliseconds.
And, for comparison:
- Fujitsu T4220 (2.4 GHz Intel T7700). C#. 14 milliseconds.
- Fujitsu T4220 (2.4 GHz Intel T7700). Java. 16 milliseconds.
Note: I ran everything in “Run” mode (not Debug mode). (Debug mode causes the G1 to run about 4X slower in this benchmark.)
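The "loop only" number separates the cost of the loop from the cost of allocating the array. Here is a minimal sketch of one way to take both measurements in a single run. It assumes the split is made by reading the clock again right after the allocation, and it uses System.nanoTime instead of android.os.SystemClock so it also runs on a desktop JVM. The LoopBenchmark class name is made up for this example.

```java
// A sketch of timing allocation and loop separately (assumed method).
final class LoopBenchmark {
    static final int N = 8 * 320 * 480;

    // Returns {allocationNanos, loopNanos, lastValueWritten}.
    static long[] run() {
        long t0 = System.nanoTime();
        int[] image = new int[N];      // allocation (the JVM also zero-fills it)
        long t1 = System.nanoTime();
        for (int i = 0; i < N; i++) {  // the tight loop being measured
            image[i] = i;
        }
        long t2 = System.nanoTime();
        // Read a value back so the loop's work is actually used.
        return new long[] { t1 - t0, t2 - t1, image[N - 1] };
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("allocation: " + r[0] / 1_000_000.0 + " ms");
        System.out.println("loop only:  " + r[1] / 1_000_000.0 + " ms");
    }
}
```

On a JIT-compiled desktop JVM a single run like this mostly measures warm-up, so treat the numbers as rough. On an interpret-only VM the first run is much closer to steady state.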
Conclusions:
Dalvik puts a big wall between you and the (already pretty slow) CPU.
It is claimed that Dalvik is designed for slow machines with low memory, powered by a battery. However, I don’t understand how the interpret-only Dalvik VM actually achieves this, other than through programmer castration. What does that mean? Well, you can’t really do much on Android that isn’t built into the runtime, or your application will crawl, and you’ll be forced to strip those features out and rely on the optimized ones that are built into the libraries. The memory footprint for Dalvik is lower, because there are no JIT-compiled chunks of code sitting in RAM. But isn’t RAM cheap, fast, and low-power these days?
Next time (Android performance 3):
- A duel with the iPhone. I’d like to run this same trivial benchmark on an iPhone in Objective-C and see what happens.
Android performance 1: The G1
We are starting to develop on Android. The first Android device, of course, is the HTC/T-Mobile G1. We got one of the units this week, and I am just starting to look into exactly how fast the G1 runs. Performance is a hard thing to measure, but I wanted to start blogging about it in the hopes we can start a discussion and learn more about how to best measure and optimize performance on Android devices.
The best way to start, I think, is to just list the hardware specifications (from Wikipedia).
- CPU: Qualcomm MSM7201A (MSM7200 details)
  - 528 MHz ARM11 (same family as iPhone).
  - 274 MHz ARM9 coprocessor (not really “dual core” as is commonly claimed).
  - Java hardware acceleration, but not on the Dalvik VM (Android).
- GPU: (Shared with CPU)
  - Capable of 4M triangles/sec.
  - Capable of hardware-based image signal processor and JPEG encoder.
- Video Decoding:
  - Chip supports 30fps VGA in MPEG-4, H.263, H.264, Windows Media® and RealNetworks®
- Video Encoding: (Not yet available on Android!)
  - Chip supports 30fps VGA in MPEG-4, H.263 and H.264
- Network speed:
  - Supports T-Mobile UMTS (3G) 800/1700/2100 MHz
  - Possibly supports AT&T UMTS (3G) 850/1900/2100 MHz. This disassembly shows the RTR6285 radio chipset, which supports both 3G platforms. However, there are two power amplifiers: one at 2100 MHz, and one at 1700 MHz (T-Mobile frequencies). Nevertheless, I don’t think the 1700 MHz amplifier attenuates 1900 MHz. Look at the datasheet and see what you think. So AT&T 3G would probably have a reduced range, but I think you could make it work.
There are a few open questions I’d like to answer related to this specification list. Can the G1 support AT&T 3G (see above)? Does the Android VM benefit from the CPU’s Java hardware acceleration? Does the JPEG encoding (Bitmap class) on the G1 tap into the hardware?
We should also run a series of benchmarks and compare the Qualcomm processor’s Java performance to, say, an Intel Core 2 for a number of tasks. That way we can roughly estimate how fast something will run on a G1 before actually porting and deploying. If you’ve seen any benchmarks like this, let me know, so we don’t reinvent the wheel!
Next time (Android performance 2):
- Benchmark for a simple array-indexing loop.
Android spam: Faster than UPS
We decided to get an Android G1 phone for work, and it arrived today. I was excited that we had already received two text messages!
But we were not so happy when we looked closer. One was a message from T-Mobile, sent yesterday, telling us what the phone’s number was. Useful, and timely. The other message was from “FStick13”, also sent yesterday. Not so useful, but timely. Nice work, spammers, you beat the delivery guy. If only your talents were used for something a little more positive.
Dawn of the Vision Era
In the 70’s, we thought it would be easy to create machines that could see. We were wrong. But today, we’re on the cusp of something exciting.
If you can define the vision problem precisely, odds are, we can build a machine that rivals or exceeds human ability: We can build machines which are better at recognizing faces than people. We’re wired to recognize a few hundred or a few thousand faces, but security software can scan for one in a million. It’s not just for security, anymore. We can do this across the web, and recently, in our own photo sets.
In swimming pools, lifeguards aren’t always vigilant, but increasingly, computer vision systems are.
We’re getting better at taking large collections of photographs and recreating full 3D (or 4D) scenes. Photo tourism is already changing the way we review large collections of photos in popular areas.
We still suck at building vision software that can perform general object recognition as well as humans. But some groups are working on that. I don’t think it will take long before these systems rival human ability for any visual task that you can perform in under a second.
The most exciting thing is that the game doesn’t stop when we match human ability across a broad spectrum of tasks. Instead, it gets more interesting. Today, we can’t see through walls, and we can’t recognize everyone in a crowd. We can’t jump three hundred feet in the air to get a bird’s-eye view. We can’t recognize every species of plant and animal. We can’t read text in more than a handful of languages. We can’t see beyond the human visual spectrum. You get the idea. It’s the dawn of an exciting time.
