Dawn of the Vision Era
In the 1970s, we thought it would be easy to create machines that could see. We were wrong. But today, we’re on the cusp of something exciting.
If you can define the vision problem precisely, odds are we can build a machine that rivals or exceeds human ability. We can build machines that are better at recognizing faces than people are. We’re wired to recognize a few hundred or a few thousand faces, but security software can scan for one in a million. And it’s not just for security anymore: we can do this across the web and, more recently, in our own photo sets.
In swimming pools, lifeguards aren’t always vigilant, but increasingly, computer vision systems are.
We’re getting better at taking large collections of photographs and recreating full 3D (or 4D) scenes. Photo tourism is already changing the way we review large collections of photos in popular areas.
We still suck at building vision software that can perform general object recognition as well as humans. But some groups are working on that. I don’t think it will take long before these systems rival human ability for any visual task that you can perform in under a second.
The most exciting thing is that the game doesn’t stop when we match human ability across a broad spectrum of tasks. Instead, it gets more interesting. Today, we can’t see through walls, and we can’t recognize everyone in a crowd. We can’t jump three hundred feet in the air to get a bird’s-eye view. We can’t recognize every species of plant and animal. We can’t read text in more than a handful of languages. We can’t see beyond the human visual spectrum. You get the idea. It’s the dawn of an exciting time.
The right tool makes all the difference
You’ve been there. We all have. We’ve all tried to make the round peg fit into the square hole. Only this time I’m not referring to it figuratively. I mean we’ve all literally tried to make the round peg fit. It didn’t fit, so we tried to jam it. Occasionally it actually works out: “Who needs the square peg anyway? Square pegs are a total ripoff.” But sometimes, you give up and shell out for the square peg. And it’s incredible how well it gets the job done. You never seem to fully appreciate the power of having the right tool until you’ve been in one of these situations. You also never seem to learn your lesson: “Hmm… I wonder if this square thing can do an octagonal hole…”
You’re probably wondering, what does this have to do with anything?
I’m actually writing about a new tool that was recently built by Nicolas Pinto and several other artificial vision researchers at MIT. The goal: learn to replicate the human vision system. One of the big challenges in artificial vision research is testing your hypotheses efficiently. Since we’re trying to replicate the human vision system, we can look to biological neuroscience for qualitative hints on how our artificial systems should work. But there are still so many unknowns that, even when we’ve nailed down an algorithm, we end up with a wide swath of magical parameters to fill in. Artificial vision is so computationally complex that running even a small-scale test with one parameter setting can be time-consuming on a normal CPU. If the test “fails,” should we blame the parameter setting, the scale, or the whole approach?
Nicolas and the team recently assembled a computer with sixteen graphics processors. The idea is that this new tool can run tests so much faster (kinda like Devver!) that they can afford to toggle more parameters and run at a larger scale. I’ve personally been frustrated with the inability to run a lot of tests, so I envy this new square peg, and intend to follow what they manage to learn with it.
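To make the “wide swath of magical parameters” a bit more concrete, here’s a minimal sketch of a brute-force parameter sweep in Python. Everything in it is an assumption for illustration: the parameter names (`filter_size`, `num_filters`, `pool_stride`), the grid values, and the toy `evaluate` function are hypothetical stand-ins, not the MIT team’s actual models or code. In a real setup, each call to `evaluate` would be an expensive vision experiment, which is exactly why cheaper, faster runs matter.

```python
import itertools

# Hypothetical parameter grid; these names and values are illustrative only
# and are not taken from any real vision model.
PARAM_GRID = {
    "filter_size": [3, 5, 7, 9],
    "num_filters": [16, 32, 64],
    "pool_stride": [1, 2, 4],
}


def evaluate(params):
    """Stand-in for an expensive vision experiment.

    A real run would train and test a model. Here a toy formula returns a
    score so the sweep itself can be run end to end.
    """
    return (
        -abs(params["filter_size"] - 7)
        - abs(params["num_filters"] - 32) / 16
        - abs(params["pool_stride"] - 2)
    )


def sweep(grid, score_fn):
    """Score every combination in the grid and return results, best first."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(params), params))
    # Sort on the score only, since dicts cannot be compared to each other.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


results = sweep(PARAM_GRID, evaluate)
best_score, best_params = results[0]
print(len(results), "settings tried; best:", best_params, "score:", best_score)
```

Even this tiny grid multiplies out quickly, and every extra parameter multiplies it again. If each setting takes hours on a CPU, you can only afford a handful of them, and a single “failed” run tells you very little.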
In case you’re wondering what this thing looks like:
(Thanks Nicolas for permission to republish the photo.)

