Over the past four years, readers have doubtlessly noticed quantum leaps in the quality of a wide range of everyday technologies. Most obviously, the speech recognition functions on our smartphones work much better than they used to. When we use a voice command to call our spouses, we reach them now. We aren’t connected to Amtrak or an angry ex. In fact, we are increasingly interacting with our computers by just talking to them, whether it’s Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, or the many voice-responsive features of Google. Chinese search giant Baidu says customers have tripled their use of its speech interfaces in the past 18 months.
Machine translation and other forms of language processing have also become far more convincing, with Google, Microsoft, Facebook, and Baidu unveiling new tricks every month. Google Translate now renders spoken sentences in one language into spoken sentences in another for 32 pairs of languages while offering text translations for 103 tongues, including Cebuano, Igbo, and Zulu. Google’s Inbox app offers three ready-made replies for many incoming emails.
Then there are the advances in image recognition. The same four companies all have features that let you search or automatically organize collections of photos with no identifying tags. You can ask to be shown, say, all the ones that have dogs in them, or snow, or even something fairly abstract like hugs. The companies all have prototypes in the works that generate sentence-long descriptions for the photos in seconds. Think about that. To gather up dog pictures, the app must identify anything from a Chihuahua to a German shepherd and not be tripped up if the pup is upside down or partially obscured, at the right of the frame or the left, in fog or snow, sun or shade. At the same time, it needs to exclude wolves and cats. Using pixels alone. How is that possible?
The advances in image recognition extend far beyond cool social apps. Medical startups claim they’ll soon be able to use computers to read X-rays, MRIs, and CT scans more rapidly and accurately than radiologists, to diagnose cancer earlier and less invasively, and to accelerate the search for life-saving pharmaceuticals. Better image recognition is crucial to unleashing improvements in robotics, autonomous drones, and, of course, self-driving cars, a development so momentous that we made it a cover story in June. Ford, Tesla, Uber, Baidu, and Google parent Alphabet are all testing prototypes of self-piloting vehicles on public roads today. But what most people don’t realize is that all these breakthroughs are, in essence, the same breakthrough. They’ve all been made possible by a family of artificial intelligence (AI) techniques popularly known as deep learning, though most scientists still prefer to call them by their original academic designation: deep neural networks.
The most remarkable thing about neural nets is that no human being has programmed a computer to perform any of the stunts described above. In fact, no human could. Programmers have, rather, fed the computer a learning algorithm, exposed it to terabytes of data (hundreds of thousands of images or years’ worth of speech samples) to train it, and have then allowed the computer to figure out for itself how to recognize the desired objects, words, or sentences. In short, such computers can now teach themselves. “You essentially have software writing software,” says Jen-Hsun Huang, CEO of graphics processing leader Nvidia, which began placing a massive bet on deep learning about five years ago.
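For readers who want to see the idea in miniature, here is a rough sketch in Python of what “feeding the computer a learning algorithm” can look like. It is an illustration, not the method any of the companies above use: it trains a single artificial neuron rather than a deep network, and the toy data set, the function names `train` and `predict`, and the settings (`epochs`, `lr`) are all assumptions made up for this example. The point is that nobody writes the rule into the program; the program adjusts its own numbers until its answers match the labeled examples.

```python
import math
import random


def train(examples, epochs=2000, lr=0.5, seed=0):
    """Fit one artificial neuron to labeled examples by gradient descent.

    Hypothetical helper for illustration; real deep networks stack many
    layers of such units and train on vastly more data.
    """
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            # Forward pass: weighted sum squashed to a value between 0 and 1.
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1.0 / (1.0 + math.exp(-z))
            # Nudge each weight to shrink the gap between guess and label.
            error = prediction - label
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias


def predict(model, inputs):
    """Return 1 or 0 depending on which side of the learned boundary inputs fall."""
    weights, bias = model
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


# Toy data (an assumption for this sketch): the label is 1 when either input is on.
EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

model = train(EXAMPLES)
answers = [predict(model, inputs) for inputs, _ in EXAMPLES]
```

Nowhere does the code state the rule “answer 1 if either input is on.” The program only ever sees examples and labels, which is the sense in which such software teaches itself.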