The Machine That Reads, Watches, Listens — All at Once

DEV Community·Bongho Tae·about 1 month ago

Imagine you're on the phone, trying to help a friend book a flight. They send you a screenshot of the airline's website, then a voice memo describing what they want, then a short video they recorded of the confusing pop-up that keeps blocking the booking button. You glance at the picture, listen to the memo, watch the clip, and tell them: "Click the small 'X' in the upper right of the gray box, then scroll down past the seat selector."

You did something extraordinary in that moment, and you didn't notice. You combined four different streams of information (pixels, sound waves, motion over time, and the memory of the conversation up to that point) and produced one coherent answer. You did it in maybe ten seconds.

For most of the past decade, getting a computer to do this has been the kind of problem that quietly drives researchers to drink. Not because any single piece is impossible: there are systems that read text well, systems that recognize images, systems that transcribe audio.…
