Speak and See
Wednesday, August 14, 2002
I made this speech recognition toy: you say things and then see them appear on the screen. Originally I wanted it to be a sort of art piece that provided a visual backdrop to whatever conversation was happening nearby. Due to the limited accuracy of both speech recognition technology and Google image searches, it doesn't always show what you might expect. If you say 'Boston', you may see a city map, a Boston terrier, an album cover for the band by the same name, or a black and white drawing of guys dressed up like Indians throwing crates of tea into a harbor.

How it works:
It's a full screen app. On startup, it's just blank white. When the system recognizes a word or phrase, it goes out to Google and searches for images matching the word or phrase. Then it shows them on the screen in a 4x5 array of thumbnails.

System Requirements:
Setup/Install/Run:
Implementation Details:
It's an HTML Application, a strange beast concocted by MS. It's really just a web page, so you can edit it with any text editor. The code is JavaScript, which accesses the MS Speech API (SAPI) and HTML DOM via COM interfaces. The SAPI functions are pretty obvious. The DOM access is for picking images out of the web page returned by the Google image search. This turns out to be a lot easier than trying to screen-scrape the HTML for image links, or using Google's Web Service API.

What Would Really Rock:
Get a highly focused microphone, and a portable speaker. Connect them to a portable computer running some evil software: take the incoming signal from the microphone, invert and phase shift it (correcting for propagation delay), and send it back out through the speaker. It could silence things, like people who are trying to tell you to shut up about your stupid ideas!

What Would Also Rock:
Same device as above, but instead of simply inverting the signal: run a speech-to-text algorithm on the incoming signal, run the text through babelfish.altavista.com, run that through a text-to-speech algorithm in a voice of the opposite sex from the original speaker's, mix it with the inverted and phase shifted original signal and send that back out. Everything the guy says would come out sounding female and in French, for instance. Now who's laughing?
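
For what it's worth, the invert-and-shift step that both ideas lean on is tiny on paper. Here is a minimal sketch in JavaScript that works on a plain array of samples instead of live audio. The function names, the 343 m/s figure for the speed of sound, and the choice to express the delay as a whole number of samples are all assumptions made for illustration, not part of any real device or of the toy above.

```javascript
// Hypothetical helper: how many samples does sound take to travel a
// given distance? Assumes roughly 343 m/s (air at about 20 C).
function delayInSamples(distanceMeters, sampleRate) {
  var SPEED_OF_SOUND = 343;
  return Math.round(distanceMeters / SPEED_OF_SOUND * sampleRate);
}

// Hypothetical helper: shift a signal later in time by `delay` samples,
// padding the start with silence and keeping the original length.
function delaySignal(samples, delay) {
  var out = [];
  for (var i = 0; i < samples.length; i++) {
    out.push(i < delay ? 0 : samples[i - delay]);
  }
  return out;
}

// The "evil software" step: delay the signal, then flip its sign.
function invertAndDelay(samples, delay) {
  var out = [];
  for (var i = 0; i < samples.length; i++) {
    out.push(i < delay ? 0 : -samples[i - delay]);
  }
  return out;
}

// Toy demonstration with made-up sample values.
var mic = [0.5, -0.25, 1.0, 0.0];
var delay = 1;
var heard = delaySignal(mic, delay);   // original voice, arriving late
var anti = invertAndDelay(mic, delay); // what the speaker plays

var residual = [];
for (var i = 0; i < mic.length; i++) {
  residual.push(heard[i] + anti[i]);   // what is left after mixing
}
console.log(residual);
```

In this idealized setup every residual sample is zero, because the delayed original and the inverted copy line up exactly. Real air, real rooms, and real hardware latency are much less cooperative, which is presumably why the idea stays filed under "would rock".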