XENOPHONZ opened this issue on Jun 20, 2005 · 42 posts
danamongden posted Mon, 20 June 2005 at 11:13 PM
I think sound may turn out to be the real problem for "actorless" movies. Forget about conveying dramatic emotion; most text-to-speech solutions are doing well just to be understood. I've heard some of the really high-end ones, and they're better, but they're still a far cry from sounding truly human. I know someone who builds a lot of speech-driven voice-mail menus, and the ironic thing is that while those systems are getting better at understanding speech, the speech the computer plays back to the customer is all recorded from real humans.

Beyond that, think about all the foley artists, the guys who make all the sound effects, from dripping water to Earth-shattering Kabooms. While technically not actors, they do most of their work by recording real objects and putting them through trial-and-error modifications until something just sounds right. For the young filmmaker to crank something out in his basement, he'd need to solve these sound issues too.

The reason this seems like a limiting factor to me is that I haven't seen much work done in this area. I first thought about it back in '92 or '93 while playing with an early version of 3D Studio. We've spent a couple of decades and millions of research dollars working out how light interacts with a variety of environments, but I haven't seen anywhere near that effort put into sound. For example, what is the sound of someone walking across the floor? Well, what kind of shoes? What kind of floor? Is it carpeted? What kind of carpet? How thick? What material? How heavy is the step? What material is on the walls and ceiling? How far away are they? Where is the microphone relative to the steps? Are there any other sound sources in the room? We deal with a lot of the same issues in graphics rendering, but with light.

Where's "Poser for Sound"? It doesn't exist, not even a version 1. Maybe there's an active community doing this research, but if so, I've missed it.
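Just to make the idea concrete: if a "Poser for Sound" did exist, the footstep questions above would become its parameter set. Here's a hypothetical sketch in Python of what that might look like -- every name here is invented for illustration, not taken from any real tool:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical parameter set for rendering a single footstep sound.
# Each field corresponds to one of the questions in the paragraph above.

@dataclass
class FootstepScene:
    shoe: str                      # e.g. "leather sole", "sneaker"
    floor: str                     # e.g. "hardwood", "concrete"
    carpet: Optional[str]          # carpet material, or None for bare floor
    carpet_thickness_mm: float     # how thick the carpet is
    step_weight_kg: float          # how heavy the step lands
    wall_material: str             # affects reflections off walls/ceiling
    room_dims_m: Tuple[float, float, float]   # how far away the surfaces are
    mic_position_m: Tuple[float, float, float]  # mic relative to the step

    def describe(self) -> str:
        surface = (f"{self.carpet} carpet over {self.floor}"
                   if self.carpet else self.floor)
        return f"{self.shoe} on {surface}, mic at {self.mic_position_m}"

scene = FootstepScene(
    shoe="leather sole", floor="hardwood", carpet="wool",
    carpet_thickness_mm=8.0, step_weight_kg=80.0,
    wall_material="plaster", room_dims_m=(5.0, 4.0, 3.0),
    mic_position_m=(2.0, 1.0, 1.5),
)
print(scene.describe())
```

The hard part, of course, isn't naming the parameters -- it's the renderer behind them that turns this description into a waveform, which is exactly the piece nobody seems to be building.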
And each year that I go to SIGGRAPH, I look for presentations on this kind of thing. Admittedly, it's a graphics conference, not a sound conference, but there is a lot of animation there, and animators have to deal with sound issues all the time. And yet the closest thing I've seen was a system that let you move various sound sources (the instruments in a string quartet) around a virtual room and then balance the result across eight speakers spread through the real-world room. When I later asked one of the presenters how his software would handle a change in the wall surfacing, say switching from cloth to marble, he just got this blank look and said, "We don't deal with those kinds of problems."

Plus, the human ear is very discriminating. We think of hearing as a low-bandwidth sense because sound compresses so much more tightly than video, but studies show that our visual processing is largely about throwing away information and keeping only key details, e.g. edges, motion, specific colors. Maybe our hearing is similar in that respect, but then I think about our innate ability to pick out one mistuned note in a chord, and I realize we've got some powerful audio processing in our heads. Now, I'm no sound expert, but I think this is going to be a tough problem, and to my knowledge it's not being attacked aggressively.
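Funny thing is, the cloth-versus-marble question the presenter waved off actually has a well-known first-order answer in room acoustics: Sabine's reverberation formula, RT60 ≈ 0.161 · V / Σ(Sᵢαᵢ), where V is the room volume and each surface contributes its area times an absorption coefficient. A minimal sketch (the room size and the absorption coefficients are rough mid-band textbook values I picked for illustration):

```python
# Sabine's formula: RT60 ~= 0.161 * V / A, where V is room volume (m^3)
# and A is total absorption: sum over surfaces of area * absorption coeff.

def rt60(volume_m3, surfaces):
    """surfaces: list of (area_m2, absorption_coefficient) pairs."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

# A 5 x 4 x 3 m room: 54 m^2 of wall, 20 m^2 each of floor and ceiling.
L, W, H = 5.0, 4.0, 3.0
volume = L * W * H
walls = 2 * (L * H) + 2 * (W * H)
floor_area = ceiling_area = L * W

common = [(floor_area, 0.30),    # carpeted floor
          (ceiling_area, 0.05)]  # plaster ceiling

cloth_walls = rt60(volume, common + [(walls, 0.50)])   # heavy drapery
marble_walls = rt60(volume, common + [(walls, 0.01)])  # polished stone

print(f"cloth walls:  RT60 ~ {cloth_walls:.2f} s")
print(f"marble walls: RT60 ~ {marble_walls:.2f} s")
```

With these numbers, swapping the walls from cloth to marble multiplies the reverberation time by roughly four to five. That's just one scalar, nowhere near a full simulation of how the reflections would actually sound -- but it shows the physics is sitting right there waiting for someone to build on it.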