Voice Changing and Imitation [Archive] - The Explosives and Weapons Forum

hatal

November 19th, 2008, 06:18 AM

There are two tools I came across lately that can morph/change your voice by computerized means: "AV Voice Changer" and "Vocal Imitation". Each has something unique to offer and each is missing something, but both can be very valuable.

With AV Voice Changer you can modify your voice in online chats in real time. It is a voice morphing application, which means you can disguise or distort your voice, even in real time over Skype, for instance. Sadly, I have found nothing in the specs about using recorded segments as a template for voice changing.

On the other hand, Vocal Imitation lets the user map the vocal characteristics of one person onto another person's voice, so that the second person is heard speaking in the voice of the first. The program also provides manual adjustment and fine tuning. Sadly, it does not seem to have real-time functions that would work with VoIP applications.

(One last question puzzles me: is the process reversible? Meaning: can somebody who recorded your morphed voice reverse the morph or imitation process so that he can hear your real voice? Is encryption possible?)

EDIT: There was a special version of "Vocal Imitation" made just for the fedgov. Wonder why? ;)

Rhadon

November 19th, 2008, 08:56 AM

can somebody who recorded your morph voice reverse the morph or imitation process, so he can hear your real voice?
Although some information may be lost in the process, you have to assume that the original audio can be recovered. At the very least, you have to assume that the output will still contain some characteristic features of your voice. If someone has two audio files, one of you speaking naturally and one altered by voice imitation, I guess it would not be hard to have a computer find striking resemblances between the two.

If you want to make it irreversible, a lot of information must be destroyed in the conversion process. This is usually done by removing all dispensable frequencies and a process called quantization (http://en.wikipedia.org/wiki/Quantization_(signal_processing)) with as few steps as possible (this will give you this typical blackmailer kind of voice you know from TV). Then, you would still need to change the way you speak, so I guess it's just easier to use a text-to-speech program if you really want to be safe :).
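To make the quantization point concrete, here is a minimal sketch in plain Python. The test signal, the sample rate and the function name quantize are made up for illustration. It only shows that coarse quantization maps many different inputs onto the same output, so the exact waveform cannot be recovered; it says nothing about whether speaker-specific traits such as rhythm or word choice survive.

```python
import math

def quantize(samples, levels):
    """Uniformly quantize samples in [-1.0, 1.0] to the given number of levels."""
    step = 2.0 / (levels - 1)
    return [round((s + 1.0) / step) * step - 1.0 for s in samples]

# Made-up test signal: two sine components sampled at 8 kHz.
rate = 8000
signal = [0.6 * math.sin(2 * math.pi * 200 * n / rate)
          + 0.3 * math.sin(2 * math.pi * 1300 * n / rate)
          for n in range(400)]

coarse = quantize(signal, 4)    # 4 levels, about 2 bits per sample
fine = quantize(signal, 256)    # 256 levels, 8 bits per sample

# The worst-case error is bounded by half a quantization step.
max_err_coarse = max(abs(a - b) for a, b in zip(signal, coarse))
max_err_fine = max(abs(a - b) for a, b in zip(signal, fine))

# Many-to-one: two different inputs collapse onto the same output level,
# so nothing in the output tells you which of them was the original.
same_output = quantize([0.40], 4) == quantize([0.25], 4)

print(len(set(coarse)), "distinct output values with 4 levels")
print("max error, 4 levels:  ", max_err_coarse)
print("max error, 256 levels:", max_err_fine)
print("0.40 and 0.25 collapse to one level:", same_output)
```

The fewer levels you allow, the more inputs share each output value, which is the information loss described above.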

megalomania

November 25th, 2008, 12:16 AM

I tried AV Voice Changer some months ago, but the crapware program would not work on my PC. By the way, about a year ago I read a journal article about an analysis technique the fedgov has that can reverse process any voice changing technology to its original sound. This is something akin to a flaw in encryption, taking advantage of recognizable fingerprints of computer alteration and working backwards to unravel any voice alteration technology. If that’s what they can do in a declassified article…

Rhadon

November 25th, 2008, 09:32 AM

By the way, about a year ago I read a journal article about an analysis technique the fedgov has that can reverse process any voice changing technology to its original sound.
Do you have a link at hand? This is something I'd like to have a look at.

It sounds plausible for the majority of commercially available voice changing products, because like I said in my previous post, you cannot assume that enough information has been dropped.

Yet, it is safe to say that there are voice changing algorithms which cannot be reversed, no matter what kind of conspiratorial methods the feds use ;). That is, again, because a safe voice changing algorithm will drop enough information to prevent a reconstruction close to the original input. For a simplified illustration, have a look at the picture I attached. The red graph describes the original, unaltered voice, while the blue one is an example of what an output could look like. Note that while the output still resembles the original function to some extent, a lot of information is irreversibly lost.

The hard part would be to drop as much information as possible without making it too difficult to understand the output.

iHME

November 25th, 2008, 11:52 AM

Having a ridiculously lossy compression and transformation could make the reversing a royal pain or, better yet, impossible.

Being a little bit of an electronics-minded person, I thought that some analog filter combined with the software system could mask it even more.

Let's say we first cut all unnecessary frequencies off, like the Iridium satellite phone systems or, better, old analog SSB radio systems.

Then the already compressed voice is inverted and then inverted back to normal, both done with separate circuits and based around different op-amp chips, to get some variety.

Of course it could also make the system so good that the voice would be so masked and altered that you could not even make out what he/she/it is saying :p

Bugger

November 26th, 2008, 05:49 AM

Fake Voice v.1.08:
http://rs1tl2.rapidshare.com/files/88791444/Fake-Voice.v1.0.8.rar 4,451 Kb

iHME

November 26th, 2008, 05:24 PM

Now that I come to think of it, why not start from the "end"?
Does anyone know what kinds of analysis and filtering systems and methods the feds/other people you want to stay anonymous from do in fact use?

Let's say that they use method X, so why not alter the voice with method Y that would make method X useless?

I know nothing on voice analysis though.

megalomania

November 27th, 2008, 04:09 AM

I think the most secure way is to use pre-recorded voice samples to computer generate human speech (a library of sound bites used as a speech synthesizer). Essentially, the best thing to do is not use your voice to begin with.

festergrump

November 27th, 2008, 09:21 AM

Well, in that case, IIRC every Microsoft operating system from 2000 on has had a built-in Narrator program which would do the job easily enough.

I picture someone using it on a laptop in some B-grade movie to direct the ransomees to where to drop the suitcase full of money. Better type quickly!

The best thing about using such a distributed copy (as in you get it even if you don't want it because it comes with the laptop, like it or not) is the deniability factor. Practically everyone has it, so there's no getting caught with "the specific program used" in a caper.

Rhadon

November 27th, 2008, 12:05 PM

Does any one know what kinds of analyzation and filtering systems and methods due the feds/other people who you want to stay anonymous from, in fact use?
You can never know which methods will be used. That's why your voice changer will have to "defeat" all of them, even unknown ones (as far as that is possible...). That's why I wrote that you need to drop as much information as possible or, if at all possible, use text-to-speech in the first place.

If you have two audio files, one with altered and one with unaltered voice, I can see three main ways to find similarities between the two:

Using correlation. You can find out how long you have to delay one of the signals in order to get the best superposition of the two. You can also see how well they match when using this delay. This is probably the weakest tool because it will fail unless there is at least one identical text part which you recorded into both files.
Using Fourier analysis. This can find similarities in the frequencies. It should work fine even if you recorded completely different text into the audio files. While the frequencies will be affected by any voice changing algorithm, you will probably still find some similarities between the two.
Using wavelet analysis. This should be the strongest tool at hand. As far as I can say, it's possible to create a very complex analysis of how specific audibles, ligatures and the like are pronounced. Just like Fourier analysis, it looks at the frequencies in the audio files, just in a more powerful way: put simply, the advantage is that you can see which frequencies are present at any point in time in your audio file, while Fourier analysis can only be applied to a certain range of it (e.g. the whole file or 20 seconds in its middle).
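
The first two tools can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the signals are synthetic (seeded noise and pure sine mixes), and the helper names best_delay, magnitude_spectrum and cosine_similarity are made up for this illustration. Real speech is far messier, and wavelet analysis is not shown.

```python
import cmath
import math
import random

def best_delay(a, b, max_lag):
    """Return (lag, score) for the shift of b against a with the highest normalized correlation."""
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(a[i], b[i + lag]) for i in range(len(a)) if 0 <= i + lag < len(b)]
        num = sum(x * y for x, y in pairs)
        den = math.sqrt(sum(x * x for x, _ in pairs) * sum(y * y for _, y in pairs))
        score = num / den if den else 0.0
        if score > best[1]:
            best = (lag, score)
    return best

def magnitude_spectrum(x):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for short toy signals)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def cosine_similarity(u, v):
    """Cosine of the angle between two equally long vectors."""
    num = sum(p * q for p, q in zip(u, v))
    den = math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))
    return num / den if den else 0.0

# Correlation: the same content appears in both recordings, one delayed by 7 samples.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(200)]
delayed = [0.0] * 7 + noise[:-7]
lag, score = best_delay(noise, delayed, 20)

# Fourier: same frequency makeup but different phases, versus a different makeup.
N = 64
x = [math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.sin(2 * math.pi * 12 * t / N)
     for t in range(N)]
y = [math.sin(2 * math.pi * 5 * t / N + 1.0) + 0.5 * math.cos(2 * math.pi * 12 * t / N)
     for t in range(N)]
z = [math.sin(2 * math.pi * 9 * t / N) + 0.5 * math.sin(2 * math.pi * 20 * t / N)
     for t in range(N)]
sim_same = cosine_similarity(magnitude_spectrum(x), magnitude_spectrum(y))
sim_diff = cosine_similarity(magnitude_spectrum(x), magnitude_spectrum(z))

print("best lag:", lag, "score:", round(score, 3))
print("spectrum similarity, same makeup:     ", round(sim_same, 3))
print("spectrum similarity, different makeup:", round(sim_diff, 3))
```

The correlation part only works because both signals contain the same stretch of content, which is the weakness mentioned above. The spectrum comparison ignores timing and phase entirely, which is why it still matches when the "text" differs.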

hatal

November 27th, 2008, 02:22 PM

I tried TTS software, but the free versions seemed and sounded very primitive. Totally un-life-like. What I had in mind was software which morphs your voice (irreversibly), performs this process in real time and is compatible with at least one VoIP application.

festergrump

November 27th, 2008, 02:53 PM

At the possible expense of being overly simplistic and perhaps obtuse, have you considered a distortion stompbox pedal, like for a guitar effect? Portable (as long as the amp in question is), uses line level input, and practically cheap as dirt.

Sidenote: I once converted an old belt-worn hearing aid box into a supersmall guitar amp. The overdriven distortion that provided was an excellent fuzzbox.

I know nothing about how the effect may be filtered back to original wave configuration by compression or other means, but I thought it might be worth mentioning. (Flame on if I'm a dumbass, but sometimes the KISS methods are best!) :p

megalomania

November 29th, 2008, 08:11 PM

I don't think this is the right article, but it is close:

State of the art automatic speaker recognition systems show very good results in the discrimination between different speakers under controlled recording conditions. In a forensic context, the conditions are uncontrolled and voice can be disguised. In cases of terrorism claim, extortion or kidnapping, it is of great interest for offenders to conceal their identity. Voice disguise is an important constraint to speaker discrimination. Some disguises produce a great variation of parameters and change the perception of an identity. The main risk is to confound a disguised voice and a normal voice and accuse an innocent individual. This paper proposes on one hand to present the impact of voice disguise on automatic speaker recognition and, on the other hand a statistical study in order to detect and identify four disguises among the most common. The first step consists in extracting features and the second step to classify them. MFCC (Mel Frequency Cepstral Coefficient) are considered as features and different classification algorithms have been tested. The studied disguises are based on a deliberated and non electronic way. The proposed analysis of disguised voice classification provides interesting results in detection by the use of SVM (Support Vector Machine) and in identification by the use of GMM (Gaussian mixture models).
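
For readers unfamiliar with the MFCC features named in the abstract: they are computed on the mel scale, which spaces frequency bands roughly the way human hearing does (narrow at low frequencies, wide at high ones). Below is a minimal sketch of one commonly used Hz-to-mel conversion formula. The abstract does not say which variant the authors used, so the formula, the function names and the 0 to 4000 Hz range are assumptions for illustration only; a full MFCC pipeline (filterbank, logarithm, DCT) is not shown.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_low, f_high, n_bands):
    """Edges (in Hz) of n_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    return [mel_to_hz(lo + (hi - lo) * i / n_bands) for i in range(n_bands + 1)]

# Assumed range for illustration: 0 to 4000 Hz split into 10 bands.
edges = mel_band_edges(0.0, 4000.0, 10)
widths = [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]

for low, high in zip(edges, edges[1:]):
    print(round(low), "to", round(high), "Hz")
```

The printed bands get wider toward higher frequencies, which is the property the mel scale is meant to capture.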