An elephant never forgets.
Okay, first a lesson in Fast Fourier Transforms, or FFT for short. FFT is a magic wand that turns periodic sound waves into frequency spectrums... and it can also do the reverse! Note that a periodic wave, in this context, is a wave that wraps around cleanly, so if you make a copy of it and put the copies end-to-end, the ends would meet. Most sounds that are recorded naturally are not exactly periodic, so if you read on the internet about using FFTs, you'll likely find a bunch of information on how to trick an FFT into interpreting a non-periodic wave without spewing out garbage. (If you see the term "window function", that's the trick.) HOWEVER that mess doesn't apply to us, because our objective is not to turn a wave into a spectrum. Our objective is the reverse: to turn a spectrum into a periodic wave, and a backwards FFT is perfect for the job.
Both the input and output of an FFT is in the exact same format: a one-dimensional array of two-dimensional vectors. Um, like this:
Those vectors are just random and not meaningful, the picture is just intended to show the data structure that we're working with. Each vector is stored as a pair of coordinates, but sometimes it's also useful to think of the vector as a length value and an angle value.
Again, both the wave and the spectrum are represented in this format. This is a little unintuitive, because sound waves are not vectors: we don't normally think of sound waves as having an angle, just an amplitude that changes over time. As it turns out you can safely ignore the extra axis when interpreting the data as a sound wave, and if you follow my directions the extra axis will be constant at zero anyway. However, the spectrum that we'll start with requires both length and angle.
When designing a spectrum, each vector describes a sine wave. If you set all of the input vectors to zero except for one, and then run it backwards through the FFT, the output will be a sine wave, AKA a "pure tone".
If two of the input vectors are nonzero, then the output will be the sum of two sine waves at different frequencies. Each sine wave has three properties, and these match three properties of the spectrum vectors:
1. The amplitude of a sine wave matches the length of a spectrum vector. Sometimes I refer to this as the "energy" of the wave: it's how loud it is.
2. The phase of a sine wave matches the angle of a spectrum vector. For example, a cosine wave is just a sine wave with a phase offset of 90 degrees. Note that humans generally can't distinguish waves by their phase, so in a sense the angle doesn't matter. However, if you leave the angle at zero for all spectrum vectors, then all of the output sine waves will rise at the same time, leading to a very degenerate case. I recommend using random angles to balance out the phases, but don't change the angles if you perform multiple FFTs with different energy distributions.
3. The frequency of the sine wave is determined by the index of the vector in the spectrum array, and also by how long the array is. This is a complicated relationship, but it's important to understand because our objective is to create noise, and noise is characterized by a distribution of energy across the frequency spectrum, so I'll explain in a moment...
The duration of the output wave is exactly the same as the number of vectors in the input spectrum, and performing an FFT is most efficient if this is a power of 2. For example, if the length of the array is 2^17 = 131072 vectors, then you'll end up with 131,072 audio samples, which is about 3 seconds at 44,100 hertz (this is the standard sample rate). For demonstration purposes let's stick with this duration, though you will likely decide to make it shorter.
The frequency of a sine wave is exactly equal to the index of the vector in the spectrum array divided by the length of the array. The first index is 0, and 0/2^17 = 0, so the frequency of the first sine wave is zero. This is not very useful, so I recommend always setting the first vector to zero. It's a special case.
input[0].x = 0.0;
input[0].y = 0.0;
The next index is 1, which means that the corresponding sine wave has a frequency of 1/2^17, which means that it repeats once every 2^17 samples, or about once every 3 seconds. You'll notice that this is the period of our entire output wave. Also notice that repeating once per 3 seconds is way below a human's hearing range. We can't hear any pure tone that repeats less than 20 times per second. So again, you can probably set the first few vectors to zero, (in this case the first 60 vectors) but it's not terribly important.
You can also ignore the frequencies higher than about 20000 times per second. However, if you do the math, you'll find that the entire second half of the frequency spectrum is higher than that, from index 2^(17 - 1) to index (2^17) - 1. That's okay, don't think too hard about the second half. You only need to design the lower half of the spectrum, and then afterwards you have to put a backwards copy of the lower half in the upper half. Um, just trust me, do it like this:
input[SAMPLES / 2].x = 0.0;
input[SAMPLES / 2].y = 0.0;
for (int i = 1; i < SAMPLES / 2; i++) {
input[SAMPLES / 2 + i].x = -input[SAMPLES / 2 - i].x;
input[SAMPLES / 2 + i].y = -input[SAMPLES / 2 - i].y;
}
As for the remaining input vectors, if you set all of them to have a length equal to 1 (and random angles) then you will get white noise after you perform the FFT. Technically, white noise has a flat distribution of energy across the frequency spectrum, and we represent that by setting each amplitude to the same value. However, humans are used to noise with less energy at higher frequencies, so I prefer to start with pink noise, which you can compute like this:
input[0].x = 0.0;
input[0].y = 0.0;
for (int i = 1; i < SAMPLES / 2; i++) {
// Pink Noise:
float length = 1.0 / sqrt((float) i);
float angle = randomAngle();
input[ i ].x = length * cos(angle);
input[ i ].y = length * sin(angle);
}
// And finally copy the lower spectrum into the upper spectrum.
input[SAMPLES / 2].x = 0.0;
input[SAMPLES / 2].y = 0.0;
for (int i = 1; i < SAMPLES / 2; i++) {
input[SAMPLES / 2 + i].x = -input[SAMPLES / 2 - i].x;
input[SAMPLES / 2 + i].y = -input[SAMPLES / 2 - i].y;
}
To generate something other than pink noise, multiply the "length" variable in the first loop by whatever you want! Then run it through the FFT and you've got your audio wave. Download some FFT library and call whatever function you need to call to perform the FFT. I've had success with
FFTW, which is very fast at the cost of having a complicated API, but any FFT library should work.
Afterwards, you'll want to copy the 2D audio samples into 1D audio samples, by dropping whichever dimension is left at zero, before trying to output the wave through speakers. You'll also probably want to normalize the wave, so that the peak of the wave is 1.0. And now you're done!
...Okay that was pretty complicated after all, sorry.