So I have laid out a strategy for creating a surround spatializer, or virtual surround sound generator. I've briefly reviewed the maths on Fourier transforms and Laplace transforms, as well as how these two relate to linear electric circuits. I forgot how much I learned in my undergraduate years in college!
The good news is that after forgetting all of this material (it was basically elective coursework for me in college), I finally remembered that it's not just possible, but rather straightforward to make a machine that will generate the HRTF of the human ear for a *monaural* sound source. The trick is to write Kirchhoff's Voltage Law (KVL, basically energy conservation for electric circuits) in terms of first principles (i.e., the basic physics laws of each electric component) to develop an equation to which you will apply either the Fourier or Laplace transform (read below).

To get to the frequency response of a physical circuit, you first have to express its energy conservation law as an integro-differential equation.

As an intro to the maths, I can summarize it this way: basically, what you're trying to do is express the electrical energy conservation law (written as KVL) in terms of sine waves and exponential functions using complex variables (Euler's formula and related theory). The reason is Fourier's transform theorem, which basically says that any time-domain signal, like music for example, can be represented as an infinite number of sine waves of different amplitudes and frequencies, similar to the way an mp3 player or a 1980s graphic equalizer can break the signal into different frequency bands that you can control individually to your liking.
The advantage of expressing a signal in terms of sinusoids of different frequencies is that sine waves have special mathematical characteristics that simplify the calculations a great deal when you need to apply calculus or process differential equations. The first-principle laws that govern the behavior of capacitors and inductors in a circuit turn from derivatives and integrals in the time domain into multiplications and divisions in the frequency domain, and your KVL equation is converted into a single, fairly simple algebraic equation.
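To make the Fourier idea concrete, here is a minimal Python sketch (all values made up for illustration): a signal built from two sine waves decomposes cleanly back into exactly those two frequencies.

```python
import numpy as np

# Build a 1-second test "music" signal out of two sine waves:
# 440 Hz at amplitude 1.0 and 880 Hz at amplitude 0.5.
fs = 8000                          # sample rate in Hz
t = np.arange(fs) / fs             # exactly 1 second of samples
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# The FFT re-expresses the time-domain signal as sinusoids.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(fs, d=1 / fs)  # one bin per Hz here

# The two strongest bins land exactly on the two input frequencies.
top_two = sorted(float(f) for f in freqs[np.argsort(spectrum)[-2:]])
print(top_two)  # -> [440.0, 880.0]
```

This is the same decomposition a graphic equalizer performs implicitly with its analog band filters.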
So now you can perform the Laplace Transform (shown) or the Fourier Transform to express the equation in the complex frequency domain.

The equation turns into an algebraic equation in terms of the complex frequency "s".

So now you can start evaluating an algebraic equation that explicitly tells you how frequencies are handled by the individual components, or by the collective assembly of electric components in your circuit. In other words, you can "translate" electric components like capacitors and inductors directly into a plot of the Transfer Function, or frequency response, of the electric circuit. You can get the phase as a function of frequency too (the two plots together are called a Bode Plot). And that is exactly what we want: a Transfer Function that matches or approximates the HRTF that I was talking about before, and a way to look at the phase of the signals as well.
Then you solve for the current (shown) or voltage behavior as a function of the complex frequency.
If "s" is set to have only an imaginary component (j), that is the same as a Fourier Transform, and all the functions in the plots are expressed in terms of frequency, "f".
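As a sketch of what the algebra buys you: for a hypothetical series RLC band-pass circuit (output taken across R, component values made up), the s-domain transfer function can be evaluated numerically along s = jω to get both halves of the Bode plot, with no calculus involved.

```python
import numpy as np

# Series RLC band-pass, output across R (hypothetical example values).
# The time-domain KVL is an integro-differential equation; in the
# s-domain it collapses to the algebraic transfer function
#   H(s) = s*R*C / (s^2*L*C + s*R*C + 1)
R, L, C = 1.0e3, 10e-3, 1e-6

f = np.logspace(1, 5, 401)        # 10 Hz .. 100 kHz
s = 1j * 2 * np.pi * f            # s = j*omega: the Fourier view

H = (s * R * C) / (s**2 * L * C + s * R * C + 1)
mag_db = 20 * np.log10(np.abs(H))      # magnitude half of the Bode plot
phase_deg = np.degrees(np.angle(H))    # phase half of the Bode plot

# The gain peaks (0 dB) at the resonance f0 = 1/(2*pi*sqrt(L*C)).
f0 = 1 / (2 * np.pi * np.sqrt(L * C))
peak_f = float(f[np.argmax(mag_db)])
```

Plot `mag_db` and `phase_deg` against `f` on a log axis and you have the Bode Plot directly.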


The Laplace transform is basically very similar to a Fourier transform, except that it deals with functions of frequency expressed over the entire complex number domain, as opposed to the Fourier transform, which deals with functions of frequency and imaginary numbers that represent rotating vectors called phasors. In fact, if you set the complex numbers in the Laplace Transform to have their real component equal to zero, then the Laplace Transform is exactly a Fourier Transform. The HRTF plot I compiled from the German researchers above is a Fourier Transform, but it's trivial to convert it to a Laplace Transform.
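In symbols (using the one-sided transform, since we deal with causal signals):

```latex
F(s) = \int_{0}^{\infty} f(t)\, e^{-st}\, dt, \qquad s = \sigma + j\omega
```

Set the real part to zero, so that s = jω, and the same integral becomes the Fourier transform:

```latex
F(j\omega) = \int_{0}^{\infty} f(t)\, e^{-j\omega t}\, dt
```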
In fact, during the design process you're not going to have to calculate much, other than put together a collection of simple RLC circuits that produce quadratic equations that look similar to the HRTF curves; then you just work your way back to the values of the RLC components! Not very different from building a 1980s graphic equalizer - in fact, that is exactly what it is, except that you need one graphic equalizer for every single "virtual sound source" you want to "hang" somewhere in 3D space.
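Here is a hedged sketch of that back-solving step, assuming each HRTF "hump" is approximated by one series RLC band-pass; the target frequency, Q, and the stock capacitor value are all made-up examples.

```python
import numpy as np

# Work backwards from one desired HRTF "hump" to series-RLC
# band-pass component values, using (output across R):
#   f0 = 1 / (2*pi*sqrt(L*C))   -> center frequency of the hump
#   Q  = (1/R) * sqrt(L/C)      -> sharpness of the hump
def rlc_from_hump(f0_hz, q, c_farads=100e-9):
    """Pick a convenient stock C, then solve for L and R."""
    w0 = 2 * np.pi * f0_hz
    l_henries = 1 / (w0**2 * c_farads)            # from the f0 formula
    r_ohms = np.sqrt(l_henries / c_farads) / q    # from the Q formula
    return r_ohms, l_henries, c_farads

# Hypothetical target: a 3 kHz hump with a moderate Q of 2.
r, l, c = rlc_from_hump(3000.0, 2.0)
print(f"R = {r:.0f} ohm, L = {l * 1e3:.1f} mH, C = {c * 1e9:.0f} nF")
```

Repeat per hump, per virtual speaker, and you have the component list for the whole "equalizer" bank.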
The difficulty is that using analytical equations, you can only do this for point sources, because it's impossible (as far as I can tell) to find a maths transform that will map a time-domain signal (e.g. music) to an infinite number of spatial locations (inside your room) with varying frequency response. In fact, I think there is an infinite number of ways you can arrange speakers in a room to give you a given time-domain signal, or equivalently, a given frequency response distribution in your room. This difficulty is referred to as an inverse design problem: an infinite number of arrangements can give you the same result, and consequently there is no analytical method that yields a unique solution for a given question.
More specifically, you can't map the time-domain signal to a signal expressed as a function of azimuth or elevation around your head; that'd be like having an infinite number of 1980s-style frequency equalizers, one for every possible angle around your head. And to make it worse, for a stereo signal, azimuth is rather arbitrary, because musicians are not necessarily encoding the location of, say, an instrument or a singer in their stereo signal. The latter is in fact what quadraphonic systems did in the 1970s; these were methods to encode a sound location into the music and then decode it again. The music was manipulated in phase and strength to make sure that you could extract a hidden signal using the "R-L" difference between left and right channels. That meant a finite number of channels.
So what to do? Well, you have to generate discrete virtual sound sources in space, each differing from the others, even if only by an arbitrary parameter - like having a finite number of speakers floating in your room. How many do you want? 4, 6, 8 virtual speakers? Because that is what you will have to build. Trying to do something more complicated would require encoding the exact location in a way that a digital computer could use to separate a specific instrument from a sound track and manipulate its sound accordingly, for each and every instrument. Only NASA's Convolvotron can do that.
But this makes my task much easier now. I can take that HRTF and use Excel to come up with a quadratic equation, or set of quadratics, to design a frequency profile for each virtual speaker around your head. If I have 2 real and 4 virtual channels, I need to build 4 frequency equalizers, basically. That is very straightforward. What is more nebulous is how to separate the 2-channel music into 6 different channels.
One approach is to leave the front right and front left channels untouched (I'd prefer that) as opposed to using a center channel, and maybe add a virtual center channel (which I feel might not be necessary), then try to have "exaggerated" or weighted R-L signals. So say, for example, immediately 45 degrees clockwise from the Front Right (FR) channel, you develop a "2R-L" signal that exaggerates the right side of the stereo music by a factor of two. Then another 45 degrees clockwise you have a virtual channel that puts out a "3R-L" signal that exaggerates the right side even more, relative to the left side. But these are not real channels, but only synthesized virtual channels. The frequency content of your music will inevitably be affected - this is not for audiophiles, but for people who want to hear a special effect. This is by definition a stereo expander, and I suspect that is exactly what Sony did in the 3- and 4-speaker "Matrix Stereo" portables. The difference is that the channels are not going to real speakers, but rather they're being filtered through the HRTF circuit ("equalizers") and pumped through the front stereo speakers.
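A minimal sketch of this difference matrixing in Python; the channel names, angles, and the 2x/3x weights are my illustrative assumptions, not a final design.

```python
import numpy as np

# Difference matrixing: each virtual channel exaggerates one side,
# e.g. "2R-L" at 45 degrees, "3R-L" at 90 degrees. Channel names
# and weights are illustrative assumptions, not a final design.
def difference_matrix(left, right):
    """Map a stereo pair to 2 real + 4 virtual channels."""
    return {
        "front_left":  left,                # real, untouched
        "front_right": right,               # real, untouched
        "right_45":    2 * right - left,    # right side exaggerated x2
        "right_90":    3 * right - left,    # right side exaggerated x3
        "left_45":     2 * left - right,
        "left_90":     3 * left - right,
    }

# Hard-right test signal: the right-side virtual channels get
# progressively hotter, the left-side ones invert in sign.
left, right = np.zeros(4), np.ones(4)
ch = difference_matrix(left, right)
```

Each of these outputs would then pass through its own HRTF filter before reaching the front speakers.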
The main hidden secret I see in this method is the manipulation of phase and time of arrival. In theory, the HRTF already has the right phase changes "encoded" as part of the frequency response (the Bode Plot of the HRTF), but I need to make sure that the phase distribution of the signals actually makes sense when I try to reproduce the HRTF. There are rough approximations to the phase and time-of-arrival differences between the right and left ears that were measured in laboratories throughout the 20th century. I need to make sure that the circuits I build actually obey those basic rules, and not just the HRTF data; if phase and time-delay cues are missing, I need to synthesize them wherever needed. That will probably be the most difficult part, but I won't find out until I do the Bode plots.
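One of those classic laboratory approximations is Woodworth's spherical-head formula for interaural time delay; a small sketch like this could serve as the sanity check on the circuits' phase and delay behavior (head radius and speed of sound are the usual textbook values):

```python
import numpy as np

# Woodworth's spherical-head approximation for interaural time
# delay (ITD): ITD = (a/c) * (theta + sin(theta)), with head
# radius a ~ 8.75 cm and speed of sound c ~ 343 m/s.
def itd_seconds(azimuth_deg, a=0.0875, c=343.0):
    theta = np.radians(azimuth_deg)
    return (a / c) * (theta + np.sin(theta))

# The delay grows with azimuth and tops out near 0.66 ms at 90
# degrees; the circuit's phase/delay should track this curve.
for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {itd_seconds(az) * 1e6:.0f} us")
```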
A proposed 8-discrete-channel Virtual Speaker System for Project PSSSIII
L is the Left stereo signal. R is the Right stereo signal.
H_n is an operator that filters the signal with the Head Related Transfer Function for: the real front speakers if n=0, and the nth location of the virtual speaker if n>0.
d is an additional time delay as a function of the center front location to ensure a precedence effect for the virtual speakers.

But does using the "R-L" signal even make sense now? We're not decoding anything at all! Its original purpose was to extract differences between the right and left channels and use them to amplify the separation of sound sources, taking advantage of sound previously manipulated in phase and volume to create a new channel. In the event L=R, all the other channels except the front go to zero, which is very dramatic.
I've been thinking that another approach would be to rely exclusively on the stereo expander "nR+L" method. The logic is as follows: towards the center of the room there is a lot of similarity between the R and the L signal. The sound cues are in fact identical according to the HRTF. So for all practical purposes the front of the room is an "R+L" signal. As the angle between the sound source and the front of your face increases, the signal difference in terms of intensity and phase increases (largely as a function of the sine of the angle, BTW), until you have maximum separation at 90 degrees or close to that.
If one is to "wrap" stereo around one's head, it stands to reason that the Left signal would be strongest and most "pure" next to your left ear, and the same for the right ear. The signal should be "less pure" in front of you. So I'd propose to stop using difference signals to generate the virtual speakers, and instead rely on exaggerating the left signals by varying their relative intensity only, using the "nL+R" approach (equivalently "L+R/n"), as such:
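The same kind of sketch for the sum method; again, the channel names and weights are illustrative assumptions.

```python
import numpy as np

# Sum matrixing: only positive weights ("nL+R"), so nothing is
# ever cancelled. Names and weights are illustrative assumptions.
def sum_matrix(left, right):
    """Map a stereo pair to 2 real + 4 virtual channels."""
    return {
        "front_left":  left,
        "front_right": right,
        "left_45":     2 * left + right,    # "2L+R"
        "left_90":     3 * left + right,    # "3L+R"
        "right_45":    2 * right + left,
        "right_90":    3 * right + left,
    }

# Centered material (L == R): unlike the difference method, every
# channel still carries signal, so nothing drops out of the mix.
left = right = np.ones(4)
ch = sum_matrix(left, right)
```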
Another proposed 8-discrete-channel Virtual Speaker System for Project PSSSIII
L is the Left stereo signal. R is the Right stereo signal.
H_n is an operator that filters the signal with the Head Related Transfer Function for: the real front speakers if n=0, and the nth location of the virtual speaker if n>0.
d is an additional time delay as a function of the center front location to ensure a precedence effect for the virtual speakers.

Each method has its pros and cons. Both of them assign weights toward one channel to create a new channel, gradually putting more weight on one side as the stereo signal leans toward that side; however, the difference method cancels a signal with equal right and left content everywhere except the front, whereas the sum method does not at all, which means that the sum method gives you a center signal that goes to the center of the room, not the front. The stereo separation could be much lower with the sum method. On the other hand, the sum method avoids potential sound cancellation problems by not adding phase differences as abruptly as the difference method does. A signal with right-sided content only cannot interfere with a left signal that doesn't exist, so the potential for odd "echoes," common in Dolby Surround, is eliminated. The sum method also preserves the frequency content of the program by not lowering the volume of centered sound sources, a common problem in SQ Quadraphonic. The delay operator would be forced on the two front speakers. To increase sound separation, a sharp weight would be used for the two real front channels. The two front channels should still sound like stereo. Maybe use a golden ratio rule or geometric rule for the distribution of the weights of all the channels, to allow perfect net energy conservation across the left- and right-side content.
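One possible reading of that golden-ratio/geometric idea, as a sketch: let each successive channel's weight fall off by a constant ratio, then rescale the whole set so the total power is exactly one. The choice of 1/phi as the ratio is my assumption.

```python
import numpy as np

# One reading of the "geometric rule": let each successive virtual
# channel's weight fall off by a constant ratio (here 1/phi, the
# inverse golden ratio - an assumption), then rescale the set so
# the summed power across the channels is exactly 1.
phi = (1 + np.sqrt(5)) / 2
raw = (1 / phi) ** np.arange(4)        # one weight per channel, per side
weights = raw / np.linalg.norm(raw)    # now sum(weights**2) == 1

print(np.round(weights, 3))
```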
One added perk of the discrete virtual speaker system (DVSS?) I outline is that you can always install jacks that tap into the front end of the "R-L" circuits and, if you have extra amplifiers, actually pump the same stereo expander channels into real speakers (minus the HRTF and phase-change elements) to compare the performance of the virtual speaker to the real speaker! That would be something!
You could balance the sound a bit by introducing a center channel (R+L) and introduce phase changes, I suspect, as Sony did by delaying the center channel. That would take advantage of something called the "Precedence Effect," where the brain perceives a sound that arrives early to one ear as being the original source of the sound, creating the illusion of sound location. The most advanced quadraphonic systems, and the Sony Matrix, most likely used this phenomenon. This could be a reason *I may have* to use a center channel - I will try to find out if I need the center channel or not, but the stereo separation between speakers is already very low, and that would reduce it even more.
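A sketch of that precedence-effect delay in Python. The 10 ms figure is my assumption, inside the commonly cited Haas window; in the analog build this would be a delay line, not sample shifting.

```python
import numpy as np

# Precedence-effect sketch: delay a (hypothetical) center channel
# by 10 ms - an assumed value inside the classic Haas window - so
# the front pair arrives first and "wins" the localization.
def delay_channel(x, delay_s, fs):
    """Delay x by delay_s seconds (whole samples, zero-padded)."""
    n = int(round(delay_s * fs))
    return np.concatenate([np.zeros(n), x])[: len(x)]

fs = 44100
center = np.ones(1000)
delayed = delay_channel(center, 0.010, fs)  # 10 ms -> 441 samples
```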
As a last remark, the system, being pumped through two speakers, would be fully compatible with headphones. I see no reason why the DVSS system can't be used directly in headphones. In theory that is in fact the preferred method of reproduction for the HRTF filtered music!
All in all, this would be an exceptionally flexible project, and potentially a complicated one as well. The question is how complicated I want to make it. The complexity is determined by how many "humps" you use to approximate the HRTF for a given virtual speaker, and obviously by the number of virtual speakers. This means potentially an awful lot of op-amp circuits. So I need to get creative to simplify the HRTF and work my way from there.