Monday, September 20, 2010

Piano synthesis

I've bought a digital piano. Its most important feature is headphones (not included). No more neighbour complaints.

Because of this, I've also been spending some time recently, actually too much time, looking into piano sound synthesis.

The problem is that commercially available synthesizers built into digital pianos don't sound very good. Actually, for people who don't play the piano, they're probably fine. But that doesn't help me!

So I set out to find out what else is out there. Basically, there are two approaches.

Play back
One approach is that you sit down in front of a real piano with a couple of microphones and record the sound from all of the keys several times.

You can't just record one note and shift its frequency because the character of the tone changes dramatically from the low end to the high end.

Furthermore, you can't just record each key once because when you press a key harder, the sound is not just louder but also changes character. For a pianist this change of character is very important. I don't know whether this is true for others, but at least for me my brain quickly adjusts itself to changes in loudness so without a change of character, the result is a flat sound inside my head.

So the end result is that you need a lot of recordings: 88 notes times at least 15 hardness levels, and the low notes can easily last over 30 seconds before they die out. That's a pretty large amount of data to record and process; as I understand it, you need postprocessing to smooth out inconsistencies in the recordings. And still, in my experience, nobody has figured out how to simulate a short staccato attack convincingly with a long sample. Not to mention other interesting things you can do with a real piano, once you get it singing.
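To put a rough number on "a lot of recordings", here is a back-of-the-envelope sketch. The 88 notes, 15 levels and 30 seconds come from the text above; the recording format (44.1 kHz, 16-bit, stereo, uncompressed) is my assumption, and recording every note for the full 30 seconds makes this an upper bound, since the high notes die out much sooner.

```python
# Figures taken from the text above.
NOTES = 88
HARDNESS_LEVELS = 15
SECONDS_PER_RECORDING = 30  # upper bound: only the low notes really last this long

# Assumed recording format (not from the text): uncompressed CD-quality stereo.
SAMPLE_RATE = 44100
BYTES_PER_SAMPLE = 2
CHANNELS = 2


def sample_set_bytes(notes=NOTES, levels=HARDNESS_LEVELS,
                     seconds=SECONDS_PER_RECORDING):
    """Return the raw size in bytes of one recording per note and hardness level."""
    recordings = notes * levels
    bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
    return recordings * seconds * bytes_per_second


if __name__ == "__main__":
    total = sample_set_bytes()
    print(f"{NOTES * HARDNESS_LEVELS} recordings, about {total / 1e9:.1f} GB raw")
```

Under these assumptions that is 1320 recordings and roughly 7 GB of raw audio, before any of the postprocessing.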

Mathematical models
Instead of recording a real piano, another approach is synthesizing the sound from some sort of mathematical model, just like ray tracing a modeled 3D scene to generate realistic-looking special-effect images in movies.

Now, some models start from the idea of trying to emulate the sound of the piano itself directly, based on observations like the sound being nearly harmonic so if we just add some sine waves at regular intervals, we've almost got a piano sound.

As far as I can tell these all sound really awful.
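For reference, the "just add some sine waves" idea looks roughly like the sketch below. It is not taken from any particular product or paper: the 1/k amplitudes and the decay rates are arbitrary assumptions picked for illustration, and the partials are exactly harmonic, which a real piano string is not.

```python
import math


def additive_tone(f0, duration, num_partials=8, sample_rate=44100, decay=3.0):
    """Sum exponentially decaying sine waves at integer multiples of f0.

    Partial k gets amplitude 1/k and decay rate decay*k per second.
    Both choices are assumptions for illustration only.
    """
    n_samples = int(round(duration * sample_rate))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = 0.0
        for k in range(1, num_partials + 1):
            fk = k * f0
            if fk >= sample_rate / 2:  # skip partials above the Nyquist limit
                break
            value += math.sin(2 * math.pi * fk * t) / k * math.exp(-decay * k * t)
        samples.append(value)
    return samples


if __name__ == "__main__":
    tone = additive_tone(130.81, 1.0)  # roughly C3
    print(len(tone), "samples, peak", max(abs(s) for s in tone))
```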


One of the first pianos. It was before the steel frame was invented. From Wikipedia.

I think the explanation is that generating sound is easy, even easier when you've got a computer to add all the sine waves you like, but getting an interesting sound is really hard. And piano makers have worked on this since at least 1720 using wood and metal, materials with plenty of opportunities for interesting sounds. So a fairly simple model is just not going to cut it, not when it's up against a 400 kg carefully refined wooden steel-string monster.

Another idea is to try to deduce how the various things inside the piano affect each other. If you look closer, the vibration of a string is a periodic push and pull at the ends of the string. If we follow the interactions from a key hitting a hammer that in turn hits the strings, causing them to vibrate, setting the big wooden plate inside the piano, the sound board, into motion, causing the air to vibrate, which is finally perceived as sound in our ears after undergoing reverberations from the room walls; then we can, in theory, reconstruct a perfect sound through the formulas from physics governing the interactions. A similar idea is used to simulate what happens when things collide in games these days, by the way.

Now, in practice, it's of course not so simple.

First, as far as I can tell, nobody has an accurate complete physical model from hammers to strings to soundboard to dampers. So the equations are not there - not yet, although there appear to be a few people working on it.

Second, modeling a whole piano in a naive manner is just not tractable in real-time with commodity computers because you need to simulate the whereabouts of a huge number of points at least 44100 times per second to generate a playable 44.1 kHz sound. Thus it is vital to identify the audibly important dynamics in the process and spend the time simulating those.

Despite the challenges, these physical models can generate quite convincing sounds. You can buy them, currently either in the form of Pianoteq or the Roland V-Piano. What's nice about them is that you can get them to sing like a real piano because they're actually simulating vibrating strings, not just turning recordings on and off.

Unfortunately, the models aren't perfect yet, and it's certainly audible, mostly in the middle registers. The reason, I think, is that at the upper end the sound is actually mostly the thump from the key hitting the bottom of the keybed, and at the lower end it's mostly about generating a really big set of partials, so errors there are less noticeable.

Research on physical modeling
When I started looking at synthesizers I was mostly annoyed that neither the company behind Pianoteq nor Roland discloses how they do it. Which probably explains why both haven't gotten it quite right yet. So I started googling for research.

It does not take more than a cursory glance to see that synthesizing acoustic instruments of various kinds is an active research topic. As far as I can tell there are two cooperating groups working on piano synthesis these days, at least in the English-speaking part of the world.

One group is at Helsinki University of Technology, with some papers published last year and some earlier. Unfortunately, I got the impression that their model is not very convincing yet.

The other group is at Verona University, in collaboration with an Italian company and instrument maker Viscount. It's possible to google up some information about this group, but I think only Balázs Bank, who's also been at Helsinki, has an up-to-date home page.

I have spent some time working on a test implementation of the Verona model based on the paper describing it, and I'll dedicate the rest of this blog entry to my findings.

Modal piano synthesis
The Verona model is based on modal synthesis, where one starts from the equations from physics and from them figures out at which frequencies the strings exhibit standing, self-reinforcing waves. These waves last and are audible after several seconds, an eternity in the millisecond world of impacts.

The physical responses at these frequencies, or modes, or partials, are then simulated with a simple second-order model. Second-order means that the simulation on each time step uses the previous two values. So for instance, for one string partial, the displacement at time n is calculated as y[n] = a*y[n-1] + b*y[n-2] plus a contribution from the hammer if it is present. If you're wondering how this is started when n = 1, well, we just assume the string starts at rest, so the two previous values y[0] and y[-1] are both 0. The calculated displacement is basically the sound.

Hence, once the constants a and b have been chosen, the math is actually pretty simple.
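Here is a minimal sketch of that recursion for a single partial. This is not the paper's parametrization: I'm assuming the textbook two-pole resonator, where a = 2*r*cos(w) and b = -r*r give a sinusoid at angular frequency w (radians per sample) whose amplitude shrinks by a factor r per sample. The function names and the one-sample "hammer" excitation are made up for illustration.

```python
import math


def resonator_coefficients(freq_hz, decay_time_s, sample_rate=44100):
    """Return (a, b) for y[n] = a*y[n-1] + b*y[n-2] + x[n].

    decay_time_s is the time for the amplitude to drop by a factor e
    (an assumed way of specifying the decay, chosen for this sketch).
    """
    r = math.exp(-1.0 / (decay_time_s * sample_rate))  # per-sample decay factor
    omega = 2.0 * math.pi * freq_hz / sample_rate      # radians per sample
    return 2.0 * r * math.cos(omega), -r * r


def simulate_partial(a, b, excitation, n_samples):
    """Run the second-order recursion, starting from a string at rest."""
    y1 = y2 = 0.0  # the two previous values, both zero at the start
    out = []
    for n in range(n_samples):
        x = excitation[n] if n < len(excitation) else 0.0  # hammer contribution
        y = a * y1 + b * y2 + x
        out.append(y)
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    a, b = resonator_coefficients(440.0, 1.0)
    y = simulate_partial(a, b, [1.0], 44100)  # a single-sample "hammer" kick
    print(a, b, y[:3])
```

Each output sample costs two multiplications and two additions per partial, which is why a model with many partials per string can still run in real time.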

Of course, in reality it is a bit more complex. The model keeps track of the hammer motion, simulates the impact on the string and the resulting forces and motions at the modes, which include both transverse modes (up-down or left-right motion) and longitudinal modes (motion back and forth along the length of the string), plus some longitudinal motion stemming from tension changes caused by the transverse modes. In order to simulate the sound passing through the soundboard (and the room), the motions of the modes are added and convolved with a recorded soundboard impulse response.
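The last step, adding up the modes and convolving with an impulse response, can be sketched as follows. The function names are mine, and the direct double loop is only for clarity: with an impulse response several seconds long you would want an FFT-based convolution instead.

```python
def mix_modes(mode_outputs):
    """Add the per-mode signals sample by sample (shorter ones are zero-padded)."""
    length = max((len(m) for m in mode_outputs), default=0)
    mixed = [0.0] * length
    for mode in mode_outputs:
        for i, value in enumerate(mode):
            mixed[i] += value
    return mixed


def convolve(signal, impulse_response):
    """Direct convolution: every input sample launches a scaled copy of the response."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # silent samples contribute nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    modes = [[1.0, 0.5, 0.25], [0.0, 1.0]]  # two made-up mode signals
    soundboard = [1.0, 0.5]                 # made-up two-tap impulse response
    print(convolve(mix_modes(modes), soundboard))
```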

The details are in the paper.

Disappointingly, if you implement what's described in the paper and enter some reasonable values of the constants, it doesn't sound like a piano. This is of course with the caveat that I might have made a mistake in my implementation.

However, as far as I can tell, there are at least two important things missing from this model. One is the dampers. All pianos have a felt damper connected to each key, the purpose of which is to silence the strings of the key when it is not pressed down. Pianos produce quite powerful sounds so without some way of silencing notes again, it would be hard to make quick pieces sound good. Surprisingly, there does not appear to be much literature on dampers. The only thing I've found is this paper from last year.

The other thing missing is some feedback loops. From the soundboard back to each string, between the strings, especially between the strings on the same key (most keys on the piano have more than one string), and between the different transverse directions on the same string. These feedback loops cause interesting dynamic effects.

In the paper the soundboard is simply dealt with by setting the decay rates of the modes according to a simple model. This corresponds to the soundboard stealing energy from each mode and not delivering anything back. I did an analysis of the piano samples available from the University of Iowa; the decays of the partials are plotted below together with the simple model fitted to the data.



As you can already see from the plot, this is not a terribly accurate model.
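For the curious, one way to get a decay rate for a single partial out of a recording is a straight-line fit to the logarithm of its amplitude envelope. The sketch below is not my actual analysis code, and it assumes you already have the envelope of one partial as (time, amplitude) pairs with strictly positive amplitudes. It also assumes a single exponential decay, which is exactly the assumption that real partials tend to break.

```python
import math


def fit_decay_rate(times, amplitudes):
    """Least-squares fit of log(amplitude) = log(a0) - sigma * t.

    Returns (a0, sigma), with sigma in 1/seconds. Amplitudes must be > 0.
    """
    logs = [math.log(a) for a in amplitudes]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_t
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic envelope, only to show the call; not measured data.
    ts = [0.1 * i for i in range(50)]
    env = [0.8 * math.exp(-1.5 * t) for t in ts]
    print(fit_decay_rate(ts, env))
```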

In the following, the spectrograms show the level of sound at each frequency over time, capped at 5 kHz and 7 seconds. I've generated them with sox. First a sampled tone, C3 at fortissimo.



I think the weird vertical bars might be echoes from the initial thump. Now, if you use the simple function derived from the decay data, the resulting synthesized sound looks like this (sound example).



The thing is that the soundboard doesn't really follow the simple model in the way it responds to the strings, e.g. it has modes of its own. As a quick measure, I tried setting the decay of each partial to the decay rate of the nearest measured mode; then it looks like this (sound example).



I'm not sure what the conclusion is. One attractive feature of the simple model is that it only has a couple of parameters, so it's easy to change around. I think if one had a more complete model of a soundboard, it might be possible to deduce from that more precisely how it absorbs the sound and set the decay rates accordingly. But still there might be some dynamic effects that aren't taken into account.

A thing one notices in the spectrogram for the sampled tone is the pulsating nature of the partials. They don't just die out exponentially. The pulses come both from the interaction between the different transverse directions on each string and from the interaction with the other strings on the same key.

The paper has a couple of suggestions for dealing with these. Basically it boils down to adding some extra modes that can produce beating. The thing is that if you add a mode close to another, the difference in frequency causes the sound to pulsate because of the interference between the two.
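The beating itself is easy to demonstrate. Since sin(A) + sin(B) = 2*sin((A+B)/2)*cos((A-B)/2), two equally loud modes at f1 and f2 behave like one tone at the average frequency whose loudness pulses |f2 - f1| times per second. The sketch below uses made-up frequencies and ignores decay.

```python
import math


def two_close_modes(f1, f2, duration, sample_rate=44100):
    """Sum of two equal-amplitude sine waves at f1 and f2 (no decay)."""
    n_samples = int(round(duration * sample_rate))
    return [
        math.sin(2 * math.pi * f1 * i / sample_rate)
        + math.sin(2 * math.pi * f2 * i / sample_rate)
        for i in range(n_samples)
    ]


def beat_envelope(f1, f2, t):
    """Slowly varying envelope of the sum: 2*|cos(pi*(f2 - f1)*t)|."""
    return abs(2.0 * math.cos(math.pi * (f2 - f1) * t))


if __name__ == "__main__":
    signal = two_close_modes(440.0, 442.0, 1.0)  # 2 Hz apart, so 2 beats per second
    for t in (0.0, 0.125, 0.25, 0.5):
        print(t, round(beat_envelope(440.0, 442.0, t), 3))
```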

But there's no recipe for how to set the beating.

My conclusion so far is that you can't use a global value for the beats; they need to be specific for each mode. I haven't gotten around to actually measuring these values from the samples. As a quick work-around, I'm right now setting a random beat for each partial; this improved the sound considerably. While measuring samples may be one way to fix the issue, I can't help thinking that this is one area where physics could solve the problem much more precisely and with less hassle, if somebody could figure out how.
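The random work-around amounts to something like the sketch below: every partial gets a companion mode detuned by its own random amount. The function name, the uniform distribution and the 1 Hz upper limit are placeholders for illustration, not values I've measured or tuned.

```python
import random


def random_beat_pairs(partial_freqs, max_beat_hz=1.0, seed=None):
    """Pair each partial with a companion mode detuned by a random beat rate.

    max_beat_hz is a hypothetical upper limit chosen for this sketch.
    Returns a list of (frequency, detuned_frequency) tuples.
    """
    rng = random.Random(seed)
    pairs = []
    for f in partial_freqs:
        beat = rng.uniform(0.0, max_beat_hz)  # a separate beat rate per partial
        pairs.append((f, f + beat))
    return pairs


if __name__ == "__main__":
    harmonics = [130.81 * k for k in range(1, 6)]  # first partials of roughly C3
    for f, detuned in random_beat_pairs(harmonics, seed=1):
        print(f"{f:8.2f} Hz paired with {detuned:8.2f} Hz")
```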

I haven't yet gotten to the point where this sounds realistic enough to warrant a full synthesizer that can eat MIDI. But I get the impression that Pianoteq is using modal synthesis too, so it's certainly possible. If you are interested in the code I've written so far, feel free to contact me.

If you're interested in how the piano works, a good resource is the Five lectures on the acoustics of the piano.

I'll conclude with a random note. The Verona paper only operates with one string per key, but a real piano has more on most keys. So if you use physically meaningful parameters and don't take this into account when computing hammer impacts, it's going to be off by quite a lot; for a hammer, there's a difference between hitting one string and three.