Since I'm probably the foremost expert in the actual math of creating your own sounds from nothing in this community (but maybe not) and no one has actually answered the question:
If you want to work from scratch, you have to turn it sideways. In the same way that standing outside listening to a bunch of stuff (a bird chirping, the leaves rustling, wind, cars, etc.) can be modeled by playing all of those sound effects at once, you can theoretically make any sound in the world by simply adding enough sine waves. We call this additive synthesis. It's mathematically complex if you want to do it for real, but it's also probably the easiest to understand intuitively. Get something that can generate a few hundred sine waves, look up one of the recipes or just try random stuff, and out come cheap 8-bit analog sounds. Libaudioverse can do this in real time, and Pyo can do it in real time for smaller quantities of sine waves.
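To make the "just add sine waves" idea concrete, here is a minimal sketch in plain Python. It does not use Libaudioverse or Pyo. The function name, the 44100 Hz sample rate, and the recipe (odd harmonics at amplitude 1/k, which approximates a square wave) are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed for this sketch)

def additive_square(freq, n_partials, duration):
    """Approximate a square wave by summing odd harmonics at amplitude 1/k."""
    n_samples = round(SAMPLE_RATE * duration)
    nyquist = SAMPLE_RATE / 2
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        value = 0.0
        for i in range(n_partials):
            k = 2 * i + 1                 # 1, 3, 5, ... (odd harmonics only)
            if k * freq >= nyquist:       # skip partials the sample rate can't represent
                break
            value += math.sin(2 * math.pi * k * freq * t) / k
        samples.append(4 / math.pi * value)  # scale so the plateau sits near +/-1
    return samples

tone = additive_square(220.0, 20, 0.05)
```

Changing which harmonics you include and how loud each one is changes the timbre; that list of frequencies and amplitudes is the "recipe".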
Alternatively, you can start from a more complex basic waveform, such as a square or a triangle, and then apply lowpass and highpass filters, as well as more exotic things. This is called subtractive synthesis: you're starting with more than you need and pulling stuff out. You can get higher quality results from subtractive synthesis and don't need to know as much math, but the basic idea of it is certainly less intuitive than additive synthesis. Both of the above packages can also do subtractive synthesis, though Libaudioverse is currently more limited; my goal there is something fast and aimed at games, so I don't have all of the really advanced stuff from Pyo coded yet.
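A rough sketch of the subtractive idea, again in plain Python rather than either library: generate a harsh square wave, then pull the high end out with a one-pole lowpass filter. The function names, the sample rate, and the choice of a one-pole filter are assumptions; real synths use steeper filters and band-limited oscillators.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed for this sketch)

def square_wave(freq, duration):
    """Naive square wave: +1 for the first half of each cycle, -1 for the second."""
    n_samples = round(SAMPLE_RATE * duration)
    return [1.0 if (n * freq / SAMPLE_RATE) % 1.0 < 0.5 else -1.0
            for n in range(n_samples)]

def one_pole_lowpass(samples, cutoff):
    """Smooth the signal; frequencies above `cutoff` (in Hz) are progressively attenuated."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff / SAMPLE_RATE)
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out

raw = square_wave(220.0, 0.05)
filtered = one_pole_lowpass(raw, 800.0)
```

Lowering `cutoff` makes the result duller and rounder; raising it lets more of the original buzz through.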
Both of these techniques are in what is called the frequency domain: instead of viewing the sound as an electrical signal in time, you're viewing it as a sum of sine waves that just sort of theoretically run forever. If you want to work in the time domain, you're either going to plot a mathematical formula (e.g. sine or square wave formulas) or you're going to do physical modelling of some sort. Physical modelling will need you to know some pretty serious math and programming. Think at least calculus. You'll still be well below just recording the sound in the first place in terms of quality, so it's usually not worth it. When it is used, it's not usually used alone; the synths that do it still often integrate recorded samples to make things work and to cut the really insane computing requirements down to a marketable level. But it is technically possible, though I'm not aware of any packages short of audio-specific programming languages that will let you explore it. Even then, if you have to ask this question, you're probably far short of knowing the math needed to do it. We're talking some seriously scary stuff. The basic idea is to work out equations that accurately model the thing making the sound, and then to use them to pull out samples. The biggest place you'll see it is in the synth for your screen reader: the algorithms behind Eloquence are an application of physical modelling to the vocal tract.
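For a taste of "work out equations, then pull out samples", here is about the simplest physical model there is: a mass on a damped spring that gets struck once. This is a toy, nowhere near the vocal-tract models mentioned above, and the function name, parameters, and the semi-implicit Euler integration are all assumptions made for this sketch.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed for this sketch)

def struck_spring(freq, decay_time, duration):
    """Simulate x'' = -omega^2 * x - 2 * damping * x' one sample at a time."""
    dt = 1.0 / SAMPLE_RATE
    omega = 2.0 * math.pi * freq   # natural angular frequency, sqrt(k/m)
    damping = 1.0 / decay_time     # amplitude falls by about 1/e every decay_time seconds
    position = 0.0
    velocity = omega               # the initial "strike"; gives a peak displacement near 1
    out = []
    for _ in range(round(SAMPLE_RATE * duration)):
        acceleration = -omega * omega * position - 2.0 * damping * velocity
        velocity += acceleration * dt  # semi-implicit Euler: update velocity first,
        position += velocity * dt      # then move using the new velocity
        out.append(position)
    return out

note = struck_spring(440.0, 0.1, 0.5)
```

The output is just a decaying sine, which you could have written down directly. The point is that nothing in the loop mentions sine waves; the tone falls out of the physics. Real instruments need many coupled equations like this one, which is where the scary math comes in.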
What actual sound designers do is get some recordings of whatever. Take a car engine, for example: pitch it down, apply some distortion and stuff, and suddenly it's a giant jet engine or what have you. You can do a lot just by getting recordings of animals and pitch bending them. You can do a lot by just recording yourself making weird noises and applying crazy filters in your audio editor of choice. But you can't work without pre-recorded sounds to start with, and never really could. Trying to control the signals in the way you describe would be like trying to paint by laying down every single individual paint molecule, one by one by one. All of the alternatives lead to less realistic sounds or the type of stuff you'd find in electronic music, generally require a good deal of programming or fiddling with half-accessible software, and generally require a good deal of math or experience to get even that. If you're really dead set on starting from scratch without getting even basic recordings from someone else, you need a really good microphone, preferably with a flat frequency response; some imagination; and an audio editor.
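As a rough illustration of the "pitch it down, add distortion" workflow, the sketch below slows a signal down by resampling it and then runs it through a tanh waveshaper. The function names are hypothetical, and a short sine wave stands in for the recording; in practice you would load real audio and do this in an editor.

```python
import math

def pitch_shift_by_resampling(samples, ratio):
    """Read through the input at `ratio` speed; ratio < 1 lowers pitch and lengthens the sound."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # linear interpolation between the two neighbouring input samples
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out

def distort(samples, drive):
    """Soft-clip with tanh; higher drive squashes peaks harder and adds harmonics."""
    return [math.tanh(drive * s) for s in samples]

# Stand-in for a recording: 0.1 seconds of a 200 Hz sine at 44100 Hz.
recording = [math.sin(2 * math.pi * 200 * n / 44100) for n in range(4410)]
lower = pitch_shift_by_resampling(recording, 0.5)  # one octave down, twice as long
growl = distort(lower, 5.0)
```

Note that this kind of pitch shift also changes the length, just like slowing down a tape, which is often exactly what you want for turning a car into something much bigger.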
My Blog | Twitter: @ajhicks1992