"As part of the calibration, the speed of sound is also a parameter which is optimized to obtain the best model of the system, which allows this whole procedure to act as a ridiculously overengineered thermometer."
Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."
Back in high school, I built (with some parental assistance) an apparatus to measure how quickly the pressure would drop (in a pressurized cylinder) when a very small hole allowed air to leak out.
Turns out, not only can you measure temperature that way, but you can extrapolate the graph out to find absolute zero (IIRC my result was out by about 20 kelvin, which I think is pretty damn good for a high-school-garage project).
I love these kinds of inadvertent measurements. One of my favorite examples is that a sufficiently accurate IMU can get you relatively accurate longitude measurements from the Coriolis effect.
That's the same principle used by cheap solder stations to regulate the tip temperature without employing a thermal sensor: they measure the heater resistance, presumably during the off state of the PWM signal that drives the heater. In that case the measurement is less accurate than using a real sensor, still good enough for cheap solder stations where a few degrees don't make a big difference.
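A rough sketch of how a resistance reading could be turned into a temperature, assuming the heater follows a simple linear temperature coefficient. The function name and the default values for r0, t0 and alpha are illustrative placeholders, not figures from any real solder station.

```python
def heater_temperature(r_measured, r0=8.0, t0=25.0, alpha=0.0039):
    """Estimate heater temperature in deg C from its resistance.

    Assumes the linear model R = R0 * (1 + alpha * (T - T0)), where R0 is
    the resistance at reference temperature T0 and alpha is the temperature
    coefficient of the heater wire (all three are assumed values here).
    """
    return t0 + (r_measured / r0 - 1.0) / alpha
```

A real heater is only approximately linear, which fits the comment's point that this is less accurate than a dedicated sensor.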
Interesting. If the voltage across the speaker voice coil can be sampled with enough sensitivity at a fast-enough rate, you have an undocumented microphone.
Would this be true for electrostatic speakers as well? Though it would probably require greater gain/amplification or, potentially, the application of some kind of bias voltage for the capacitive diaphragm of the speaker.
Just speculation based on the shared operating principle with condenser microphones.
I think the most you can tell from an IMU or gyro is that there is a change in velocity in a direction aligning with East-West when there is a change in location and that the change in velocity is greater when the location changes in line with North-South. The change in velocity would be greater as one approaches the poles and lesser at the equator.
Thought experiment: if I zeroed my IMU at the North pole and traveled in a straight line away from the pole along longitude zero, following the guidance of the IMU. By the time I got to 45° latitude I’d be traveling Westward at 1,180 kph (.95 Mach) to keep the IMU at zero.
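The 1,180 kph figure can be checked with a few lines, assuming a spherical Earth with a mean radius of 6371 km and one rotation per sidereal day. The constant and function names are mine, chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean radius, assuming a spherical Earth
SIDEREAL_DAY_H = 23.9345   # hours per full rotation


def surface_speed_kph(latitude_deg):
    """Eastward speed of the ground due to Earth's rotation at a latitude."""
    omega = 2 * math.pi / SIDEREAL_DAY_H  # angular rate in rad/hour
    return omega * EARTH_RADIUS_KM * math.cos(math.radians(latitude_deg))


print(round(surface_speed_kph(45.0)))  # roughly 1180 kph
```

The speed falls off with the cosine of latitude, so it is zero at the pole and largest at the equator.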
The flat earther used a fibre optic gyro. You don't "zero" it, it continuously outputs a measurement of its own angular rate around its sensitive axis. For a 3-axis gyro placed still on earth, it will read about 15 degrees/hour around wherever the axis of earth is oriented.
The earth’s surface closer to the poles has less distance to travel for any rotation than the surface closer to the equator. As a result the inertial navigation systems of long distance systems must be adjusted. Iirc, this is also the case for artillery firing computations.
Coriolis corrections are thrown into sniper ballistic calculations, too. Not a huge effect in most conditions, but not zero, and there have been a lot of long shots in the past two decades.
I believe this is one of the initial steps an aircraft INS uses to find north while it is aligning, but it's been too long since I had aircraft systems theory in the front of my brain.
Yes, from earth rotation the INS could figure out true north if the latitude is known. Or figure out the latitude if current heading is known. But normally it's aligned with a starting position from pilot input or GPS.
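As a minimal sketch of the geometry, assume an idealized, noise-free, perfectly level 3-axis gyro with x forward, y right and z down. In that simplified case the horizontal part of the measured Earth rate points at true north and the vertical part encodes latitude. Both functions below are hypothetical helpers, not any real INS algorithm.

```python
import math

EARTH_RATE_DPH = 15.041  # Earth's rotation rate in degrees/hour (sidereal)


def simulate_level_gyro(latitude_deg, heading_deg):
    """Rates (deg/h) a stationary, level gyro would read: x fwd, y right, z down."""
    lat, hdg = math.radians(latitude_deg), math.radians(heading_deg)
    north = EARTH_RATE_DPH * math.cos(lat)   # horizontal component, points north
    down = -EARTH_RATE_DPH * math.sin(lat)   # vertical component
    return (north * math.cos(hdg), -north * math.sin(hdg), down)


def heading_and_latitude(wx, wy, wz):
    """Recover true heading and latitude (degrees) from the three rates."""
    heading = math.degrees(math.atan2(-wy, wx)) % 360.0
    latitude = math.degrees(math.atan2(-wz, math.hypot(wx, wy)))
    return heading, latitude
```

In practice gyro bias and noise are large compared with 15 degrees/hour, which is one reason alignment takes time and a known starting position is normally entered.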
If you are at an airport you will sometimes notice large signs giving the longitude and latitude of the individual stands. These are used to give the initial position to the INS via the FMS. Of course these are all built into the database these days, so the signs are only used (if at all) for gross error checking.
Similarly, diesel engines come with a reserve fuel supply that you can accidentally use once. (diesel engines will happily run on engine oil when warm)
You don't have to try hard. Just use it as a photodiode and it magically works. However, if it's inside a plastic case that blocks light, it doesn't.
Due to some law about entropy, efficient processes are necessarily reversible. That's why electric motors - some of the most efficient machines ever invented - are also generators.
> However, if it's inside a plastic case that blocks light, it doesn't.
You want an ordinary diode to allow current to flow easily when it senses light? Simple: shine a powerful laser at the plastic-encased diode and it will melt the plastic and liquify the metal, fusing it together and allowing current to flow again. See? You just needed to try harder.
I have a temporarily light-emitting hard drive cable. A really old 40 MB HDD connected to an old computer with a cheap power supply that most likely couldn't handle the slightly lower-than-standard power in a friend's house.
A colleague of mine spent months analysing fluctuations in a narrow-band signal from a geophone, only for a more senior colleague to get fed up with it and demonstrate that the fluctuations simply correlated with the air temperature, and did so within the spec sheet's reported temperature tolerance.
I first encountered it in Elecia White's book Making Embedded Systems, but the attribution is anonymous and whom it's attributed to may have heard it elsewhere.
A lot of people like myself consider heat a form of light but I guess a photographer would be just thinking visible light. They say that about 50% of the sun's light emissions comes in the infrared frequencies.
That seems like a mistake since heat can transfer e.g. via contact without any electromagnetic emission. In fact, that is what I think happens with the sensor also, given that there is an IR filter in front of it.
I’m not sure how the speed of sound could depend on altitude, even in principle. The air doesn’t know where it is!
Putting that aside, in an ideal gas, the speed of sound depends on the composition of the gas and the temperature and, interestingly, does not depend on pressure, and pressure is the main way that the altitude would affect the speed of sound. So measuring the speed of sound in air actually makes for a pretty good thermometer.
"The speed has a weak dependence on frequency and pressure in ordinary air, deviating slightly from ideal behavior."
"The speed of sound is raised by humidity. The difference between 0% and 100% humidity is about 1.5 m/s at standard pressure and temperature, but the size of the humidity effect increases dramatically with temperature."
"Slight" can matter significantly in an application like this.
> the size of the humidity effect increases dramatically with temperature.
This has little to do with the behavior of sound. The fraction of the air that consists of water vapor at 100% relative humidity is very small at cool temperatures and increases to 100% at 100 degrees C.
(Yes, water boils at the temperature at which air that is saturated with water vapor is all water vapor.)
Not unless you change the average mass of the molecules.
An ideal gas’ pressure is a function of number of particles per unit volume, its temperature, and nothing else. If you do anything involving adding or removing heat or changing the volume or pressure, you probably also need to know the specific heat at constant volume and the specific heat at constant pressure or, frequently, their ratio. That ratio is called the adiabatic index or the heat capacity ratio, it’s written as gamma, and it’s the last parameter in the speed of sound of an ideal gas. Interestingly, it doesn’t vary all that much between different gases.
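For reference, the ideal-gas relation being discussed is c = sqrt(gamma * R * T / M), and it inverts directly into a thermometer. The sketch below assumes dry air (gamma of 1.4 and a molar mass of about 0.028964 kg/mol); the function names are mine.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)


def speed_of_sound(temp_k, gamma=1.4, molar_mass=0.028964):
    """Ideal-gas speed of sound in m/s; defaults approximate dry air."""
    return math.sqrt(gamma * R * temp_k / molar_mass)


def temperature_from_speed(c, gamma=1.4, molar_mass=0.028964):
    """Invert the same relation: temperature in kelvin from a measured speed."""
    return c * c * molar_mass / (gamma * R)


print(round(speed_of_sound(293.15), 1))  # about 343 m/s at 20 deg C
```

Pressure never appears in the formula, which is the point made above. Humidity enters only by shifting gamma and the average molar mass.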
Right, it gets even worse: Air pressure is not only altitude-dependent but fluctuates even at constant altitude. The pressure (altitude) dependence is comparatively weak, though.
By definition, sure. But one always needs some effect which changes some electrical property. We can't just hook up an ADC (analog digital converter) to thin air and hope for the best.
In practice most microphones measure the displacement of microscopic membranes, which are deformed by the air pressure.
The next question then becomes how to measure microscopic movements of a tiny membrane.
Turns out the membrane forms part of a capacitor and the electrical characteristics of capacitors depend on their geometry.
There are at least 4 different types of microphones: condenser, which does in fact form part of a capacitor; dynamic, which is effectively a linear generator (coil attached to membrane); ribbon, which is a change in resistance as a small ribbon flexes; and piezoelectric, which is some black magic with crystals.
For me I see a lot more dynamic than condensers but I guess if you are talking about what is in like every single IOT thingamabob then you might be right there.
Fascinating. Is there a book about the history of microphones?
I find this to all be in the realm of "I don't believe you that any of this works at all" if I didn't have a lifetime of experience with the fruits of successfully-functioning microphones.
I once did a project to do multilateration of bats (the flying mammal) using an array of 4 microphones arranged in a big Y shape on the ground. Using the time difference of arrival at the four microphones, we could find the positions of each bat that flew over the array, as well as identify the species. It was used for an environmental study to determine the impact of installing wind turbines. Fun times.
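To make the time-difference-of-arrival idea concrete, here is a deliberately naive sketch: it predicts the arrival-time differences for every point on a coarse grid above the array and keeps the best match. The microphone layout, the bat position, and the 343 m/s speed of sound are all made-up assumptions, and a real system would use a proper least-squares solver on noisy measurements rather than a grid search.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def tdoas(source, mics):
    """Arrival-time differences of each mic relative to mic 0, in seconds."""
    times = [math.dist(source, m) / SPEED_OF_SOUND for m in mics]
    return [t - times[0] for t in times[1:]]


def locate(measured, mics, span=20.0, step=1.0, max_height=20.0):
    """Brute-force search for the grid point whose predicted TDOAs fit best."""
    n_xy = int(round(2 * span / step)) + 1
    n_z = int(round(max_height / step))
    xs = [-span + i * step for i in range(n_xy)]
    zs = [step * (k + 1) for k in range(n_z)]  # only search above the ground

    def cost(point):
        return sum((a - b) ** 2 for a, b in zip(tdoas(point, mics), measured))

    return min(itertools.product(xs, xs, zs), key=cost)


# Hypothetical Y-shaped ground array: one centre mic plus three arms.
mics = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (-5.0, 8.66, 0.0), (-5.0, -8.66, 0.0)]
bat = (3.0, -4.0, 6.0)
estimate = locate(tdoas(bat, mics), mics)
```

Because all four microphones lie in one plane, the mirror image below the ground fits equally well, which is why the search is restricted to positive heights.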
Reminds me of Intellectual Ventures' Optical Fence developed to track and kill mosquitoes with short laser pulses.
As a side-effect of the precision needed to spatially locate the mosquitoes, they could detect different wing beat frequencies that allowed target discrimination by sex and species.
This laser mosquito killer is, and always has been, a PR whitewashing campaign for Intellectual Ventures' reputation.
This device has never been built, never been purchasable, and it is ALWAYS brought up whenever IV wants to talk about how cool they are.
And I say this as someone who loosely knows and was friends with a few people that worked there. They brought up this same invention when they were talking about their work. They eventually soured on the company, once they saw the actual sausage being made.
IV is a patent troll, shaking down people doing the real work of developing products.
They trot out this invention, and a handful of others, to appear like they are a public benefit. Never mind that most of these inventions don't really exist, have never been manufactured.
They hide the extent of their holdings, they hide the byzantine network of shell companies they use to mask their holdings, and they spend a significant amount of their money lobbying (bribing).
Why do they need to hide all of this?
Look at their front page, prominently featuring the "Autoscope", for fighting malaria. Fighting malaria sounds great, they're the good guys, right?
Now do a bit of web searching to try to find out what the Autoscope is and where it's being used. It's vaporware press release articles going back 8 years.
Look at their "spinouts" page, and try to find any real substance at all on these companies. It is all gossamer, marketing speak with nothing behind it when you actually go looking for it.
Meanwhile, they hold a portfolio of more than 40,000 patents, and they siphon off billions from the real economy.
Part of their "licensing agreement" is that you can't talk badly about them after they shake you down, or else the price goes up.
I did a similar project at 18. Needless to say I didn't have enough HW and SW skills to do much since I implemented the most naive form of the TDOA algorithms as well as the most inefficient way of estimating the time difference through cross correlation. I still learnt a lot and it led me to eventually getting a PhD in SAR systems, which are actually beamformers using the movement of the platform instead of an array
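For anyone curious what the naive version looks like, this is a minimal sketch of brute-force cross-correlation for estimating the delay between two recordings. It is O(N * lags), exactly the inefficient approach described above; an FFT-based correlation would be the usual improvement. The function name and sign convention are my own.

```python
def estimate_delay(ref, sig, max_lag):
    """Lag in samples at which sig best matches ref.

    A positive result means sig is a delayed copy of ref.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]  # correlate ref[i] with sig shifted by lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sample rate gives the time difference that feeds a TDOA solver.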
What were the results of your study? I’ve heard that bat lungs are so sensitive that when they fly across the pressure differential of large turbines their capillaries basically explode
I would love to do something like that to track the bats in my garden, how feasible would it be for an amateur to do as a personal project?
Any good references on where to start?
Honestly, that sounds like amazing work. I wish I could afford to get out of enterprise software engineering and just do academic software development like that.
Then you can take a Jetson (or any I2S-capable hardware with a DSP or GPU on it) and chain 16 microphones per I2S port. It would seem a lot easier to assemble and program compared to an FPGA setup.
(OP here) tverbeure hit most of the main points, but mostly cost ($2/mic vs $0.5/mic adds up when there are 192 microphones), difficulty of finding things with enough i2s interfaces (even with 16 way daisy chaining, that's still more than most/all things will have). The FPGA/custom hardware was part of the fun as well!
Yeah, I've also had difficulty finding something with enough I2S. It was a while back and I've used Sprocket carrier for Jetson TX2 - it had 6 lanes, so up to 96. It was for a SODAR application, so the sampling frequency was not that critical and to me it felt like the perfect trick to make an array with off-the-shelf hardware. So I was just curious, if this was something you've considered.
For something indoors, yes, I can see how low sampling frequency gets very limiting. And 192 microphones, that's really pushing it. Love it.
The $2/mic vs $0.5/mic argument is a fun one. You've obviously poured enormous amount of engineering in there, involving PCB design, FPGA and network programming, writing custom CUDA kernels, signal processing, PyTorch, the list goes on. And you've had 4090 plugged in your PC in 2023. Classic hobbit in a mithril vest ;)
I've considered making a phased array myself, but never got around to sending out the PCB. But here are two reasons why I2S is not the best option:
* I2S requires 3 instead of the 2 pins of PDM. However, in the datasheet that you provided, it shows how you can daisy-chain microphones which is really cool (even if not standard I2S.) So that argument goes away.
* PDM gives you access to way higher sample rates which in turn gives you more flexibility in choosing the delay for a delay-and-sum operation. For example, if the PDM clock is 2MHz, you could theoretically delay with a precision of 0.5us. In practice, you'll do that with lower precision, but with I2S, the clock will typically max out at 192kHz.
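The delay-precision numbers above are easy to reproduce. This small sketch converts a sample clock into the smallest whole-sample delay step and the acoustic path length that step corresponds to, assuming sound travels at 343 m/s; the function names are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def delay_step_us(clock_hz):
    """Smallest whole-sample delay, in microseconds, at a given sample clock."""
    return 1e6 / clock_hz


def path_step_mm(clock_hz):
    """Distance sound travels during one sample period, in millimetres."""
    return SPEED_OF_SOUND / clock_hz * 1000.0


for clock in (2_000_000, 192_000):
    print(clock, delay_step_us(clock), round(path_step_mm(clock), 3))
```

At 2 MHz one sample is 0.5 us, or about 0.17 mm of path; at 192 kHz it is about 5.2 us, or roughly 1.8 mm. Finer steps at lower rates are still possible with fractional-delay interpolation, at extra compute cost.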
Not OP, but I looked in to this a few years ago. It was more expensive then, and only went to 20 kHz. Higher frequencies are helpful if you're listening for the hiss of leaking gas, or corona discharge of an electric arc.
The Orin has 6xI2S ports internally, so that would work up to 16*6 = 96 microphones, which is a good number. But it looks like maybe only 3 are brought out & on different dev board connectors [1]? As with a lot of design, the devil is in the details. An FPGA could be easier to configure if you need more than 96 microphones.
Look up acoustic cameras on YouTube, there are some pretty impressive demonstrations of their capability. This is one of the companies I've been watching for a while, but it looks like FLIR and some other big names are getting into it: https://www.youtube.com/@gfaitechgmbh
The one use case that is both creepy and interesting to me is recording a public space and then after the fact 'zooming in' to conversations between individuals.
I am very interested in how small these arrays can be. From talking with a friend with cochlear implants, I would assume this could help dramatically with the right signal processing to help him hear.
I would love to see this come to our various mobile devices in a nicely packaged form. I think part of what is holding back assistants, universal-translators, etc, is poor audio. Both reducing noise and being able to detect direction has a huge potential to help (I want to live-translate a group conversation around a dining table, for example).
Firstly it would be great if my phone + headphones could combine the microphones to this end. But what if all phones in the immediate vicinity could cooperate to provide high quality directional audio? (Assuming privacy issues could be addressed).
For the hard of hearing like me the killer application would be live transcription in a noisy setting like a meetup or party, with source separation and grouping of speech from different speakers. Could be life-changing.
(Android's Live Transcribe is very good now but doesn't even try to separate which words are from different speakers.)
* Automatic speech recognition (ASR) systems have progressed to the point where humans can interact with computing devices using speech. However, the distance between a device and the speaker will cause a loss in speech quality and therefore impact the effectiveness of ASR performance. As such, there is a greater need to have reliable voice capture for far-field speech recognition. The launch of Amazon Echo devices prompted the use of far-field ASR in the consumer electronics space, as it allows its users to interact with the device from several meters away by using microphone array processing techniques.*
This is known as the Cocktail Party Problem. It turns out our brains do an incredible amount of processing to allow us to understand a person talking to us in a noisy room.
In general, the positions of the microphones in space must be known precisely for the phase-shifting math to be done well, and the clocks on the phones would also need to be in sync at high precision, like 10x the highest-frequency sound you're picking up. In other words, within tens of thousandths of a second. Also, if the array mic locations are not in a simple straight line, circle, or other simple geometry, the computer code (i.e. math) to milk out an improved signal becomes very difficult.
10ms? That's a very long time. Phone clocks are much more accurate than that because they're synced to the atomic clocks in cell towers and GPS satellites.
Hell even NTP can do 1ms over the internet. AFAIK the only modern devices with >10ms inaccurate clocks by default are Windows desktops. I complained about that before because it screwed up my one-way latency measurements: https://github.com/microsoft/WSL/issues/6310
What I meant by that millisecond order of magnitude was that the clocks on the phones would need to be highly synchronized, with each other, to high precision, which would require pre-planning and special efforts.
In 10ms sound can travel about 3 meters, which is on the order of magnitude of a room, and represents the range of time offsets we're talking about. This has nothing to do with the actual frequencies of the sound itself, or the rate of PCM-type sampling you need to record quality sound. That's a different issue, and doesn't have to do with synchronization of different devices.
Regarding the math: A circular array is better than a grid (or random placement) because there's only one single math formula that's used to compare any mic to any other mic. With a grid array the number of unique formulas involved goes up as the square of the size of the array. And the mics at the 'center' of a grid are basically worthless, and offer no added value.
Armchair comment. I would LOVE to be a grad student again and try to pair it with ultrasound speaker arrays, for medical applications. Essentially a super HIFU (High-Intensity Focused Ultrasound) with live feedback. https://en.wikipedia.org/wiki/Focused_ultrasound
I'm doing my PhD in in-air ultrasound with phased arrays, and from talking to the medical guys at conferences and labs, it's soooo much harder in solids/liquids. The frequency is significantly higher, think 1-10 MHz instead of around 40 kHz, so any normal electronics are out the window.
Hey saw your message a while back in a thread talking about continuous glucose meters and feeling tired and fatigued etc.
Mind contacting me? I'd love to chat. My email is in my profile
Boeing ginned up a spherical version of these and used it on 787 prototypes to identify candidates for sound deadening material.
Apparently in loud situations like airplanes, audio illusions can make a sound appear to come from a different spot than it really is. And when you have a weight budget for sound dampening material it matters if you hit the 80/20 sweet spot or not.
If somebody wants to play around with Zynq 7010's - have a look at the EBAZ4205 board. They can be bought from Aliexpress (20-30€). These are former Bitcoin Mining controllers.
Some people reverse engineered the entire thing. It can be found in GitHub. And there's an adapter plate available for getting to the GPIOs.
For a less complex entry there are also Chinese FPGAs ("Sipeed" boards, which use a GoWin FPGA). They are quite capable and the IDE is free.
OP here, cool to see so many people are interested in this project! Happy to answer any questions (and I'll go around to reply to any questions already here)
I'm a bit surprised by those long "arm" PCBs. They are already doing calibration to account for some relatively large offsets: why not place each sensor on its own PCB, mount them to some carrier structure, and let calibration deal with the rest?
Huh, you're right. I expected 24-inch-long PCBs to be quite a bit more expensive, but even 4-layer boards at those sizes are still available at discount prices. I guess such thin boards could be used to fill in edges of mixed-order panels? It does make me wonder why they say "the array" was $700. Maybe assembly was extremely expensive
It doesn't seem they were really able to benefit from it all that much, though: half of them arrived defective, and they had to do quite a lot of debugging to fix them.
(OP here) the $700 was for 50 arm boards and 5 hub boards, fully assembled and shipped including all the parts (enough for 2 full arrays, with some spares). $350 @ qty 2 is pretty good, considering just the microphones is ~$100 for each array!
Unfortunately the assembly/DFM didn't work out well, but with some better design and foresight it should be much less work/wiring compared to wiring them manually.
I was just doing research and landed on this exact page last night! I was wondering if anyone knows how someone could mic a room and record audio from only a specific area. For my use case I want to record a couch so I can watch TV with my friends online and remove their speech + show noise from the audio. Setting up some array of mics and using them for beam steering would probably work but there's not a lot of examples I could find on GitHub with code that works in real time.
From the article: "The simplest method of beamforming is delay-and-sum (DAS)". Measure the distance from a point (couch) to each microphone, delay the signal in the time domain by the time the sound takes to travel from that point to the microphone, and add up the signals. Pretty trivial. Basically you want the microphones to receive the couch signal at the same time, even though they are different distances away.
Make sure there is enough variation in microphone distances for this method to be effective.
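A bare-bones sketch of the delay-and-sum recipe described above, assuming known microphone coordinates, a 343 m/s speed of sound, and delays rounded to whole samples. It works offline on lists of samples; a real-time version would need streaming buffers and probably fractional delays. All names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def delay_and_sum(channels, mic_positions, focus, sample_rate):
    """Time-align every channel to the focus point, then average them."""
    dists = [math.dist(focus, m) for m in mic_positions]
    nearest = min(dists)
    # Extra travel time to each mic relative to the nearest one, in samples.
    lags = [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in dists]
    # Reading ch[n + lag] advances the later-arriving channels, which is
    # equivalent to delaying the earlier ones so everything lines up.
    length = min(len(ch) - lag for ch, lag in zip(channels, lags))
    return [
        sum(ch[n + lag] for ch, lag in zip(channels, lags)) / len(channels)
        for n in range(length)
    ]
```

Sound from the focus point adds up coherently, while sound from elsewhere arrives misaligned and partly averages out. How well it averages out depends on the number and spread of the microphones.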
Starting to see more & more of this with drones. In some cases, it's for military to detect drones nearby. In others, it's being used by drone delivery companies to detect other planes in the sky in a way that is cheaper, works in low-visibility, and doesn't use the same power requirements as radar.
A similar technique is very popular in industrial automation to spot leaks in compressed air pipes and their connections from far away. These leaks are extremely loud in the ultrasonic range. It's overlaid with a camera picture.
I've always wanted this for a videoconferencing room. A microphone array around the screen should be able to dynamically focus on the active talkers and cancel out background noise and echoes to get much better sound quality than the muddy crap we usually get.
If there were a speaker array around the screens too, you might be able to localize the audio for each person so that it seems like the sound is coming from where their head is on the screen.
Microsoft Research had papers on speaker arrays that allowed speaker focus and noise cancelling a couple of decades ago. I think the technology eventually ended up in the Kinect.
I think Cisco had something similar in their large screen meeting room video conferencing systems that could do positional audio tracking of multiple people. Could be wrong, but I think that was at least 10 years or so ago, if not more.
I wish I could rent one to figure out which device in my office has a squealing capacitor. I can hear it well enough to be driven crazy by it, but not well enough to find it. I start disconnecting things to narrow it down but then convince myself that it's my ears ringing.
I'm unsure if I'll age out of this problem, or if worse hearing will just recreate it at different thresholds.
You might have some luck with a spectrum analyzer app[1]. A fixed-pitch whine should show up as a line on the waterfall graph. If you move the phone around to different locations, you might see the line getting stronger or weaker. You can also try rotating the phone to different orientations to see if it is coming from a particular direction.
I used this to locate an annoying squeal coming from some equipment at work once. And to confirm that it wasn't imaginary.
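If you would rather script the "is the line getting stronger" check than eyeball a waterfall, one option is the Goertzel algorithm, which measures the power at a single frequency. This is a sketch under the assumption that you already know the whine's approximate pitch and have short recordings from each spot as lists of samples; the function name is mine.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Power at the DFT bin nearest to freq, via the Goertzel recurrence."""
    n = len(samples)
    k = round(freq * n / sample_rate)          # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
```

Comparing this value across recordings taken at different positions gives a crude "warmer/colder" reading for the source. Doubling the tone's amplitude should quadruple the reported power.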
At a rough guess from the audio samples, that array is producing an acceptance angle much narrower than any Soundfield mic is capable of. The noise source is only 45 degrees off-axis; I'd say any first-order microphone polar pattern (i.e. those a Soundfield mic is capable of) would capture more of the noise than is demonstrated here.
Of course, you can improve on the rejection of off-axis sound by instead using a microphone with a more specialized polar pattern (e.g. a shotgun mic), but then you lose the property of the pattern being steerable merely by signal processing.
Lastly, such an array of dirt cheap pressure sensitive mic capsules with some clever computation behind them strikes me as the sort of thing you could throw Moore's law at, if you could justify the quantity. Whereas, Soundfield mics don't make much sense unless you're working with very precisely machined pressure-gradient capsules.
Still, I get the feeling it'll be a while yet before this technique starts looking viable for audio production work, but it's very interesting.
This is more or less the same principle of how Amazon Echo devices work, but on steroids.
Very neat. I would be surprised if you aren’t seeing some diminished marginal returns from all those extra mics, but I guess you’re trying to capture azimuth deltas that Echo devices don’t really care about.
I wonder how well this would work with laser microphones on a pane of glass. Can you infer keystrokes with near infrared laser? That is, can you identify the heatmap of keystroke events to infer which keyboard they're using, then replay the tape to identify the strings of characters being typed? Can you localize the turning of pages with UV?
This beamforming effect only works well when each sensor is getting a dramatic enough "different angle" on the signal that each one can use phase shifting to cancel out other noise, but with a laser there's not really any noise to cancel out (i mean you're just monitoring a vibrational spot on a window), and you also don't have a far enough "different angle" to shine from, if you're monitoring from one spot.
However having multiple lasers from multiple different locations might be able to create an improved signal if all signals are averaged, but it wouldn't really be due to the phase shifting that's used in beamforming.
Didn't Israeli students show that you can recover audio from the vibrations of bulb filament with a fast photo diode?
I'd test that with a CCD line sensor plus a wide aperture lens and reading it out with 8kHz. Then you have 128 audio pixels that can cover an entire city.
Line of sight might be an issue there. I'm thinking more high-end clandestine eavesdropping. Fun fact: curtains are a pretty good defeat for laser microphones, but if the building is really old and made of solid stone, you can point at the rock instead!
The rock?! That’s incredible. I would have guessed it was too dense to pick up normal speaking volume. Then again, even the window glass vibration seems pretty magical to me.
Because the distance between the mics needs to be 1) large and 2) consistent. It would work with a grid but the mics near the middle would be "underutilized" (not maximally taken advantage of), and also in a grid the mathematics is horrendous, but with a circle it's simple.
(OP here) Primary reason is that you can make a big array with only 2 boards, a small board in the middle and a bunch of long boards around it.
Radial pattern of linear arrays with exponential spacing should also be pretty close to optimal for the distribution of pairwise microphone distances to maximize the gain with a fixed number of microphones.
Could this be combined with a smaller number of high quality mics and then machine learning or something else incorporating them to boost the overall quality while maintaining all the other features?
afaik, it really depends on the spatial structure of the audio field.
think nyquist sampling rates, applied to space, and you can't apply a low-pass filter just because you don't care about higher-order signals. that means that for any given audio environment, there will be some "spatial spectrum" of signal, and you need to sample it densely enough to avoid aliasing.
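The usual rule of thumb behind this, for a uniformly spaced array, is that neighbouring microphones should be no more than half a wavelength apart at the highest frequency of interest. A quick sketch, assuming 343 m/s for the speed of sound; irregular layouts such as radial arms with varying spacing trade this hard limit for raised sidelobes instead, so treat these numbers as a guide only.

```python
def max_spacing_m(f_max_hz, c=343.0):
    """Largest uniform mic spacing that avoids spatial aliasing up to f_max."""
    return c / (2.0 * f_max_hz)


def max_unaliased_freq_hz(spacing_m, c=343.0):
    """Highest frequency a uniform array with this spacing samples unambiguously."""
    return c / (2.0 * spacing_m)


print(max_spacing_m(20_000))        # under 1 cm for the full audio band
print(max_unaliased_freq_hz(0.05))  # a 5 cm pitch is only clean up to ~3.4 kHz
```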