Finding Alignment by Visualizing Music
This article details why music visualization is the perfect problem to deliver us from a primitive time. It also records what was learned or confirmed while doing the technical planning and assembly of a basic tech demo. We also propose how we will make direct and open contributions to the core innovation we aim to enable.
Context
Positron has broken ground on µTate (Mu Tate), an open source music visualizer, so that we can bootstrap the supply side of our prototype two-dimensional fund-raising solution, PrizeForge. In our last update we introduced the strategic pivot that led us here.
Dividends Come From Small Things
Many key graphics techniques (and programmers) were born from the demo scene, pushing Atari and similarly meager hardware to do a whole lot it wasn't designed for. If you don't believe in dinosaurs, look it up.
Big is Slow
The conundrum right now for the machine learning industry (as it has become) is the necessity of scale. Training requires a modern ENIAC. The training data is massive. The feedback loops are slow. The runtime requirements are huge. Few people can even play the game, and yet the capital must flow because enough undeniable revenue has already appeared. This is a soft underbelly of the innovator's dilemma, and OpenAI and others are already feeling it, fastened to the chatbot that is their fate and prize.
We All Lose
If dividends come from small things, we can reason that a substantial amount of lost progress is accruing for as long as Large Language Models command our attention. Nevertheless, the law of large numbers will remain a chorus joined by Jensen Huang and others for obviously motivated reasons. They at least benefit from the exclusive game while it lasts. So do retail investors. So do the engineers who have one of the lucky seats upstream of all the gift-wrappers attempting to take chatbots to market.
The Big Gambit on Small
AI will become tiny in both megabytes and compute. An open development approach can deliver smaller AI faster and in forms that are better suited for tight integrations with real problems like materials science and custom protein design for space-age medical therapies. The path to success will start with radical innovation on the low-end rather than streamlining the behemoths.
How Low Small Can Go
The lower cost asymptote is symbolic reasoning, a.k.a. computation. That's a while off. Nearer to the conversation is re-use of weight layers, for which there are already papers if not implementations out there. That is but one trick. If the gaming industry were as immature as AI, every frame would render every map of a 100GB game resident in GPU memory, without so much as frustum or depth culling for the billions of wasted triangles. There is so much headroom.
The hardware requirements get smaller as natural approaches formal in the pursuit of utility and correctness.
That lowest cost asymptote exists where the model-domain relationships are reversible at long ranges, a behavior exclusively enabled by approaching formal consistency and the perfect truth preservation it enables. The transformations we may apply to terse sentences within a consistent model can be described with simple routines rather than billions of weights that are barely relevant. The meta languages used to encode such rules, making the leap from natural to formal, do not benefit from embedded knowledge of human versus Neanderthal synthesis rates for DHA. So much of what Big LLM is required to encode will become mere behaviors of future models.
Right Motivating Problem to Drive Success
The right problem will pull eager money into ambitious architectures by making them more successful than competing solutions. Most importantly, it will enable them to succeed when they are still in their primitive forms.
Chatbots: A Terrible Problem for Innovation
Chatbots are absolutely not that problem. Because of emergent behaviors of scale and reliance on massive training data, less-good execution of more sophisticated architecture is where great innovation goes to die. To be minimally viable at chatbots, your product needs to accurately recite the history of the Shang, Zhou, and Qin dynasties. Anything less is barely playable, sucks, 1 of 5 stars. You can do something brilliant and be completely irrelevant.
Hallucination in chatbots is undesirable. In music visualizations hallucinations are a critical customer requirement.
In delightful contrast, music visualization suffers no such high barrier. If your music visualizer's latent space dreams up Barack Obama fast-roping out of a UFO to save us from velociraptors, then both you and the user are winning. Music visualization makes trippy and unexpected results of a sophisticated yet woefully under-tuned model into a valuable asset. A smart model with an inconsistent grip on reality is an advantage here. Hallucination is good. Models that only know how to render cats are cool and unique.
Virtuous Feedback Loops
Tolerating roughness opens up massive virtuous feedback loops. Because of the demand for real-time and continuously updating results, small models with fast execution are a requirement. Being open source pushes us to train and fine-tune on user-obtainable amounts of data. Training small models on small data is naturally faster. The feedback loops are faster. The development is fun.
A Welcoming Market
Adding to this already prompt-critical mass is the potent interest of local LLM enthusiasts in communities such as r/LocalLLaMA. They gather to spend thousands of dollars on beefy hardware, seeking to tinker, seeking privacy, or seeking firm mastery and a seat in the wave. With powerful consumer interest like that, small advanced AI is just waiting for a visible place for the money to go.
Our Contributions to The Sauce
For context, we are actively working on what we call "crowd cognition," more advanced kinds of crowd sourcing intended to "make money smarter" for PrizeForge. That work has entailed filling many trashcans with doomed designs and drawing a whole lot of inspiration from a machine learning and probabilistic modeling obsession going back over a decade. This has paid some dividends, or at least restricted stock options.
Wiring The Socket
Our job to get µTate off the ground is to connect an audio metric space, through a latent space, over to several very different graphics input metric spaces, and perhaps to integrate the video feedback, yet another metric space, almost like synthetic data, into self-supervised online learning. That's fancy plumbing. It might enable pursuit of auto-formalization or something near it. The point of an open process is to make the socket for other people's models to plug into the manifolds, so we first just need to build the worst things to put into the socket and then allow open development to proceed.
Descent Without Gradient
To frame our intended crimes, let's talk about shader languages. We settled on Slang early since it seems to be capturing interest as a way to unify graphics and non-gaming ML programming. We also found that Slang supports automatic generation of function derivatives. This clues us in to how prominent ML has already become in gaming, but also to how devoted the whole industry is to back-propagation training.
Since the worst version of the most sophisticated things will win and because we don't yet have access to ZIRP-style capital from the Bay Area bandwagon (fairly because we have not done enough yet), we have to reduce the training cost to the budget of a middle-school toothpick bridge. Back-propagation, besides requiring all sorts of contortions from what it trains, is too expensive.
Beyond Back-Propagation
Automatic derivatives are neat, but while the negative Nancies of Big Gradient™ will have you believe that H300s and back-propagation are the only way to fly, know that there is another kind of training. Like its cousin, the honest Monte Carlo, a greedy particle method doesn't need to know the gradient. In our hands that were not born yesterday, it also has the potential to be much, much cheaper, sailing across potential wells, refining, exploring, and cross-validating almost all at once. They are trivially parallel as populations. They are easier to code than the derivatives they omit.
In the space we have outlined, this criminal set of techniques comes with a great release from a heavy yoke, freedom from differentiable functions or even caring how to commute disparate derivatives at all. The Jacobian lies of the vanishing and exploding gradients are of no concern. This unleashes the forward pass to do insane things with utter disregard for mathematical morality.
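To make the idea concrete, here is a minimal sketch of the kind of gradient-free, population-based search we mean: a swarm of particles, each greedily keeping a random perturbation only when it improves the objective. Everything here (the objective, the schedule, the constants) is illustrative and not µTate code.

```rust
// Tiny xorshift generator so the sketch needs no dependencies.
struct Rng(u64);
impl Rng {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

// A deliberately non-differentiable objective: abs() kinks and a floor().
fn objective(x: &[f64; 2]) -> f64 {
    (x[0] - 1.5).abs() + (x[1] + 0.5).abs().floor() + x[1] * x[1]
}

/// Run the greedy particle search; returns the best objective found.
fn optimize() -> f64 {
    let mut rng = Rng(0x9E37_79B9_7F4A_7C15);
    // A population of particles scattered around the origin.
    let mut particles: Vec<[f64; 2]> = (0..32)
        .map(|_| [rng.next_f64() * 4.0 - 2.0, rng.next_f64() * 4.0 - 2.0])
        .collect();
    let mut scores: Vec<f64> = particles.iter().map(objective).collect();

    for step in 0..2000 {
        // Shrinking perturbation scale: explore early, refine late.
        let scale = (1.0 - step as f64 / 2000.0) + 0.01;
        for (p, s) in particles.iter_mut().zip(scores.iter_mut()) {
            let cand = [
                p[0] + (rng.next_f64() - 0.5) * scale,
                p[1] + (rng.next_f64() - 0.5) * scale,
            ];
            let c = objective(&cand);
            if c < *s {
                // Greedy accept: no gradient, no Jacobian, no chain rule.
                *p = cand;
                *s = c;
            }
        }
    }
    scores.iter().cloned().fold(f64::INFINITY, f64::min)
}

fn main() {
    println!("best objective found: {:.4}", optimize());
}
```

Note that the forward pass never needs to be differentiable: swap in any objective, however spiky, and the loop is unchanged.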
Engineers don't know anything, but physicists can't do anything.
Particle methods have not seen wide embrace. Among their drawbacks, they are dark art. They cannot be scientific. There are good ones and better ones but no correct ones. Let us not overstate this play until its success is self-evident, but we intend to leverage the full cruelty of the true saying that "Engineers don't know anything, but physicists can't do anything." Reality-based rigor can come later, motivated to bring us to justice to which we claim no immunity.
Default Minimally Viable
There's no question that we can build a better Milkdrop. That is a simple matter of updating the tech. Using off-the-shelf machine learning for mapping music to visual inputs wasn't an option in 2001. It is par for the course in 2025.
To glimpse ProjectM's capabilities and limitations, we can look at the preset definitions. A small sample of (not so) beautiful preset scripting:
shape_1_per_frame1=a=bass;
shape_1_per_frame2=rad=(bass+mid+treb)/3;
shape_1_per_frame3=x=if(below(bass,1.2),rand(10)*0.1*treb,0.5);
shape_1_per_frame4=y=if(below(bass,1.2),rand(10)*0.1*mid,0.5);
shape_1_per_frame5=
shape_1_per_frame6=r=1-(0.7+0.5*abs(sin(0.05*time+0.5*bass)));
shape_1_per_frame7=g=1-(0.5+0.5*abs(cos(0.05*time-0.4*mid)));
shape_1_per_frame8=b=1-(0.5+0.5*abs(sin(0.05*time+0.6*treb)));
In the full preset source, you will find variable bindings, limited expression support, and embedded shader code. Some bindings, such as bass and treb, are provided "for free," but their definitions are holdovers from being a plugin reliant on Winamp's DSP.
The "language" reads about as clearly as assembly. There is a lot that the GPU can do that cannot be expressed with this vocabulary and grammar. It's OpenGL. Its entire design context predated much of the programmability of GPUs. It's C++ in a time when up-and-comers are likely preferring Rust. Even machine-learning heavy Python people are feeling Rusty. We are steering with the wind.
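For contrast, here is what the same per-frame logic might look like written out in plain Rust. The bindings bass, mid, treb, and time mirror what the preset format provides; the struct and the stand-in random generator are our own hypothetical sketch, not ProjectM code.

```rust
// A hypothetical Rust rendering of the preset's per-frame expressions.
struct Shape {
    rad: f32,
    x: f32,
    y: f32,
    r: f32,
    g: f32,
    b: f32,
}

// A toy stand-in for the preset's rand(10)*0.1, yielding values in [0, 1).
fn rand01(seed: &mut u32) -> f32 {
    *seed = seed.wrapping_mul(1664525).wrapping_add(1013904223);
    (*seed >> 8) as f32 / (1u32 << 24) as f32
}

fn per_frame(bass: f32, mid: f32, treb: f32, time: f32, seed: &mut u32) -> Shape {
    Shape {
        rad: (bass + mid + treb) / 3.0,
        // Jitter position while the bass is quiet; recenter on a hit.
        x: if bass < 1.2 { rand01(seed) * treb } else { 0.5 },
        y: if bass < 1.2 { rand01(seed) * mid } else { 0.5 },
        r: 1.0 - (0.7 + 0.5 * (0.05 * time + 0.5 * bass).sin().abs()),
        g: 1.0 - (0.5 + 0.5 * (0.05 * time - 0.4 * mid).cos().abs()),
        b: 1.0 - (0.5 + 0.5 * (0.05 * time + 0.6 * treb).sin().abs()),
    }
}

fn main() {
    let mut seed = 42;
    let s = per_frame(1.5, 0.8, 0.6, 10.0, &mut seed);
    println!("rad = {:.3}, pos = ({:.2}, {:.2})", s.rad, s.x, s.y);
}
```

Even this literal translation is easier to read, type-check, and refactor than numbered assignment strings.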
Beat (Non)-Detection
ProjectM has one job, and yet it is woefully bad at it. It used to be fun to argue over how sophisticated the beat detection implementation was without looking at the code. In 2025, no amount of wishful thinking can hide the conclusion that ProjectM has only an extremely rudimentary ability to feel the rhythm. The audio-visual coordination is choppy, unreliable, and has little understanding of instruments or musical patterns.
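To calibrate what "extremely rudimentary" means, here is the classic energy-threshold baseline that Milkdrop-era beat detection roughly corresponds to: flag a beat whenever the current frame's energy exceeds the recent average by some factor. The names and constants are illustrative, not taken from ProjectM's source.

```rust
// Energy-threshold beat detection: no tempo model, no instrument
// awareness, just a comparison against a short history of loudness.
struct BeatDetector {
    history: Vec<f32>, // recent per-frame energies (ring buffer)
    pos: usize,
    sensitivity: f32, // e.g. 1.5 = must be 50% above the local average
}

impl BeatDetector {
    fn new(window: usize, sensitivity: f32) -> Self {
        Self { history: vec![0.0; window], pos: 0, sensitivity }
    }

    /// Feed one frame of samples; returns true if it looks like a beat.
    fn on_frame(&mut self, samples: &[f32]) -> bool {
        let energy: f32 =
            samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32;
        let avg: f32 =
            self.history.iter().sum::<f32>() / self.history.len() as f32;
        self.history[self.pos] = energy;
        self.pos = (self.pos + 1) % self.history.len();
        avg > 0.0 && energy > avg * self.sensitivity
    }
}

fn main() {
    // ~43 frames of history is about one second at 1024 samples / 44.1kHz.
    let mut det = BeatDetector::new(43, 1.5);
    let quiet = vec![0.05f32; 512];
    let loud = vec![0.8f32; 512];
    for _ in 0..43 {
        det.on_frame(&quiet); // fill the history with quiet frames
    }
    println!("beat on loud frame: {}", det.on_frame(&loud));
}
```

This triggers reliably on a loudness jump and on nothing else, which is exactly why it falls apart on syncopation, swells, and anything with sustained dynamics.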
The Programmer Art Problem
Milkdrop is from the procedural era. It has access to more hardware than Atari demos, but it is not more sophisticated. It draws things in a buffer and re-samples the output buffer as input through a warped texture lookup mapping in order to achieve the trails and distortion fields. It can use raw waveforms as textures and users can embed static textures, although these are rarely used and don't really look that impressive or pleasing.
Worst of all, it is abstract. It is meaningless. It is as cool yet as uncool as zooming in on a Mandelbrot set, a figure with infinite intricacy yet no relation to artistic forms in the physical world. We are visualizing a medium that is often about deep subjective meaning yet giving every song the same tired party trick.
Good First Increments
The vision is complex. We need small steps so that our limited resources pull in more resources faster than we culminate. The fact is that current generative AIs are simply not ready to be real-time at many tasks unless they are so low quality as to produce actual gibberish (not the kind that the Internet's talking-head AI hate peddlers love to opine about in their ad-economy pandering contests).
Generative Particle Systems
Instead of generating big images, we will generate small ones. Instead of generating complex meshes, we will generate point clouds that approximate visual forms. In game graphics, this is what we call particle systems. Particles are little billboards (camera-facing primitives) that we sample small textures onto. This is feasible. This can be real-time. This can train cheaply, quickly, and locally.
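The shape we have in mind can be sketched in a few lines: a flat array of short-lived billboards updated per frame, with an audio feature modulating emission. The single "bass" scalar here is an illustrative stand-in for model output, and all names are our own, not µTate code.

```rust
// A minimal CPU-side particle system: integrate, expire, emit.
#[derive(Clone, Copy)]
struct Particle {
    pos: [f32; 3],
    vel: [f32; 3],
    life: f32, // seconds remaining; dead when <= 0
}

fn update(particles: &mut Vec<Particle>, dt: f32, bass: f32) {
    for p in particles.iter_mut() {
        for i in 0..3 {
            p.pos[i] += p.vel[i] * dt;
        }
        p.vel[1] -= 0.98 * dt; // mild gravity
        p.life -= dt;
    }
    particles.retain(|p| p.life > 0.0);
    // Louder bass -> faster, longer-lived particles.
    particles.push(Particle {
        pos: [0.0; 3],
        vel: [0.0, 1.0 + bass, 0.0],
        life: 1.0 + bass,
    });
}

fn main() {
    let mut ps: Vec<Particle> = Vec::new();
    for frame in 0..120 {
        // Pretend a bass hit lands every half second at 60 fps.
        let bass = if frame % 30 == 0 { 1.5 } else { 0.2 };
        update(&mut ps, 1.0 / 60.0, bass);
    }
    println!("live particles after 2s: {}", ps.len());
}
```

The real thing emits many particles per frame and samples generated textures onto the billboards, but the data layout stays this boring, which is what makes it real-time friendly.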
The non-photorealistic outputs we believe are ahead will begin as still abstract forms like Rez or Okami. Rorschach test today. Expressionist art from music tomorrow.
Findings From the First Triangle
Let's go over the promised fact-finding outputs on the technical side. First, we wanted to pick the Vulkan API version, extensions, and strategies that would land us in the sweet spot of productive but not too crazy. Vulkan has evolved a lot. Thanks to Kane Rogers-Wong for steering us onto buffer device address, dynamic rendering, and Slang, which were not at all mature or did not exist the last time we touched Vulkan.
A Lot of Synchronization
The synchronization work is deep and diverse. While PrizeForge's Leptos frontend and Axum backend mostly involve a lot of async / await coding, switching to work on µTate brings us back to good ol' traditional threaded asynchronous programming, lock-free structures, and the wonderful world of Vulkan synchronization, which can be CPU-GPU or GPU-GPU.
Talking to Pipewire
Talking to Pipewire requires giving a thread to the library and then talking to that thread via a channel (in the Rust bindings) implemented by writing a byte to a file descriptor. The ownership and thread structure are coupled, and that causes most of the headaches. It really feels like Rust needs a notion of a callback that can only run in the same thread it was created in, something more restrictive like a scoped thread.
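The shape of the problem, minus Pipewire itself, looks like this: an object that must be created, used, and dropped on a single thread, driven from outside over a channel. We model the thread-bound library object with a non-Send Rc so the compiler enforces the constraint; the command names are illustrative, and the real bindings wake their loop by writing a byte to a file descriptor rather than via mpsc.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

enum Cmd {
    Capture(String),
    Shutdown,
}

fn spawn_audio_thread() -> (mpsc::Sender<Cmd>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel::<Cmd>();
    let handle = thread::spawn(move || {
        // Rc is !Send: the compiler now guarantees this "loop object"
        // never escapes the thread that created it.
        let loop_obj = Rc::new(RefCell::new(Vec::new()));
        while let Ok(cmd) = rx.recv() {
            match cmd {
                Cmd::Capture(node) => {
                    loop_obj.borrow_mut().push(format!("capturing {node}"))
                }
                Cmd::Shutdown => break,
            }
        }
        // Tear down on the same thread, then hand back plain data.
        Rc::try_unwrap(loop_obj).unwrap().into_inner()
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_audio_thread();
    tx.send(Cmd::Capture("monitor-of-speakers".into())).unwrap();
    tx.send(Cmd::Shutdown).unwrap();
    println!("{:?}", handle.join().unwrap());
}
```

The coupling complained about above is visible even here: ownership of the loop object and the lifetime of the thread are the same thing, so every design decision about one constrains the other.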
CPAL as A Stopgap?
We could integrate the CPAL crate, which seems to have gotten better at providing access to monitors but still relies on PulseAudio on Linux. The Pipewire integration we've done may turn out to inform work for other platforms. Since we only do monitoring and not playback, CPAL might actually be a distraction. We'll see.
Audio-Visual Synchronization is Key
We want all of this to be fast, to balance back pressure, to deliver updates at the last possible moment for late binding, and to time video presentation very accurately against the audio. The audio-visual synchronization is why it's an important area of work. The good news is that it all mostly needs to be figured out once on the Vulkan side and once per platform on the audio side.
Ownership Structure
We didn't start with a clear idea of how to structure custom types and ownership, so we slopped out a dumb-rich prototype full of messy Options. Some dependencies appeared that are separated by annoying indirection. This reveals which information must be propagated through the execution stack.
Pumping the Lemma
This is again just a case of building quickly and haphazardly to cause obvious problems to jump out instead of trying to predict them. Just pile it up until it leans. The litmus test for "how good" early engineering should be is that it has to be better than a loose pile of rubble that cannot lean and therefore gives no information about structural deficiencies.
Swapchain Dependents
The first example with µTate is the swapchain. While we didn't even implement window re-sizing, we could already see that many images and buffers must match the swapchain's dimensions. This means we have to track all of the dependent resources downstream so that we can react to swapchain resizing. Tying that all together begins upstream: if a resource changes according to window size or number of windows, we have to know about it and propagate the swapchain changes into it.
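One plausible shape for that tracking (our guess at a pattern, with illustrative names rather than µTate's actual types) is a registry of rebuild hooks keyed to the swapchain extent: every swapchain-sized resource registers itself, and a resize walks them all.

```rust
// Track every resource that must match the swapchain extent.
type Extent = (u32, u32);

struct SwapchainDependents {
    rebuilds: Vec<Box<dyn FnMut(Extent)>>,
}

impl SwapchainDependents {
    fn new() -> Self {
        Self { rebuilds: Vec::new() }
    }

    /// Register a resource that must be recreated at the new extent.
    fn register(&mut self, rebuild: impl FnMut(Extent) + 'static) {
        self.rebuilds.push(Box::new(rebuild));
    }

    /// Called when acquire reports the swapchain is out of date
    /// (or the window resizes): rebuild everything downstream.
    fn on_resize(&mut self, extent: Extent) {
        for rebuild in &mut self.rebuilds {
            rebuild(extent);
        }
    }
}

fn main() {
    use std::cell::RefCell;
    use std::rc::Rc;

    // Stand-in for a depth buffer that must track the swapchain size.
    let depth_extent = Rc::new(RefCell::new((0u32, 0u32)));
    let mut deps = SwapchainDependents::new();
    let handle = Rc::clone(&depth_extent);
    deps.register(move |e| *handle.borrow_mut() = e);

    deps.on_resize((1920, 1080));
    println!("depth buffer now {:?}", depth_extent.borrow());
}
```

In real Vulkan code the hook would destroy and recreate images, views, and framebuffers, and ordering between dependents starts to matter, which is exactly the upstream bookkeeping described above.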
Memory Management
This will also affect our memory management strategy on the GPU, where out-of-memory won't be paged away for us. We may have to render smaller and upscale so the application still works even when there simply isn't enough memory to do everything we want. Different AI models will need different amounts of memory in addition to what we use for graphics.
Macross Plus Vibe Compression
Vulkan's reputation for requiring a thousand lines to get to a first triangle is rightly earned. If we write code like this forever, we will go insane. There is a lot of unwanted coupling across interfaces: push constants, shader inputs, and the structures written into those push constants. This is the kind of fickle drudgery we want to automate away.
More Lemma Pumping
When you don't yet know which code will result in lots of duplication, copy-pasting will quickly reveal what doesn't change. Boilerplate can first be handled with vibe coding, completing the boring parts of expressions. Macros are vibe compression. Using the great lessons of Lisp, we will write a limited language that covers only our use case.
Write Macros Early
If you are an engineer who has not written a lot of macros, you may think it's hard. In fact, the early phase of a project is the best time to start looking for macro opportunities. Macros that don't need to do much are the easiest to write. Mature macros that have lots of knobs for generating tricky code are the ones that are hard to re-write. They are usually written just like the rest of the program, starting with simple things that work.
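A toy instance of what we mean, entirely hypothetical: one declaration that yields both the #[repr(C)] push-constant struct and the byte size the pipeline layout needs, so the two can never drift apart.

```rust
// "Vibe compression": generate the struct and its interface size together.
macro_rules! push_constants {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[repr(C)]
        #[derive(Clone, Copy, Debug)]
        struct $name {
            $(pub $field: $ty,)*
        }

        impl $name {
            /// Size to hand to the pipeline layout's push-constant range.
            const SIZE: u32 = std::mem::size_of::<$name>() as u32;
        }
    };
}

push_constants!(FrameParams {
    time: f32,
    bass: f32,
    resolution: [f32; 2],
});

fn main() {
    let p = FrameParams {
        time: 0.0,
        bass: 0.9,
        resolution: [1920.0, 1080.0],
    };
    println!("{} bytes: {:?}", FrameParams::SIZE, p);
}
```

Notice how little the macro does: it starts as barely more than a struct definition, and knobs for layout checks or shader-side codegen can be grown onto it later, which is the whole point about writing macros early.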
Aside to Those Active in Editor Technology
Macros are a great example where, because we are developing a DSL that is only used in our current program, pre-trained LLM integrations cannot have the correct completions baked into their weights. Correctly writing code that uses macros is completely dependent on having the macros in context, at least until online learning is available locally. Making macros complete well with Rust Analyzer is itself extra work, so inference-based completions have a lot of potential to speed up development.
Unsafe As Far as The Eye Can See
We agree with Kane that there's little point in trying to make really low level interactions with GPUs safe. The limitations we may impose on the programming language without cost of friction depend on the application, and any safe wrappers will tend to get RAII points wrong for other applications. We're just going to roll with it and say, "be good at C in Rust" while using macros to handle boilerplate, code generation, and coordination of changes across interfaces.
Fast Fourier and Friends
On the surface, work on µTate is a distraction. In reality, it is a requirement, a rite of passage. Platforms require bootstrapping, and that means doing the work of the platform.
Like everything else we use, µTate is written in Rust. The FFT's re-use of many sub-problems to calculate many related values is both a way to extract audio features and a window into the core of crowd cognition. The AI we need to program and integrate into µTate will become part of the honed code inventory we employ in the solution of other incidental problems. It is all extremely synergistic and will lead the right people to us.
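The sub-problem re-use is easy to see in a minimal radix-2 Cooley-Tukey FFT: each length-N transform is assembled from two length-N/2 transforms that a naive DFT would recompute over and over. This sketch uses (re, im) tuples to stay dependency-free; production code would use an optimized crate.

```rust
/// Recursive radix-2 FFT over complex samples given as (re, im) pairs.
fn fft(input: &[(f64, f64)]) -> Vec<(f64, f64)> {
    let n = input.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    if n == 1 {
        return input.to_vec();
    }
    // Split into even- and odd-indexed halves: the shared sub-problems.
    let even: Vec<_> = input.iter().step_by(2).cloned().collect();
    let odd: Vec<_> = input.iter().skip(1).step_by(2).cloned().collect();
    let (e, o) = (fft(&even), fft(&odd));

    let mut out = vec![(0.0, 0.0); n];
    for k in 0..n / 2 {
        // Twiddle factor e^{-2*pi*i*k/n}.
        let ang = -2.0 * std::f64::consts::PI * k as f64 / n as f64;
        let (wr, wi) = (ang.cos(), ang.sin());
        let (re_o, im_o) = o[k];
        let t = (wr * re_o - wi * im_o, wr * im_o + wi * re_o);
        out[k] = (e[k].0 + t.0, e[k].1 + t.1);
        out[k + n / 2] = (e[k].0 - t.0, e[k].1 - t.1);
    }
    out
}

fn main() {
    // A pure 1-cycle cosine over 8 samples: energy lands in bins 1 and 7.
    let signal: Vec<(f64, f64)> = (0..8)
        .map(|i| ((2.0 * std::f64::consts::PI * i as f64 / 8.0).cos(), 0.0))
        .collect();
    for (k, (re, im)) in fft(&signal).iter().enumerate() {
        println!("bin {k}: {:.2}", (re * re + im * im).sqrt());
    }
}
```

Per-bin magnitudes out of this are the raw material for features like the bass / mid / treb bands the preset language exposes.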
Adventure Awaits
We are building a team. If you think this article's technical aspects are interesting, you are likely an engineer with relevant knowledge and our recruitment strategy is working, so you should consider becoming a co-founder. This is the cool story you were looking for. Don't get ready for an interview. Instead just go contribute on µTate. Make bad code better because we are still about a year away from luxury coding, and getting through the sloppy phase is an unavoidable requirement for everyone who will be early enough to earn the distinction.
We Are Aligned
Bringing µTate out of vaporware has definitely taken things in the right direction. We had intended to be the supply side on programmer tools. For a variety of reasons (the community we started with, the language those tools were programmed in, and the culture of a niche we are losing alignment with even as users ourselves), that was a wrong choice we could only discover by trying before moving on.
Moved to The Back Burner
Programmer tools might have actually worked if we had put more effort in, but with billions of potential users and spin-off uses for live shows, desktop environment widgets, games, games, and gaming (a $200bn USD industry), music visualization was so much the better choice! We can get back to programmer tools later, when businesses are pouring millions of dollars into those funding streams.
The Opportunity Uncovered
What we found as we tried to dream up applications of modern machine learning to the problem has iced the cake until the fluffy dollops flow over the sides. We will benefit. Users will benefit. Other programmers will benefit. Progress towards serious AI tool kits for solving hard problems will benefit. This is game on.
Gifts of Demeter Await
This is what we call a "fertile problem space." Music visualization is an unbounded problem, one that is forgiving to the innovator and has many potential dividends to deliver to many hands. µTate by itself would be a startup in its own right, and we will develop it entirely in the open.
Our µTate is a Prize for the Forge, the soft red-hot billet upon which we hammer. Every investor we talked to in late September (2025) had some version of the same question: How will you bring users onto your platform? Like so many before us, we will be the supply side and offer our labor, asking only the devotion of a spectator who we entertain through our own trials by fire, building the platform we ask them to use until it is fit for purpose for so many others to come after us.
Properly Funded Open Source
Through PrizeForge, many of our colleagues may receive substantial financial compensation for contributing code. It is the kind of consumer program that open source users have too often been asked to do without. Still, the spiteful may decry the commercialization of open source as a soiled dove. To the billions of non-programmers who have for decades needed stronger open source counterweights, what do the spiteful say that they did not already say, to no good effect, in all of those lost decades? Let the masses choose to whom to give their billions of hard-earned dollars and which development model will deliver the programs critical to their lives, prosperity, and self-determination.
Choice For The Non-Programmers
The non-programming consumers constitute a numerous class, each of whom has an earned dollar to spend. The business that does not build technology for sale still needs a way to finance its production for its own use. We pursue the privilege of working on their behalf, delivering the means of coordination and social finance so that this tremendous value may be realized for all. It will be realized by the same programmers who too often in the past were asked to toil for free, unfair as it was, in the name of user freedom, while those doing the asking bore no interest in giving the non-programmer a lever with which to advance their own freedoms alongside us by exchanging the honest fruits of their labor for ours.
Benedictions For Future Milestones
May the same interstellar winds that guided Jeff Minter in the development of Polybius be with us as we enter into a next phase of bringing crowd cognition out of vaporware. Forgive the helplessness of the ignorant haters whose cynical judgements have no effect on the drum of our audacious keyboards or the willful sign-ups of our appreciated users. The malingering internet citizen only dispenses the same words to which they wrongly bend their own knee. They have never known the wisdom of the white wolf, Amaterasu. To those who would remake upon the internet the same old world in which the talented must ask legacy insiders for permission to play, Kali Ma, Kali Ma, Kali Ma Shakti De!