Sound+Image

Warning: this page contains AI content*


Last issue I wrote about AI text-to-image, which is such fun: ask for “a hi-fi turntable in the style of a da Vinci codex” and you receive magnificent art in return (right), at the risk of enraging everyone remotely artistic. I jested last issue that soon music services will be similarly AI-enabled, and we will ask (by voice, of course, nothing so 20th-century as a keyboard) for “Nick Cave songs in the style of Abba”, or indeed Abba songs in the style of Nick Cave if you like, and presto, Loverman à la Chiquitita. ‘Oh what fun we will have’, I wrote, ‘what lawsuits we will see’. What a riposte Mr Red Hand Himself would unleash upon the world if his vocal style and band were mimicked, rather than merely his poetry.**

It was mainly a joke, what I wrote, but a mere two months on, we’re alarmingly close. Indeed all manner of AI has leapt into the news, especially the text generation of ChatGPT and its host of rivals, able to spool out endless articles, scripts, automated search responses and more, all on demand, and clean enough for some websites to have tried flooding the web with AI-generated content, in order to pull traffic and ads and money, while potentially saving on their staff.***

Of course my focus is less on the internet — that never-ending ad-splattered scroll to the bottom. I prefer to play in the design-enhanced spaces of our print and digital issues, by which I mean this thing you’re reading now, our issue #350. Look, you’re on page 6. You can’t imagine an internet page talking like this to you, can you? It would seem uncivil. But this is a magazine: you can touch us; we appreciate you. ****

Anyway, while I’m elbows deep into AI art, I’m moderately confident I won’t (yet) be pasting any bot text into Sound+Image. I keep trying text engines and chatbots to see how they’re going, and basically they tend toward the wildly bland and mildly inaccurate.

And that’s exactly as you might expect from a process which is effectively averaging the internet, boiling it all down, so that it’s likely to generate the text equivalent of the grey mush into which all matter will eventually decay once the nanobots get out of control. (Don’t worry, it could be years away.)

The ‘mildly inaccurate’ bit has thankfully been making headlines — not only in the rarefied segments of Media Watch, but at the grand launch of Google’s ChatGPT rival Bard, where part of the demo response was clearly inaccurate. This apparently surprised a lot of people, which would suggest those people haven’t been using much AI. AI is very often surprising, and very often inaccurate. All this offers great hope to the humble journalist.

So yes, I hesitate to say ‘watch and see’, because next thing you know you’re thrown to the side of the road as the Next Big Thing barrels up the highway, or you wake up with only an arm left in a sea of grey goo. So I’m keeping my eye in.

And my ear too, more recently, because while my AI-making time is still largely expended on the pictures, I’m actively playing with… drum roll, AI sound generation.

Yes, text-to-audio has now begun to surface, though it’s hard to get invites to the trials, and as with all AI, the more popular it gets, the harder it is to do more than a little of it for free.

Text-to-voice is relatively easy, and UberDuck can do Eminem raps and a host of other voices (talking and sometimes kinda singing). Premium tier available. AudioLDM does simple text-to-sound, with suggestions like “A hammer is hitting a wooden surface”. We requested “a woodland glade full of fairies” and it returned a 30-second clip of mono low-bit Brian Eno with birdsong over the top: not at all unpleasant, though perhaps you wouldn’t want a whole evening of it.

Harmonai looks promising but goes to the coding level, too hard for me. I scored a Discord invite to the musically-minded Polymuse: no fluffy descriptions here, but the ability to combine styles from an expanding list, then define key and chord progressions, BPM and more. The results (the developer’s own samples, as I blew my invite) are musical, diverse, and controllable in interesting ways.

The designers of all such systems must have been a bit gutted by the arrival of Google Research’s MusicLM, which popped up at the end of January. It is frighteningly sophisticated, and perfectly happy with abstract instructions, such as art gallery descriptions of paintings fed into the generator. The music is really interesting: low-res, but complex; the engine must be very well-trained. MusicLM’s vocals are all wordless tunes, endless Elizabeth Fraserisms, so that’s an obvious oddity, though some are finding them strangely beautiful.

Currently MusicLM is able to “generate music at 24kHz that remains consistent over several minutes”. Imagine scoring a film: paste in each scene description and get four options back in a minute. (Try the Nick Cave & Warren Ellis filter.)

Again, of course, it’s not hard to see a downside: this time for musicians, stock music companies, everyone in lifts and supermarkets. You might expect the whole idea to be bought up and shut down by Muzak, or Neil Young, but hey, this is Google, nobody is shutting down Google (except possibly ChatGPT). Still, Google may be thinking hard about the consequences, because as yet they’re not letting MusicLM loose; it’s demonstration only for now. But really, have a look, and a listen: tinyurl.com/googlemusicLM.

It’s certainly the AI engine I’d most like to be prompting right now.

Next month? Who the freak knows. Cheers!

Jez Ford, Editor

* Relax, just the pictures, not the text. Read on…

** He recently tore into some lyrics that were composed by AI in the style of Nick Cave. Not amused.

*** Disclaimer: Sound+Image feeds into whathifi.com, a competitor to all tech websites, so you might expect me to be opposed here. Thankfully I’ve heard nothing yet about being ousted by a bot, but presumably the first you’d know is when your connection is suddenly c…

**** While this is true, we must confess that part of this comment has now gone online, so it’s not quite as print-exclusive as originally intended.
