Bangkok Post

The latest on the AI train

- JAMES HEIN

I avoided the subject in my last article, but there have been many announcements in the area of artificial intelligence over the past week or two that make it impossible to avoid this time.

Before we get to that: YouTube is not a free-speech platform. Many people are being demonetised because they will not bow down to a single extreme ideology that YouTube now personifies. Where possible, try to find alternative places, such as Rumble, for your content. At the same time, I don’t support the growing number of alternative platforms that require you to sign up as a paying member. Most people can’t afford the monthly fees, which soon add up.

Every major platform is now jumping on the AI train. Microsoft, Google, IBM and Adobe are just some of the companies working at full speed to incorporate AI into their products. All are using models similar to OpenAI’s ChatGPT, but all are expanding on that approach to cover images, video and sound in addition to the earlier text-only approach.

OpenAI has released an early version of Shap-E, a model that generates 3D objects from text descriptions. You can try it at huggingface.co. The current version is not optimised, so results will vary.

Lovelace Studio has announced that its game-generation tools will be available at some point in the future. Judging from the demo, they will dramatically cut down the work of building cinematic scenes and gameplay environments in Unreal Engine, using text prompts.

Meta has a model called ImageBind that can use an image to retrieve audio. If you give it a picture of, say, a lion, the sound it returns is that of a lion. Meta is also working on options that go from audio to image, from text to image and audio, from audio and image to a new combined image, and from audio to a generated image. Again, some of this is in its early stages, but you can see where this is leading. All of this is open source, so you can grab it and build your own projects from the facebookresearch GitHub page at github.com/facebookresearch/ImageBind.

If you get the chance, take a look at the recent TED talk by Imran Chaudhri showing the latest in wearable tech. It shows phone information projected onto his hand, language translation of conversations using his own voice, and summaries of emails and invites produced by a personal AI.

IBM has its Watsonx.AI system coming out in July in partnership with Hugging Face, so it will be entering the field later than some of the others. Hugging Face is also working on text-to-speech and text-to-video tools, including combinations of tools that users can build and then pass along to others.

Spotify has had to remove around 10,000 AI-generated music tracks from its platform. Playlists are fine, but computer-generated songs are not, at least not yet. The reasoning is that, compared with an actual artist, AI-generated tunes take no work, yet people are earning revenue from them. To be accurate, it takes more than a single line of text to produce a finished AI-generated song. There is some actual work involved, so I’m not sure where this will end up.

Google is working with some fast-food chains to create chatbots that will take orders in drive-throughs. The early tests with McDonald’s didn’t go so well. Google also has its MusicLM product available through AI Test Kitchen for people to generate music, but not vocals, yet. The music quality is quite high as far as the mix goes. You can specify the genre you’re after and even download the results. Google has also added an AI helper to Gmail. Nothing new here, as some of us already have a Chrome extension or two installed that does this, but it removes the need for a third-party tool. There is also the new Magic Editor in Google Photos, which you can use to manipulate images.

The other big Google news is PaLM 2, its next large language model, which will be used to build on Bard and on topic-specific products such as medical assistants. Bard has new and improved coding support. Google also revealed that it is working on its next-generation model, called Gemini, which would loosely correspond to GPT-5.

Google is also expanding its interfaces with a wide range of tools, a bit like ChatGPT plug-ins. These will include Adobe, Redfin, ZipRecruiter, TripAdvisor, Spotify and many others. Some have pointed out that Google has simply copied ChatGPT here, but there will be differences. Bard is available in many countries.

With the wider market focusing on the same things, there will be many overlaps, with different groups leapfrogging one another. I think this will be a very exciting time for users as AI gets integrated into their favourite tools.

What seems amazing today will just be another tool in a year or so. Keep this in mind as the AI marketing and announcements ramp up over the next few months. I for one am not buying into the “AI is going to kill us all” narrative.

James Hein is an IT professional with over 30 years’ standing. You can contact him at jclhein@gmail.com.