Alex Douzet is CEO of Pumpkin, a pet insurance and wellness care provider founded to help ensure pets live their healthiest lives.
In recent weeks, a flurry of artificial intelligence (AI) product announcements has shed light on the true potential of large language models (LLMs). Initially perceived as chatbots, LLMs are now poised to become the core of a groundbreaking mobile operating system. This paradigm shift is fueled by advancements in multimodal AI, which takes data from a variety of inputs (e.g., images, sound or video) and combines that information to produce a more intelligent output.
I have spent decades developing tools with AI and using them in my businesses, and from my view, AI progress in years past pales in comparison to the explosion of advancements in recent months. It’s moving quickly, and I believe it’s imperative for business leaders, and the people they employ, to keep up with emerging technologies and learn how to adopt them effectively.
Multimodal AI could replace traditional apps.
OpenAI has been at the forefront of the movement to disrupt the current smartphone market. Rather than relying on conventional apps, it’s been reported that the company envisions a device powered by AI. This vision is strengthened by the collaboration between OpenAI CEO Sam Altman, former Apple designer Jony Ive and SoftBank, the majority owner of Arm, the chip design company whose designs underpin most smartphone processors.
I believe these developments allow us to reimagine the traditional mobile human-computer interaction model so that we no longer interact with our smartphones via apps. I expect that the input to the phone could be a photo, voice command or video, and the output rendered by the LLM could also take any of these forms or a combination of them. From my view, recent developments such as integrating voice and image capabilities into ChatGPT represent a major leap toward realizing the vision of artificial general intelligence. Google, too, is embracing this trajectory with its AI chatbot, Bard, which now connects to a variety of apps and services.
As an example of how one might use this, imagine you want to meet a colleague for lunch at a specific location. Today, that might require three separate smartphone apps: a calendar to invite your colleague, a restaurant booking platform and Uber to transport you. But from my view, with an LLM as your operating system, you might be able to simply say, “I’d like to meet John Evans tomorrow for lunch on the corner of Spring and Crosby. I need an Uber to get me there by 1 p.m.” The LLM could take your voice input and translate it into the tasks needed to create the desired result without the hassle of multiple app interactions.
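To make the idea concrete, here is a minimal sketch of how such an orchestration layer might work. It is an illustration under stated assumptions, not any vendor's actual implementation: the tool names (create_calendar_event, book_table, request_ride) and the JSON plan are hypothetical stand-ins for whatever structured output a real multimodal model would produce after transcribing the spoken request.

```python
# Minimal sketch: an LLM "operating system" routing one spoken request to
# several underlying services. The plan below is a hypothetical stand-in for
# the structured output a real multimodal model might emit after transcribing:
# "I'd like to meet John Evans tomorrow for lunch on the corner of Spring and
#  Crosby. I need an Uber to get me there by 1 p.m."
import json

def create_calendar_event(invitee: str, time: str, location: str) -> str:
    return f"Calendar: lunch with {invitee} at {time}, {location}"

def book_table(location: str, time: str, party_size: int) -> str:
    return f"Restaurant: table for {party_size} near {location} at {time}"

def request_ride(destination: str, arrive_by: str) -> str:
    return f"Ride: car requested to {destination}, arriving by {arrive_by}"

TOOLS = {
    "create_calendar_event": create_calendar_event,
    "book_table": book_table,
    "request_ride": request_ride,
}

llm_plan = json.loads("""
[
  {"tool": "create_calendar_event",
   "args": {"invitee": "John Evans", "time": "1:00 p.m. tomorrow",
            "location": "Spring St & Crosby St"}},
  {"tool": "book_table",
   "args": {"location": "Spring St & Crosby St",
            "time": "1:00 p.m. tomorrow", "party_size": 2}},
  {"tool": "request_ride",
   "args": {"destination": "Spring St & Crosby St",
            "arrive_by": "1:00 p.m. tomorrow"}}
]
""")

# The "operating system" simply dispatches each planned step to the right tool,
# replacing three separate app interactions with one request.
for step in llm_plan:
    print(TOOLS[step["tool"]](**step["args"]))
```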
LLMs offer greater usage potential than mere chatbots.
Prominent AI developer Andrej Karpathy’s post on X (formerly known as Twitter) underscores the transformation of LLMs, positioning them not as mere chatbots but rather as an “emergence of a whole new computing paradigm.” He suggests LLMs are evolving into the kernel, the program at the core of a new operating system. As the kernel, the LLM orchestrates inputs and outputs across modalities, acting as a code interpreter and even serving as a database for embeddings, among other functions. Karpathy’s insight is encouraging, as he indicates that LLMs could revolutionize how we interact with computers and the world around us in ever more helpful ways.
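As one illustration of the “database for embeddings” idea, here is a toy sketch of nearest-vector retrieval: snippets are stored as vectors, and the system returns the stored snippet whose vector sits closest to the query’s. The embed() function is a deliberately crude, hypothetical stand-in; a real system would use a trained embedding model so the lookups are semantically meaningful.

```python
# Toy sketch of an embeddings store: vectors in, nearest-vector lookup out.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical toy embedding: a normalized character-frequency vector.
    # Illustration only; a trained model is needed for real semantic relevance.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

documents = [
    "Pumpkin covers accident and illness care for dogs and cats.",
    "The calendar shows a lunch meeting tomorrow at 1 p.m.",
    "Arm designs the processors used in most smartphones.",
]
index = [(doc, embed(doc)) for doc in documents]

# Retrieve the stored snippet whose vector is closest to the query's
# (cosine similarity, since all vectors are normalized).
query = embed("When is my lunch meeting?")
best_doc = max(index, key=lambda item: float(np.dot(query, item[1])))[0]
print(best_doc)
```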
Your smartphone is fast becoming a tool that perceives and interacts with the world much like humans can. I believe multimodal AI is set to redefine smartphone interactions and could enable natural and intuitive exchanges through text, audio, vision and gestures. This could lead to a user experience that helps us perform tasks with our smartphones that are currently difficult or even impossible. From my view, the potential applications of ChatGPT alone on a multimodal smartphone are vast, from real-time language translation to content creation and smart home device control.
LLMs could enhance user experiences across smart devices.
Envisioning LLMs as the core of next-generation operating systems opens exciting opportunities. From my perspective, these systems promise to alter user interfaces and make our smartphones and smart devices more attuned to individual needs. With multimodal inputs, for instance, users could interact primarily through their device’s camera, gaining an enriched perception of their environment through instantaneous identification and translation. LLM-driven systems could also amplify personalization by shaping experiences around unique user preferences. Over time, I think LLMs might even discern a user’s habits and optimize the interface and system functionalities to cater to specific user requirements.
It’s worth noting that there are still challenges to address before a successful and widespread LLM implementation, including ensuring accuracy and reliability in each device’s ability to understand and respond to different inputs. These devices also need to be user-friendly and affordable, with systems in place to ensure that LLMs and the data they gather are used responsibly and ethically.
As the kernel process of a new operating system, LLMs can facilitate untold improvements in our lives: intelligent personal assistants, innovative ways of interacting and more efficient learning and work processes. At my company, our product and experience teams are experimenting with tools that will give our customers and veterinarians more natural, intuitive ways to access expertise and care for pets. As we continue to evolve, I expect to see greater AI-enabled efficiencies across the company; we’re targeting efficiency gains of 15% to 30% across engineering, design, content and marketing, and up to 50% across different operational functions in 2024.
The race to intimacy has just begun, and the role of LLMs as the core of a new operating system is central to this transformative journey. Moving forward, it’s important that business leaders not grow complacent about LLMs, though I also want to stress that a key component of the shift toward human-computer intimacy will be earning trust. It’s already hard to establish trust with your customers, and harder still when large volumes of personal data become involved. Education, reassurance and ethical practices need to be established. In the meantime, I am excited to observe and adapt as LLMs continue to shape new and innovative computing experiences.