Opinion

Developments in generative AI models

ChatGPT 4o, from Microsoft-backed OpenAI, is an advanced iteration of ChatGPT 4. It is available free of charge to a limited extent: some users can already try it without a monthly subscription, while others cannot yet. This sets it apart from ChatGPT 4, which requires a subscription and comes with various smart models such as those for generating images and videos.

We have previously discussed ChatGPT 4 and compared it with ChatGPT 3.5 and Google’s Bard, which has since been renamed Gemini. The comparison was based on several factors, including the structure of each model, in terms of the amount of training data and the number of parameters, and the capabilities and efficiency of each model.

ChatGPT 4o has created a digital stir, awakening the digital community and alarming some of its members. It opens the door wide to competition among technology companies. We are on the brink of seeing rapid advancements in other AI models, particularly generative ones like Gemini, and Google is expected to announce updates in its upcoming version to take on its rival OpenAI. In this article, we will explore what is new in this advanced generative model, look at some of the features that distinguish it from ChatGPT 4 and consider how its rapid advancements are changing our lifestyle and the nature of our social interactions.

ChatGPT 4o is an improved version of ChatGPT 4, with enhancements focused on greater capabilities in natural language processing. This superiority is attributed to a reported doubling of the amount of data used to train the new model and an increase in the number of parameters. OpenAI has not yet disclosed the exact size but has acknowledged that the quantity has increased, which would explain the model’s superiority.

ChatGPT 4o processes language more accurately and understands complex contexts better than the previous version. It can quickly generate clear and precise responses and texts, even on complex topics, allowing for deeper discussions with more sophisticated language and analysis than ChatGPT 4 and making its language closer to human style. Additionally, the new model can generate long texts without disruptive breaks. The previous version could not, which led to inconsistencies in the information within a text. One of the most significant features giving the new model its edge is its ability to interact via text, voice and image. Although these features were present in the previous model, its voice and visual interactions were less human-like than those of the latest version. I found ChatGPT 4o’s voice interaction smart and fast.

Users can set a specific tone, language and dialect for the model’s voice, including Arabic dialects such as Omani and Egyptian, which I tested during the interaction. You can take a photo directly or upload a saved image and ask the model to describe and identify what it shows. Its responses are detailed, precise and fast compared with those of the previous model, which often lacked accurate and efficient description capabilities.

As for video, which I have not yet been able to test because the feature is not yet available, ChatGPT 4o is said to be able to act as a personal tutor for a student, helping to solve math problems, and to assist a blind person by precisely describing the surrounding environment.

Regarding security and privacy, the new model boasts better data handling capabilities and reduces the risk of information leakage and cyber attacks, thanks to updates to the model’s algorithms.