
World News | If AI image generators are so smart, why are they having trouble writing and counting?



Canberra, July 5 (The Conversation) Generative AI tools such as Midjourney, Stable Diffusion, and DALL-E 2 have astonished us with their ability to produce striking images in seconds.

Yet despite their achievements, there remains a puzzling gap between what AI image generators can produce and what humans can.


For example, these tools often fail to deliver satisfactory results on seemingly simple tasks such as counting objects and producing accurate text.

If generative AI has reached unprecedented heights of creative expression, why is it having trouble accomplishing tasks that even elementary school children can do?


Exploring the root causes helps reveal the complex digital nature of AI and the nuances of its capabilities.

The Limits of AI in Writing

Humans can easily recognize text symbols (such as letters, numbers, and characters) written in a variety of different fonts and scripts. We can also generate text in different contexts and see how context changes meaning.

Current AI image generators lack this inherent understanding. They can’t really understand the meaning of any text symbols.

These generators are built on artificial neural networks trained on large amounts of image data, from which they “learn” associations and make predictions.

Combinations of shapes in the training images become associated with various entities. For example, two lines that converge to a point might represent the tip of a pencil or the roof of a house.

But when it comes to text and quantities, the associations must be very accurate, because even tiny flaws are obvious. Our brains can overlook slight deviations in the tip of a pencil or a roof, but not so easily in the way a word is written or the number of fingers on a hand.

As far as the text-to-image model is concerned, a text symbol is just a combination of lines and shapes. Because text comes in many different styles—and because letters and numbers are used in seemingly endless permutations—models often don’t learn how to reproduce text effectively.

The main reason for this is insufficient training data. AI image generators need far more training data to represent text and quantities accurately than they need for other tasks.

The tragedy of AI hands

Problems also arise when dealing with smaller objects that require intricate detail, such as hands.

In training images, hands are usually small, holding objects, or partially occluded by other elements. This makes it difficult for AI to associate the word “hand” with an accurate representation of a human hand with five fingers.

As a result, AI-generated hands often look misshapen, have extra or fewer fingers, or are partially covered by objects such as sleeves or purses.

We see a similar problem with quantities. AI models lack a clear understanding of quantities, such as the abstract concept of “four.”

As a result, an image generator may respond to the prompt “four apples” by drawing on what it learned from countless images containing many different numbers of apples, and return an output with the wrong number.

In other words, the large diversity of associations in the training data affects the accuracy of the output quantities.

Can artificial intelligence write and count?

It’s important to remember that text-to-image and text-to-video generation are relatively new concepts in artificial intelligence. Current generative platforms are “low-resolution” versions of what we can expect in the future.

As the training process and AI techniques improve, future AI image generators may be more capable of producing accurate visualizations.

It is also worth noting that most publicly accessible AI platforms do not offer the highest level of capability. Generating accurate text and quantities requires a highly optimized and customized network, so a paid subscription to a more advanced platform may yield better results. (The Conversation)

(This is an unedited and auto-generated story from a syndicated news feed; staff may not have modified or edited the body of the content.)

