Sunday, December 22, 2024

World News | If AI image generators are so smart, why are they having trouble writing and counting?



Canberra, 5th July (The Conversation) We have been blown away by the ability of generative AI tools such as Midjourney, Stable Diffusion, and DALL-E 2 to produce impressive images in seconds.

Yet despite their achievements, there remains a puzzling gap between what AI image generators can produce and what humans can.


For example, these tools often fail at seemingly simple tasks such as counting objects and producing accurate text.

If generative AI has reached unprecedented heights of creative expression, why is it having trouble accomplishing tasks that even elementary school children can do?


Exploring the underlying reasons helps shed light on the complex numerical nature of AI and the nuances of its capabilities.

The Limits of AI in Writing

Humans can easily recognize text symbols (such as letters, numbers, and characters) written in a variety of fonts and scripts. We can also produce text in different contexts and understand how context can change meaning.

Current AI image generators lack this inherent understanding. They can’t really understand the meaning of any text symbols.

These generators are built on artificial neural networks trained on large amounts of image data, from which they “learn” associations and make predictions.

Compositions of shapes in training images become associated with various entities. For example, two lines angled inward so that they meet at a point might represent the tip of a pencil or the roof of a house.

But when it comes to text and quantities, the associations must be very precise, because even tiny flaws are obvious. Our brains can overlook slight imperfections in a pencil tip or a roof, but not so easily in the way a word is written or in the number of fingers on a hand.

To a text-to-image model, a text symbol is just a combination of lines and shapes. Because text comes in many different styles, and because letters and numbers are used in seemingly endless arrangements, models often do not learn to reproduce text effectively.
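To make the "just lines and shapes" point concrete, here is a deliberately tiny sketch. It assumes, purely for illustration, that letters are drawn as hand-made 5×5 bitmaps (the glyphs `O` and `Q` and the helper functions are hypothetical, not taken from any real model). The two glyphs differ in only 2 of 25 pixels, a difference a model could easily treat as noise, yet a human reader sees two different letters.

```python
# Hypothetical 5x5 glyphs: '#' = ink, '.' = background.
O = [
    ".###.",
    "#...#",
    "#...#",
    "#...#",
    ".###.",
]
Q = [
    ".###.",
    "#...#",
    "#...#",
    "#..##",  # one extra pixel for the tail
    ".####",  # and one more
]

def differing_pixels(a, b):
    """Count the cells where two same-sized bitmaps disagree."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def fraction_different(a, b):
    """Share of all pixels that differ between the two bitmaps."""
    total = sum(len(row) for row in a)
    return differing_pixels(a, b) / total

if __name__ == "__main__":
    print(differing_pixels(O, Q))    # 2
    print(fraction_different(O, Q))  # 0.08
```

An 8% pixel change is small by image standards but enough to turn one letter into another, which is why text leaves so little room for error.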

The main reason for this is insufficient training data. AI image generators need far more training data to represent text and quantities accurately than they do for other tasks.

The tragedy of AI hands

Problems also arise when dealing with smaller objects that require intricate detail, such as hands.

In training images, hands are often small, holding objects, or partially hidden by other elements. This makes it hard for AI to associate the word "hand" with an accurate representation of a human hand with five fingers.

As a result, AI-generated hands often look misshapen, have extra or fewer fingers, or are partially covered by objects such as sleeves or purses.

We see a similar problem with quantities. AI models lack a clear understanding of quantities, such as the abstract concept of "four".

As a result, an image generator may respond to a prompt for "four apples" by drawing on what it learned from countless images showing many different numbers of apples, and return an output with the wrong number.

In other words, the large diversity of associations in the training data affects the accuracy of the output quantities.
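The effect of diverse counts in the training data can be shown with a heavily simplified toy. This sketch assumes, hypothetically, a "model" that has learned only "apples means some number of apples" and ignores the number word in the prompt; real image generators do not work this simply. The training counts below are invented for illustration. Under that assumption, the chance of getting the requested number is just that number's share of the training data, and if "four" never appears, the model cannot get it right.

```python
import random
from collections import Counter

# Invented toy data: number of apples in each training image captioned "apples".
APPLE_COUNTS_IN_TRAINING = [1, 2, 3, 3, 5, 6, 6, 6, 8, 12]

def count_distribution(counts):
    """Empirical probability of each apple count in the training data."""
    total = len(counts)
    return {n: c / total for n, c in Counter(counts).items()}

def chance_of_correct_count(requested, counts):
    """Probability that a model ignoring the number word outputs `requested` apples."""
    return count_distribution(counts).get(requested, 0.0)

def sample_apple_count(counts, rng):
    """Draw an output count the way the naive association model would."""
    dist = count_distribution(counts)
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    print(chance_of_correct_count(4, APPLE_COUNTS_IN_TRAINING))  # 0.0
    print(chance_of_correct_count(6, APPLE_COUNTS_IN_TRAINING))  # 0.3
    print([sample_apple_count(APPLE_COUNTS_IN_TRAINING, rng) for _ in range(5)])
```

A prompt for "four apples" here can only ever return some other number, mirroring the wrong-count outputs described above.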

Can artificial intelligence write and count?

It's important to remember that text-to-image and text-to-video generation are relatively new concepts in artificial intelligence. Current generative platforms are "low-resolution" versions of what we can expect in the future.

As the training process and AI techniques improve, future AI image generators may be more capable of producing accurate visualizations.

It is also worth noting that most publicly accessible AI platforms do not offer the highest level of capability. Generating accurate text and numbers requires highly optimized and customized networks, so a paid subscription to a more advanced platform may yield better results. (The Conversation)

(This is an unedited and auto-generated story from a syndicated news feed; LatestLY staff may not have modified or edited the content body.)
