Wednesday, December 11, 2024

World News | If AI image generators are so smart, why are they having trouble writing and counting?

Canberra, July 5 (The Conversation) Generative AI tools such as Midjourney, Stable Diffusion, and DALL-E 2 have amazed us with their ability to produce striking images in a matter of seconds.

Yet despite these achievements, there remains a puzzling gap between what AI image generators can produce and what humans can.

For example, these tools often fail to deliver satisfactory results on seemingly simple tasks such as counting objects and producing accurate text.

If generative AI has reached unprecedented heights of creative expression, why does it struggle with tasks that even an elementary school child can complete?

Exploring the underlying reasons helps shed light on the complex numerical nature of AI and the nuances of its capabilities.

The Limits of AI in Writing

Humans can easily recognize text symbols (such as letters, numbers, and characters) written in a variety of different fonts and scripts. We can also generate text in different contexts and see how context changes meaning.

Current AI image generators lack this inherent understanding. They have no real grasp of what any text symbol means.

These generators are built on artificial neural networks trained on large amounts of image data, from which they “learn” associations and make predictions.

Combinations of shapes in the training images become associated with various entities. For example, two inward-facing lines that meet might represent the tip of a pencil or the roof of a house.

But when it comes to text and quantities, the associations must be extremely accurate, because even tiny flaws are noticeable. Our brains can overlook slight deviations in the tip of a pencil or the shape of a roof, but not in the way a word is written or the number of fingers on a hand.

As far as text-to-image models are concerned, text symbols are just combinations of lines and shapes. Because text comes in so many different styles, and because letters and numbers are used in seemingly endless arrangements, models often fail to learn how to reproduce text effectively.
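To give a rough sense of how quickly those arrangements multiply, here is a minimal sketch that counts the distinct sequences that can be formed from the 26 lowercase English letters alone. It is only an illustration of the arithmetic: the function name `count_strings` is hypothetical, and the count ignores fonts, capitalization, digits, and everything else that makes real text in images even more varied.

```python
import string

# Illustrative assumption: restrict text to the 26 lowercase English letters.
ALPHABET_SIZE = len(string.ascii_lowercase)


def count_strings(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Return how many distinct letter sequences of the given length exist."""
    return alphabet_size ** length


# Print the number of possible sequences for lengths 1 through 6.
for n in range(1, 7):
    print(f"length {n}: {count_strings(n):,} possible letter sequences")
```

Even at five letters there are already more than eleven million possible sequences, and a model must render the correct one almost stroke for stroke for a reader to accept it as a word.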

The main reason for this is insufficient training data. AI image generators need far more training data to represent text and quantities accurately than they do for other tasks.

The tragedy of AI hands

Problems also arise when dealing with smaller objects that require intricate detail, such as hands.

In training images, hands are often small, holding objects, or partially obscured by other elements. This makes it challenging for AI to associate the word “hand” with an accurate representation of a human hand with five fingers.

As a result, AI-generated hands often look misshapen, have too many or too few fingers, or are partially covered by objects such as sleeves or purses.

We see a similar problem with quantities. AI models lack a clear understanding of quantities, such as the abstract concept of “four.”

An image generator may therefore respond to the prompt “four apples” by drawing on what it has learned from countless images showing many different numbers of apples, and return an output with the wrong number.

In other words, the huge diversity of associations in the training data affects the accuracy of the quantities in the output.

Can artificial intelligence write and count?

It’s important to remember that text-to-image and text-to-video generation are relatively new concepts in artificial intelligence. Today’s generative platforms are “low-res” versions of what we can expect in the future.

As training processes and AI techniques improve, future AI image generators may become much more capable of producing accurate visualizations.

It is also worth noting that most publicly accessible AI platforms do not offer the highest level of capability. Generating accurate text and quantities requires a highly optimized and customized network, so a paid subscription to a more advanced platform may yield better results. (The Conversation)

(This is an unedited and auto-generated story from a syndicated news feed; staff may not have modified or edited the body of the content.)

