Canberra, 5th July (The Conversation) Generative AI tools like Midjourney, Stable Diffusion, and DALL-E 2 have blown us away with their ability to produce striking images in seconds.
Yet despite these achievements, a puzzling gap remains between what AI image generators can produce and what humans handle with ease.
For example, for seemingly simple tasks such as counting objects and generating accurate text, these tools often fail to deliver satisfactory results.
If generative AI has reached unprecedented heights of creative expression, why is it having trouble accomplishing tasks that even elementary school children can do?
Exploring the root causes of these failures helps reveal how these tools actually work, and the nuances of their capabilities.
The Limits of AI in Writing
Humans can easily recognize text symbols (such as letters, numbers, and characters) written in a variety of different fonts and scripts. We can also generate text in different contexts and see how context changes meaning.
Current AI image generators lack this inherent understanding. They can’t really understand the meaning of any text symbols.
These generators are built on artificial neural networks trained on large amounts of image data, from which they “learn” associations and make predictions.
Compositions of shapes in training images become associated with various entities. For example, two lines meeting at an angle might represent the tip of a pencil or the ridge of a roof.
But when it comes to text and quantities, these associations must be highly precise, because even tiny flaws are obvious. Our brains can forgive a slight deviation in the tip of a pencil or the line of a roof, but not in the way a word is spelled or the number of fingers on a hand.
As far as the text-to-image model is concerned, a text symbol is just a combination of lines and shapes. Because text comes in many different styles—and because letters and numbers are used in seemingly endless permutations—models often don’t learn how to reproduce text effectively.
The main reason for this is insufficient training data. Compared with other tasks, AI image generators need more training data to accurately represent text and quantity.
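To see why blending shape statistics garbles text, consider this toy sketch (it is not how any real generator works, and the 5x5 "font" bitmaps are made up for illustration). Averaging several styles of the same letter, as a model that has learned only shape associations effectively does, yields a blur rather than a crisp glyph:

```python
# Toy illustration: a model that learns only the statistics of shapes,
# not the identity of symbols, blends letterforms into an ambiguous blur.
# The 5x5 binary bitmaps below are invented stand-ins for three fonts' "E".
font_a = [
    [1,1,1,1,1],
    [1,0,0,0,0],
    [1,1,1,1,0],
    [1,0,0,0,0],
    [1,1,1,1,1],
]
font_b = [
    [1,1,1,1,0],
    [1,0,0,0,0],
    [1,1,1,0,0],
    [1,0,0,0,0],
    [1,1,1,1,0],
]
font_c = [
    [0,1,1,1,1],
    [0,1,0,0,0],
    [0,1,1,1,1],
    [0,1,0,0,0],
    [0,1,1,1,1],
]

def blend(grids):
    """Pixel-wise average of several glyph bitmaps, mimicking a model
    that mixes the styles it has seen instead of picking one."""
    n = len(grids)
    return [[sum(g[r][c] for g in grids) / n for c in range(5)]
            for r in range(5)]

blurred = blend([font_a, font_b, font_c])

# Pixels that are neither clearly on (1) nor clearly off (0) have no
# single crisp interpretation -- the "letter" comes out malformed.
ambiguous = sum(1 for row in blurred for v in row if 0 < v < 1)
print(f"{ambiguous} of 25 pixels are ambiguous")
```

Because letters demand pixel-level precision, even a small fraction of ambiguous pixels is enough to make the output unreadable, whereas the same amount of blur on a pencil tip would go unnoticed.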
The tragedy of AI hands
Problems also arise when dealing with smaller objects that require intricate detail, such as hands.
In training images, hands are usually small, holding objects, or partially hidden by other elements. This makes it challenging for AI to associate the word “hand” with an accurate representation of a human hand with five fingers.
As a result, AI-generated hands often look misshapen, have extra or fewer fingers, or are partially covered by objects such as sleeves or purses.
We see a similar problem with quantities. AI models lack a clear understanding of numbers, such as the abstract concept of “four.”
An image generator responding to the prompt “four apples” draws on countless training images containing all sorts of apple counts, and may return an output with the wrong number.
In other words, the wide diversity of associations in the training data undermines the accuracy of quantities in the output.
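This effect can be sketched in a few lines of Python. The sketch below is not a real model: the list of training counts is invented, and the “generator” simply samples from the distribution of counts it has seen alongside the word “apples”, ignoring the number in the prompt, which is exactly the weak text-number association described above.

```python
import random

# Invented training data: how many apples appeared in images whose
# captions mentioned "apples". Captions rarely state exact counts.
training_counts = [1, 2, 3, 5, 6, 2, 12, 7, 3, 30, 4, 2]

random.seed(0)  # fixed seed so the sketch is reproducible

def generate(prompt):
    """Ignore the number in the prompt and sample a count from the
    learned distribution -- mimicking a weak text-number association."""
    return random.choice(training_counts)

outputs = [generate("four apples") for _ in range(100)]
wrong = sum(1 for n in outputs if n != 4)
print(f"{wrong}/100 generated images had the wrong number of apples")
```

Since only one of the twelve training counts is actually four, the vast majority of outputs miss the requested quantity, no matter how clearly the prompt states it.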
Can artificial intelligence write and count?
It’s important to remember that text-to-image and text-to-video generation are relatively new concepts in artificial intelligence. Current platforms are “low-resolution” versions of what we can expect in the future.
As the training process and AI techniques improve, future AI image generators may be more capable of producing accurate visualizations.
It is also worth noting that most publicly accessible AI platforms do not offer their highest level of capability. Generating accurate text and numbers requires a highly optimized and customized network, so a paid subscription to a more advanced platform may yield better results. (The Conversation)