Art “AI” tools, why they are not AI at all, and why they will never replace artists
Recently, several trending art “AIs” have cropped up — from DALL·E to Midjourney to Stable Diffusion — attracting plenty of social media coverage along with fears that they could replace artists for certain kinds of work.
This article explains two things:
- These “AI” tools are not AI at all
- They won’t replace artists, not for any purpose, nor in any capacity
I’ve split these two points into separate sections below, but both are very much related to each other.
· These “AI” tools are not AI at all
∘ How do art “AI” tools work?
∘ Training a machine — “machine learning”
· They won’t replace artists, not for any purpose
∘ The problem of context in AI and machine learning
∘ Without context, true “AI” art tools cannot exist
These “AI” tools are not AI at all
Defining what counts as “Artificial Intelligence” is never easy, but when it comes to just about anything with a public profile on social media, it’s not AI — it’s something conceptually much more primitive: a form of data analysis rather than intelligence. To understand this better, I will explain precisely what each of those things entails.
AI, by virtue of its name, requires intelligence. The well-known Turing Test is one way of defining what is or is not an AI, but it is limited to considering text; how would we test an art AI? Ultimately, all “AI” tools, in any field, can be tested quite simply by asking the question below:
Can the AI learn things about the field it operates in, without being instructed by a human about what it needs to learn and where to learn it?
There is not, as yet, any AI which can satisfy this test. All of the things people think of as “AI”, so far, do not have any ability to learn for themselves or to “think”. For example, let’s look at what tools like DALL·E do.
How do art “AI” tools work?
What tools like DALL·E do, in basic form, is take huge amounts of input data provided by humans (such as researchers) and perform massive numbers of computations on it to find common features or attributes of those images. For example, such a tool might find that:
- Artwork labelled as “Che Guevara” on Google Search is often predominantly in red, white and black colours
- Swimming pools are more often accompanied by daylight than dark lighting
- Andy Warhol paintings and derivatives are often in many colours and distinctly separated out into regions
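As a toy illustration of that kind of label-to-colour correlation — everything here is an invented sketch, not how any real tool is implemented — the “finding” can be as simple as counting which colours co-occur with a label:

```python
from collections import Counter

# Invented toy "dataset": each entry is a label plus the dominant
# colour clusters sampled from one image (names stand in for RGB values).
dataset = [
    ("che_guevara", ["red", "black", "white", "red"]),
    ("che_guevara", ["red", "white", "black"]),
    ("swimming_pool", ["blue", "white", "yellow"]),
    ("swimming_pool", ["blue", "blue", "yellow"]),
]

def colour_profile(label):
    """Count how often each colour appears across images with this label."""
    counts = Counter()
    for image_label, colours in dataset:
        if image_label == label:
            counts.update(colours)
    return counts

# The most frequent colour is the "correlation" the tool has found.
print(colour_profile("che_guevara").most_common(1))
```

The statistic it produces is exactly that — a frequency count over human-supplied data, nothing resembling an understanding of who Che Guevara was.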
The tool has not “learned” this. It has found a correlation in the data, which it will now use when computing a new image based on what you ask of it. It has zero ability to learn from this knowledge; the only way it can learn is through training.
Training a machine — “machine learning”
Training, in the context of machine learning, means a human telling the machine what was good and what was bad about the data it captured. For example, if it generates two pictures of a horse — one with the horse’s head missing and one not — the human can tell the machine that one of these pictures is good and one is bad. The machine, given this extra data, can use this to try and avoid making images with the “bad” attribute again — but it has learned nothing on its own. A human has added data to it, nothing more.
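A minimal sketch of that feedback loop, using an invented two-outcome “generator” in place of a real image model (everything here is a toy assumption, not how any actual tool is built):

```python
import random

random.seed(42)

# Toy "model": a score for each way it might draw a horse.
scores = {"with_head": 1.0, "headless": 1.0}

def generate():
    """Sample an output in proportion to its current score."""
    total = sum(scores.values())
    r = random.uniform(0, total)
    cumulative = 0.0
    for variant, score in scores.items():
        cumulative += score
        if r <= cumulative:
            return variant
    return variant  # float-rounding fallback

def train(rounds):
    """The human's good/bad labels are the only learning signal."""
    for _ in range(rounds):
        picture = generate()
        if picture == "headless":    # human says: bad - suppress it
            scores[picture] *= 0.9
        else:                        # human says: good - reinforce it
            scores[picture] *= 1.1

train(100)
```

After enough rounds the “bad” outcome’s score shrinks and it is sampled less and less often — but the machine has understood nothing about horses; numbers were simply adjusted according to a human’s labels.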
With extensive training, a machine can be taught — simply by a brute-force process of feeding it rote information — to be quite accurate. An excellent real-life example of this, one you probably didn’t know was a machine learning tool, is CAPTCHA.
For users, CAPTCHA simply looks like a way of stopping spambots from accessing websites. In fact, it does more than that: you are also training a machine to understand hard-to-read written or printed text.
With millions upon millions of users solving CAPTCHAs every day, machines that try to read printed documents automatically (Optical Character Recognition) can use this massive amount of data to improve their recognition abilities and digitise physical content and documents that aren’t in text format.
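A rough sketch of that idea in miniature (the image IDs and answers below are invented): each image’s many user answers are majority-voted into a single training label that an OCR system can learn from.

```python
from collections import Counter

# Hypothetical stream of CAPTCHA responses: (image_id, what a user typed).
answers = [
    ("scan_017", "street"), ("scan_017", "street"), ("scan_017", "strcet"),
    ("scan_018", "1923"),   ("scan_018", "1923"),
]

def consensus_labels(answers):
    """Majority-vote each image's transcriptions into one training label."""
    by_image = {}
    for image_id, text in answers:
        by_image.setdefault(image_id, Counter())[text] += 1
    return {image_id: counts.most_common(1)[0][0]
            for image_id, counts in by_image.items()}

labels = consensus_labels(answers)
```

One mistyped answer (“strcet”) is simply outvoted — the “knowledge” is nothing but aggregated human input.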
Image-based CAPTCHAs are exactly the same; by solving them, you are helping train machines to distinguish objects in images.
The point of going on this tangent about training and machine learning is to highlight that no part of this process is “intelligence” by the machine; every single thing it learns has been directly fed to it by a human. Calling that intelligence would be as crazy as having a teacher drill Einstein’s E = mc² equation into your head by rote learning, then claiming you invented the theory of special relativity. Of course you didn’t — learning the equation didn’t teach you how it works, or why, or anything else. All you can do with this information, without intelligence, is brainlessly apply it to the given variable values.
Neural nets, while sounding fancy and complicated, are simply a complex method of machine learning: a way of structuring and interpreting data, and the associations between data, in a form most digestible to a machine. They, too, contain no intelligence.
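To make the point concrete, here is the basic unit of a neural net: a weighted sum fed through a threshold. The weights below are hand-picked so it acts as an AND gate; nothing in it is more than arithmetic on numbers that a human (or a training process) supplied.

```python
# A single artificial "neuron": weighted sum plus threshold.
def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hand-picked weights: the neuron "computes" AND without knowing what AND is.
weights, bias = [1.0, 1.0], -1.5

print(neuron([1, 1], weights, bias))  # 1: fires only when both inputs are on
print(neuron([1, 0], weights, bias))  # 0
print(neuron([0, 0], weights, bias))  # 0
```

A real network is millions of these stacked together, with the weights set by training rather than by hand — but the operation is the same arithmetic throughout.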
Art tools claiming to be AI are, at best, examples of this; they do not know anything except what is directly fed to them. This is why they work best with existing types of art, and why they rarely deviate from either looking like a giant mashup of random pictures or a weird photobash of an artist’s work.
They won’t replace artists, not for any purpose
First of all, current “AI” tools are nowhere near complex or capable enough to be useful for much; the best they can currently do is act as a minor source of inspiration for human artists, in the sense that they work like a condensed form of Google Search. Google Search frequently throws up unhelpful images when you are looking for references — it isn’t designed for that purpose — whereas these art tools are more streamlined and better equipped to handle specific requests for reference material.
The second, and far more important point, is that machines do not as yet have any understanding of context.
The problem of context in AI and machine learning
Conceptually speaking, you could argue that humans are not so different from machine learning. We, too, don’t understand anything unless taught — we can interpret the data around us (e.g. that bee stung me, so I now know bees can sting and should be avoided), but we cannot actively “learn” except through the data we are given or taught. Human decisions can be compared to machine decisions in a very similar manner — we run algorithms on the data we have and come to a decision.
The reason that — for the foreseeable future — humans and machines cannot be called “similar” in practice is context. The amount of data required to properly understand the context of any given thing is absolutely gargantuan, and far beyond any current machine learning tool.
Memes are a fantastic example. Understanding a meme, and the significance of any element within it, requires understanding its context; let’s look at some examples for fun.
Regular internet users will immediately recognise what this meme is parodying — the endless “one simple trick” adverts claiming to have revolutionised beauty products or whatever else. For a machine, however, this context is lost; it doesn’t understand where the humour in this meme comes from.
It could try, for example, to look for similar existing images, or be fed them directly by a human — but even then, it would only know that images with this pattern exist, not why. It has no ability to understand the reason behind them, or the time in which they were made, and therefore no ability to make new images of this type that fit within the context.
This meme is an even better example of context. Many humans who aren’t regulars of the internet or popular culture will not understand this meme at all: it refers to Batman, Captain America, Rick Astley and Thor, but without already knowing those things, you cannot have any idea what this image means.
It wouldn’t be impossible for a machine to scour the internet, looking for things matching these images and guess based on that, but firstly it would likely get some of them wrong due to bad data, and secondly it would still completely fail to understand the context of this meme: that Rick Astley’s section of the image is meant to provoke a very different reaction to the other three, as it does not display a weapon and is intended for humorous effect. The amount of context required to understand this is so vast that even summarising it is hard:
An understanding of the Rickrolling phenomenon, Rick Astley’s foot-tapping in the music video, the iconic nature of Batman’s logo, Captain America’s shield and Thor’s hammer, the fact that the shoe is not in fact a weapon, that the microphone is not a weapon, that all of these are weapons wielded by an owner who is the subject of the meme rather than the weapon itself… (the list goes on more or less forever)
Those of us who use the internet a lot have spent a great deal of time absorbing this context, but the machine has not, and it does not know where to look. A machine that watches the whole Marvel Cinematic Universe series is not going to understand why a character is iconic or which part of them is iconic. It would need input from a human about human morality, about power, and about many other concepts to even begin to do that.
A great deal of the context required to understand an image cannot be obtained by scouring the internet at all, because the data simply isn’t there: context often involves factors like cultural differences in body language, expressions that mean different things in different places, statements that are only understood with reference to a particular time and place, and so on. A machine trying to ‘learn’ these things cannot do so, because the data is not recorded anywhere for it to learn from.
As a third meme example, this one portrays Donald Trump as President Shinra from Final Fantasy VII. Without some understanding of that game, its storyline and the depiction of President Shinra within it — plus an understanding of Donald Trump and his politics — a machine will not be able to understand the reasoning or humour behind this picture. Politics is a very difficult field for a machine to understand, due to the many implicit references and contextually dependent meanings within it, and a machine is not going to find all of that context, let alone understand it, without a huge amount of human input. A researcher attempting to provide it would likely retire before they could finish.
Without context, true “AI” art tools cannot exist
One day — if there is a world where robots walk the streets, or where humans have all of their senses recorded 24/7 for some giant machine to decipher (in itself a more or less impossible feat) — it is *plausible* that machine learning and “AI” could become comparable enough to humans to be considered intelligent. We are nowhere near that point yet.
That isn’t to say that art tools like this are useless, simply that they bear no comparison to artists. They are good experiments, a visual way of showing the progress of machine learning, and they can be useful for various kinds of image recognition technology and, to some degree, for generating inspiration for artists.
Machine learning in other fields has been incredibly useful — in medicine, for example — as it is an efficient way of computing solutions to problems while a human directs it; but in reality these are just advanced algorithms, not “intelligence”.
The most that these art tools can become, from now until the distant future, are ways for artists to ease their workflows: for example, helping with brainstorming, or generating effortless rough visuals of a landscape or atmosphere for a client to look at. They will never be able to replace artists in any capacity.
What about replacing artists for concept sketches and thumbnails?
This also isn’t going to happen.
The only thing art tools like this do is provide an easier way, effectively, to search for existing content: it’s a lot easier to use one of these tools to search with text, like “dark, ashen battlefield with birds wielding swords”, than it is to find a few artists who draw those things and mash their work together (translation: plagiarise it) for a thumbnail. That’s the only real benefit they offer.
There are certainly companies that will use this to skimp on paying an artist — but those companies weren’t hiring artists for this work in the first place, so no jobs are lost. They’re just transitioning from openly stealing other artists’ work to using an art tool to do it for them. These art tools will certainly make that problem worse, but they’re a symptom of a wider plagiarism problem in these scenarios, not the cause.