Google introduced an updated version of Bard (now called Gemini), its competitor to ChatGPT and other imitative AI systems. It is apparently a solid step forward: Google claims that it beats OpenAI’s system on 30 of 32 standard AI benchmarks. In competitive programming, for example, it performs better than 85 percent of programmers, as opposed to 54 percent. But that got me thinking about where these imitative systems represent a potential step backwards.
Competitive programming bears very little resemblance to the day-to-day programming that most programmers do for a living. In some senses it can be harder, but in many others it is easier. Competitive programming is, essentially, a game: a series of well-defined problems with clear solution states. That is not, to put it mildly, the environment that 99 percent of programmers work in. However, these tools can still help programmer productivity, because what they are good at (repeating what they have been fed under specific prompts) constitutes a lot of what programmers do. There is a lot of boilerplate code in most significant programs, and generating that code is a legitimate way to help programmers program faster. This is why ChatGPT and its ilk are good at student essays, for example: essays tend to fit a specific format with specific rules about what is and is not acceptable, and so they are relatively easy to reproduce. This easy reproduction of material the systems have already been exposed to, however, is one way in which AIs may be something of a step back, at least conceptually.
Imitative AIs calculate what should come next based on what has come before. This makes them not really capable of creation, but it is also the root of why they lie (oh, I’m sorry: hallucinate). An imitative AI system lies because it knows that, for example, a citation should go in spot X, so it creates one. (Remember, these systems are calculating the next token, not generating material based on any understanding of it. Gemini and ChatGPT don’t know squat about, well, anything. They just know what people who wrote about the thing did in the past.) Or it makes up facts, because those kinds of facts were present in the kinds of material you requested. Coding AIs have the same sorts of problems. Since they imitate other code, they have the same likelihood of introducing the security errors found in their training data, for example. They imitate, so they also imitate our flaws. We used to have a different, perhaps better, conception of computer-derived helpers: expert systems.
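To make that concrete, here is a deliberately tiny sketch of the “predict what comes next” idea: a bigram model in Python that only tracks which word tended to follow which word in its training text. The corpus is invented for illustration, and real systems are vastly more sophisticated, but the failure mode is the same in kind: the model produces statistically plausible continuations, including citation-shaped ones, with no notion of whether they are true.

```python
import random
from collections import defaultdict

# A toy next-token predictor: a bigram (Markov chain) model.
# It only records which word tended to follow which word in its
# training text; it has no notion of truth or meaning.
def train(corpus: str) -> dict:
    model = defaultdict(list)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model: dict, start: str, length: int = 12) -> str:
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        # Pick a statistically plausible next word, nothing more.
        out.append(random.choice(followers))
    return " ".join(out)

# Invented training text, for illustration only.
corpus = (
    "the study found that the treatment worked "
    "the study found that the drug failed "
    "see Smith et al 2019 for details "
    "see Jones et al 2021 for details"
)
model = train(corpus)
print(generate(model, "the"))
print(generate(model, "see"))
# The output reads like prose, and can splice fragments into
# claims and "citations" (e.g. Smith et al 2021) that never existed.
```

The point is not the toy itself but the shape of the failure: the output is fluent because fluency is exactly what the statistics capture. Correctness is not.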
Expert systems are similar in output to imitative AI systems: you give them a question and they give you an answer. However, since the answers were based on encoded expertise rather than sheer volume of examples, in theory they would give you the right answer almost every time. Applied to narrow areas, they could reliably be a boon. But that was also the problem: you needed to know the answers beforehand, which meant the systems tended to be narrow in their applicable domains and relatively time-consuming to build. That is why so much effort has been poured into imitative and other learning methodologies. It is better, the thought goes, to have a machine that can learn a little in many areas than one that knows almost everything in a very narrow specialty.
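For contrast, here is an equally tiny sketch of the expert-system approach: hand-encoded if-then rules. The rules below are made-up placeholders, not real diagnostic knowledge, but they show the trade described above: every answer traces back to a rule a human expert wrote, and the system can say “I don’t know” instead of improvising.

```python
# A toy rule-based expert system. Each rule pairs a set of required
# observations with a conclusion supplied by a human expert. The
# rules here are invented placeholders, not real domain knowledge.
RULES = [
    ({"fever", "cough", "body aches"}, "consistent with flu"),
    ({"sneezing", "runny nose"}, "consistent with a common cold"),
]

def diagnose(symptoms):
    """Return the first conclusion whose rule is fully satisfied."""
    for required, conclusion in RULES:
        if required <= symptoms:  # all required observations present
            return conclusion
    # Unlike a next-token predictor, the system refuses to guess
    # outside its encoded expertise.
    return "no matching rule; refer to a human expert"

print(diagnose({"fever", "cough", "body aches", "headache"}))
print(diagnose({"dizziness"}))
```

Everything such a system knows had to be put there by hand, which is exactly why these systems were narrow and expensive to build, and exactly why their answers could be trusted within that narrow scope.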
But is it? Yes, the outputs of these AI systems can seem amazing. But in practice, they require constant babysitting to ensure that what they produce is usable and correct. Is that an appropriate trade-off? I honestly don’t know. The concept of a machine system that is broader than it is deep has its appeal. But I think we may have lost something in abandoning the idea that we should be certain, or at least as certain as reasonably possible, that the systems we rely on for help are actually helpful. How valuable is a computer playing doctor when it cannot actually get the answer correct?
Right about now, someone will chime in with the notion that these systems are better than humans. But they aren’t, not really. Since they require oversight to do anything important, they aren’t really an improvement, even if the stats are occasionally better than average. A doctor who makes a mistake can learn, actually learn, from it, or can be punished. An imitative AI model evades responsibility and cannot change until its training material has been updated and fully integrated. They are not the same, and they do not have the same level of flexibility and accountability.
At the end of the day, AI companies have bet that being partially correct much of the time is more lucrative than being absolutely correct almost all of the time. And they may very well be right. But I think we have lost something by narrowing our ambitions so sharply. Automation can help people, can improve things, but only if it is focused on people, not profits. We seem to have given up on that notion, and we are all going to be worse off for it.