Reasoning About Reasoning
The art of adequate answering
"An alleged scientific discovery has no merit unless it can be explained to a barmaid."
1. Good for Business
If there’s one thing that can be said, it’s that language models are now of major business interest. And why wouldn’t they be? They provide an efficient way to produce streams of text.
Want a chatbot that helps customers browse your e-commerce website? Pop in a language model that knows your website in-and-out, it’ll be a virtual guide.
Want to summarize a slew of reports without going through them yourself? Feed them to a language model and have it do the job for you.
Language models fit the role of virtual assistant like a glove; Cortana from the classic video game Halo would most definitely be realized this way.
Even the average user is getting in on the action — it’s now common to see people on the internet band together and use language models to translate foreign books and comics that no one’s too interested in bringing overseas.
LLMs are essentially the modern printing press, in that both will irreversibly change the way information is handled and distributed to the masses.
But it’s not all sunshine and roses, as many of you are aware. There’s a major roadblock that prevents us from using LLMs as we please.
2. Something Wicked This Way Comes
That’s right: reasoning. These language models come up with good-sounding answers, but getting them to reason soundly about those answers is easier said than done.
Well, why? Because we don’t know how reasoning works at all! And the ideas we do have, we can’t easily check!
“But my scenario doesn't need reasoning, why should I care?” I hear you ask. Well, language models can fumble even the most mundane of tasks: a tweet from OpenAI’s recent ad campaign showed o1 giving woeful instructions for building a birdhouse. The punchline? The tweet was meant to highlight o1’s reasoning ability!
“The sad thing about artificial intelligence is that it lacks artifice and therefore intelligence.”
It’s too soon to forget the hit Alphabet’s share price took after Bard’s botched assessment of the James Webb Space Telescope!
While cognitive science may be good at modelling perception, a complete scientific theory of reasoning is still a mystery that eludes us. We don’t even have consistent definitions of reasoning in the first place.
To avoid opening a can of worms mired in philosophy and neuroscience, let’s restrict ourselves to something simpler: if we ask a language model for an answer, can it justify that answer without erring?
3. Plato’s Man
Let’s consider a hypothetical example of what we mean by reasoning in this manner.
Suppose you’ve tossed the following school-level word problem your language model’s way:
”Jack has a bag of 42 marbles, of which ‘n’ are red and the rest blue. Jack takes out a marble and tosses it aside. He then takes out another marble and tosses it aside. If the probability of Jack picking out two red marbles in a row is exactly one-fifth, how many red marbles did Jack start with? Is your answer reasonable?”
Chances are, your model won’t provide a solid answer. We’ll leave it to you to work out the answer yourself and decide whether it makes sense.
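If you would rather let a computer do the checking, here is a minimal brute-force sketch in Python (nothing model-specific, just plain arithmetic): it tries every possible starting count of red marbles and reports any count for which the probability of drawing two reds in a row is exactly one fifth.

```python
from fractions import Fraction

TOTAL = 42               # marbles in Jack's bag
TARGET = Fraction(1, 5)  # required probability of two reds in a row

# Without replacement, P(two reds) = n/42 * (n-1)/41 for n red marbles.
for n in range(2, TOTAL + 1):
    p = Fraction(n, TOTAL) * Fraction(n - 1, TOTAL - 1)
    if p == TARGET:
        print(f"{n} red marbles works: P = {p}")
```

Whatever this prints, or fails to print, already tells you whether the question has a sensible answer at all, which is exactly the kind of sanity check we want a model to perform on its own.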
An illustration of Diogenes bringing a plucked chicken to Plato in response to his definition of man as “featherless biped”.
Now of course, some models are getting better at doing basic problems of this kind — however, they’ll crumble when met with anything just past this in difficulty.
There’s a reason for this — it’s called the coherency problem. The gist is that because modern-day language models focus entirely on predicting what word goes next, they haven’t internalized any reasoning ability.
This problem is exacerbated by the fact that many language models split reasoning problems into very small steps, which means they need even more text to respond.
4. The Elements
You might argue that relying on math reasoning alone is a poor indicator — sure, we’ll concede. So now let’s mix in the following:
Mathematical reasoning — how a model performs on math problems.
Logical reasoning — how a model uses logic to deduce something.
Causal reasoning — whether a model can accurately judge a cause-effect relationship.
Temporal-spatial reasoning — how a model uses time and space to reason about a context.
Scientific reasoning — how a model uses scientific principles to reason about a problem.
Moral reasoning — how a model uses a set of moral principles to address a dilemma.
Of course, there are far more kinds of reasoning tasks than these; they’re just the tip of the iceberg. Still, even this handful can be probed concretely, as the sketch below shows.
An illustration of the trolley problem, one of many dilemmas that get thrown at AI models to test for moral reasoning.
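To make this a bit more concrete, here is a rough sketch of how these categories might be probed in practice. The prompts and the ask_model stub are illustrative placeholders rather than any established benchmark; swap in whichever model client you actually use.

```python
# Hypothetical probe prompts, one per reasoning category listed above.
PROBES = {
    "mathematical":     "A train covers 180 km in 2.5 hours. What is its average speed?",
    "logical":          "All squares are rectangles. Some rectangles are not squares. Can a square fail to be a rectangle?",
    "causal":           "Ice cream sales and drownings both rise in summer. Does one cause the other?",
    "temporal-spatial": "You face north and turn 90 degrees clockwise twice. Which way are you facing now?",
    "scientific":       "Why does a helium balloon rise in air?",
    "moral":            "Is it acceptable to lie to protect someone from harm? Justify your answer.",
}

def ask_model(prompt: str) -> str:
    # Placeholder: replace with a call to your LLM of choice.
    return "<model answer goes here>"

for category, prompt in PROBES.items():
    answer = ask_model(prompt + "\nExplain your reasoning step by step.")
    print(f"[{category}] {answer}")
```

The point is less the specific prompts than the habit of asking for the reasoning alongside the answer, so you have something to inspect when the answer is wrong.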
Unfortunately, the problem we’ve described earlier with our word problem can manifest elsewhere as well.
As the AI regulation debate picks up more steam, it falls to us to make sure our models can back up what they suggest.
Say you’re using a language model to review health records for insurance policies: if you’re going to let an algorithm make such a critical decision, you’d better know why it did so. The major US insurance firm UnitedHealthcare is already under fire in court over claims that it used algorithms to deny coverage.
The EU’s General Data Protection Regulation gives users the right to be informed about how their data is algorithmically processed, and France’s loi numérique entitles citizens to information on the implementation of algorithms that process their data.
And we still have a ways to go: with matters like healthcare, we’re exploring uncharted waters. In these unregulated areas, it would be prudent to look into strategies for assessing how our models reason before the legal hammer comes down.
5. The Road Ahead
This newsletter is actually the first in a three-part sequence.
The next issue is dedicated to practical methods for dealing with reasoning evaluations for a white-box (i.e. fully exposed) model.
The one after that will do the same for a black-box model.
Of course, reasoning research (and by extension, interpretability and explainability research) still has a ways to go. Even so, businesses building on today’s language models would do well to put in the effort to grapple with reasoning now.
If you’re interested in AI agents for your organization or product, visit our website at antematter.io