A test post

An example post from a Jupyter notebook
LLMs
prompting
logic
Author

An LLM User

Published

February 2, 2024

A visualization problem for LLMs

  • Read the following and see if you can answer the question:

There are three boxes in a stack. A pink one, a purple one and a green one. The pink one is in the middle of the stack with the purple below it. What is the color of the box on the bottom of the stack?

Most likely you answered purple.

But it is possible you first said green (don’t worry if you did; it is quite a common response!).

How do we do it?

It’s a verbal-visual problem that requires some deliberation, and most of us probably create a mental image of a stack of boxes as a reasoning aid.

Notice also the complexity of the language used in the prompt to describe the scene. Natural language, as opposed to a formal/logical calculus, leaves room for ambiguity (multiple interpretations) and inexactness. However, it is the most natural, efficient and broadly accessible way of communicating such information.

Notice:

  1. The use of anaphora (reference back to something already mentioned):
       • Three BOXES -> a pink/purple/green ONE, as opposed to a pink box, a purple box and a green box.
       • The PINK ONE is in the middle ... the purple {BOX} below IT {the pink box}.

     We tend to avoid this kind of repetition to make our speech more coherent.

  2. Although the description in language may not be entirely concise, there is no real ambiguity as to how the boxes can be arranged. Pink is in the middle and purple is below it, so purple is at the bottom and green, the only box left, is on top. It must be:

| Box | Stack position |
| --- | --- |
| GREEN | top |
| PINK | middle |
| PURPLE | bottom |
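The claim that only one arrangement is possible can be checked mechanically. The following is a minimal sketch, not part of the original experiment: it encodes the two statements from the prompt as constraints and tries every ordering of the three boxes. The names `COLORS`, `satisfies` and `solutions` are made up for this illustration, and reading "below" as "anywhere lower in the stack" is an assumption about how to formalise the sentence.

```python
from itertools import permutations

# Hypothetical names for illustration; each stack is ordered top -> bottom.
COLORS = ("pink", "purple", "green")


def satisfies(stack):
    """Return True if a stack (top -> bottom) matches the prompt's description."""
    pink = stack.index("pink")
    purple = stack.index("purple")
    # "The pink one is in the middle of the stack"
    pink_in_middle = pink == 1
    # "with the purple below it": purple sits lower, i.e. at a larger index
    purple_below_pink = purple > pink
    return pink_in_middle and purple_below_pink


# Brute force: test all 6 possible orderings of the three boxes.
solutions = [stack for stack in permutations(COLORS) if satisfies(stack)]

for top, middle, bottom in solutions:
    print(f"top={top}, middle={middle}, bottom={bottom}")
```

Only one of the six orderings survives both constraints, which is what makes the natural-language description unambiguous despite its informality.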

How do LLMs respond to the prompt?

GPT3.5 Turbo

  • Had some difficulties and answered green

Llama2 (chat 7B version)

  • Answered correctly (purple) and used some so-called chain-of-thought reasoning

Bing Chat (Microsoft Copilot)

This LLM is GPT-4 with some Microsoft-specific tuning and RAG (search) additions.

It gets it correct:

The color of the box on the bottom of the stack is purple. This is because the purple box is below the pink one, which is in the middle of the stack. Therefore, the purple box must be the lowest one. 🟪

But when you follow up and ask it to create a visual representation of the box arrangement:

PROMPT: Can you create a visual representation of how the boxes are arranged?

It responds

RESPONSE: Sure, I’ll try to create that. Here is a visual representation of how the boxes are arranged:

It then uses DALL-E 3 to create four different attempts at a visual representation:

NONE OF WHICH ARE CORRECT!!!