Look into the machine’s mind

the data
Using the chatgpt api, I ran the same completion prompt “Intelligence is “ hundreds of times (setting the temperature quite high, at 1.6, for more diverse responses). Given a text, a Large Language Model assigns a probability for the word (token) to come, and it just repeats this process until a completion is…well, complete.

semantic space (behind)
Each text (a prompt completion or a sub-sequence) has an embedding: a position in a 1536-dimensions space (I call it semantic space, or s²₁₅₃₆). For each response there’s a trajectory through s²₁₅₃₆ that corresponds to each sub-sequence of words, example: “Intelligence is “ → “Intelligence is the” → “Intelligence is the ability” → “Intelligence is the ability to” → … → full completion.

Because I cannot visualize a 1536-dimensions space (yet), I use a popular technique called Principal Components Analysis that tells me, for the set of points I have, what are the most important (principal) dimensions, and allows me to rotate the highly dimensional space so when I look through it, projected into only 3 dimensions, the points are scattered as much as possible. It’s the best (linear)possible reduction of dimensions. In fewer words: it compresses a highly dimensional space into few dimensions while preserving as much info as it can. More or less the same as when for drawing something you choose a perspective (you rotate the object), so it provides the most relevant information. I call this new space s²₃, and it’s what I visualize.

What you see in the cube is a tree of trajectories that bifurcate. All start with “Intelligence is “ and progress towards longer and less probable sub-sequences of responses. It’s a different representation of the same tree being visualized on the right (both visualizations communicate).

The tree visualization (right)
Visualizes all collected completions. It also represents the calculated probability of a word following a text (because the sample is small, this is only a good approximation for the initial levels of the tree), so “Intelligence is the “ will be followed by “ability” ~75% of the times, at 1.6 temperature. If temperature was lower this probability would rise, until achieving certainity at temperature=0.

By hovering a word, which corresponds to a point in a sub-sequence, you can see in the cube the trajectory from the prompt to all the completions that start with that sub-sequence.

Try other prompts:
· Chatgpt is
· Best thing about AI is
· When
· Santiago Ortiz is (yes, this is a selfai. What I found interesting is that it’s ~50% truth ~50% bs, and it feels like it describes alternative versions of my self in the multiverse)
· My dream
· Tell me a story:
· Intelligence is

references
Simulating my friend Philippe, where I explain embeddings, and how they are used to run semantic search and to find the proper knowledge from a corpus to use it as context for LLMs prompts
A deeper explanation of LLMs, next token prediction, temperature and embeddings, by Stephen Wolfram
English by degrees the original Next Word prediction model by Claude Shannon

moebio for more experiments and data proyects

Tags: , ,