Look into the machine’s mind
Posted by Armando Brito Mendes | Filed under LLMs, visualization
A web app for exploring the many paths ChatGPT's responses take when completing “what is intelligence”
the data
Using the ChatGPT API, I ran the same completion prompt “Intelligence is ” hundreds of times (setting the temperature quite high, at 1.6, for more diverse responses). Given a text, a Large Language Model assigns a probability to every possible next word (token), picks one according to those probabilities, appends it, and repeats the process until the completion is…well, complete.
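To make the sampling step concrete, here is a minimal Python sketch of temperature sampling, assuming the model exposes a raw score (logit) for each candidate token. `toy_next_token_logits` is a hypothetical stand-in with made-up scores, and the API performs this step on the server, so this is not how the ChatGPT API itself is called. Dividing the scores by the temperature before the softmax is what makes 1.6 produce more varied completions than a low value such as 0.5.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (T=0 means always take the top token)")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def complete(prompt, next_token_logits, temperature=1.6, max_tokens=20, seed=None):
    """Sample one token at a time until an end token or the length limit."""
    rng = random.Random(seed)
    text = prompt
    for _ in range(max_tokens):
        vocab, logits = next_token_logits(text)
        probs = softmax_with_temperature(logits, temperature)
        token = rng.choices(vocab, weights=probs, k=1)[0]
        if token == "<end>":
            break
        text += token
    return text

def toy_next_token_logits(text):
    """Hypothetical stand-in for the model: fixed scores, not real LLM output."""
    if text == "Intelligence is":
        return [" the", " a", "<end>"], [2.0, 1.0, -1.0]
    return [" ability", " thing", "<end>"], [1.5, 0.5, 1.0]

print(complete("Intelligence is", toy_next_token_logits, seed=1))
```

Calling `complete` many times with different seeds yields a spread of completions like the sample described above; with a lower temperature the same top tokens would dominate more often.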
semantic space (behind)
Each text (a prompt completion or a sub-sequence of it) has an embedding: a position in a 1536-dimensional space (I call it semantic space, or s²₁₅₃₆). Each response traces a trajectory through s²₁₅₃₆, with one point per sub-sequence of words, for example: “Intelligence is ” → “Intelligence is the” → “Intelligence is the ability” → “Intelligence is the ability to” → … → full completion.
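The trajectory for one response can be sketched in Python: build the growing prefixes, then embed each one. The helper names are hypothetical, and `embed` stands in for whatever embedding call is used (for example one returning 1536 floats); it is not a specific library API.

```python
def trajectory_prefixes(prompt, completion_words):
    """Growing sub-sequences: prompt, prompt + w1, prompt + w1 w2, ..."""
    prefixes = [prompt]
    text = prompt
    for word in completion_words:
        # Add a space between words unless the text already ends with one
        text = text + word if text.endswith(" ") else text + " " + word
        prefixes.append(text)
    return prefixes

def trajectory(prompt, completion_words, embed):
    """One point per prefix; `embed` is a hypothetical text -> vector function."""
    return [embed(p) for p in trajectory_prefixes(prompt, completion_words)]

steps = trajectory_prefixes("Intelligence is ", ["the", "ability", "to"])
# → ['Intelligence is ', 'Intelligence is the', 'Intelligence is the ability', 'Intelligence is the ability to']
```

Each response therefore contributes as many points as it has words plus one, and responses sharing a prefix share the same initial points, which is why the trajectories form a branching tree.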
Because I cannot visualize a 1536-dimensional space (yet), I use a popular technique called Principal Component Analysis (PCA). For the set of points I have, it finds the most important (principal) dimensions and rotates the high-dimensional space so that, when I look through it projected onto only 3 dimensions, the points are spread out as much as possible. It's the best possible linear reduction of dimensions. In fewer words: it compresses a high-dimensional space into a few dimensions while preserving as much information (variance) as it can. It is more or less what you do when drawing something: you choose a perspective (you rotate the object) so that the drawing conveys the most relevant information. I call this new space s²₃, and it's what I visualize.
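A minimal NumPy sketch of the PCA step, assuming the embeddings are already stacked in an array with one 1536-dimensional row per text. The random data is a placeholder, not the real embeddings, and `pca_project` is a hypothetical helper name. It centers the points, takes the singular value decomposition, and keeps the top three directions, which are the directions of greatest variance.

```python
import numpy as np

def pca_project(points, n_components=3):
    """Project points onto their top principal components (hypothetical helper)."""
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of vt are the principal directions, sorted by decreasing singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = centered @ components.T  # coordinates in the reduced space
    explained = s**2 / np.sum(s**2)  # share of total variance per direction
    return projected, components, explained[:n_components]

# Placeholder data standing in for real embeddings: 200 texts x 1536 dimensions
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 1536))
points_3d, axes, ratio = pca_project(embeddings)
print(points_3d.shape, ratio.round(3))
```

With real data, `ratio` tells you how much of the total spread the 3-D view keeps; for 1536-dimensional embeddings it can be a small fraction, so the cube is a heavily compressed view.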
What you see in the cube is a tree of trajectories that bifurcate. All start at “Intelligence is ” and move towards longer and less probable sub-sequences of responses. It's a different representation of the same tree shown on the right (the two visualizations are linked).
The tree visualization (right)
It visualizes all collected completions. It also shows the estimated probability of a word following a text (because the sample is small, this is only a good approximation for the first levels of the tree), so “Intelligence is the ” is followed by “ability” ~75% of the time at temperature 1.6. At a lower temperature this probability would rise, reaching certainty at temperature=0.
By hovering over a word, which corresponds to a point in a sub-sequence, you can see in the cube the trajectories from the prompt to all the completions that start with that sub-sequence.
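The probabilities in the tree can be approximated by counting: among the sampled completions that start with a given prefix, how often does each word come next? A minimal Python sketch, assuming completions are plain strings split on whitespace (punctuation attached to a word is treated as part of it); the four sample completions are made up for illustration, not the collected data.

```python
from collections import Counter

def next_word_probabilities(completions, prefix):
    """Estimate P(next word | prefix) from sampled completions by counting."""
    prefix_words = prefix.split()
    n = len(prefix_words)
    counts = Counter()
    for text in completions:
        words = text.split()
        # Only completions that start with the prefix and continue past it count
        if words[:n] == prefix_words and len(words) > n:
            counts[words[n]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

samples = [  # hypothetical completions, not real API output
    "Intelligence is the ability to learn",
    "Intelligence is the ability to adapt",
    "Intelligence is the ability to reason",
    "Intelligence is the capacity to think",
]
print(next_word_probabilities(samples, "Intelligence is the"))
# → {'ability': 0.75, 'capacity': 0.25}
```

Deeper prefixes match fewer completions, so the counts behind each estimate shrink at every level, which is why only the first levels of the tree are reliable.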
Try other prompts:
· Chatgpt is
· Best thing about AI is
· When
· Santiago Ortiz is (yes, this is a selfai. What I found interesting is that it's ~50% truth and ~50% bs, and it feels like it describes alternative versions of myself in the multiverse)
· My dream
· Tell me a story:
· Intelligence is
references
Simulating my friend Philippe, where I explain embeddings and how they are used to run semantic search and to find the relevant knowledge in a corpus to use as context for LLM prompts
A deeper explanation of LLMs, next token prediction, temperature and embeddings, by Stephen Wolfram
English by degrees: the original next-word prediction model, by Claude Shannon
moebio for more experiments and data projects
Tags: bifurcations, ChatGPT, word network
When Your Vision and Hearing Decline with Age
Posted by Armando Brito Mendes | Filed under Data Science, infographics \ dashboards, visualization
Good line charts with curve fitting
By Nathan Yau
If you want to feel like you’re getting old, visit an optometrist and have them tell you that in 6 to 12 months you won’t be able to read things up close and you’ll need bifocals.
For most of my life, I had good vision without glasses or contacts, but in my mid-30s I noticed the basketball score on television looking kind of blurry. I had astigmatism. Just a little.
My prescription didn’t change for years. Until recently. My optometrist hit me with the news that most people start to have trouble reading up close between 39 and 43 years old. I had to look into it.
The following chart shows the percentage of adults who wear glasses or contacts, by age, based on data from the National Health Interview Survey.
Tags: curve fitting, dot plots, age, hearing loss, vision loss, old age
Airfoil
Posted by Armando Brito Mendes | Filed under infographics \ dashboards, lessons, visualization
Excellent animations of physical phenomena, such as the airflow around airplane wings and in other media
The dream of soaring in the sky like a bird has captivated the human mind for ages. Although many failed, some eventually succeeded in achieving that goal. These days we take air transportation for granted, but the physics of flight can still be puzzling.
In this article we’ll investigate what makes airplanes fly by looking at the forces generated by the flow of air around the aircraft’s wings. More specifically, we’ll focus on the cross section of those wings to reveal the shape of an airfoil.
Tags: animations, physics, airflow, visualizations
Common Age Differences, Married Couples
Posted by Armando Brito Mendes | Filed under Data Science, visualization
Good pin charts and scatter plots with outliers
By Nathan Yau
Through pop culture, it sometimes seems like it’s common for there to be a wide age difference between spouses. How common are the age gaps, really? These are the age differences through the lens of the 2022 five-year American Community Survey.
Tags: couples, Descriptive Statistics, pin chart, scatter plot, age, outlier
Why Line Chart Baselines Can Start at Non-Zero
Posted by Armando Brito Mendes | Filed under Data Science, statistics, lessons, visualization
A good demonstration, with interactive charts, of how charts can be misleading
By Nathan Yau
There is a recurring argument that line chart baselines must start at zero, because anything else would be misleading, dishonest, and an insult to all that is good in the world. The critique is misguided.
Tags: misleading, line chart, charts, baseline
Full Of Themselves
Posted by Armando Brito Mendes | Filed under Data Science, visualization
A very well-explained data-processing report
An analysis of title drops in movies
by Dominikus Baur + Alice Thudt
A title drop is when a character in a movie says the title of the movie they’re in. Here’s a large-scale analysis of 73,921 movies from the last 80 years on how often, when and maybe even why that happens.
Tags: data analysis, movies, IMDb, visualization
NBA Apps and data Database
Posted by Armando Brito Mendes | Filed under data sets, statistics, visualization
A list of websites with data, visualizations and apps about basketball
Sravan January 10, 2024 [NBA] #apps #shiny
This database has a list of apps and websites related to NBA Data and Visualizations.
Tags: app, basketball, data, charts
Switching Jobs
Posted by Armando Brito Mendes | Filed under Data Science, visualization
Good, quite original charts…
When people move to different jobs, here’s where they go.
By Nathan Yau
Tags: bar chart, dot plot, jobs
1,374 DAYS: MY LIFE WITH LONG COVID
Posted by Armando Brito Mendes | Filed under Data Science, visualization
A good story with excellent charts
By Giorgia Lupi
Ms. Lupi is an information designer who has been experiencing symptoms of long Covid for over three years.
Dec. 14, 2023
Every morning, I wake up in my Brooklyn apartment, and for two seconds, I can remember the old me. The me without pain, the me with energy, the me who could do whatever she wanted.
Then I’m shoved back into my new reality. As I fully come into consciousness, I feel dizzy, faint and nauseated. Pain pulses throughout my body, and my limbs feel simultaneously as heavy as concrete and weak as jelly. It feels as if a machine were squeezing my skull, and extreme exhaustion overtakes me.
These sensations have been a daily occurrence, with few exceptions, for the past three years and nine months. In the morning my boyfriend will be the one making coffee for us. He will run all of our errands. He will cook and clean. He now does all the things I used to do, the things I can’t do anymore.
Tags: COVID, long COVID, COVID-19, data
Young Money
Posted by Armando Brito Mendes | Filed under infographics \ dashboards, visualization
Good area charts
The jobs of young people with higher incomes and what they studied
By Nathan Yau
Income tends to increase with age, because more work experience and education tend to lead to higher paying jobs. However, young people can also earn higher incomes. Using data from the most recent 2022 American Community Survey, let’s see what those people studied and what they do for a living.