Gimy, could you please start reading the works of Emily M. Bender and Timnit Gebru and put your feet back on the ground. You're doing the "overinfatuated with the latest tech hype" thing again.
I don't know if there's any argument in this or just projection. The tech is useful right now, it's fully integrated into my work, and it has boosted my productivity, like, right now. If I have a use for a tech, how is that being over-infatuated? Or are you skeptical because it's new and you're annoyed by new things, instead of trying to understand it?
I was on this bandwagon way before GPT was a thing; it's the reason I started that thread. So it's not about "new tech", it's stuff I have been thinking about and researching for a long time: from the Stanford parser, through Emily Short's work on narrative design in IF, to chatbots like Suzette and Rosette built on ChatScript and the older AIML stuff.
Heck, even in generative art I was following the work on AARON by Harold Cohen and Emily Howell by David Cope. That's, like, super old news.
So when you say "latest hype", I'm like: "so you're outing yourself as a newb?"
Like, I literally started that batch of posts by warning about the actual abilities of the tech: people try to imagine it's some oracle AI or a database query, and that's problematic.
I shared articles along 2 main considerations:
- Stuff you can do and verify on your own LOCALLY, without relying on big tech gatekeeping.
- Stuff that gives you clues and informs you about the internal workings of the tech.
For example:
- People keep saying it's "predicting the next word". That's slightly inaccurate, and the "slightly" is quite important. I focused on sharing information about embeddings because of that "slightly": embeddings are black boxes that summarize the semantic classes of a token (as the representation of a word), and the system predicts the next embedding, that is, the whole "set of semantic classes". That's a lot more data than just the next word, which means there is a "semantic velocity" that is maintained through the generation. The attention mechanism is similar to the Stanford parser (or parsers in general), in which we associate a token with others, EXCEPT that the "slightly" adds a twist: through multi-headed attention, it attends to whole sets of semantic classes and filters them through the "multi head". Then it recursively applies that system to the result, storing stuff in the FF layer just after the multi-head. Once you understand that, you get a more precise idea of why the model works. It cannot store data reliably like a typical chatbot or an SQL query, because the system relies on generalizing patterns, so any memorization is overfitting and hallucinations are over-generalization. The overall output looks a certain way that makes it seem like it's doing something similar to a regular chatbot, but we must refrain from that reading. The system's usefulness is in semantic manipulation, not in the database-consulting it's being used for right now.
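To make the "attending to whole sets of semantic classes" part concrete, here is a minimal numpy sketch of multi-head attention operating on embedding vectors. The toy sizes, random weights, and names are mine for illustration; a real transformer learns the projection matrices and stacks this block with the FF layer, residual connections, and a causal mask.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_heads = 16, 4                # toy sizes
d_head = d_model // n_heads
tokens = rng.normal(size=(5, d_model))  # 5 token *embeddings*, not 5 words

# Projection matrices; in a real model these are learned, not random.
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(t):
    # Each head sees a different slice of the embedding,
    # i.e. a different "subset of semantic features".
    return t.reshape(t.shape[0], n_heads, d_head).transpose(1, 0, 2)

def multi_head_attention(x):
    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # token-to-token affinities, per head
    mixed = softmax(scores) @ v                          # each token becomes a weighted blend of the others
    mixed = mixed.transpose(1, 0, 2).reshape(x.shape[0], d_model)
    return mixed @ Wo                                    # recombine the heads into one embedding

print(multi_head_attention(tokens).shape)  # (5, 16): one updated embedding per token
```

Nothing in there is a lookup: every output row is a re-mixed embedding, which is why "semantic manipulation, not database consulting" is the right frame.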
- Same for that frakking metaphor about image generation. People repeat "it learns to remove noise", but they don't actually know what that means or why the noise is added in the first place. Noise is a way to jitter the learning of the semantic classes to combat the sparseness of the multidimensional space, which lets the system learn a gradient toward the classes, which in turn lets it navigate the latent space toward a desirable state.
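Here is a toy way to see the "noise gives you a gradient" part (my own minimal sketch, not a real diffusion model): smear a few isolated "class" points with Gaussian noise, and every position in the space suddenly has a direction pointing back toward the data, which you can follow step by step.

```python
import numpy as np

# Toy "semantic classes": a few isolated points in a 2-D latent space.
classes = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 0.0]])
sigma = 0.5  # scale of the Gaussian noise used to jitter them

def score(x):
    """Direction of steepest increase of the noise-smoothed data density.

    Smearing each class point with Gaussian noise turns isolated dots into
    overlapping blobs, so every position x gets a well-defined gradient
    pointing back toward the classes.
    """
    diffs = classes - x                                    # vectors from x to each class
    w = np.exp(-np.sum(diffs**2, axis=1) / (2 * sigma**2))
    w /= w.sum() + 1e-12                                   # soft responsibility of each class
    return (w[:, None] * diffs).sum(axis=0) / sigma**2

# Start away from the data and follow the gradient toward a desirable state.
x = np.array([2.0, 2.0])
for _ in range(300):
    x += 0.01 * score(x)
print(np.round(x, 2))  # lands near the closest class, [1, 1]
```

A real model learns a network that approximates this direction from millions of images instead of computing it from three points, but the reason the noise is added is the same: it fills in the empty space so there is a gradient to follow.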
You won't find these explanations online.
The thing is that I discovered neural networks on my own, as a kid, when I was trying to make smarter NPCs by enhancing affinity systems; then, on AI forums online, people told me that what I had was neural networks.
I want to make a course that demystifies neural networks away from the statistical lingo. For example, latent space becomes a lot more understandable if you have a toy model that maps onto known concepts. Say, a color-to-text or text-to-color model, because RGB is basically an embedding space we all understand: we can represent words as positions in the RGB semantic space, which has only 3 well-understood dimensions (6 semi-obfuscated ones if we use HEX), and mapping HEX codes to RGB to other spaces like HSL is similar to what's going on inside a neural net.
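Something like this, as a first sketch of that toy (the word-to-color table and the helper names are invented on the spot, purely for illustration):

```python
import colorsys

# A tiny "embedding table": words as positions in a 3-D space we all
# already understand (RGB, components in 0..1). Values are illustrative.
word_to_rgb = {
    "red":    (1.0, 0.0, 0.0),
    "green":  (0.0, 1.0, 0.0),
    "blue":   (0.0, 0.0, 1.0),
    "orange": (1.0, 0.5, 0.0),
    "purple": (0.5, 0.0, 0.5),
}

def hex_to_rgb(code):
    """HEX -> RGB: 6 obfuscated digits back to 3 readable dimensions."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) / 255.0 for i in (0, 2, 4))

def nearest_word(rgb):
    """Decode a position in the space back to the closest known word,
    the same way we read meaning off a position in a latent space."""
    return min(word_to_rgb,
               key=lambda w: sum((a - b) ** 2 for a, b in zip(word_to_rgb[w], rgb)))

rgb = hex_to_rgb("#ff8800")
print(nearest_word(rgb))          # 'orange'
print(colorsys.rgb_to_hls(*rgb))  # same point in another coordinate system (HLS, the stdlib's HSL)
```

Everything you'd want to teach is already in there: words as vectors, decoding a vector back to the nearest word, and changing coordinate systems (HEX to RGB to HSL) without changing the underlying "meaning".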
I'm not enamoured with a new tech; it's literally what I have been pursuing since I was a literal kid.
Also, do you realize that everything I share, I read through it, read alternatives, and selected it to post here for one reason or another? Often I post multiple versions of the same thing, so people can pick the one they understand best, and also because having the same thing told multiple ways helps understanding.
//---
Emily M. Bender and Timnit Gebru
So I looked into that, and I'm like: there could not be a more tone-deaf response to anything I shared.
Why would you assume things I'm not saying and just further your cognitive bias?
I'm literally mad; that's borderline illiteracy. Just because the ambient discourse is going places, like anthropomorphism, doesn't mean the opposite reduction is smarter.
For example: I remarked, along with many others, that the tech is capable of something like "theory of mind". It's a leap to say that makes it human-like; having this property doesn't imply anything of the sort. It's like saying planes can't fly because they don't flap their wings. AS A DESIGNER, "theory of mind" implies compression of data and context-sensitive understanding, which is cool in itself, and it probably doesn't map onto how "theory of mind" maps to human behavior. It doesn't have to flap to fly.
And yes, the tech has inherent dangers, just like any other tech since the invention of fire (and, with it, arson): dangers that depend on who creates and uses the tech, and therefore on who is left deciding what the identity of the tech is. Also, equivalence doesn't mean similarity; that's a toddler-level understanding.
I mean, that's as shallow as condemning gas stoves on the basis that they're a bomb in everyone's house, while there is a more nuanced discussion to be had about the other threats they pose versus the advantages they afford, compared against other ways of "boiling water to make food".
You would think that, as a Black person, I'm aware of the cycle of bias and threat in technology, and that this would be an incentive for me to actually study it, so I'm not taken aback when it's misused, by having BASIC literacy in how it works? Run-on sentence for effect; I'm exasperated by the discourse around the tech. If it's useful, use it. It is, so I use it.
//----------------
I mean, it's like when relational databases were created and people were like "we will reach human-level intelligence with that", then people were disappointed and called the tech crap. Then it became known as SQL and lifted whole industries up; even though it didn't deliver the stupidly lofty goal of a "gay communist utopia" star-level society, it was a useful tech worth studying. And guess what, you still have to learn it when you go into computing; it's used everywhere, even more so in the internet era.
Basically, there are 3 metrics to know if a tech will even moderately change society, at some level:
- Does it modify our relation to labor (the 3 types: mechanical, cognitive, social)?
- Does it modify how we handle logistics (organization, planning, and distribution of things)?
- Does it modify how we communicate?
LLMs, even at their current level, can upset how we are organized along these metrics. Even if they aren't like a human brain or human-capable (for example, they can't form the kind of complex planning that engages in discontinuous thought), an army of cheap juniors moderated by a single manager can do a lot along these 3 axes. It's not like humans don't make errors either, which is why we use "mixture of experts" structures, aka scientific consensus.
I remember a time on TIGS, and on the internet in general, when we had substantiated debates about stuff that didn't involve "no u" and quoting some rando influencer on Twitter because it cajoles my fears. Like, what is the argument? Is it an opinion, or is it based on observation of facts and counter-facts? Where is the literacy, and the historical reflection?
I'm just mad, as you can tell by the number of successive posts. That argument was so dumb and so not on the level I'm used to. I'm out of the playground; I'm not a toddler.
/rant
//----------------
BTW, if you want to roast me, here are a bunch of the more-or-less-serious pursuits I have:
- Fully automated indoor farming, recycling human waste (to harvest NPK nutrients), to replace the fridge in an apartment.
- TRASH (Tiny Robot, Analog Spring Humanoid), based on removing digital computation to reproduce the ATRIAS papers, taking inspiration from SLIP (Spring-Loaded Inverted Pendulum).
- YAGNNNI (You Aren't Going to Need Neural Networks for Intelligence): rolling back to word embeddings to better build clusters of interpretable, statistically derived semantics from a corpus, avoiding the ELMo jump into machine-learned black-box embeddings (a toy sketch of that pre-ELMo pipeline is at the end of this post). Currently trying to figure out how to derive polysemy and homonymy of tokens, and probably experimenting with current embeddings to derive subspaces of the embedding for interpretability: do the embeddings themselves support polysemy, or do the LLM FF layers decode it back from context? What would we gain if we decoded polysemy and homonymy before the LLM learning step? Also, given that the layers of GPT-style LLMs are sparse, is there value in recurrence of weights? And what about XOR at the neuron level? What other metaphors could explain the vector space at the token level and make it intuitive?
- NUCLEATION theory in the process of intelligence through simulation, in which I explained how AI diverges from humans due to how the nucleation happens in the emergence of the ability, and what that tells us about humans. That's where I draw the line with ACTUAL arguments: if we admit (you probably don't) that there is an equivalence with PART of the brain (equivalence, NOT similarity, it's a thought exercise), then AI can't think like a human, due to the nucleation process.
Basically, we have to assume that a part of the brain (probably the neocortex) is a structure that does "equivalent" processing, i.e. predicting the next "representation". In humans it's fed by internal stimuli (emotion, goal seeking, body sensations, etc.) and external stimuli (the sense data), which gives it a singular coherence (a unique perspective), and these representations are in a feedback loop.
Meanwhile, LLMs are fed representations (the training) of representations (the text) created by those stimuli (text written by humans to express their experiences). That's a second-order, maybe third-order, representation of the human mind, and since it comes from multiple sources of experience, the nucleation is not singular: it represents "multiple states at different times from multiple sources". It's not experiential and has no sense of feedback; it also has no volition, no emotional reaction, and no feedback stimuli. So while it looks like it acts human, it's a simulacrum (which is impressive by itself) that emerges from the process. The big property of this unstructured collection of experiences is the ability to instance agents along a learned state to behave in a certain way through (partial and inaccurate) simulation. That's a very useful property, and that's the main revolution, not whether or not it's like the human mind.
That's also the main danger (those states are unstable and unordered; phase shifts can occur, as we've already observed when the behavior drifts into a manic state), and it's something nobody has the literacy to discuss as far as I know. Though the discussion about the potential agency of AI is a close but crude approximation (see the Mesa Optimizer problem, as explained by Robert Miles).
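Back on the YAGNNNI item above, here is the kind of pre-ELMo pipeline I mean, as a minimal sketch: co-occurrence counts from a toy corpus, PPMI weighting, then SVD to get small, inspectable vectors. The corpus, the sentence-wide window, and the 2-D size are toy choices of mine, not an actual project artifact.

```python
import numpy as np
from itertools import combinations

# Toy corpus; the co-occurrence window is the whole sentence for simplicity.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the bank raised the interest rate",
    "the river bank was muddy",
]
sentences = [s.split() for s in corpus]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts: how often two words share a sentence.
C = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    for a, b in combinations(set(s), 2):
        C[idx[a], idx[b]] += 1
        C[idx[b], idx[a]] += 1

# Positive PMI: turn raw counts into association strengths you can read.
total = C.sum()
row = C.sum(axis=1, keepdims=True)
col = C.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C * total) / (row * col))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# SVD: compress the sparse associations into a tiny dense vector per word.
U, S, _ = np.linalg.svd(ppmi)
emb = U[:, :2] * S[:2]  # 2-D, fully inspectable embeddings

print(vocab)
print(np.round(emb, 2))  # note: "bank" gets ONE vector even though it's polysemous
```

That single "bank" row is exactly the polysemy/homonymy problem the item is about: the statistics conflate the money bank and the river bank, and the question is whether you can split that row into interpretable subspaces before any black-box learning step.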
//---
You know, stuff nobody cares about, because anthropomorphism goes BRRRRRRR. People who say the tech is not human are the first to anthropomorphize it, because they project human qualities onto it that they then deny; it's the "it doesn't flap so it can't fly" thing again. That's no different from the basic AGI lover.