Author Topic: Procedural resource dump  (Read 138561 times)
[email protected]
Guest
« Reply #500 on: February 27, 2023, 09:08:41 PM »

Those are examples of a human writer writing all the dialogue, which tends to railroad the story. I think there are better alternatives.
A story is a railroad by definition, else it's not a story. Anyway, every world is finite. I create stuff with ChatGPT, and it goes directly for tired tropes, so even generative AI that has read all of the internet won't escape that limitation. The problem is that meaning IS repetition, so if you want something meaningful, you will limit yourself in some way, else you will have to accept noise as valid creation.

Now we're getting philosophical. Finite universe vs infinite universe. Either way, it's bigger than a human brain. I haven't used ChatGPT, but I have played around with AI Dungeon, and it went a lot of places I did not expect. It did repeat itself a few times (such were the limits of GPT-2) but I could type in any random shit and it would make something happen in the story that I psychologically connected to what I typed.

Another thing to consider is dev time. I think Neon Genesis Evangelion is a great all-encompassing story when you take the collective works into account, but if you were to sit down and type every line into a game, you would die at your keyboard. Some of the best VNs have hundreds of thousands of words, but not millions. There is a strong case for generative models here.
Logged
gimymblert
Level 10
*****


The archivest master, leader of all documents


View Profile
« Reply #501 on: February 28, 2023, 05:58:58 AM »

Well, the parser wall is what is specified in Doug Sharp's prediction, quoted and linked previously. Ideally, a more solid solution would be a hand-made generative system that helps the machine learning system track data and supplies directions. The machine learning side translates that data (taken as templated prompts) into machine-enhanced generation, then takes the player input and its own generation and parses them back into the hand-made generative system's format. Anything shared about story generation will help make such a system more solid.

But also:
https://emshort.blog/2022/04/09/what-does-your-narrative-system-need-to-do/
What does your narrative system need to do?



Quote
But improvements had to work around a looming brick wall. While a normal game would have a simulated world model that kept track of characters, locations, or inventory, and contained logic for movement or combat, GPT’s text algorithm had nothing of the sort. It was in essence a clever black box: words went in, and new words came out. Nothing was kept track of or simulated in the traditional sense. Earlier AI-driven text games like those from the Oz Project or Versu still had structured assumptions and procedures that humans could refine, customize, and tune; but the millions of learned parameters in GPT models were not accessible or even necessarily understandable to human operators. So while adding an inventory system to an in-progress parser game would be relatively trivial, teaching the same sort of concept to a GPT-driven game—other than via the tedious repetition of real-world examples—was almost impossible. Most of Latitude’s workarounds boiled down to clever ways to shove more text into the black box along with the player’s input, or to understand more about its output: a quest tracking system added in fall 2020, for instance, added a separate machine learning model trained to detect strings of text that indicated a goal had been achieved, like “at last, you have claimed the sword!”

https://if50.substack.com/p/2019-ai-dungeon
2019: A.I. Dungeon
http://ai-adventure.appspot.com/

https://escholarship.org/uc/item/9gq229h4
RESPONSIVENESS IN NARRATIVE SYSTEMS


https://emshort.blog/2021/10/05/mailbag-macro-to-micro-ideas/

« Last Edit: February 28, 2023, 12:24:36 PM by gimymblert » Logged

[email protected]
Guest
« Reply #502 on: February 28, 2023, 09:09:19 AM »

Reading some sappy article by some literary journo doesn't make me feel pumped up to go make the next AI wrapper. Oh those poor writers. I was thinking sequence generation might be useful as AI for an individual character in a 3D world, but if that character has nothing better to do than talk, it's indistinguishable from any hack circling the drain on a blog nobody reads.

Anyways, this looks interesting for its compactness:
https://github.com/tmikolov/word2vec
Logged
gimymblert
« Reply #503 on: February 28, 2023, 12:27:34 PM »

Anyways, this looks interesting for its compactness:
https://github.com/tmikolov/word2vec
You should have looked 2 pages ago (page 24); that's the granddaddy of LLMs (large language models) like the GPT models.

There is also material that lets you decipher and build your own model by understanding how it all works.
Logged

[email protected]
Guest
« Reply #504 on: February 28, 2023, 03:31:48 PM »

Illustrated word2vec? I am finding it a very interesting read. Maybe I'll have some hope of understanding the math involved. I'm still not clear on the advantage of a Transformer architecture over older systems like word2vec, but I like a compact C library.
Logged
gimymblert
« Reply #505 on: February 28, 2023, 04:07:28 PM »

Word2vec is just a way to represent word meaning for a computer; a transformer makes actual use of that meaning to generate words back. More precisely, it looks at sequences of word vectors to create new sequences of word vectors, which are then decoded back into human-readable words. Word2vec was just the first. Later developments include GloVe, which does the same thing but better, and BPE (byte-pair encoding), which is strictly a subword tokenization scheme rather than an embedding. The ancestors were n-grams and bag of words. They all find PRIMARY meaning from the statistical relationships between words.
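To make "meaning from statistical relationships" a bit more concrete, here is a minimal sketch of how word vectors are compared. The three-dimensional vectors are made up for illustration (real word2vec vectors are learned from a corpus and are much longer), and the names EMBEDDINGS, cosine_similarity and most_similar are hypothetical, not part of the word2vec repo.

```python
import math

# Toy, hand-written vectors (an assumption for illustration only).
# Real word2vec vectors are learned and typically have 100+ dimensions.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings=EMBEDDINGS):
    """Return the other word whose vector points in the closest direction."""
    target = embeddings[word]
    scores = {
        other: cosine_similarity(target, vector)
        for other, vector in embeddings.items()
        if other != word
    }
    return max(scores, key=scores.get)


print(most_similar("king"))
```

Words that appear in similar contexts end up with vectors pointing in similar directions, which is all "similar meaning" amounts to at this level.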

Transformers find meaning BEYOND the secondary level: they extract meaning, polysemy, word classes, then topics, style, and grammar rules from groups of words. A transformer can be seen as a parser engine that then generates back: it detects the meaning of words, then the meaning of the context (text meaning), then generates follow-up context by mixing topics, style, and grammar, and finally chooses the group of words that implements these meanings. Here "detect" means it has memorized them.

Basically, transformers solve the big-ass problem of hand-generated content (HGC), "the dictionary problem": for HGC to work, you had to figure out all the minute rules and data just to process something as simple as "what's up doc", then anticipate every possible answer to that. That's the part transformers automate: you just give them text (all of the internet) and they figure out the dictionary all by themselves.

Now the issue is to find a way to interface HGC with LLMs.
Logged

[email protected]
Guest
« Reply #506 on: February 28, 2023, 05:02:18 PM »

Quote
Now the issue is to find a way to interface HGC with LLMs.

To give a speaker personality?

I probably don't know what I'm talking about, but let's assume some hypothetical language embeddings only embed word context in a phrase. Then, you could write a handful of example sentences using a word, and overwrite the model with the newly generated word embedding. So if your AI exists in an alternate universe where Apple produced clothing instead of computers, you could give the overlay:

"Apple started making sweaters in 1935."
"There's nothing quite like the feel of Apple denim."

etc.

If you completely overwrote specific embeddings, you could control the use of the word.

Maybe you could search a pre-trained model and delete all instances of a particular word. Then you train the word into the model again using a limited corpus that you wrote.
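A toy sketch of the overwrite idea, under heavy assumptions: the embedding table is a plain dict of made-up 2D vectors, and the "newly generated" embedding is just the average of the known context words in the example sentences. That averaging is a crude stand-in chosen for illustration, not how word2vec or any real model trains, and overwrite_embedding is a hypothetical helper.

```python
def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def overwrite_embedding(embeddings, word, sentences):
    """Return a copy of the table where `word` is replaced by the mean
    vector of the known words that appear next to it in `sentences`."""
    context_vectors = []
    for sentence in sentences:
        for token in sentence.lower().split():
            token = token.strip('.,"')
            if token != word and token in embeddings:
                context_vectors.append(embeddings[token])
    if not context_vectors:
        raise ValueError("no known context words in the example sentences")
    updated = dict(embeddings)
    updated[word] = average(context_vectors)
    return updated


# Made-up table: first axis is "tech-ness", second axis is "clothing-ness".
embeddings = {
    "apple": [1.0, 0.0],
    "computers": [0.9, 0.1],
    "sweaters": [0.0, 1.0],
    "denim": [0.2, 0.8],
}

overlay = [
    "Apple started making sweaters in 1935.",
    "There's nothing quite like the feel of Apple denim.",
]

alternate = overwrite_embedding(embeddings, "apple", overlay)
print(alternate["apple"])
```

In this toy table "apple" moves from the tech corner to the clothing corner. Whether anything like this survives contact with a real model is exactly the open question.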
« Last Edit: February 28, 2023, 05:10:42 PM by mse » Logged
gimymblert
« Reply #507 on: February 28, 2023, 06:17:48 PM »

Chat GIMYMBLERT:

You should look into prompt engineering and fine-tuning.

The gist:

Prompt engineering is when you give specific prompts to get a desired result. The LLM acts like a simulator, so you can set it up by asking it to instantiate a role. AI Dungeon's starting paragraph is such a prompt. Because it tries to predict the next "context", if you start with dialogue, it will continue with dialogue; if you start with a story, it will continue as a story, etc. So if you start with "Apple is a clothing company", it will fill in the blanks.

You can add a hidden prompt to the user's input to get desired results. That's how we interface at a basic level. You can also do second-order prompting: take the user input with a prompt that asks the model to analyze it, use the result of the analysis to create an answer, and show only the answer. Since the LLM also has some high-level understanding capacity, you can show it an example of data, then ask it to process data the same way. So if you show it how to translate text data into a JSON file, it can do it. That's called few-shot learning.
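A minimal sketch of a hidden prompt plus few-shot examples, assuming the model is driven by one plain text prompt. No real LLM API is called here; build_few_shot_prompt, the "Text:"/"JSON:" labels and the example sentences are all made up for illustration.

```python
def build_few_shot_prompt(hidden_instruction, examples, user_input):
    """Assemble the text actually sent to the model: the hidden instruction,
    then worked examples, then the user's input left open for completion."""
    parts = [hidden_instruction.strip(), ""]
    for source_text, expected_json in examples:
        parts.append(f"Text: {source_text}")
        parts.append(f"JSON: {expected_json}")
        parts.append("")
    parts.append(f"Text: {user_input}")
    parts.append("JSON:")
    return "\n".join(parts)


# Hypothetical worked examples showing the text-to-JSON mapping we want.
EXAMPLES = [
    ("The knight carries a sword.", '{"actor": "knight", "item": "sword"}'),
    ("The thief carries a lockpick.", '{"actor": "thief", "item": "lockpick"}'),
]

prompt = build_few_shot_prompt(
    "Convert each sentence to JSON. In this world, Apple is a clothing company.",
    EXAMPLES,
    "The clerk carries an Apple sweater.",
)
print(prompt)
```

The player only ever types the last sentence; everything above it is the hidden part. Because the prompt ends on an open "JSON:" line, a next-word predictor is nudged to continue with JSON in the same shape as the examples.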

Fine-tuning is when you train an already trained language model on a subset of data, so that it mimics that data better while using the skills it learned from general training. That's what the "P" in GPT means: generative pre-trained transformer, emphasis on pre-trained. That's what AI Dungeon 2 did with text from an online site (I forgot which one), so that it produces text closer in style to D&D. That's also how ChatGPT works: they built a question-answer dataset and trained it into the network. Using fine-tuning for a world is a bit overkill, and it doesn't solve the problem that the AI relies on knowing everything in order to work, which means it has a chance of drifting from your intended behavior.

The Apple sweater thing only needs few-shot learning through prompt engineering. You can try it with ChatGPT and you will see. The problem with this approach is that there is a memory limit, called the context size. Every generated text and every prompt is simply appended to the context log. Everything that doesn't fit in the log window is simply forgotten.

To get around this, we hide some part of the context window and use it for prompt engineering data (a bit like a custom context RAM), then append the user inputs. You are still limited by the size of the context, and you are limited in what you can input and generate in what remains. Some of the data you can put in the hidden context is a summary of the interaction, but that is lossy compression.
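The "context RAM" idea can be sketched as a token budget: the hidden prompt and the summary are always kept, and only as many recent turns as still fit are appended. This is a rough sketch assuming one whitespace-separated word counts as one token (real tokenizers count differently); build_context and count_tokens are hypothetical names.

```python
def count_tokens(text):
    # Assumption: one whitespace-separated word is one token.
    return len(text.split())


def build_context(hidden_prompt, summary, turns, max_tokens):
    """Keep the pinned hidden prompt and summary, then fill the remaining
    budget with the most recent turns; older turns are dropped (forgotten)."""
    pinned = [hidden_prompt, summary]
    budget = max(0, max_tokens - sum(count_tokens(p) for p in pinned))
    kept = []
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    return pinned + kept


context = build_context(
    hidden_prompt="Apple sells clothing.",
    summary="Player met the clerk.",
    turns=[
        "You enter the shop.",
        "The clerk waves at you.",
        "You ask about the denim.",
    ],
    max_tokens=17,
)
print(context)
```

With a budget of 17 "tokens", the oldest turn no longer fits and is dropped, while the hidden prompt and the summary survive every time.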

A more sophisticated interface would be a whole architecture that keeps the full log and uses a secondary process to analyze the log after each input, by running specific contexts of the LLM used only for analysis, which generate data to fill the context RAM. But that's a lot more costly to run: multiple instances of the AI (costs resources) or multiple sequential runs of the AI (costs time).

Note that these models have a mix of what's called overfitting (memorizing data verbatim) and overgeneralization (making up data to fit a pattern, also called hallucinating). It's not easy to just erase an embedding, because you might mess with the ability to remember things or simply to process things. The LLM works because it is able to generalize from data; using grammar to generate text is a kind of "making things up". The model doesn't distinguish between facts and rules: generalization is when it learns a rule, and a fact is when it stores stuff.

Most data within the embedding is a mix of the two. If you mess with the embedding, you might break a rule and destroy its ability to "reason". In general, LLMs should not be relied on for hard facts and data, because they might turn them into templates and rules and make up things you don't want. Which is why you need to couple them with other systems that do that better.

« Last Edit: February 28, 2023, 06:29:33 PM by gimymblert » Logged

[email protected]
Guest
« Reply #508 on: February 28, 2023, 06:41:48 PM »

For a personal assistant or something mission-critical, you want to nurture the AI with best practices, a huge volume of data, and some hard-coded checks. For entertainment, especially with simpler pre-GPT models, I think you could delete words. The alternative would be to create the entire corpus of its knowledge manually and train on that.

What I'm getting at is the uniqueness of individuals in time and space. The Internet has no nationality and exists in the present, so things trained on the Internet are very homogenous. If you trained a language model on only Chaucer and Shakespeare, it could generate medieval speech for a fantasy game. If you wanted it to use more modern lingo, you could use prompt engineering and hope GPT or whatever sticks to your description of "Lord of the Rings with slang", but it might "remember" something it read on Twitter and break immersion. Hence the need for smaller language models.
Logged
gimymblert
« Reply #509 on: February 28, 2023, 07:10:44 PM »

The generalization ability is emergent FROM the diversity of the input. It's been tested on smaller models, and even bigger ones: the model generalizes, i.e. learns to use language properly, if you show it enough data. Otherwise it will just repeat the data. Also, smaller models don't have the same ability to generalize, and bigger models are actually much better at staying on track than smaller ones. You aren't gonna get ChatGPT from a small model, at least until we figure out new techniques to make models smaller.

But on page 25, you have a very easy step-by-step tutorial on how to train a small GPT model on Shakespeare, if you want to try; the code and model are also on GitHub. Since that's simple, assuming you have the GPU for it, try increasing the data size and the parameter count on your own. Neural networks aren't hard, and there are many models and training datasets you can download from the internet to learn and test with.
Logged

[email protected]
Guest
« Reply #510 on: February 28, 2023, 07:23:05 PM »

I wish there were a way to do tests like that with a small transformer library in C/C++. Neural networks are hard. Backpropagation? I don't understand it. Softmax layers? No idea. All these starting points are for research rather than production, so just like VR, I'm not going too deep into it until the software is ready for my use cases.
Logged
gimymblert
« Reply #511 on: February 28, 2023, 07:40:03 PM »

The video I shared explains all of that, especially if you look at the channel's other videos. It's really not that hard; it's almost trivial once you get past the occasional jargon.

For example, backpropagation generally comes down to subtracting some number (often as simple as 1 or 0) from each weight, but it's hidden behind big words like derivative. Neural networks are big old sums and multiplications; the math is literally just a grocery store receipt.
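Here is a minimal sketch of that claim for the simplest possible case: a single neuron with one weight and one bias, trained with squared error. With only one layer the chain rule is a single step, so this is the degenerate case of backpropagation rather than a full network. The data, learning rate and the name train_neuron are made up for illustration.

```python
def train_neuron(samples, learning_rate=0.1, epochs=200):
    """Fit prediction = w * x + b by repeatedly nudging w and b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = w * x + b      # forward pass: one mul, one sum
            error = prediction - target  # derivative of 0.5 * error**2 w.r.t. prediction
            grad_w = error * x           # chain rule: how much w contributed
            grad_b = error               # chain rule: how much b contributed
            w -= learning_rate * grad_w  # the "subtract some number" step
            b -= learning_rate * grad_b
    return w, b


# Points taken from the line y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_neuron(samples)
print(round(w, 3), round(b, 3))
```

The "derivative" here is just prediction minus target. A deeper network repeats the same multiply-and-subtract bookkeeping layer by layer, passing the error backwards.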
Logged

[email protected]
Guest
« Reply #512 on: February 28, 2023, 07:50:15 PM »

I actually made a neuroevolution library some years ago. Pushing data through a network is easy as pie, but the genetic algorithm is too slow. So, I wanted to implement backpropagation but fell flat on my face due to the jargon. The guy lost me in the video, and he kept pushing his other videos like a 30-minute intro wasn't enough. Big math words make me lose interest.

dlib rocks if your AI does anything with images, but only its baseline features work if you want to process text. There were a couple PRs related to transformers, but "grouped convolutions" on the CPU never got finished and that PR was last updated in May 2022 and never merged. So much for that I guess...
Logged
gimymblert
« Reply #513 on: February 28, 2023, 08:13:20 PM »

I guess I should one day make my little "NN isn't so hard" tutorial...
Logged

[email protected]
Guest
« Reply #514 on: February 28, 2023, 08:28:35 PM »

I guess I should one day make my little "NN isn't so hard" tutorial...

That will be the day we all become AI experts.
Logged
gimymblert
« Reply #515 on: March 02, 2023, 05:52:10 PM »




 LLaMA: Open and Efficient Foundation Language Models (Paper Explained)
Quote
Abstract:
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
https://scontent-mia3-1.xx.fbcdn.net/v/t39.8562-6/333078981_693988129081760_4712707815225756708_n.pdf

TL;DR: we can have small models that run on a basic computer (like Stable Diffusion) with the power of GPT-3, if you can SIMPLY train the smaller model on 2048 A100 GPUs for a month, with trillions of words from freely available datasets. Smaller models need to be trained way longer than bigger models to reach similar performance, but they eventually do, which is the point of the paper: train longer to get performance at inference time.

The good news is that we can probably find a collaborative way to reproduce the results.

also:
https://github.com/oobabooga/text-generation-webui
https://github.com/FMInference/FlexGen


and
https://analyticsindiamag.com/here-is-an-open-source-rlhf-implementation-of-llama/
Logged

gimymblert
« Reply #516 on: March 03, 2023, 10:34:13 AM »




 LoRA vs Dreambooth vs Textual Inversion vs Hypernetworks
Logged

gimymblert
« Reply #517 on: March 05, 2023, 05:27:13 PM »




The Disturbing Art of A.I.
Logged

[email protected]
Guest
« Reply #518 on: March 10, 2023, 04:10:52 AM »

Quote
Smaller model need to be trained way longer than bigger model to reach similar performance, but they eventually do

A human baby can learn a language through occasional verbal communication over the course of 2-3 years. A baby is illiterate and can't process trillions of words. However, the hominid brain probably had about a million years to evolve spoken language. Research has shown that the structural shape of artificial neural networks affects performance. Maybe the solution is to train or evolve a "blank" that can be quickly filled with a particular linguistic dataset of modest size.
Logged
JobLeonard
« Reply #519 on: March 10, 2023, 06:43:22 AM »

Retraining one model on previously trained models is a common tactic in ML, if I understand correctly. I have a friend who had to train ML detectors for interpreting acoustic data and used models that were previously trained on visual input.
Logged