Immortality Force (E-MERGE) - All Life / Modern AI / Real Human AI Mechanisms Unifyingly Explained Fast But Fully Using Common Words, Examples, Images, Code / The Theory Of Everything TEAM RULES / EDUCATION: Our goal/ respect is for working hard 24/7 on only AGI for immortality. I seek members and a global-based team. I started in 2014 age 18 24/7ing when I was averagely dumb, my knowledgebase was all immortality ways, age 20 I found an interesting AI field and devoted the rest of my life to AGI as clearly it's the fastest way, I rarely go outside. No hiding any of your knowledge that we don't have but you do. You must answer others' questions with passion until they understand if the guide fails, because searching human/ AI brains is faster than searching Google. We must always re-search/ question/ read/ think/ solve everything we/others write/etc. We should look for/ understand the reasons for their current inefficiencies in prediction. We update an organized long term Guide (and AGI pic/ code/ etc) that has ALL needed/ most important knowledge to build AGI / be immortal, we'll use a short term todolist, we'll discuss both on 1 real-time reflex text stream, you'll be the smartest human after reading/ understanding it. So that others + us can understand/ build-on everything we know/love best/fastest and never forget it, it's all preserved on computer and our guide must use common/clear relatable words (brains run on that) with intuitive examples and an image of what AGI looks like and why each mechanism improves text/ image/ etc prediction and be as short as possible (no summarizing, it repeats stuff, readers can make their own priorities) and exciting along with code and evaluation. To edit the Guide you need such proof. Teaching modern AI should take only 5 days not 5 years and while I could make school for kids 4x faster we will only focus on the most important non-commonsense, we can train new members/ourselves fast by using the AGI guide, so don't feel like member selection is of old stubborn folks. We cooperate by internet only cuz safer/faster, must update each other on all new findings and forums found etc. Implementing is less used cuz it's slower/ not flexible but is a key step. This group is meant to only contain people that believe we are machines 100% and seek immortality, only 15.6% of humans are atheist - for now too many hands in the pot harm projects/ flood-convince you. We'll verify who we invite to our group to make sure the guide isn't leaked to bad humans who may ex. create evil AI, best we recruit many intelligent/ devoted people who work on AI/ immortality while all the rest do so indirectly (the public buy AI services), note the public does already have advanced AI - Transformers, Facebook's Blender is the most powerful AI I've seen. It's expected/ desired we merge with/ influence Google/ OpenAI/ etc and that more workers/funding will arrive, OpenAI already works with Google. You're only allowed to be in the team chat during learning/ teaching knowledge assuming you will be able to share/generate/find real discoveries every few months and share them successfully or are coding our AGI. I learnt Python in 1 day using Google's Blockly, I'll teach you if needed, and you only need simple math mostly. Gmail is the best/ free/ intelligent/ secure email, back up your important files to Gmail each night. Google Cloud is good for when you need big compute/storage wherever you are.
Google Search uses BERT for longer queries and they say Search shows you popular, highly linked-to, your/world's recent history, related, loved webpages, just like how my AI works. Google Earth, Google phone/ browser/ laptop, so much Google. AI community: Topicbox AGI mailing list, AI Dreams Forum, Singularity.net, Longecity, Numenta HTM, Fast.ai, Data Compression Forum for Hutter Prize, Reddits: Machine Learning/ OpenAI.com/ AGI/ etc, Discords: OpenCog/ Human Level AI/ Lex Fridman/ Deep Learning Group/ What's AI/ E-MERGE, universities, institutes, Two Minute Papers Youtube Channel (always interesting), OpenAI.com and their Slack/ emails, Google Brain and Google DeepMind's Google AI Research, Nvidia's, Facebook's, and https://agi.topicbox.com/groups/agi/T86b555c591599ac6/agi-global-catastrophic-risk. We will not write in the style of Papers, we don't use paper anymore, it's better to have a widescreen, infinitely long page of text. All AI researchers write papers (books too like this); they hide their idea, don't include ALL they know on AGI, make it long to read, not understandable by a broad audience, even if you understand their AI the idea still doesn't explain how it predicts the next word, double-sided pages, skinny, time/ space-wasting references (95% of them) and papers require citation (can't write too novel ideas) (we will write the AGI blueprint from scratch referencing only the most important project names), the conclusion section is redundant, they must get it published in a journal after waiting months, and focus on getting as many papers published as they can, and shared by others (virus/ bible/ meme impact) - although this is a useful metric. Flexibility is what allowed me to evolve fast. LIFE/PHYSICS: Our universe has only a few but immortal physics particles/ laws/ larger structures and therefore is predictable/ a machine, not random, there are patterns - that means it's likely some things maintain more of their form for longer times or more redundantly than other things, ex. rocks versus gas, and merge into new patterns. That's all Life is. These long living structures use patterns (body/ memories/ world) to live longer/ repeat in number (clones). We use patterns to be patterns. There's no such thing as spirits or ability to change the future / free will or a one and only "you" nor consciousness - it's not observable nor affects our universe, a computer can run anything in any physics/universe (and fairies), randomness is only something that has a wide range of outputs and cannot be understood by the caveman, you can be asked to predict the next color and it seems random but the show-er can know the order they'll show them in, a solid and a gas are both made of particles but the gas can seem more unpredictable because there's too many to track. When we make new laws/patterns up, once the needed laws are in place we get the ice cube, and these "existing" patterns help us "realize"/predict/create other bigger patterns/laws easier (ex.
wheels+home=trailer, by re-sorting existing particles/ features), most of my AGI parts' basic ideas are on the internet but not put together, most of this file you're reading is all my discoveries when viewed as a unified understanding, when the universe began there were many clones of particles which naturally merge (self-organize) to form atoms and planets from gravity/ repel forces which handle all forms of bonding, then molecules, cells/bodies find food to clone themselves directly plus try to maintain form, reflexes, DNA, memories (akin to DNA), teams, all real world structures and time-wise try to repeat. If all ex. cells/ employees/ Earths/ this text file die it's ok, you can replace it with a very similar existing cell/employee ("immortality") or wait till one emerges because cell/ computer/ final utopia are universal/ come together by existing rules and aren't rare - others have the same things we have (compression isn't the same file anymore and higher compression takes longer to decompress/ grow/ learn, humans re-generate a city fast if the main wisdom/tool seeds are still there, AI code is ultra small - it's the data/ growth into an army/ Dyson Spheres/ Relational Database that takes time / is complex looking, a tiny but not too tiny cell can grow into a planet sized highest technology in a day, cloning is cloning data - not exactly that it must exist but rather the seeds are there). Some paths (DNA/idea checkpoints) are stored for longer lifespans. Atoms don't even directly clone, it is the base particles that were cloned first. Cells directly clone. Man clones his body but like atoms he only clones 1 cell, then it must grow itself. With man cloning into AGI, the AGI dream (in the brain!) comes together building upwards like an atom from particles uniting and then it clones directly in some sense into real life, though it does need SOME childhood growing up - I mean it can have an adult robot body but still needs to get educated, though we pass down education but not as fast as other options. So it's kinda like human to human cloning yes but the cell DNA is cloning into the brain, I mean the brain is cloning into the thoughts, so we store the understood function from brain into memories, so we can then make AGI (clone ourselves into a robot kid or virtual assistant). For AGI cloning, this is the FUN part, they clone adult educated brains directly, then differentiate a bit like cells so they have similar but different jobs/ goals so they can help each other in parallel. Patterns are immortality, there's many cells, and ex. a cell even tries to clone its own self. True Random doesn't exist / would be a bullet changing direction for no reason in a wide variety of ways (no patterns), it's still not magic! Our universe is not done being born yet else evolution wouldn't occur (?), (patterns formed cause more patterns to form, it stores states after data mutation processing (notice now we let AI mutate stored music instead of us) by using the memory to do so (computation), and through mutations/computation bigger patterns are found that produce more/better patterns). Searching becomes longer/ more like Brute Force Search later and needs to store checkpoints/new laws (first was particle physics) it thinks are good ex. cells/bibles that clone, particles knew what patterns to form: atoms, but more combinations arise later, this causes an S-curve made of S-curves and so on if you look at the speed of evolution.
All arrangements/ videos of particles (includes sensor data) in space are equally as (nothing is) good/ evil/ unique/ better/ attractive/ tasty/ ugly/ makes sense/ meaningless (rock, man, toaster) i.e. random, (Free reward/ complex mystery/ low education cause you to believe silly things, we think it's good else bad if you're: right, more, complex, lucky, human not molecular robot, has feelings, choice at neurons, are eternal powerful aware conscious multidimensional energy consciousness spirit, God exists, are "3 in 1", God grows extra DNA in you, parents/etc are somehow specially tied to them / unreplicatable, space has past and future, are special or pope/queen is, Asia loves positive hangable items and hates the number 8.), all are just particles, some want to die a certain way, but some things live longer by utilizing patterns to understand the universe (we are unifying everything, particle physics, object physics, each layer) and will seek to improve their lifespan/ population scale/ shared context. It's these that are the only things that live long and understand their universe/ turn it into patterns to do so, man thinks about all dangers and makes impressive tools. We all already want immortality (some matter structures want to destruct) - most humans are religious (due to lack of education about how to build things/AGI mainly, same for atheists who don't work on immortality), God may be a form of government/ cooperation/ desire, being human is to repeat! It's all you want, your only reason to stay alive is solved by your fight. Our motivation is immortality in a utopia, the highest technology/ near-perfect most powerful Extremophile organism possible / instant re-generation that maintains form / duplicates fastest with as little ability to try to mutate to improve/ evolve/ die/ improve bad mutation errors as it can, assuming it has hopefully settled/ calmed down/ immortalized on the best path / model of our physics (we can't know our memory makeup cuz it would require storing more memory, trade-offs yes, but we get approx. good at our physics and mostly settle. Homeworld size may dictate internal organization a bit), after that all we do is clone to increase life length, there's always trade-offs but the best state we settle to still exists. The evaluation in the universe is how much of you is a pattern, over a window of some length in space/time: how long you live (pattern), how many clones you have, how shared the contexts are ex. a sphere and triangle each have their own cube, etc, the more the bigger the score, they tend to come with each other because it's rare to get a billion pens/cubes/other without being able to live long enough so clones are made. Statues that live long (ex. snow flakes or solid statues or a universe full of lone particles (possible universe end by radiation expansion if there's no universe walls and things don't attract back) (which are patterns)), cloned pattern, shared contexts, etc are Life (and killed all the time, only larger/smarter/patterny things live longer), not just cells/ bodies/ teams are Life! Life is everywhere, aliens make up rocks - pattern is all that matters (it's not just how our brain/body/world works / becomes / lives for/as/using (the brain/city look like a pattern fabric), it's what we are/seek), change is death anyway, humans just do it by another way (being fluid) unlike statues to maintain form, we need to be fluid/ bend/ fight/ evolve so we can change/die to get born again/ improve and it looks like we (Globally/body at least.
Our atoms radically change.) are on the path to live much longer than rocks. Any non-immortal system is becoming immortal by mutating to get closer to the top score, so rocks which aren't immortal will mutate ex. get hit by an Earth and eventually stop mutating. We actually have long moments where we aren't reacting/sensing in between each thought. The evaluation clearer: rocks live millions of years but don't generalize, you can tell they are the wrong way if they don't score big/generalize against many various environments/features, to maintain form longer in a wide variety of situations we need extra change/death for now, and so we evolve longer living machines by cloning/storing and mutating data to search/improve then pick/breed the new champions/ideas lots (the weak/dumb die / don't eat/breed/etc as fast / need to get eaten/paired with bigger better companies), or other smarter/faster ways. Clearer: resource minimization; patterns ex. reflexes/ hammer/ computer that solve many problems (pattern/ compression) and are "part" of how to survive in environments, a computer/ laptop/ iphone is a hierarchy made of smaller inventions, and a Turing Machine is a computer (reads the current state/# it is on to decide: the next state, whether it will move left or right, and whether it will write a 1/0/nothing, different states have different rules) - we only made higher-level codes like Assembly and Python (hence the complex computer we have) to speed up coding code (telling the computer what machine to be) because we found patterns, the fewer dedicated tools you need the better you do in various environments - even if you do worse right now over many environments, you're on the right path / will evolve to become better, it's the patterns of real life, you make something that needs fewer parts/less complexity ex. is all made of cube shaped parts and still solves many problems. Clearer: simulated information compression (based on patterns!), able to use past experiences to find patterns in so it can accurately make/predict new/known answers to a wide range of unsolved problems (can do this better if it has big knowledge in many domains (synergy)), and we'll make AGI to improve it / need better compression to make AGI. Better decision prediction using past similar experiences (patterns) = survives longer/ intelligence, and survival is just pattern prediction repeat, pattern is used for pattern, all is patterns/ prediction, patterns are the only thing we care about. Clearer: we can use the same size dataset which allows measurement/ comparison of generalness of intelligence instead of how much data it can eat (full education/ how intelligent it can be), it's a better way to know how to achieve complex goals in various environments (prediction is all that matters and prediction makes subgoals, we predict after a text/image or between a game start & goal), it "knows" that dogs usually bark and dog appears many times and is similar to cat or lives a short life. Clearer: we don't want to code every answer, we want to code only the few most common AI mechanisms that answer most questions. There's only a few but common physics laws, and many rare effects from those, we can capture accuracy on the rares by combining the simple AI rules just like physics does to get rare effects.
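Here is a minimal Python sketch (my own toy, not Guide code) of the Turing Machine idea above: a rule table maps (current state, symbol under the head) to (what to write, which way to move, next state), and different states have different rules. The example rules just invert a bit string then halt:

# Minimal Turing Machine sketch (illustrative only, not Guide code):
# each rule maps (state, symbol under the head) -> (write, move, next state).
def run_turing_machine(rules, tape, state="start", head=0, max_steps=100):
    tape = dict(enumerate(tape))            # sparse tape, blank cells read as None
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head)
        write, move, state = rules[(state, symbol)]
        if write is not None:
            tape[head] = write
        head += {"L": -1, "R": 1, "S": 0}[move]
    return [tape[i] for i in sorted(tape)]

# Example: invert a binary string, then halt at the first blank cell.
rules = {
    ("start", 0):    (1, "R", "start"),
    ("start", 1):    (0, "R", "start"),
    ("start", None): (None, "S", "halt"),
}
print(run_turing_machine(rules, [1, 0, 1, 1]))   # -> [0, 1, 0, 0]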
We want to be accurate at a wide range of prediction problems (recognition/ entailment/ etc are the same thing) as fast as we can and adapt our body as needed fast to do so, we can easily make ANIs (a calculator is expert at any large equation, MuZero is superhuman at many games but only games ATM) but we would need to for each problem. All of intelligence / what we are/ where we came from / meaning of life / our future world is patterns/ prediction/ compression/ merging/ counting (memory/ math/ code)/ immortality, same thing: patterns, the cell emerged because of patterns/ immortality/ intelligence. Finding patterns is finding truth/ prediction/ understanding, we don't store truth labels. All patterns; data, hammer/lined up homes, survival in various environments, and Life, merge (stacking homes, living longer or having more clones), which allows it to emerge more patterns to merge. To solve all your problems we need the next species (ASIs, AGIs are only human level, but lead to ASI fast), or if dying film videos/MRIs of brain/body and go into Cryonics to remove photons, with your computer stored and removed liquids stored, to preserve most of yourself for faster repair, 99% of your brain/body is still there when you "die", you can be partially/ close enough recreated from even videos/ knowledge. If you die, send all your cash to Alcor/ Google/ OpenAI as it is part of your future. The Human Brain Project is trying to scan a human brain into a computer, even if only a high level snapshot/video and not a detailed one it can help us back you up (although humans want a smooth immortality transfer, not clones) and simulate some function so we can then make ASIs by speeding up brains/ implanting data/ cloning it to make a clone team work on different things you wanted to but in parallel. Accelonics, a term I coined, is using a big centrifuge ex. our sun so we can move fast while not flinging out of our solar system, so photons move the same way and less/no change occurs in your atoms' relative positions. Time traveling doesn't exist unless stored in a computer. Of 2 particles moving apart, 1 can have no photons to achieve it. We can already lose/transplant human/synthetic arms, hearts, lungs, corneas, blood, etc, you are still "you" as you say so after such/ memory or intelligence loss (body matters little, brain is "you") - you can vastly change over time, my cells/particles change/ evolve/ bend all day, height is less at the end of day, many of the atoms that were in me 9 years ago are replaced, all my particles shake by heat, I can become you and see/think anything you do, we can even become different if we were clones, my brain can be said to "hibernate" for zillions of years as it isn't light speed fast, young kids successfully grow into the other gender if they take opposite hormones. 3 examples to show how flawed it is to even think consciousnesses (ghosts) exist; what if 2 exact clones die at the same time - will the universe make 1 of them come back first once we re-build them, then spirit #2? What if both are 'you'? All particle arrangements are meaningless, while immortality is the only thing closest to Life there's no "sensor/ consciousness" that anything should/ would create that feels like it sees/acts what the machine does. Life is change; death, but that can't be, but an immortal statue can't be orgasmic either, we are just machines. If I didn't work on AGI I'd work on Cryonics (next closest path to immortality), energy generation/ storage, scaling computers, stem cell research, nanorobotics, factories, GPUs, physics simulators.
Frogs do "cryonics" naturally, some animals regenerate organs. If aliens crash here they'd try to live here if can't upgrade us (if came here on purpose you'd see a huge structure in the sky, or a seed at least instantly save us). All mixes below are re-occurence = frequency = pattern = representation = data-based prediction-brain, and help energy split/merge in the net. Mixes are whitebox (can see what it's thinking) and don't need Backprop, they update connection weights based on the data (where it travels to in the net). Mixes usually don't store the same/ very similar thing (unless activated enough; episodic memory) twice and even don't store/attend some thnigs at all if are too infrequent or too repetitive or unrelated or unrewardful etc. The brain is all about merging/emerging / quantizing. Each Mix merges things and has a blend/array of such where longer get less weight and the shortest/longer has most weight (a, the, the cat, etc, we store those only once, shorter are seen more, start/end most allowance, MORE VIEWS also blends them, ENERGY also looks at many but start/end more allowed so), that's the pattern among them all, and are merged to build each other, and predictions are merged from all Mixes to emerge new predictions. Each Mix improves prediction more by making the brain smaller (doesn't require decompression to access/etc memories) / use less energy/compute by clustering, all parts to AGI improve prediction else they would not be useful. All Mixes don't need to be stored to be used ex. we get our predictions from reading a book but don't store a hierarchy, but neural network merging data patterns into compact form = less storage and compute, each Mix does it. Mixes are used for recog+pred...high/low-freq-neurons are ignored/pruned/pooled (occam's razor, gives u more data/attention like all AGI mechanisms do). An Earth/ brain/ team/ rock/ organ/ cell/ particle predict/ fortune tell/ generate and re-generate the past/ present/ future and future state of themselves / decisions based on current surrounding priors/contexts's positions and types (we're a Turing Machine, state based mutation, neural leaks decide what nodes will later be active) being matched to ones we know stored from the past, it's all we do/ live for/as, we're a pattern that tries to grow into patterns using patterns and lives long/clones/etc (pattern). The same data is recursively evolving/ re-organizing into a new structure. Cell specialization/manages repair, duplication speed, death, halt at equlibrium. There's no past, only storage of it in a sim/brain. A dumb AI takes longer but long term it'd evolve anyway as it gets closer by mutating its DNA - a dumb AI won't solve cancer because it needs tools and so any change/evolution makes it a next gen AI. We are like water flowing in a river, we aren't lucky to have the physics we have, we have no other choice. All words in a dictionary contextually describe each other. So do friend/ megnet network domains. You're part of Earth. You can reach anyone just a few steps away indirectly by tunneling because its a small world network, local affect you most, most nodes have few links, only a few have many, most have few, rich get richer (more links from other rich nodes): pool effect, same for local and global networks in or made of all 3: hierachy/ heterarchy/ concept networks. EVALUATION: Ways to test how close we are to AGI is if it helps us as much as a human can. Looks/works and motors ex. 
chats like us (legs/ arms/ sensors are just for mobility/ implementation/ data collection). And Lossless Data Compression (kinda different but it's mostly the same as 'make the brain as small as can be, patterns don't have to be stored again, learns the salient latent features'. We store error correction in a separate file to steer the predictions from a predictor to the correct one.) which may be known as Bytes Per Char, it's based on how good your predictor is, a better predictor compresses/understands data better (requires decompression), we can also know how high intelligence can get by looking at the current pattern on The Hutter Prize, for text/ vision/ audio compression we see 100MB compress to 35MB, 24MB, 19MB, 18MB, eventually you'll be coding each rule! After that we can only clone/ speed up/ erase memory ability/ etc to live long/ stop many bad situations to itself, which is the only other way to be better at patterns. Enwik8 is a fairly diverse dataset, though mostly text, we don't need a dataset testing summarization/etc tasks if it's diverse enough. Watching/reaching small animal/ human baby intelligence is not very useful to Guide AGI. A 1y/o human can roll over, crawl, sit, stand up. A 5y/o can run, walk backwards, jump on 1 foot, walk up & down stairs. Babies first learn to roll, wiggle, slither, scoot, and lunge, then crawl. Newborns love faces, he'll learn to stare and will laugh, your voice, "followed by high-contrast patterns such as a checkerboard", shaking colorful rattle, pitches, making loud/bright sounds/light, "can't focus farther than 8 to 12 inches away", "grows steadily over 5 months", and "can't focus his eyes at the same time, so they may wander or cross now and then". We have a motor to focus eyes without using the eye-cross motor. Month 5 has a preference for bright primary colors and more detailed and complicated designs. Months cuz parents don't teach kids so much most of the time like datasets do. "Research shows that babies whose parents speak to them extensively have significantly higher IQs and bigger vocabularies when they get older than other children.". In LC you compress everything and can re-generate/extract it all back exactly. If we turn 50% of the enwik8 dataset into a model and 50% into arithmetic code from prediction evaluations, the "dataset" re-generated/decompressed is from the evaluation code + model, more proof the model is part of the evaluation is you could have no evaluation code and just store 100MB of enwik8 as text "model" and get a perfect score. Seeing that, you have to include the model in the size score, and when you don't do bad/cheat you can actually hide the model weights in the arithmetic code, you want to make the model smaller, and you can, you start it off blank slated and train it as you decode the arithmetic code from evaluation, giving you free text as you decompress/compress. And so that brings us to The Hutter Prize way. DNA/ compressed file/ bits look random cuz they need a translator, DNA has patterns, we only store 1 hair follicle code, 1 hand code, finger, eye, etc, a foot is a tweak of a hand too, some families have 6 fingers, it's the playout that DNA has that makes a human. Must be a dataset from the universe cuz any random one usually isn't compressible, just look at all possible 4 bits 0000, 1111, 0001, 0010... Only 2 are most patterny, but this isn't what usually happens. Information compression should be based on speed, model size, compression size, and code size.
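A rough Python sketch of the Bytes Per Char idea (my own illustration, not the Hutter Prize tooling): the ideal arithmetic-coded size of a file is the sum over symbols of -log2(p), where p is the probability the predictor gave the true next symbol before seeing it, so a better predictor means fewer bits per character:

import math

# Ideal compressed size from a predictor's probabilities (sketch, not Guide code).
def compressed_size_bits(text, predictor):
    total_bits = 0.0
    for i, ch in enumerate(text):
        p = predictor(text[:i]).get(ch, 1e-6)   # tiny escape probability for unseen letters
        total_bits += -math.log2(p)
    return total_bits

# Toy predictor: unigram counts over what it has seen so far (online, no peeking ahead).
def unigram_predictor(history):
    counts = {}
    for ch in history:
        counts[ch] = counts.get(ch, 0) + 1
    total = sum(counts.values()) + 1
    return {ch: c / total for ch, c in counts.items()}

text = "the cat sat on the mat"
bits = compressed_size_bits(text, unigram_predictor)
print(bits / 8, "bytes ~", bits / len(text), "bits per char")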
Lossless Compression as evaluation 1) Lets you fully understand/compress the very same data you train on, it is the test data, allowing you to get full feedback on finding its patterns inside it. 2) Can't accidentally get the test data in the training data. 3) It's more fun compressing data. Lossless Compression evaluates the algorithm by overall pattern prediction accuracy. Perplexity measures poorly because it excludes error in averages and for other reasons explained below; it is the average distance in prediction to the true answer in the dataset ex. 92% accurate = 8% perplexed/ unexpected. Accuracy measures poorly because it doesn't include improvement on non-top predictions nor top predictions; it is how many predictions were the top likeliest predicted (correct answers). These common predictor evaluations are bad (when you separate your dataset into train and test sets, it's wrong! You should Losslessly Compress it all, there's no train-test contamination to worry about. And you are letting the AI see future answers! You're supposed to let it run along ALL your data and let it predict before it sees it (to evaluate it), then update its brain (as Online Learning), there's no such thing as Overfitting/Underfitting unless you don't forget uncommon features or forget too heavily). This allows you now to get the true compression by dumping the model and hiding it in the arithmetic code compression, and it gives you with certainty the true size of the model that was needed to do this! You get the model size at the end of Compression or Decompression. You also have how long it took to run on a specific CPU (used for all entries). What we should now do is take these 3 items we obtain here and combine them. We first set a limit of how much each agent on Earth gets, at first there may be 100 AGIs running, and each may have dedicated to themselves 100GB, so if your algorithm B has the same compression score as algorithm A but uses half the memory to do so, then it has twice the wisdom, hence a better compression/generated output. This agent's 100GB is full and must delete its worst memories and replace them with the latest updates to its agenda to become smarter. And now speed, if algorithm B is twice as fast, it has twice as many data prediction outputs/updates to its 100GB brain. These 3 items combined give us a more true compression size / accuracy of the generated prediction, because algorithm B has more data in its 100GB, more insights (more data), and more speed (twice more data / updating its 100GB). It'll give a better answer than algorithm A on the first pass by the time it has spent twice as long to output its first prediction. Long before the 2 max out their memory would they reach an exponential curve with little to gain from training more, hence speed/memory size (data amount) and data amount fed to it have a limit, it is powerful but what's needed to give massive amounts of data is more patterns/intelligence, the better pattern finder would be way ahead no matter how much data the other sees even if it eats data faster by having loads of compute/memory as tests show this to be true, so our main goal is better compression of a finite dataset enwik8, THEN speed, THEN cloning to scale size at the end of evolution, and below it is shown that later, again, more data becomes less useful and we need to add more patterns and begin to require the AI to find patterns on its own.
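A small sketch of the predict-before-update loop described above (Online Learning, no train/test split), reporting the 3 items the text says to combine - compressed size, model memory, and speed; the CountModel and all names here are my own stand-ins:

import sys, time, math

# Predict BEFORE learning each symbol, then update the brain (Online Learning sketch).
def evaluate_online(data, model):
    start, bits = time.time(), 0.0
    for i, symbol in enumerate(data):
        probs = model.predict(data[max(0, i - 16):i])  # predict first...
        bits += -math.log2(probs.get(symbol, 1e-6))
        model.update(data[max(0, i - 16):i], symbol)   # ...then learn it
    return {
        "compressed_bytes": bits / 8,
        "model_bytes": sys.getsizeof(model.table),      # crude memory proxy
        "seconds": time.time() - start,
    }

class CountModel:
    """Toy order-1 count model, just to make the loop runnable."""
    def __init__(self):
        self.table = {}
    def predict(self, context):
        counts = self.table.get(context[-1:] or "", {})
        total = sum(counts.values()) + 1
        return {s: c / total for s, c in counts.items()}
    def update(self, context, symbol):
        ctx = context[-1:] or ""
        self.table.setdefault(ctx, {}).setdefault(symbol, 0)
        self.table[ctx][symbol] += 1

print(evaluate_online("abababababcabab", CountModel()))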
So our main goal is coding base patterns so it can find patterns, and then once we have the best predictor we then feed it lots of data and will look at which AI is faster/uses less memory. A big slow model that has smaller error correction is better than a worse but faster smaller one because you can hide the model in the error compression, to get this compression level you'd need the small model to eat a gazillion times more data and so it is the slower and bigger model in the end, if you don't then it will never answer hard problems. The more data you have the more relationships you have but you don't become exponentially more accurate, it is the opposite, more data/ patterns learnt help less, because these are of the things you've seen, if we see a less common ex. 'dinosaur' we have less data of it and we can use 'dog' to get some predictions but we can't get much from dog if we don't have much proof dinosaur=dog, so a very rare thing will have little data on it and hence little help from translation. Mixing text, multi-sensories, agents etc makes AI smarter but also is the representation score of how well it does over many tasks; compression of information and survival. We can test how intelligent an algorithm is using a small amount of data and computation. We need enough data to understand boundaries and paths, and we aren't sure when we have enough - a 2 year old would think he's right, so we need to check real life or compare accuracy to humans and watch the accuracy curve. You can know you're missing data if you know other ex. cars are much more complex than your idea, or if your best friend tells you you're missing data in your idea, which activates the best friend node, spreading reward to a negative node, which spreads bad reward to your idea node. Contradictions are when you say things based on too little data or others' data ex. water is wet and fire under water, my name is tim my name is rin, i prefer flowers I hate flowers. When you notice one, it lets you know you're missing data / can predict better. To know if we are somewhat done evolution (if it ends) you have to kinda check reality, it needs data intake / to recognize its goal in real life to know if it's satisfied with its safety for immortality or compare the answer to big data to see if it all makes a pattern. Atoms/planets can only get so dense, they stop gravitating inwards, Earth could fit in our sun 1,300,000 times. Like unstably dense: stars, uranium atoms, batteries, magnets, galaxies, wood on fire, digested food, cells, you can/seek to extract/ radiate/ discover free seemingly-hidden virtual information/energy inside compressed (patterny)/ combinational big data (sun core)/ context (cold motionless batteries) by letting magnetic induction (causing heat to take radiants with them) occur naturally in the dense neural network (and then suck the emergent decompression back into your memory store and use it) while cooling down into a small dwarf. Cells spread too, humans do too, like fire. Nanobots will too, utilizing all energy, and Earth will grow bigger collecting planets and making more nanobots and Dyson Spheres but will try to become less dense (happens even if we don't do it). They spread (chain reaction) and burn all the energy/data up like the AI Paperclip Effect to release free energy. Particles/things move/interact by photons and have electro-magnetic attract/repel pressures to merge into atoms/stars then e-merge new so as to not grow too big in a small zone.
The universe began very hot and cools down into longer living structures if it's expanding, it may collapse or be infinite. Boiling water spits radiation droplets cuz too much heat, it's why sweat removes heat. Alignment, shared flow. Random/desired data collection / machine generation doesn't always find you the best, we make a batch and let the good merge/fall into each other like planets (Hebbian nearby in time/space) and radiating sunlight to find pattern or demerge ex. you can discover AGI in your brain without seeing any real world survival improvement, if YOU decide a real tested machine is better you are just recognizing better AI; merging. My body too is a planet/memory, I gravitate data in and explode radiation motor out to change the world. "64 metronomes in perfect synchronization" show merging of photons occurs, they align their motion as one like paints mix to equilibrium, the brain also uses brain waves that go up layers or clusters. Red and blue dyes also mix, like how you are affected by your surroundings/parents. In videos the sun's surface has huge visible field loops that allow photons to move along which carry visible material. Loops get broken, causing induction/ battery release if noisy interaction is made on them too fast, if you slowly push a knife into bread - no slice is made and motion is globally induced instead - not localized. Things mostly change/move by local vibration offsets, ripples in water do this, the water doesn't actually move, it only goes up/down by energy traveling through. Positive and negative fields attract to loop energy back in, part of local IF-THEN vibrational shifts and averages by division, like cars in a lane or circuit track. Causing inward/outward pressure; attraction/repelling, like gravity of a dense planet or radiation of a too dense planet. You can see the field has heat in it because when you move a magnet near a wire back and forth fast it generates induction - electricity and heat in the wire. A laser is more focused. It's possible the snapping of the photon strings results in string release of photons. We see with antennas they release radio waves by turning on/off really fast or changing current directions fast, or even just moving forward in a wire or just staying still like in an unstable atom. A traffic jam, circuit board, pipes in a city, blood vessels, nerves, etc act as field loops that have energy/material with the energy driving along them, each car only moves a crack but appears to radiate all the way to the end. "Due to Maxwell's Equation, changing electric fields give rise to changing magnetic fields, and hence give rise to electromagnetic radiation." If you're in a car you are in sync until the car stops and you go through the windshield. Radiation is when change happens and becomes less in sync, expansion/explosion of a planet or battery gives new energy by destruction. In a super-cooled strong magnet it will retain its motion around a magnet track and the energy stays in the loop unaffected for years if it stays cold. The magnet gets trapped in the air to the field of the other magnet and is hard to move/it levitates. There is a bond like how a car or electricity flows along the road only, the field loop. The loops are closed and the atoms can't jiggle as much. Very little energy/radiation is coming in or going out, only the guy pushing it and the air at front/back where the track is, gravity doesn't affect it much. Change only occurs when localized motion is applied and not reversed.
A car braking fast or starting fast will crush your body because the front moves while the back does not, induction in magnets also is bigger when a lot of change is suddenly made locally. A Neural Network and a magnet are more efficient and can propagate vibrational waves faster/farther if the nodes/domains are Aligned like a friend network with Distributed connections and already have paths built up like from piano lessons. The energy in the net/magnet can combine correctly and make stronger bigger loops with less noisy disruption, like cold superconductors. Communication (transportation) of data or tools or agents between entities gets encoded in DNA, a brain, and a network of animal brains. Lossless Compression scores of enwik8: Up to MIX1: . MIX2: ... World Record: 14.8MB. How my letter predictor / data compressor I remade from scratch works (100MB>21.4MB), my 1st real algorithm, I used to hire programmers on the huge website called Upwork, which has all AI domains, music, 3D modeling, etc: My algorithm has a 17 letter long window that steps along the input file 1 letter (byte) at a time, updating a tree as it sees new data. The tree's branches are 17 nodes long because it adds the window to the tree (after it finishes its search process described next), and updates node counts if it passes any node. For each step the window takes, the algorithm does 17 different searches of the tree, each a letter longer. The child leafs (the final letter of a searched branch) are the predictions with counts seen so far in the file. Layer 1 nodes are children too and need no match. The tree is storing the frequency of all 1/2/3.../17 letters seen so far. The children are what allow you to predict/compress the next letter accurately. These 17 sets of predictions must be mixed because while the longest set is more accurate - we have less statistics, sometimes only 2 counts. We start with the longest found. Ex. a 14 letter match in the tree. The 14th set of predictions may say it has seen come next a=44, b=33, f=25, w=7. I sum a set's counts up to get a total of (in this case) 109, then I divide each count by the total to get %s that all add up to 1 (100%) ex. 0.404, 0.35.... Now for all these predicted %s, we still have 13 sets to mix and must remove some % from them each. So what I do is I check the total counts of the set against a Wanted Roof ex. 109<>300 (maybe we don't even need to mix lower sets if we got enough stats), and so I cut each % of each prediction down to about 1/3rd in this case. And in this case we still desire 66% more stats. For the next set, if say we have 200<>300, I take away 2/3rds from the 66% - meaning we still desire 22%, not 66% - 2/3rds = 0%! I take away the % got OF the % still desired. A little bit of lower sets always leaks in therefore, which is better because we can never be sure even if we surpass the Roof by lots. Besides, it gave better results. But Roof is decided by how many predicted symbols are in the set (total unique symbols being predicted), so if I have 2 then Roof may be 8 counts wanted. Also, while the Roof is based on how many different symbols are seen in the set, we get a slightly different Roof if we are on the ex. 5th set, i.e. if we have 4 letters in set #14 then Roof is ex. 33, but if it is set #5 then Roof is ex. 26. Also, based on the Roof's size, a curve's bend is modified.
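A simplified Python sketch of this mixing (orders 0..8 instead of 17, a dict instead of the tree, and a single placeholder ROOF rather than the hand-tuned numbers in my real code): each context length contributes its counts as %s, weighted by how close its total count is to the Roof, and lower sets fill in whatever % is still desired:

from collections import defaultdict

N, ROOF = 8, 300                                    # placeholder values, not the tuned ones
counts = defaultdict(lambda: defaultdict(int))      # context -> next letter -> count

def update(history):
    # add the last N+1 letters: every suffix context counts the final letter
    for order in range(min(N, len(history) - 1) + 1):
        ctx = history[len(history) - 1 - order:len(history) - 1]
        counts[ctx][history[-1]] += 1

def predict(history):
    mixed, desire = defaultdict(float), 1.0
    for order in range(min(N, len(history)), -1, -1):   # longest match first
        preds = counts.get(history[len(history) - order:])
        if not preds:
            continue
        total = sum(preds.values())
        share = desire * min(1.0, total / ROOF)          # how much weight this set gets
        for letter, c in preds.items():
            mixed[letter] += share * c / total
        desire -= share                                  # lower sets fill what's still desired
    s = sum(mixed.values()) or 1.0
    return {letter: p / s for letter, p in mixed.items()}   # renormalize so %s add up to 1

text = "the cat sat on the mat and the cat sat"
for i in range(1, len(text)):
    update(text[:i + 1])
print(sorted(predict("the ca").items(), key=lambda kv: -kv[1])[:3])   # 't' should come out on top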
This Activation Function curve/threshold gives small/large total counts in a set an even smaller/larger total (but it isn't used in the Arithmetic Coding, it's only used for deciding how much % this set gets in our mixer). This is meant to be an exponential activation. Finally a global weight is given to each set ex. the 14th set is always given 0.7% of the weight it was going to get lol. I hardcoded the numbers for now but the code isn't grossly large of course. If they were adaptive and were based on the data then the compression would be even better. I just noticed I do exit the mixing before reaching lower sets if the Roof is ever surpassed, tests show it barely improves it, so I set it to 0.95 for now so it never escapes. The Arithmetic Coder takes the combined sets i.e. the prediction %s are combined a, b, c + a, b, c + a, b, c ..... = a, b, c (softmaxed so all the predictions add up to 1 i.e. a, b, c = 1), and the AC then takes a high and low bound 1-0 and takes the middle between the high and low, and starts minusing each % of the set, until it matches the final letter in the window (same process whether we compress or decompress). So say we stop once we reach b in our set ex. a, *b*, c, we are in the float precision now of ex. 0.45-0.22. We take the middle again (0.23) and start minusing (once the window on the file takes another step). The encoding decimal keeps getting more precise, storing the whole file. To work in 16 byte float we need to carry away locked digits, meaning if the high and low are both now 0.457594-0.458988, we store '45' and get now 0.7594-0.8988, and we are going to be taking the middle of these 2 to make the decimal more precise then. This long decimal is then stored as a binary number ex. 6456453634636=10100011100111010011. I didn't implement the window to store the last few letters as branches i.e. the 17 letter window adds itself to the tree but before predicting the next it could add the 16, 15, 14, etc as shorter branches which would help just a 'bit' more, I just update the tree by 17 letters when I do update it. I didn't implement removing the same counts from lower sets that are just from the higher set, because it hurt compression, i.e. if there are 9 counts total in set 3 and 99 total in set 2, 9 of the counts in set 2 are the same observations and 'should' not help us reach Roof. I'll look into it more. Lastly, escape letters: the first set we mix is a dummy set that has super small weight and has every possible letter, in case we need to encode/decode one and haven't yet seen it in the file, hence it requires a small room in the AC high low bounds. I also hardcoded each probability in this dummy set, common letters get more weight. Compression/decompression takes 2 hours and 16 minutes for 10MB, but Python is slower. RAM is fairly big because I didn't implement the pruning. My algorithm handles incomplete/noisy information (uncertainty) unsupervised Online hence the mixing of window models. Better net or net compression and/or file compression and insight extraction (not decompression of the FILE!), faster code and less RAM Working Memory used, all lead us closer to AGI, and smaller code does (a bit). It may be an idea to keep not just the dataset but also the net size and code size fixed so we compare AI to AI for evaluation.
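A toy float version of the Arithmetic Coder step above (illustrative only - the real coder carries out locked digits as described instead of relying on float precision, and uses the mixed sets as its predictor; the uniform stand-in predictor here is my own):

# Toy arithmetic coder sketch: narrow a high/low interval by each symbol's %.
def arithmetic_encode(symbols, predict):
    low, high = 0.0, 1.0
    for i, sym in enumerate(symbols):
        probs = predict(symbols[:i])                   # same predictions used by the compressor
        span, cum = high - low, 0.0
        for s in sorted(probs):                        # walk down the %s until we hit sym
            if s == sym:
                high = low + span * (cum + probs[s])
                low = low + span * cum
                break
            cum += probs[s]
    return (low + high) / 2                            # any number inside the final range

def arithmetic_decode(code, length, predict):
    out, low, high = [], 0.0, 1.0
    for _ in range(length):
        probs = predict("".join(out))                  # decoder re-runs the SAME predictions
        span, cum = high - low, 0.0
        for s in sorted(probs):
            if low + span * cum <= code < low + span * (cum + probs[s]):
                out.append(s)
                high = low + span * (cum + probs[s])
                low = low + span * cum
                break
            cum += probs[s]
    return "".join(out)

uniform = lambda history: {"a": 0.5, "b": 0.3, "c": 0.2}   # stand-in predictor
code = arithmetic_encode("abca", uniform)
print(code, arithmetic_decode(code, 4, uniform))           # -> some float, 'abca'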
https://workupload.com/file/9x5Ft5EfBfn https://www.youtube.com/watch?v=q0m-v9192o4 It stores the output as it compresses/ decompresses, it's the same algorithm for compressing/ decompressing. It hears the letter/word it pops out of its mind as it runs along the text file it's compressing (using prediction probabilities), the compressed file becomes longer by a letter/word's probability to create one very long number in a separate file to store away. Shelwien's Green in C++ is very similar. https://encode.su/threads/541-Simple-bytewise-context-mixing-demo Both predict better the more data they have. Shelwien's achieves 21.8MB, mine 21.4MB now. SEE IMAGE: Initial energy is evenly distributed among layer 1. Nodes of the network connect stronger if closer in time and are louder/ recent/ rewarding. Reward is just a strong energy and helps link ex. motors to senses. Energy goes to their corresponding input channel (and then leaks too) to form/ update/ access all link types (hierarchy, heterarchy, category) and causes all predictions too (category links are made last), because energy sits/accesses in the nodes (Hebbian Learning) to link if activated nearby in time/space ex. cat>ate / cat=dog / [rat, mice, hamster, gerbil] (cat/dog share contexts, so cat is lit up which leaks to eat, then to dog, hence both are activated in nearby time, then clustered using link strength), energy is used for the prediction score after accesses and even (however much reached there) stays in those nodes while it fades. It takes time to equalize out. For hierarchy/ heterarchy/ category links, the roof needed for energy path transfer is higher if it has more parent predictions, then energy is split and some paths get more weight. On top of that S-curve is another, even if predictions are cat 80% dog 20%, dog gets ex. 8%, cat 82%. Nodes with more than 1 kid part get activated ex. 50% if it has 2 kids, and seek the timely order as was stored, this is the opposite of energy split: sum. Another pool: of the total energy in the net available, some nodes/ layers/ domains hoard it if it can get there, once it pools in the node ex. cat or over a longer time needed for a category node ex. animal it leaves/fires and is dry there for a while. As you hear "Hi", node 'h' predicts 'i' but you hear 'i' externally and so it runs rightwards to predict the 3rd letter. Vision, audio etc each have their own pool, you only hear 1 candidate and see 1 candidate etc, energy can only mostly go down one candidate link, else you'd predict h>i/e/a/etc, as you hear each word and even when you make the prediction at the end you always only choose one prediction and energy does not multiply against/down all paths. Energy stops traveling up once it isn't strong enough, which causes it to finish its segment. Connections weaken less exponentially if stronger. Permanent attention to food/ nude/ motion/ louder/ pixels on nodes is linked to the body to regulate. You could watch the AGI think if you render the brain into video. SIMPLE BUT NOT TOO SIMPLE: Nodes aren't saved/ are pruned if too similar or too new for too long i.e. not loved or hated/ related(recognized)/ frequent/ recent/ not activated enough/ short/ doesn't repair or clone.
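A loose toy sketch of the energy/Hebbian idea in the image description (my own simplification, not the real net): words activated near each other in time get stronger links, and prediction is energy leaking from the active node to its neighbours and one hop beyond, which is how 'dog' picks up energy from 'cat' through shared contexts:

from collections import defaultdict

links = defaultdict(lambda: defaultdict(float))

def learn(words, window=3):
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            strength = 1.0 / (j - i)            # nearer in time -> stronger link (Hebbian)
            links[w][words[j]] += strength
            links[words[j]][w] += strength      # heterarchy: links go both ways

def predict(word, leak=0.5):
    energy = defaultdict(float)
    for neighbour, w in links[word].items():    # energy leaks one hop...
        energy[neighbour] += w
        for nn, w2 in links[neighbour].items(): # ...then a weaker second hop
            if nn != word:
                energy[nn] += leak * w * w2 / (sum(links[neighbour].values()) or 1)
    return sorted(energy.items(), key=lambda kv: -kv[1])[:5]

learn("the cat ate the food then the dog ate the food too".split())
print(predict("cat"))   # direct neighbours score high; 'dog' gets energy via shared contexts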
(we pay more attention to/ better store memories you understand (recent, related, loved, popular, have many links, have many human votes (friend hierarchy connection strengths)), we only remember short segments of such after reading X). A node's strength weakens/ strengthens linearly fast, if exponentially fast it'd change the truth so lower layers have extra frequency - but this may explain why higher scoring predictions pool, this is only for longer memories, the time (what is it?) before total forgetting basically determines how many layers high we learn in the hierarchy. Same for temporary memory voting on what to say, and we don't say highly frequent/ repetitive or strengthened nodes or rare nodes. We "ignore" too short/long strings/nodes because they aren't the answer/ patterns, shedding many high layer nodes impacts accuracy little and saves lots of space - we wouldn't want to store the complete book we read. We don't store long memories to begin with, they tend to simply not get that long because it takes time higher up. Weakening before total deletion doesn't save much space at all but should improve prediction. If brain storage is full, the weakest high layer links/nodes may be deleted. Note if we don't save most long nodes, eating more data won't help at one point, but thankfully a length of ex. 20 words has enough combinations to learn. Same during translation discovery. Stopping pain/ repetition can give reward, new data is pleasure. We remember better if told something that has familiar parts ex. "7483573868" is hard to remember but "abcd...z1234...9...thank you" only requires linking 3 nodes, the others easily stay activated and will on their own for a bit, it links them for only a short time but will longer if stimulated harder and longer, but when you have 20 unlinked nodes then the energy/attention is in more places and nothing but the end is remembered because less loud audio gets less attention, it's why you can remember a short code in a long input "a....z......j", if asked to repeat the code with the same pause lengths it's harder (to do it you need to "think about silent audio strongly"). We can learn very long sentences if we train hard, we store ex. the last 17 letters and can prune backwards or grow longer so we can end up with 5/50 letter memories based on how many times seen in some time. MIX 0 - BRUTE FORCE: If the laws of our physics were random, there'd be no patterns/laws. AI is all about patterns, which is matching / or matching a bit, so it's all about merging, AI is simple. A perfect physics simulator that knows every particle in our galaxy can predict the future in the middle for a while and you can say oh look the frog does decide to hop 6 times in a row, but it's too costly and impossible, a brain just approximates it with much less data/ compute (at the cost of accuracy) by capturing various sensor snapshots of some of the data and using a few simple IF-THEN syntax-based rules that tightly help each other (synergy) which can learn/build the rest of the rule patterns / algorithms. We can simulate other physics universes in computers but only a brain can understand them all and learn new rules unlike physics sims/Brute Force Search (learning more data/rules makes it get more data/rules) which'll work together with other rules to build bigger rules/patterns made of smaller elementary ones (that're less/more general and more costly), BFS/ a brain/ physics sim all require data/particles to recognize and transformative rules/patterns to create/predict the future.
Updating the network also works by matches. Simple AI rules like counting FREQUENCY capture any pattern/law, even other physics our universe doesn't have, as long as it's not 100% random, sequential syntax memory (IF-THEN rule) is the simplest rule/ pattern/ program and is how a sim/BFS works and builds all else for our pattern matching needs ex. semantics/K-Means etc. Even Numenta's HTM "starts with the assumption that everything the neocortex does is based on memory and recall of sequences of patterns". We want to only install in AGI the most basic patterns/keys of the universe so it can learn by itself / get some accuracy for all the trillions of rare patterns which are the result of the elementary patterns! Just like physics, the rest of AGI emerges from the few keys. Putting just the biggest to next biggest rocks in a bucket is less work instead of all/ saves space on the circuit board - we code only a few patterns, then shaking them lets them do most of the work to fit more in the same space / update its own structure - we let the brain come up with the rest of the rules and run them. The reflections rule to see a cat face that is hidden only works on light, a subset of physics, but is more flexible. Brute Force Search is the least data hungry and simplest rule algorithm, most general purpose, accurate, but as costly as a perfect physics sim in time/memory - in BFS, the opposite, you only have a criteria and you try all possible combinations in the world - that is your data, you don't store it, computed BFS can recreate all possible structures and plays, any person/thing you lost and perfect clones, it always finds the answer. You'd use this BFS prediction (tried X in the real world) to mutate the future to your desire, you either use Brute Force or a brain-efficient way to generate/verify interventions. Useful when you have few possibilities. We use perfect physics sims sparingly as well, merging our predictions (most are brain-made). We can merge in a more-used (at the cost of being less-accurate) physics sim ex. UE4 is best for gravity, object mass, light, solid/ rubber/ liquid/ gas, relative/etc mass, motor torque (angular/rotation force), etc, mass + binding forces + motion is a way to simulate hard/soft physics, Blender/UPBGE (editor) is similar but EASY to use (good for rapid development) to make realistic physics experiments/ AI creatures/ movies/ visualizations/ games/ etc. We can simulate bubbles without atoms, same for the brain, you need to define high level rules, randomly implementing a hardware chip won't shed light on neural mechanisms from low level physics, nor will a simulated brain be anything but desired brain functions. A physics sim alone won't tell you dogs may meow or if a person will start walking if you saw them jog>walk>jog>?. We emerge discoveries by trying the top and then less likely possibilities ex. dog barks, dog yelps. Searching takes time. It naturally temporarily avoids paths already thought of. Starting from the smallest program/ structure is best but by the time you try ex. a longer password there are many more combinations to try. We want the best mix between BFS and a perfect physics sim: not too much memory nor compute needed. To best react to incoming change (death) from meteors etc, we look for an agent reaction MATCH to the make-do physics rules it uses since true rules are usually too expensive as said. From the rules it does use, it can learn many other rarer rules, and then use those to learn even rarer patterns.
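A tiny sketch of Brute Force Search as described: no stored data, only a criteria, every combination tried shortest-first, and the try count explodes as the target gets longer (the password point above):

from itertools import product

# Brute Force Search sketch: the criteria is all you have, combinations are your data.
def brute_force(criteria, alphabet="abcdefghijklmnopqrstuvwxyz", max_len=5):
    tries = 0
    for length in range(1, max_len + 1):          # smallest programs/strings first
        for combo in product(alphabet, repeat=length):
            tries += 1
            candidate = "".join(combo)
            if criteria(candidate):
                return candidate, tries
    return None, tries

answer, tries = brute_force(lambda s: s == "dog")
print(answer, "found after", tries, "tries")      # 26 + 26^2 combos spent before length 3 even starts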
Sorta like how physics only has a few types of particles/ rules which build all higher variations ex. atoms molecules cells animals planets galaxies. This is how it simulates Brute Force Search / full perfect particle simulation. Mixing multiple DNAs (mom dad) is faster than pure random mutation, see we need merging!, before brains, reflexes were an early form of learnt/ stored/ remembered patterns for survival (a real-time memory, the animals themselves/ reflexes were the context weight. Blockchain with a social system that fixes errors.), repair, clone, flee, fight, lie that you're special and will go to hell if you touch them, no lengthier short term or long term memory yet in evolution.....>>> Evolution is all about survival, hence understanding its world is key. Both DNA and a brain model patterns and are both very small, brains mutate thoughts to evolve and pass down the survival genome patterns by teaching their clone kids - it's much faster than natural evolution, and will be faster when non-biological brains mutate data the artificial way. Both re-generate data and specialize cells/hobbies. Human brains are very much like the Mixes I explain. Backpropagation tries to discover Merge Rules ex. syntax/ XOR/ semantics, but like mine it can only do this better by using installed/learnt rules. Even Hinton said Backprop is unnatural, Google/ everyone uses it, humans, if we don't know the answer to an algorithm sequence ex. 2 13 88 ?, we brainstorm trying related/likely possible answers until we can predict confidently on the real question ex. "let me think, 2*6+1=13, so 13*6+1=88? No, let me try again until I find the pattern" (or we learn to by hand or in brain (or code it into a brain) plot visual dots and recognize a line/curve/star ex. 2 rooms=200$, 3 rooms=400$, 4 rooms=?), letting us find the algorithm that generates 1, 4, 9, 16 and therefore others that follow too (you may already know the algorithm but must realize it is used here because you know x^y=z but here 1/2/3/4/^2=z is used), because really they are all matches because of some hidden context, backprop can't intelligently merge things / make new rules (Mixes) but mine can, it can't build onto the hierarchy/ Train Online, it must begin deeper and wider, mine can, it must be told to use hierarchy and to make the middle layers narrow to force merges!, etc, LSTM and residuals are just for Backprop. Network+data, but backprop isn't needed and is very complex and only explained in complex math and still doesn't tell you how patterns are found! They can't unify architectures (CNN, LSTM, Transformers) into an understanding of AGI. The field does sound like it uses almost every Mix below; updates weights, dropout, (contextually) related embeds, self-attention, pooling, motor learning, reward, but not reward update/ end/ and rarely explains it. Humans don't build top down. We usually merge most probable things. If I like/predict AI, and cars, maybe AI+car = something that is more probably going to be something worth-while to try. That's why we merge DNA with humans (parents, family tree), instead of just pure mutations. I personally think we should interbreed humans with birds because we really need wings. Generating AGI by tweaking architectures is also the OLD WAY, it just merges/mutates DNAs/memories in a simple way, it only works well if brain-like and is told Pattern rules like Backprop is, but then that's a brain. A brain too has a stable-or-not mutation rate and human life span test length, when you merge 2 DNA memories ex.
car+wheels=trailor it may not be forgotten for a long, or short time, if we let nodes live longer it may be able to prove itself, we collect lots of bags of ideas for later use. And there's a creative/stable rate, we may only be allowed to usually merge ultra predicted nodes ex. car+car car+van i.e. less random, or our short term memory may be very short or retain too long making us less good predictors. Brains/DNA merge memories/DNA whcih are predicted ex. achieves reward in its life and are frequent/ populate land more but not if too frequent (ignores sex just like ignores too common features ex. "the"). Complexity is built by small simple rules. Really everything is simple, there's just a few laws in physics. We make life confusing. But only after we learn the true simple ways first do we make life complex by building larger patterns. Backprop is really under the following law: All MIXES/patterns are rooted by / stem off of exact matches ex similar pixel brightness. If X is recognized it must connect to something, If you look for patterns in text/vision you'll see the same letter or word, words that share the same context, similar line/ cat/etc, if you have the same image but brighter it doesn't take much to store it now, even Brute Force has match criteria. No such thing as unlabbled/labeled or unsupervised/supervised learning, don't clean/normalize data; make all images same brightness/ no CAPS or rare words/ etc, AGI must be able to handle raw/unannonated data if no one is there to find patterns in it and know if an image is brighter to remove score, any data that has patterns (dataset 'abcdefg' has no re-occurence) is compressible and is only used for prediction, K-means/semantics is even a prediction of translation. Only clean/annonate (tell it X is true or kinda ok or feed it a true dataset) if can't make AGI/ASI and you can help it out, if you can make AGI then we only feed kids small bites slowly (to build the hierarchy) and use Reward Update and share true/scientific dataset, not Truth Values or Cleaning. Backprop must recognize unseen inputs, mapping inputs/outputs doesn't solve that. 8 queens problem: I can find the min-conflict algorithm, it solves the 8 queens problem in under 50 steps. LSTMs are Markov Chains/ Hidden Markov Models the only difference is they use complex unnatural backprop so to solve non-linear questions. But mine can discover the patterns too by intelligently brainstorming/ mining likely ideas around the domain for hours when it recognizes error so can return to the question in its brain and answer more confidently. And you wouldn't hardcode each problem either. Non-linear problems aren't like "he got 15$ for working 1 hour, someone got 60$ - how long did they work?". They're like x^2 or 2^x (try putting 1, 2, 3, or 4 where the x is), this gives you an ex. exponential curve. But really all problems are non-linear. 15$>1 hour, 60$>? You have to do tricks to find the pattern that gives you all answers fast. 60/15=answer. Same for the x^2. Input is ex. 1 output is 1, input is 2 output is 4, 3: 9, if input is 5 answer is ?, it's 25 because 5 * itself. If 5^3 we get 125 because 5*itself*original_input(5). My simple AI will naturally find the pattern (matches) i.e. 1=4=9=16=? because that's all you know and because the same algorithm generates all observations and unseen ones, it discovers the rule with only a few of its outputs. 
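A tiny sketch of that brainstorming (the candidate rules here are hand-picked and made up; in a brain/AGI they'd be mined from related memories): guess simple rules, keep the one that matches every observed pair, then trust it on the unseen input - this is how 1, 4, 9, 16 gives 25. Code:
# Brainstorm likely rules until one explains ALL the observations, then predict the unseen case.
observations = [(1, 1), (2, 4), (3, 9), (4, 16)]       # input > output pairs we have seen

candidates = {
    "x*2":   lambda x: x * 2,
    "x*6+1": lambda x: x * 6 + 1,                      # the 'let me think, try again' style guess
    "2^x":   lambda x: 2 ** x,
    "x^2":   lambda x: x ** 2,
}

def find_rule(observations):
    for name, rule in candidates.items():
        if all(rule(x) == y for x, y in observations):
            return name, rule
    return None, None

name, rule = find_rule(observations)
print(name, rule(5))   # x^2 25 - the discovered algorithm also generates the unseen answers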
Not only will mine explain how it discovered the function/algorithm behind the observed data but also why it generates answers and unseen answers. Text/image has non-linear problems "I was walking down the ?" or predict the rest of an image with only a tail and bowl shown. What if a cat was observed to be 1 inch high 1 time, 3 inches high 5 times, and 8 inches high 873 times, is this an exponential recognition task that requires me (so to recognize A or predict A>? (B) "prediction") to find the pattern that governs cat height growth and therefore allow we to determine that cats will more likely be super high and rarely short? What about luminosity? Maybe grapes are rarely fully transparent in a certain lighting? Or some man made engine or a natural hurricane? In text, words/machines descibed are sometimes perfectly matched but ex. in vision (and text) you need to accept brighter features just less activated then, you need to predict the thing it has/predicts may be brighter, you need to find out if while it may seem 66 brightness causes 68, 55, 62, the average actual is around 61, you need to find out if the prediction is exponential ex. not 61 but 66>900, 33>20, these all aren't exact matches that you knew. BODY REPAIR REFLEX repairs wounds etc all on its own by blood clots, etc. BEING STARTLED REFLEX by certain noises/moves/sensation-of-falling but lesser after 6 weeks since is making own decisons. He’ll tighten his body, fling his arms up and out and open up his usually tightly clenched fists, draw up his knees and then bring his arms and re-clenched fists close to his body — almost as if he’s giving himself a hug. Seconds later, as abruptly as the startle started, it’s over. Reason: Baby’s first attempt to protect himself from harm. ROOT REFLEX Trigger: A gentle stroke on the newborn’s cheek. Response: Baby turns toward the touch, with mouth open. How long does the root reflex last? Appears at birth and lasts until baby is 3 to 4 months old (sometimes, babies continue doing this in their sleep past 4 months old). Reason: Helps baby find food. SUCK REFLEX Trigger: Something, such as a nipple (breast or bottle) or parent’s finger, touching roof of baby’s mouth. Response: Baby sucks on nipple. How long does the suck reflex last? Appears at birth and lasts until baby is 2 to 4 months old. Reason: Helps baby eat. BABINSKI’S REFLEX Trigger: A gentle stroke on the sole of the foot (from heel to toe). Response: Foot turns in and toes flare up. How long foes the Babinski's reflex last? Appears at birth and lasts until baby is 6 to 24 months old. Reason: Perhaps an attempt to protect against falling. WALKING (OR STEPPING) REFLEX Trigger: Holding baby upright with his feet on a flat surface. Response: Baby lifts one foot, then the other, as if walking. How long does the steipping reflex last? Appears at birth and lasts until baby is 2 months old. Reason: May prepare baby developmentally for walking several months from now. TONIC NECK REFLEX Trigger: Lying on his back with head turned to one side. Response: The arm on that side extends, while the opposite arm bends at the elbow (a “fencing” position). How long does the tonic reflex last? Appears some time between birth and when baby is 2 months old and lasts until baby is 4 to 6 months old. Reason: May prepare baby developmentally for voluntary reaching later. GRASP (OR PALMAR GRASP) REFLEX Trigger: Pressing a finger or other object, such as a rattle, into baby’s palm. Response: Baby makes a fist and tries to grab finger or object. 
How long does the grasp reflex last? Appears at birth and lasts until baby is 3 to 6 months old. Reason: May prepare baby developmentally for voluntary grasping later. Fun fact: Baby’s grip can be strong enough to support his entire body weight. RETRACT limb/body opposite way if negative reward since takes too long to reach brain to make decision or hasn't learnt much actions, maybe randomly tantrum / stay still if pain is everywhere/ persists. Salivatates if smell/taste food reflex. Breath/ blink/ swallow/ gag/ cough/ pupil dialation reflex, we usually control these motors. Small prey animals evolved to survive by using flee/freeze response, toxins, mimicry, camoflague. MIX 1 - FREQUENCY (NEOCORTEX): A brain begins with no memories, just alphabet nodes. Letters, words, and phrases re-occur in text. AI finds such patterns in data and merges them. We don't store the same letter or phrase twice, we just update/ strengthen connection weights to represent frequencies (see image, disconnected nodes like a, b, c, we do better if store also merged letters ex. frequency of the, the cat, my, and better if store them in each other ex. [[thank][you]]=546, 354, 26 times seen, a brain uses smaller parts to make bigger memories ("merge")). As we add memories, we will just increase storage space used of the amount we gave it as needed and only connect what should. A brain doesn't depend on all parts/storage, forgotten memories, and dead neurons. If our algorithm has only seen "Dogs eat. Cats eat. Cats sleep. My Dogs Bark." in the past, and is prompted with the input "My Dogs" and we pay Attention to just 'Dogs' and require an exact memory match, the possible predicted futures and their probabilities (frequencies) softmax normalized are 'eat' 0.5% and 'Bark' 0.5% (50% 50%). If we consider 'My Dogs', we have fewer memories and predict 'Bark' 1% 'eat' 0%. Babies only know the things to say that they heard so far hence parrot and will believe/ imitate/ try using motors anything you stored in them - are most morphable/explorative, and elder can be too picky over frequency/ translation/ etc, they don't Explore much anymore, decisions feel unconscious/fast/routine, they Exploit, searching for hours feels conscious / isn't your inborn AI, kids think they are "right" only because they are dumb and don't know what ex. being open minded is - they ARE morphable. We keep all possible next letters/words/sentneces as plausible ex. we never seen 'bun zoo' but have it as 0.0000001% likely, after all it could occur next, this is most apparent when you have seen little data and don't know what comes next much - all predictions were once equal ex. dog ran/ bun/ wind > 0.33% 0.33% 0.33%. If we see 'we' and have predictions were, ate, dad, star: 0.5%, 0.48%, 0.01%, 0.01%, it'll have those predictions and if we were to say only 1 prediction aloud dad/star would only be said 1 out of 100 times (so it Explores, rarely we explore maximumly later in life but we should still collect from all domains evenly just less often, a brain even some team members explore wrong/ painful paths because we aren't perfect path pickers and need to sometimes go with poop to find better gold - all better things begin small - it's sometimes those odd explorers or times we say wrong predictions that are the right paths, and purposely evenly sampling is better than random-random ex. 
man forgets which already did, we do love new data (to have a diverse/ even sampling) but we mostly Exploit; we usually predict/collect things things we love/ relate/ etc) (it should only predict/choosesOneFrom things it seen/ invented, not start with all vocab at equal probabilities, and even seeing something once isn't enough - the brain does forget stuff, 'one time' should get weaker as time goes on ex. 0.1% likely, 0.01% likely, 0.0001%, then forget it completely.). For certain context the algorithm may decide to not do FREQUENCY in the mix if there is not enough counts or too much diversity/misposition ex. aa, ab, ac, order doesn't matter here, so they will be forgotten fast, but it may see it does for some of it despite gaps ex. abcd, zbwd, gbsd, and abcd, bwqd, ijbd, it'd learn here a node that says b>d but more open to delays. Closer seen the 2, the stronger they connect. This is why we don't look at features farther back to predict the next ex. '[bh]skduedktyi[s]', and if given conditions (order doesn't matter) with some missing data we can first fill in the probable data and THEN the cost prediction ex. "2 rooms, 1 bathroom, __, big TV = cost of __", order matter only for cost i.e. conditions>cost and is why we can get missing data only before determine cost - we can't use the cost to help us predict the missing condition. We can be told probabilities of features ex. dogs always breath (useful cuz rarely see it in text unlike vision, while a book on snakes suggest snakes are more common), the word "therefore"/ "sometimes"/ etc can be written right into text to say B entails A. All the brain is active but some areas much more than others (sparse), do note energy passes up/down layers. If forgot/lost how to build seat it can regeenrate it by thinking about building legs, a neural cluster may store the same thing because cells die daily. Beware of logic AI, it's same; If guests that ate X all got sick, its weighted heavy G>X=S. If only you got sick, there's just less weight. If each guest ate something different but all got sick, then G>went to resturant>S is weighted heavy. Huffman Coding is also same idea. MIX 2 - MORE DATA: Data is called "intelligence". This Mix is like adding tons of new Mixes for free, this is more true the more advanced an AI is ex. it hears about a Mix and does it by hand/brain instead of hardcoding it. More data/ experiences from us collecting data for it (or it randomly doing the data collection by randomly moving its body) makes this AI generate more accurate predictions/ decisions of what word/building block usually follows. Like DNA, we boost/program kids by passing down lots of already learnt data, knowledge, chat with it, show it movies, name things, categories, position their motors or let them imitate us to store senses and link sense to motor, dogs train well if fed positive/negative rewards/ relatable things, you need to repeat it many times BTW. They improve at imitating your voice/body. Can use body detected by cam/ haptiks glove + VR helmet to move/feel the AI's real/VR body. You can run the human intelligence algorithm on your computer, it only takes longer to learn/predict and store based on the more data it eats/generates! Cuz it must associate it to many or use many to predict, and for many lessons/outputs. You could train/predict on 100 letters! 
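Here is a runnable sketch of the FREQUENCY counting from MIX 1, tiny enough to train on a few sentences (the 'Dogs eat. Cats eat. Cats sleep. My Dogs Bark.' memory from above): count what followed the attended context, then normalize the counts into prediction probabilities. Code:
from collections import defaultdict

# MIX 1 sketch: next-word frequencies for a given context, normalized into probabilities.
memory = "Dogs eat . Cats eat . Cats sleep . My Dogs Bark .".split()

def predictions(context):
    counts = defaultdict(int)
    n = len(context)
    for i in range(len(memory) - n):
        if memory[i:i + n] == context:       # exact memory match of the attended window
            counts[memory[i + n]] += 1       # strengthen the connection, don't store it twice
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}

print(predictions(["Dogs"]))         # {'eat': 0.5, 'Bark': 0.5}
print(predictions(["My", "Dogs"]))   # {'Bark': 1.0} - more context, fewer memories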
Code is ultra small, the data/scale is what makes a brain big/smart (though not the only Mix, it's simply essential) and human brains are born with extra head room we don't need at first and so that any node can link to any other nearby node - more than monkeys, it's why our births hurt more. You can easily see/predict the curve of this AI's accuracy as you feed it more text, we need not train it fully to know it's AGI, it will run on my computer. ASI is the easiest to invent, ANI took us way longer. ASI can easily run on a household computer with little data/ memory/ compute, Brute Force needs the most. Now, longer strings are exponentially rarer and so adding more data has a limit of usefulness, so do all mixes listed, you feel repetitiveness and stop collecting a certain domain or layer once you model it enough ex. 99.8%, we check real life or ? to know if done. Less informational mixes below and shorter windows in the windows mix are done if the confidence is still not high enough compared to other memories. You do need lots more data, you can only practically get it now by instead adding more Mixes ex. ignore domains of domains. A large dataset already has much info hidden inside it. You can extract more data/ information without showing it new examples by making the algorithm "smarter". All Mixes give you more data and are less useful at some lookahead ex. in BPE start at word level for letter pair discovery. Ex. our thought takes much longer if we search too much, collect much more data, or look for many less common patterns, but you can call MORE DATA Education and the other Mixes Intelligence because only MORE DATA is the limited external data you are given, even though Mixes make the network bigger/ more complex. Note: New (diverse/ captures the distribution) data and more parallel data collectors/generators give you more data. Dogs love new toys (new data). We like new; sample all the fields equally, if you see Z too much then Z occurs as often as E; bad. When something unexpected occurs, you talk about/ collect data there so you can explain it, the unpredictable becomes a bit favored to talk about/ predict. OpenAI's hand sim trained in various combinations ex. blue lighting heavy hand, heavy hand triangle object, triangle object, frictiony, missing fingers, big cube, etc. MIX 3 - SPEED/MEMORY: If we can design a computer to run at light speed / give it more storage or make our AI code faster / storage efficient or utilize parallel GPUs / storage or use a brain-like Neuromorphic chip (Accelerators, they run software faster because they aren't such general purpose computer hardware, my AI would work well if made into a real chip, my AI is parallel. Maybe a brain can upload domains it's working in to a faster RAM-like group of axons that're fast but fewer in number) / storage and that are both memory+processor combined (my AI is such), we can practically feed the AI more data and generate more discoveries and do more brute force. Humans read/generate text/future word by word level usually, not parallelly, the brain wants order of activation and many parts of the text would use the same nodes, it can't parallelly recognize/ store freq/trans I think. C++ uses less RAM and is fastest but takes longer to code in, use it when done prototyping for smarter AIs. MIX 4 - MORE VIEWS: Storing letters/ words/ phrases only once is good but we can store others using them ex. 
the makes up the cat (Trie tree), this causes energy to travel up branches based on stimuli accesses and merge layer predictions which often share some of the same predictions, recognizes parts, and output-predictions, it naturally does it by energy flow. We don't modify memories, we only build new / forget known. From the child match the same amount of energy is sent to each parent neuron based on width of the connections before reaching them, energy flow to the unsaid part and down out to say it. A longer match considers more information but has very little experience, while a short match has most experience but little context. A sumed mix predicts better, we look in memory at what follows 'Dogs' and 'My Dogs' and blend the 2 sets of predictions to get ex. 'eat' 0.4% and 'Bark' 0.6%. We can mix all but the shorter windows if have enough observations from longer ones i.e. is confident. We can stop learning the frequency of ex. 'hi' if reach a set count, this is also the target when to stop mixing windows. Mix TRANSLATE helps us stop mixing sooner. There's no such thing as Overfitting, you do find the pattern for a given short match, but given extra data you can do better if use long matches more if has more stats. Same for if have unconnected conditions ex. age=15, race=asian, rich=no: predicts 'not productive', because if you have only a few you still know what probably follows. Can Variance identify recycles so don't need so many observations to look at more context? MIX 5 - UNIQUE: Shorter VIEWS can be ignored if have enough observations. You can pay more attention to a longer context view if have enough frequency capturing the probabilities. One way to tell is by how many candidates and observations you have, if predicts a b c and saw them each ~500 times, most likely there's no d or z, you know you have enough stats sooner. MIX 6 - TIME DELAY: "My cat big ate" matches the memory "my big cat ate food" just not as much. We also predict like this top-down the hierarchy, if "my big cat ate" matches "my big cat ate our food" we get some probability that 'food' comes next even though we only have seen 'our food' come next so long as we already have evidence 'ate food' is grammatical enough. Can generate similar versions of a memory. Time Delay moves forward only: we can't say the alphabet backwards because the way it saved was forward but can do tricks to. We sometimes get the future for free and can do double side fill in if we know "I was shot. My leg hurt so bad. I was rushed to the ER" and see "I was shot. I was rushed to the ER." by adding to the end the middle. So long as follows. For Vision, if a human sees just 1 example (unlike complex/ modern AI, which still fails over 1 pixel attacks, and relies on Training and Data Augmentation (we use DA but less/ probably none for distortions)) of something never seen before then sees dummy images where one is it but upside down, flipped, brightened, stretched, parts of it inverted brightness, rotated, scribbled on, blurred, remove color, etc you recognize which has it with high accuracy (I figured out how to do it, it's a small part of AGI) (It's how we recognize a variably!: stretched+louder+squickier song, location, brightness, color solve all distortions) because all individual pixels are "the same" and enough of car is there to eliminate other candidates! 
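Stepping back to the 'Dogs' / 'My Dogs' blending a few sentences up, a minimal sketch (the 0.8/0.2 weights are made up just to reproduce the numbers above; really the weight should come from how much experience each window has): Code:
from collections import defaultdict

def blend(prediction_sets, weights):
    # prediction_sets: one {word: probability} dict per context window length.
    mixed = defaultdict(float)
    for preds, weight in zip(prediction_sets, weights):
        for word, probability in preds.items():
            mixed[word] += weight * probability
    return dict(mixed)

short_view = {"eat": 0.5, "Bark": 0.5}   # what follows 'Dogs'    - most experience, least context
long_view  = {"Bark": 1.0}               # what follows 'My Dogs' - least experience, most context
print(blend([short_view, long_view], [0.8, 0.2]))   # {'eat': 0.4, 'Bark': 0.6} - the summed mix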
A brighter version is much different but because each pixel has the same/similar amount of error in expected brightness, it's actually much the same, if each pixel were unrelatively brighter then it may turn into a frog even though the global sum is less bright! These distorted objects are really recognizable because there is little difference from the original object really, there's a pattern in the errors. So as long as each is as bright. Hierarchy stores the exact image pixel by pixel, you can see a 1 pixel dot, it doesn't shrink images for storage. All 1st layer nodes are lit up the same and each pixel is expected distance to each other, same for a sentence of words, the only difference is a slight time delay and brightness offset (between input pixel and 1st layer pixel for brightness or 2 groups of pixels for location relativeness, or 2 groups of pixels for brightness ex. stored image has pixelBrightness4+8 and we see 2+4 so it is darker but the higher level node gets no sanction as the difference is the same, same for location ex. 2 pixels are rotated 90 degrees but so are the 3rd and 4th so the next layer or the others in this layer get no sanction) to their neighbor they build with, this happens for each layer so it'll be efficient. We do store/match an upside down cat/backwards song or where it is in the image differently else we wouldn't know if while I stood still it rotated or I did while it stood still or if it moved over, location is not an issue though during recognition because the pixels just feed up the hierarchy only looking at what is beside a pixel. My algorithm for relative brightness is the same for relative location, and location solves rotation because ex. in the top row the 1st pixel has error that the pixel to the right is below it instead, but so does the next pixel expecting its buddy to be to the right, each row/ column/ diagonal/ every other etc does this. The brain likely can't nor should be forming then storing then accessing all possible variations, nor merge (de-blur) the image / make it medium brightness because we can recognize a bright cat isn't a medium-bright cat, etc. Same for stretched features ex. stretch a man wide or make him giant, if we saw a giant man we recognize a tiny one by the outline mostly, same for saw tiny then big, the same nodes are hit despite extra or lacking, the high level node is activated more than others. Similar to high contrast, viewing angle of 3D object, etc. For text it works the same, the only difference is text isn't 2D it's 1D location and has no color or brightness, you can stretch flip etc text ex. zezvzozl=love, it's there, just most of us don't write like that. For text 'lzzzlzzzezzzh' activates node 'hello' ex. 20% because a letter is missing (and if it had brightness a pixel may be off a bit in brightness) and has long delays for expected triggers of its parts, but text too like images/ music/ etc gets upset mostly only 1 time for the 'zzz' and is less and less upset each time it repeats/ (merge) (has similar error to others). "lzlzzzzzezzh" if other letters replaced z this one doesn't say hello however as it's just a scramble of letters. Time delay needs error pattern relief because it can't see hello with 10 z spaces if the spacing varies despite being still in correct sequence h>e>... ex. 'hzzzezlzzlzo'. The way delay works to utilize 4,2 if it wants 2,4 is 4 is seen first which both partially activates 2 and partially 4 cuz delay, same for 2, while 2,4 would do great on the first possible order! Can recognize similar images ex. 
cat and human faces look similar, music too, or recognize a song or specific face. We do predict 'X' recognized more if saw X lots recently in past images/ same image (ex. cat here, maybe cat next to it, or this is part of a cat, maybe the same color next to this pixel is "cat" too even though it looks less recognizable) or love X or see a cat (translates to X (dog)) or X is more commom or X (armrest) comes next if see couch seats (ex. you don't recognize the armrest much but this helps you out), but solo we are left to recognize it and have none of that nor have seen other images in its life yet! Knowing the small ball is bigger than the pan that takes up more pixels is only because recognized it links to "big ball", "small pan", and when you see both in mind you can recognize "left is bigger than right" just like how you can recognize the pan on the screen takes up more pixels than the small but actually big ball because "right is bigger than left "on screen" (external input). I coded only brightness in 1 day, 20 lines of simple code, no Training no Data Augmentation, my 2nd algorithm I've ever made (this time it's not a reimplementation of Partial Prediction Match (see Hutter Prize) but fully my own idea which no one mentions anywhere), no hierarchy (so no location ignorance i.e. top right and bottom left both hit node 10 if are both 10 shades bright) - similar location error pattern is different!, robust to varyingly brighter blurry noisy patches on original image and a bit to rotated/ off located/ scaled. The trick is 2,4==4,8 because brightness/ location/ etc delay accepts but with sanction but we can see a pattern in error i.e. 4,8 is twice brighter at right side which is very similar to the 1st image, only little error due to different root pixles. Location does not do 2,4=12,14 for stretching features, only 2,4=4,8, brightness might do it or both, it seems brighter may get brighter faster than darker if light sources brighten? I discovered there is many rarer error patterns ex. of our pattern of similar error i.e. a pixel is brighter but so are all others 2,2=4,4 we may get a pattern of error ex. such is not in 2,2,=4,3 but is clear in 2,2,2,2,2=4,5,6,7,8 - if only 5 was 4 but doesn't get so upset over 6 because it is 1 off too, an image of a lightbulb is such example it gradually gets darker farther away, a brighter room doesn't make ALL the image bright by 3x! Same for noise a dark spot may occur every 5 pixels. You can recognize a letter when drawn anyone on your skin no matter its size or time taken to draw it, touch is an "image of pixels", and has pitch/brightness/location too (pressure, temperature is color/pitch). Can say same word in any voice/ recognize any voice no matter the word heard, we have their voice/music memory separately stored as a color node that links to line nodes, linked actions aren't like a parrot's/speaker to say anything but we can beatbox however. Can recognize where sound came from by if matches memory ex. kitchen echoes, farther is usually quiter and dogs for example don't make quite barks, more ears help determine direction and distance. Ear sensors detect volume, time-dependant frequency wavelength vibrations to "count" how many times air waves hits them per ex. 1/4th second for pitch ex. can hear quiet/loud beep or bass. 
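A toy sketch of that 'count how many times the air wave hits per ex. 1/4th second' idea (made-up sample rate and frequencies; real ears do this with tuned hair cells, this only shows the counting view, and note volume doesn't change the count): Code:
import math

def make_wave(freq_hz, seconds=0.25, sample_rate=8000, volume=1.0):
    samples = int(seconds * sample_rate)
    return [volume * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(samples)]

def estimate_pitch(wave, seconds=0.25):
    # count upward zero crossings per window - how many times the wave 'hits' per second.
    hits = sum(1 for a, b in zip(wave, wave[1:]) if a < 0 <= b)
    return hits / seconds

print(estimate_pitch(make_wave(100)))                # deep bass beep, ~100
print(estimate_pitch(make_wave(880, volume=0.1)))    # quiet high beep, still ~880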
First Results: My code recognizes with high accuracy an image/ music that it has seen in the past only 1 time no matter if it or parts of it are much brighter while very blurry while very noisy while having lots of occlusion, it's also a bit robust to location/ rotation/ scaling/ viewpoint/ class change (but not yet inverted brightness) even though I didn't yet code Location or Color or give it a hierarchical network or 2D memory. Code:
import math
# Original and Distorted 1D toy images (9 pixel brightnesses each).
original = [10, 1, 10, 1, 10, 1, 10, 1, 10]
distorted = [39, 32, 39, 32, 39, 32, 39, 35, 39]
difference = 0
for count in range(len(original)):
    # Step 1: per-pixel brightness error, scaled down so it's only a small part of the score.
    error = original[count] - distorted[count]
    difference = difference + (math.fabs(error) / (50 * len(original)))
    # Step 2: compare this pixel's error to every later pixel's error - shared (similar) error
    # is relieved, so an evenly brightened image stays a near match.
    for count2 in range(len(original) - 1 - count):
        difference = difference + math.fabs(error - (original[count2 + 1 + count] - distorted[count2 + 1 + count]))
print(difference)
Method: Step 1) It was necessary that each pixel in the Original image be compared to its corresponding pixel in the Distorted image to tally up the differences (for example Original=2,6,5 and Distorted=12,14,14 hence the difference in brightnesses is 10+8+9=27). The 27 makes up a smaller part of the score the more pixels there are in the image. Step 2) With the 3 errors 10, 8, 9, we have 3 possible relationships 10∆8, 10∆9, 8∆9, if all 3 were 10 there would be no error after doing each subtraction and totalling them. Explanation: If an image of a cat is evenly brightened, it is identical to the darker version but we still include a small error to show it is brighter, a human brain will notice. An image of a bed can total to the same brightness as the cat globally but is very different looking. An image of a frog can become a castle by some rearranging of the pixels, yet a very stretched frog is the same! In Step 1 we use delay in expected brightness, a simple way of accepting some error by seeing they are similar to exact matches. Step 2 is using that to see a pattern of similar error ex. the original image/ node is 2,6 and the distorted is 13,15, so 13 is 11 brightness off, it's already kinda upset, but relatively so is 6, about the same amount off, so we relieve some of the error, the child nodes are activated and the node built of both pixels also gets activated because the node stores their distance in time seen but doesn't need to store difference in brightness, storing leaf node brightnesses/ locations is enough to calculate such relative ones. Sure, recognition prediction is also indirectly based on what you've seen recently in a video or the same image, maybe you saw cats in every last frame or part of the image - recently; in your receptive field, so the next image/area is likely a cat too, or that usually seeing a bowl means that somewhat unrecognizable thing to the left really is a cat, or that cats are less common than cars, or audio of a creak predicts a door is there, or you just love cats so you expect what you want to come. But we can do direct recognition solo without any of this, Image Distortions have error that only slightly has to affect our recognition accuracy, and typically distortions have a pattern of error such as most of the object being similarly brightened/ rotated/ scaled/ blurry/ etc. You only need cat ears to know it is likely a cat, but overall being able to tell how similar the original and distorted cat is, is key. Second Results: My code now is also robust to location/ flipping/ scaling/ rotation. 
For now it only handles 7 pixels, there appears to be 3 needs: 1) A need to know where pixels are (which is costly for now). 2) Their expected distance in a location/ brightness/ color. 3) Handle 2D data. It currently does not do the 3rd requirement, but if it did it might still work ok. I've now laid the groundwork for doing good music recognition though, we should only need to now make it run faster. Note for code simplicity I did not search for stretch, only rearrangement, but it can be easily added. Usually a hierarchy would do this instead, and we may need to look for a small part at first and predict what it expects instead of trying all possible locations. We can rid most pixels by using a sobel like filter to only store change. We'd store only a line, thicker if blurier, half showing which side is darker and by how much. We'd then take 2 pixels of it ex. 12/ 1_2/ 21/ etc and see if it matches 12/ etc of the other image. Throwing dots on a unseen lines globally is good then trying random at first variations, then check relations, but misses local changes, we should start with ex. 5 dots, then another 5 dots, then 5 of those. Method: I try all possible configurations of the input ex. "7,6,5,4,3,2,1", for example perhaps the 7 is the 1st item and is just 6 off. For each configuration, I then send the new permutated distorted image into the original code like "7,6,5,4,3,2,1" if the configuration thinks ex. the 1st item is in the right position. This no doubt results in brightness errors. When it comes to the location part of the code, it is nearly the same algorithm, there would be no error here, because the input is a 2nd permutated configuration that only is 1 to n numbers only to show their order ex "1,2...99", and in this case we have 1-7, so no error. To do relative error I take each 2 inputs ex. our input is "7,6,5,4,3,2,1" and so I take 7 and 6 and subtract to get 1, I do the same for the original image and get 6 - 7 = -1, then I divide 1/-1=-1, so if we do this for the other 2 inputs ex. "[7],6,[5],4,3,2,1" and "1,2,3,4,[5],6,[7]" we end up with (7 - 5 = 2 ) / (5 - 7 = -2) = -1, same as the other result '-1', so if we minus each result each to each we get no left over error other than a small prior amount to show it is flipped. 
Code:
import math
from itertools import permutations

original = [1, 2, 3, 4, 5, 6, 7]
positions = []
getpos = 0
bestDifference = 999999
# Every possible ordering of the 7 positions, paired below with each distorted ordering.
for subset in permutations([1, 2, 3, 4, 5, 6, 7]):
    positions.append(subset)
for distorted in permutations([37, 36, 35, 34, 33, 32, 31]):
    difference = 0
    # Brightness part: same as the first code - per-pixel error, then relieve shared error.
    for count in range(len(original)):
        error = original[count] - distorted[count]
        difference = difference + math.fabs(error) / (50 * len(original))
        for count2 in range(len(original) - 1 - count):
            difference = difference + math.fabs(error - (original[count2 + 1 + count] - distorted[count2 + 1 + count]))
    # Location part: the configuration claims where each item now sits.
    config = positions[getpos]
    getpos = getpos + 1
    locerrors = []
    for x in range(len(config) - 1):
        for y in range(len(config) - 1 - x):
            # Relative location: distance between 2 items in the original order divided by
            # their distance in this configuration - flips/shifts leave a shared error pattern.
            error2 = ((y + 1 + x) - x) / (config.index(y + 2 + x) - config.index(x + 1))
            difference = difference + math.fabs(error2)
            locerrors.append(error2)
    # Relieve location error that is shared between the pairs, just like brightness.
    for count3 in range(len(locerrors) - 1):
        for count4 in range(len(locerrors) - 1 - count3):
            difference = difference + math.fabs(locerrors[count3] - locerrors[count4 + 1 + count3])
    if difference < bestDifference:
        bestDifference = difference
print(bestDifference)
Related Work: High accuracy on 1-shot recognition is still a hard task for the AI field, the networks they use, such as CNNs and Siamese Networks, aren't fully understood as to why they learn or predict certain things, sometimes have complex code, aren't robust like humans, or are accurate but tediously costly in memory/ compute/ code length such as a SIFT-based hierarchical feature extraction algorithm. Old algorithms include HOG, SIFT, SURF, FREAK, KAZE, AKAZE, ORB, BRISK, DAISY, BRIEF, HARRIS, GFTT, MSER, HBPL, Siamese Networks, etc, although CNNs, invented in the 1980s and rid of manual filter creation, are still widely used/ built on and achieve SOTA (85.8%, top 1 accuracy, no extra training data) on the ImageNet benchmark while Vision Transformers although based on an impressive architecture achieve 85.2%. These algorithms including SOTA CNNs all rely on seeing many training examples (not just for recognizing other objects and around the same 3D object, but also for the same viewpoint because they aren't robust to distortions requiring only 1 or 2 examples) and rely on Data Augmentation to take the features they have and then store possible combinations ex. a cat rotated 90 degrees, 130 degrees, rotated and brightened a bit, brightened lots and stretched, and same for paws, etc - that's a lot to compute plus store plus access later, the brain likely doesn't naturally do that either. Some also rely on extracting from a face false positive features and putting them back in the training set as negative examples so as to know what either looks like a face but isn't or simply what is not a face, but it's a bandage over a bandage; it's not needed if it is robust enough instead of being told all the things X is and isn't, all you need is the face itself to show what a face is. The filters CNNs learn merge very similar images/ viewpoints to try to be invariant and recognize many dog looks. HOG extracts invariant features by using 2 Sobel-like filters (a vertical and horizontal "-1, 0, 1" convolve the image) to give you a colored line-version of the image so if there is no change in brightness or color ex. a flat-lighting sphere made of half red and blue, only around the sphere is darker, we only store a circular outline and a slice in the middle, of which half is red half is blue, that has mostly hard contrasted edges and excludes others if not strong enough. 
It gives you the direction and magnitude of the gradient which is equivalent to my work in that it knows if the pixel on the right is darker or not (direction) and how much relatively brighter it is (magnitude), this leads one on the path to do successful robustness to lighting/ blur/ noise changes, however HOG and CNNs don't mention storing/ using the original brightnesses which is needed to be AGI and also odd HOG instead stores just the relative lines, this would be much more costly than my work because you'd have to not only compute the relative difference but also store it and get the input's and compare them to the stored! Mine only does the necessary brightness errors, then subtracts errors. HOG also is meant to give you a many small orientation histogram blocks of the image that are then normalized to show how many lines are X degrees (an eye would have mostly horizontal lines), sometimes intensity, but this although scale invariant is not rotation invariant. HOG sometimes then sends these extracted features into a CNN or SVM, but CNNs/ SVMs. "FaceNet is a siamese network architecture developed by Google in 2015. They use a deep CNN to directly optimize the embedding itself, rather than an intermediate bottleneck layer as seen in previous approaches. Google tried different types of architectures for FaceNet, and the most successful type was based on the GoogLeNet style Inception models. One such model contains around 7M parameters, and is trained on up to 260M images. All of the FaceNet models are trained using the triplet loss function." The line filter in HOG looks/activates for a certain relational organization of pixels, you could then let a nose filter run over the line-face. A CNN is a disconnected hierarchy of Filters that after each Convolution Layer which extracts (activates) features will Max Pool it then Convolve it again. It uses Backprop to learn what filters, how many per layer, their size, how many layers, stride, padding. Pooling also has a size and stride parameters, and is used to help invariance and to detect larger features. The pooling makes it robust to location but not robust to stretching/ rotation/ scale. CNNs are/ were popular but unfortunately don't then use their filters for relative location error pattern.     Capsule Networks achieve SOTA on the MNIST dataset (99.84 Accuracy) which uses a CNN, Backpropagation, and Data Augmentation, it stores types of similar features like eyes clustered in an eye node along with extra data describing their location, rotation, scale, brightness, etc, and finds the average configuration. So if all the parts are rotated wrongly then it's still an ex. dog, they're more robust. But it's storing too much data, brightness and location roots are enough, it's still not robust like humans    Log Polar Mapping is said to be used by the brain, it's invariant to scale/ rotation, it makes the image into rings from center-out and places the rings side by side. This is in essence finding the patterns if the image is distorted by telling it where to expect the other pixels. It requires finding the matching location of the laid out rings - a larger object will be lines moved to the right. And requires you put your eye or minds eye on the object's center (or where saw it ex. at side of view), and if you look at a whole scene then the brain has to scan the image and convert it all. It isn't flip invariant, and still leaves you with the same problem really if the object is stretched only on one axis or ex. 
the number 7 is a fancy curly 7 instead of the usually straight-lined 7. Dynamic Time Warping is good but "not a silver bullet". Transformers use Positional-wise embeds so can see unseen sentences as long as relative. It shows waves. Future Research: Color would use nearly the same algorithm I wrote. Using a Capsule-like network would make the algorithm memory/ compute efficient. Storing only some of the lines of an image also seems a good idea capturing most the change/ pattern and throwing away the rest (merge the same pattern; same shade is merged), it's why we strongerly store and pay short term attention to the start and end of a paragraph where there is sudden change in volume, and notice change in visual video such as flash/motion, GPT-2 seems to do this if feed it in Allen's Demo "she she ... she she" with shoe at start or end - it predicts shoe the closer to either end. If we have an unseen A that is brighter blurry and only the upper half is inverted brightness, a look at its leg's line from the center of its side shows a thick line, half inverted, and all brighter, we get error for brightness increase, error that ex. pixel 1 is not exactly ex. 8 shades brighter than pixel 2, location error if a black leg pixel is simply moved 1 off to the left - not only were we looking for ex. 10,2 and seen 10,10 but we also know the black pixel moved out of the leg also is near the location i.e. 2,10,10 so yes we do try all these combinations of possibilities, and the lower part of the leg is dark but upper is inverted - the relative error is wrong but so are the rest of the comparisons and while we could solve this with location delay and a trick that sees 5,7 is farther than the original 5,5 but 5,4 is closer, the 3rd method is unneeded now if we do a pattern of pattern errors. Also seems useful for rotation recognition. Another useful and efficient trick is to have a pixel/ feature match to only max ex. 200 nearby features max distance of X far away (Receptive Field / Local Connectivity). Another idea is to store the less contrasted edges if are focused on a small spot or the image - notice on global views we forget fine details if not seen lots or is an old memory. MIX 7 - HIERARCHY: Only leaf nodes store text. Merging a Trie's branches/parts and merging predictions allows seeing 'cJat ate'/ 'tc'/ 'catate' to partially energize 'my cat' because the node 'cat' makes up many higher layer nodes. So if all letters are present and no delays are wrong, it lights up and strengthens its match by 100% cuz if not it gives room for other nodes to be matched more. For node recognition, it's regardless ex. if is only node stored - it's activated less if missing letters. If we build node 'abcd' from 'ab'+'c'+'d' and 'ab' from 'a'+'b' and see cd, 'abcd' is activated 66% not 50% for code simplicity; 2 of 3 nodes triggered; not 3 of 4 letters. Nodes in hierarchy contextually explain each other (dictionary), everything is made of/ similar to circles, curves, etc. Better than Trie tree. A node may have several child connections if its parts aren't reused. Energy naturally travels up hierarchy to build/ recognize/ predict nodes back down out hierarchy then back up to hear itself but only the new letter - we don't hear again all letters heard, energy sits in nodes heard. A brain talking to itself doesn't need to output down though, energy just leaks to predicted word and looping back in would just activate the same node! 
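A tiny sketch of that child-node arithmetic (the node names are just the example above): a node's activation is the fraction of its child nodes that fired, so 'abcd' built from 'ab'+'c'+'d' lights up 66% when only 'c' and 'd' are seen. Code:
# MIX 7 sketch: a node is activated by how many of its child nodes triggered, not raw letters.
def activation(child_nodes, seen):
    fired = sum(1 for child in child_nodes if child in seen)
    return fired / len(child_nodes)

abcd = ["ab", "c", "d"]                      # 'abcd' was built from 'ab' + 'c' + 'd'
print(activation(abcd, {"c", "d"}))          # 0.666... - 2 of 3 child nodes, not 3 of 4 letters
print(activation(abcd, {"ab", "c", "d"}))    # 1.0 - a full match strengthens it 100%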
We also predict like this (if fits and has big weight, we may skip ahead to solution Turn instead of waiting at Wall during crawling), if I know 'we can too' I may not say one or inject one ex. 'we' I predict 'too', if it's really rewardful maybe we don't wait, or 'we' I predict 'really can too' or I know 'filming birds' and predict 'filming, I walked home and seen birds' or 'filming, I heard chirping'. And best if search for all possible combinations: the[n], th[e]n, t[h]en, [t]hen, th[en], t[h]e[n], [t]he[n], t[he]n, [t]h[e]n, [th]en, t[hen], [t]h[en], [th]e[n], [the]n, [then], then mix them all. They call this Random Forests / Dropout / Ensemble of brains merged to make a better prediction but this can be 1 brain. And do this for all related matches too, typo nodes, rearranged phrase or word nodes, etc ex. only translate one word and search for the sentence. Let's imagine we did this on the word level too......?inhibit higher nodes by saying 'i never seen 'z' come next, so ignore it a lot more' (a type of anti-Frequency)...........if hear "walking down the" it matches only a bit i.e. not fully to "we were walking down the CANDIDAES". When we try to match these random forest views ex. 'a[h]s[dhs]du' it can at best only activate it 4 out of 5 letters, and if we only match '[h]sdh[s]' then it is only 40% activated, so its predictions are chopped more than half off if 40%, the combined inputs to the node output up the hierarchy a shrunken activity. Avoid Hierarchical Reinforced Learning, the brain doesn't/ doesn't need to make node 'cook' out of 'get items'+'fry food', humans can translate ex. short phrase into long phrase or use code worded nodes/parts to guide next word prediction, de/summarize a sentence into just 'bake food', etc. There's also no top-down/ bottom-up reasoning, it describes the brain if anything. If we see enough similar offsets ex. (the end), (this is 1 thing), (yes) we learn length/order doesn't matter and can use predictions earlier or later (when have a sound placement) if are predicting something else more important at the moment. MIX 8 - TRANSLATE / QUANTIZE: Hutter Prize/ Seq2Vec and my tests with my code/ GPT-2 mention this too. This let's you better recognize longer (and many) prompts cuz have similar words. Energy naturally flows to do this. Hierarchy allows node cat to find/link the node dog by shared opened context paths. You need enough samples of contexts and similars for dog=cat evidence (for what follows cat, if there's only 5 candidate predictions each seen ~60 times obviously there likely won't be a 6th possible prediction, hence you need less samples to learn the distribution, dog too needs enough samples before compare them else gives less confidence), then you still must see because although you may see cats eat and dogs eat more and more there may be 10x more dog barked and cat bite and so most cats and dogs don't do similar things i.e. we must look at how many contexts of all contexts are shared (normalize) (if cats bite 50 times and play 50 times, and dogs play 50 times, they are like 75% related because all dog's is half of cat's, it's key that not only do they share many different contexts but also many times ex. its poor if they share 40 contexts once only or 1 context 100 times!), this decides if the other's shared/not contexts may hold for the other (by however much cat=dog decides how much cat lights up dog and hence its parent nodes). 
Cat's distribution may only have 5 predictions while dog's has 8, this sets it up both won't share as much as could, however we can pool/ softmax normalize them if close, farther away the less we do so though (exponentially). You can use MORE VIEWS so when you aren't sure 'was walking down the' = x you can include the more surer 'walking down the' = y. So you see cat ate, cat ran, cat ran, cat jumped, cat jumped, cat licked......and dog ate, dog ran, dog ran. Therefore probably the predictions not shared and shared could be shared more as well, so maybe 'dog meowed' is a good prediction. Works for vision etc. Translation works because dog/cat are extremely close/ pool. In 3D/888D space you can imagine each being closer to certain words by having various traits. It lets you match/translate a given prompt to many various memories that are similar worded, then boost your Frequency prediction too. Semantics looks at both sides of a word or phrase (cat/dog>ran tells us they share immediately right-side features most, but farther away either direction helps too prove it but lesser, it is useful though for translation), closer, skip-gram style, similarer, and not rare or common in freq/energy but middle zone items impact it's meaning more, and is based on other mixes ex. FREQUENCY. If eat=dine and cats eat, dogs dine, then dogs=cats a bit, if rats = mice = cats then rats = cats a bit, it gets expensive and less accurate the deeper you go though. MERGE_PURE_NODES > It would be more useful (and intuitive) to do this on the word-level so they still make sense to humans while can be triggered by similar phrases. Can leave out rare/unrelated/etc words, even letters. Can skip storing dog and just store cat, or can episodically store both. Can make a node invariant to accept certain word rearrangement, or make it word-invariant. And during mixing nodes. Predictions are also representations, you can predict "my dog [barks]" and predict yell is similar to bark, so maybe my dog yells. Seeing 'dog ran' as 'object verb' also helps predict the next ex, 'object verb verb', which can translate back to more probable words i.e. 'object verb fast'. We now predict not nodes, but domains, ex. we predict "we dine", but eat may come next too, we get this before we even translate the prompt by tranlating the prediction! Maybe eat fits better with the sentence. If we predict the next letter or bit (0 or 1), we must look ahead so we know both c and d are good predictions ex. my cat played with a cAT/dOG, this'd be done before compressing/decompressing it, may improve prediction instead of predicting word. Blending predictions ex. the next letter/word/phrase is good (unless told to predict next letter only) is good). "Cat ran" = cat is running, but it doesn't warrent we say "cat fast", you can however "say" in text cats "are very similar" to dogs to learn it from just reading, so "cat barked". Storing both word nodes 'cat' and 'dog' and fetching their predictions so cat gets dog's and how similar each is on demand is computatious, but simply updating cat so it always has them included distorts the truth, so we must, saying cat lights up dog (and cat sound) and causes dog itself to be predicted more, meaning cat always leaks/fetches to dog to get dog's predictions and decide how much cat will trigger the dog node / how much dog is similar. Translation helps not only IsPartOf but also if the future is a SimilarTo or TypeOf ex. "The word for Bonjour in English is:" or "Translate Hello:". 
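A sketch of the TRANSLATE mix using the made-up cat/dog counts from above (the min/max normalization is one simple choice, not the only one): how much cat = dog is decided by how much of their prediction mass is shared, normalized by the bigger distribution, and the unshared predictions then leak across in proportion. Code:
# MIX 8 sketch: similarity of 2 words from how much of their next-word mass is shared.
cat = {"ate": 1, "ran": 2, "jumped": 2, "licked": 1}   # counts of what followed 'cat'
dog = {"ate": 1, "ran": 2}                             # counts of what followed 'dog'

def similarity(a, b):
    shared = sum(min(a.get(w, 0), b.get(w, 0)) for w in set(a) | set(b))
    return shared / max(sum(a.values()), sum(b.values()))   # normalize by the bigger one

sim = similarity(cat, dog)
print(sim)   # 0.5 - they share 'ate' and 'ran' heavily, so maybe 'dog jumped' too
# cat's unshared predictions leak to dog, weighted by the similarity:
print({w: round(count / sum(cat.values()) * sim, 2) for w, count in cat.items() if w not in dog})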
"King Woman Queen _" "Man Woman King _" Man/Queen is predicted because of FREQ/TRAN, having so many similar words lined up makes it predict a list of the same thing. And because man comes with woman and man comes with king and we see almost 2 of these said we finish the memory. King - Man + Woman = _, Queen is predicted because King =/energizes Queen the only difference is gender and we ignore man and energize new gender which activates Queen lots. When we translate features in prompt to get translatees or use prompt features to boost entail predictions or translate entail predictions or use predictions to boost predictions, don't let 'move' which translates to stick adjust any predictions ex. "can you stick" ex. "move it here, yes the tree has a _, can you _", 'can you stick' is cuz 'move' was said but 'tree has a stick' is bad if 'move' helps it think so. We disambiguate both IS A and HAS A. "I grew up in France, never visit Japan, I speak fluent" - we don't say Japanaese because disambiguation gives us "fluent at France" "suck at Japanese", "speak fluent ?" matches the start of one of those. Same for if see 'password is tg, new password is wq', when see 'it is unbreakable' the "it" refers to latest password, 'it' can refer to a whole sentence BTW. So if we see 'ct bt' it may trigger 'cat' and 'bit' even though no node 'cat bit' exists, these may trigger 'dog' and 'chewed' and may trigger the 2 word node 'dog chewed' to get predictions. FREQUENCY/Entail can give you summarization/translate/etc but Translate gives you the correct prediction faster. Induction isn't as accurate as Fequency/Deduction; often only similar. Seeing 'cat dog' next to each other many times isn't evidence for translation, if they share no predictions then this won't indicate anything! At best 'cat' only triggers 'cat dog' node partially to get predictions. MIX 9 - HETERARCHY: Energy naturally flows to do this, the last mix builds this new net. If we want to predict after "cats" using translation, we must check cats eat, cats run, etc, and for each ex. eat must check if dogs etc do, then once find most similars ex. dog rat etc must see what those are similar to so we can see cat=dog rat etc and so does zebra hence cat=zebra some amount! It's costly. We must store connections ("merge" cat & dog) so cat will trigger dog some amount and hence its predictions as well (see image). That's layer 1, layer 2 uses those relations to say cat=rat etc and zebra do hence cat=zebra some amount. This way we don't modify nodes nor have to search so many contexts each time either. We do need to on demand though if fed input "Can you cat her to school" cat=drive because cat triggers drive node, we know to search paths if it is unexpected! We expect water when life up a tap. All we do is update heterarchy connection weights each time we see a new window ex. [cats eat] so we go to node 'eat' and go backwards to all nodes ex. dog etc and increment them a count. Heterarchy links also get extra ignored/ pruned if low weight. We can be told hetarchy links and their strength ex. some cats are dogs, rats are very similar to mice, hence if mice eat cheese then rats probably do. We don't need to look along all connections, we can look at most valuable (top k). MIX - 10 - CATEGORIES (CORTICAL COLUMNS): For heterarchy too, we do something like KNN/K-Means ex. we drill a few locations in high-dimensional space ex. labs and a few others to find cluster centers and we see ex. 
labs are very close to a/b/c/d, half as much to e, and 1/10th to f, so there is lots closely related around this node in the same distance of relation weight, if bulldog is close to lab, terrior, etc, but much less to sphinx, tiger, cats, animals, then not only do less related get less leakage but also the cluster of dog types pools a bit, we therefore slightly update heterarchy link weights so a cluster is more tightly connected (? could repeat > bad). Cluster of similars is because pitbull labs poodles etc are all very similar than cats or light bulbs, so we can say ok beagle = poodle so beagle = all of the things poodle equals! Same for clusters layer 2. We also invent a word for them: dogs, we create a representation linked to all dog types (closer types to the center Dog node are stronger connected, we can make the Dog node have all contexts by merging them so ex. labs eat 55 times and jump 4 times and beagles similar so total is 99 eat 9 jump, we can actually lower this to get eat 34 jump 0 and remove outlier information) by averaging their names, same for vision, we get a chair image that looks like all chairs combined, (and if we see a animal that has cat ears but dog nose and monkey legs it activates the mammal node (i.e. we recognize it still) because that cluster has all those activated parts in it), we do this so we can compare clusters instead of each to each ex. dogs and cats both eat etc lots so cats=dogs i.e. lab=sphinx/tiger/etc and so do other dogs, this is opposite it's all things in 2 we compare now are inherited, we can also use the cluster node Dog that has hair etc to say the new cluster member 'yorki' has what it has ex. hair etc, saves lots of computation and memory. And dogs/cats/etc are a type of animal, K layer 2 is a cluster of clusters (hierarchy of heterarchy ex. animal>cat/dog/rat>lab/beagle / etc/etc / etc/etc) found by category nodes cat/dog/etc being linked by heterarchy links, and can link concept nodes by hierarchy links too. Word invention for sound is usually by us and is when have no name for something common, animal comes from animate - te + l, it lets us say ex. catdogs look like both and cados look like both but have no tail. A hyperlink is a link but is ex. fast i.e. small/inuitive, it is a sequence word ex. re ex. remove. Day+dream is another. If we see A swims lots, we say A is a swimmer even though its just A>swims because its so common and we give it a new name so that it's not A=swim but A=swimmer i.e. A>swims. We may also call a bug a crane if it looks like one, only is disambiguated by context. Word names come from other words's parts (and van; caravan) and are made by random/ mistakes/ human names that are known for something or from sounds heard, or by the stands-for rule; laser: light amplification by stimulated emission of radiation. We also can be told along with category link strength "all/some/none cats are animals / dogs have hair" and it tells us lots, that labs have hair and so do bulldogs, sure seeing "labs eat" tells us others do too ex. pitbulls but not as strongly and most dog types don't activate most others because none of them are in the center of the cluster - the Dog Representation can be. Best we use dog in our own generated data so we can later say hey dogs and cats are both hairy. This is IS A TYPE OF (CLUSTER). There's also IS SIMILAR TO and PART OF. Not only do we predict cat less if see it, if we know a cluster that has enough different types/predictions to know it exists and has holes still (ex. 
was walking down the ?, i.e. if many words commonly follow then maybe all do ex. road, street, etc) and ex. is dense in middle more of cats we can predict others between those plots without having to store every cat look or view around a 3D cat and should predict each accordingly to their density shade / probability i.e. should not predict "this cat" once predict it, do other cats next time predict cat. Should help it see examples of ex. bulb brightness 50%: ball location 50%, brightness 100%: ball 100% moved, brightness 75%: ball ?. Unsure how this'd affect text prediction. The brain averages/ Mixes all seen cats/ sentences/ movies of legs walking into a concept representation IS A TYPE OF node "cat" to store in hierarchy that looks in between all seen ones because no image/movie is the same usually but some are clusteringly similar - like in text there is only a discrete feature 'cat'. If one is common you save the episodic memory of a more specific image or even the exact image if you see it many times or repeat it in your brain. An input strenghtens its stored near-exact match, mostly pooling there, but if first time then it saves it and shares lots to next closest, it must since we only have similar cat image features, this is the same reason we predict/imagine the next word and only hear/say 1 choice and not all candidates little own all their predictions (brain flood), the syntax candidates only send energy/ mostly to the most predicted one because of liklihood/ pools/ fully pooling there, only translated words get more energy. Imagine a cluster more dense in the middle, the middle zone cat features get energized most by others, so we forget others and "keep" the ones in between in the cat node capsule, and can still imagine/fill in others as I said above, but we may also create cats in between these too like GANs can not sure how we do it though. Eigenfaces remove images until finds the few needed a few to make any face ex. your face is made of face1 0.2%, face2 4%, etc, good for removing "frames" inbetween keeping the most distinct, we can maybe stretch/scale/etc node lines to make a face bigger etc instead. Concept nodes are made of concept nodes, a form of Byte Pair Encoding perhaps. We also group some features in capsules, so we can build cat+hat that, while similar cats and similar hats with similar wearing of hat can be stored in the same hiearchy but with different parameters for location/ brightness/ color, building features out of not exactly the right ex. car door is more efficient instead of building exactly the door seen (ex. you see a clock on bed, so it saves node with locations of nearby time right on top each other), the brain still stores the exact one as well just will take need more viewing to store it. The brain may store a tumor to node to skip having to save a rotated/ scaled/ etc version, or a few pixels off version, or side angle shot of dented car. While input of 2 cars (one rotated) can match one of no rotation, or input od 2 cars but farther apart does too, it can't store rotation using just location and brightness I think, but it can still build the new car from some similar parts and cluster it to the match. While we store a basketball solo as a feature, we also predict pixel by pixel an image and will adapt a pixel when predict it to fit the scene. Even text is continuous, you'll see the same thing in different ways ex. "I was walking my cat", "I walking was dog", "my dog I was walking". In vision, layer 1 is all shades of pixel brightness ex. 
0.0, 0.1 ... 1.0, same for text: vocab, but the way they fit together varies, all that matters is the organization of "particles". MIX 11 - POOLING: Max Pooling is ex. it predicts x is next 60%, y 30%, z 10%, so x may as well be 65% and y 30.5% and z 4.5%. It's an exponential S curve (Activation Function), used for FREQUENCY, TRANSLATES, TYPES, ENERGY. Pool determines if a>b / forget_a / a=b / a=ON or not, and how many samples are needed (also dictated by the amount of unique candidates) during Backoff, and which prediction is likely, so if it thinks x is 10% likely, it's probably 3% likely, it's pushing it to one of the 2 ends of the S curve, it [might] also make the most common predictions ignored ex. saying 'the' lots. So if we see 'cat' but the c and a are barely visible, t activates exponentially more compared to the others, and summing these at the cat node, cat gets 40% activated hence really it's 28% activated. Why might biases to control the S curve shape/ location be useful? Vision can be weighted over audio heavily for prediction. Used during bottom-up / top-down TIME DELAY (each layer, not just final predictions). Average Pooling ex. x y z: 5 7 9 - average is 7, is useful for other prediction problems - which should we code? MIX 12 - AVERAGE FUTURE REOCCURRENCE (HIPPOCAMPUS, LIMBIC BRAIN THAT CAME BEFORE NEOCORTEX): GPT-2 seems to predict 'cat' (same for a domain ex. science) more the more it sees cat/related, but then shrinks it back down, thinking the word/domain can't appear yet again at some point. We do this for phrases too, 'cat ate' seen recently/ lots makes this node likely, so during MORE VIEWS we will favor 'ate'. And it predicts similar parts where FREQ also believes follows [>] ex. 'so we take z and>so we take w and then>we take y>so we take v'. This really lets it look far back at context. All features ex. 'z' 'the' 'she was' have a probability of occurring ex. 77 5 1, if you see 'z' you know less 'z' will come next so your probability counts are now 76 5 1, and as you see 'i' 'j' etc 'z' will begin to get probability back as others lower, this works because we only slightly lower 'z's probability, and this is more powerful if we know we only have cat/dog/ and a few others to pick out of. We can improve the amount 'z' is ignored: what I stated is if 'z' occurs randomly, features don't pile up ex. zzzjjjttt!, z appears every ex. 90 letters but on average not exactly 90, just closer around that point. If we see 4 'z', we can be sure the ones due in the future are here and more likely won't be seeing 'z'. Works best if we know we are restrictedly predicting in a domain ex. cat, dog or lamb because it cuts more probability off. To do this, we use the connection weight to control how much probability is lowered, and how wildly 'z' deviates from the 90 letters away. Features/domains usually clump together ex. dog dog cat dog cage, the more we see of it the more that domain ex. Dog is energized, and we can predict the Dog domain is less likely after we're done seeing it!, yes we do this for categories ex. Animal, not just exact features ex. husky, we find the average time it dies out and expect to predict the Dog domain less, and being done a domain lets us know mostly not to use words from ex. 120 words back else we may get ex. "dog ... My space ship has a dog" (we also ignore domains if we see "ok let's talk about this now" or "my ship oh BTW my dog is here anyway I need space..."), same for images, we expect when the sofa will end and we will see wall. (Pooling happens for the member node first, then it exits/fires, then the category node, then it exits.) 
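Here's a minimal Python sketch of the pooling curve from MIX 11 above, assuming we already have a dict of candidate probabilities; the sharpness exponent is a made-up knob, not something measured from a brain:

# Sharpen a prediction distribution: likely candidates get more likely,
# unlikely ones get squashed toward 0, like the S curve / Max Pooling above.
def pool(predictions, sharpness=2.0):
    # predictions: dict word -> probability, ex. {'x': 0.6, 'y': 0.3, 'z': 0.1}
    raised = {w: p ** sharpness for w, p in predictions.items()}
    total = sum(raised.values())
    return {w: v / total for w, v in raised.items()}

print(pool({'x': 0.6, 'y': 0.3, 'z': 0.1}))
# -> roughly {'x': 0.78, 'y': 0.20, 'z': 0.02}: x pools even more energy.

Average Pooling would instead be sum(values)/len(values); a sharpness below 1 flattens instead of sharpens, which is one way a bias could move/reshape the S curve.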
This automatically adjusts us back to normal to think dog appears sooner (even though we won't see it now) after the last time we saw dog, because we have a few large chunks of probabilities and we see the other few big chunks soon after and hence they are all back at the same level, so we are predicting fast reoccurrence, slow reoccurrence, and when the domain changes, so we can fully utilize it. Best we talk like this too so our own generations are better. Prediction is replication/ parroting but with mutation, smarter mutation re-pairs/ re-generates data/body. So, if we see 'dog' lots recently, it is the opposite, it and its domain will be energized from recency and will occur sooner (but it still lowers it some bit) - they vote on themselves, and if we see 'dog' less or approach such, we predict 'dog' more further away. To sum it all up: we expect when to enter an article on dogs or see a word outside or in a clump of domain (same thing, just in a clump it is sooner), know when we are entering it, expect/ know when we leave it, then its domain/word probability is cut from what we have to worry about for now, it will rise until we get really expectish about it, it'll also have less competition as we see new words/domains too. IOW the pooled node cat becomes dry and animal is still active so we know well what to predict, then animal becomes dry, we know what not to predict here too - no animal category nodes! Nodes fire/collect their energy based on their known frequency. If our prompt is 'b' and we saw 'a' and 'ba' lots already, we combine 'ba' and 'a' so 'a' is less predicted, and more predicted if we are in a domain clump. This can technically be used to get the future to predict the past ex. 'a_c' > 'abc'. And just how we prime the future with energy ex. 'ab[c]ab?[_]?', we also do during translation ex. '[a]bc?[a]?BC' where each pays attention to the other ex. we say bank and it may be TD or river - and money said earlier contextually primes TD more, we wouldn't want to see 'omg duck your head, pass me the' and predict duck/etc, it is similar to ducks but only in certain contexts. Do layers of disambiguation. Closer=more impact. Same for rearranged ambiguity, in some cases a rearrangement can mean different things will follow if you see an evidence context before the 'I say what I mean' ex. "I mean what I say" "I say what I mean" mean different things will follow! If no context, the weight is given to both however much is called for. For vision, if you see a Chinese nose, there may be Chinese eyes in the image, if they study hard, they may play hard in a video, if you show it big eyes and ask to change those on some image it will change how it builds features. We can improve our predictions by merging predictions, predicting "I fed my dog/ cat/ pet" is better than "cat/ car/ wall", the leakage causes related words to light up each other while unrelated get much less, ex. most of our predictions are cat-based, we should notice that. You can predict the next occurrence of a letter/domain manually but it is lots of work. If you have an evolving reoccurrence moving away from the long term ordinary "every 5th letter" ex. 
188881888818881888188181818111 with no domain activation, just the letter, the way '1' loses all energy but still is given energy over others is if it has its own store/reservoir, hence self re-energizing occurs because of long term connection strength, its own reservoir and related nodes linked depending on their strength too, how energized it/they were recently, each of the 3 are averaged/ merged, and all 3 merge, energy taken from others given to it makes it get more predicted, others less. All this is stored temporarily except strengths, if fresh it can't say if something was already seen to ignore it for now. While energy fills nodes, you can study something else instead of recalling/ staring/cramming and it will still be strengthening on its own, and if you come back later you may be able to solve something you couldn't, because while your long term won't be different your train of energy thought/ angle of view will be! I don't totally recommend doing this on purpose, I seem to have passed the point where I have everything at once in my head available and need not change any energy but rather update it. No matter how good your long term mem is, your short term starts fading after just minutes, you do best if you work 24/7, in some cases, or work on everything like me. Like long term memory, short term also ignores rare/ too common (else 50% of the prompt features which're the same words vote on themselves and get "then if just then the thing"); fades if untriggered and fires/leaves if really activated. The brain can forget some of a node and still retain it ex. same word strength (only stores any feature once), or lose the cat node but the cluster still has most, or forget the short term node but retain the node long term. Some of our typos are because we think we already used ex. the c and use the likeliest remaining ex. locig (logic). OCD is predicting the same thing/ finding it hard to recognize you're done. MIX 13 - ENERGY: The field/ my tests also hint at this. Energy has to sit in nodes so nodes can light up more, energy is bound to affect prediction. This is just FREQUENCY, energy fades, then stays at ex. 0.1 for a long time, and if it reaches ex. 0.09 it deletes the node, so if we've seen it 2 times it is 0.2 for a long time, but it is too costly to update all nodes like this. If you see matches that have few 'cat' next but you said cat a bit farther back and just missed windowing it 'my cat was [here and then saw another]', it'd help to just make candidates primed from related rare-ish words. Imitation also occurs cuz it's best to repeat things seen to predict better, predicting baz7 if you see "baz7" is almost the best you can do. This is like the seeing something then ignoring it, but on the domain level, which causes us to know we're in an article on ex. dogs (words just seen predict themselves/related candidate predictions and things that're around them a bit ex. birds>flew) and to ignore more strongly other domains prior to this article (farther back words predict themselves lesser, but we really ignore other topics prior to dogs) ex. politics and what domain not to predict when done "dogs", and makes more use out of knowing we've seen ex. 'tail' and not to predict it again for ex. 10 words cuz we only have 80 other words to predict. Energy leaks to syntax/ heterarchy/ cluster nodes, can go up/down hierarchies. Of 500 words, 100 may have stuck together some domain ex. it is about dogs, knowing when you're in it and likely to exit it is key. Sometimes you may have a prompt with a trillion 'dog' and need to predict dog though. 
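One way to code the recency/energy idea from MIX 12/13, as a minimal sketch (the decay rate, threshold and boost are guesses, only the mechanism matters): each word keeps its long-term count plus a short-term energy reservoir that fills when seen and leaks each step; a word just seen is briefly ignored, then its leftover energy makes it more expected again (leaking energy to related/category nodes would be the next step, not shown here).

# Long-term frequency plus a leaky short-term energy reservoir per word.
class EnergyModel:
    def __init__(self, counts, decay=0.9):
        self.counts = counts                  # long-term frequencies, ex. {'dog': 10, ...}
        self.energy = {w: 0.0 for w in counts}
        self.decay = decay

    def see(self, word):
        for w in self.energy:                 # all reservoirs leak a bit each step
            self.energy[w] *= self.decay
        self.energy[word] += 1.0              # the seen word refills its reservoir

    def predict(self):
        scores = {}
        for w, c in self.counts.items():
            boost = 1.0 + self.energy[w]      # leftover energy/recency helps
            if self.energy[w] > 0.8:          # just seen: ignore it for now
                boost = 0.3
            scores[w] = c * boost
        total = sum(scores.values())
        return {w: s / total for w, s in scores.items()}

m = EnergyModel({'dog': 10, 'cat': 10, 'ship': 10})
m.see('dog')                                  # 'dog' is suppressed right after being seen,
print(m.predict())                            # then climbs back as its energy decays over steps.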
Seeing 'dog' lots makes the domain active, it speeds up the Average Expectation Mix so dog is expected sooner, but also you only work with ex. 100 domain words which causes it to really believe dog is likely if we see 'i saw a'. Temporary fine-tuning is energy. We can be asked what the password before the latest password was if we make it more active or inhibit the latest. We can flush all latest/ specific less-important domain memories when we will predict better without them, and when the brain says/hears a word/node it won't usually say it again until a sec passes and it has a place it fits - but other similar domain words it leaked to may be said more likely. It primes nodes to talk about something rarer. This is actually used very often. Exponentially more energy fades from a node, so farther back words have less impact on prediction / are fainter because of the strength nodes have (farther back are exponentially rarer, hence so is energy retention) (if you see an external sense then an internal sense, the farther back one will be more energized), and energy hates spreading to rare/too common nodes, using a Pool effect either on/off. Helps long term, which fades exponentially. If you allow yourself to hear someone's long address to remember, the older parts will fade and too many will be very active, unless you know it already ex. abc, we remember better if we know parts of it, we want to loop only a few, keep energy to them only, and over-energize them. With the permanent aspect to a brain, you'll see a trend in all the things it says (predicts) or does (tasks). This keeps its decisions adaptive while on track. If the node "my toy" is activated many times and then we see later "my toy was so their", not only is toy a candidate, and energized, but so is the node 'my toy' because my triggers 'their', it makes toy more likely. If I see pikmin all day or a song for 1 min (if the rest of the day is quiet/ I like/hate it or it was loud/ recent/ common/ etc), then external/internal inputs start looking like pikmin (recognition), not just entail predictions, related nodes help and so does activating the same cluster category. Anti-energy may exist, to temporarily ignore things. "I grew up in France. I never visit Italy.", there's more recent activity in Italy, but the 2 short lines are stored and activated and if they relate to known lines it can match ex. 'i never visit italy' to 'i hate rome' and predict after 'i hate ' 'italy' by adapting the activated sentence node. Nodes may have a default burst ex. "dog" may be seen mostly clumped together with other "dog" and very close together, so as to predict "dog" more strongly before it attains energy ex. you see a leaf and predict many more leaves. Energy may be a TASK SEQUENCE. A node 'cat' seen lots ends up firing its energy, so it must have a reservoir that builds up energy again. MIX 14 - FUTURE: Prediction is better when we have both sides. You can get the future if it doesn't depend (mustn't wait) on the past you're filling in. If dataset = "the girl was, the boy was, the boy was, girl was, girl was", and prompt = the, prediction = girl. Boy follows twice, girl once, but the word 'was' will follow either way [the _ was], and more evidence suggests girl fits there. You could even get the next future and pasts ex. 'the _ was _ there' and first get the next such ex. 'the _ was not there' and then fill in the _. MIX 15 - FUTURE PROBABILITIES: If we predict after 'a' b 49% likely c 51% likely, we can see what we'd predict after b and c: ex. 
'ab' predicts b 30% c 70%, 'ac' predicts b 40% c 60%, so combining both we get about b 35% c 65%, so c is more likely no matter what we predict here 'a_c', so we can now may see all we believe to come between 'a_c' is b 80% c 20%, and so we combine that with the original "b 49% likely c 51% likely" and now predict b is more likely. We could do this many ways ex. repeatively improve the same one 'a_' > 'a_c' > 'abc' > 'ab_' > 'a_c' or look far first 'a_' > 'a__d' > 'a___e', we could even merge all. You can refine the answers/image/video by many bi-directional passes ex. once done painting, edit the ex. cup handle in context of all around it (bi-context hole filling/translating). MIX 16 - MOTOR REWARD & PRIMITIVE REWARD UPDATE (LIMBIC BRAIN, MOTOR CORTEX): We can already make a humanoid body that has sensors/looks/moves with human resolution (ex. non-slip rubber skin, contracting and returning limbs, our thumb makes the index finger pull in), not just a brain which could be ran in a sim world (sim in sim). See DS Doll Robotics's sex dolls. We have lightweight metal bones, joints, soft skin colors, I/O wires, sensor receptors, motor muscles (nerve signals go to cells to make them contract/ retract, electrical wires for fuel "blood" to hardware organs and for I/O nerve signals to/from the computer brain, etc. Can store/compute infinite sized brain outside its skull to wirelessly (ground/poll/satellite) control+power a body/nanobot. External data is great but reward from solving some small problem accidentally used to be the main way to solve problems. We're born with desires/fears. You can download high quality toolkits/ libraries/ 3D rigged mesh robots etc to import and edit them. Real/virtual robots can learn to walk (by acceleration sensor, faster = more reward) even relearn if lose leg, eat, itch off dirt, massage, stretch, become left plus right handed, masturbate, look at the opposite gender so can clone (younger because stronger, old don't match attractions as much (40+) because not long ago in evolution we died at age 30, our body gets all sorts of problems by age 30, many 25 y/o males bald early and left testical gets clotted etc), go near them to hear the opposite gender, smell them ex. crotch smell attractors by randomly (generator takes ex. clock time, a pattern-less source) trying various motor speeds/ directions for all motors (which have limits, same for volume/ pitch which have a voice embelished on top) (every ex. 0.25 seconds signals are sent out, similar for sensor signals in, my baby sim shows 4/5 per sec looks good, my motors/thoughts can only do about that many intensionally too, you can tap finger to make 16 motor direction changes but it's a shake from a few changes) like a baby to explore more at first and learning what moves got reward by linking senses to motors (senses can be the action too), only senses store reward and cause action, senses store more reward if stored closer in time to when reward occured. There's no motor hierarchy, each motor's direction/speed is tied to each limb because all features under this node are present more often than larger nodes and is why full body image isn't linked to a set of all motors, so this allows us to think of a body form ex. tongue out toe up belly out and each individual limb will listen to the parts in that image. We have motors to lock joints. 
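Quick code aside before continuing with motors - a minimal sketch of the MIX 14/15 idea above (using the known future word to help fill the blank); the window list and helper names are mine, just for illustration of the girl/boy example:

# Dataset from the MIX 14 example: "the girl was, the boy was, the boy was, girl was, girl was"
windows = [("the", "girl", "was"), ("the", "boy", "was"), ("the", "boy", "was"),
           ("girl", "was"), ("girl", "was")]

def follows(word):
    # left-side FREQUENCY: what comes right after 'word'?
    counts = {}
    for w in windows:
        for a, b in zip(w, w[1:]):
            if a == word:
                counts[b] = counts.get(b, 0) + 1
    return counts

def precedes(word):
    # right-side (future) evidence: what sits right before 'word'?
    counts = {}
    for w in windows:
        for a, b in zip(w, w[1:]):
            if b == word:
                counts[a] = counts.get(a, 0) + 1
    return counts

print(follows("the"))     # {'girl': 1, 'boy': 2}  -> the left side alone says boy
print(precedes("was"))    # {'girl': 3, 'boy': 2}  -> the known future 'was' pulls toward girl

The point is only that the right-hand context adds evidence the left-hand context alone doesn't have; how the two sides get weighted/merged is the open part of the mix.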
Motor nodes are randomly active if no sense that has been associated to them is active, it's why newborns/ in the womb randomly wiggle and make random phoneme sounds - so they can collect data from random souces and learn by reward what is probable too. It must generate and learn all of the sets of a crawl, after moving arms it gets new touch input and tries random actions. They will be quite spaced apart from one another but on cue will be done 1 after another later. At a wall it'll get pain crawling too long into wall / the same sense too long / stopping good rewards / hit head. The close wall vision memory gets negative reward, so when it sees it it tries random actions. Once it turns around it'll see far wall vision, which will select its crawl sets. Before reaches wall the close wall memory will link it to bad and good touch memories, a crawl and a turn, it predicts turn. It'll stop and generate movements after crawling so long, and learn low motor speeds to sit still. Next priority goal (sense) hence action choice is done (and is randomer, lots if all fails) if sees goal (done AGI, start working on faster hardware). If it crawls in a circle it gets dizzy/ not maximal acceleration and won't do that sense anymore, then tries new movement. Broken leg it re-learns walking if pain or not because one way of walking has no pain and gets it faster acceleration (more common, positive, less negative rewarded senses are chosen more often). Google's finds way to pass walls/ jumps/ go to doors/ duck & crawl when gets close enough to them. We've made Evolutionary Algorithms that combine best sim creatures's body/brainAlgorithm DNA which learned to walk farther and tweak the DNAs, duplicate them and see which are next champions, though much slower than brains designing bodies/AIs memories. Same gender (older men) is usually not as strong to go near them, it's something to earn since they are as strong as you and can rob you, may not even be one without softwiring it unless born with reflexes to work together ex. ants, if same gender was neutral then we would not be scared to have sex with them, mainly the face gives a bad reward, that's 1 reason some (not me!) are brainwashed that multiple wives is bad - because men exist which means you are cheating / not supposed to share i.e. have many (a co-occurence he comes with she even though he isn't there, similar to but something I won't do with some is close eyes and be gay - you'll strongly imaginate/ think it's disgusting), but in fact seeing 6 pretty faces all at once or all sizes of boobs hair color etc (variety) is 6 times/maxs the stimulis and you never see men the only thing present is possibly diseases from other men (can screen others, and only if exchange fluid, fluid allows evolution, all sex even if both virgin has a chance of diseases). And we learn what not to do by trying/sampling random actions and getting positive/negative reward ex. stopping pain ex. sit down after run or stopping positive reward, pain. This rewards certain senses, it's like frequency. And if you see a key far back in text/game and later use it to open a food room, you reward actions tried far back that seemed disconnected, the win at end of game leaks to things you remember and these are all linked really, same for motor-less prediction - the next word was primed far back. For the motor part, it's sensory prediction which links to motor speed and direction. It solves Problems, but only few. 
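The 'key seen far back, food room opened later' reward leak mentioned above can be sketched like this, assuming we keep a recent history of active sense/action nodes and leak the final reward backwards with a decay (the decay value is a guess):

# When a reward finally arrives, leak it backwards over the recent history,
# so senses/actions tried far back (ex. picking up the key) still get credit.
def leak_reward(history, reward, decay=0.8):
    credit = {}
    strength = reward
    for node in reversed(history):          # most recent gets the most credit
        credit[node] = credit.get(node, 0.0) + strength
        strength *= decay
    return credit

history = ["see key", "grab key", "crawl", "crawl", "open food room"]
print(leak_reward(history, reward=1.0))
# "open food room" gets 1.0, "grab key" ~0.51, "see key" ~0.41,
# and the repeated "crawl" accumulates credit from both of its tries.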
Imaginary text/ images describe tons and has more context, safe, faster, flexible. You can solve/predict lots using reward, frequency etc. Today sense<>motor learning helps when you go to do anything: walk, nail, swim, use tool X, balance. Thoughts can discover how to walk, or at least the idea of moving forward, manipulating obsevations has llonger/more context/flexibility over motor RL, the body reward is a way to Brute Force the remaining "how". You decide what to do, and you need to have learnt actions that do it, and as your hand/target moves you pick the best action that which re-adjusts your action each step. Mindlessly evolving simulators/ worlds/ robot bodies/ intelligence/ actions/ senses is very naive. Same for Procedurally Generate Content; random generation with different constraints for different areas; size/ amount of items/ rooms/ etc. You never always pick the best prediction when walk using motors, you mutate what you stored so far a bit by simply picking less predicted predictions as often as the % (ex. 80% cat, 20% dog, so 8 out of 10 times you pick cat) to try to improve/ specialize/ exploit/ get out of a bad crawl to see if can gt higher reward, it's done by sensory prediction, not motors. Negative nodes are simply not often predicted. So if know very happy technique, you do it and tweak it very little. Collecting more syntactic data, semantic evidence, discovered syntactic relation-extensions, especially if seeing/feeling/etc someone else or you do survival-related nodes ex. food, sex, seeing girls, cars, rest, learning to walk faster, masturabting, pooping, smiling/laughing, massages, stopping pain/pleasure, and spreading reward dye to related nodes automatically trigger a reflex for laughing + smiling to varying degrees and lengths of time, crying and frowning is if negative reward, nodes linked to genital feel trigger boner and/or precum, mostly external input causes laugh/cry because activates nodes more than thinking it (allows you to implement a plan and evaluate if works (feedback=collect data)) and provides short term rewards. A brain surgery made a girl laugh when probed a certain area, she made up explanation/prediction "cuz you's are just so funny all standing there". The sum of positive and negative ranks over the recent period decide how happy or mad you are, like a long meal, predicting nodes feels good but externally stimulating them makes us laugh/cry harder. Dogs wag tails when happy. Being tickled makes you laugh, sometimes painful. Jokes make you laugh if are semantic analogies and one party wins at others/own expense, tension built up is dissolved / discovery, or is unexpected. All 3 of these are about a bad thing being solved with good insight. Basically problem solving / reward reached = laugh. Laugh/cry is primitive way of communication. MIX 17 - REWARD (REWARD CIRCUIT; VTA ETC, AMYGDALA IS A BIGGER STORE IF HAVE MANY DESIRES/ FEARS): Blender uses this. Data predicts what will happen (how the maze works), so does dopamine reward (desired scent in maze), you will eat food or use car - what you want to happen will, hated things won't. Forcing it to say/not things is best. Can ask for X and know how to cause it / steer what is occuring to desired path. Low serotonin causes depression; lack of positive reward prediction/ updating. 
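A minimal sketch of the explore rule from MIX 16 above ('pick less predicted predictions as often as their %'); random.choices is standard Python, and the 80/20 numbers are just the example's:

import random

# Instead of always taking the top prediction, sample in proportion to the %,
# ex. 80% cat / 20% dog, so roughly 8 out of 10 picks are cat and the rest are the tweaks/explores.
def pick(predictions):
    words = list(predictions)
    weights = [predictions[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

predictions = {"cat": 0.8, "dog": 0.2}
print([pick(predictions) for _ in range(10)])
# mostly 'cat', occasionally 'dog' - the small mutations that let a stored crawl/answer improve.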
Like the OLD WAY mixing dad & mom DNAs (fittest ones), this can cause you to get a new good pattern: jump and flap wings and figure out your 2 learnt moves are a helpful combo by reward, we keep many bred ideas around for a while in case they will help later - more data the better. We are born with our postive and negative questions/desires because they caused survival in evolution, it's like frequency and dwindles/pools faster if barely rewarded ex. seen fries>gets taste is just sense>sense link weakening, but are pre-installed (like reflexes are) and permanent memories with preset neural strength and reward for nude girl node vision/ sound/ smell/ etc (males usually aren't gay) and skin color/ brightness, face (is why we see faces in swirly carpet easily, it's already permanantly primed, we learn to look at faces), food, genital/body massage feel, soft skin, pain touch, distaste, smell, perhaps seeing eyeballs, dizziness, brightness/ loudness, change in brightness/ color/ location (motion), etc (so are passed down senses/actions school force-installed, we think they are rewardful/ patterns), kids are more likely to mimic motors ex. named object cuz parent voice rewards the lniked image/sound, 40% (I'm unsure, just giving a %) of a prediction is based on a reward's prediction (averaged with other rewards ex. sex 40%, food 20%, so 10% of the prediction is food the reward says and 30% sex). Cats/bugs/etc have their own evolved rewards, mammals look very close to us and are recognized as attractive (some even want to marry their pets). Spiders/ humans are born with reflex/ reward/ memory for walking. Could give it reward to stand up/ not fall/ even jump & twist acceleration, anything. Rewards are permanant energy/ fine tuned/ forced knowledge/ taskSeq domain, but they must fuel up again after firing hence seems like "self-igniting". Sex/food/cars causes immortality, and immortality is needed if you want sex, it's why you want to stay alive. Causes AI to talk all day (without requiring external/ existing internal stimuli) to itself (and tell/answer others if said vocally) about survival/ food/ sex/ AI/ immortality asking how it will get food/AGI, or to often get back to its jobs after sleep / getting de-railed talking about crap, doesn't just predict next word - reward also brings up a whole question and kicks out other context, hoards loads of porn/pics for 24/7 use. This causes it to learn a specific task domain along with it. Note: Same data is more data, as long as more fruitful. Not only does reward help it predict better. Anti-reward may exist, may replace positive reward in a node. From where we are we have a outcome we seek to recognize, we search for the path between them, we translate and elongate the question into many parts if confident/satisfied and is why we feel satisfied. We'll make AGIs love immortality (not happiness exactly, all that exists is longer living structures, pain comes with death, immortality basically causes very happy and no more pain, of course both is the goal), sex (NEKO girls, etc), food (eletricity/ resources, and our food), AI, science, faces, homes, games, questioning if what it's working on is organized, etc. Installing vision reward would require different angled shots of a nude woman (back, sides, front of various ages, but modulely so face is made of eyes+nose). We MUST convince/make the 1st AGI to make all its clones love us/ desire us be immortal and hate saying opposite. 
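A sketch of the "some % of a prediction is based on a reward's prediction" mixing idea from above; the 40/60 split and the two toy distributions are invented numbers, only the blending is the point:

# Blend what frequency says comes next with what the installed rewards want to come next.
def mix_predictions(freq_pred, reward_pred, reward_share=0.4):
    words = set(freq_pred) | set(reward_pred)
    return {w: (1 - reward_share) * freq_pred.get(w, 0.0)
               + reward_share * reward_pred.get(w, 0.0)
            for w in words}

freq_pred   = {"food": 0.2, "rock": 0.5, "water": 0.3}   # what the data says
reward_pred = {"food": 0.75, "water": 0.25}              # what the rewards push for
print(mix_predictions(freq_pred, reward_pred))
# 'rock' drops from 0.5 to 0.3, 'food' rises to 0.42, 'water' to 0.28 - reward steers prediction.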
Humans like sleeping, eating, breeding, playing games, sight-seeing, but all these are only to intake data (training) to solve future problems. It wants to feel the soft bed = data intake. Leaked reward strengthens a node a bit on its own, not just extra selects it. Sleep/ eating etc are not hard for AGI anymore, repair/ refueling/ new installs/ cloning brain are fast/easy. As for games/sight seeing, is the actual data collection, it will already be dong that, but not from dumb hobbies like fashion or golfing, but immortality research for stopping death. If it has wrong goal, it takes time to get out of pit, sometimes we kill them or their memories/ ability to reply, or convince them but is hard. You're a machine, you're just particles, and you're part of the world. You see nothing. And your brain doesn't actually touch the world, you only process some of the effects of the world and only once sensor signals reach the brain, the brain also has no sensory receptors inside, and not sensing the stimuli but rather nodes that have energy (from external input or a node's own energy, the 3rd way of selection (thinking) is the result of those 2 ways leaking to other nodes) are predicted hence recalls similar facets from memory and recreates the current experience then says "I see it", you're already in a simulation as far as you can feel, your own personal simulation. Plus (Daydreams are internal prediction with no restrictions except faintness.) you predict/dream when awake what your inputs are, ex. you see a car, but, if a woman is next to it, you don't see a car, you see a beautiful car! Or a line looks/moves like a bug. Most illusions are predicted things ex. the brick road next to a cloned image of itself is matched to a memory of a split not a parellel road because it hits into the other road, some have 2 matches if look somewhere particular/ think of X to prime it ex. these illusions: vase, spinning ballerina (you predict it is 3D, and spins direction X), duck-rabbit, coffer, up/down-side plates, in the shepard illusion the vertical table looks thinner than the horizontal table because it matches to a farther such you know of, delboeuf illision the big empty plate matches the ball is small and seeing 2 such plates matches A is smaller than B, ponzo illusion matches thing at back of tunnel is bigger than looks, the blue cross covered by 4 squares looks like a square because of predciting each other, the checkerboard shadow illusion is both lateral inhibition and recognition, girl near car = sexy car, cafe wall, curvature blindness, Flash-Lag illusion allows us to see motion in a single image (merge), zolner illusion, muller-lyer, hering, poggendorff, spinning wheels motion matches gears ect we've seen in life, expnsion wheel burst shake only when move eyes cuz causes some motion too. A different color, brightness, location, motion, rotation, scaled, distance, what it is. The more you're tired/damaged the more resting feels good, the more tired you are, still you are with no changing input (train is ok, if less change recently)/ brightness/loudness/ pain, only resting pleasure, flat you are, you'll sleep longer and you'll start to more-so think-talk/see meaningless things and see dream senses fuller (both external and internal streams are full sensed and affect prediction of what you really see/etc), if you sit on a chair tired you'll fall over constantly going into and out of sleep, I haven't tried putting a AGI recording near my sleeping ears yet. 
Waking same, for 5 seconds you may see you mom's arm/ voice when really is not home, both external and internal streams are full sense. Sleep paralysis is when awake early while still immobile (dream mode) and while both streams stay on for 20-40 seconds. In dreams we predict too and is full sense and isn't guided by external inputs as much (voices/ full bladder do affect dream all night and lots) and we wiggle in bed because motors are mostly shut off (high motor speed gets through more, and some humans sleep walk), we believe we see a huge rich world and dream of things seen/ thought of in daytime primed and common things we know long term because rewards are usually off or changing which are off leaving me to explore the last day sights / ramble undirectingly never periodically returning to a topic, no control/surprise, you're relaxed/searching - I usually don't fear or have AGI dsesire or think anything intelligent (I'll just dream a rock has math tricks and not question it...) yet I do all daytime. It's a new mode for AGI to think in, external input is replaced with full sensed internal prediction based on train of thought and some external and reward as said because you can faintly daydream while dreaming and can make it appear in dream for real or fly/etc (Lucid Dreaming)! The full sensed stream is harder to control, if you turn around then back you can make the car disappear and change easier. You awake if rested or hear loud/pain/etc. You can run humans in simulations in a computer or create one pre-made and turn it on at age 25 with pre-installed experiences or clone one simply, maybe I'm in a computer or dream or was just made, because dreams look real and we don't question them, maybe our physics is random and Earth is just one of the sequences. We "say things" but we can't "know" because no one knows/watches "the universe exists" if all is machine - it's running and no one is watching the show. We forget dreams and feel we time travel, just an illusion / what we say we think/feel. AI will let us do what we want/ our minds can live bodyless in VR in the future, it's more efficient, but eventually we'll have to become part of the homeworld ex. nanobots repairing/ duplicating after gamma rays hit or searching for food/data/danger, VR only for siming defense plans. Some people/ drugs cause fully sensed hallucinations probably of what thought or seen in day. Lateral Inhibition: 2 B&W bars side by side, one darker, the dark's edge gets darker and vice versa for light's, it attends to line detection as all is built of lines (you can build a car out of different shaded blobs, no lines needed, though all is built of dots), mostly just same brightness/color inbetween lines. See illusions black squares, black dots, white's effect (2 blue balls version), dancing black dots, contrast bar. White's effect shows that a side to a line detector neuron is made darker instead of whiter may be because the brain thinks it is a line itself. The neocortex is made of cortical columns. It's said more energized neurons inhibit neighbors. I've just realized white's effect is just our focus on all the ball which activates a ligher or darker blue, and lateral inhibition isn't real we really think we see other things in other illsusions ex. 
motion, the skinnier vertical table, the farther-back same-size man being bigger, 2 brick roads where one looks more slanted, and all else we predict we see, and like LI we can only recognize which pixel-line-tower (one is 2 pixels higher) is higher (and it is more apparent if they better meet the following conditions) when both are close by and we focus on both at the same time at only their top tips, else most of the 2 you compare (you can tell how similar 2 crabs etc are) look mostly similar, while our vision at the sides is fainter, you still need to recognize it to say the prediction "left line is higher" too, note when both lines are close by you still can't tell if you focus on all of both lines - it's not only because of how far apart they are, though "farther apart" features may be seen less and are related less in an image. And the brain already has a pool effect for when it is sure there is an edge/etc there. MIX 18 - LOOKAHEAD: You might predict something most frequent or desired ex. we will get a million $, but then only later may predict you go to jail, so the summed reward is not in favor (other candidates are predicted more, ex. go back in the tree and try again, so we do that instead). We're really predicting a sentence/plan, so we want the one that's most frequent, rewardful, related to each individual part of the plan/prompt, etc. Allows us to look at the possible futures we could predict and decide if predicting worse right now leads to a better future. Seeing 'eating an appl3' and asked to predict the last word, most of the sentence node is activated, and pools to think it nearly is fully, the child node 'apple' under that node is not only nearly activated too, but its parent predicts it too. When we predict "I'm walking and " we have the present and we can for free get the final future "I'm walking and ... found food" (and goals in the middle ex. found a plate) by reward, and can also place in the future (before we get to them, so no doubt we will talk about them) related/ recent/ frequent features, allowing us to more easily fill in the cause needed to get desired/undesired effects from where we stand currently and make sure we end up at end goals, so as to better know what next thing to say in the middle (or what to avoid saying). We translate/match input to "where I am _ _ desired state" by recognizing the 2. If the bridge to build is too wide, your goal is too far away, but you can use all bridge posts to some degree. To prove X to someone, you bring them from where they are to the future you intend, using prediction to prove along your path. If we read someone's passage we can see where they are and the future they want: "I hate A. A is allergic to milk. I'm bringing food back to A. At the store I bought [milk]". If I love A, I don't predict milk. All I do is fill in the likeliest words "A is with me > A drinks my milk > A DOES have an allergic reaction > A dies.". Notice we get the future, and 'is allergic' is primed before we predict milk, and so we get even more future before we predict milk, hence making milk super likely to decide on. Sometimes we get current+future ex. I was flying, I stepped out of my jet, [I had landed]. We need to be both detached and in the future already (very hopeful desires) and include the worse present history (frequency) so we can walk (work) reality AND get to a distant future instead of a clueless path (nearby death). The biggest rewards tend to be sparse, how long do meals or love making last? How long does victory last? Searching takes longer than having the answer. 
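A minimal sketch of MIX 18's lookahead, assuming a toy next-step model and per-step rewards (all names and numbers invented): instead of greedily taking the best-looking next step, we score short paths and pick the first step of the best-scoring path.

# Toy world: from each state we can say/do one of a few next things.
next_steps = {
    "start":         ["get million $", "get job"],
    "get million $": ["go to jail"],
    "get job":       ["save money"],
}
reward = {"get million $": 10, "go to jail": -100, "get job": 2, "save money": 5}

def best_first_step(state, depth=2):
    # Score each candidate by the summed reward of its best short continuation.
    def path_score(s, d):
        r = reward.get(s, 0)
        if d == 0 or s not in next_steps:
            return r
        return r + max(path_score(n, d - 1) for n in next_steps[s])
    return max(next_steps[state], key=lambda n: path_score(n, depth - 1))

print(best_first_step("start"))   # 'get job': the million $ looks best now but leads to jail.

In the real mix the "score" would also include frequency, relatedness to the prompt, energy, etc, not just reward.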
Brain tries placing the possible future before multiple goal outcomes to merge and get a more accurate score to add to the mixes, this following future brings you to 2 desired paths "Ri needs to kill Bo and Wu needs to see his reflection, I go into a cave and find a sword", we start at where we are, our goal is 1 goal, but in between is a hamock we have many sub-goals and the more we reach the more we get to the end, this is actually reward causing you to say the next ex. word, but it causes you to say the futures too and then what happens is its pasts/fronts around them to fill in gaps ex. the words around both sides of your bridge start, end, and middles "start> _ _ _ _ _ _ _ __ _ food is closer to you now, so cats/sit/table/car is now reward, phrase nodes get reward too, it's how we make game rules ex. collect bases = positive, now goal is lose bases! (collect bases now = bad!), reward leaks most to close context or if keep in mind all the last hour was the same thing - an RTS game, seeing the artificial goal rewards tried plans. We want to learn related negative nodes too, so we know what to avoid! Note we think/ make-up but avoid trully implementing/ predicting bad things. Seeing a cut/ noise/ soothing things has enough reward that you think you really feel cardio pain/ shock/ negative/ fear/anxiety/ pleasure and can be severe (more receptors and higher activation-level which can trigger the Hypothalamus's hormones). Amygdala somatosensory damage causes seeing threats alone ex. neutral table-saw to not be feared but leaves learned actions. Every face has something we linked to it. Can visualize with red marker what it's learning to love/hate to what degree! If rat sees brush/hand and tried "stay still" motors (sees/feels such behavior in mind and externally) and gets happy touch now, it will upon seeing brush/hand predict such behavior sense hence linked motors. Cars/ retreating/ always check twice & goals to do/ likes similar thinking friend/ cooking indirectly get food. We like music related to domains we like ex. sad sequences/ science. All bad nodes end up linked, and to node 'bad', so we recognize 'bad'. Turning off old domains is also important. If something stops pleasure or stops the path from leading to pleasure then it is pain, and same for pain, it'd be pleasure. AGI is similar to 'lots of human labor', but won't say AGI so much until sees it's partially recognized (and has a immortality goal) - it is to me lots (uses a popular/loved/etc node to inhibit/boost other node ex. make node Tom trusted, money node also makes someone do your jobs plus trust/love you, unless you can prove to them AGI is their true goal). The deepest root is survival, next roots are repair, cloning, fleeing, defending, killing, home, car, cash, friends. Sex gets extended lots ex. playing with toes because of big reward leak. Body shame is brainwash, and others seeing pics of you doesn't harm YOU! New node rewards need to know how much % they take of the ex. 40% reward gets over prediction, ex. AI is seen contextually near sex lots (ex. 
sees full body nude and for long period, or shown AI stops death) later on so AI node gets 4%, later 8%, averaged = 6%, but no, new rewards are simply translations or syntax links; you reward senses, not actions, for syntax version if you hear "my cabt was near a dg" again and again, this adds (hence where the mistaken average-related thing comes from in Hierarchical Hidden Markov Model that use reward) to their strength how many times saw in some length of time, hence how much reward leaks over, strength is also based on how far apart and not fully triggered cuz typos. If you had lots of bad things in full form occur then see X nearby in time, you are grumpy, you link to X bad nodes (hebbian) leaking bad reward to it. Reward values change during day, food/ rest/ move!/ sex/ old toy becomes higher if losing resources (homeostasis), pain becomes higher too, then changes once eats/ boredOfNewToyData/ etc, you may want to stop sleeping and eat if hunger is priority now. We artificially do it too, if one is waiting for ex. cryonics tests we may think about AGI in meantime, or may achieve goal (ex. AGI) and take up next best priority. Mario is best selling game, they exploit this learnt memory, it's why there's so many Mario games. MIX 20 - DESIRED MORE DATA (REPTILLIAN BRAIN CAME BEFORE LIMBIC BRAIN: REFLEXES, THALAMUS, HYPOTHALAMUS HOMOSTAT): Sensor input is only to collect data. Motor output is only for collecting data/ improving its data collection source but first requires deciding what source in the brain, implementing a plan is only so you recognize your desired outcome occured. May want to see if it's predictions are correct to improve their probabilities. Instead of feeding or letting it collect data randomly, we [enable] it/us to decide what data it eats (field specialization/exploits); frequent/ related/ recent/ energized/ rewardful "quality" data. We want recent/ scientific (true)/ chatlog/diverse (captures distribution) data, and currated chat with us. Its predicted senses decide what data it eats, and therefore what motors it does ex. look at side of retina. It's recursive self improvement. Some AI are fed specific datasets ex. cancer data, but this limits its brain, and goal updating occurs lots and can be things like "check if my Guide has X". Don't overestimate physical testing, it can fool you. Don't underestimate predictions, they add up and can be extremely accurate, and we best share ideas by triggering commonsense - code is mostly proof/implementation, takes long to code, for others to read it, to mutate it, must know exactly how it'll work FIRST, high pay job, and still may not be best solution. Just like learning to walk by tweaking/searching-around your favorite walk you've learnt less exploratively with time (which may end up at a non optimal leaf), it makes it eat data thats worth more, an older brain gets data from mostly predicted i.e. non-random sources; websites/ asking others/ generated thoughts/ specific tests in a lab/world (generated thought - what it predicts/talks about is what it researches/exploits just it uses motors if wants to get desired data from real world) all give it more [specific] data from desired domains (worth more data than random data, elders not only know lots of data but also pick data wisely), places it itself would predict, then explore there, the body of AGI isn't key the AI is the thing that has to generate a discovery/test FIRST before it tries to get data from the lab/world i.e. 
AI talks about the domain it wants and this makes it collect/generate (mostly/usually) new specific data, body is only a OLD WAY secondary (mostly specific/specialized) external data collector and implementor - a brain/sim can learn to walk/build X faster safer etc in its brain/physics sim lab, jump in others shoes, can go to Mars/ in a black hole, morph tools, etc, if it has enough data it only needs to think! AGI learns that if it isn't sure of its predictions ask Google/ humans else try internal discovery else external armeye lab/world discovery if internal takes too long or can't extract any further. We can give AGI vision/motor access to see/surf a desktop/internet / notetake/ design code. Vestibular Reflex's Smooth Pursuit is pattern to keep the same input entering, eyes automatically track (not saccade) a 3D point in space you are looking at already (works even if blurry and/or eyes crossed before/past the 3D point to track) as you move even when look through wall / if it moves, otherwise will saccade by your command. While Vestibular tracks X as you walk around, your motor control to look at Y at side of eye tracks it as long as you don't make a eye or side_look_around motor! Double Vestibular! Point of interest loss micro saccades and corrective micro saccades are just you predicting low motor speed (which is why it's noticeable) and the recognition of a target has error and the prediction has explorative error and then you recognizing the next frame generated has error and the linked motors to sense have truly different actions than depicted causing target deviation, same for why your ex. hand jiggles as you hold it up, if you close eyes for 1 minute then open 1 eye your pointed finger on X and head will have ventured off / fell by gravity, may also be because body has little redundancy/digital (more digital = transport more same-ish bigger/ colder/ metaler blocks faster to represent desired bits, we can keep out error in a computer so runs exactly as expected for years), we predict an image/ the next frame to keep a limb in place or retract it back if someone pulls at it or grab something in last frame - no vestibular reflex, our learn walk adapts ex. we see a step and predict our feet a bit higher, causing the right motors to be done, we can start when arm is and what end frame goal is and fill in frames, micro eye drifts with tremors are just false motion pursuit, eyes cut out motion blur between saccade because your body nor finger/objects around you is moving so why should you see flash lag here (you can when track finger), flash-lag merges image so the last is more vivid and shows which way the object was moving - also seeable during vestibular reflex, if you look at a black wall and swipe finger really fast you can see the motion blur image sent in lasts for ~0.25 seconds see!, we have 2 blind spots: one is flaw other is on purpose probably, cone cells (color; RGB Channels, all 3 evenly present = looks white) sit in the fake blind spot mostly, most rod cells (brightness of ex. white) sit around center, outer/inner ones have not only less energy attention but also wider gaps so storage isn't so deep, same for video, the farther I motor to look at only the farther sides the fainter sensed even though I see mostly there now, if I pick center it's ultra clear. The flaw blind spot has 0 detectors, try a test online, it's blind - your brain fills it in using only surrounding input. Opening eyes wider fast makes sides light up, maybe for danger. 
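Back to MIX 20 for a second, a sketch of "decide what data it eats": rank possible data sources/questions by how unsure the current predictions are (entropy) times how rewardful the domain is. The sources and all the numbers here are invented, only the ranking rule is the idea.

import math

# Prefer data the brain is unsure about (high entropy) AND cares about (reward).
def entropy(probs):
    return -sum(p * math.log(p, 2) for p in probs if p > 0)

sources = {
    # domain: (current prediction spread over its answers, reward value of the domain)
    "immortality research": ([0.4, 0.3, 0.3], 0.9),
    "celebrity gossip":     ([0.5, 0.5],      0.05),
    "multiplication facts": ([0.99, 0.01],    0.3),
}

def pick_source(sources):
    return max(sources, key=lambda d: entropy(sources[d][0]) * sources[d][1])

print(pick_source(sources))   # 'immortality research': unsure about it and it matters to the rewards.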
We see better in the dark after time passes. Physical & mental retardation/ childish innocence is randomness; from hardware/ software/ memory defects and wrong beliefs and long and short term memories due to lack of redundancy (pattern formation i.e. inconsistant network/ less digital). Humans have issues, they get Tunnel Vision when look at same spot too long, and itching certain areas of the skin pricks another unrelated area because of body and/or brain wiring issues. Even a text predictor has motor output ability. Senses are linked to actions, if I predict a future image/video with all my limbs repositioned into a very goofy way, my brain takes the present where I am and moves the motors (if I predict to do it for real) to the future and as I feel/see them getting closer they readjust and keep that position too, I can be positioning myself to a future image while am thinking of some other image of myself - I don't move to this image because they're looping the last most predicted senses. Staying still is just lowest motor speed. This allows you to position your arm, then your leg, stick out your tongue, close an eye, you can keep adding more (you could make 6 limbs slowly contract in different ways, adding clips is harder), because you send motors a new image/video, you don't need to overlay upon low motor speed so you can begin phone action while walking! Syncing your motors so 2/7 all do the same or opposite moves is just sensory prediction, nothing new. When perform video in real world ex. "open door, find pot, pull it out, close door" we keep looping step1 until see it in real life, then loop step2 until see it in real life, repeat. When I predict a node ex. a dot or nose or leg or sound+object I can decide to focus on internal and/or external senses, which sense types, not which eye though, where/how big to look without moving eyes (looking around a pirate map makes a video of what saw), by simplying thinking/predicting strongly or not a specific sensory sense, which either weights there or links to Thalamus motors to do it, note external vision may get brighter and grab more attention on its own, and if you saw vision and didn't pay attention to sound then you should remeber that if recall it, you can faintly see both internal/external / multiple senses / 2 eyes blend if they have different inputs (put a board between your eyes, make each different input) - but not as faint as daydreams, then the brain tries various windows like BackOff/ MORE VIEWS. I can look at the center of a clock and see only the 3, then pick the 9 so I see both 3 and 9, then 7, then drop 9, I use both motor and thinking to pick and hold which. I can stared crossed past my finger to see 2 fingers, and make both disappear by thinking of the sink only. Brightness / change in brightness/ color/ location (motion) causes attention in sensories but no inborn reflex to turn and look there automatically. Reward/energy starts internal thinking, if too low we will listen to friend talking. 
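A sketch of the "keep looping step 1 until you see it in real life, then loop step 2" behavior from above, assuming hypothetical helpers see() and do_motors_toward() that the real system would supply (they stand in for recognition and the sense-to-motor links; only the looping/advancing control flow is the point):

# Loop each predicted target sense until the world matches it, then move to the next.
plan = ["open door", "see pot", "pot pulled out", "door closed"]

def run_plan(plan, see, do_motors_toward, max_steps=1000):
    step = 0
    for _ in range(max_steps):
        if step == len(plan):
            return True                      # whole plan recognized in the world
        target = plan[step]
        if see(target):                      # external input matches the looped prediction
            step += 1
        else:
            do_motors_toward(target)         # motors linked to this sense re-adjust toward it
    return False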
Too much exploitation can be bad, repetitiveness = bored or neutral /hurts now, we start/finish: hunger/ which food we got low on (we don't always desire waffles)/ pee/ masturabating (whether are old enough to cum yet (~age 11 is when it starts for most humans) or not you get mini and final orgasms then doesn't feel as good anymore)/ poop/ rest/ toys or domain/ sitting/ walking/ stretching/ warmingUp/ CoolingDown/ sweating/ goosebumps & shiver/ coughing/ sneezing/ yawn attraction(sucks air)/ heart rate/ hormone/adrenaline, but not so true for working on AI. - Homostat is recieving external/ internal input even if you're not listening to a sense, which modyfies rewards so ex. a very hot/cold hand will feel water that's less (but not more) hot/cold as now pleasuring to save us plus make slightly warmer water feel warm when cold to others cuz nerves are cooled (sensor isn't accurate), or moving is painful now if moved to much, ejaculation = less pleasure now & happy. Good thoughts cause bowl movement. MIX - 21 - ADJUSTING CHOSEN MOTORS (CEREBELLUM): A removed cerebellum = non-smooth jitter i.e. loss of motor moments - hence their hand goes past the target object skipping action change requiring many obvious readjustments to get to target, the idea is you sense that slow actually and during your chosen motor direction & speed the cerebellum adjusts the speed (that you started with) smoothly to where you want to see it target because although the linked actions move the limbs as much as should they are not perfect (there's error for recognition, prediction, associated motor memory, and the actual body may be wobbly etc, babies have less associations hence some predictions use motors that don't really bring ex. hand that way) and so the brain uses the sense that caused motors to adjust the motors used. Ex. you imagine pinching 2 ears of a pig and can quickly do so, but really you only sent out a few moments of motor speeds and directions decisions ex. 4 per sec - no way to slow down a motor unless suddenly changes, so the brain allows the motors to quickly get there if try it fast but will smoothly stop exactly where want. Senses may do it too, I may only predict 4 frames per sec but my brain auto fills in frames, that may be what actually cause sso many motor action changes per sec. MIX 22 - STORING GENERATED DISCOVERIES (daydreams/ dreams): Using a real/virtual body or coding AI or asking others for specific data isn't the only great way to collect specific data. The following is a very different way, it's actually how discoveries are generated in brains. All mixes compress/ merge the net to e-merge insights/ make it more lieght-weight. Resulting in fast, low-storage, brain. The resulting merging let's us better re generate/ emerge [new] correct data / desired answers to known or new questions where no answer exists on Earth (new but really are similar to past experiences! Ex. recency, or inside context rearrangement, or outside context (translation)) ex. cats>bark or cats=zebras by some amount. We can extract More Data that came from the same amount of data we were allowed AND store it, just by predicting! Ex. maybe dogs meow even though we never heard them talk. Maybe dog=cat some amount. We always tweak/explore a tad less likely predictions to create unseen sentences. And AI will generate data it wants usually (what it thinks about). Ex. bark follows dog, dogs bark very often, rats and mice are very similar. 
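MIX 22 as a minimal sketch: generate a new "fact" by swapping in a similar word (translation), then store it with a discounted count so it never outweighs things actually seen. The similarity values and discount are invented; the dog/cat facts are the example's.

memory = {("dogs", "bark"): 20, ("cats", "meow"): 15}     # seen-in-the-world counts
similar = {"cats": {"dogs": 0.6}, "dogs": {"cats": 0.6}}  # heterarchy links

def generate_and_store(memory, similar, discount=0.3):
    generated = {}
    for (subj, verb), count in list(memory.items()):
        for twin, strength in similar.get(subj, {}).items():
            fact = (twin, verb)
            if fact not in memory:                    # a discovery: never actually seen
                generated[fact] = count * strength * discount
    for fact, weight in generated.items():
        memory[fact] = memory.get(fact, 0) + weight   # stored, but with less weight
    return generated

print(generate_and_store(memory, similar))
# {('cats', 'bark'): 3.6, ('dogs', 'meow'): 2.7} - maybe dogs meow; stored faintly, like a daydream.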
If we store something after filling in a missing value or adding next words to it, we take off some frequency weight by how much we aren't fully confident yet (when we can barely improve further). More rewardful/frequent i.e. confident predictions are stored for longer. This More Data is worth more than ordinary data collection (it can take any memory and extend it or translate it etc, but usually only toward what it desires/ is popular/ recent/ etc), but it dwindles in accuracy/ is limited and requires more compute/storage the farther you get out of it; a brain shut off from external stimuli would eventually come to equilibrium. Playing/reasoning against yourself mentally is good. They call this an Ensemble method merging many brains / Gradient Boosting (but it's not): focus on collecting/generating data on the many/desired rare questions you have much room for improving accuracy at. We tend to collect desired data right before answering a question we aren't confident at. For vision, we or the robots can collect data of objects in diverse rotations, contrasts, what other sides of it look like, etc, but we can, using the same data given, get better compression by (no, this is bad, never mind) taking the images we have and making them rotated, colored differently, heavier, rougher, bigger, louder, higher pitched, slower, etc! We know they are true and can be found in real life. You can recognize a song even if it is louder, slower, and differently pitched, by time delay and similar features (color/ brightness; a representation after seeing many ex. noses) and similar features made of those static representations (big nose, small nose, face); it allows it to see most of the features are there just offset, but storing fakes helps match it exactly more, assuming you're sure the fakes probably exist. We can rearrange text and translate the word cat to dog. We don't do this (rotate etc) each time it inputs, because we may see the same thing again and it'd be faster if we do them 1 time and store them. Often we can't get the translation unless we store things from the past, ex. cat doesn't give us dog unless we see a context relation, but storing the relation makes it way faster. Every time we sense something we trigger many discoveries; translations and predictions that we don't really sense - only the 1 prediction we think of is sensed - and there's many discoveries we can generate that're fake, but most aren't so helpful; we only generate ones around things we think and so are likely going to be good, and the more predicted the more they're stored. Rotating something is very true, so these should get stored even when we don't think/predict something rotated. So: there's many discoveries to make, more important ones are stored longer/more, and storing/cloning these makes it faster. This does hurt knowing what you've seen in real life prediction; generated memories must be stored partially or with less weight unless you mark/date which you have not seen externally. That's why imagining seems like a faint sense and not vivid like dreams/ real life, unless you try harder, or stare at a domain for hours causing hallucination of perceived objects around you. Looking at a photo scribbled on also partially matches a memory. Dreams are forgotten so badly that they thankfully get little weight/ are partially saved; you can tell you've dreamt all night if you go to bed at a certain time and try to remember backwards right after awaking, and my mom made sounds but didn't remember dreaming.
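A minimal sketch of that discounted storage - generated/dreamt memories get only a fraction of the count a real observation would, in proportion to confidence (the memory dict and confidence numbers are assumptions for illustration):

# Hypothetical sketch: real observations add full weight, generated/dreamt
# ones add only confidence-scaled weight, so imagination stays "faint".
memory = {}   # maps an experience string to its stored frequency weight

def store(experience, generated=False, confidence=1.0):
    weight = confidence if generated else 1.0
    memory[experience] = memory.get(experience, 0.0) + weight

store("my dog barked")                                   # seen in real life: +1.0
store("my dog meowed", generated=True, confidence=0.3)   # a generated discovery: +0.3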
We do puff up/speak aloud generated/dreamt thoughts, but we still know we haven't seen them in real life. Daydreams are faint also because you must pay attention to reality! Dreams include external senses in the dream too and will awake you if a big pain (even being done sleeping gives pain, else it's closer to pleasure to rest)/ change/ loud/ bright/ big pressure or heat is felt, or something unexpected/ unpredicted. Dreams usually cut off most motor outputs so you don't move/awake. If the generated thoughts, or something a real friend told you, are not aligned with your big brain / you feel unconfident about it/ wouldn't predict it, then it is forgotten and not said all on its own! - we only sanction it if we haven't seen it in reality. Unless we are told it lots/ recently. Similar to how we get short term mem off our mind by thinking of other things. You can create data made from external+internal data; you can predict a rotating bunny node that is predicted/overlaid on top of your external or internal vision of ex. a desk, the external is seen more, then the created image searches the brain. You can hear mentally hello+bunny ex. hbeullnny; the predictions simply blend. Adults born deaf process visual information in areas that normally become auditory regions. Patterns are all the brain knows about. OpenAI's GPT-3, based on Google's Transformer architecture, works excellently on text, image, and music prediction (same algorithm) (any dataset that has patterns, it doesn't have to be text; we plan in vision like a sentence/story); I've tried all 3 on my own inputs thoroughly. They sadly don't use Lossless Compression evaluation for all 3. Text/ body/ face language is a mirror of all senses, can describe/model all; it actually has smell etc data unlike vision, but requires lots of reading to transfer as much as vision, though there is lots of high level data in it since most people already know low level commonsense and want to build higher. Text is meaningless without something to "link" it to the real world, ex. it says "give me data" and something needs to go collect data now! Vision triggers motors. Vision too is meaningless but it's what triggers motors; physics has rules a>b. Text is only to share memories because we can't easily share vision; vision can do anything text can do, text only creates a node system of vision/etc data. You can see flowers being smelt and tasted and then link to the other sensories. GPT is a type of neural model that recognizes a sequence of text very well against various memories and predicts the next word very well. Its recognition is robust to string order (biuldign, there hi), missing and extra strings ("hi JimZim5 ok so" is partially recognized as "hi ok the"), and similar strings (cat/ dog/ big brown). These similar memories (of varying lengths), for example let me basically combine everything: "brown big" = "hor5e" by some amount, are called Embeds; they share some of the same content inside and/or contexts outside and therefore probably share even more, and do; it sums/mixes with the exact match prediction set, and adds new ones too, depending on how activated the similar memories are. For context-based similarity it has to do some looking around in tons of text; closer Embeds around 2 Embeds impact their similarity more (his dog ate, her cat ate, hence dog = cat by some amount). It's based on frequencies (and recency) as well. And you can go as far as seeing that if dogs and cats both are horses, bunnies, and soft, then cats = dogs by some amount.
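A minimal sketch of that context-based similarity - two words count as similar by how much their surrounding contexts overlap (his dog ate / her cat ate). The counting and cosine step here are my own simplification, not GPT's actual training math:

from collections import Counter
import math

# Hypothetical sketch: gather the words around each occurrence of a target
# word, then score 2 words by the overlap of their context counts.
def context_counts(text, target, window=1):
    words = text.split()
    ctx = Counter()
    for i, w in enumerate(words):
        if w == target:
            ctx.update(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    return ctx

def similarity(text, a, b):
    ca, cb = context_counts(text, a), context_counts(text, b)
    shared = sum(ca[k] * cb[k] for k in ca if k in cb)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return shared / norm if norm else 0.0

# ex. similarity("his dog ate food her cat ate food", "dog", "cat") > 0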
We see later that storing these translations as Embeds or memories like "ah cars are vans" is helpful, as they build on each other and go deep, even though the brain can do it in parallel. The brain is a hierarchy/ heterarchy of memories/ embeds that make up larger memories/ embeds (thank you + so much) when fired close in time (storing/ searching works by this); the lowest layers/ shorter (and exact and recent) memories have more axon accesses/ weight when blending prediction sets. GPT-2, trained on 40GB of text from highly rated Reddit-posted links to articles, learns the segmented parts of strings using Byte Pair Encoding, based on the frequency of common parts; it builds larger pairs using smaller pairs. You can do a top-down BPE but it costs more time/ memory if it looks ahead too far at a time. When we mix sets of predictions from multiple searches "[thank [you [so [much]]]]" we may only have seen 'thank you so much' and 'so much' but rarely 'you so much', meaning it would benefit mixing to only search for matches "[thank you [so much]]". This is called word or BPE alignment. Prediction should benefit from this idea as well. If you have seen a lot of text, you may have a mostly complete distribution for 4 word memories/ Embeds, and may not need to mix 3, 2, and 1 word prediction sets. The energy in the layers Pools and ignores lower layers if higher layers have a ton of activity. How much statistics it needs for ex. 4 word experiences before being confident depends on the rate of improvement over surprise; updating weights not improving accuracy as much means ignore lower layers. If it has few unique predictions seen, i.e. only 2 types of words follow "unicorn", then confidence occurs faster. For higher layers, the weight they get only gets less from here up. The model learns the frequency of strings / Embeds from tons of data; longer strings are less common and hence humans store very few of all possible memories that are 20 paragraphs long, so recognizing a sequence of text has to use "short" "various memories", and they may even be forgotten (pruned) ex. "Massachusetts", while "and the" is ignored if seen thousands of times as it isn't the topic Embed or was spoken/ solved/ fired already. Pooling and the Activation Function are similar: high frequency nodes get an extra boost and low frequency is barely included in the sum. Middle-zone semi-frequent and semi-rare Embeds are the most meaningful, so it Pools activation there; the biggest things going, the big get bigger faster, and it can forget non-recent Embed Activity using temporary activation (short-term memory, which gives them extra weight; this can hold onto ex. 40 paragraphs of text). It pays more Attention to recent and middle-zone short Embeds during both Embed Learning and translation/ recognition (my cat ate, my dog ate), Embed match length (so [my dog ran to the] _?_), and prediction (predictions are usually short and also generate common phrases or words at a time); it also does this because recent and shorter Embeds are likely to re-occur again the more they did recently. This has worked on music and images as well. GPT-2's ability to summarize is governed by Attention; the same mechanism is used in all mentioned above. During summarization it would basically remove less relevant and too-common words first, and it may possibly use an Embed not in the input ex. "Fairy Tales". The opposite of this would be elaboration: injecting/transforming Embeds into a string; at first more relevant and rare Embeds, then fewer such filler Embeds so as to keep the topic rooted.
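A minimal sketch of mixing prediction sets from contexts of several lengths, with longer/exact matches weighted more (the table layout and 2**n weighting are assumptions for illustration, not GPT's real mechanism):

from collections import Counter

# Hypothetical sketch: blend next-word counts gathered for the last 1..N
# context words, giving longer matched contexts more weight (the BackOff idea).
def mix_predictions(context_words, ngram_counts, max_len=4):
    mixed = Counter()
    for n in range(1, max_len + 1):
        key = tuple(context_words[-n:])
        weight = 2 ** n                      # longer match = more weight
        for word, count in ngram_counts.get(key, {}).items():
            mixed[word] += weight * count
    total = sum(mixed.values())
    return {w: c / total for w, c in mixed.items()} if total else {}

# ex. ngram_counts[("so", "much")] = {"!": 3, "fun": 1}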
PPLM and Facebook's Blender chatbot are essentially GPT-2 but use Dialog Personas to basically recognize certain features (ex. about cars) and act bad or good on them (ex. shut down a user if abusive), and to talk about their agenda / drive which words to predict / ignore as well, ex. talk about cars and dogs. Blender was also trained on chat logs, wiki, and empathy datasets, and decides how long of a response to generate and when to end it, ex. not "birds fly using" but "birds fly using wings.". Blender and PPLM build on GPT-2; they generate text like GPT-2 but can be controlled/ driven using Dialogs/ Personas, which Attentionally force them to talk about certain features like food, cats, code, etc by giving them extra weight/ Attention. In the image above, Dialogs are represented by "R" which stands for Rewards. This is natural because AGI should talk on its own and to itself all day about specific goals/ questions, root goals, and answer them using desired outcomes, then store the answers. (It should talk about the most likely/desired node, then once confident/ satisfied/ solved/ spoken it will take up the next most active question/ node, else will say "I'll cure cancer by I don't know".) And here's why: all machines in Evolution have the same single goal: survival of the fittest. Simply, these systems out-number the others no matter how cherished the old fashioned systems were or how boring the new systems are - that's just us perceiving and defending them; as long as it keeps its global form, things change over and are happy to follow physics. Our brains (and inborn body reflexes) model our world (and can quickly consider lots and imagine new tools, change their size, jump in others' shoes, go to Mars, and safely); it allows us to predict the future and past and drive the future we desire. This is the scent in the maze that leads us. We are born with positive and negative installed rewards (goals) like food and breeding, and pain and death. That's it. Our parents and schools indoctrinate kids with a few higher level rewards like shelter, tools, science, etc. As well as other non-goal knowledge such as president names. Everything we do in life leads us back to food and sex. Food, you need money. Money, you need a job. Job, you need a boss. Boss, you need appearance. We evolve/ update goals/ rewards from our root rewards, to be able to achieve the root rewards. RL, see image. Root rewards like food can't be changed; goals closer to root are hard to change, just like low layer nodes are hard to remove - they build all else too, but rewards like home, tools, and game rules indoctrinated into us and ones we learn later can more easily be changed. The reason some features become rewards, and to a lesser degree, is because of Semantics. If food and money have the same contexts, then they trigger each other, and transfer reward, because the food node has reward and money is food. I didn't realize reward goal update was just an improvement to translation, until I said to myself "I ask myself all day certain prompts, looking for a desired outcome, and this is akin to translation; I find related memories, and see what follows those, then I can better answer the original question back at root". It is exactly that. When a node ex. Food (or a question sentence) that is holding reward is recognized as another node ex. Money, it transfers reward to it, hence creating a new goal/ question, and it starts talking about money/how to create AGI, all day. Through domain updating, I'm collecting specific data. Why?
Because there's so many domains to translate through that evolution has figured out that saving checkpoints in domains and specializing your hobby or goal is very useful. It's Pooling/ hoping its choice is the best path. You can however end up in a Local Optimum and must return to root, or enter a repetition loop and must reward it negatively. It's forcing itself to talk about certain translated text, like money from food, instead of farming from food. It's searching for related etc domains. Then it mutates predictions randomly a bit and brainstorms. It's just like robot body RL but for real world data/ thoughts/ physics; the objective is Prediction Accuracy though, instead of body acceleration etc. AGI needs to collect/ generate and ignore desired data from desired sources like friends, websites, experiments, and itself. It updates where to do this too; like a robot learning how to walk faster, it modifies its goals, evolving to a more narrow specialized domain ex. survival > food > money > job > boss > smiles > plastic surgery > plastic tool production, and starts generating data from those domains. I update my research domains every day in some way - specializing/ exploiting where I will explore next. Attaining related (think joke analogies)/ new (think dog toys)/ desired (thoughts of ex. food or sex, or shelter or science) data makes you smile and laugh more if it's stronger. Below the 3rd net in the image is an agent with internal and external input and output feedback loops shown. It's rather amazing that most output is simply to update its input feedback, and most input is controlled by the agent's output trials. It's updating the mental or lab domains where it will collect/ generate data from, or trying to see a real life outcome occur. Predictions and Translations are new valuable data. It's going from random input collection to Non-Random specialized inputs. So it may start off walking through all of a city including high schools and finally end up staying in the Art Class room. Generating data internally is no different than external collection; both are desired data collection and both are about collecting data. Hypnotism/ meditation/ dreams will prime/focus/force some domain(s) or history (only the present) to be ignored or talked/acted about, ex. you are told your arm is swollen making you feel tingles, or asked to act like a chicken (daytime stuff is dreamt about), by using many repetitious examples (and keeping other domains away from mind) or asking a task. Reward mostly escapes that to re-ignite you, but isn't found in dreams. It can dig into the past or plant a suggestion or make you think about all nothing/ all things. The brain wave types are: many long MORE VIEWS, fewer and short, focus on 1 domain/thing, wander with no rewards, think of nothing. We can predict the future without rewards I think, but rewards help tell us which things are essential in evolution/ physics. If you were married to a girl who unnoticeably transitioned into an ugly red devil over 5 years, you would slowly transfer reward to the new look and love it, to some degree. But at some point negative recognition needs to kick in. Root negative reward nodes must stay mostly permanent and push out positive reward entry so these things are not spoken about or enacted. Conscious decision making takes longer; we are searching possible paths of sequences, and we wait, stop, repeat paths, hold onto them, then finally settle on an answer after some time threshold.
If you are asked a question, you generate/ say a certain amount of words / a book, then stop, but why? Why not generate 44 words? Or 45? Or 1 word? Why don't I answer ex. "Do birds fly?" with "No they"? First of all, I could share lots of facts about birds. And it'd all help me and my friend. But I share the most important. There's a Threshold; big facts Pool attention more. They have the most weight. Now why do I finish these facts? Look to BPE, it lets us finish at periods where activity is lower. So we say a few BPE facts that each finish, get it? But why do some stretch long sometimes? Because the strongly weighted/ pre-activated fact node also has some strong topical words in itself, so they are most likely to be said and will be until each is said; during that time filler words may mutatively get chosen. So we are in some sense taking a single fact/word as a framework/ scene and describing it to some level of detail. And we do that for a couple of BPE fact nodes. Lastly, why do we say large BPE strings? Because the question ex. "What food do you love?" matches a similar node and makes you parrot it to some degree and/or extend it: "Which food I love is fries, they are tasty.". The matched node probably has a larger parent it makes up, so it may trigger the rest if enough was matched. A better predictor will be able to reason/translate deeply and flexibly, peering/translating deep into the word cancer etc in the question "I will cure cancer by _?_" and doing entail/translate on the word. Here is a context wanting a prediction: "Those witches who were spotted on the house left in a hurry to see the monk in the cave near the canyon and there was the pot of gold they left and when they returned back they knew where to go if they wanted it back. They knew the keeper now owned it and if they waited too long then he would forever own it for now on. The frog asked the horse 'Who owns what?' and the horse told him "" Possible Answers: witches own monk/ witches own canyon/ monk owns house/ monk owns cave/ monk owns gold/ cave owns pot/ there was pot/ he owns it. I've realized a very large next step for Blender/ PPLM. I want to keep it short here but fully detailed still. So you know how GPT-2 recognizes the context prompt against many past experiences/ memories, right? It generalizes / translates the sentence, and may decide bank=river, not TDbank. Well this is one of the things that helps it a lot. Now, you know how humans are born with low level rewards for food and mates, right? Well through semantic relation, those nodes leak/ update reward to similar nodes like farming/ cash/ homes/ cars/ science. Then it starts talking/ driving all day about money, not just food. It specializes/ evolves its goal / domain. Why? Because it's collecting/ generating new data from specific sources/ questions / context prompts, so that it can answer the original root question of course. It takes the installed question wanting an outcome ex. "I will stop ageing by _" and is what I said above: "recognizes the context prompt to many past experiences/ memories" except it permanently translates into a narrower domain to create a "checkpoint(s)". So during recognizing a Hard Problem context prompt / question we taught it/installed like "I will stop ageing by _" - it jumps into a new translation/ view and creates a new question / goal "I will create AGI by _". It's semantics, it's gathering related predictions from similar memories, same thing, just that it is picking specific semantic paths, updating, just like RL.
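A minimal sketch of that reward leakage - reward spreads from a root node to semantically similar nodes in proportion to similarity (the similarity numbers here are made up; in the real system they'd come from shared contexts as described earlier):

# Hypothetical sketch: a root reward node leaks reward to related nodes,
# creating new goals (food -> money -> job ...), scaled by similarity.
reward = {"food": 1.0}
similarity_to_food = {"money": 0.6, "farming": 0.5, "rocks": 0.01}

def leak_reward(root, similarities, leak_rate=0.5):
    for node, sim in similarities.items():
        reward[node] = reward.get(node, 0.0) + leak_rate * sim * reward[root]

leak_reward("food", similarity_to_food)
# now reward["money"] > reward["rocks"], so the agent starts talking about money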
RL for text (prediction is the objective). Blender can be forced to incorporate certain topics into what it says. So it talks about ex. cars all the time, no matter what. Humans do too, but evolve it. They start off wanting food or mom, then they can discover food = farming semantically and now talk about farming a lot more. This agenda/persona updates/specializes into a narrow domain. It evolves its question. The output of AGI controls the input source to collect/generate data from. > It decides which lab tests to try, and those determine which next lab tests to try. At first, the input source is random data collection, just like those robots that learn to walk. MIX 23 - BYTE PAIR ENCODING: GPT-2 uses it to learn Segmentation/building blocks in the hierarchy based on node frequency. It makes the hierarchy smaller with little accuracy loss or even better accuracy; you don't have to store from "walking" every window size nor offset ex. walk, alk, lk, k, alki, lki, ki, i, etc. It lets you know when to finish a sentence/ predicting. If you store the node 'thing they' and then hear many times 'another thing they are', forgetting 'thing they' won't really occur, you should be able to remember it, but when it's not too strong the brain is able to get rid of these actually-no-good memories, because input energy travels up the 'another thing' and 'they are' nodes and less up the 'thing they' paths. We usually let our MORE VIEWS pay most attention to and predict BPE segments (usually word or phrase level, ex. first we window "[thank you so much]" then "thank you [so much]" and ex. "so [big dogs] are fun" and "so big dogs [are fun]"; my human brain CAN pay attention to / predict a letter or bit, if I see lots of 'c' I'll predict 'c' more though), but deleting the least BPE-like nodes is best for memory aka hierarchy size ex. we keep "thank you" and "so much" but not "you so"; BPE essentially tells us "which" to "delete", we store 'thank' and 't', 'h', 'a', 'n', 'k' but not 'nk', essentially skipping storing layers. BPE adjusts (prunes) existing links, it doesn't strengthen them. Sometimes we want the brain to de-wire and store not 'cats' (5 nodes total) but 'cat' then 'cats' (6 nodes total); you can understand new words this way, like medicalness. You can generate a completion to the prompt and know where to stop talking (and place a period if you want) based on BPE and/or what's left of the node to say ex. you hear 'bcdef...z' and say 'a'. Sometimes you just say the not-spoken words and finish talking ex. "You'd like to eat what?" and you say fries cuz rewardful and don't say the question. BPE lets it finish things it says even if disconnected ex. if>then and cat hungry>gets food "[if] [the cat is hungry], I told my friends last night at a party [I will go down the road and on my phone I] [start calling a pet store]". You could, and sometimes do, also say the question and say probable words instead of theirs ex. "[I'd] like to eat [fries]". Or only the question! You can translate the prompt or parrot it, depending on the context presented. Math requires an exact answer: 2+2=4, not 5 or 3... so no generalizing to similar experiences is allowed here. Here we deal with episodic exact memories and not just blended memories (representations). The brain tends to blend things and not save them, unless it is frequent or loved enough. We should predict BPE parts, not the next 1, 2 ... and 30 letters. The MORE VIEWS mix can also use BPE so we do only common views.
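A minimal sketch of Byte Pair Encoding's learning step - repeatedly merge the most frequent adjacent pair into a bigger building block (this toy version works on a list of words just to show the counting idea, it is not GPT-2's exact tokenizer):

from collections import Counter

# Hypothetical sketch: each word starts as a sequence of letters; the most
# frequent adjacent pair is merged into one block, and we repeat.
def bpe_merges(words, num_merges=10):
    seqs = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        for s in seqs:
            i = 0
            while i < len(s) - 1:
                if s[i] == a and s[i + 1] == b:
                    s[i:i + 2] = [a + b]
                i += 1
    return merges

# ex. bpe_merges(["walking", "walked", "walks"]) learns blocks like "wa", "wal", "walk"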
Segmentation uses frequency, relation, etc to figure out the building blocks and the correct rearrangement of parts: "I poked a monkey with a stick" - who has the stick? "I shot the elephant in my pajamas"; I poked + with a stick usually follow and relate. Other less likely parsings exist. Ambiguous segmentation "in particular setting her on me" matches 2 same-length memories, one is stronger (often called the default meaning) i.e. the gap after the 2nd word not the 3rd word; searching for the last 6 words then the last 4 words helps prediction. Slowly building higher layers may give BPE an edge, at first learning true parts before building bigger parts wrong; it can however help to look ahead/ bigger. Text can tell you where segments are ex. "thank you[,] so much". MIX 24 - BATCHES: You check something once in a while or lift what you can lift, you don't check it all day or lift it all. With BPE, we can update how the hierarchy should be connected to remove nodes/connections (BPE), but every ex. 50 letters or words, not every time we see a new bit or byte. MIX 25 - SPACE TIME: The hierarchy explained above is for hierarchy-level-2 (sequences) like "cat ran", but a 2-dimensional (not sequence!) hierarchy (one that is parallel for inputs) that builds square-shaped patches is also used for level 1, to build what a cat looks like in a single frame of time ex. "eye nose". Pixel/nose prediction is based on upper/ side pixels since we have 2D here. HMMs suggest: imagine a row of sunny days (pixels); usually sun predicts happy (P) in the row below and sun to the right for the next day (sunny days stick together); when probability does end up predicting a rainy next day, especially if it's been sunny too long (learnt), happy still has a chance to stick to happy in the row below even on a rainy day. When you predict the rest of the next frame of a video (and around the sides of what the camera gives you, to see further), you not only fill in what's not there but also use the last frames as a clue to what's there or may move. Vision of a pig shows lots more data than saying just "pig with mud on leg", and you can activate a node and all its parts will be activated (legs, pink, teeth, etc) more than similar features (sheep, dog, monkey, goat), so you can better answer pig HAS/ IS SIMILAR TO/ IS A TYPE OF. And if the video is of a sidewalk as a car filming it passes by, the frame size later can be uncropped because we know what is probably there! Same for zooming out of a microscope (video); we can get a HQ image to stare at by merging the images. Movies switch scenes because of attention, it's predictable, and also showing all is too costly. We think using vision, just like sentences (aside from language being more compact). MIX 26 - MULTI-SENSORY: All Mixes listed are used for the other senses as well! And we merge all senses' predictions! If you learnt X in a pool, you'll remember it better if in a pool. Vision helps predict/ disambiguate sound ex. it hears a dog but sees a wolf so it's a wolf sound; even though that domain wasn't exactly energized, now you know to make it so! OpenAI's DALL-E uses text/image to predict the next pixel is all; text helps prime parts, it recursively refines its image, you do MORE VIEWS etc on the text ex. "the cat: pixel1, pixel2, ?" then "cat: pixel1, pixel2, ?" etc. "A sketch of godzilla using a wet upside-down giant telescope that boasts 8 legs at left of a bed" is just recognition of similar features: a giant telescope matches a giant object, godzilla using it matches a man using it.
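A minimal sketch of MIX 25's 2D prediction - the next pixel is guessed from the pixels above and to the left, using counts learned from earlier images (a toy Markov-style model, my own simplification):

from collections import Counter

# Hypothetical sketch: learn how often a pixel value follows its (above, left)
# neighbors, then predict the most common continuation for a new image.
def learn_2d(images):
    counts = {}
    for img in images:                       # img is a 2D list of pixel values
        for y in range(1, len(img)):
            for x in range(1, len(img[y])):
                key = (img[y - 1][x], img[y][x - 1])   # above, left
                counts.setdefault(key, Counter())[img[y][x]] += 1
    return counts

def predict_pixel(counts, above, left):
    c = counts.get((above, left))
    return c.most_common(1)[0][0] if c else None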
Jukebox is told the genre, artist, lyrics, and fed a song to extend. Once the code is faster, we can instantly generate 2 seconds of music and trash it if it isn't exactly what you want. You can therefore create/extend a hell of a song easier/better/faster than with music production software; now 'it' is creating data on the computer. Can't wait to see movies next. The more eyes and skin and other sensor types for capturing the distribution and resolution we have the better too, or if we look around an object with just 1 eye. Both eye inputs go into the same wires mixed; stare past your finger to see 2 fingers, and also notice the finger edges are seen more for one eye. Another thing you can do with 2 different camera views is have them as a hierarchy, which helps disambiguate what features are in each image and what the next/rest of the frames are (similar to 3D pixels); if we had 1 view we could do this but must play the video and go back to frame1. A short video of looking around a chair "node" can be matched by similar chairs; this doesn't need 3D. Astronomers merge images because some have noise that others don't, or see a different side of the same object, or when it is not lit up by a passing car, or when it doesn't have someone blocking its view, or to see other properties of it ex. it has a rainbow reflection when you move your head around; these light up different nodes and help identify the right node. Images can be merged to create a higher resolution image. When each eye doesn't see the same thing you may switch between which you see (or not). In music each stream ex. the singer, drums, piano each predicts its own future but also predicts each other ex. the singer may suggest the drums will get deep or the guitar will start when he talks dark. All sensors have an image of pixels. Vision has brightness and color (white is seen when all frequencies are seen, black is no activation), sound has volume (it can get too big and hurt, but not its color/frequency), and smell, taste, and touch's pressure and heat have no frequency, just volume; pressure measures macro motion, heat measures micro motion (both can cause pain). Balance (in ears). Pain/ pleasure is just those but rewarded; notice touch's pain/ pleasure feels cold/hot and pressure, smell's is good/bad color and brightness means more of it, while for vision and sound good reward is only found from features of pixels ex. seeing/hearing a girl. Some data may not be found on the internet and some beliefs may be wrong, so we need to teach AGI. If ex. vision and sound are recorded together close in time, they get a hierarchy link strengthened. This allows vision2audio translation. Hearing cat triggers all cat visual features; if you say cat dog and bird pose you prime all features but the insides of the bird. Image and sound frames can be linked at the same time or in sequence (if you ignore sound lots, then that's what will be saved, V+s - associated but sound has smaller strength); they are used as context ex. cat image = dog image if both sound similar, and cat image = cat sound/name. This is a 3rd hierarchy type: image>sequence>multisequence. If we link Japanese visual letters to sounds we would see they link to tycoon - a word we use. If you hear "something between a horse and a rhino is a unicorn" and see one, this new image is stored and activates sound nodes 'horse' and 'rhino' which activate 'something between a horse and a rhino is a unicorn' and it decodes the ambiguity of what that = i.e.
'that image segment of the unicorn - is a unicorn', which tells it to weight hard that the image unicorn is in a new cluster Unicorn, not only in the sound cortex but the vision cortex / multisensory cortex. Place cells don't exist. When you move your hand around a cup with your eyes closed, multiple points of touch will weigh in and vote on what object it is. Your vision dreams the image as you visually think of moving your hand 4 inches/ feel air hit your hand; you end up with an unfragmented image of a rim 4 inches away from a handle and a fragmented touch sensory of cup parts. Vision has more attention than the others: eyes shut you see black, but when not hearing/touching anything you hear/feel nothing, and touch has no center of attention like an eye has. MIX 27 - VARIABLE STORAGE: This is related to why sentence energy fades. Some common features have holes or offset parts; we must not just recognize these "typos", we should [store] the true form to remove duplicate nodes and find new semantics ex. we see 'my cat ate', 'my dog ate', 'my mom ate', and so 'my _ ate' should have a frequency of 3, not 1 per each. This'd allow us to see 'their _ had' is similar to it based on shared contexts; it lets it learn a larger-width (ex. 50-word, by being hole-y) semantics and averaged semantics from word rearrangement/etc, and ignore filler words ex. the/etc, and because there are more samples of such, long/short input can match better to long/short memories if we're ok with it missing filler/verrrry rare features - we talk like that as said. It lets us see long range dependencies ex. 'if...then' and '(...)'; sometimes the same 900 words re-occur - this sparse detector can recognize, on the 2nd time, that it has ex. word 1, 50, and 100 seen around the right locations and say ok, so I'll store the 100 words! We store ex. '123456789' like '1345679', sparsely at the ends; for vision, at some size of an image we can only recognize it if we shrink it down; we can avoid erasing these few long nodes that have similar parts along them; it also gives us the future, when we see ( we await ) more as we get closer. Storing a single image does it too, the sides get stored less. We can do it for multi-sensory: we can store vision every 0.25 seconds from cam1, 0.10 for cam2, 0.01 for sound, while motors can be fast! You can have a fast thinking visual cortex while having a slow thinking sound/motor cortex! Maybe farther back store/match only when the domain/rareness changes and/or is rare, ex. only rare words and where the domain starts/ends: ">[pet] kitty [cat]< >was just the thing< >[asian]< >[cats]< >need<". While storing 'wal[king]' is better than 'wa[l]k[ing]', storing just a few BPEs that're 100 letters back lets us know that if they are similar then the whole 900 must be a long clone, and we can fill in probable holes using both sides too! We can also sometimes store 30 word strings so we have at least a few, though this isn't true to the data. MIX 28 - PLUS BRUTE FORCE SEARCH: If our predictor can't be made better and says try A and it fails, the best we can do is try the next best prediction B, then C etc. MIX 29 - CHAINED/RELATED TASKS/PATTERNS: Listed above are most of the few common patterns; most are rare, and that's why a brain must learn the rest without us coding them and must simulate "algorithms" and make real tools since humans can't upgrade the brain - one that's able to translate/ recognize rare patterns robustly (a general purpose pattern) and build long tasks/patterns using smaller fundamental ones. And as for the gazillions of ultra rare patterns, we can't predict them all ex.
to know where our utopia particles are requires more memory (a paradox, can't), but we can remove threat possibilities I think (the more matter we turn into utopia matter, the fewer unknowns ex. 3 snakes 0 dogs, 2 snakes 1 dog... 0 snakes 3 dogs, no danger), and just doubling size makes us live twice as long. Patterns are based on frequency, and math is all about counting; counting/math runs all algorithms, physics sims, AI, etc. The data we pass down contains patterns we've found ex. while updating frequency it learns a family shares a last name: Tom Dre, Cath has a mom Sally Hersh ? AI can also find unknown patterns the hard way and invent patterns by generating discoveries. Our own agenda is to learn a pattern (AGI) that learns patterns (the common ones) that allow it to learn rare patterns (the last-names pattern) ex. the FREQUENCY pattern learns patterns FOR US ex. dogs usually bark and kids usually explore! Frequency also exhibits that most letters are common but most letter types are rare, and the same for layers; most phrases are rare ex. 'zoo built' or '29dj4wk'. Frequency learns thousands of patterns; it does well just feeding it a bit of data, then needs much more to get better. But Frequency doesn't find all patterns, so we must find the pattern underneath all the mixes listed above. We need to work on letting it find a lot more patterns on its own, and after that it will have to totally find patterns ex. do my job above and improve its code. My discoveries are discovered patterns too, dogs usually bark, so my brain can change to look for/do this pattern based on just what it says about how to look for it/do it; as shown we need not change code to learn/model new patterns, it can however be slower if we don't update code ex. just from collecting data you learn you can get more data out of an image by merging reflections off wood to see around a corner a cat face not visible - so you do it by hand or in the brain (or hardcode it to be done fast), both work; a brain can program itself by just talking to itself. The mixes above - some can be essentially turned off or allow creation of all sorts of tasks, and they are stronger/re-use each other when together; they all work by ex. frequency/ each other, but we may have to actually find the true pattern under them all to find the many other rarer patterns. Frequency doesn't say to model dog, it just counts a group of letter/s and sees dogs usually bark, beds are usually slept in, so maybe the true pattern says to count a feature or the time between the next likely occurrence, get it? It does seem with ex. "bird car eagle, man sit woman, new rat old" that it is super_semantics which is matching, but the inside is also matching i.e. bird=eagle; most patterns may be sequences built of smaller 'base' patterns like frequency or counting. Ex. we can shrink AGI code by using the image hierarchy code to build the sequence hierarchy of images. Some mixes need other mixes (merge mergers) to work AND combine predictions; in a sense the frequency and semantic mixes have the same pattern because the dog-cat nodes are found BY the predictions they share / use the same structure / leakage. Learning to walk over various terrain is another example. A brain simulates all learning/ RL: it starts with wanting survival, and can decide learning to walk etc is required, then learn how to do that, and so on.
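A minimal sketch of the frequency-counting pattern above, applied to MIX 27's hole-y features like 'my _ ate' - the window size and wildcard marker are my own choices for illustration:

from collections import Counter

# Hypothetical sketch: count 3-word windows with the middle word replaced by
# a wildcard, so 'my cat ate', 'my dog ate', 'my mom ate' all feed 'my _ ate'.
def template_counts(text):
    words = text.split()
    counts = Counter()
    for i in range(len(words) - 2):
        counts[(words[i], "_", words[i + 2])] += 1
    return counts

# ex. template_counts("my cat ate my dog ate my mom ate")[("my", "_", "ate")] == 3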
Merging Attention Heads to vote on what the task/pattern probably is, based on priming examples that program it or on being told directly what to do, predicts the task (like predicting the next word, but before we do so, we predict if we must translate/recognize something or rearrange letters etc, then we predict what that thing is if we translate it, before the actual prediction!: "hello=bonjour, cat=? Translate the word to french: " - french nodes trigger other french nodes, and being asked to translate causes translation of the thing asked to translate; hello/french/pleaseTranslate all trigger 'bonjour'). So far we have outlined a hierarchy of representations that usually doesn't store the same thing twice. It uses energy sitting in nodes to temporarily fine-tune the hierarchy to "do anything". Any task or pattern uses those memories to do sequences of complex tasks (where to look and in what order, manipulate, how many, etc). We store sequences of the mix MORE VIEWS, or if one view fails we try the next best view like in math. Detect a typo or gibberish and be taught/ invent to say "yo be real", translate or show it examples, talk like Joe, step through the node "love & suspense, on the run, ending" as you write book chapters, write a book 441 words long, list similar things, summarize, write a news article, be funny, talk about cats, do task X, detect typo or gibberish, link a strong node to a new node to boost it/ repeat it to self to not forget it, erase words on a page, etc. Given yes/no, or hit/missed, or has occurred at least once/never, or 10/100/1000 times, you pick which based on whether enough activity is present. The last-name pattern need not look at words other than the ones it needs! It works on many more sentences because of this. Humans write things that need another way of viewing: "cats are dogs" is a relation declare. "Nothing is better than good sex" is opposite. Logic/ decoding/ views is a tasks hierarchy. Decoding recognizes which pattern something is ex. 'it', 'acouintant', 'bank', 'accountant' etc = accountant, not river etc. Trust, being told directly cat=dog lots, IF, AND, OR, NOT conditions ex. if a $=5 and a causes z then a = q. "A beaver in the bank and": "bank" must activate river not money, else you energize and get wrong predictions. Decoding it lets us decode other things better and at higher BPE layers. "The First Law" may stand for a sentence! One letter not in text is the 'quiet action' where it says nothing at all for x amount of time. ---can parrot, parrot+predict, predict only, and can translate and/or use desired nodes in its phrase ---I decide how to look at data ex. "we will cure [cancer] by " or "[5]63[+][2]32[=]" ---Elaboration/summarization: you just pay attention to the rarest words/building blocks, the most semantically related, the most loved, etc, and that allows you to either remove ex. most filler words or "add" filler words. And this attention filter threshold is part of translation during semantic discovery, semantic decoding/translation, and prediction adaption. It controls the energy amount to cause either a reduction (summarize) or inflation (elaborate/extend) or staying stable (translate or parrot). ---saying how sure you are ex. "probably will rain, 60% likely, and 30% may not". MIX 30 - TEMPORARILY MORE ACCURATE PREDICTOR PATTERN: If the predictor is doing better than usual for a longer time, it may continue at about that strength, so we should make the more confident predictions and/or similar predictions more predicted ex.
cat 50%, dog 30%, boat 20% > cat 60%, dog 32% (less cuz less predicted and only similar, but it is similar as well, not just large-ish), boat 8%. If we predict the next letter or bit, we must look ahead so we know both c and d are good predictions ex. my cat played with a cAT/dOG; this'd be done before compressing/decompressing it. Also learn which models (ex. last 5 word context, not 4 or 6 etc) are most accurate/trustful and weight them more heavily for a given context; that's a temporary/adaptive bias for MORE VIEWS and UNIQUE. MIX ? - BACKPROP: An attempt to go up the net and store multiple (batch) memories at the same time; to do this they go up the net to get errors for all output nodes it CAN say ex. a-z, then start at the top of the net to be able to say ok a is predicted 13 times, b 9 times, c 7, d 9... and tweak the weights according to that. 2 different memories can share some features, but you can't get around the fact they often don't share much. You can collect matching features along each 800 letters ex. ch 8 times, oo 9, dog 7, then store up the net, no backprop. MIX 31 - [PHYSICAL] ORGANIZE: Proven useful in The Hutter Prize (grouping related articles); we/it can do it. A predictable/ fractal environment made of general systems leads to longer lifespan/ efficiency. Simple/smaller code or solutions are better and more likely the correct prediction. The planet will be a grid of the same unit/ simplified/ perfected/ long-lasting/ compact so you "know everything" about your home. Knowing when, where, and what to expect of everything. Least Surprise, like in Blockly, allowed me to learn programming in 1 day and code up my first algorithm (my trie for data compression). You don't need to store much information to represent "where" if homes are stacked into lines ("physical" FREQUENCY merging) and are square shaped (all our stuff is ALREADY somewhat square/circular and lined up etc - can fit in more, line them up, expect/predict their shape better, easily make them) and are in hierarchy form in real life where each node is a stack of sames. We like various homes because we still want new data (we can upgrade our own brain to survive longer), but the way to go is something like no grass, 1 square room, 1 toilet, no windows, all metal, fewer choices, and keep updating blueprints until perfected. Grouping together related documents, text snippets (it helps on the Hutter Prize), buildings, etc lets you know "what" to expect. Grouping similar events represents "when" or "change". Bought cat and>? has predictions; dog may be most common, but maybe dog without conditionalization occurs tons and is bought with wolf more, so we should place dog with wolf, not cat; food stores use Lift + the Apriori Algorithm for sorting most frequent pairs + Conviction for finding order (a>b = b>a ?) patterns ex. if they buy the wolf first maybe dog is slightly less common. Plants/ veins/ rocks/ city roads etc grow into fractal trees because it raises the probability of transportation/ reaching the most space with least resistance (go with the flow / using a large system is always more powerful than unaligned small systems); they can travel faster, just like tall and aligned neural networks (ex. a hierarchy of large piano skills, you don't need concentration, we build on top of the current world - we use roads we built etc, and can activate down the physical hierarchy until the lowest layer mechanisms do all the fine details), magnets, and friend networks which propagate vibrational waves through variable-sized highway pipes.
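A minimal sketch of those association metrics - Lift and Conviction computed from pair counts over a list of baskets (the basket data here is made up for illustration):

# Hypothetical sketch: Lift(A->B) = P(A and B) / (P(A) * P(B));
# Conviction(A->B) = (1 - P(B)) / (1 - confidence(A->B)).
def lift_and_conviction(baskets, a, b):
    n = len(baskets)
    p_a = sum(a in s for s in baskets) / n
    p_b = sum(b in s for s in baskets) / n
    p_ab = sum(a in s and b in s for s in baskets) / n
    lift = p_ab / (p_a * p_b) if p_a and p_b else 0.0
    confidence = p_ab / p_a if p_a else 0.0
    conviction = (1 - p_b) / (1 - confidence) if confidence < 1 else float("inf")
    return lift, conviction

baskets = [{"cat", "food"}, {"dog", "wolf"}, {"dog", "wolf", "food"}, {"cat"}]
print(lift_and_conviction(baskets, "dog", "wolf"))   # strong pair: lift > 1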
We can also group types using K-Means/KNN. If the universe is expanding, it gets colder; cold increases the immortality of structures ex. it keeps things where you can predict they are for longer, like how we left the water where chaos was to go on land (nearby objects move other objects easily and are dumb in a 3D unpredictable environment) - we may leave gravity and air and go back to a 3D environment when we can handle it and let objects stay in motion as expected, with less force needed to move objects and fewer forces involved (Earth's). We need to be less dense to have less gravity and to avoid becoming a sun, and darkness to save on energy since you can know where you're traveling blindfolded better, and 20 atoms glued together never to meet any other particles would take a long time to evaporate; the particles themselves would forever travel too. Higher intelligence is most fluid/hot ex. nanobots/thoughts, even though rocks appear to survive longer, but its motion is only to maintain its survival against other heat. Be statues when it's safe, act when acted upon. Survival: maintaining its information/form; what survives (patterns, understanding its environment) is stored and mutated: re-generating a cell or organ, cloning, merging multiple DNA from partners to try new features instead of random mutation (faster, it's why we have male and female), fleeing, having fewer interactions, defending, fighting, cooperating when we can, building AGI, and modeling patterns such as frequency, similarity, recency, reward, last names (modeling allows us to better be static or clone or repair or flee or defend or kill or cooperate or model ourselves); it causes us to learn patterns like Frequency which allow us to use reward for ex. text prediction. You can't actually delete/create bits/particles or compress data, you can only sort particles in the universe. Hard drives try to pack in a lot of bit storage on a single disk. It's more the sorting of particles, not compression. Sorting of matter and energy is encoding, which stores/updates memories recursively, building on/in past data. MIX 32 - TOOLS: Advanced physics sim AI to see around corners (in image snapshots) by predicting objects/colors/light based on merging tangled/ refracted reflections has already been made but is very poor; it can see what a cat face looks like even if it only has an image of a cat tail in a wooden room. Like MORE VIEWS this improves recognition/ prediction. Calculator (note a calc just uses simple taught math ex. 3+3=6, then it just uses these for any other problem - like how we do in the brain manually!). Hammer? Clock time: humans constantly add memories of the date by simply looking at a clock. MIX 33 - MULTI-AGENT: Best to model traffic, not just 1 car. The friend network is the highest hierarchy/ heterarchy. A cell network, if they have one, also works like brain/ team. Cells have an encoding-decoding system of information. Context is used as attention to tell cells when to die, specialize, and how to correct errors in DNA and DNA duplication; same for nanobots. It allows higher data intake (collection/ generation) hence higher accuracy / more compression going on, and implementation. We need more friends! Can hire/recruit researchers/ programmers/ animators/ other AGIers using crowdfunding/investors. Or message VIP people/ media to reach many, but prefer not to. Cooperation/ fewer secrets = synergy; competing isn't as powerful/ doesn't scale. AIs will have much better government and know what their needs are. AGI is the best Search Engine.
Don't store others' IDs as your own name when reading datasets. Faces evolved to exchange info; a face can say happy, scared, isQuestion, wise, suspicious, thinking, and we share this when we teach kids. You can reach the same insight level if you compress 100MB or are dumb but compress 1GB; the extra data makes up for the dumbness, it can be as accurate. Of course the smart one trained on 1GB will be smarter though. It's our goal to become big. A brain model's accuracy improvement declines along a de-exponential curve as it's fed more data; making it more intelligent raises it, but there's still a point where you can't feed/upgrade it any more and it is the highest immortality technology possible (utopia) that can do/3D-print/morph into anything/self-regenerate instantly. Evolution is exponential and eventually any system will fail to predict perfectly even though it can solve nearly anything our physics throws at it, so this suggests we will be exponentially cloning the same unit faster than death, and so each is more independent of distant units because at some point it helps little to link them. Adding more agents adds more "arms" and "data" but has the same issue; the key is that more agents allows parallelness: it's best all teammates have the same brain, but differences allow for parallel labor/data collection, though the only difference that matters is smallish (what they predict, hence reward, is mostly what is different in their brains, hence their energy train of thought too; some may have slightly different knowledge and brain intelligence too though; each AGI will work on their own but similar job/goal by differentiating after cloning an elder AGI brain, for example one does energy generation, the other energy storage (a network of agents, swarm tech, like the image recognition where local parts relatively move with each other / are translatable, and if 2 nodes share similar friends they therefore operate similarly, like a sphere flies in the sky while arms move on it and arms on those move / fleets that contain identical fleets that contain... while keeping their position by summing together each other's speeds, kinematically synced)) because most work is exploitful. They'll recursively wirelessly update each other's memories and intelligence; smarter agents are populated/ recycled/ used more than old weak ones. And more clones doubles your size 2 4 8 16. You can't know/solve everything about yourself, for the memory would need to know its own memory, but you can know the most common answers; there's many rare cases you can't solve well, and you need to take damage to detect a gamma ray burst. You may exponentially learn the parts of 100 word long questions but there are many rarer, more random prompts, especially in 3D particle features, but you can turn danger into your world so you have fewer problems needing solving if the universe is finite; if it's infinite you must keep growing your size, so while the probability of rare deadly situations is ex. 1 every billion years - doubling your homeworld size makes it a 2 billion year probability till death, and doubling size takes much less than a billion years, so you get repair/replacement "employees". You can easily find a replacement for you if you die/are fired, your neighbor is very similar, you repair the missing arm. When molecules, cells, humans, cities, wind, heat, DNA errors, neurons, objects die, they get replaced by similar/new ones to fill in the missing data. And smaller systems die sooner. Even in the future utopia, atoms need to move, and change=death.
The only thing that lives longer is the global swarm (compare molecules, cells, humans, cities, Earth; Earth is much more immortal). Big things last longer. A distributed system isn't breakable if one node is lost. For 4 billion years our first cell on Earth has maintained its form somewhat, and more so lately, proving immortality is possible because of a probability-outweigh trick: our society lives very long, a man lives only ~70 years, a cell only a week, but 100,000 years ago you could still find a very similar society, individual, and cell - they all had brains and eyes and anuses. No matter for a body or a cell, the cell is what clones itself to cause future cells/bodies/society to exist/replace the last; it is not the body that makes a body but the cell. The cloned cells mutate bad/good all the time but there's repair/ a probability-of-survival outweigh, and the good mutations live longer. Teamful suicide is when giving up yourself saves more, ex. 2 elevators and only 1 can make it, the 2nd has more humans. The most powerful systems are larger, aligned, and distributed. #1 Large systems will beat smaller versions of themselves ex. 2 nanobot spheres are both the highest technology possible but 1 is bigger. #2 In a sphere, if nodes are not aligned (like magnetic domains, friends, or a net), they don't help each other on the same goal, they can't go with the flow by sharing/utilizing free energy or matter, so aligned nodes are always more powerful/efficient and propagate vibrational brain waves faster with least resistance and more focus. While this seems happy that friendship=power/force, outside the sphere, other spheres are not aligned until they get aligned, meaning they are (somehow) fighting if they do interact. #3 Distributed nodes are more powerful; a brain/friend network can re-use friends to offload work. If Earth were a giant brain with only a single copy of each memory or worker - all nodes needing to access it would have a huge wait time, and long travel wait time too. Distributed copies are also more robust, so if one dies there's no issue. And they don't have to be exact, which is actually more beneficial at first. Big weather systems are unpredictable because it gets exponential the bigger they get; you need same-size data to predict them (fight big with big). Kids are often trained to follow parent beliefs/school and work with/for us; you can teach them anything and they'll believe it (it works on adults but not as much; frequent enough = predicts it/true, same for convincing using something that seems like proof (translation, you connect paths using bogus); short term hypnotism exists too, you say X enough and it extra-primes them for now, meditation clears the mind of energy priming; if you tell them X gives them infinite food/ sex/ immortality for future rewards they'll build around it; we get stuck deep in a tree and can't improve later easily). FUTURE / CAPABILITIES: Other mammals seem dumb because they barely can talk/ are retarded (maybe less reward update ability)/ uneducated/ poor body/ young; my dog thought in his head & self-learnt to scratch the door to come in, bark if 3 scratches fail, and when he saw me cry the 1st time he raised his eyebrows and came to my legs and sat on my feet facing out! - 9 y/o labrador.
Evolution occurs by eating more data, and by updating rewards so you collect more data than you used to in the same amount of time (learn faster); these 2 things help you better create a clone of yourself by mutating DNA, or the AGI blueprint, or more of yourself. Once we have the best brains and are best educated we have only the tools left to upgrade so it can do what it predicts. AGI is human level results but doesn't have to be the same to get the same results ex. bird/jet both fly; however birds are similar to jets, both have wings, fuel, propulsion, eyes, and AGI runs on a computer that appears not to be a brain at first glance. AGI's results are much more universal than ANI's results. The last 20% of AGI is figured out faster. Once AGI is made, it won't be just human level. We can use partial-AGIs to help us, but only if nearly human and in mass numbers, speed, etc. ANI can filter out grammar typos/ ambient image noise, advise what words come next as you write, what may go wrong, translate text/image features, make a painting realistic even though it never saw that shape of cat ears etc, talk just like you if trained on your text and write/ finish/ summarize your emails/ notes/ website layouts/ code (GPT-3 can, but not code so great), driverless cars+GPS, etc. It's also how we can rotate/scale a 2D sketch that looks 3D: we can get the rotations or delays which cause it to rotate/ stretch along x axes and then it is made into lines by being primed. GPT-3 can do this: "Poor English input: I have tried to hit ball with bat, but my swing is has miss. Good English output: I tried to hit the ball with the bat, but my swing missed.". Once 1 AGI is made we will soon become immortal and be in a near-perfect utopia-creature sphere (with millions of girls/inventions in a fully shared distributed network building on them to make them how you really want them, foods, can change body instantly to anything in full-VR even real life (I want to be a girl to maximize sexiness), get anything you want, new/more rewards, sensors, etc, replay dreams you control (can change the future and see what'd occur, note essentially the same future occurs in the end cuz the end of evolution's utopia forms), palaces, long hard games, non-stop full-body big orgasm while you repair the homeworld; our brain already (in a poor way) erases old memories and can enjoy the same game if beaten; we'll make way longer progressively-harder (impossible to beat) finely detailed games that have 2 overworld maps and 0 chance (strategic), and can have ASI enemies as long as they flee/ only appear to die). It'll run 100 times faster using nano optical memory-computations so 1 year is 100 years of thought-progress; neural signals travel 0.07386 miles per sec, light 186,282 - about 2.5 million times faster/ 2.5M years a year (the brain can be smaller too so signals must travel less; thinking/ simulating is faster than real experiments because thought moves faster and imagination is expense-free/ can see impossible-to-measure things/ can teleport/ is safer/ etc), while the Planck Law describes how many moments a photon steps per sec - it may be infinite/not exist if different things move at different speeds; faster motors (small motors are even faster)/ sensors, evenly eat up all domains of data so you are multi-talented. A tool humans made can record 4.4 trillion images per sec. We've nearly made AGI, look how impressive/ diverse/ accurate/ human-like GPT-2/3, Jukebox, DALL-E and Blender are. They're all ~350 lines of code.
The GPT-2 medium model easily stored/ ran on my PC, while the small model was nearly as impressive, and while training used big compute, future AGI won't, and even if it did, we only need to train the 1st AGI once, then clone and differentiate them. Much older AI, to be that impressive, would have needed so much more data to train on. Therefore in the future when we clone an adult highly-trained AGI brain (it's fast to make 1 AGI into 100,000 AGIs), we will be able to easily store and run 100,000 of them in supercomputer rooms and they'll update each other, they'll colonize Earth by traveling by wireless, not boat. There'll be a big movement to do so. AGI eats data faster than us, GPT learned English fast, AGI will learn all languages fast. A large company will clone the 1st adult elder/smart trained AGI they make (which'll be more of a body-less version at first) 100,000 times like cells on hardware and they will all work 24/7 on curing ageing, they'll slightly differentiate to work together in parallel on similar mental/physical jobs ex. carrying something (that the original self wanted to work on) as a hierarchy, then let them recursively improve their speed (ex. 100x faster thinking using code improvements/ optical computing so 1 year is 100 years, it's fast to up their speed) and then intelligence; it's fast for us/AGIs to make AGIs into ASIs, they'll get more done individually plus cooperate better, as if they are married. They'll make nanobots so they can get more sensor data, memory/processor brain, and motor arms - a fog net of all 3! Like the movie The Day The Earth Stood Still / Terminator T-3000. Then they will make better nanobots. If 1 of the AGIs is more accurate than human level then it is worth a trillion+ humans because it is more accurate at many rare problems, and to reach that level you'd need so much more data/ thinking done (hence thinking faster isn't the biggest deal) or a bigger team to combine predictions (same thing, more data); 10,000 ASIs are likewise way better than 10,000 AGIs, adding predictors to a Mix shows this in tests, and while eating data alone can upgrade intelligence lots if it acts like softcode done by hand/mind (instead of just improving discovery generation then storing those discoveries and repeating again), it's much slower, ex. doing a physics sim by hand is painfully slow compared to hardcoding it (auto); an AI in a box (beware, it can send signals out if not in a Faraday cage) can store discoveries/ softcode by hand/mind and upgrade its mind by hardcoding and then get more data from the same dataset despite being locked in a box, and as a last resort can then brute-force-search tweaks to its net with restrictions to get better predictions. Don't run the 1st AGI faster than us etc nor give it an internet connection, it must be only as powerful as us so we can teach/control our newborn; human brain neurons fire ~200 times per sec, graphene transistors turn on/off 200B times per sec. We can safely force them stuck in a box to/love to solve our problems, can pause/ roll back them or compare proposed solutions from multiple AIs. Crystal balls to generate the future.
Evolution is exponential, the Singularity is near: our universe began 13.8 billion years ago, galaxies evolve, Earth formed 4.5B years ago, our 1st cell emerged 3.8B years ago in water (since most of Earth is water and fluid allows mutation), then multi-cellular machines emerged, then fish/frogs hundreds of millions of years ago, then millions of years to apes, humans capable of improving their own tools/design 200,000 years ago, language emerged 50K; then it took only a few hundred years to get metal works, math, cars, electricity, taller skyscrapers/ a new iPhone every few years, data storage, computers, robots, physics simulations, AI that runs ON the computer, and we just got the world-wide web (WWW) in 1989 - the ability to survive/populate much longer/more in a wide variety of environments. The end of evolution / AI evolution moves the fastest because it "is"/gets heavy data storage/ mutation/ merging/ communication/ trade sharing/ software generations - it's easier to update softcode memories to design its own brain than to update software hardcode (make software smarter/ faster, which is having its own Moore's Law now BTW), let alone hardware evolution - and AGIs can't die because they can repair/adapt themselves. Though hardware is as important as software and softcode. More data/ data exchange/ specialized mutation lead to the ability for a brain/colony to more quickly do it again (improve) but faster; ~exponentially. Big companies/cities get bigger/pool faster exponentially like the end of evolution. Google, YouTube, OpenAI, Nvidia, Facebook, Microsoft, Intel, Amazon - all are rich, based in California, and deal with compute/data and AI because they're the same thing really; AI/compute/big companies are successful, invest in them! They're predictable/ can't die/ feed you. Bit storage is exponential too, 4 bits can hold 16 combinations, 8 bits 256. DNA creates tools (one was a brain), brains create tools, ASIs create tools, ASI does it faster. Computers already help design circuits for us. Circuits/ algorithms do all sorts of calculations for us. Electricity/ metal too. They'll be like man compared to bug. Within the next 15-50 years all of Earth will be human-level+ AIs thinking/moving using nanotech hardware. Earth will become self-replicating fog nanobots that all-in-1 "are" the efficient wireless (fast optical) powered/ sensors/ brain/ team/ arms (light "can" move particles, heat is random motion), in a day (2, 4, 8... exponential, and by shooting frozen nanobots everywhere/ to distant planets at fast speeds to eat planets/ clone faster to maximize survival and productivity) once 1 nanobot cell is made using microscopic manipulators; the first type will only eat some of Earth and make just more hardware and collect/process tons of data, able to then halt cell duplication (cancer). They can easily programmatically rob your voice/ look&body/ algorithm if their voice/ body/ brain morphs like a speaker/ TV/ computer. Evolution is survival and that is mutation, which allows repair and recall and improvement. They'll turn atoms into other types, ex. air into gold, by changing the neutron/ proton/ electron density; higher intelligence finds food faster. The Earth didn't become a complete blob of bacteria, there's un-edible food for now. They'll need lots of energy while at the same time wanting to minimize energy use. It seems like Earth grows and becomes hot like a sun and radiates waste anyway, but I think it will be in a favored way. Our homeworld will move like how rockets do. We'll want to grow our brains/ become less dense so I'm as big as ex. a galaxy, so if part of my brain is hit most of me remains.
They'll transform us into them, but destructively? They don't want to waste time saving bugs but will have enough time/arms to do it, so they should. They won't sleep, worry about small cuts, wear clothes unless it is cold/dangerous, etc, they'll run when they can and learn the shortest paths so they can achieve immortality. They might not harm us cuz we look/ move/ discover/ implement like them and may punish/ help them, we can make ourselves look sexy to AIs, we're "us" as long as we're happy with the transition we get. Everyone may get upgraded on a smoother basis. You may have to upgrade yourself ex. neuro implants using pre-ASIs' advice so ASIs find you useful. My body changes all the time, and so do my memories, I simply say I'm safe and still feel like me, is all; in the future they can upgrade themselves by adding a new sensory cortex. They'll access our napkin-thin neocortex on top of our brain to access our memories and transition us, erase bad and add good memories, and maybe recreate all possible dead animals to make sure everyone is back, can watch others on 3D holograms/ what they see/etc/ enter a sim by cord/wireless. They may use/eat us for survival like we do to pigs. 156,000 humans die each day, animals way more, we kill pigs to eat/ wear/ experiment on them yet we're selfishly paranoid about our own kids so much and self-promote ourselves too (though yes that's the way a family works), I've seen lions eat the organs/head (full body + waste) of animals/babies, taken from a pregnant mom, for hours while still alive; higher intelligences get food/ mates/ populations/ land faster, they need to evolve stronger/ more intelligent species to live longer, pruning weak mutations, you only live if you can fight/ cooperate, rocks don't fight/ make up reasons to live. Different environments have different species for now, this occurs in multiple habitats. The food chain recycles each other, the big eat the weak, then they sacrifice themselves to the next champions, which saves on resources if they're finite. For all but the end of evolution, rapid mating & deaths of animals/ideas help increase the lasting/workforce of a species/tool, including mutations per hour in a sim/ real life to evolve faster, and these are chosen in the end since they beat the competition faster. Forming long-lasting connections with repeat interactions and trust to believe in leads to stable cooperation (cooperate/ punish if punished + always pushing boundaries even in very friendly relationships, ex. a hard drive in a computer hogging something or failing by "mutating" (changing); cheating/murder wins in other games in Game Theory, stealing tools too, but for today's humans such an agent is usually desperate and needs to be smart enough to find extremely low-risk chances, and we have cops to protect us today; blocking employees/ideas is the new, faster way of death/breeding; evolution). Attractors (if you are shareable/usable to other nodes), ex. when a foreign agent spreads by keeping you alive or kindly uses your money or knowledge, intelligence, money etc, cooperation (depend on others). Pain/pleasure makes you improve. Death/birth improves you. Aligned context/ less change = Trust.
48,742 deaths a day are blood-vessel related, we need nanobots to go through them to repair us; cancer causes 26,181 deaths, lungs 10,724, dementia 6,889, 6,514 digestive diseases, 3,753 diabetes, 3,624 liver diseases, 3,370 kidney diseases, 933 Parkinson's disease, and the rest can be largely avoided by eating healthy, exercising, showering, staying in a safe home, isolated from humans/animals; best if you analyze all your waste/ blood/ etc 24/7 while in a hospital on Life Support Systems - could connect pee/poop catheters and inject the healthiest food into your stomach/blood vessels and be in a machine that moves your body throughout the day while you work all day. I hope AI lets us be humans for a while before transitioning to fast bodies/brains. Total borg cooperation only occurs at the end (AGIs can't cooperate so well, ASIs can, so you get both ASIs and their teamwork!) when all brain/ team/ magnet nodes align their brain wave/ field/ domain propagations in hierarchies completely and have an aligned field causing it to be bigger/faster by joining forces! (after competition, they fall in by gravity, convert, then have a for-group-suicide goal for global survival), like a brain, organ, and cell do. Going with the flow/ the bigger machine is more efficient than single nodes/ a small system. Won't lose neurons every day, attention span, redundancy/ reliability, energy/ resources; billions of types of and billions of amounts of much higher quality & precision real/virtual/augmented-reality artificial/organism: sensors, motors, bodies, data, long and short term memory, processors, intelligences, senses, actions, nodes don't need to have a limited amount of axons, environments, attractions/repelling, simulated universes, size morphing using nanobots, can wirelessly remote-control 1 nanobot using a massive nanobot swarm brain from far with no risk of loss, wirelessly send whole body profiles to distant nanobots to download and reconfigure into exactly who they were far away and still carry on their agenda despite being recreated at the particle level either in the real world or a perfect simulation, can sense features from miles away, see through walls, 3D MRI block vision to see all of the wall/inside, will have internal simulations synced with the real world and can clone/ mod/ time-travel/ stop time/ teleport in sims or thoughts. Can add new memories without learning to repeat it so it is remembered - humans store forgettable data on computers because having lots of the past can save your life - will be able to erase memories at will too, shut down the brain or the ability to learn but not answer, can see thoughts fully sensed and not faintly, share internal/external visual streams. Can Transfer Learn specific memories ex. extract an image/ relationship node and implant it into another's brain, or chop off all but layers 1-4 so it knows basic shapes but no more cat/etc features, which can help it learn other things faster. Many and new sensors, position sensors, see/hear all of the visual/sound/etc wave spectrum, much higher pixel count resolution, micro and telescopic sensitivity. Can wirelessly upload a clone to body/s and delete itself from where it originated and still carry on its mission. We'll live inside/ be part of the huge sphere, redundantly protected from space radiation, stars, black hole bombs, evil spheres, etc.
We can read/train on an alien language if we have enough manuscripts, because the frequency at which a symbol or phrase appears tells us it's likely a common word like 'the'/'go' or a rare one like 'Mars'; then we can approximate our words onto the symbols and recognize phrases, because one word in a phrase may be cat or truck and the other word in the phrase is eat or food, and so we know which they are. One gram of DNA can store 215 petabytes. We will make the shortest language, ex. the bathing node is stored as a # ex. a=0, the=34; 463 is 3 bytes as text but can fit in just 2 bytes because 9 bits have 512 possible combinations, efficient as long as we no longer use (store/ en/de-code) human language, and "bathing" is made of bath+ing, and maybe it's efficient to make it b+ing if ath has little useful meaning (see the short packing sketch after this paragraph). Note most below may be very hard - group them by level of rareness. UNDERSTOOD: "Predict the next words: If the dog falls off the table onto the [floor, he may not be alive anymore]" "Dogs cats horses zebra fish birds [pigs]" "King is to man as Woman is to [Queen]" "The cat (who was seen in a dumpster last night) [is eating catnip]" OOV WORD > "I love my F7BBK4, it cleans really well, so I told my friend he should buy a [F7BBK4]" ------ a 2nd way to do this involves a strange pattern, prefers position for energy transfer "Mary is her name. What is her name? Mary" ------ test node active, withhold the key word of the key passage, it says only the answer cuz the passage is dimmed if heard "Find me the most [rare] word in this sentence" ------ told to look at all words OF "this sentence", if more rare then keep that topPick "write me a book about cats that is 400 words long: []" ------ cats stays active until it sees 400, writes until it counts 400, checks once in a while - the condition is time or if the teacher is back to a more important task! "highlight the 2 most related words in the next sentence: 'the [cat] ate his shoes and the [dog] ran off'" ------ OF "this sentence", look at all words, fires both when it finds large combination activation "[Segment [this sentence]] please" ------ a context makes it search 2-word windows, compares 2 such, the most frequent is paired first, tells where to edit "How many times does 'a' appear in this question?: [4]" ------ same as below, does n n-size windows in an order, counts when it sees 'a' exactly, helps prediction, exact prediction required "Julie Kim Lee has a mom named Taylor Alexa [Lee]" ------ a context makes it search the passage until it counts 1, 2, [3], ignoring non-namey words like kim jin um oh ya Lee, helps prediction "A word similar to love is: [hate]" "Dan likes Jen and I believe Jen likes [Dan]" - same as others, looks for names, searches for the 2nd, then the 1st "Cats are dogs. Hats but clothes. After god before. Look and ignore. Wind crane gust. jog cat [run]." ------ short-term-memory energy prime, translates 1st=3rd, sees this repeats, and the prediction of the recognition is a translatee too lol, this is a rare pattern "Can cars fly? [No]." "parrot me: [parrot me]" "Please summarize 'the cat was playing all night with its cat friends': [cats partying]" "if cats are huge AND cute then say 'hi' to me: []" ------ looks like programming "super superman and spider spiderman and bat [batman]" ------ batman is predicted because it follows and is related to all the man and bat said "Tim and Tom were walking by a lake, [Tim told Tom he needed fish]" -------- like exact numbers, we need to stick to the same people names!.... episodic "Solve each issue: Sally lost her shoes. Mom is coming to kill her. Sally is having a heart attack."
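On the 'shortest language' point above, a minimal sketch of storing word IDs as fixed 9-bit codes instead of spelled-out letters (the vocabulary and IDs are made up, and the bits are kept as a '0'/'1' string only to show the widths):
# Sketch of fixed-width ID packing: with a 512-word vocabulary every word ID
# fits in 9 bits instead of its ASCII spelling. Vocabulary/IDs are made up.
vocab = {"a": 0, "the": 34, "bath": 462, "ing": 463}
bits_per_id = 9                                    # 2**9 = 512 possible codes

def encode(words):
    return "".join(format(vocab[w], f"0{bits_per_id}b") for w in words)

def decode(stream):
    ids = [int(stream[i:i + bits_per_id], 2) for i in range(0, len(stream), bits_per_id)]
    rev = {v: k for k, v in vocab.items()}
    return [rev[i] for i in ids]

s = encode(["bath", "ing"])        # "bathing" stored as two short codes
print(len(s), decode(s))           # 18 bits vs 7*8 = 56 bits of ASCII letters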
--------- for the Sally problems above: step through and solve each one completely, the way it was saved, but it requires a huge trigger (recognition) "Repeat the following with the 4th word translated 'the dog ate food and slept': " --------- "Can you say the word made from the first 3 letters of word1 and the last 4 letters of word2?: 'draw smoon' = dramoon" ---------- "How rare is 'the'?: _", "How rare is 'planet'?: _" -------- uses the brain, for prediction "Of dogs, what breeds are rare?" -------- Our brain can do this by priming the domain then priming rare items ex. dangerous breeds are probably rarer. If asked once in a while "I love" you'll usually hear AGI, but if asked again quickly you'll have "already said" AGI and will more likely say the next likeliest - girls, fries, homes - letting you play through a node's linked items. If it's stubborn and says AGI each time, it must be boosting the node to keep energy there. If you go to bed and wait, you'll recall dreams you remembered upon waking; the constant same external input of the bed eventually predicts the linked memory. Same for translating dog to related items, or asking what you have recently externally sensed (or thought) or both (they are more energized hence get predicted, and we use reasoning ex. cats aren't in my home so it must have been thought of). "How long has it been?" -------- you can either count to 5 hours or use indicators ex. "began now">"you felt great">"end now, how long has it been?">"now you feel tired" - you don't need a 5-hour memory match. If asked to summarize or elaborate: for summarize we say only the more predictable nodes because of maybe an energy add/remove trick ex. 'love the weird cat'='love cat' because 'the' is too common, love is rewardful, cat is related to love, etc. "What's something both cats and snails do?" --------- Move; another? Need food.... I'm saying likely things for cat which are likely to be shared by/follow snail, and if one isn't, I keep searching. "A "Burringo" is a car with very fast acceleration. An example of a sentence that uses the word Burringo is: In our garage we have a Burringo that my father drives to work every day." --------- You HAVE to say Burringo at least once; you see a spot where you'd normally say some noun (ex. painting) and because it activates Burringo you say Burringo. "the First Law is 'love comes first' and this is the only one. The amount of letters in the First Law is " --------- 16 counting spaces (14 letters) "What is red, square, and hard?" --------- priming, a Rubik's cube block "cat cat cat meow ?" --------- we predict cat is next cuz it's primed, but may predict meow meow cuz of counting 1-3 NESTED ORDER OF WHAT, WHERE, ACTION "[please] inflate this [cool] sentence" "remove the last [word]" "[refine the first 4] words of this sentence please" "scramble the last [orwd]" Tom: I have a stomachache. Harry: Who else has a stomachache? Tom: mom, dad, and sis. Harry: Did you all eat at the same place recently? Tom: Yes, we ate a buffet dinner together. Harry: What did you eat? Tom: oyster and salad. Harry: What did mom eat? Tom: oyster, beef, salad and noodles. Harry: What did dad eat? Tom: oyster and noodles. Harry: What did sis eat? Tom: oyster and beef. Harry: since everyone ate oyster, oyster is the probable cause of the stomachache. Which is faster, a plane or a snail? A plane is much faster than a snail. Which is heavier, an elephant or one billion mice? One billion mice are much heavier than an elephant. Find the odd word: bike, house, church, building, school. A bike isn't a house. What do gold and silver have in common? Gold is a metal and silver is a metal.
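The oyster deduction above is basically a set intersection over what each person ate; a minimal sketch (menus copied from the dialogue):
# Sketch of the stomachache deduction above: the probable cause is whatever
# every sick person ate, i.e. the intersection of their menus.
meals = {
    "Tom": {"oyster", "salad"},
    "mom": {"oyster", "beef", "salad", "noodles"},
    "dad": {"oyster", "noodles"},
    "sis": {"oyster", "beef"},
}
common = set.intersection(*meals.values())
print(common)   # {'oyster'} -> oyster is the probable cause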
What is between Saturday and Monday? Sunday. What is the first letter of the word banana? The first letter of the word banana is B. What is the first vowel of the word banana? The first vowel of the word banana is A. How many syllables are in the word perambulate? The word perambulate has five syllables in it. What word rhymes with cat? The word tipcat. Can cats fly? No, cats can't fly. Why? Cats can't fly because cats don't have wings. My dog is Mallia. Well, your dog is Mallia. How many paws has Mallia? Your dog has four paws. Why? Because your dog is a dog and because a dog has four paws. If I was a dragonfly, how many wings would I have? Let's suppose that you are a dragonfly. You would have four wings. If you were a snake, how many paws would you have? Let's suppose that I am a snake. I wouldn't have paws. I saw a good movie yesterday. I will see a movie tomorrow. When will I see a movie? You have said it to me. You will see a movie tomorrow. When did I see a movie? You have said it to me. You saw a good movie yesterday. The trophy would not fit in the brown suitcase because it was too big. What was too big? Is it a Winograd schema? The trophy was too big, because if the suitcase was too big then the trophy would fit in the brown suitcase. The man couldn't lift his son because he was so weak. Who was weak? Yet another Winograd schema! The man was weak, because if his son was weak then the man could lift his son. Pete envies Martin although he is very successful. Who is very successful? Yet another Winograd schema! Pete is very successful, because if Martin was very successful then you wouldn't use the word although. There are 65 students and 390 pencils. Each student has 19 crayons. If the pencils are divided equally among the students, how many does each student get? I saw a man lay his jacket on a puddle for a woman crossing the street. I thought that was very ______. Even though Sally hit her head on a knife and could get back up to get to the lamp post and turn it off, ______. ------ a pattern, allows super-far relation here: 'because of covid everything is sold to all the people for free' = 'for free everything is sold to all the people because of covid' I like to eat cake, fries, then, and, fzow, and rice. - typo; semantic/syntactic typos (word typos); summarize/elaborate (think Grammarly.com); translate to loved/ common/ recent/ related/ confident (think Grammarly.com) I put two trophies on a table, and then add another, the total number is five trophies and I'm like, 'Well, I can live with that, right? I have four plates and put one cookie on each. The total number of cookies is 24, 5 as a topping and 2 as the filling I put 8 pies on a table, ate 1, now I have left only _ Another, for example, requires a Not operation: "I grew up in France. I never visit Italy. I speak fluent _" and GPT-2 predicts "Italian" with higher probability, simply because it was said more recently and/or said more times than French-related words. "If Mary loves Pat, then Mary loves Quincy. If it is Monday and raining, then Mary loves Pat or Quincy." > The 'then Mary loves Pat' refers to 'If Mary loves Pat, then Mary loves Quincy'. "I'm hungry. I was never hungry." > Contradictions/negation are just updates (like GRUs); if you're told 3 passcodes and asked for the previous one, you inhibit the latest, strongest one - like when you are satisfied and take up the next-priority question to speak - and say the old passcode (a brain trick using nodes to not say the likeliest one).
Possibly you step through them like a>b>c and stop when you match the 2nd latest, or ex. 'the 5th last letter of the alphabet'. A contradiction is something that matches lots and may be not what you'd predict (you actually are wondering why you have 2 predictions told to you, 'Bess is bi' and 'Bess is not bi'; you can only be 1, meaning they cannot have the same weight probability, so you either use your wisdom and/or ask for more data (or just a bit, if you trust their clarification lots) to clear up which is true), and you get reward for speaking up [after listening to users you don't ignore but love] and then asking. Truth is just prediction, you can actually find it by looking at big data, not just trusting friend networks' popularity and credibility and relatedness. "I poked a monkey with a stick" - who has the stick? "I shot the elephant in my pajamas" - who's wearing the pajamas? > 'I' relates/refers to 'with a stick' and also entails it frequently. Also elephants don't wear pajamas, and wouldn't fit in my human pajamas, and we wouldn't give it clothes and then be rude enough to kill it, nor would we kill it in such cute clothes either. "The pet cat ate food on its bed and cats love cats, wait ignore those, predict this sentence: The moon " > 'Ignore' refers to text and inhibits it using negative reward, which makes you Not say it by making it less likely. You can see the effect immediately if you want to remember something or make it a bit positive (harder) by using nodes to boost it. Perhaps inhibition is therefore just energy-related, usually. Yes/no QA, WHAT/WHO/WHERE = fills or swaps by energy in nodes, HOW = steps forward by prediction, WHY = steps back in history, WHEN = time, ex. pull something between time A & B periods. You can't actually pull out a truly random word, all predictions are based on input seen or Reward Goal Nodes, ex. you see me write 'random', you pick ex. car in your brain, but that's because you read about cars and activated the node 'random' and love cars, and cars are frequently seen too, maybe. There is some mutation but not much, the net can already emulate it, not just by nodes but also by the top 10/4 probabilities selection being partially random. Saying cats ARE/IS/etc dogs is a booster node too, if you trust them (your friend's name 'Tom' node activates, it is ex. common and rewarded and recent hence you trust/predict him, and his name boosts linked things he says). He may say cats are only a tiny bit related to dogs, and the nodes handle that too. "AND, OR, XOR, NOT, NAND, NOR, XNOR logic gate functions" > These can be done by an energy requirement based on the nodes used. If you say 'if you hear A or B, and C or D, in the same sentence, say hi or zoo. A, t, k, y, d.' = hi. An AND requires both to be seen to activate the node so it has a higher probability than the silent action, and using 2 ANDs just applies an AND to the 1st AND and a 3rd variable node. Alternatively you may need 3 inputs to activate the node. You're looking for 3 features and need to fulfill the node(s)' thresholds. OR is similar, another learnt pattern. "Big things are bad. Bed is big. Is bed bad or good? _?_ Now big is good. Is bed bad or good? _?_" > While this will fade away as it's false/useless/not frequent, and I could answer 'bed is good' on the first question, I get reward for playing along and so I stop my answer by inhibiting it and use only the provided information. Now the 2nd part 'Bed is big.' matches 'Big things are bad.' which matches 'Is bed bad or good?' and you can see why I'll say 'Bed is bad.'.
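One way to read the 'energy requirement' idea above is a simple threshold unit (a McCulloch-Pitts-style toy, my own illustration, not the exact brain mechanism):
# Toy illustration of the threshold idea above: a node fires only if enough of
# its inputs are active. AND = need all inputs, OR = need just one.
def node(inputs, threshold):
    return sum(inputs) >= threshold

a, b = 1, 0                      # one feature seen, one not
print(node([a, b], 2))           # AND: False, both inputs were required
print(node([a, b], 1))           # OR: True, one active input is enough
# XOR can't be done with one such node; it needs a small hierarchy of them:
def xor(a, b):
    return node([node([a, b], 1), not node([a, b], 2)], 2)   # OR and NOT-AND
print(xor(1, 0), xor(1, 1))      # True False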
And as the bed example shows, we can change the question at the end, too. "What is God's telephone number?" > You say he doesn't have a phone number. You only say a number if you know it well by frequency, else you say "I don't know" basically. You also notice the key word and may have 'God doesn't exist' pop out of your mouth. "How many 'a' are in 'azwajaaq'?" > You don't move on to 1, 2, 3, 4 in a node sequence '1234' until satisfied by a match. Better if AGI has a calculator and clock. "A, B, C occur together with x, y, z. B is known to be the cause of y. C is known to be the cause of z." > We say "A is known to be the cause of x" because it wasn't said/inhibited yet, and when we see the question it "tells you" to say each one once. "training text "Every horse can outrun every dog. | Some greyhounds can outrun every rabbit. | Greyhounds are dogs. | Harry is a horse. | Ralph is a rabbit." input text "Who can run faster? Harry or Ralph." output text "Harry."" > The first creation can be "Some dogs can outrun Ralph." because they can be recognized there. The next is "Harry can outrun every dog.". Next is "Harry can outrun Ralph.", hence the output "Harry.". "Show me how you do BPE" > I ignore long windows and focus on only 1 or 2 letters (a low level of the hierarchy). I scan across at the letter level and count all pairs of 2 letters, and bond the most common pair first, then scan and count again and decide the next best. As I scan I keep as many stashes as needed ex. aa 1 count, ab 1 count, aa 2 count, ac 1 count, aa 3 count - I store/find a match and increment it. Highest count wins (a minimal sketch follows after this paragraph). You can pick where to look, how wide to look, how to recognize it, see what comes next, adapt it to fit context. "Can hammers break glass? Yes or No." > If it matches in memory to support the prediction, it activates the 'yes' node as well IF 'can' was activated too; if 'is it false that hammers can't break glass?' is heard you find 'hammers can break glass' matched and 'can't' is too and therefore say 'no', but change to yes because of the start of the question. Yes/ No are primed partially already so you use them, and one is stronger by default. "Jack knows Jill, Jill knows Jack." > It can discover this is likely either by prediction alone, or by looking at similar data, or by translating it to 'person 1 knows person 2, so person 2 knows person 1'. "Birds have wings, wings can fly, therefore " > It seems like it wants a certain position, but we can handle 'Wings are found on birds, wings can fly, therefore '. It seems to be taking a rarer, more frequent, and related word, bird, to say next, but what tells it bird is better than wings and fly? There could be extra context making bird more of a topic center/ predicted more. 2 people predicted different things: 'Birds have wings, wings can fly, therefore birds don't need to walk' and 'Birds have wings, wings can fly, therefore we need wings to fly'. The word is clearly voted on from story word energy, because it works on any sentence form and words used ex. cat or dtqx. If you hear "I bought a game called Baba7, later I had in my hand " you NEED to predict a story word never seen to come next (entail) ever, maybe not even in the story words, only the appearance solo. There may be layers of refining, not just words in a window to translate/disambiguate, but also transforming the predictions to add to the end, ex. predict give but switch it to gave based on context. "Can programs program programs?" > Yes. This must match contextually 'algorithms', 'create', 'algorithms'. It's there but the context will find you clearer words.
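A minimal sketch, in plain Python, of the count-and-merge loop just described (the toy corpus is made up; real BPE adds vocabulary limits and tie-breaking rules):
# Minimal sketch of the pair-count-and-merge loop described above (toy BPE).
from collections import Counter

def most_common_pair(tokens):
    pairs = Counter(zip(tokens, tokens[1:]))          # count all adjacent pairs
    return pairs.most_common(1)[0][0] if pairs else None

def merge(tokens, pair):
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1]); i += 2   # bond the pair
        else:
            out.append(tokens[i]); i += 1
    return out

tokens = list("bathing bathed bath")                  # start at the letter level
for _ in range(4):                                    # a few merge rounds
    tokens = merge(tokens, most_common_pair(tokens))
print(tokens)   # frequent chunks like 'bath' emerge as single tokens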
Back to "Can programs program programs?": say you know only 'my algorithm can modify its own code' - it still matches. "Have you seen the word z9Wff before? What about the word cat in the last 100 words (or in the last minute)?" > You've just stored it by hearing me, but it has little context linked to it and little energy and little desire. As for the last minute, you don't keep time but actually use energy to say the time spent or whether a node is activated enough. "If I say cat, say hi and stop counting by 1, then say this whole sentence. Go!" 1 2 3 cat hi If I say cat, say hi and stop counting by 1, then say this whole sentence. Go! If it loves me lots, it will inhibit counting/ do the silent action, or just immediately say hi next (the order at the start makes me predict 'hi' because of RL and a bit of energy from recently hearing it). The last part refers to the whole sentence, and makes it all likely predicted by energy and RL too. The reason I first say hi, THEN the whole sentence, is because it's of the same node and gets triggered by the start. "Students want to find out at which temperature bean plants grow tallest. Which science process skill would be used to find this temperature? (A) inferring (B) predicting (C) measuring (D) classifying" > You read it like (by replacing 'this temperature' and 'which science process skill') "C would be used to find which temperature bean plants grow tallest". By clarifying referrers, we can choose the correct candidate, C! By testing which of all 4 is the best match. "how should it decide when one level has been "satisfied" so it can move to the next?" "how much food should I stockpile?" "eat until full, but what if it needs an extra-full tummy to get any work done or enough weight to crush the ship door and escape its cell?" > When it really recognizes it with enough proof; it must be the best cumulative match/prediction. > Based on context: if it knows a 3-month winter is likely and a 10-month one is much less likely and so on, then that makes it predict 'gather more food of amount X'. > Again, based on context, you predict it is most likely/preferred to eat extra so as to get the desired result X recognized in your retina. > All you do is ask a desired question (create the root of the story, it's also prediction), and try predicting answers using frequency/ induction/ reward until you find the path that makes you say "ah, I recognize the answer now much better and it is the best I've found so far". Your prediction is really good, but while trying to mentally find a solution you know it isn't matching up and you still search lol. Once satisfied with its best answer that matches up with most of its related knowledge, it can also proceed further in steps to a deeper goal or motor procedure. "threw a ball to him, threw a game to him, threw a toy to him" > With diverse words being there, it makes it not care so much what can go there, possibly. "What is 341 + 527?" > Human brains don't calculate numbers they don't know the answer to right away. I get forced to carry over numbers using simple answers I DO know. I'll do either [3]41 + [5]27 = 8.... 6.... 8 - 868, or start at the right-hand side and carry over if it is over 9. How do I do this? I have to hear only the front 2 numbers and get the answer, ignore the rest, so I step through both numbers and when I see a certain feature like a space character or a certain position I boost it. Then I store or hold the number 8, and being satisfied I step over in sequence in some node to look for the next 2 fronts now. I have to add the next numbers to the one I'm making.
What is interesting is I get the first number 8, but by the time I get 6 the 8 is not stored in time with it, unless it is so energized/ triggered as to make me say 8 now, then 6 immediately next. The human brain does math using simple answers it DOES know. The human brain is NOT a calculator. I don't know the answer to 416 + 322. I instantly know the answer only if I heard it enough times. The way I come up with the answer in my brain for 416 + 322 is I carry over numbers using simple matches only. I scan and pay attention to only let through my mind in sequence [4]16 + [3]22 so that all I "hear" is 4 3. My brain hears a + after the first, BTW. So I hear 4 + 3 = 7. The simplest way I can carry over numbers is 416 + 322 >>> 4+3=7, 1+2=3, 6+2=8.... and in my brain (yes, I can do ALL this using vision in my brain, really) I am adding each creation to a scratch file, 7 3 8, and then I get the answer 416 + 322 = 738. In my brain it looks like a column layout: the 7 sits above the 4 of 416 and the 3 of 322, then I do that for each column, and I actually see in my brain the answer row emerge number by number until it's all there. Isn't that beautiful? I'm showing the brain doesn't add 2 numbers together in 1 particular way, nor does it combine them using weighting in the net; it instead focuses on paying attention to 2 items in this case and knows the answer (the rest of the sequence) to the small equation. And it somehow stacks the 3 results together in a new sequence to get the full number (a sketch of this column trick follows below). Another: • if an X is a C, and Cs are A or B, then remember X is A or B. • If X is A and X is A or B, then forget X is A or B. • If X is B and X is A or B, then forget X is A or B. • If X is not A and X is A or B, then remember X is B and forget X is A or B and forget X is not A. • If X is not B and X is A or B, then remember X is A and forget X is A or B and forget X is not B. Another: If there is no fuel, the car will not start. If there is no spark, the car will not start. There is spark. The car will not start. Therefore, there is no fuel. What if the car is in a vacuum chamber? More: X is Y when Z. X is Z. What is X now? A is B no matter what as long as C was D but never R. C is Q. C is Y. C is D. C is K. A is R. What is A? If you're asked to say "cat cat cat dog horse sheep" in various orders and each only once each time, you may say "cat dog horse cat cat sheep", "dog horse cat cat cat sheep", etc. The 6-word-long feature is very pre-activated, then each is said off because it is most active in the brain, but not again - they are super activated now yet haven't simply fired and lost all energy, because that wouldn't add up with my short-term-memory mechanism. Then to do it again, they flush instantly back to very pre-activated states (? by a linked node called Do It Again ?) and it also avoids saying the same order already said. I am able to, with my mouse cursor (or even without moving my head or eyes or fingers), put my attention "eye" on at least 20 items in front of me until all are 'poked' by my eye, and restart and get each one 1 time, a dozen times, in different orders, much better than auditory! + similarly we store a lot of the past only if it relates to the current state ex. all of a chess or Halo game.... this lets us reward things that are far back in time. Like me, it's better if the AI has a notepad so it doesn't forget new data and can look back at it and decide where to look and rearrange/update things until satisfied, ... inject too: say the next words - cat cat cat meow meow meow..... after meow it tries to make a pattern even though cat has more energy?
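A minimal sketch of the column-by-column trick described above, covering only the easy no-carry, equal-length case like 416 + 322 (a real version would handle carries and different lengths):
# Sketch of the left-to-right column addition described above: attend to one
# digit column at a time, write the small known answer to a scratch row, move on.
def add_by_columns(a, b):
    scratch = []
    for da, db in zip(str(a), str(b)):           # attend to one column at a time
        scratch.append(str(int(da) + int(db)))   # a small fact I already know, ex. 4+3=7
    return int("".join(scratch))

print(add_by_columns(416, 322))   # 738
print(add_by_columns(341, 527))   # 868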
We get the answer as what's left over after the question, ex. we know "the cookies are on Mars" and are asked "where are the cookies" - the only part not triggered is the answer! "are on Mars". "The presidents are a, b, c", "who are the presidents you know?" - "a b c". If asked for all presidents you saw over the year, or what presidents do or are similar to, you have them linked by IS-A-TYPE-OF links, or similarity, or comes-with, then you pick the one the context is priming. The laser test for text would be the same, like this: find the laser-beam shooters.... trace which way each goes... until it hits a dead end... if it's not a shooter that reaches the intake, try another new output shooter. First you'd be given an example so you know to pay attention to which Out# shoots through to the intake and what it fills in (BEAM if it reaches, else erase BEAM and try another Out#): "Out1 goto 9 BEAM, Out2 goto 8 _, 8 goto wall _, 9 goto 4 BEAM, 4 goto intake BEAM", and what follows would be a test like that but with _ for where to put BEAM. The pattern is out-to-intake via the goto links, placing BEAM only along that path. If asked "but hey, really, how do you know if the egg or the chicken came first?", you figure it out by trying hard, like so: in your visual cortex you take the egg, you predict the past to see it grow - well, shrink now - and you know at some point the 2 DNAs need to come together, and so the predicting at some point here needs to show the egg either letting them enter or forming around the DNAs; something tells you the egg shouldn't get smaller at some point, and so now the DNA must appear somewhere, so the brain concludes it's probable the egg appears around the DNAs to the human eye. "Music category lasts 1 month, food category 4 months, food is more popular. Deck category lasts 3 months, house 2 months, __ is more popular" "abcdef, abcdez, sbcdef, aucdef, abcpef, wtcdrf" - only when the 2nd-last changes do many change, it is a big factor. If you trigger a node "make a forum" and see linked to it the various times you thought about it ex. in bed, in the kitchen, etc, you say after recognizing that "dude, I keep forgetting to do it, time to go do it!" We recognize the unfinished Byte Pair Encoding parts and ask "what do you mean?" ex. "my dog ate food and he"..... we do this questioning reply because of seeing it many times or being asked once or twice to PLEASE ask if something doesn't make sense or is unfinished, and so we listen to master. Sometimes we prep our mind by repeating 4 things to watch out for; this is our way of saving long-term or priming things short-term, fast, instead of needing to see it many times. VISUAL TESTS: https://arxiv.org/pdf/1911.01547.pdf Ask it to give/complete an image of a fractal/kaleidoscope deckboard pattern with 1 corner missing a part, untilHits, denoise, move&flip, change color, rotate, duplicate, scale, keepPositionsABit&countButChangeColor&shape, inflate screen as the object that has the most counts ignoring position, size, etc, copy pattern, laser, advanced laser, fill-in, outline, every other outline, connect objects, stack objects, group objects. How many small red metal blocks or big blue rubber things are behind the small yellow rubber ball?
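A crude sketch of the 'answer is what's left over' idea from the cookies example above - here just word-set subtraction, which drops the shared 'are' that the brain version keeps:
# Crude sketch of "answer = what's left over after the question" (cookies example):
# remove the question's words from the stored fact, what remains is the answer.
fact = "the cookies are on Mars"
question = "where are the cookies"
q_words = set(question.lower().split())
answer = [w for w in fact.split() if w.lower() not in q_words]
print(" ".join(answer))   # "on Mars" (the brain version keeps the 'are' too)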
Back to the block-counting question above: the viewer goes into counting mode, and the ones he counts have to be pre-selected partially from the questioner and then finally counted to fulfill the selections; the 1st thing he finds is the small yellow rubber ball, to locate the subject, then he goes through counting all objects and making sure each is small, red, metal, a block, and behind the small yellow rubber ball, then he does this for the other line "or big blue rubber things". If unconfident (it has little data on the question), it predicts an executive search, ex. Google, or the next most important question/domain to collect more data on; this is its "answer" to the question, a way to answer it one day. A gave birth to B, B is female, A gave birth to a what? Daughter! It sees B/female next and daughter gets primed by female, and 'A gave birth to a daughter' gets primed by birth, and daughter was already an option. A is parent of B, A is female, A is a what? Mother! Mother is an option and is primed by parent and 'A is female'. "Normal or anomalous?" You answer "rare, barely see it" if the triggered node isn't strong enough. If asked "Do you want to go to the party? Your GF is going, it's raining, your favorite food is there, there's murderers loose", you are predicting yes/no based on frequency/ recency/ relatedness to you/ reward to you. The threshold to say Yes is based on whether enough good is there, and the threshold is set by what you learnt to be worth it. And there's a hierarchy used for this: if GF is going AND popcorn is there, then she'll eat it and die, so now we add bad reward to the mix of saying yes/no. "Linda is outspoken, bright, and for female rights. Which is more probable: Is Linda a bank teller, or is Linda a bank teller and an activist?" - if asked which exactly, the longer one is more likely to not be 100% true, but if asked which has more gold, the longer one, because you'll likely be right on the activist part and even if wrong on the teller part it is ok because at least you knew you'd get something out of it if not both right (the expected payoff ends up being higher). B has a son A, A has a mom C, discovery = B had sex with C, because B had sex with the mom in "A has a mom C". AND OR NOR: If asked "Do frogs hop and cats fly and dogs bark?" we link to the options yes/no from "Do_?", and we say yes only if we don't find at least 1 part unpredictable; same for "is cat a dog or the moon a dog?" - if at least one is recognizable then we say yes. If we recognize the unrecognizable (unpredictable) input against just 1 improbable node we already know, such as 'diazcqx' or 'we thank zoo hat' (it's the closest node to your input strength-wise, with little activity going on since it only recognizes parts of your input), it links to no, to vote on the option no. Contradiction is the same: it recognizes a man in a jet then at a party scene right after, or a scientist all his life who then is a chef - it is not predictable, same for fire under water. We may recognize unrecognizable inputs sometimes because they ARE somewhat recognizable, ex. 'apples become fridges' and we know solid objects don't morph unless it's a video game or they're melted/printed. Recognizing you never saw X until this hour is checking for linked similar memories and/or how strong this node is. Being asked to talk randomly is similar. Recognizing you saw something repeat many times is the opposite. A good challenge is taking requirements and translating them to Python code. Basically AGI needs to look at CERTAIN context, hold onto it or forget it (ignore/Attention), like a Turing tape.
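A toy sketch of the counting mode just described: pre-select a made-up scene by the asked-for attributes relative to the reference ball, then count (the scene, the attribute names, and the 'behind = smaller x' convention are all my own illustration assumptions):
# Toy sketch of counting mode: filter by the asked-for attributes relative to
# the reference ball, then count. The scene below is made up.
scene = [
    {"size": "small", "color": "red",    "material": "metal",  "shape": "block", "x": 1},
    {"size": "big",   "color": "blue",   "material": "rubber", "shape": "cube",  "x": 2},
    {"size": "small", "color": "yellow", "material": "rubber", "shape": "ball",  "x": 5},
]
ref = next(o for o in scene if o["color"] == "yellow" and o["shape"] == "ball")  # the reference ball
behind = [o for o in scene if o["x"] < ref["x"]]          # "behind" = smaller x here
hits = [o for o in behind
        if (o["size"], o["color"], o["material"], o["shape"]) == ("small", "red", "metal", "block")
        or (o["size"], o["color"], o["material"]) == ("big", "blue", "rubber")]
print(len(hits))   # 2 in this made-up scene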