New preprint! Working memory makes use of long-term memories, which is how we can easily remember familiar shapes or characters. This idea has been part of WM theories since their inception in the 1960s. Our lab has developed a computational model of this theory. (1/25)
Our new Memory for Latent Representations (MLR) model combines recent advances in deep learning (VAE, Kingma & Welling 2013) w/ the binding pool model (Swan & Wyble 2014) to explore how the brain might store copies of shapes & colors, & bind those representations together (2/)
Our goal is *not* to create a new state-of-the-art in reconstruction via VAE, or to optimize accuracy benchmarks. (3/)

Rather, we are trying to elucidate potential mechanisms for the storage of visual patterns in the mind by matching the flexibility of human behavior (4/)
We modified a standard VAE (Kingma & Welling 2013) by splitting its bottleneck in half to create a shape map and a color map, trained through two objective functions: one that ignored color (the shape map) and one that ignored shape (the color map). (5/)
We also added a skip connection from the first layer to the last layer, jumping over the bottleneck, which will be useful for reconstructing unfamiliar patterns. (6/)
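For the architecture-minded, here's a minimal sketch of that split-bottleneck-plus-skip design in PyTorch. The fully connected layers, layer sizes, and names here are my assumptions for illustration, not the actual MLR code; the two objectives would differentiate the halves by training the shape map against a target that discards color, and the color map against one that discards shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitVAE(nn.Module):
    """Sketch: VAE with a bottleneck split into a shape map and a color map,
    plus a skip connection that jumps from L1 over the bottleneck."""
    def __init__(self, in_dim=3 * 28 * 28, hid=256, z_half=4):
        super().__init__()
        self.z_half = z_half
        self.enc1 = nn.Linear(in_dim, hid)        # "L1" (also feeds the skip)
        self.enc2 = nn.Linear(hid, hid)           # "L2"
        self.mu = nn.Linear(hid, 2 * z_half)      # [shape map | color map]
        self.logvar = nn.Linear(hid, 2 * z_half)
        self.dec1 = nn.Linear(2 * z_half, hid)
        self.dec2 = nn.Linear(hid, in_dim)
        self.skip = nn.Linear(hid, in_dim)        # skip over the bottleneck

    def encode(self, x):
        l1 = F.relu(self.enc1(x))
        l2 = F.relu(self.enc2(l1))
        return l1, self.mu(l2), self.logvar(l2)

    def decode(self, z, l1=None, skip_weight=0.0):
        out = self.dec2(F.relu(self.dec1(z)))
        if l1 is not None and skip_weight > 0:
            out = (1 - skip_weight) * out + skip_weight * self.skip(l1)
        return torch.sigmoid(out)

    def forward(self, x, skip_weight=0.0):
        l1, mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.decode(z, l1, skip_weight), mu, logvar
```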
We trained this modified VAE on MNIST and Fashion-MNIST datasets that were colorized using random variation around 10 prototypical colors. The model could then reconstruct both attributes, or just the shape or the color when we inactivated the other map. (7/)
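A colorization step along these lines would do it (the exact palette and jitter level are assumptions; the paper specifies its own):

```python
import numpy as np

# ten prototypical colors in RGB; this particular palette is an assumption
PROTOTYPES = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1],
    [0, 1, 1], [1, .5, 0], [.5, 0, 1], [0, .5, .5], [.6, .3, 0],
])

def colorize(gray, rng):
    """Map a (28, 28) grayscale image in [0, 1] to a (28, 28, 3) color image."""
    hue = PROTOTYPES[rng.integers(len(PROTOTYPES))]
    hue = np.clip(hue + rng.normal(0, 0.1, 3), 0, 1)  # random variation around the prototype
    return gray[..., None] * hue                      # shape carries luminance, hue carries color

img = colorize(np.random.rand(28, 28), np.random.default_rng(0))
```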
With these two maps, we can separate the color and shape information from a stimulus and then recombine them, in a simulation of feature binding within human vision. This figure illustrates VAE reconstructions involving both maps, or with one of the maps set to 0. (8/)
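Continuing the SplitVAE sketch above, "inactivating" a map is just zeroing its half of the bottleneck:

```python
import torch

model = SplitVAE()                          # from the sketch above
x = torch.rand(4, 3 * 28 * 28)              # a toy batch of flattened color images
l1, mu, logvar = model.encode(x)

z_shape_only = mu.clone()
z_shape_only[:, model.z_half:] = 0.0        # inactivate the color map
recon_shape = model.decode(z_shape_only)    # shape preserved, color lost

z_color_only = mu.clone()
z_color_only[:, :model.z_half] = 0.0        # inactivate the shape map
recon_color = model.decode(z_color_only)    # color preserved, shape lost
```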
We take this VAE as a (very) rough model of the ventral visual stream that processes forward from V1 up to IT and then sends return projections back down to V1 for memory reconstruction (and perhaps visual imagery). (9/)
Next, we added a "binding pool" (Swan & Wyble 2014): a set of undifferentiated neurons connected to each of the latent layers in the VAE (L1, L2, shape map, color map) by weights that are assigned randomly and never trained. This is where memories are stored (10/)
The binding pool does not use synaptic plasticity to store information; rather, it uses changes in neural activation. It's analogous to the idea that elevated firing rates in PFC store information and can switch on or off quickly to support rapid changes in memory state. (11/)
A representation in any of the latents of the VAE can be projected into the binding pool where it can persist as a pattern of neural activity. This representation can be pushed back to those latents allowing memory reconstruction of just that information. Some examples: (12/)
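In code terms, storage and retrieval through fixed random weights look roughly like this. The pool size, latent size, and the least-squares readout are my assumptions, a simplification of the transpose-based readout in the binding pool model:

```python
import numpy as np

rng = np.random.default_rng(0)
N_BP, Z = 2500, 8                      # pool size and latent size (assumed values)
W = rng.standard_normal((N_BP, Z))     # fixed random weights, never trained

def bp_store(z):
    """Project a latent vector into the pool as a pattern of sustained activity."""
    return W @ z

def bp_retrieve(activity):
    """Push the activity back down to the latent space (least-squares readout)."""
    return np.linalg.pinv(W) @ activity

z = rng.standard_normal(Z)
z_hat = bp_retrieve(bp_store(z))       # ≈ z: a single item is recovered almost exactly
```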
Multiple latents can be stored in the binding pool, e.g. both shape and color can be stored in one trace and then reconstructed. Encoding parameters select only specific dimensions of information for WM as implied by findings like attribute amnesia (Chen & Wyble 2015) (13/)
Storing more features in a single memory representation causes interference in the form of noise for the other features, since they overlap in the binding pool. This small cost is supported by empirical findings (e.g. Swan, Collins & Wyble 2016) (14/)
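That interference falls out of the arithmetic: two features superimposed in the same pool each act as noise on the other's readout. A sketch, with sizes and weights assumed as above:

```python
import numpy as np

rng = np.random.default_rng(0)
N_BP, Z = 2500, 8
W_shape = rng.standard_normal((N_BP, Z))   # separate random weights for each feature,
W_color = rng.standard_normal((N_BP, Z))   # but one shared pool of neurons

z_shape, z_color = rng.standard_normal(Z), rng.standard_normal(Z)
trace = W_shape @ z_shape + W_color @ z_color   # one trace holds both features

shape_hat = np.linalg.pinv(W_shape) @ trace     # ≈ z_shape + leakage from the color term
print(np.abs(shape_hat - z_shape).mean())       # small but nonzero retrieval error
```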
We also added tokens to the binding pool, allowing it to index multiple different items by encoding a specific feature cluster for each one. Each item can be retrieved according to its sequence (i.e., 1st, 2nd, 3rd) or its content (i.e., what shape or color it had) (15/)
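One simple way to realize tokens (the gating fraction and this particular scheme are my assumptions): each token is a random subset of pool neurons, and an item is read out only through its token's subset, so overlap between tokens produces set-size interference.

```python
import numpy as np

rng = np.random.default_rng(1)
N_BP, Z, N_ITEMS = 2500, 8, 3
W = rng.standard_normal((N_BP, Z))             # fixed random binding weights
tokens = rng.random((N_ITEMS, N_BP)) < 0.4     # each token gates ~40% of the pool (assumed)
zs = rng.standard_normal((N_ITEMS, Z))         # latents for three items

trace = sum(tokens[i] * (W @ zs[i]) for i in range(N_ITEMS))  # all items in one pool

m = tokens[1]                                  # retrieve the 2nd item via its token
z_hat = np.linalg.pinv(W[m]) @ trace[m]        # ≈ zs[1] plus interference from overlap
```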
Once retrieved, the other features/attributes of that item can be retrieved as well. Because MLR has shape latents, the model is able to link different colors to specific shapes of the same digit (i.e., two different 2's) and then retrieve which color went with which '2' (16/)
We’re not done yet! The binding pool can also store representations of categorical one-hot labels of a shape or color along with the visual information, and the encoding parameters allow us to vary the quantity of each kind of information in the memory trace (17/)
That graph shows accuracy of retrieved color and shape *labels*. Accuracy is lower (left two graphs) when the binding pool is storing both labels and visual information. When the % of visual information is reduced, mean accuracy for labels is higher across all set sizes (18/)
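A sketch of that trade-off, with a single parameter splitting one trace between visual and categorical content (this mixing scheme is my assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
N_BP, Z, N_LABELS = 2500, 8, 10
W_vis = rng.standard_normal((N_BP, Z))         # random weights for visual latents
W_lab = rng.standard_normal((N_BP, N_LABELS))  # random weights for one-hot labels

def store(z_visual, label_idx, p_visual=0.5):
    """One trace holds a visual latent plus a one-hot label; p_visual sets
    how much of the trace each kind of information gets."""
    label = np.zeros(N_LABELS)
    label[label_idx] = 1.0
    return p_visual * (W_vis @ z_visual) + (1 - p_visual) * (W_lab @ label)

trace = store(rng.standard_normal(Z), label_idx=2, p_visual=0.2)  # mostly label
digit = int(np.argmax(np.linalg.pinv(W_lab) @ trace))             # retrieved label: 2
```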
Thus the model can remember the labels ‘2’ and ‘blue’ and also visual information about the specific shape and hue of a stimulus. We show that people have memories of specific shape information for MNIST digits, even when they don't know what memory probe to expect (19/)
However, after 30 trials of being asked only about the label of an MNIST digit, they are no longer able to remember the specific shape of the digit they saw. The use of labels in WM is widely suspected, and we simulate the mechanisms underlying such a shift in encoding strategy. (20/)
But wait, there's more! The bottleneck latents of a VAE can only represent information it was trained on, so how would such a model reconstruct novel shapes? MLR can also build memories from the VAE's first-layer (L1) latents. (21/)
This allows the model to build memories of stimuli outside its training set, which is similar to our ability to remember novel stimuli like Bengali characters after one exposure (Lake et al. 2011). We show that humans can do this even without being told what to remember (22/)
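Continuing the sketches above, storing L1 instead of the bottleneck and reconstructing through the skip connection would look roughly like this (all sizes and names remain assumptions):

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
N_BP, L1_DIM = 2500, 256
W_l1 = rng.standard_normal((N_BP, L1_DIM))     # fixed random weights onto L1

model = SplitVAE()                             # from the earlier sketch
x_novel = torch.rand(1, 3 * 28 * 28)           # stand-in for an untrained shape

l1, mu, logvar = model.encode(x_novel)
trace = W_l1 @ l1.detach().numpy().ravel()     # store the L1 pattern, not the bottleneck
l1_hat = np.linalg.pinv(W_l1) @ trace          # retrieve it from the pool

recon = model.decode(                          # bypass the trained shape/color maps
    torch.zeros(1, 2 * model.z_half),
    l1=torch.tensor(l1_hat, dtype=torch.float32).unsqueeze(0),
    skip_weight=1.0,
)
```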
Note that when the memory reconstructions are passed through the shape/color maps, the novel shapes are morphed into familiar ones. This is somewhat similar to imagining familiar patterns, like faces, in the clouds. (23/)
What did we learn? The MLR model gives us a working intuition for how visual and non-visual information can co-exist within a working memory trace. We can also simulate how memories of real objects are stored and retrieved and how we can modulate those memories. (fin)