high-risk research on hard problems in natural and artificial systems:

tools drawn from cognitive science, machine learning, computer vision, computer graphics, and large-scale neuroimaging, including fMRI, DTI, MEG, and EEG

My personal musings about some of our recent work (version 1.1)

The organization of knowledge within visual cortex is best characterized as a multimodal "operating system" for learning about, representing, and processing information relevant to adaptive behaviors (I know, this isn't particularly radical). Put another way, the organization of visual cortex necessarily reflects, emerges from, and has been shaped by evolutionarily critical behaviors (how else could it be?). Under one view, category selectivity emerges because humans need to accomplish certain behaviors to survive (e.g., we need to eat frequently, so visual selectivity for food develops over experience). A somewhat more constrained view holds that the consistent organization of functional structures in visual cortex points to a core set of principles that give rise to these higher-order knowledge structures, still anchored in real-world behaviors.

In line with the latter view, recent research from the lab suggests that visual cortex may be organized around networks crucial for evolutionarily important functions: social interaction (supported by areas such as the OFA, FFA, and EBA), navigation (OPA, PPA, RSC), object manipulation (LOC and areas selective for tools and hands), and eating (areas selective for food). This idea expands on the proposal of a widespread network connecting visual regions with areas that process non-visual properties of food (Jain et al., 2023). Similar comprehensive networks could underlie socializing, navigating, and interacting with objects. On this account, regions traditionally studied for their selectivity to specific categories may be better understood as parts of broader networks supporting essential behaviors in our natural environment; the emphasis shifts from the categories themselves to the regions' roles within these larger, survival-critical networks.

news

🎉 CONGRATULATIONS to lab alumna Isabel Gauthier, the twelfth recipient of the Davida Teller Award, for her many contributions to vision science, including the role of expertise in object recognition and her strong history of mentoring!

NEW FUN PAGE: Visualizing mixed metaphors -- share your best ones with us!

📝 NEW PAPER: Better models of human high-level visual cortex emerge from natural language supervision with a large and diverse dataset. Nat Mach Intell. paper | supplemental | non-paywalled 

🔭 NEW RESEARCH: We introduce a data-driven method that generates natural language descriptions for images predicted to maximally activate individual voxels of interest. Our method -- Semantic Captioning Using Brain Alignments ("BrainSCUBA™") -- builds upon the rich embedding space learned by a contrastive vision-language model and utilizes a pre-trained large language model to generate interpretable captions. We validate our method through fine-grained voxel-level captioning across higher-order visual regions. We further perform text-conditioned image synthesis with the captions, and show that our images are semantically coherent and yield high predicted activations. Finally, to demonstrate how our method enables scientific discovery, we perform exploratory investigations on the distribution of "person" representations in the brain, and discover fine-grained semantic selectivity in body-selective areas. Unlike earlier studies that decode text, our method derives voxel-wise captions of semantic selectivity. To be presented at the International Conference on Learning Representations (ICLR). A minimal sketch of the pipeline follows the preprint link below.

arXiv preprint: https://doi.org/10.48550/arXiv.2310.04420
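For intuition, here is a minimal BrainSCUBA-style sketch in Python. It is not the authors' code: the data are synthetic, the ridge-regression encoding model and the softmax projection onto real image embeddings are simplified stand-ins, and the final embedding-to-caption step is stubbed out because it requires a separate pre-trained captioning model.

```python
# Hedged sketch of a BrainSCUBA-style pipeline (not the authors' code).
# Assumes precomputed CLIP image embeddings and voxel responses; the
# caption-generation step is stubbed out.
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic stand-ins for real data (assumptions, for illustration) ---
n_images, n_dim, n_voxels = 1000, 512, 50
clip_emb = rng.standard_normal((n_images, n_dim))          # CLIP image embeddings
clip_emb /= np.linalg.norm(clip_emb, axis=1, keepdims=True)
betas = clip_emb @ rng.standard_normal((n_dim, n_voxels))  # fMRI responses

# 1) Fit a ridge encoding model: voxel response ~ clip_embedding @ W.
lam = 1.0
W = np.linalg.solve(clip_emb.T @ clip_emb + lam * np.eye(n_dim),
                    clip_emb.T @ betas)                    # (n_dim, n_voxels)

# 2) A voxel's weight vector lives in CLIP space but may fall off the
#    natural-image manifold; softmax-project it onto real image embeddings.
def project_voxel(w, pool, temperature=0.1):
    sims = pool @ (w / np.linalg.norm(w))
    p = np.exp(sims / temperature)
    p /= p.sum()
    proj = p @ pool
    return proj / np.linalg.norm(proj)

voxel_vec = project_voxel(W[:, 0], clip_emb)

# 3) Decode the projected embedding into text with a pre-trained
#    embedding-conditioned caption model -- stubbed here.
print("projected voxel embedding shape:", voxel_vec.shape)
```

The idea this illustrates is that the voxel's encoding weights live in the same space as the image embeddings, so projecting them back onto the natural-image manifold yields a vector a caption model can decode.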

🎉 CONGRATULATIONS to Jayanth for successfully defending his PhD thesis "Rethinking object categorization in computer vision" (September 26, 2023).

🔭 NEW RESEARCH: We introduce a new task, Labeling Instruction Generation, to address the absence of publicly available labeling instructions for large-scale visual datasets. In Labeling Instruction Generation, we take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; and 2) provide a text label that corresponds to each example. We introduce a framework that requires no model training to solve this task, built around a newly created rapid retrieval system that leverages a large, pre-trained vision-language model. The framework acts as a proxy for human annotators, helping both to generate a final labeling instruction set and to evaluate its quality, and it produces multiple diverse visual and text representations of each dataset category. A rough sketch of the retrieval step follows the preprint link below.

arXiv preprint: https://arxiv.org/abs/2306.14035
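As a rough illustration of the retrieval step, the sketch below picks visually representative yet mutually diverse examples per category from precomputed embeddings. Everything here (the greedy selection rule, the synthetic embeddings and labels) is an assumption for illustration, not the paper's actual system.

```python
# Minimal sketch of representative-example retrieval for labeling
# instructions (an illustration, not the paper's system).
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: image embeddings from a pre-trained
# vision-language model, plus integer category labels.
n_images, n_dim, n_cats = 500, 256, 5
emb = rng.standard_normal((n_images, n_dim))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
labels = rng.integers(0, n_cats, size=n_images)

def representative_examples(emb, labels, cat, k=3):
    """Greedily pick k examples close to the category mean but
    mutually dissimilar, so the instruction set stays diverse."""
    idx = np.flatnonzero(labels == cat)
    cat_emb = emb[idx]
    proto = cat_emb.mean(axis=0)
    proto /= np.linalg.norm(proto)
    scores = cat_emb @ proto                  # similarity to category prototype
    chosen = [int(np.argmax(scores))]
    for _ in range(k - 1):
        # Penalize similarity to already-chosen examples.
        penalty = np.max(cat_emb @ cat_emb[chosen].T, axis=1)
        util = scores - penalty
        util[chosen] = -np.inf                # never re-pick an example
        chosen.append(int(np.argmax(util)))
    return idx[chosen]

for c in range(n_cats):
    print(f"category {c}: example image ids {representative_examples(emb, labels, c)}")
```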

🎉 CONGRATULATIONS to Aria for successfully defending her PhD thesis "Using Task Driven Methods to Uncover Representations of Human Vision and Semantics" (June 23, 2023). Aria will be moving to NIH this summer.

🔭 NEW RESEARCH: We introduce a data-driven approach in which we synthesize images predicted to activate a given brain region using paired natural images and fMRI recordings, bypassing the need for category-specific stimuli. Our approach -- Brain Diffusion for Visual Exploration ("BrainDiVE™") -- builds on recent generative methods by combining large-scale diffusion models with brain-guided image synthesis. A toy illustration of the guidance step follows the preprint link below.
Accepted for NeurIPS 2023!

arXiv preprint: https://doi.org/10.48550/arXiv.2306.03089
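The toy example below illustrates the general shape of brain-guided diffusion sampling. Both networks are random stand-ins (a real system would use a large pre-trained diffusion model and a fitted encoding model); the point is the guidance step, which nudges each denoising update along the gradient that increases a target region's predicted response.

```python
# Toy sketch of brain-guided image synthesis (BrainDiVE-style guidance).
# Both networks are random stand-ins, for illustration only.
import torch

torch.manual_seed(0)

denoiser = torch.nn.Sequential(            # stand-in for a diffusion U-Net
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 3, 3, padding=1))
encoder = torch.nn.Sequential(             # stand-in brain encoding model
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))

x = torch.randn(1, 3, 32, 32)              # start from noise
steps, step_size, guidance = 50, 0.1, 0.5

for t in range(steps):
    x = x.detach().requires_grad_(True)
    pred_activation = encoder(x).sum()     # predicted response of target region
    grad = torch.autograd.grad(pred_activation, x)[0]
    with torch.no_grad():
        x = x + step_size * (denoiser(x) - x)   # crude denoising move
        x = x + guidance * grad                  # brain-guided nudge

print("final image tensor:", x.shape)
```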

🔭 NEW RESEARCH: We've trained a neural network to predict brain responses to images, and then "dissected" the network to examine the selectivity for spatial properties across high-level visual areas. Discover more about our work: brain-dissection.github.io. A toy version of the dissection step follows the preprint links below.
Accepted for NeurIPS 2023!

Gabe's twitter thread: https://twitter.com/GabrielSarch/status/1663950775284801536?s=20

BioRxiv preprint: https://doi.org/10.1101/2023.05.29.542635
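A toy version of the dissection step: record an intermediate layer's activations over probe stimuli and correlate each unit with a spatial property. The model, probe data, and property labels below are synthetic placeholders, not our actual pipeline.

```python
# Hedged sketch of "dissecting" a response-prediction network: tap an
# intermediate layer and correlate each unit with a spatial property
# (e.g., mean scene depth). Everything here is a synthetic placeholder.
import numpy as np
import torch

torch.manual_seed(0)

model = torch.nn.Sequential(
    torch.nn.Linear(100, 64), torch.nn.ReLU(),   # "hidden" layer to dissect
    torch.nn.Linear(64, 10))                     # 10 predicted voxels

probe_x = torch.randn(200, 100)                  # probe stimuli (features)
depth = np.random.default_rng(0).standard_normal(200)  # property per image

acts = {}
def hook(_module, _inputs, output):
    acts["hidden"] = output.detach().numpy()
model[1].register_forward_hook(hook)             # tap the ReLU output

with torch.no_grad():
    model(probe_x)

# Correlate each hidden unit with the spatial property (nan-guarded in
# case a unit is silent across all probes).
h = acts["hidden"]
r = np.nan_to_num([np.corrcoef(h[:, u], depth)[0, 1] for u in range(h.shape[1])])
print("most depth-selective unit:", int(np.argmax(np.abs(r))))
```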

🔭 NEW RESEARCH: Children typically learn the meanings of nouns earlier than the meanings of verbs. However, it is unclear whether this asymmetry results from the complexity of the visual structure of categories in the world to which language refers, from the structure of language itself, or from the interplay between the two sources of information. We quantitatively test these three hypotheses about early verb learning using visual and linguistic representations of words drawn from large-scale pre-trained artificial neural networks. An illustrative sketch of this kind of comparison follows the preprint link below.

arXiv preprint: https://arxiv.org/abs/2304.02492
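To make the quantitative framing concrete, here is an illustrative (entirely synthetic) comparison of within-word visual dispersion for nouns versus verbs; the real analyses use embeddings from large-scale pre-trained vision and language networks rather than random data.

```python
# Illustrative sketch (not the paper's analysis): compare how tightly
# clustered the visual exemplars of each word are, for nouns vs. verbs,
# using synthetic stand-ins for pre-trained network embeddings.
import numpy as np

rng = np.random.default_rng(0)

def dispersion(exemplars):
    """Mean cosine distance of a word's image embeddings from their
    centroid; lower values = more visually consistent category."""
    e = exemplars / np.linalg.norm(exemplars, axis=1, keepdims=True)
    c = e.mean(axis=0)
    c /= np.linalg.norm(c)
    return float(1.0 - (e @ c).mean())

# Synthetic embeddings: nouns drawn with less within-word variance than verbs.
nouns = [rng.standard_normal((50, 128)) * 0.5 + rng.standard_normal(128)
         for _ in range(20)]
verbs = [rng.standard_normal((50, 128)) * 1.5 + rng.standard_normal(128)
         for _ in range(20)]

noun_d = np.mean([dispersion(w) for w in nouns])
verb_d = np.mean([dispersion(w) for w in verbs])
print(f"mean visual dispersion  nouns: {noun_d:.3f}  verbs: {verb_d:.3f}")
```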

📝 NEW PAPER: A texture statistics encoding model reveals hierarchical feature selectivity across human visual cortex. J Neurosci. JN-RM-1822-22. Link

📝 NEW PAPER: Low-level tuning biases in higher visual cortex reflect the semantic informativeness of visual features. J Vis. 23(8). Link

🎉 CONGRATULATIONS to Nadine for successfully defending her PhD thesis "Bridging the gap from human vision to computer vision" (April 25, 2023). Nadine has accepted a position at NVIDIA.

📝 NEW PAPER: Selectivity for food in human ventral visual cortex. Commun Biol. 6, 175. Link | github

📝 NEW PAPER: Why is human vision so poor in early development? The impact of initial sensitivity to low spatial frequencies on visual category learning. PLoS ONE. 18(1): e0280145. Link | github