The Evolution of AI in Biology: Exploring BioCLIP 2 and its Impact on Conservation
Tanya Berger-Wolf, a computational biologist, once took on a challenge: develop an artificial intelligence (AI) model capable of identifying individual zebras faster than human zoologists. She succeeded, and that success marked the beginning of a much larger endeavor. Today, as the director of the Translational Data Analytics Institute and a professor at The Ohio State University, Berger-Wolf is expanding her focus to encompass the entire animal kingdom through a project called BioCLIP 2. The model is trained on the largest and most diverse dataset of organisms compiled to date and will be featured at the NeurIPS AI research conference.
BioCLIP 2 does more than extract information from images. It can discern traits of various species and capture the relationships between and within species. For instance, the model has arranged Darwin’s finches by beak size without being explicitly taught the concept of size, an ordering the researchers illustrate with a scatter plot.
These advanced features of BioCLIP 2 make it an invaluable tool for researchers. It serves as both a comprehensive biological encyclopedia and a dynamic research platform with inference capabilities. This is particularly beneficial for addressing a long-standing challenge in conservation biology: the lack of sufficient data for certain species. As Berger-Wolf points out, even for well-known species like killer whales and polar bears, there is a scarcity of data regarding their population sizes. If data is lacking for these iconic species, the situation is even more dire for lesser-known organisms like beetles and fungi.
AI models such as BioCLIP 2 have the potential to bridge this data gap, enhancing conservation efforts for threatened species and their habitats. Released under an open-source license on Hugging Face, BioCLIP 2 has already been downloaded over 45,000 times in a single month, reflecting its utility and popularity among researchers. This latest paper on BioCLIP 2 builds on the success of its predecessor, the original BioCLIP model, which was trained using NVIDIA GPUs and earned the Best Student Paper award at the Computer Vision and Pattern Recognition (CVPR) conference.
Building the World’s Largest Biological Dataset: TREEOFLIFE-200M
The development of BioCLIP 2 began with the creation of a massive dataset known as TREEOFLIFE-200M. This dataset includes 214 million images of organisms, covering over 925,000 taxonomic classes, ranging from monkeys to mealworms and magnolias. To compile this extensive collection of data, Berger-Wolf’s team at the Imageomics Institute collaborated with the Smithsonian Institution, experts from various universities, and other relevant organizations.
The goal was to explore the effects of training a biology model with an unprecedented amount of data. The research team aimed to transition from studying individual organisms to understanding entire ecosystems. After just ten days of training using 32 NVIDIA H100 GPUs, BioCLIP 2 exhibited remarkable abilities, such as distinguishing between adult and juvenile organisms and identifying male and female animals within species, without being explicitly taught these concepts.
Moreover, BioCLIP 2 can identify associations between related species, understanding, for example, the relationship between zebras and other equids. The model learns hierarchical classifications naturally, recognizing genus and family traits through patterns and associations in the images. This capability extends to assessing the health of organisms, differentiating between healthy and diseased plant leaves and even recognizing specific diseases, as demonstrated in a scatter plot that illustrates these distinctions.
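BioCLIP 2 is a CLIP-style model, so each image is mapped to an embedding vector, and one common way to probe relationships such as the one between zebras and other equids is to compare those vectors with cosine similarity. The sketch below is a minimal illustration of that idea, not the researchers' evaluation code. The three-dimensional vectors, the species labels attached to them, and the helper names `cosine_similarity` and `rank_by_similarity` are all assumptions invented for this example; real embeddings would come from the model and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, made up for illustration only.
toy_embeddings = {
    "plains zebra": [0.9, 0.1, 0.0],
    "wild horse": [0.8, 0.3, 0.1],
    "monarch butterfly": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a query photo (say, another zebra species).
query = [0.85, 0.15, 0.05]

for name, score in rank_by_similarity(query, toy_embeddings):
    print(f"{name}: {score:.3f}")
```

With these invented vectors, the two equids rank above the butterfly. If a trained model places related species close together in embedding space, the same ranking step would surface them as near neighbors.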
The training of BioCLIP 2 was significantly accelerated by the use of a cluster of 64 NVIDIA Tensor Core GPUs, as well as individual Tensor Core GPUs for inference processes. According to Berger-Wolf, the development of foundation models like BioCLIP would not be feasible without the computational power provided by NVIDIA’s accelerated computing technology.
The Future of Ecosystem Studies: Wildlife Digital Twins
Looking ahead, the researchers involved in BioCLIP 2 are aiming to create wildlife-based interactive digital twins. These digital twins will allow scientists to visualize and simulate ecological interactions between species and their environment. This innovative approach offers a safer and less intrusive method for studying natural relationships, minimizing ecological disturbance while providing valuable insights.
These digital twins will enable researchers to explore various scenarios and hypotheses without impacting real-world ecosystems, effectively reducing the environmental footprint of biological research. By offering an immersive perspective on species interactions, digital twins will open up new possibilities for more comprehensive and precise ecological studies.
In the future, this technology could extend beyond scientific research. Interactive platforms, such as those at zoos, could utilize digital twins to offer the public an engaging and educational experience. Visitors could explore and visualize the natural environment from entirely new vantage points, gaining a deeper understanding of the complexity and beauty of the animal kingdom.
Berger-Wolf envisions a scenario where a child visiting a zoo could experience the world from the perspective of a zebra within a herd or a spider on a tree branch. This immersive experience could inspire a new generation of nature enthusiasts and conservationists, fostering a greater appreciation for the natural world.
For more information about BioCLIP 2 and its potential applications, you can refer to the original research paper available on arXiv.
In conclusion, the development and implementation of BioCLIP 2 represent a significant leap forward in the application of AI to biological research and conservation. By providing comprehensive data and insights into species and ecosystems, BioCLIP 2 has the potential to revolutionize conservation efforts and deepen our understanding of the natural world.