The Serengeti ecosystem enters the realm of big data

It’s morning on the Serengeti. The leopards and lions have started to wind down to sleep through the day, while the herbivores are just getting started doing their daily work as de facto lawnmowers. A zebra ambles along in search of some tasty grass to eat and – snap! snap! snap! – a hidden camera manages to capture a few photos of her on her way. Meanwhile, 30 kilometers away, a giraffe takes a branch full of leaves into his mouth. The tree he’s feeding from happens to be just a short distance away from another hidden camera, so his mealtime antics are also discreetly recorded onto a small memory card. Elsewhere, a group of wildebeest passes in front of another camera, and a tiny part of their epic, annual migration is recorded for posterity.

Some months later, halfway around the world in America, a wildlife-obsessed teenager sits at her computer and works her way through a set of photos from those camera traps, identifying as many species as possible, and noting whether the animals in the photos are engaging in any identifiable behavior. Are they eating? Walking? Are there any babies in the photo?

This story – which, while fictional, could quite conceivably represent real animals and real people – is only possible thanks to two important technological innovations. One is the relative ease with which camera traps can be deployed. Memory cards are getting bigger, batteries are lasting longer, and hardware is becoming cheaper, allowing researchers to get unprecedented access to the private lives of animals. The second is the explosion in internet-based citizen science. Armed with a well-designed website and a decent marketing strategy, researchers can engage hordes of interested folks, many with no formal scientific training, to help extract useful scientific information from millions of photos.

That’s why University of Minnesota graduate student Alexandra Swanson, together with her team, deployed 225 camera traps in a grid covering 1,125 square kilometers of Tanzania’s Serengeti National Park. Those cameras have operated continuously since 2010, and by 2013 they had accumulated a whopping 99,241 camera-trap days and produced 1.2 million sets of photos. Working with the citizen science team at Zooniverse, Swanson built a website called Snapshot Serengeti. More than 68,000 members of the public contributed some 10.8 million classifications to the dataset (each photo set was reviewed by multiple users). Of those 1.2 million photo sets, users identified animals in 322,653. The remaining photo sets (each set consisted of one to three photos taken in rapid succession) were devoid of critters, the result of the camera having been triggered by moving plants or by heat. The project was overseen by University of Minnesota biologist Craig Packer.

A hyena enjoys a tasty snack. (Image: SnapshotSerengeti.)

Naturally, not all classifications were accurate. Most of the users did not have a background in wildlife identification, some photos were unfocused or hard to accurately assess, and so on. In order to get around this problem, the researchers implemented a “plurality algorithm.”

Here’s how it worked: if five users in a row declared that a photo set had no animals present, then that set was marked as “completed.” A set containing no wildlife could also be marked as completed if ten non-consecutive users indicated that no animals were present. For photos with animals in them, if ten users agreed on an identification, even if non-consecutive, then the set was marked as completed as well. For example, if 12 users viewed a photo, and ten identified an impala but two identified the animal as a waterbuck, then it was labeled as an impala. Alternatively, if the photo set was presented to 25 users and there was no consensus (e.g. eight marked impala, seven marked Speke’s gazelle, nine selected Thomson’s gazelle, and one chose zebra), then the photo set was retired as complete but without consensus.
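The retirement rules described above can be sketched in a few lines of Python. This is an illustration only, not the Snapshot Serengeti code: the function name, the `BLANK` label, and the choice to replay answers one at a time in the order users submitted them are all assumptions made for the example.

```python
from collections import Counter

BLANK = "nothing here"  # hypothetical label meaning "no animals present"


def retirement_status(labels):
    """Replay user answers in order and report when a photo set would retire.

    Returns a (status, label) pair. The thresholds (5, 10, 25) come from the
    article; everything else here is an assumed, simplified reading of it.
    """
    counts = Counter()
    blank_streak = 0
    for seen, label in enumerate(labels, start=1):
        counts[label] += 1
        # Track how many "no animals" answers have arrived in a row.
        blank_streak = blank_streak + 1 if label == BLANK else 0
        # Five blanks in a row, or ten blanks in total, retire the set as empty.
        if blank_streak == 5 or counts[BLANK] == 10:
            return ("blank", BLANK)
        # Ten matching identifications, consecutive or not, give a consensus.
        if label != BLANK and counts[label] == 10:
            return ("consensus", label)
        # After 25 answers with no agreement, retire without consensus.
        if seen == 25:
            return ("no consensus", None)
    return ("open", None)  # still needs more answers


# The two worked examples from the text.
twelve_users = ["impala"] * 5 + ["waterbuck"] * 2 + ["impala"] * 5
split_vote = (["impala"] * 8 + ["Speke's gazelle"] * 7
              + ["Thomson's gazelle"] * 9 + ["zebra"])
print(retirement_status(twelve_users))
print(retirement_status(split_vote))
```

Replaying the answers in order matters only for the five-in-a-row rule; the ten-vote rules depend on running totals alone.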

This honey badger doesn't care about the dark. (Image: SnapshotSerengeti.)

In addition, the researchers presented a subset of the photos – 4,149 photo sets – to a panel of wildlife experts for identification. By comparing the experts’ identifications to those of the citizen scientists online, they were able to estimate the probable accuracy of any completed photo set that had reached consensus.
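One simple way to picture that comparison is as an agreement rate: the share of expert-checked photo sets where the crowd's consensus matches the expert's call. The sketch below assumes each source is a dictionary keyed by a photo-set ID; the function name, the IDs, and the labels are made up for illustration, and the researchers' own accuracy estimates may have been computed differently.

```python
def agreement_rate(consensus, expert):
    """Fraction of photo sets, labeled by both sources, where the labels match."""
    shared = consensus.keys() & expert.keys()
    if not shared:
        raise ValueError("no photo sets were labeled by both sources")
    matches = sum(consensus[set_id] == expert[set_id] for set_id in shared)
    return matches / len(shared)


# Toy data: four expert-checked sets, one disagreement.
crowd = {"set_1": "impala", "set_2": "zebra", "set_3": "hyena", "set_4": "impala"}
experts = {"set_1": "impala", "set_2": "zebra", "set_3": "hyena", "set_4": "waterbuck"}
print(agreement_rate(crowd, experts))  # 0.75
```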

Serengeti National Park is part of the broader 25,000-square-kilometer Serengeti ecosystem along the Kenya-Tanzania border in East Africa. While the area has been extensively studied, the datasets describing the wildlife there have been separated by time and place, and the data itself has been collected using a variety of methods, making it difficult to combine for comparisons. Since the 1960s, for example, the Serengeti Lion Project has monitored individual lions and their movement patterns, while the Tanzania Wildlife Research Institute surveys herbivore herds by using flight counts and aerial photography. But never before has anybody attempted to continuously monitor a broad swath of the ecosystem, collecting data on both predators and herbivores, using identical procedures. Indeed, “our camera survey expands upon historical monitoring by providing the first continuous systematic data on all of the larger predator and prey species, day and night, across several years,” writes Swanson this week in Scientific Data, a new journal from Nature.

Swanson and her colleagues aren’t keeping their massive datasets to themselves. Instead, they’ve made all photo sets (including metadata) available, along with the raw classification data, consensus data, and expert classifications. The classification data is accessible via the Dryad Digital Repository, and the photos themselves are currently being provided by the University of Minnesota Supercomputing Institute. Swanson notes, though, that the UM Supercomputing Institute’s website is not a data archive, per se. “Currently, there are no archiving systems or organizations available for storing the terabytes of images from our study. We hope that image archiving options will become available in the near future.”

A lioness feeds her cubs in the daytime. (Image: SnapshotSerengeti.)

The ecological applications for these datasets are fairly obvious. Researchers can use the data to estimate the relative abundance of different species and how those values change throughout the seasons or from year to year. More than 8,000 images include impalas, for example, while just 27 include genets and 17 feature the rare zorilla, or African polecat. Since the camera traps are deployed in such a way that there are at least two for the average home range of a medium-sized mammal, some researchers can even use the photos to track the movements of individually recognizable animals, at least in a coarse-grained manner.

But Swanson and her colleagues hope that this rich dataset will prove useful to lots of researchers, beyond wildlife biologists. Those who are interested in citizen science and informatics can develop more robust algorithms for validating data quality, for example.

A Southern Ground Hornbill inspects a camera with its beak. (Image: SnapshotSerengeti.)

The photos could also be useful to computer vision researchers. “By using the raw images together with the consensus dataset, machine-learning algorithms could be developed to automatically detect and identify species, using part of the dataset for training the image-recognition algorithm and the rest for testing the algorithm,” write the researchers.
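The train-and-test idea in that quote comes down to setting aside a portion of the labeled photo sets before any model sees them. Here is a minimal sketch using only Python's standard library; the record layout, the 20 percent test share, and the fixed random seed are assumptions for illustration, not details from the paper.

```python
import random


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle labeled records and split them into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (photo-set ID, consensus species).
records = [(f"set_{i}", "impala" if i % 2 else "zebra") for i in range(10)]
train, test = split_records(records)
print(len(train), len(test))  # 8 2
```

Keeping the test portion untouched during training is what lets it serve as an honest check on how well the algorithm identifies animals it has not seen before.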

Finally, the researchers hope that the photos they’ve collected, and the data they’ve generated, can be used by educators. They envision curricula being developed about the scientific method, using the charismatic fauna in the photos as a means of engaging students. The datasets can be used in undergraduate classrooms to provide actual, meaningful research experiences. Undergrads at the University of Minnesota, for example, are using the data to ask questions about daily activity patterns and seasonal movements of Serengeti wildlife. And graduate students at Columbia University have already begun to use the data to train computers to identify animals in natural scenes.

– Jason G. Goldman | 10 June 2015

Source: Alexandra Swanson, Margaret Kosmala, Chris Lintott, Robert Simpson, Arfon Smith & Craig Packer. (2015). Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data 2, 150026. DOI: 10.1038/sdata.2015.26.

Header image: A zebra foal, via SnapshotSerengeti.