Are biodiversity data lurking on Instagram and Flickr?

In September, a curious video on the site LiveLeak went viral. It showed an unexpected sort of cooperation among ants, in which they formed a daisy chain – each ant bit onto the rear of the ant in front of it – in order to haul a millipede back to the nest where it could be eaten.

When ant expert Alex Wild watched the video, he witnessed an ant behavior that was entirely unfamiliar to him. So he turned to the scientific literature. On his blog Myrmecos, Wild wrote, “the clip appears to show an Asian Leptogenys daisy-chaining their bodies in parallel lines to haul away a large millipede. I have spent the morning searching the technical literature for mention of this unusual behavior, and am coming up empty.” What Wild did find, though, was a video showing a similar behavior on YouTube, uploaded by a Cambodian beekeeper.

Researcher Christian Peeters confirmed, after seeing Wild’s blog post, that he too had seen the daisy-chain behavior in Cambodia; he simply hadn’t yet gathered enough documentation to write a scientific paper on it.

One of the many insights derived from this chain of events is that there is a disconnect between empirical scientific information and the information available online. University of Kansas graduate student Vijay Barve thinks that social networks – like the video sharing sites and blogs that feature prominently in the ant story – can help to narrow that gap.

High quality biodiversity information is critical to fields ranging from biogeography, ecology, invasive species biology, and climate change science to studies of food security, disease ecology, marine productivity, and wildlife conservation. Writing in the journal Ecological Informatics, Barve argues that most traditional sources of such information are either broadly inaccessible (less than 40% of some three billion museum specimens worldwide have accessible data associated with them), or are narrowly focused on particular taxa or regions.

Photos shared on social networking sites like Flickr, Facebook, Instagram, and Google+ could represent a novel, easily accessible source of biodiversity data – especially since so many photos, including nearly all photos taken by smartphones, are automatically packaged with geo-location data and a timestamp. The combined efforts of billions of social media users, even if most of them are non-scientists (or citizen scientists), could amass far more data in far less time than traditional biodiversity research can.

To prove his point, Barve turned to Flickr, a photo-sharing service launched in 2004 that is now owned by Yahoo! As of March 2013, Flickr hosted some eight billion photos, with three and a half million photos added each day. He limited his efforts to two well-known and well-studied species: the monarch butterfly (Danaus plexippus) and the snowy owl (Bubo scandiacus). “Both of these species [are] fairly well known to citizen scientists and have some popular appeal,” he says.

When he searched Flickr using the monarch butterfly’s scientific name, he found 16,474 records, of which 4,799 were geo-tagged. But when he searched instead with the simple “monarch butterfly,” he found 46,684 records, of which 5,318 had geo-location information. To be sure, just relying on a simple search is foolish. For example, some records from Asia and the Middle East were misidentified, or were cases of users using the monarch as a point of comparison for the species photographed. Some of the records he found were actually of the African Monarch. Still, when he compared his results with the data already present in the “Global Biodiversity Information Facility” (GBIF), an online repository for biodiversity data which also incorporates citizen science records from massive databases like eBird and iNaturalist, he found some new insights. As expected, the Flickr data coincided with GBIF data throughout the species’ historic range in North America, and parts of Australia where it has been introduced. But he found Flickr data for sightings in Europe where there were no GBIF records. “This highlights the potential of Flickr augmenting areas of distribution not captured by GBIF data,” he says.
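The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Barve's actual method: it assumes each record has already been reduced to a latitude/longitude pair, it bins points into one-degree grid cells (an arbitrary choice), and the function names and sample coordinates are hypothetical. Only the record counts come from the article.

```python
import math
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def geotagged_share(geotagged: int, total: int) -> float:
    """Fraction of search results that carry geo-location data."""
    return geotagged / total


def to_cell(lat: float, lon: float, size_deg: float = 1.0) -> Cell:
    """Index of the grid cell containing a coordinate (floor handles negatives)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def occupied_cells(points: Iterable[Point], size_deg: float = 1.0) -> Set[Cell]:
    """Set of grid cells that contain at least one record."""
    return {to_cell(lat, lon, size_deg) for lat, lon in points}


def cells_only_in(candidate: Iterable[Point], reference: Iterable[Point],
                  size_deg: float = 1.0) -> Set[Cell]:
    """Cells with candidate records (e.g. Flickr) but no reference records (e.g. GBIF)."""
    return occupied_cells(candidate, size_deg) - occupied_cells(reference, size_deg)


# Record counts reported in the article for the two monarch searches.
share_scientific = geotagged_share(4799, 16474)   # search by "Danaus plexippus"
share_common = geotagged_share(5318, 46684)       # search by "monarch butterfly"

# Hypothetical coordinates, invented purely for illustration.
flickr_points = [(38.9, -95.2), (40.4, -3.7)]     # roughly Kansas and Madrid
gbif_points = [(38.2, -95.9), (19.4, -99.1)]      # roughly Kansas and Mexico City

new_cells = cells_only_in(flickr_points, gbif_points)
print(f"geotagged: {share_scientific:.1%} vs {share_common:.1%}")
print("cells seen only in the Flickr sample:", new_cells)
```

With the counts from the article, the scientific-name search yields a noticeably higher share of geotagged records than the common-name search, even though the common-name search returns more records overall. A real analysis would still need the manual screening for misidentified photos that the article describes before any cell is treated as a genuine sighting.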

Barve found similar patterns when it came to the snowy owl, a raptor native to Arctic regions of North America and Eurasia. After removing some misidentified records and some photos that were actually of museum specimens in Africa and Asia, he found that the Flickr records indicated snowy owls were present in parts of Southern and Western Europe that were not already indicated by GBIF data.

While Barve’s demonstration is more anecdote than experiment, it does suggest that photos uploaded to social media can be a useful source of biodiversity data. Indeed, in some cases, that’s already proving true. Citizen science data, largely in the form of photos, have already given Smithsonian researchers a bounty of new data on the ecology of the recently discovered olinguito. And Barve himself practices what he preaches, uploading his own butterfly and insect photos to sites like Flickr and Google+.

It also suggests, though, that collecting the information isn’t enough; a human with expertise still needs to sift intelligently through the data to weed out the misidentifications. Moving forward, Barve suggests that future research focus on assessing the accuracy of taxonomic identifications provided by citizen scientists, as well as the accuracy of the photos’ geolocation data. He suspects, however, that “taxonomic accuracy would certainly increase for highly distinct taxa since those will be known and tagged only by serious amateurs and experts.”

– Jason G. Goldman | 31 October 2014

Source: Barve V. (2014). Discovering and developing primary biodiversity data from social networking sites: A novel approach. Ecological Informatics, 24, 194-199.