AI Datathon Aims to Aid in Browntail Moth Invasion
Thom Klepach and Colby students explore how technology can predict patterns and help manage an invasive species that's wreaking havoc across the state.
On a bright autumn morning, Thom Klepach hiked through Perkins Arboretum with his head tilted back, searching for signs of one of Maine’s most troublesome invasive species: the browntail moth.
It didn’t take long before Klepach, a visiting assistant professor of biology and the Ward 3 Waterville city councilor, spotted a few webbed nests in the tops of hardwood trees and shrubs. Each nest harbors as many as 400 tiny caterpillars, the larval stage of the browntail moth. And each caterpillar is covered with toxic hairs that can cause a skin rash and even breathing problems in those who come into contact with them.
Lots of Mainers are all too familiar with the havoc the hairs can wreak because browntail moths, which were accidentally introduced to New England from Europe in 1897, have been in an outbreak here since 2015. Managing the outbreak has been taxing for agencies like the Maine Forest Service and for communities all over the state, including the city of Waterville.
“It’s a very real-world problem,” Klepach said. “There’s absolutely nothing about it that’s hypothetical.”
And that made it a perfect subject for September’s inaugural Davis AI Datathon. During the all-day datathon, teams of students took browntail moth data collected by Klepach and worked with it in the state-of-the-art data platform Dataiku to try to identify patterns and potential solutions to the problem.
The event was popular, with 37 students participating, said Amanda Stent, director of the Davis Institute for AI.
“It’s pretty amazing,” she said of the turnout.
The datathon came about, in part, because an alum who works at Dataiku, an artificial intelligence and machine learning company based in New York City, reached out to Stent after she began working at Colby last fall. Stent met other people who work at the company, spoke at a Dataiku conference, and learned about its academic program, which provides free access to its software to teachers, researchers, and students at academic institutions.
“I think their platform is really cool,” Stent said. “They have made it more transparent for people to do data science and machine learning without actually having to become a computer scientist. It’s very aligned with our mission, which is AI for everybody.”
This spring, she was on the hunt for a project for the Dataiku platform when she read an article in the Morning Sentinel about Klepach’s efforts to collect data about the browntail moth infestation in Waterville. The data came from three sources, including a survey of 700 trees on city property done by an arborist and a survey filled out by residents. The information would be used to help the city try to eliminate nests, but there were issues with the data provided by the more than 500 residents who filled out the survey, Klepach said.
“Some of the data was good. Some of it was a mess,” he said. “Maybe they thought they had a web, but they didn’t have a web. Or maybe they didn’t know what kind of tree it was on, or they counted their trees wrongly. There were a number of different issues.”
He was looking for a better way to synthesize and organize the data when Stent offered her help.
“I thought, ‘Oh, perfect,’” Stent recalled. “Because we want to use AI to make a difference in people’s lives, but without surveilling people. I’m happy to surveil browntail moth caterpillars. We don’t care about their privacy.”
Klepach was delighted to have the students look at the browntail moth data for the datathon. After all, the stakes were high. The Waterville City Council voted to allocate $100,000 last year and $50,000 this year to combat the problem, which it declared a public health nuisance through the Maine Center for Disease Control.
“The problem is growing exponentially right now,” Klepach remembers telling other councilors. “We just came out of a pandemic, so you know what exponential growth means. This could be a massive issue. We need to be prepared for it, and to do that, we need to collect data and develop a treatment strategy.”
A nine-hour data marathon
That’s where the students, and the datathon, come into play. At 8 a.m. Saturday, Sept. 24, in the basement of the F.W. Olin Science Center, Stent gave them the data and a deadline.
“They had nine hours to do useful things,” Stent said.
The students were assisted by Stent, Emmett Smith ’24 and Max Jacobs ’24 of Colby’s student-run data science club, Davis AI Postdoctoral Associate Tahiya Chowdhury, and Christopher Peter Makris, a data science manager from Dataiku. With their help, students focused on cleaning the data, creating a reproducible pipeline for combining the different data sources, visualizing the data, and making predictive models. They were asked to consider the interests of different stakeholders, such as a Waterville resident, a city councilor, or an arborist.
The students dove right in, with the goal of preparing a presentation by 5 p.m. that day. The AI they were using to work with the data is what Stent calls “small AI,” or machine learning, which involves teaching a computer how to find its own patterns in data. This kind of AI is ubiquitous right now, she said.
“It’s in your thermostat in your house, in the newspaper articles that you’re shown when you wake up, in the way the McDonald’s worker is scheduled for work and called into work. The assessment of people in hospitals, and their healthcare decisions, and the way that education metrics are created,” she said.
Although some of the students’ conclusions were not ultimately pertinent (the distance-to-water connection was a red herring, caused by the fact that much of Waterville’s municipal land is close to water), others were, Klepach said.
“I was pretty impressed by what they were able to do in a day. I worked on that data for weeks and months, and given the extremely short period of time they had, it was pretty interesting what they did,” he said. “It was a first pass for those students to begin thinking about the problem, and we identified some students who were really talented and interested in it.”
Baron Wang ’24, one of the participating students, was on the team that won an award for the best usage of Dataiku. “Overall, I would say the datathon was a fun, topical, and intellectually challenging experience that I would definitely recommend to other people,” he said. “I like how the professors and the assistants were able to offer help and clarification whenever we were stuck.”
Wang decided to participate because he saw an opportunity to gain real-world experience with his data-analysis skills, and he’s glad he did.
“It was more than worthwhile, and we learned a lot about browntail moth caterpillars,” he said.