“You hear that forecasts have improved, and they have. But Hurricane Harvey shows we still have a long way to go. When it comes to extreme weather, we need better modeling to make sense of all the data, including factors such as development and climate change.”
So said Eric Berger, editor of Space City Weather and opening keynote speaker at the Rice 2017 Data Science Conference. Hosted Oct. 9-10 by the Ken Kennedy Institute for Information Technology, the first-ever event drew some 440 leaders from industry and academia.
“Our purpose is to encourage engagement. We want to challenge people to network with others who are not necessarily in their domain. We want data scientists to have conversations with people in other disciplines,” said Jan E. Odegard, executive director of the Kennedy Institute and associate vice president of the office of information technology at Rice.
A meteorologist and former reporter for the Houston Chronicle, Berger recounted his hour-by-hour coverage of Harvey, before and after the hurricane made landfall in Texas on Aug. 25. Harvey took 22 lives, damaged 73,000 homes and left at least $5 billion in flood damage.
“How do you communicate risk to the public?” Berger asked. “Some of the forecast models were really grim. It’s one thing to have data and communicate it to people, but the ability to put emotion into it, that’s important too. That’s what I tried to do. We had over two feet of water over an area the size of West Virginia.”
Berger’s talk was titled “What Did the Public Really Know about Harvey, and How Can We Better Inform Them?” He suggested that Houston is no better prepared for extreme weather events than it was when Tropical Storm Allison hit in 2001. The five heaviest rainfalls ever recorded in the city, he said, have all occurred since 2015.
“We need better modeling even if we do make flood improvements on the ground,” Berger said.
In “Using Big Data and Machine Learning to Build Spatially Fine-Grained Prediction Models of Wind and Flood Damage Risk,” Devika Subramanian, professor of computer science at Rice, concurred with many of Berger’s observations.
“Clearly, we have a problem. When Hurricane Ike hit Houston in 2008, only 59 percent of the predictions were correct. The damage did not match the predictions. We decided to run an error analysis using machine learning,” she said.
She and her colleagues, including Leonardo Duenas-Osorio, associate professor of civil and environmental engineering at Rice, examined the factors that contributed to the likelihood of a Houston residence being damaged by high winds or flooding. They discovered that some of the customary criteria for predicting damage to houses, such as roof shape and type of framing, were not pertinent and that new factors were: building and land value, quality of construction, years lapsed since remodeling, among others.
“We built models directly from the data. From NOAA we took wind data. We used LiDAR data and data from the Harris County Appraisal District,” Subramanian said. “One of the things we learned from Ike was that 51.2 percent of the homes not in a designated flood plain have experienced more than one flood.”
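The approach Subramanian describes, fitting a predictive model directly to per-home features such as building value, construction quality and years since remodeling, can be sketched in miniature. The example below is illustrative only: the feature names, synthetic data and simple logistic regression are stand-ins, not the team's actual model or the Harris County data.

```python
import math
import random

random.seed(0)

def make_home():
    # Hypothetical per-home features, normalized to 0..1 (illustrative,
    # not the study's schema): appraisal value, construction quality,
    # years since remodeling.
    value = random.random()
    quality = random.random()
    years = random.random()
    # Synthetic ground truth: older, lower-quality homes flood more often.
    p = 1 / (1 + math.exp(-(2.0 * years - 2.5 * quality - 0.5 * value)))
    return [value, quality, years], 1 if random.random() < p else 0

data = [make_home() for _ in range(2000)]

# Minimal logistic regression trained by batch gradient descent.
w = [0.0, 0.0, 0.0]
b = 0.0
lr = 0.1
for _ in range(200):
    gw, gb = [0.0, 0.0, 0.0], 0.0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1 / (1 + math.exp(-z))
        err = p - y
        for i in range(3):
            gw[i] += err * x[i]
        gb += err
    for i in range(3):
        w[i] -= lr * gw[i] / len(data)
    b -= lr * gb / len(data)

# The learned weights should recover the sign pattern baked into the
# synthetic data: risk rises with years since remodeling, falls with quality.
print(w)
```

The point of the sketch is the workflow Subramanian outlines, building the model "directly from the data" rather than from assumed engineering criteria: given labeled outcomes, the fit itself reveals which features carry predictive weight.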
In his keynote address, Rick Stevens spoke on “Integrating Simulation, Data Analysis and Deep Learning in Science and Engineering Applications.” Stevens is professor of computer science at the University of Chicago and associate laboratory director for computing, environment and life sciences at Argonne National Laboratory.
“By 2020, the market for machine learning will total about $40 billion. Deep learning uses multi-layered neural networks to do machine learning: a program that gets smarter, or more accurate, as it gets more data to make predictions. It’s a useful way to learn to solve problems,” Stevens said.
Deep learning, he said, has applications in such research areas as materials science, genomics, climate science and drug design. “Cancer is a very rich target for machine learning. It is outperforming traditional physics, and we’re learning a lot about the climate/disease/environment association,” Stevens said, and made a prediction: By 2021, one-third of all supercomputing jobs will be machine learning applications.
“Data at all steps will be integrated using machine learning,” Stevens said.
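Stevens' description of a "multi-layered" network can be illustrated with a toy example (this is a generic sketch, not anything from his talk): a two-layer network trained by backpropagation on XOR, a task no single-layer linear model can learn, which is exactly why the extra layer matters.

```python
import math
import random

random.seed(42)

# XOR: the classic task that requires a hidden layer.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 4  # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(x):
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j])
         for j in range(H)]
    return h, sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

loss_before = loss()
lr = 0.5
for _ in range(5000):
    for x, y in data:
        h, out = forward(x)
        d_out = (out - y) * out * (1 - out)  # squared-error + sigmoid grad
        for j in range(H):
            d_h = d_out * w2[j] * h[j] * (1 - h[j])
            w2[j] -= lr * d_out * h[j]
            for i in range(2):
                w1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_out
loss_after = loss()
print(loss_before, "->", loss_after)
```

The training loop also shows, in miniature, Stevens' point about data: the model improves only by repeatedly seeing examples and adjusting its weights against its errors.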
Niall Gaffney is director of data intensive computing for the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. TACC’s supercomputer, Stampede2, is ranked the 12th most powerful in the world, and soon will have a peak performance of 18 petaflops.
“TACC made possible pre-storm Hurricane Harvey and Irma forecasts with researchers from Penn State. We did storm surge modeling and preliminary river flooding and inundation maps. We’re also working on how to forecast hail, moving the warning time from two hours to 24 hours,” Gaffney said.
Genevera Allen is associate professor of statistics at Rice, with a joint appointment at Baylor College of Medicine, and an investigator in the Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital. Her talk was titled “Interactive and Dynamic Visualization for Clustering.”
“In the simplest terms, clustering is finding groups in data. We look for objects which are similar to each other,” said Allen, whose research focuses on development of statistical methods to make sense of big data in applications such as high-throughput genomics and neuroimaging.
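Allen's "finding groups in data" can be made concrete with k-means, one of the most common clustering algorithms (a generic illustration on synthetic points, not her method or data): alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points.

```python
import random

random.seed(1)

# Two synthetic, well-separated 2-D groups (illustrative data only).
points = ([(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(50)]
          + [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(50)])

def kmeans(pts, k, iters=20):
    centers = random.sample(pts, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in pts:
            j = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers

centers = sorted(kmeans(points, 2))
print(centers)
```

With well-separated groups the two centers land near the true group means; Allen's research concerns the much harder versions of this problem that arise in high-throughput genomics and neuroimaging, where the groups are anything but obvious.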
Allen cited a recent study of presidential inaugural speeches undertaken by one of her graduate students. Using statistical analysis to identify the 75 most frequently used words, he determined that the 29th U.S. president, Warren G. Harding, was “the worst inauguration speaker in U.S. history.” He also found that the words most frequently used by the current president, Donald Trump, were “job,” “today” and “Mexico.”
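The first step of such a study, tallying a speech's most frequent words, is a one-liner with Python's standard library. The snippet below runs on a toy stand-in text; the actual study's corpus and statistical method are not reproduced here.

```python
import re
from collections import Counter

# A toy stand-in for an inaugural address (illustrative text only).
speech = """
We will bring back our jobs. We will bring back our borders.
Today we are transferring power back to you, the people.
"""

# Lowercase, split into words, and count occurrences.
words = re.findall(r"[a-z']+", speech.lower())
top = Counter(words).most_common(5)
print(top)  # most frequent words first, ties in first-seen order
```

Scaling the same count to a 75-word vocabulary across every inaugural address gives the feature table on which comparisons like the Harding finding can be built.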
Near the conclusion of the Data Science Conference, Odegard told his audience, “What we have been learning here is how to best leverage data. We hope to bring communities together, professionals and practitioners.”
The conference featured six plenary talks, 16 parallel sessions and the presentation of 49 student posters. In addition to Odegard, the workshop organizers included Natalie Berestovsky, Anadarko Petroleum; Keith Cooper, Rice University; Alena Crivello, Chevron; Trond Elefsen, Invatare; Roy Keyes, Houston Data Science group; Scott Morton; Craig Rusin, Baylor College of Medicine; Francisco Sanchez, HEDS Group; Jim Wes, Two Sigma.