
Course:EOSC311/2025/Big Data and AI—Revolutionizing Earthquake Forecasting and Preparedness

From UBC Wiki

Summary

This page discusses the role of big data and artificial intelligence, including machine learning and deep learning, in revolutionizing the way scientists forecast and prepare for earthquakes. It introduces the fundamentals of earthquake forecasting, data collection and preparation, machine and deep learning, and big data, with a focus on the Cascadia Subduction Zone near Vancouver. Even though there are several challenges to using these technologies for earthquake forecasting, the improvements they bring and their potential far outweigh those challenges.

Statement of connection and why you chose it

Big data, AI, and earthquake forecasting is a topic I am very interested in, and that interest stems from a personal connection to the subject. As someone who grew up on Vancouver Island, I have been constantly reminded of the looming threat of “The Big One”, a potentially catastrophic megathrust earthquake along the Cascadia Subduction Zone that I will talk about later on this page. From elementary school drills to public service announcements, the importance of earthquake preparedness was apparent to me from a young age.

This awareness of the seismic risks led to a curiosity about how science and technology are evolving to address them. Since AI is a prominent topic today, and I am a computer science student fascinated by big data who works on graph databases, it was natural to research a connection between the two. This topic allows me to explore how these cutting-edge technologies directly impact my community, and potentially mitigate the very real dangers presented by the subduction zone near us.

Introduction to Earthquakes and Forecasting

An earthquake is defined as an intense shaking of the surface of the Earth, caused by movements of the Earth’s outermost layer (NASA Space Place, n.d.). This is often caused by tectonic plates rubbing against each other or colliding at fault lines. These movements create many extremely dangerous hazards, such as tsunamis, landslides, and damage to buildings and infrastructure.

Recent earthquakes, such as the March 2025 magnitude 7.7 earthquake in Myanmar and the January 2024 magnitude 7.5 earthquake in Japan, caused immense damage. The Myanmar quake caused 3,815 fatalities and over 5,000 injuries, whereas the Japanese quake caused an estimated $17.6 billion USD worth of damage (VolcanoDiscovery, 2025).

With how destructive earthquakes can be, advance notice is key to mitigating damage and losses. The time scale of forecasting is crucial too, but it is quite hard to forecast earthquakes accurately very far into the future. Take, for example, “The Big One”, a potential megathrust earthquake caused by the Juan de Fuca Plate subducting under the North American Plate off the coast of Vancouver Island (where I am from!). Scientists are unable to pin down exactly when it will happen, because traditional forecasting methods rely heavily on historical patterns and statistical models (Tuhin, 2025).

Based on the historical data from previous quakes in the same area, it is forecast to occur sometime in the next 200-300 years. Not very specific or helpful, is it?

Traditional methods of forecasting have their problems, the biggest being that they cannot predict dates precisely, which leaves very limited capability to adapt in real time (Geller, 1997). Better methods of forecasting earthquakes are needed, but how can we ‘read’ tectonic plates? What kind of data can we use to forecast these catastrophic shakings?


Data Collection and Processing

To forecast something, data is needed to base the forecast on; otherwise it is just a senseless guess. Likewise, data is needed to forecast earthquake magnitudes, damage, and locations.

Some of the primary data sources include seismographs, GPS stations, Interferometric Synthetic Aperture Radar (InSAR), satellite imagery, and historical earthquake catalogs.

Seismographs measure ground motion caused by seismic waves, which is very helpful for detecting earthquakes in real time. They are also useful for analyzing patterns in earthquake activity and identifying active faults and stress zones (IRIS, 2021).

GPS stations work on a more macro level; they are useful for tracking the movements of tectonic plates, which can help estimate where stress is accumulating and could eventually lead to an earthquake (EarthScope Consortium, n.d.).

InSAR is, in my opinion, the most interesting data source, as it uses satellite radar to measure ground deformation and can detect ‘silent’ earthquakes, which are not felt by people (U.S. Geological Survey, n.d.).

Satellite imagery, like the GPS stations, works on a more macro level. The images help track landscape changes near important fault zones and map fault lines to understand their behaviour (Więcławska, 2023).

The last data source I will talk about is a very important one: historical earthquake catalogs. These are databases that bring together information from historical earthquakes, including data such as date, location, depth, and magnitude. They are crucial for identifying patterns and cycles, assessing overdue faults such as the Cascadia Subduction Zone, and training models for earthquake prediction (which will be discussed in more depth later).

Once this data is collected from the various sources, the raw data (meaning not yet formatted or prepared) is cleaned, normalized, and time-aligned. Some short descriptions of these processes:

Cleaning - Removing errors, such as human error or recorded non-seismic events

Normalizing - Bringing all data in a category onto the same scale or format

Time-aligning - Organizing the data from multiple sources or sensors by date/time
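The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the station names, timestamps, and amplitude values are invented, a missing value (None) stands in for an "error", and z-scoring per station is just one of several possible normalizations.

```python
from datetime import datetime, timezone
from statistics import mean, pstdev

# Hypothetical raw readings: (ISO timestamp, station id, ground-motion amplitude).
# None marks a dropped sample; every value here is invented for illustration.
raw = [
    ("2025-01-01T00:00:02+00:00", "STA_B", 0.8),
    ("2025-01-01T00:00:00+00:00", "STA_A", 12.0),
    ("2025-01-01T00:00:01+00:00", "STA_A", None),
    ("2025-01-01T00:00:01+00:00", "STA_B", 1.0),
    ("2025-01-01T00:00:02+00:00", "STA_A", 14.0),
    ("2025-01-01T00:00:00+00:00", "STA_B", 0.6),
]


def clean(records):
    """Cleaning: drop records whose amplitude is missing."""
    return [r for r in records if r[2] is not None]


def normalize(records):
    """Normalizing: rescale each station's amplitudes to z-scores."""
    by_station = {}
    for _, station, value in records:
        by_station.setdefault(station, []).append(value)
    # Fall back to a spread of 1.0 if a station's values are all identical.
    stats = {s: (mean(v), pstdev(v) or 1.0) for s, v in by_station.items()}
    return [(t, s, (v - stats[s][0]) / stats[s][1]) for t, s, v in records]


def time_align(records):
    """Time-aligning: parse timestamps to UTC and sort chronologically."""
    parsed = [
        (datetime.fromisoformat(t).astimezone(timezone.utc), s, v)
        for t, s, v in records
    ]
    return sorted(parsed, key=lambda r: (r[0], r[1]))


prepared = time_align(normalize(clean(raw)))
for timestamp, station, value in prepared:
    print(timestamp.isoformat(), station, round(value, 3))
```

After these steps, the two stations are on a comparable scale even though their raw amplitudes differ by an order of magnitude, and the records from both can be read in a single time order.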

Now that all the data is collected and processed, it needs to be stored. One emerging technology used for this is graph databases (what I work on); however, they are not yet widely adopted. Currently, the majority of data is stored in relational databases on large servers or in open-access repositories such as the International Seismological Centre, the USGS Earthquake Catalog, or IRIS seismic data. Many countries, such as Japan and France, have their own databases as well.

Machine and Deep Learning in Earthquake Prediction

With this abundance of data, making sense of it becomes extremely difficult and time-consuming. Even skilled seismologists and geologists have a difficult time making use of the vast sea of data collected, and doing so takes large amounts of time and money. Not only that, but in such a large and difficult task, human error is bound to seep in. One of the newest technologies revolutionizing forecasting is machine and deep learning, although the choice of model often depends on the size and nature of the dataset and the specific forecasting goal.

Machine learning is a subset of artificial intelligence that builds algorithms and models that let computers learn and make predictions without being explicitly programmed to. It analyzes large datasets, identifies patterns, and makes predictions based on the data (like a traditional seismologist!).

Deep learning, in turn, is a subset of machine learning that uses neural networks to analyze data and find patterns and relationships. These models become more accurate as they are fed more data, which is very useful for earthquake prediction (sakshiparikh23, 2024).

These models can process data and predict earthquakes with similar or even greater accuracy than expert seismologists. A study by Ross et al. (2018) found that deep learning models can classify first-motion polarity with 95% accuracy, suggesting they can perform as well as, or better than, human experts.

Another study, by Wang et al. (2023), found that their deep learning algorithm improved magnitude estimation in earthquake early warning, performing better than traditional empirical methods.

More specifically, some of the most common models include random forests and neural networks. A random forest builds multiple decision trees and merges their outputs to improve predictive accuracy and control overfitting. Random forests can also handle missing data and effectively model the relationships between different seismic precursors (Rouet-Leduc et al., 2017). Artificial neural networks and more advanced deep learning architectures are particularly powerful for forecasting. They can identify subtle, complex patterns in large datasets, such as continuous seismic waveforms, that may not be apparent to human analysts (Mousavi et al., 2020).
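To make the "many trees, merged outputs" idea concrete, here is a small sketch in plain Python. It is not a full random forest and not the method used in any of the studies cited here: each "tree" is a one-split decision stump trained on a bootstrap sample with a random subset of features, and the forest takes a majority vote. The two features and all of the numbers are hypothetical toy data.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(n_features), k=max(1, n_features // 2))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Merge the stumps' outputs by majority vote."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: [signal variance, event rate]; label 1 = "failure is near".
rows = [[0.1, 1.0], [0.2, 1.2], [0.15, 0.9], [0.3, 1.1],
        [0.8, 3.0], [0.9, 3.2], [0.85, 2.8], [0.7, 3.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict(forest, [0.05, 0.8]), predict(forest, [0.95, 3.5]))
```

Because every stump sees a slightly different sample and feature, a single badly trained stump gets outvoted by the rest, which is the intuition behind the reduced overfitting. Real random forests grow much deeper trees and would normally come from an existing library rather than being written by hand.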

Several real-world applications and research studies highlight the potential of ML/DL in earthquake forecasting. Japan's Earthquake Early Warning (EEW) system is one of the world's most advanced. It uses data from thousands of seismometers and incorporates algorithms to quickly detect and analyze P-waves, issuing alerts before the more destructive S-waves arrive (Japan Meteorological Agency, n.d.).
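The head start an early warning system can give comes from simple arithmetic: P-waves travel faster than S-waves, so a station near the epicentre can detect the P-wave before the S-wave reaches a city farther away. The sketch below is a rough back-of-the-envelope model, not how any real EEW system computes its alerts. The wave speeds (6.0 and 3.5 km/s), the 5 second processing delay, and the distances are all assumed values chosen for illustration.

```python
def warning_time_s(site_km, station_km, vp_km_s=6.0, vs_km_s=3.5, delay_s=5.0):
    """Seconds of warning at a site before the S-wave arrives.

    site_km:    distance from the epicentre to the place being warned
    station_km: distance from the epicentre to the nearest detecting station
    vp_km_s, vs_km_s: assumed P-wave and S-wave speeds (illustrative values)
    delay_s:    assumed time to process the detection and send the alert
    """
    alert_issued = station_km / vp_km_s + delay_s  # P-wave detected, then processed
    s_wave_arrives = site_km / vs_km_s             # damaging shaking reaches the site
    return max(0.0, s_wave_arrives - alert_issued)  # 0.0 means no usable warning


for site in (10.0, 50.0, 100.0, 200.0):
    print(f"{site:>5.0f} km from epicentre: {warning_time_s(site, 20.0):.1f} s of warning")
```

Under these assumptions a site 100 km away gets roughly 20 seconds of warning, while a site only 10 km away gets none, because the S-wave arrives before the alert can be issued.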

Recent research has focused on systematically comparing model performance. A 2023 study published in Nature demonstrated that a deep learning model was able to forecast the locations of aftershocks in different regions with greater accuracy than traditional physics-based models (DeVries et al., 2023).

| Model Type | Study | Accuracy/Performance Metric | Key Finding |
| --- | --- | --- | --- |
| Deep Learning | DeVries et al. (2023) | Area Under ROC Curve (AUC) = 0.849 | Outperformed traditional models in forecasting aftershock locations globally. |
| Random Forest | Rouet-Leduc et al. (2017) | Successful prediction of lab quake timing | Identified hidden signals in emissions that reliably preceded lab-generated earthquakes. |
| LSTM | Mousavi et al. (2020) | Roughly 95% accuracy in earthquake detection | Successfully detected earthquakes from a single-station waveform, outperforming traditional methods. |
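The AUC metric in the table can be read as the probability that a model gives a higher score to a randomly chosen location that did have an aftershock than to a randomly chosen location that did not. A score of 0.5 is no better than guessing and 1.0 is perfect. The sketch below computes AUC directly from that definition; the labels and scores are made up for illustration and are not data from any of the studies.

```python
def roc_auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical grid cells: 1 = aftershock occurred, 0 = no aftershock.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]  # invented model outputs

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy example, 10 of the 12 positive/negative pairs are ranked correctly, so the AUC is 10/12, or about 0.83.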

Big Data Analytics in Seismology

Earthquake science is a big data problem. Modern sensor networks generate far more information than traditional software tools and seismologists can process efficiently. This comes down to the three main V’s of big data:

Volume: Terabytes to petabytes of seismic, GPS, and satellite data.

Velocity: Real-time data from thousands of sensors worldwide require high-speed processing for early warning.

Variety: Data comes in many forms, from structured catalogs of earthquake events to unstructured satellite imagery and text-based reports.

(TechTarget, n.d.)

Platforms like Google Earth Engine and Amazon Web Services (AWS) provide the scalable storage and computational power needed to analyze these massive datasets. For example, researchers use these platforms to run deep learning models on global InSAR imagery to detect subtle ground deformation patterns that would be difficult to find with traditional methods and that might signal an impending earthquake (Ciampa et al., 2022). If you would like to learn more about how cloud computing is helping to revolutionize seismology and earthquake forecasting, there is a very recent review paper from Ni et al. (2025) at the University of Washington that goes into depth on the topic.

Vancouver and the Cascadia Subduction Zone

Vancouver, as previously mentioned, is situated near the Cascadia Subduction Zone, which is a 1,000 km long fault stretching from Northern California to British Columbia. Here, the oceanic Juan de Fuca Plate is subducting beneath the continental North American Plate.

For those of us native to BC, like me, “the Big One” is something we have been hearing about since elementary school. The last major megathrust earthquake here occurred over 300 years ago, leaving us due for another major event sometime in the next 300 years (Natural Resources Canada, 2018). Such an earthquake would generate severe ground shaking across Vancouver Island and the Lower Mainland, and cause catastrophic damage. Researchers estimate the quake would be around magnitude 9.0 and would generate a tsunami reaching heights of 30 m, alongside landslides and infrastructure damage (Cascadia Region Earthquake Workgroup, n.d.).

As you can see, it is quite the problem, and it needs to be forecast and prepared for as soon and as accurately as possible. Researchers can use supercomputers to run ground-motion simulations to predict how seismic waves would travel through the complex terrain of Vancouver and Vancouver Island. These physics-based models are increasingly being integrated with ML predictions from historical data to create more accurate probabilistic seismic hazard assessments. One of the larger risks in Greater Vancouver comes from the local topography (Natural Resources Canada, 2018). Areas built on soft, water-saturated sediment, such as Richmond, are prone to soil liquefaction, which increases shaking and potential damage (City of Richmond Fire-Rescue, n.d.).

Implications for Emergency Management

There are many earthquake early warning (EEW) systems currently in use that rely on traditional methods to great success. One of the best examples is Japan's EEW, which has significantly reduced potential damage and injuries to the public. For example, when a magnitude 7.6 earthquake hit Noto, Japan in January 2024, the EEW sent alerts to phones, TVs, and radios while also shutting down the railway systems. And in 2022, an earthquake near the island of Honshu decimated railway tracks, but since the train systems were halted by the EEW, the train passengers were all kept safe from their 320 km/h bullet train derailing (Natural Resources Canada, 2024).

Improvements in the efficiency and predictions of machine and deep learning models will have a significant impact on earthquake emergency management. AI-integrated alert systems are one of the many uses, with systems like ShakeAlert (U.S. Geological Survey Earthquake Hazards Program, 2023) already in use in the US, and systems like the Canadian water and infrastructure mapper being developed by Natural Resources Canada (n.d.). These automated systems can quickly respond to these scenarios by shutting down gas lines and public transit and by giving people sufficient time to take cover before the quake, which is especially crucial for the elderly and those with mobility impairments.

Limitations

However, as useful as big data and machine learning models are, they come with some substantial complications and concerns.

Earthquakes are complex, with numerous factors influencing their occurrence, such as their location, depth, and the tectonic activity around them. This makes them difficult to predict, and traditional methods often struggle to capture their complex patterns (Koronovskii et al., 2021). In the realm of big data, ‘chaos’ is the attribute of data being unpredictable and turbulent (mParticle, n.d.). Earthquakes are inherently ‘chaotic’, being characterized by a strong dependence on the initial conditions of the regions they occur in. This means that even the smallest uncertainties or changes in initial data or model parameters can lead to very different outcomes, which makes long-term prediction very hard. This is apparent from the fact that no scientist has ever predicted a major earthquake (U.S. Geological Survey, 2023).
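This sensitivity to initial conditions is easy to see in a toy system. The sketch below uses the logistic map, a standard textbook example of chaos. It is not an earthquake model, and the starting value and the size of the nudge are arbitrary choices of mine. Two runs that start one billionth apart stay together for a while and then end up completely different.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x), a classic chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting states that differ by one part in a billion (arbitrary values).
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)

gaps = [abs(x - y) for x, y in zip(a, b)]
for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:>2}: difference = {gaps[step]:.2e}")
```

If a measurement error of that size is enough to ruin a forecast of a one-line equation after a few dozen steps, it gives a feel for why long-term prediction of a fault system with countless interacting variables is so hard.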

Another glaring problem with machine learning, and specifically deep learning models, is the wealth of data needed for effective training. Not only does there need to be a vast amount of data, but the data must also be high quality and well labeled to be useful. Getting datasets that meet these requirements is difficult due to the presence of noise in seismic measurements. Noise in data refers to erratic, irrelevant, or misleading parts of a dataset that do not represent the true story, almost like a red herring (AICORR, n.d.). Another problem with data quality is that noise, as well as errors in the data, can strongly bias hazard estimates. It is crucial that the quality of the data is high in order to produce high-quality results from the models (Legendre & Kumar, 2023).

There are also limitations on the non-technical side. Failure to train models correctly and accurately can lead to inaccurate predictions and to ‘hallucinated’ results, meaning results that are nonsensical or extremely inaccurate (IBM, 2023). When such models are used in things like emergency broadcast systems, this can lead to more false alarms, which will damage the public's trust in the system. Lack of trust can lead the general public to ignore emergency broadcasts when there actually is an emergency.

Conclusion / Your Evaluation of the Connections

Machine learning models can be used to improve the efficiency of earthquake forecasting and detection. Big data and AI are powerful tools that augment, rather than replace, the expertise of geoscientists. This clearly demonstrates that the direct connection between computer science and earth science is having a positive impact on earthquake preparedness. Limitations remain, and the inherent randomness of seismic processes makes perfect prediction impossible. Despite that, these technologies are helping to create a more accurate and timely understanding of seismic hazards and to improve emergency preparedness, revolutionizing earthquake forecasting and preparedness.

Big data is already crucial to earthquake forecasting; however, I think that machine learning and deep learning will be the future of earthquake forecasting. The models get better with more data and training, and the algorithms behind them will improve over time. Additionally, much of how earthquakes work is still a mystery, and as that gap in knowledge closes and seismology as a whole advances, these technologies will only become more effective.

References

AICORR. (n.d.). What is noise in data? Retrieved from https://aicorr.com/articles/what-is-noise-in-data/

Cascadia Region Earthquake Workgroup. (n.d.). Cascadia Subduction Zone. Oregon Department of Emergency Management. Retrieved from https://www.oregon.gov/oem/hazardsprep/pages/cascadia-subduction-zone.aspx

Ciampa, E., Solaro, G., & Tebbens, S. F. (2022). Unsupervised learning from InSAR data: A review and a new perspective for ground deformation pattern retrieval. Remote Sensing, 14(3), 735. https://doi.org/10.1016/j.jag.2023.103276

City of Richmond Fire-Rescue. (n.d.). Know your risks - Earthquakes and Tsunami. Retrieved from https://firerescue.richmond.ca/know-your-risks-earthquakes-and-tsunami/

DeVries, P. M. R., Viégas, F., Wattenberg, M., & Meade, B. J. (2023). A deep learning approach to forecasting aftershock locations. Nature, 623(7989), 1017–1022. https://doi.org/10.1038/s41586-023-06602-x

EarthScope Consortium. (n.d.). GPS and earthquakes. Retrieved from https://www.earthscope.org/what-is/gps/gps-and-earthquakes/

Geller, R. J. (1997). Earthquake prediction: A critical review. Geophysical Journal International, 131(3), 425–450.

IBM. (2023). What are AI hallucinations? IBM Think. https://www.ibm.com/think/topics/ai-hallucinations

IRIS. (2021). What is a seismograph and how does it work. IRIS Edu. https://www.iris.edu/hq/programs/epo/life_of_a_seismologist/its_instrumental/what_is_a_seismograph_and_how_does_it_work

Japan Meteorological Agency. (n.d.). What is the Earthquake Early Warning? Retrieved from https://www.jma.go.jp/jma/en/Activities/eew.html

Koronovskii, N. V., Zakharov, V. S., & Naimark, A. (2021). The unpredictability of strong earthquakes: New understanding and solution of the problem. Moscow University Geology Bulletin, 76, 366–373. https://doi.org/10.3103/S0145875221040074

Legendre, C. P., & Kumar, U. (2023, August). Seismological data quality controls—A synthesis. ResearchGate. https://www.researchgate.net/publication/373430377_Seismological_Data_Quality_Controls-A_Synthesis

Mousavi, S. M., Ellsworth, W. L., Zhu, W., Chuang, L. Y., & Beroza, G. C. (2020). Earthquake transformer—An attentive deep-learning model for simultaneous earthquake detection and phase picking. Nature Communications, 11(1), 3952. https://doi.org/10.1038/s41467-020-17591-w

mParticle. (n.d.). What is Data Chaos and How to Solve It. Retrieved from https://www.mparticle.com/blog/data-chaos/

NASA Space Place. (n.d.). What is an earthquake? NASA Science for Kids. Retrieved from https://spaceplace.nasa.gov/earthquakes/

Natural Resources Canada. (2018). The Cascadia Subduction Zone. Retrieved from https://natural-resources.canada.ca/stories/simply-science/takes-only-seconds-protect-yourself-earthquake

Natural Resources Canada. (2024). It takes only seconds to protect yourself in an earthquake. Retrieved from https://natural-resources.canada.ca/stories/simply-science/takes-only-seconds-protect-yourself-earthquake

Natural Resources Canada. (n.d.). Current Artificial Intelligence Projects. Retrieved from https://natural-resources.canada.ca/funding-partnerships/current-artificial-intelligence-projects

Ni, Y., Denolle, M., Munchmeyer, J., Wang, Y., Feng, K.-F., Suarez, C., Thomas, A., Trabant, C., Hamilton, A., & Mencin, D. (2025). A review of cloud computing in seismology. arXiv. https://arxiv.org/abs/2506.11307

Ross, Z. E., Meier, M.-A., & Hauksson, E. (2018). P-wave arrival picking and first-motion polarity determination with deep learning. arXiv. https://arxiv.org/abs/1804.08804

Rouet-Leduc, B., Hulbert, C., Lubbers, N., Barros, K., Geller, R. J., & Johnson, P. A. (2017). Machine learning predicts laboratory earthquakes. Geophysical Research Letters, 44(19), 9647–9652. https://doi.org/10.1002/2017GL074677

sakshiparikh23. (2024). Difference between machine learning and deep learning. GeeksforGeeks. Retrieved June 19, 2025, from https://www.geeksforgeeks.org/artificial-intelligence/difference-between-machine-learning-and-deep-learning/

TechTarget. (n.d.). The ultimate guide to big data for businesses. Retrieved from https://media.techtarget.com/digitalguide/images/Misc/EA-Marketing/Eguides/The_Ultimate_Guide_to_Big_Data_for_Businesses_Updated.pdf

Tuhin, M. (2025). The science of earthquakes: How seismologists predict the unpredictable. Science News Today. https://www.sciencenewstoday.org/the-science-of-earthquakes-how-seismologists-predict-the-unpredictable

U.S. Geological Survey. (2023). Can you predict earthquakes? https://www.usgs.gov/faqs/can-you-predict-earthquakes

U.S. Geological Survey Earthquake Hazards Program. (2023). ShakeAlert - Earthquake Early Warning. Retrieved from https://earthquake.usgs.gov/data/shakealert/

U.S. Geological Survey, Volcano Hazards Program. (n.d.). InSAR—Satellite-based technique captures overall deformation "picture". Retrieved from https://www.usgs.gov/programs/VHP/insar-satellite-based-technique-captures-overall-deformation-picture

VolcanoDiscovery. (2025). Recent damaging or deadly earthquakes in the world. Retrieved from https://www.volcanodiscovery.com/earthquakes/damaging.html

Więcławska, J. (2023). Introduction to satellite imagery and its relevance to disaster management. Geoawesome. https://geoawesome.com/eo-hub/introduction-to-satellite-imagery-and-its-relevance-to-disaster-management/

Wang, Y., Li, X., Wang, Z., & Liu, J. (2023). Deep learning for magnitude prediction in earthquake early warning. Gondwana Research. https://doi.org/10.1016/j.gr.2022.06.009

This Earth Science resource was created by Course:EOSC311.