Audio Classification of Marine Mammals

class: top, left, inverse, title-slide

.title[
# Audio Classification of Marine Mammals
]
.author[
### Lampros Sp. Mouselimis
]
.institute[
### Monopteryx
]
.date[
### 2025-04-24<br><svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M579.8 267.7c56.5-56.5 56.5-148 0-204.5c-50-50-128.8-56.5-186.3-15.4l-1.6 1.1c-14.4 10.3-17.7 30.3-7.4 44.6s30.3 17.7 44.6 7.4l1.6-1.1c32.1-22.9 76-19.3 103.8 8.6c31.5 31.5 31.5 82.5 0 114L422.3 334.8c-31.5 31.5-82.5 31.5-114 0c-27.9-27.9-31.5-71.8-8.6-103.8l1.1-1.6c10.3-14.4 6.9-34.4-7.4-44.6s-34.4-6.9-44.6 7.4l-1.1 1.6C206.5 251.2 213 330 263 380c56.5 56.5 148 56.5 204.5 0L579.8 267.7zM60.2 244.3c-56.5 56.5-56.5 148 0 204.5c50 50 128.8 56.5 186.3 15.4l1.6-1.1c14.4-10.3 17.7-30.3 7.4-44.6s-30.3-17.7-44.6-7.4l-1.6 1.1c-32.1 22.9-76 19.3-103.8-8.6C74 372 74 321 105.5 289.5L217.7 177.2c31.5-31.5 82.5-31.5 114 0c27.9 27.9 31.5 71.8 8.6 103.9l-1.1 1.6c-10.3 14.4-6.9 34.4 7.4 44.6s34.4 6.9 44.6-7.4l1.1-1.6C433.5 260.8 427 182 377 132c-56.5-56.5-148-56.5-204.5 0L60.2 244.3z"/></svg> <a href="https://monopteryx.netlify.app/portfolio/">monopteryx-dashboard/</a><br><svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M579.8 267.7c56.5-56.5 56.5-148 0-204.5c-50-50-128.8-56.5-186.3-15.4l-1.6 1.1c-14.4 10.3-17.7 30.3-7.4 44.6s30.3 17.7 44.6 7.4l1.6-1.1c32.1-22.9 76-19.3 103.8 8.6c31.5 31.5 31.5 82.5 0 114L422.3 334.8c-31.5 31.5-82.5 31.5-114 0c-27.9-27.9-31.5-71.8-8.6-103.8l1.1-1.6c10.3-14.4 6.9-34.4-7.4-44.6s-34.4-6.9-44.6 7.4l-1.1 1.6C206.5 251.2 213 330 263 380c56.5 56.5 148 56.5 204.5 0L579.8 267.7zM60.2 244.3c-56.5 56.5-56.5 148 0 204.5c50 50 128.8 56.5 186.3 15.4l1.6-1.1c14.4-10.3 17.7-30.3 7.4-44.6s-30.3-17.7-44.6-7.4l-1.6 1.1c-32.1 22.9-76 19.3-103.8-8.6C74 372 74 321 105.5 289.5L217.7 177.2c31.5-31.5 82.5-31.5 114 0c27.9 27.9 31.5 71.8 8.6 103.9l-1.1 
1.6c-10.3 14.4-6.9 34.4 7.4 44.6s34.4 6.9 44.6-7.4l1.1-1.6C433.5 260.8 427 182 377 132c-56.5-56.5-148-56.5-204.5 0L60.2 244.3z"/></svg> <a href="https://www.linkedin.com/in/mlampros/">linkedin</a><br><br><img src="images/watkins_cropped.png" width="295" height="210"><br><font size="2"><a href="https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm">Watkins Marine Mammals Sound Database</a></font>
]

---

class:hide_logo

<div>
<style type="text/css">.xaringan-extra-logo {
width: 110px;
height: 128px;
z-index: 0;
background-image: url(images/monopteryx.png);
background-size: contain;
background-repeat: no-repeat;
position: absolute;
top:8.7em;right:1em;
}
</style>
<script>(function () {
  let tries = 0
  function addLogo () {
    if (typeof slideshow === 'undefined') {
      tries += 1
      if (tries < 10) {
        setTimeout(addLogo, 100)
      }
    } else {
      document.querySelectorAll('.remark-slide-content:not(.title-slide):not(.inverse):not(.hide_logo)')
        .forEach(function (slide) {
          const logo = document.createElement('div')
          logo.classList = 'xaringan-extra-logo'
          logo.href = null
          slide.appendChild(logo)
        })
    }
  }
  document.addEventListener('DOMContentLoaded', addLogo)
})()</script>
</div>

# Audio Classification

<br>

Audio classification is the process of analyzing sound recordings and assigning them to categories based on features such as frequency and duration. It is widely used in:

* speech recognition
* music genre classification
* bioacoustics

<br>

In this project we use the Watkins Marine Mammal Sound Database to detect and identify species from their vocalizations with machine learning models. The database contains recordings of various marine mammals, including whales, dolphins, and seals, and it has been referenced in many scientific papers.

<br>

---
class:hide_logo

# Methodology

.pull-left[

<br>

The following workflow was used:

- The audio files were downloaded from the official website
- The marine mammal sounds were pre-processed (sample rate, duration, bit depth)
- Features were created using
    - a pre-trained Transformer model
    - Mel-frequency features
- Dimensionality reduction was performed
- Machine learning models were trained using 5-fold cross-validation and the optimal hyperparameter settings
- An Application Programming Interface (API) was created to make the results available to the user
- A web browser application was implemented so that the user can access the results
]

.pull-right[

<img src="images/diagram.png" width="60%" style="display: block; margin: auto;" />
]
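
The dimensionality reduction and 5-fold cross-validation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the features are synthetic stand-ins for the extracted audio features, and the use of PCA, stratified folds, and every parameter value shown are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the extracted audio features (assumption):
# 500 clips, 64 features per clip, 4 species classes.
X, y = make_classification(
    n_samples=500, n_features=64, n_informative=20,
    n_classes=4, random_state=0,
)

# Scale -> reduce dimensionality -> RBF-kernel SVM.
# PCA is an assumed choice; n_components, C and gamma are placeholders,
# not the tuned settings used in the project.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=16, random_state=0),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)

# 5 stratified folds, so each fold keeps the class proportions (assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

for fold, acc in enumerate(scores, start=1):
    print(f"fold {fold}: {100 * acc:.2f}%")
print(f"AVG: {100 * scores.mean():.2f}%")
```

Placing the scaler and the reduction step inside the pipeline means they are re-fitted on the training part of every fold, so no information from the held-out fold leaks into the model.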

---
class:hide_logo

# Results

<br>

The following table shows the accuracy (%) in each of the 5 cross-validation folds for

- *Logistic Regression*, *MLP Classifier* and *SVM Radial* from the **scikit-learn** Python library (column names ending in **_Py**)
- *SVM Radial* in **R** (column name ending in **_R**)

<br>

<table class="table table-striped table-hover table-bordered" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;font-weight: bold;"> Fold </th>
   <th style="text-align:center;font-weight: bold;"> Logistic_Regression_Py </th>
   <th style="text-align:center;font-weight: bold;"> MLP_Classifier_Py </th>
   <th style="text-align:center;font-weight: bold;"> SVM_Radial_Py </th>
   <th style="text-align:center;font-weight: bold;"> SVM_Radial_R </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;width: 0cm; "> 1 </td>
   <td style="text-align:center;width: 5cm; "> 92.35 </td>
   <td style="text-align:center;width: 5cm; "> 92.06 </td>
   <td style="text-align:center;width: 6cm; "> 92.06 </td>
   <td style="text-align:center;"> 95.86 </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 0cm; "> 2 </td>
   <td style="text-align:center;width: 5cm; "> 90.59 </td>
   <td style="text-align:center;width: 5cm; "> 92.06 </td>
   <td style="text-align:center;width: 6cm; "> 94.41 </td>
   <td style="text-align:center;"> 91.23 </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 0cm; "> 3 </td>
   <td style="text-align:center;width: 5cm; "> 93.51 </td>
   <td style="text-align:center;width: 5cm; "> 93.51 </td>
   <td style="text-align:center;width: 6cm; "> 93.81 </td>
   <td style="text-align:center;"> 93.45 </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 0cm; "> 4 </td>
   <td style="text-align:center;width: 5cm; "> 89.97 </td>
   <td style="text-align:center;width: 5cm; "> 89.38 </td>
   <td style="text-align:center;width: 6cm; "> 92.04 </td>
   <td style="text-align:center;"> 93.82 </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 0cm; "> 5 </td>
   <td style="text-align:center;width: 5cm; "> 92.63 </td>
   <td style="text-align:center;width: 5cm; "> 94.10 </td>
   <td style="text-align:center;width: 6cm; "> 94.10 </td>
   <td style="text-align:center;"> 94.13 </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 0cm; font-weight: bold;"> AVG </td>
   <td style="text-align:center;width: 5cm; font-weight: bold;"> 91.81 </td>
   <td style="text-align:center;width: 5cm; font-weight: bold;"> 92.22 </td>
   <td style="text-align:center;width: 6cm; font-weight: bold;"> 93.28 </td>
   <td style="text-align:center;font-weight: bold;"> 93.70 </td>
  </tr>
</tbody>
</table>
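
The AVG row can be reproduced from the per-fold values with a few lines of Python. The sketch below also prints the sample standard deviation across folds, which is not part of the table and is added here only as an illustration of the fold-to-fold spread.

```python
from statistics import mean, stdev

# Per-fold accuracy (%) copied from the table above.
fold_accuracy = {
    "Logistic_Regression_Py": [92.35, 90.59, 93.51, 89.97, 92.63],
    "MLP_Classifier_Py": [92.06, 92.06, 93.51, 89.38, 94.10],
    "SVM_Radial_Py": [92.06, 94.41, 93.81, 92.04, 94.10],
    "SVM_Radial_R": [95.86, 91.23, 93.45, 93.82, 94.13],
}

# Mean and sample standard deviation over the 5 folds for each model.
summary = {
    name: (mean(values), stdev(values))
    for name, values in fold_accuracy.items()
}

for name, (avg, sd) in summary.items():
    print(f"{name:<24} mean={avg:.2f}  sd={sd:.2f}")
```

With only five folds, small differences between the averages should be read with some caution.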

---
class:hide_logo

# Web Browser Application

.pull-left[

The embedded video shows how the web browser application works. A user can upload a .wav file and

- view the spectrogram
- receive the predicted classes (plot)
- view the metadata as a datatable
- download the predictions as a .csv file

]

.pull-right[

]
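
Basic properties of an uploaded .wav file (sample rate, duration, bit depth) can be read with Python's standard `wave` module. The sketch below is only an illustration and not the application's code: the helper `wav_metadata` and the synthetic test tone are hypothetical.

```python
import math
import os
import struct
import tempfile
import wave


def wav_metadata(path):
    """Return basic properties of a PCM .wav file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": 8 * wav.getsampwidth(),
            "duration_s": frames / rate,
        }


# Write a synthetic 1-second, 440 Hz mono tone (16 kHz, 16-bit) to a temp file.
rate = 16000
samples = [
    int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / rate))
    for n in range(rate)
]
path = os.path.join(tempfile.mkdtemp(), "example.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
    wav.setframerate(rate)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))

meta = wav_metadata(path)
print(meta)
```

The same values could be checked before feature extraction to confirm that every recording matches the expected sample rate, duration and bit depth.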

---
class:hide_logo

# References

* [Watkins Marine Mammals Sound Database](https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm)
* [BEATs: Audio Pre-Training with Acoustic Tokenizers](https://arxiv.org/abs/2212.09058)
* James Lyons et al. (2020, January 14). jameslyons/python_speech_features: release v0.6.1 (Version 0.6.1). Zenodo. http://doi.org/10.5281/zenodo.3607820
* [Scikit-learn: Machine Learning in Python](https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html), Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
* [kernlab -- An S4 Package for Kernel Methods in R](https://github.com/cran/kernlab), Karatzoglou et al., Journal of Statistical Software, 2004, doi:10.18637/jss.v011.i09
* Python reference manual, Van Rossum, Guido and Drake Jr, Fred L, 1995, Centrum voor Wiskunde en Informatica, Amsterdam
* R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (https://www.R-project.org/)

---

# Professional Services

<br>

If you are looking for professional assistance with machine learning and deep learning tasks,<br> don't hesitate to send a message via
https://monopteryx.netlify.app/contact/