Extracting Phonemes

Extracting Phonemes From Speech Samples

My best single model for the recent speech recognition Kaggle competition was based on the idea of extracting a probabilistic map of the phonemes present in a particular speech sample and then using that phoneme map as a feature set to predict the word.

The dataset provided consists of examples of 30 different words, with one word appearing in each 1 second sample. Since no phonetic information is provided other than which word is which, the first step was to turn each word into a phonetic spelling.
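As a rough sketch of that first step (using the CMU Pronouncing Dictionary via NLTK as one possible source of phonetic spellings; the exact dictionary and word subset here are just for illustration):

```python
# Sketch: map each command word to a phonetic spelling using the
# CMU Pronouncing Dictionary (one possible source of phoneme sequences).
import nltk
from nltk.corpus import cmudict

nltk.download("cmudict", quiet=True)
pronunciations = cmudict.dict()

words = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
for word in words:
    # Take the first listed pronunciation; strip the stress digits from vowels.
    phones = [p.rstrip("0123456789") for p in pronunciations[word][0]]
    print(word, phones)
```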

Read more…

3D CNN for audio data

3D Time/Frequency/Phase Representation of Audio for Speech Recognition.

I recently participated in a speech recognition Kaggle competition. Although I didn't come close to the top of the leaderboard (238th place with 87% accuracy vs 91% for the winners), I learned quite a bit about handling audio data and had a lot of fun. One of the more novel things I tried during the competition was to spatially encode the phase information in the audio and pass the result into a 3D CNN.

A common pre-processing step in speech recognition is to turn the 1D audio into a 2D spectrogram. The spectrogram shows the volume of the audio as a function of time and frequency. Spectrograms are a great way of summarizing the important information in an audio clip in a way that makes it visually accessible. Here is a spectrogram of an utterance of the word "marvin".

[Figure: spectrogram of an utterance of the word "marvin"]
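A minimal sketch of how such a spectrogram can be computed from a 1 second clip (the file name and STFT parameters here are just illustrative choices, not the exact ones I used):

```python
# Sketch: turn a 1 second audio clip into a log spectrogram with scipy.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("marvin.wav")  # hypothetical file name
freqs, times, spec = spectrogram(samples.astype(np.float32), fs=rate,
                                 nperseg=256, noverlap=128)
log_spec = np.log(spec + 1e-10)  # log scale makes quieter structure visible
print(log_spec.shape)            # (frequency bins, time steps)
```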

Read more…

Wavelet Spectrograms for Speech Recognition

Wavelet Features For Speech Recognition.

I've been participating in the TensorFlow speech recognition challenge.

https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

It seems like the most common approach is to begin by turning the audio into a spectrogram and then feeding that into a 2D CNN. One trouble with spectrograms is that you have to trade off resolution in frequency for resolution in time and vice versa. In principle you can get higher temporal resolution at high frequencies than at low frequencies, but once you pick a window length for your short-time Fourier transform you lose temporal resolution much below that window length.

Wavelets are one possible way around this limitation.
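As a rough sketch of the idea (using PyWavelets' continuous wavelet transform with a Morlet wavelet as one concrete choice; the file name and scale range are illustrative assumptions):

```python
# Sketch: a wavelet "scalogram" of an audio clip using PyWavelets.
import numpy as np
import pywt
from scipy.io import wavfile

rate, samples = wavfile.read("marvin.wav")  # hypothetical file name
samples = samples.astype(np.float32)

# Smaller scales correspond to higher frequencies and finer time resolution.
scales = np.geomspace(2, 256, num=64)
coeffs, freqs = pywt.cwt(samples, scales, "morl", sampling_period=1.0 / rate)
scalogram = np.log(np.abs(coeffs) + 1e-10)
print(scalogram.shape)  # (scales, samples)
```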

Read more…

Low Rank Approximation On Sparsely Observed Data

Intermezzo: Sparsely Observed Data

In the post on using PCA for data imputation we used a weight for each of our data points. By assigning a weight of 0 to missing data and a weight of 1 to the rest, we were able to get a reasonably good approximation to what we would find by running PCA on the dataset without any data missing.

This is fine when evaluating a dense model for our data matrix is not too much computational overhead. However, when our input data are sparsely observed, that is to say when most of our data consists of missing values, evaluating the model densely is a tremendous waste of computational resources.
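A minimal sketch of the kind of savings at stake: given low-rank factors U and V (the sizes and data below are made up for illustration), we can evaluate the model only at the observed (row, column) positions instead of forming the full dense product.

```python
# Sketch: evaluate a rank-k model only at the observed entries of a sparse matrix.
import numpy as np

n_rows, n_cols, k = 10000, 5000, 20
U = np.random.randn(n_rows, k)
V = np.random.randn(n_cols, k)

# Observed entries as (row, col, value) triples -- random here for illustration.
n_obs = 100000
rows = np.random.randint(0, n_rows, size=n_obs)
cols = np.random.randint(0, n_cols, size=n_obs)
values = np.random.randn(n_obs)

# Dense evaluation would build a 10000 x 5000 matrix; this touches only
# the 100000 observed positions.
predictions = np.einsum("ij,ij->i", U[rows], V[cols])
residuals = values - predictions
print(residuals.shape)
```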

Read more…

Imputing Missing Values With PCA

Uses For PCA Other Than Dimensionality Reduction Part 2

Imputation and Noise Reduction

Principal Component Analysis (PCA) is frequently applied in machine learning as a sort of black box dimensionality reduction technique. However, with a deeper understanding of what PCA is and what it does, we can use it for all manner of other tasks, e.g. imputation and noise reduction.

Read more…

Learning TensorFlow via a 3D printing Project

This is a slightly cleaned up version of the slides for a presentation I gave at the SLCPy meetup a while ago. I intended to turn it into something with nice prose that could stand on its own without the verbal commentary that went along with it, but that isn't going to happen, so here are the raw slides.

Read more…

Uses for PCA other than dimensionality reduction part 1

Uses For PCA Other Than Dimensionality Reduction Part I

Decorrelation, Factor Discovery, and Noise Modeling

Principal Component Analysis (PCA) is frequently applied in machine learning as a sort of black box dimensionality reduction technique. However, with a deeper understanding of what PCA is and what it does, we can use it for all manner of other tasks, e.g.:

  • Decorrelating Variables
  • Semantic Factor Discovery
  • Empirical Noise Modeling
  • Missing Data Imputation
  • Example Generation
  • Anomaly Detection
  • Patchwise Modeling
  • Noise Reduction

We will demonstrate how to use PCA for these purposes on an example face dataset. This first post covers the list up through empirical noise modeling; the rest will be handled in subsequent parts.

Read more…

Using SVMs as Feature Extractors

Using SVM Separating Plane Distance as a Feature

Support Vector Machines (SVM) are one of my favorite machine learning algorithms. I have decided to write a set of blog posts exploring tricks for dealing with SVMs. Usually SVMs are employed as black box binary classifiers. In this post we are going to explore using the underlying SVM representation to generate features for further calculation (for example, as input to other classifiers).
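A minimal sketch of the basic idea using scikit-learn (the dataset and downstream classifier here are stand-ins): the signed, margin-scaled distance to the separating plane is exposed via decision_function and can be appended to the feature matrix.

```python
# Sketch: use an SVM's signed distance from the separating plane as a feature.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# decision_function gives a signed, margin-scaled distance to the boundary;
# stack it onto the original features and feed the result to another model.
svm_feature = svm.decision_function(X).reshape(-1, 1)
X_augmented = np.hstack([X, svm_feature])

downstream = LogisticRegression(max_iter=1000).fit(X_augmented, y)
print(downstream.score(X_augmented, y))
```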

Read more…

Ultrasound Nerve Detection Kaggle Retrospective

Ultrasound Nerve Detection Kaggle Retrospective

This is a summary of my explorations on the brachial plexus nerve segmentation competition.

https://www.kaggle.com/c/ultrasound-nerve-segmentation

Overview

The challenge is to use ultrasound images to generate a mask that differentiates pixels belonging to the Brachial Plexus (BP) from those that do not. I decided to try using a matched filter approach to locate the BP.
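As a rough sketch of what a matched filter looks like in this setting (the image and template below are random stand-ins, not my actual BP template), we cross-correlate a zero-mean template against the image and look for the strongest response:

```python
# Sketch: locate a template in an image via matched filtering (cross-correlation).
import numpy as np
from scipy.signal import fftconvolve

def matched_filter(image, template):
    # Cross-correlation is convolution with the flipped, zero-mean template.
    kernel = template - template.mean()
    return fftconvolve(image, kernel[::-1, ::-1], mode="same")

image = np.random.rand(420, 580)   # stand-in for an ultrasound frame
template = np.random.rand(40, 40)  # stand-in for a BP template patch

response = matched_filter(image, template)
row, col = np.unravel_index(np.argmax(response), response.shape)
print("strongest response near", row, col)
```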

Read more…

Lattice SVM

Lattice SVM

A support vector machine (SVM) is a classifier that attempts to find a maximum margin linear separator for different classes in a very high dimensional implicit feature space. The feature space is usually not explicitly calculated but is instead accessed via a kernel function which provides the effective dot product in the feature space; this has the advantage that we can deal with very large implicit feature spaces. In fact, the dimensionality of the implicit feature space of most commonly used SVM variants is usually quoted as infinite; the Gaussian kernel is one example. But the high effective feature dimensionality still comes with a high computational cost: we must somehow deal with an N by N matrix of similarities relating all of our training points to each other (the matrix of kernelized "feature dot products").
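A minimal sketch of the object in question (the data here is random, and N = 1000 is just for illustration): for N training points, the Gaussian (RBF) kernel matrix K is N by N, with K[i, j] = exp(-gamma * ||x_i - x_j||^2).

```python
# Sketch: the N x N Gaussian kernel matrix that a kernel SVM must work with.
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    # Squared Euclidean distances between all pairs of rows of X.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.clip(sq_dists, 0.0, None))

X = np.random.randn(1000, 10)  # N = 1000 training points
K = rbf_kernel_matrix(X)
print(K.shape)                 # (1000, 1000): quadratic in training set size
```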

Read more…