OpenAI Whisper: the Open Source ASR based on Transformers

machine learning whisper ASR

As described on the official OpenAI website, Whisper is an Automatic Speech Recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

Using such a large and diverse dataset makes speech recognition more robust to accents, background noise and technical language. It also enables transcription in multiple languages, as well as translation from those languages into English.
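As a minimal sketch of these two tasks, assuming the open-source `openai-whisper` Python package (and its ffmpeg dependency) is installed, the same model can be asked either to transcribe or to translate. The helper name, the `"base"` model size and the audio path are illustrative choices of ours, not taken from the OpenAI material.

```python
def transcribe_and_translate(audio_path, model_name="base"):
    """Return (transcription, English translation) for one audio file.

    Hypothetical helper: it only wraps calls of the openai-whisper package.
    """
    # Imported lazily because the package is a heavy, optional dependency.
    import whisper

    model = whisper.load_model(model_name)
    # The default task is "transcribe": text in the language that was spoken.
    transcription = model.transcribe(audio_path)["text"]
    # task="translate" asks the same model for English output instead.
    translation = model.transcribe(audio_path, task="translate")["text"]
    return transcription, translation
```

Calling `transcribe_and_translate("speech.mp3")`, where the file name is just a placeholder, would return the two strings. Larger model sizes are generally slower but more accurate.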

Continue reading “OpenAI Whisper: the Open Source ASR based on Transformers”

MLOps scalability

machine learning pipeline

Nowadays Machine Learning (ML) techniques are applied across many industries, and both the number of projects and their complexity keep growing. On one hand, this creates the need for greater governance, i.e. the ability to orchestrate and control development and deployment over the entire ML life cycle (preprocessing, model training, testing, deployment). On the other hand, it creates the need for scalability, i.e. being able to efficiently replicate entire parts of the process in order to manage multiple ML models.

A recent US survey, carried out to understand Machine Learning trends for 2021, polled a significant sample of 400 companies: 50% of them currently manage more than 25 ML models, and 40% of the total run more than 50 ML models. Among large organizations (over 25,000 employees), 41% turned out to have more than 100 ML algorithms in production!

Continue reading “MLOps scalability”

Recurrent Neural Networks for Sentiment Analysis

ai and sentiment analysis

In one of the previous posts we described how to use convolutional neural networks to recognize simple spoken numbers from zero to nine.

In practice, speech recognition performs better with a particular kind of neural network called a Recurrent Neural Network, or simply RNN.

Unlike “simple” feed-forward neural networks, RNNs take as input both the data currently provided and part of their own previous output, which is fed back into the network. This allows them to work “with memory”.
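The feedback idea can be sketched in a few lines of NumPy. This is a minimal Elman-style recurrence under our own assumptions: the function name, the tanh activation and the hand-picked weights are illustrative and are not the model used in the full post.

```python
import numpy as np


def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a minimal recurrence over a sequence.

    inputs has shape (time_steps, input_size); the returned hidden
    states have shape (time_steps, hidden_size).
    """
    hidden = np.zeros(w_rec.shape[0])  # the "memory", empty at the start
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden + bias)
        states.append(hidden)
    return np.stack(states)


# Hand-picked illustrative weights: 1 input feature, 2 hidden units.
w_in = np.array([[0.5], [-0.3]])
w_rec = np.array([[0.1, 0.2], [0.0, 0.4]])
bias = np.zeros(2)

# The same input value is presented at three consecutive time steps.
inputs = np.ones((3, 1))
states = rnn_forward(inputs, w_in, w_rec, bias)
print(states)
```

Although the input is identical at every step, the rows of `states` differ from one another, because each step also sees the state left behind by the previous one. A feed-forward layer would return the same output three times.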

Continue reading “Recurrent Neural Networks for Sentiment Analysis”

AI and Videogames

gaming machine learning ai

Artificial intelligence applied to the videogame world may seem of secondary importance; however, I decided to talk about it in this post because the video game market is also of great interest from a business point of view.

We are talking about a steadily growing market, with an expected CAGR (Compound Annual Growth Rate) of around 8.3%.

gaming market
Gaming market forecast – source Newzoo
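To make the growth rate concrete, here is a small sketch of the standard compound growth arithmetic. The starting index of 100 and the three-year horizon are hypothetical values chosen for illustration; they are not Newzoo figures.

```python
def project(start_value, rate, years):
    """Value reached after compounding `rate` once a year for `years` years."""
    return start_value * (1 + rate) ** years


def cagr(start_value, end_value, years):
    """Compound Annual Growth Rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical market index of 100 growing at 8.3% per year for 3 years.
projected = project(100.0, 0.083, 3)
print(round(projected, 2))

# Going the other way recovers the annual rate from the two end points.
print(round(cagr(100.0, projected, 3), 3))
```

At this rate a market grows by roughly 27% in three years, which is the kind of compounding effect the CAGR summarizes in a single number.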

It is also a pervasive market involving many kinds of devices, from desktop PCs to smartphones to dedicated gaming consoles.

Continue reading “AI and Videogames”

Neural networks and speech recognition

convolutional neural network

Deep-learning ASR convolutional-neural-networks

In this post we are going to see an example of a CNN (convolutional neural network) applied to speech recognition.
The goal of our CNN-based deep learning model will be to classify some simple words, starting with the numbers from zero to nine.

To extract the distinctive features of speech, we will first adopt a voice coding procedure widely used in the ASR (Automatic Speech Recognition) field, known as Mel Frequency Cepstral Coefficients, or more simply MFCC.

Thanks to the MFCC technique we will be able to encode every spoken word into a sequence of vectors, each 13 values long, representing the MFCC coefficients.

In our case, since each word is a single-digit number, we will encode each number as a 48 x 13 matrix.
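Spoken words do not all last the same time, so their MFCC sequences have different numbers of frames. One simple way to obtain a fixed 48 x 13 matrix is to zero-pad short sequences and truncate long ones. This is an assumption on our side for illustration: the excerpt does not say how the 48 frames are obtained, and the random arrays below only stand in for real MFCC output.

```python
import numpy as np

N_FRAMES, N_COEFFS = 48, 13


def to_fixed_size(mfcc, n_frames=N_FRAMES):
    """Zero-pad or truncate a (frames, coeffs) MFCC sequence to n_frames rows."""
    fixed = np.zeros((n_frames, mfcc.shape[1]), dtype=mfcc.dtype)
    rows = min(n_frames, mfcc.shape[0])
    fixed[:rows] = mfcc[:rows]
    return fixed


# Random stand-ins for the MFCC sequences of a short and a long word.
rng = np.random.default_rng(0)
short_word = rng.normal(size=(35, N_COEFFS))
long_word = rng.normal(size=(60, N_COEFFS))

print(to_fixed_size(short_word).shape, to_fixed_size(long_word).shape)
```

With every word mapped to the same shape, the matrices can be stacked into a single array and fed to a CNN much like small grayscale images.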

Mel Frequency Cepstral Coefficient

The previous image shows the chain of the main modules involved in the MFCC encoding process: first, the voice signal is segmented in the time domain into several short frames (generally 25-40 ms each).
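The framing step alone can be sketched with NumPy as follows. The 16 kHz sample rate, the 10 ms hop between frames (which makes consecutive frames overlap) and the synthetic test tone are assumptions chosen for illustration; only the 25 ms frame length comes from the range quoted above.

```python
import numpy as np


def frame_signal(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Split a 1-D signal into overlapping frames of frame_ms milliseconds.

    Assumes the signal is at least one frame long; trailing samples that
    do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack(
        [signal[i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]
    )


sample_rate = 16000
# One second of a 440 Hz tone as a stand-in for a recorded word.
signal = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)

frames = frame_signal(signal, sample_rate)
print(frames.shape)  # (number of frames, samples per frame)
```

Each row of `frames` is then processed independently by the remaining modules of the chain to produce one vector of coefficients per frame.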

Continue reading “Neural networks and speech recognition”

Random forest

Decision Tree

In this post we are going to tackle a classification problem using some CART models (Classification And Regression Trees).

We will use the Bank Marketing Data Set, provided by the UCI Machine Learning Repository:
ref. [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

These are the results of direct marketing campaigns carried out by a Portuguese bank through outbound contact center calls, with the aim of selling term deposits to customers.
The label we want to predict is binary (column y): “yes” if the customer accepted the bank deposit offer, “no” if the offer was rejected.
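Before training any classifier, a binary column like this is usually mapped to 0 and 1, and it is worth checking how balanced the two classes are. The sketch below uses a handful of made-up values for column y, not rows from the actual dataset.

```python
from collections import Counter

# Hypothetical values of column y, for illustration only.
y_raw = ["no", "no", "yes", "no", "yes", "no"]

# Map the text labels to the integers expected by most classifiers.
label_map = {"no": 0, "yes": 1}
y = [label_map[value] for value in y_raw]

# Class balance: share of customers who accepted the offer.
counts = Counter(y_raw)
positive_rate = sum(y) / len(y)

print(y)
print(counts)
print(positive_rate)
```

A low share of “yes” labels would mean that plain accuracy is a misleading metric, since a model that always answers “no” would already score well.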

Let’s import some useful libraries with scikit-learn:

Continue reading “Random forest”

Multiple Linear Regression

MLR charts

In the previous post we analyzed an example of simple linear regression: a machine learning technique that predicts an output variable from a single independent variable, through a linear function like Y = c1 + c2X.
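As a quick reminder, the two coefficients of that line can be estimated with the ordinary least-squares formulas. The sketch below uses a small noiseless toy dataset of our own rather than the athletes data, and the function name is hypothetical.

```python
import numpy as np


def fit_simple_linear(x, y):
    """Least-squares estimates of c1 (intercept) and c2 (slope) in Y = c1 + c2*X."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    c2 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    c1 = y_mean - c2 * x_mean
    return c1, c2


# Toy data generated from Y = 2 + 0.5 X, without noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * x

c1, c2 = fit_simple_linear(x, y)
print(c1, c2)
```

On noiseless data the fit recovers the generating coefficients up to floating-point error; with real measurements it returns the line that minimizes the sum of squared residuals.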

Today we are going to see its natural extension, that is, how to predict Y as a linear function of multiple independent variables (X1, X2, X3, etc.). This type of model is called multiple linear regression (MLR).
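With several predictors, the model becomes Y = c0 + c1X1 + c2X2 + c3X3 and the coefficients can be estimated by least squares in one call. This is a minimal sketch on synthetic data, using NumPy's `lstsq` solver rather than whichever library the full post relies on; the function name and the coefficient values are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares coefficients [c0, c1, ..., cn] of a multiple linear regression."""
    X = np.asarray(X, dtype=float)
    # Prepend a column of ones so that the intercept c0 is estimated too.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


# Synthetic, noiseless data with three independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_coeffs = np.array([1.0, 2.0, -1.0, 0.5])
y = true_coeffs[0] + X @ true_coeffs[1:]

coeffs = fit_mlr(X, y)
print(np.round(coeffs, 3))
```

The first entry of `coeffs` is the intercept and the others are the weights of X1, X2 and X3, in column order.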

We can reuse the dataset of blood tests carried out on Australian professional athletes from various sports: see Telford, R.D. and Cunningham, R.B. 1991. Sex, sport and body-size dependency of hematology in highly trained athletes. Medicine and Science in Sports and Exercise 23: 788-794.

The dataset contains 13 features related to 202 observations.

AIS dataset

Here is the feature description:

Continue reading “Multiple Linear Regression”

Linear Regression

Linear Regression AI

In this first post we will play a bit with Linear Regression in order to get familiar with some key machine learning concepts.

Deep learning models such as Convolutional Neural Networks and Recurrent Neural Networks, as well as Support Vector Machines and Logistic Regression, are great techniques for complex predictions, including non-linear ones.

However, Linear Regression is a great way to start when you need to make predictions on data that are, broadly speaking, linearly correlated.
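A quick way to check whether two variables are linearly correlated is the Pearson correlation coefficient. The sketch below compares a noisy linear relationship with a purely curved one; both series are synthetic and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)

# A linear relationship with some noise, and a symmetric parabola.
linear_y = 3.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)
curved_y = (x - 5.0) ** 2

# np.corrcoef returns the correlation matrix; entry [0, 1] is r(x, y).
r_linear = np.corrcoef(x, linear_y)[0, 1]
r_curved = np.corrcoef(x, curved_y)[0, 1]

print(round(r_linear, 3), round(r_curved, 3))
```

A coefficient close to 1 or -1 suggests that a straight line is a reasonable model. A value near 0, as for the parabola, means a linear fit would miss the relationship even though the two variables are clearly related.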

Sport Blood Test dataset

Let’s consider the Australian athletes data set: a nice dataset collected in a study of how various characteristics of the blood vary with the sport, body size and sex of the athlete. These data were the basis for the analyses reported in Telford and Cunningham (1991).

Anybody interested in knowing more about that study can refer to Telford, R.D. and Cunningham, R.B. 1991. Sex, sport and body-size dependency of hematology in highly trained athletes. Medicine and Science in Sports and Exercise 23: 788-794: https://europepmc.org/article/med/1921671

We are going to use Python with Jupyter Notebook to build this model.

Let’s import some useful libraries to start:

Continue reading “Linear Regression”