<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.8.5">Jekyll</generator><link href="https://qianhwan.github.io/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://qianhwan.github.io/blog/" rel="alternate" type="text/html" /><updated>2019-08-24T00:29:42+00:00</updated><id>https://qianhwan.github.io/blog/feed.xml</id><title type="html">Study Notes | 学习笔记</title><subtitle>Meow
</subtitle><entry><title type="html">Understanding kaldi recipes with mini-librispeech example</title><link href="https://qianhwan.github.io/blog/2019/08/23/understanding-kaldi-recipes-01" rel="alternate" type="text/html" title="Understanding kaldi recipes with mini-librispeech example" /><published>2019-08-23T11:15:00+00:00</published><updated>2019-08-23T11:15:00+00:00</updated><id>https://qianhwan.github.io/blog/2019/08/23/understanding-kaldi-recipes-01</id><content type="html" xml:base="https://qianhwan.github.io/blog/2019/08/23/understanding-kaldi-recipes-01">&lt;p&gt;This note provides a high-level understanding of how kaldi recipe scripts work, with the hope that people with little experience in shell scripts (like me) can save some time learning kaldi.&lt;/p&gt;

&lt;p&gt;Mini-librispeech is a small subset of the LibriSpeech corpus, which consists of read audiobook speech. We will go through each step in &lt;em&gt;kaldi/egs/mini_librispeech/s5/run.sh&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;parameters-and-environment-setup&quot;&gt;Parameters and environment setup&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Change this location to somewhere where you want to put the data.&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;./corpus/

&lt;span class=&quot;nv&quot;&gt;data_url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;www.openslr.org/resources/31
&lt;span class=&quot;nv&quot;&gt;lm_url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;www.openslr.org/resources/11

&lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt; ./cmd.sh
&lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt; ./path.sh

&lt;span class=&quot;nv&quot;&gt;stage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;0
&lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt; utils/parse_options.sh

&lt;span class=&quot;nb&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-euo&lt;/span&gt; pipefail

mkdir &lt;span class=&quot;nt&quot;&gt;-p&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;data=./corpus/&lt;/code&gt; specifies where you want to store audio and language model data.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;data_url=www.openslr.org/resources/31&lt;/code&gt; specifies the url for downloading audio data.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;lm_url=www.openslr.org/resources/11&lt;/code&gt; specifies the url for downloading vocabulary, lexicon and pre-trained language model (trained on LibriSpeech).&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;. ./cmd.sh&lt;/code&gt; runs the script &lt;code class=&quot;highlighter-rouge&quot;&gt;cmd.sh&lt;/code&gt;; you need to change &lt;code class=&quot;highlighter-rouge&quot;&gt;queue.pl&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;run.pl&lt;/code&gt; in it if &lt;code class=&quot;highlighter-rouge&quot;&gt;GridEngine&lt;/code&gt; is not installed.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;. ./path.sh&lt;/code&gt; runs the script &lt;code class=&quot;highlighter-rouge&quot;&gt;path.sh&lt;/code&gt;, which adds all kaldi executable dependencies to your environment path. This is required every time you start a new terminal; it can be avoided by adding the paths to your &lt;code class=&quot;highlighter-rouge&quot;&gt;.bashrc&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;stage=0&lt;/code&gt; sets which stage the script starts from; you can set it to the first stage that has not completed yet to avoid re-running stages that already finished.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;. utils/parse_options.sh&lt;/code&gt; enables argument parsing to kaldi scripts (e.g. &lt;code class=&quot;highlighter-rouge&quot;&gt;./run.sh --stage 2&lt;/code&gt; sets variable &lt;code class=&quot;highlighter-rouge&quot;&gt;stage&lt;/code&gt; to 2).&lt;/p&gt;
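&lt;p&gt;A minimal sketch of how this style of option parsing works (this is &lt;em&gt;not&lt;/em&gt; kaldi’s actual &lt;em&gt;parse_options.sh&lt;/em&gt;, just an illustration of the idea):&lt;/p&gt;

```shell
# Illustration only: how "--stage 2" can override a default shell variable.
# kaldi's utils/parse_options.sh is more careful (validation, --help, etc.).
set -- --stage 2          # simulate running the script as: ./run.sh --stage 2

stage=0                   # default, defined before parsing
while [ $# -gt 0 ]; do
  case "$1" in
    --*)
      name=$(echo "$1" | sed 's/^--//; s/-/_/g')  # --stage -> stage
      eval "$name=\$2"                            # assign the next argument
      shift 2 ;;
    *) break ;;
  esac
done
echo "stage=$stage"       # prints: stage=2
```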

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;set -euo pipefail&lt;/code&gt; makes the script exit immediately when a command fails (&lt;code class=&quot;highlighter-rouge&quot;&gt;-e&lt;/code&gt;), treats the use of unset variables as an error (&lt;code class=&quot;highlighter-rouge&quot;&gt;-u&lt;/code&gt;), and makes a pipeline fail if any command in it fails (&lt;code class=&quot;highlighter-rouge&quot;&gt;-o pipefail&lt;/code&gt;).&lt;/p&gt;
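&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;pipefail&lt;/code&gt; option is the least obvious part; a quick demonstration (bash):&lt;/p&gt;

```shell
# Without pipefail, a pipeline's exit status is that of its LAST command,
# so a failure earlier in the pipe is silently swallowed.
s1=$(bash -c 'false | true; echo $?')                   # 0: failure hidden
# With pipefail, the pipeline reports the failure of ANY of its commands.
s2=$(bash -c 'set -o pipefail; false | true; echo $?')  # 1: failure surfaces
echo "without pipefail: $s1, with pipefail: $s2"
```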

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;mkdir -p $data&lt;/code&gt; creates the data folder (&lt;code class=&quot;highlighter-rouge&quot;&gt;./corpus/&lt;/code&gt; in this case) if it doesn’t exist already.&lt;/p&gt;

&lt;h3 id=&quot;stages&quot;&gt;Stages&lt;/h3&gt;
&lt;p&gt;Each kaldi recipe consists of multiple &lt;strong&gt;stages&lt;/strong&gt;, which can be spotted with the following syntax:&lt;/p&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$stage&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-le&lt;/span&gt; x &lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;then&lt;/span&gt;
  ...
&lt;span class=&quot;k&quot;&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;which simply means: run the commands in this block if &lt;code class=&quot;highlighter-rouge&quot;&gt;stage&lt;/code&gt; is less than or equal to the number x. I personally like to change &lt;code class=&quot;highlighter-rouge&quot;&gt;-le&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;-eq&lt;/code&gt; (which tests equality) so that I can run the recipe step by step.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;stage&lt;/code&gt; is set to 0 by default, which means the recipe will run all blocks. If you encounter an error, you can check which stages passed successfully and re-run the recipe from the failing stage with &lt;code class=&quot;highlighter-rouge&quot;&gt;./run.sh --stage x&lt;/code&gt;.&lt;/p&gt;
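&lt;p&gt;The gating behaviour is easy to verify in isolation:&lt;/p&gt;

```shell
# With the default -le test, setting stage=2 skips blocks 0 and 1
# and runs every block from 2 onwards.
stage=2
ran=""
for x in 0 1 2 3; do
  if [ $stage -le $x ]; then
    ran="$ran $x"             # stands in for the real commands of block x
  fi
done
echo "blocks that ran:$ran"   # prints: blocks that ran: 2 3
```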

&lt;h3 id=&quot;stage-0-data-fetching&quot;&gt;Stage 0: data fetching&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;part &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;dev-clean-2 train-clean-5&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do
  &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;/download_and_untar.sh &lt;span class=&quot;nv&quot;&gt;$data&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$data_url&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$part&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Download &lt;code class=&quot;highlighter-rouge&quot;&gt;dev-clean-2&lt;/code&gt; (dev set) and &lt;code class=&quot;highlighter-rouge&quot;&gt;train-clean-5&lt;/code&gt; (train set) from the url specified before to &lt;code class=&quot;highlighter-rouge&quot;&gt;./corpus/&lt;/code&gt; and extract them. You can check the files in the &lt;code class=&quot;highlighter-rouge&quot;&gt;./corpus/&lt;/code&gt; folder after running.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;/download_lm.sh &lt;span class=&quot;nv&quot;&gt;$lm_url&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$data&lt;/span&gt; data/local/lm
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This line downloads the pre-trained language model to &lt;code class=&quot;highlighter-rouge&quot;&gt;./corpus/&lt;/code&gt; then makes a soft link to &lt;code class=&quot;highlighter-rouge&quot;&gt;data/local/lm&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The files that are downloaded are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;3-gram.arpa.gz&lt;/em&gt;, trigram arpa LM.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;3-gram.pruned.1e-7.arpa.gz&lt;/em&gt;, pruned (with threshold 1e-7) trigram arpa LM.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;3-gram.pruned.3e-7.arpa.gz&lt;/em&gt;, pruned (with threshold 3e-7) trigram arpa LM.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;librispeech-vocab.txt&lt;/em&gt;, 200K word vocabulary for the LM.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;librispeech-lexicon.txt&lt;/em&gt;, pronunciations, some of which G2P auto-generated, for all words in the vocabulary.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;stage-1-data-preparing-and-lm-training&quot;&gt;Stage 1: data preparing and LM training&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;part &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;dev-clean-2 train-clean-5&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;# use underscore-separated names in data directories.&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;/data_prep.sh &lt;span class=&quot;nv&quot;&gt;$data&lt;/span&gt;/LibriSpeech/&lt;span class=&quot;nv&quot;&gt;$part&lt;/span&gt; data/&lt;span class=&quot;k&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$part&lt;/span&gt; | sed s/-/_/g&lt;span class=&quot;k&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Create all files that are needed for kaldi training (see &lt;a href=&quot;http://kaldi-asr.org/doc/data_prep.html#data_prep_data&quot;&gt;here&lt;/a&gt; for more details on data preparation). Each kaldi recipe normally comes with its own data preparation script; they all create the same set of files for their respective datasets. If you want to train a model on your own dataset, you will need to write your own data preparation script that produces the right &lt;em&gt;kaldi-style&lt;/em&gt; data files.&lt;/p&gt;
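&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;sed&lt;/code&gt; call in the loop only converts the corpus names into underscore-separated data directory names:&lt;/p&gt;

```shell
# The corpus parts use dashes, the kaldi data directories use underscores.
for part in dev-clean-2 train-clean-5; do
  dir=$(echo $part | sed s/-/_/g)
  echo "$part -> data/$dir"
done
# prints:
#   dev-clean-2 -> data/dev_clean_2
#   train-clean-5 -> data/train_clean_5
```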

&lt;p&gt;If you check &lt;code class=&quot;highlighter-rouge&quot;&gt;data/train_clean_5&lt;/code&gt; after finishing the above commands, you will see the following text files:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;wav.scp&lt;/em&gt;, maps recording ids to the paths of their wav files (sometimes with audio-processing commands prepended).&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;utt2spk&lt;/em&gt;, maps utterances to their speakers; when speaker information is unknown, each utterance is treated as its own speaker.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;spk2utt&lt;/em&gt;, maps speakers to the utterances spoken by them.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;text&lt;/em&gt;, maps utterances to their transcriptions.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;spk2gender&lt;/em&gt;, maps speakers to their genders.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;utt2dur&lt;/em&gt;, maps utterances to their durations.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;utt2num_frames&lt;/em&gt;, maps utterances to their number of frames.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each data set (train, dev, test) has its own set of these files. Among them, &lt;em&gt;wav.scp&lt;/em&gt;, &lt;em&gt;utt2spk&lt;/em&gt;, &lt;em&gt;spk2utt&lt;/em&gt; and &lt;em&gt;text&lt;/em&gt; are essential for building any kaldi model.&lt;/p&gt;
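&lt;p&gt;A toy two-utterance example of the essential files (all ids and paths here are made up). &lt;em&gt;spk2utt&lt;/em&gt; is just &lt;em&gt;utt2spk&lt;/em&gt; inverted; kaldi ships &lt;em&gt;utils/utt2spk_to_spk2utt.pl&lt;/em&gt; for this, but a one-liner shows the idea:&lt;/p&gt;

```shell
# Hypothetical example data; the real files live in data/train_clean_5 etc.
cd "$(mktemp -d)"
cat > wav.scp <<'EOF'
spk1-utt1 /path/to/spk1-utt1.wav
spk2-utt1 /path/to/spk2-utt1.wav
EOF
cat > utt2spk <<'EOF'
spk1-utt1 spk1
spk2-utt1 spk2
EOF
cat > text <<'EOF'
spk1-utt1 HELLO WORLD
spk2-utt1 GOOD MORNING
EOF
# Invert utt2spk -> spk2utt (kaldi uses utils/utt2spk_to_spk2utt.pl instead).
awk '{spk[$2] = spk[$2] " " $1} END {for (s in spk) print s spk[s]}' \
  utt2spk | sort > spk2utt
cat spk2utt
# prints:
#   spk1 spk1-utt1
#   spk2 spk2-utt1
```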

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;/prepare_dict.sh &lt;span class=&quot;nt&quot;&gt;--stage&lt;/span&gt; 3 &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 30 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/local/lm data/local/lm data/local/dict_nosp
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;‘nosp’ refers to the dictionary before silence probabilities and pronunciation probabilities are added&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Generate silence phones, non-silence phones and optional silence phones. Generated files are as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;extra_questions.txt&lt;/em&gt;, list of extra questions which will be included in addition to the automatically generated questions for &lt;a href=&quot;https://kaldi-asr.org/doc/tree_externals.html&quot;&gt;decision trees&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;lexicon.txt&lt;/em&gt;, sorted lexicon with some additional silence phones.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;lexiconp.txt&lt;/em&gt;, lexicon with pronunciation probabilities.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;lexicon_raw_nosil.txt&lt;/em&gt;, the raw lexicon without the added silence entries.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;nonsilence_phones.txt&lt;/em&gt;, list of non-silence phones.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;optional_silence.txt&lt;/em&gt;, list of optional silence phones.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;silence_phones.txt&lt;/em&gt;, list of silence phones.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A more detailed explanation can be found &lt;a href=&quot;https://kaldi-asr.org/doc/data_prep.html#data_prep_lang_creating&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;utils/prepare_lang.sh data/local/dict_nosp &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;s2&quot;&gt;&quot;&amp;lt;UNK&amp;gt;&quot;&lt;/span&gt; data/local/lang_tmp_nosp data/lang_nosp
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This prepares the &lt;em&gt;lang&lt;/em&gt; directory with the following files:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;L.fst&lt;/em&gt;, the lexicon in FST form.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;L_disambig.fst&lt;/em&gt;, L.fst including the &lt;a href=&quot;https://kaldi-asr.org/doc/graph.html#graph_disambig&quot;&gt;disambiguation symbols&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;oov.int&lt;/em&gt;, the integer id of the word that out-of-vocabulary words are mapped to.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;oov.txt&lt;/em&gt;, the word that out-of-vocabulary words are mapped to (&amp;lt;UNK&amp;gt; here).&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;phones.txt&lt;/em&gt;, maps phones to integers.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;topo&lt;/em&gt;, the topology of the HMMs we use.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;words.txt&lt;/em&gt;, maps words to integers.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;phones/&lt;/em&gt;, specifies various properties of the phone set.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;/format_lms.sh &lt;span class=&quot;nt&quot;&gt;--src-dir&lt;/span&gt; data/lang_nosp data/local/lm
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Use &lt;em&gt;data/lang_nosp/words.txt&lt;/em&gt; to convert the two pruned arpa LMs to &lt;em&gt;G.fst&lt;/em&gt; in &lt;em&gt;data/lang_nosp_test_tgmed&lt;/em&gt; and &lt;em&gt;data/lang_nosp_test_tgsmall&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;utils/build_const_arpa_lm.sh data/local/lm/lm_tglarge.arpa.gz &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/lang_nosp data/lang_nosp_test_tglarge
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Create a ConstArpaLm-format language model (&lt;em&gt;G.carpa&lt;/em&gt;) from the full trigram arpa LM.&lt;/p&gt;

&lt;h3 id=&quot;stage-2-mfcc-extraction&quot;&gt;Stage 2: MFCC extraction&lt;/h3&gt;
&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;mfccdir=mfcc&lt;/code&gt; specifies where to store the extracted MFCCs.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;part &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;dev_clean_2 train_clean_5&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do
  &lt;/span&gt;steps/make_mfcc.sh &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 10 data/&lt;span class=&quot;nv&quot;&gt;$part&lt;/span&gt; exp/make_mfcc/&lt;span class=&quot;nv&quot;&gt;$part&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$mfccdir&lt;/span&gt;
  steps/compute_cmvn_stats.sh data/&lt;span class=&quot;nv&quot;&gt;$part&lt;/span&gt; exp/make_mfcc/&lt;span class=&quot;nv&quot;&gt;$part&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$mfccdir&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Extract MFCCs and compute CMVN stats for &lt;em&gt;data/dev_clean_2&lt;/em&gt; and &lt;em&gt;data/train_clean_5&lt;/em&gt;, writing to &lt;em&gt;mfcc&lt;/em&gt;, using 10 parallel jobs. Logs can be found in &lt;em&gt;exp/make_mfcc&lt;/em&gt;; they are the first place to check if something goes wrong.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Get the shortest 500 utterances first because those are more likely&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# to have accurate alignments.&lt;/span&gt;
utils/subset_data_dir.sh &lt;span class=&quot;nt&quot;&gt;--shortest&lt;/span&gt; data/train_clean_5 500 data/train_500short
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Create a data subset of the shortest 500 utterances. No MFCCs are copied here; if you look into &lt;em&gt;data/train_500short&lt;/em&gt; you can find a &lt;em&gt;feats.scp&lt;/em&gt; that maps each utterance to where its MFCCs are stored.&lt;/p&gt;

&lt;h3 id=&quot;stage-3-monophone-training&quot;&gt;Stage 3: monophone training&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/train_mono.sh &lt;span class=&quot;nt&quot;&gt;--boost-silence&lt;/span&gt; 1.25 &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 5 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/train_500short data/lang_nosp exp/mono
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Train a monophone system on the shortest 500 utterances with the &lt;em&gt;lang&lt;/em&gt; directory prepared earlier; the trained model and logs can be found in &lt;em&gt;exp/mono&lt;/em&gt;.
&lt;code class=&quot;highlighter-rouge&quot;&gt;--boost-silence 1.25&lt;/code&gt; sets the factor by which to boost silence likelihoods in alignment to 1.25.
&lt;code class=&quot;highlighter-rouge&quot;&gt;--nj 5&lt;/code&gt; sets the number of parallel jobs to 5.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
  utils/mkgraph.sh data/lang_nosp_test_tgsmall &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
    exp/mono exp/mono/graph_nosp_tgsmall
  &lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;test &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;dev_clean_2&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do
    &lt;/span&gt;steps/decode.sh &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 10 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$decode_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; exp/mono/graph_nosp_tgsmall &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
      data/&lt;span class=&quot;nv&quot;&gt;$test&lt;/span&gt; exp/mono/decode_nosp_tgsmall_&lt;span class=&quot;nv&quot;&gt;$test&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Create the final graph (&lt;em&gt;HCLG.fst&lt;/em&gt;) and decode &lt;em&gt;data/dev_clean_2&lt;/em&gt; with it; the trailing &lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;amp;&lt;/code&gt; runs the whole block in a background subshell. You can find WERs in &lt;em&gt;exp/mono/decode_nosp_tgsmall_dev_clean_2&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In the mini_librispeech recipe each training stage (monophone, triphone, dnn, etc.) comes with a decoding step. Since decoding takes some time, you can comment these steps out for models you are not interested in, but it is good practice to keep them and watch the WER improve as the models get more sophisticated.&lt;/p&gt;

&lt;p&gt;As you can see in &lt;em&gt;exp/mono/decode_nosp_tgsmall_dev_clean_2&lt;/em&gt;, there is more than one WER file (e.g. &lt;em&gt;wer_10_0.5&lt;/em&gt;). This is because &lt;em&gt;steps/decode.sh&lt;/em&gt; calls &lt;em&gt;local/score.sh&lt;/em&gt;, which sweeps over some scoring parameters to find the best WER.&lt;/p&gt;

&lt;p&gt;In the example of &lt;em&gt;wer_10_0.5&lt;/em&gt;, 10 is the LM weight for lattice rescoring and 0.5 is the word insertion penalty.&lt;/p&gt;
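&lt;p&gt;The two scoring parameters can be read straight off the filename, which follows a &lt;em&gt;wer_&amp;lt;lm-weight&amp;gt;_&amp;lt;insertion-penalty&amp;gt;&lt;/em&gt; convention:&lt;/p&gt;

```shell
# Hypothetical filename following the wer_<lm-weight>_<insertion-penalty>
# naming convention used by the scoring script.
f="wer_10_0.5"
IFS=_ read -r _ lmwt penalty <<EOF
$f
EOF
echo "LM weight: $lmwt, insertion penalty: $penalty"
# prints: LM weight: 10, insertion penalty: 0.5
```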

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/align_si.sh &lt;span class=&quot;nt&quot;&gt;--boost-silence&lt;/span&gt; 1.25 &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 5 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/train_clean_5 data/lang_nosp exp/mono exp/mono_ali_train_clean_5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Compute the training alignments using the monophone model.&lt;/p&gt;

&lt;h3 id=&quot;stage-4-delta--delta-delta-triphone-training&quot;&gt;Stage 4: delta + delta-delta triphone training&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/train_deltas.sh &lt;span class=&quot;nt&quot;&gt;--boost-silence&lt;/span&gt; 1.25 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  2000 10000 data/train_clean_5 data/lang_nosp exp/mono_ali_train_clean_5 exp/tri1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Train a triphone model with MFCC + delta + delta-delta features, using the training alignments generated in &lt;strong&gt;Stage 3&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I skipped the decoding commands here.&lt;/em&gt;&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/align_si.sh &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 5 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/train_clean_5 data/lang_nosp exp/tri1 exp/tri1_ali_train_clean_5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Compute the training alignments using the triphone model.&lt;/p&gt;

&lt;h3 id=&quot;stage-5-lda--mllt-triphone-training&quot;&gt;Stage 5: LDA + MLLT triphone training&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/train_lda_mllt.sh &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--splice-opts&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;--left-context=3 --right-context=3&quot;&lt;/span&gt; 2500 15000 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/train_clean_5 data/lang_nosp exp/tri1_ali_train_clean_5 exp/tri2b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Train a triphone model with LDA and MLLT feature transforms, using the training alignments generated in &lt;strong&gt;Stage 4&lt;/strong&gt;.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/align_si.sh  &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 5 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--use-graphs&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/train_clean_5 data/lang_nosp exp/tri2b exp/tri2b_ali_train_clean_5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Again, compute the training alignments using the newly trained triphone model.&lt;/p&gt;

&lt;h3 id=&quot;stage-6-lda--mllt--sat-triphone-training&quot;&gt;Stage 6: LDA + MLLT + SAT triphone training&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/train_sat.sh &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; 2500 15000 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/train_clean_5 data/lang_nosp exp/tri2b_ali_train_clean_5 exp/tri3b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Train a triphone model with speaker adapted training (SAT), using the training alignments generated in &lt;strong&gt;Stage 5&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 id=&quot;stage-7-re-create-language-model-and-compute-the-alignments-from-sat-model&quot;&gt;Stage 7: re-create language model and compute the alignments from SAT model&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/get_prons.sh &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/train_clean_5 data/lang_nosp exp/tri3b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Several things happen in this command:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Linear lattices (single path) are generated for each utterance in &lt;em&gt;train_clean_5&lt;/em&gt; using the latest alignment and LM.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;A set of &lt;em&gt;pron.x.gz&lt;/em&gt; files is created with the format&lt;/p&gt;

    &lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;lt;utterance-id&amp;gt; &amp;lt;begin-frame&amp;gt; &amp;lt;num-frames&amp;gt; &amp;lt;word&amp;gt; &amp;lt;phone1&amp;gt; &amp;lt;phone2&amp;gt; ... &amp;lt;phoneN&amp;gt;&lt;/code&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;Get &lt;em&gt;pron_counts_nowb.txt&lt;/em&gt;, which contains the counts of pronunciations (obtained by aligning the training data, not from the original text).&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;utils/dict_dir_add_pronprobs.sh &lt;span class=&quot;nt&quot;&gt;--max-normalize&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/local/dict_nosp &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  exp/tri3b/pron_counts_nowb.txt exp/tri3b/sil_counts_nowb.txt &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  exp/tri3b/pron_bigram_counts_nowb.txt data/local/dict
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Take the pronunciation counts and create a modified dictionary directory with pronunciation probabilities.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;utils/prepare_lang.sh data/local/dict &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;s2&quot;&gt;&quot;&amp;lt;UNK&amp;gt;&quot;&lt;/span&gt; data/local/lang_tmp data/lang

&lt;span class=&quot;nb&quot;&gt;local&lt;/span&gt;/format_lms.sh &lt;span class=&quot;nt&quot;&gt;--src-dir&lt;/span&gt; data/lang data/local/lm

utils/build_const_arpa_lm.sh &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/local/lm/lm_tglarge.arpa.gz data/lang data/lang_test_tglarge
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Re-create the &lt;em&gt;lang&lt;/em&gt; directory and the &lt;em&gt;G.fst&lt;/em&gt; LMs with the new dictionary, then build a new ConstArpa LM from the full trigram arpa LM.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/align_fmllr.sh &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 5 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$train_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/train_clean_5 data/lang exp/tri3b exp/tri3b_ali_train_clean_5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Compute the training alignments using the SAT model and new &lt;em&gt;L.fst&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;stage-8-generating-graphs-and-decoding&quot;&gt;Stage 8: generating graphs and decoding&lt;/h3&gt;
&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;utils/mkgraph.sh data/lang_test_tgsmall &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  exp/tri3b exp/tri3b/graph_tgsmall
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Create the final graph (HCLG.fst model) with the small trigram LM.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/decode_fmllr.sh &lt;span class=&quot;nt&quot;&gt;--nj&lt;/span&gt; 10 &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$decode_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  exp/tri3b/graph_tgsmall data/&lt;span class=&quot;nv&quot;&gt;$test&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  exp/tri3b/decode_tgsmall_&lt;span class=&quot;nv&quot;&gt;$test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Decode the test set (&lt;code class=&quot;highlighter-rouge&quot;&gt;$test&lt;/code&gt; is &lt;em&gt;dev_clean_2&lt;/em&gt; here) using the SAT model and the small trigram LM; WERs can be found in &lt;em&gt;exp/tri3b/decode_tgsmall_dev_clean_2&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/lmrescore.sh &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$decode_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; data/lang_test_&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;tgsmall,tgmed&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/&lt;span class=&quot;nv&quot;&gt;$test&lt;/span&gt; exp/tri3b/decode_&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;tgsmall,tgmed&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;_&lt;span class=&quot;nv&quot;&gt;$test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Re-score the decoded lattices (&lt;em&gt;exp/tri3b/decode_tgsmall_dev_clean_2&lt;/em&gt;) with the medium trigram LM; the lattices and WERs after re-scoring can be found in &lt;em&gt;exp/tri3b/decode_tgmed_dev_clean_2&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;steps/lmrescore_const_arpa.sh &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--cmd&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$decode_cmd&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; data/lang_test_&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;tgsmall,tglarge&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  data/&lt;span class=&quot;nv&quot;&gt;$test&lt;/span&gt; exp/tri3b/decode_&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;tgsmall,tglarge&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;_&lt;span class=&quot;nv&quot;&gt;$test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Re-score the decoded lattices (&lt;em&gt;exp/tri3b/decode_tgsmall_dev_clean_2&lt;/em&gt;, as the command shows) with the large ConstArpa LM; the lattices and WERs after re-scoring can be found in &lt;em&gt;exp/tri3b/decode_tglarge_dev_clean_2&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You can see the WER improvement from &lt;em&gt;exp/mono/decode_nosp_tgsmall_dev_clean_2&lt;/em&gt; to &lt;em&gt;exp/tri3b/decode_tglarge_dev_clean_2&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;stage-9-dnn-training&quot;&gt;Stage 9: DNN training&lt;/h3&gt;
&lt;p&gt;I’ll leave this to another note.&lt;/p&gt;

&lt;p&gt;Thank you for reading through :)!&lt;/p&gt;</content><author><name></name></author><category term="kaldi" /><category term="asr" /><summary type="html">This note provides a high-level understanding of how kaldi recipe scripts work, with the hope that people with little experience in shell scripts (like me) can save some time learning kaldi.</summary></entry></feed>