
Seq2SeqSharp is a C# encoder-decoder deep neural network framework for sequence-to-sequence tasks. The encoder is a bidirectional LSTM neural network, and the decoder is an LSTM-Attention neural network. It is based on tensor operations, supports automatic differentiation and both dense and sparse feature types, and can run on both CPU and GPU…

theolivenbaum/Seq2SeqSharp


Seq2SeqSharp

Seq2SeqSharp is a C# encoder-decoder deep neural network framework that runs on both CPU and GPU.

Features

Pure C# framework (except for the CUDA kernel code, which is written in C)
Deep bidirectional LSTM encoder
Deep attention-based LSTM decoder
Graph-based neural network
Automatic differentiation
Tensor-based operations
Runs on both CPU (Intel MKL library) and GPU (CUDA)
Multi-GPU support
Mini-batch
Dropout
RMSProp optimization
Pre-trained model
Auto data shuffling
Auto vocabulary building
Beam search decoder
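RMSProp, listed above, scales each weight's step by a running average of that weight's squared gradients. The following is a minimal sketch of the textbook update rule on a plain float array. The class name and the decay rate and epsilon defaults are assumptions for illustration; this is not Seq2SeqSharp's optimizer code.

```csharp
using System;

// Minimal sketch of the textbook RMSProp update rule (not Seq2SeqSharp's optimizer).
// The default decay rate and epsilon are illustrative assumptions.
public sealed class RmsPropSketch
{
    private readonly float[] _cache;      // running average of squared gradients
    private readonly float _learningRate;
    private readonly float _decayRate;
    private readonly float _epsilon;

    public RmsPropSketch(int size, float learningRate = 0.001f, float decayRate = 0.9f, float epsilon = 1e-8f)
    {
        _cache = new float[size];
        _learningRate = learningRate;
        _decayRate = decayRate;
        _epsilon = epsilon;
    }

    // Updates the weights in place from their gradients.
    public void Update(float[] weights, float[] gradients)
    {
        if (weights.Length != _cache.Length || gradients.Length != _cache.Length)
            throw new ArgumentException("weights, gradients and cache must have the same length");

        for (int i = 0; i < weights.Length; i++)
        {
            float g = gradients[i];
            // cache = decay * cache + (1 - decay) * g^2
            _cache[i] = _decayRate * _cache[i] + (1f - _decayRate) * g * g;
            // w -= lr * g / (sqrt(cache) + epsilon)
            weights[i] -= _learningRate * g / ((float)Math.Sqrt(_cache[i]) + _epsilon);
        }
    }
}
```

With a learning rate of 0.1, a decay rate of 0.9 and a zero epsilon, a single update with gradient 2 moves a weight of 1 to about 0.684.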

Usage

You can use the Seq2SeqConsole tool to train and test models.

Here is the command line to train a model: Seq2SeqConsole.exe -TaskName train [parameters...]
Parameters:
-WordVectorSize: The vector size of the encoded source words.
-HiddenSize: The hidden layer size of encoder and decoder.
-LearningRate: Learning rate. Default is 0.001
-Depth: The network depth in the decoder. Default is 1
-ModelFilePath: The trained model file path.
-SrcVocab: The vocabulary file path for source side.
-TgtVocab: The vocabulary file path for target side.
-SrcEmbedding: The external embedding model file path for source side. It is built by the Txt2Vec project.
-TgtEmbedding: The external embedding model file path for target side. It is built by the Txt2Vec project.
-SrcLang: Source language name.
-TgtLang: Target language name.
-TrainCorpusPath: The training corpus folder path.
-BatchSize: Mini-batch size. Default is 1. For the CPU runner, it must be 1.
-DropoutRatio: Dropout ratio. Default is 0.1
-ArchType: Runner type. 0 - GPU (CUDA), 1 - CPU (Intel MKL), 2 - CPU. Default is 0
-DeviceIds: Device ids for training in GPU mode. Default is 0. For multiple devices, separate the ids with commas, for example: 0,1,2
-MaxEpochNum: Maximum number of epochs during training. Default is 100
Note that:

  1. If "-SrcVocab" and "-TgtVocab" are empty, the vocabulary will be built from the training corpus.
  2. Txt2Vec, which builds the external embedding models, can be downloaded from https://github.com/zhongkaifu/Txt2Vec
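When no vocabulary files are given, the vocabulary is built from the training corpus. The sketch below illustrates that idea for whitespace-tokenized lines like those in the Data Format section. The class name, the "<unk>" entry at id 0 and the frequency-based id ordering are assumptions, not necessarily what Seq2SeqSharp does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of building a word -> id vocabulary from tokenized corpus lines.
// The "<unk>" entry and the frequency ordering are illustrative assumptions.
public static class VocabularySketch
{
    public const string UnknownToken = "<unk>";

    public static Dictionary<string, int> Build(IEnumerable<string> lines)
    {
        // Count how often each whitespace-separated token occurs.
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (var line in lines)
        {
            foreach (var token in line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            {
                counts.TryGetValue(token, out int c);
                counts[token] = c + 1;
            }
        }

        var vocab = new Dictionary<string, int>(StringComparer.Ordinal) { [UnknownToken] = 0 };
        // Most frequent tokens get the smallest ids; ties are broken by ordinal string order.
        foreach (var word in counts.OrderByDescending(kv => kv.Value)
                                   .ThenBy(kv => kv.Key, StringComparer.Ordinal)
                                   .Select(kv => kv.Key))
        {
            if (!vocab.ContainsKey(word))
                vocab[word] = vocab.Count;
        }
        return vocab;
    }
}
```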

Example: Seq2SeqConsole.exe -TaskName train -WordVectorSize 1024 -HiddenSize 1024 -LearningRate 0.001 -Depth 2 -TrainCorpusPath .\corpus -ModelFilePath nmt.model -SrcLang enu -TgtLang chs -ArchType 0 -DeviceIds 0,1,2,3
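The -DeviceIds value in the example above is a comma-separated list of GPU indices. The following hypothetical helper (ParseDeviceIds is not part of Seq2SeqConsole) shows how such a value maps to device indices, falling back to the documented default of 0 when the value is empty.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of Seq2SeqConsole: converts a comma-separated
// -DeviceIds value such as "0,1,2,3" into an array of device indices.
public static class DeviceIdsSketch
{
    public static int[] ParseDeviceIds(string value)
    {
        // An empty value falls back to device 0, the documented default.
        if (string.IsNullOrWhiteSpace(value))
            return new[] { 0 };

        return value
            .Split(',')
            .Select(part => part.Trim())
            .Where(part => part.Length > 0)          // tolerate stray commas
            .Select(part => int.Parse(part, CultureInfo.InvariantCulture))
            .ToArray();
    }
}
```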

During training, the iteration information will be printed out and logged as follows:
info,3/13/2019 5:40:22 AM Epoch = '0' LR = '0.002', Current Cost = '3.213204', Avg Cost = '4.764458', SentInTotal = '17612800', SentPerMin = '44415.11', WordPerSec = '37689.26'
info,3/13/2019 5:49:16 AM Epoch = '0' LR = '0.002', Current Cost = '3.172645', Avg Cost = '4.731404', SentInTotal = '18022400', SentPerMin = '44451.65', WordPerSec = '37674.58'
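Each iteration line uses the same Key = 'value' layout, so the cost and throughput figures can be extracted for monitoring or plotting. The sketch below assumes the fields always start at "Epoch = ". This is inferred from the two sample lines above, not from a documented log format, and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Sketch that extracts Key = 'value' fields from a training log line in the
// format shown above. The "Epoch = " starting point is inferred from the samples.
public static class TrainingLogSketch
{
    // Key: letters and spaces (e.g. "Current Cost"); value: anything up to the closing quote.
    private static readonly Regex FieldPattern =
        new Regex(@"([A-Za-z][A-Za-z ]*?) = '([^']*)'", RegexOptions.Compiled);

    public static Dictionary<string, string> ParseFields(string line)
    {
        var fields = new Dictionary<string, string>(StringComparer.Ordinal);
        int start = line.IndexOf("Epoch = ", StringComparison.Ordinal);
        if (start < 0)
            return fields; // not an iteration line

        // Skip the "info,<timestamp>" prefix so it is not mistaken for part of a key.
        foreach (Match m in FieldPattern.Matches(line.Substring(start)))
            fields[m.Groups[1].Value] = m.Groups[2].Value;
        return fields;
    }

    public static double GetDouble(Dictionary<string, string> fields, string key) =>
        double.Parse(fields[key], CultureInfo.InvariantCulture);
}
```

For the first sample line, ParseFields returns seven fields, and GetDouble(fields, "Avg Cost") yields 4.764458.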

Here is the command line to test models: Seq2SeqConsole.exe -TaskName test [parameters...]
Parameters:
-InputTestFile: The input file for testing.
-OutputTestFile: The test result file.
-ModelFilePath: The trained model file path.
-ArchType: Runner type. 0 - GPU (CUDA), 1 - CPU, 2 - CPU (Intel MKL). Default is 0
-DeviceIds: Device ids in GPU mode. Default is 0. For multiple devices, separate the ids with commas, for example: 0,1,2
-BeamSearch: Beam search size. Default is 1

Example: Seq2SeqConsole.exe -TaskName test -ModelFilePath seq2seq_256.model -InputTestFile test.txt -OutputTestFile result.txt -ArchType 1 -BeamSearch 5
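-BeamSearch sets how many partial translations the decoder keeps at each step; a size of 1 amounts to greedy decoding. The generic sketch below illustrates the idea with a stand-in scoring function. nextLogProbs and the class name are hypothetical: in the real tool the LSTM-Attention decoder produces these scores, and this is not Seq2SeqSharp's decoder implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Generic beam search sketch. nextLogProbs stands in for the decoder: given a
// prefix of token ids, it returns a log-probability for every vocabulary id.
public static class BeamSearchSketch
{
    public static List<int> Decode(
        Func<IReadOnlyList<int>, double[]> nextLogProbs,
        int beamSize, int eosId, int maxLength)
    {
        if (beamSize < 1) throw new ArgumentOutOfRangeException(nameof(beamSize));

        var beams = new List<(List<int> Tokens, double Score)> { (new List<int>(), 0.0) };
        var finished = new List<(List<int> Tokens, double Score)>();

        for (int step = 0; step < maxLength && beams.Count > 0; step++)
        {
            // Extend every live hypothesis by every possible next token.
            var candidates = new List<(List<int> Tokens, double Score)>();
            foreach (var (tokens, score) in beams)
            {
                double[] logProbs = nextLogProbs(tokens);
                for (int id = 0; id < logProbs.Length; id++)
                    candidates.Add((new List<int>(tokens) { id }, score + logProbs[id]));
            }

            // Keep only the beamSize best hypotheses; those ending in EOS are complete.
            beams = new List<(List<int> Tokens, double Score)>();
            foreach (var cand in candidates.OrderByDescending(c => c.Score).Take(beamSize))
            {
                if (cand.Tokens[cand.Tokens.Count - 1] == eosId)
                    finished.Add(cand);
                else
                    beams.Add(cand);
            }
        }

        // Hypotheses still open at maxLength also compete for the best score.
        finished.AddRange(beams);
        return finished.OrderByDescending(c => c.Score).First().Tokens;
    }
}
```

In this simple variant a hypothesis that emits the end-of-sentence id leaves the beam, so the beam can shrink. Production decoders often also apply length normalization, which is omitted here.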

Data Format

Each corpus file contains one sentence per line. The file name pattern is "mainfilename.{source language name}.snt" and "mainfilename.{target language name}.snt".
For example, using the three-letter names CHS for Chinese and ENU for English in a Chinese-English parallel corpus, we could have these corpus files: train01.enu.snt, train01.chs.snt, train02.enu.snt and train02.chs.snt.
Assume train01.enu.snt contains the following two sentences:
the children huddled together for warmth .
the car business is constantly changing .
Then train01.chs.snt contains the corresponding translations, in the same line order:
孩子 们 挤 成 一 团 以 取暖 .
汽车 业 也 在 不断 地 变化 .
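Under this naming convention, source and target files are matched by their main file name, and line i of one file is the translation of line i of the other. A minimal sketch under that reading follows; PairFiles and ReadPairs are hypothetical helpers, not Seq2SeqSharp's corpus loader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch of pairing "mainfilename.{lang}.snt" files and reading them as
// line-aligned sentence pairs. Helper names are hypothetical.
public static class ParallelCorpusSketch
{
    // Returns (source, target) paths for each source file whose target counterpart exists.
    public static List<(string Src, string Tgt)> PairFiles(
        IEnumerable<string> paths, string srcLang, string tgtLang)
    {
        string srcSuffix = "." + srcLang + ".snt";
        string tgtSuffix = "." + tgtLang + ".snt";
        var all = new HashSet<string>(paths, StringComparer.Ordinal);
        var pairs = new List<(string Src, string Tgt)>();

        foreach (var src in all.Where(p => p.EndsWith(srcSuffix, StringComparison.Ordinal))
                               .OrderBy(p => p, StringComparer.Ordinal))
        {
            // Swap the language suffix to find the file with the same main file name.
            string tgt = src.Substring(0, src.Length - srcSuffix.Length) + tgtSuffix;
            if (all.Contains(tgt))
                pairs.Add((src, tgt));
        }
        return pairs;
    }

    // Yields (source sentence, target sentence) pairs; the files must have equal line counts.
    public static IEnumerable<(string Src, string Tgt)> ReadPairs(string srcPath, string tgtPath)
    {
        using (var src = new StreamReader(srcPath))
        using (var tgt = new StreamReader(tgtPath))
        {
            string s;
            while ((s = src.ReadLine()) != null)
            {
                string t = tgt.ReadLine();
                if (t == null)
                    throw new InvalidDataException($"{tgtPath} has fewer lines than {srcPath}.");
                yield return (s, t);
            }
            if (tgt.ReadLine() != null)
                throw new InvalidDataException($"{tgtPath} has more lines than {srcPath}.");
        }
    }
}
```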

Build Your Neural Networks

Thanks to automatic differentiation, the tensor-based compute graph and other features, you can build your own neural network with just a few lines of code. You only need to implement the forward pass; the framework automatically builds the corresponding backward pass for you and lets the network run on multiple GPUs or on CPUs.
Here is an example of an attention-based LSTM step:

        /// <summary>
        /// Updates the LSTM-Attention cell for one time step
        /// </summary>
        /// <param name="context">The attention context</param>
        /// <param name="input">The input for the current time step</param>
        /// <param name="computeGraph">The compute graph used to build the workflow</param>
        /// <returns>The updated hidden state</returns>
        public IWeightMatrix Step(IWeightMatrix context, IWeightMatrix input, IComputeGraph computeGraph)
        {
            // Cell and hidden states from the previous time step
            var cell_prev = ct;
            var hidden_prev = ht;

            // Concatenate the input, the previous hidden state and the attention context column-wise,
            // then compute hxhc * Wxhc + b (the bias row is repeated for every row of the input)
            // and layer-normalize the result
            var hxhc = computeGraph.ConcatColumns(input, hidden_prev, context);
            var bs = computeGraph.RepeatRows(b, input.Rows);
            var hhSum = computeGraph.MulAdd(hxhc, Wxhc, bs);
            var hhSum2 = layerNorm1.Process(hhSum, computeGraph);

            // The first hdim * 3 columns feed the gates, the last hdim columns the candidate cell value
            (var gates_raw, var cell_write_raw) = computeGraph.SplitColumns(hhSum2, hdim * 3, hdim);
            var gates = computeGraph.Sigmoid(gates_raw);
            var cell_write = computeGraph.Tanh(cell_write_raw);

            (var input_gate, var forget_gate, var output_gate) = computeGraph.SplitColumns(gates, hdim, hdim, hdim);

            // compute new cell activation: ct = forget_gate * cell_prev + input_gate * cell_write
            ct = computeGraph.EltMulMulAdd(forget_gate, cell_prev, input_gate, cell_write);
            var ct2 = layerNorm2.Process(ct, computeGraph);

            // New hidden state: ht = output_gate * tanh(LayerNorm(ct))
            ht = computeGraph.EltMul(output_gate, computeGraph.Tanh(ct2));

            return ht;
        }
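To make the tensor calls in Step concrete, here is the same gate arithmetic written with plain arrays for a single example (batch size 1). It is a simplified sketch: the two layer normalization steps are omitted, the weights are passed in directly, the class name is hypothetical, and the column layout [input | forget | output | cell candidate] is inferred from the SplitColumns calls above.

```csharp
using System;

// Plain-array sketch of the LSTM gate arithmetic in Step() for one example.
// Layer normalization is omitted; names and shapes are illustrative assumptions.
public static class LstmStepSketch
{
    static float Sigmoid(float x) => 1f / (1f + (float)Math.Exp(-x));

    // xhc: concatenation of input, previous hidden state and attention context.
    // w: [xhc.Length, 4 * hdim] weights; b: [4 * hdim] bias.
    public static (float[] H, float[] C) Step(float[] xhc, float[] cellPrev, float[,] w, float[] b)
    {
        int hdim = cellPrev.Length;
        if (w.GetLength(0) != xhc.Length || w.GetLength(1) != 4 * hdim || b.Length != 4 * hdim)
            throw new ArgumentException("shape mismatch");

        // Pre-activations: sum = xhc * W + b
        var sum = new float[4 * hdim];
        for (int j = 0; j < 4 * hdim; j++)
        {
            float acc = b[j];
            for (int i = 0; i < xhc.Length; i++)
                acc += xhc[i] * w[i, j];
            sum[j] = acc;
        }

        var h = new float[hdim];
        var c = new float[hdim];
        for (int k = 0; k < hdim; k++)
        {
            float inputGate = Sigmoid(sum[k]);
            float forgetGate = Sigmoid(sum[hdim + k]);
            float outputGate = Sigmoid(sum[2 * hdim + k]);
            float cellWrite = (float)Math.Tanh(sum[3 * hdim + k]);

            // ct = forget_gate * cell_prev + input_gate * cell_write
            c[k] = forgetGate * cellPrev[k] + inputGate * cellWrite;
            // ht = output_gate * tanh(ct)
            h[k] = outputGate * (float)Math.Tanh(c[k]);
        }
        return (h, c);
    }
}
```

With all-zero weights and bias every gate is 0.5 and the candidate is 0, so a previous cell value of 1 gives ct = 0.5 and ht = 0.5 * tanh(0.5).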

Todo List

If you are interested in the items below, please let me know. As the African proverb says, "If you want to go fast, go alone. If you want to go far, go together" :)
Transformer Components
Support Tensor Cores in CUDA
Support Half-Float Type (FP16)
And More...
