Seq2SeqSharp

Seq2SeqSharp is an C# encoder-decoder deep neural network framework running on both CPU and GPU

Features

Pure C# framework (except kernel C code in CUDA)
Deep bi-directional LSTM encoder
Deep attention based LSTM decoder
Graph based neural network
Automatic differentiation
Tensor based operations
Running on both CPU (Intel MKL lib) and GPU (CUDA)
Support multi-GPUs
Mini-batch
Dropout
RMSProp optmization
Pre-trained model
Auto data shuffling
Auto vocabulary building
Beam search decoder

Usage

You could use Seq2SeqConsole tool to train and test models.

Here is the command line to train a model: Seq2SeqConsole.exe -TaskName train [parameters...]
Parameters:
-WordVectorSize: The vector size of encoded source word.
-HiddenSize: The hidden layer size of encoder and decoder.
-LearningRate: Learning rate. Default is 0.001
-Depth: The network depth in decoder. Default is 1
-ModelFilePath: The trained model file path.
-SrcVocab: The vocabulary file path for source side.
-TgtVocab: The vocabulary file path for target side.
-SrcEmbedding: The external embedding model file path for source side. It is built by Txt2Vec project.
-TgtEmbedding: The external embedding model file path for target side. It is built by Txt2Vec project.
-SrcLang: Source language name.
-TgtLang: Target language name.
-TrainCorpusPath: training corpus folder path
-BatchSize: Mini-batch size. Default is 1. For CPU runner, it must be 1.
-DropoutRatio: Dropout ratio. Defaul is 0.1
-ArchType: Runner type. 0 - GPU (CUDA), 1 - CPU (Intel MKL), 2 - CPU. Default is 0
-DeviceIds: Device ids for training in GPU mode. Default is 0. For multi devices, ids are split by comma, for example: 0,1,2
-MaxEpochNum: Maxmium epoch number during training. Default is 100
Note that:

if "-SrcVocab" and "-TgtVocab" are empty, vocabulary will be built from training corpus.
Txt2Vec for external embedding model building can get downloaded from https://github.com/zhongkaifu/Txt2Vec

Example: Seq2SeqConsole.exe -TaskName train -WordVectorSize 1024 -HiddenSize 1024 -LearningRate 0.001 -Depth 2 -TrainCorpusPath .\corpus -ModelFilePath nmt.model -SrcLang enu -TgtLang chs -ArchType 0 -DeviceIds 0,1,2,3

During training, the iteration information will be printed out and logged as follows:
info,3/13/2019 5:40:22 AM Epoch = '0' LR = '0.002', Current Cost = '3.213204', Avg Cost = '4.764458', SentInTotal = '17612800', SentPerMin = '44415.11', WordPerSec = '37689.26'
info,3/13/2019 5:49:16 AM Epoch = '0' LR = '0.002', Current Cost = '3.172645', Avg Cost = '4.731404', SentInTotal = '18022400', SentPerMin = '44451.65', WordPerSec = '37674.58'

Here is the command line to test models
Seq2SeqConsole.exe -TaskName test [parameters...]
Parameters:
-InputTestFile: The input file for test.
-OutputTestFile: The test result file.
-ModelFilePath: The trained model file path. -ArchType: Runner type. 0 - GPU (CUDA), 1 - CPU, 2 - CPU (Intel MKL). Default is 0
-DeviceIds: Device ids for training in GPU mode. Default is 0. For multi devices, ids are split by comma, for example: 0,1,2
-BeamSearch: Beam search size. Default is 1

Example: Seq2SeqConsole.exe -TaskName test -ModelFilePath seq2seq_256.model -InputTestFile test.txt -OutputTestFile result.txt -ArchType 1 -BeamSearch 5

Data Format

The corpus contains each sentence per line. The file name pattern is "mainfilename.{source language name}.snt" and "mainfilename.{target language name}.snt".
For example: Let's use three letters name CHS for Chinese and ENU for English in Chinese-English parallel corpus, so we could have these corpus files: train01.enu.snt, train01.chs.snt, train02.enu.snt and train02.chs.snt.
In train01.enu.snt, assume we have below two sentences:
the children huddled together for warmth .
the car business is constantly changing .
So, train01.chs.snt has the corresponding translated sentences:
孩子们挤成一团以取暖 .
汽车业也在不断地变化 .

Build Your Neural Networks

Benefit from automatic differentiation, tensor based compute graph and other features, you can easily build your neural network by a few of code. The only thing you need to implment is forward part, and the framework will automatically build the corresponding backward part for you, and make the network could run on multi-GPUs or CPUs.
Here is an example for attentioned based LSTM.

        /// <summary>
        /// Update LSTM-Attention cells according to given weights
        /// </summary>
        /// <param name="context">The context weights for attention</param>
        /// <param name="input">The input weights</param>
        /// <param name="computeGraph">The compute graph to build workflow</param>
        /// <returns>Update hidden weights</returns>
        public IWeightMatrix Step(IWeightMatrix context, IWeightMatrix input, IComputeGraph computeGraph)
        {
            var cell_prev = ct;
            var hidden_prev = ht;

            var hxhc = computeGraph.ConcatColumns(input, hidden_prev, context);
            var bs = computeGraph.RepeatRows(b, input.Rows);
            var hhSum = computeGraph.MulAdd(hxhc, Wxhc, bs);
            var hhSum2 = layerNorm1.Process(hhSum, computeGraph);

            (var gates_raw, var cell_write_raw) = computeGraph.SplitColumns(hhSum2, hdim * 3, hdim);
            var gates = computeGraph.Sigmoid(gates_raw);
            var cell_write = computeGraph.Tanh(cell_write_raw);

            (var input_gate, var forget_gate, var output_gate) = computeGraph.SplitColumns(gates, hdim, hdim, hdim);

            // compute new cell activation: ct = forget_gate * cell_prev + input_gate * cell_write
            ct = computeGraph.EltMulMulAdd(forget_gate, cell_prev, input_gate, cell_write);
            var ct2 = layerNorm2.Process(ct, computeGraph);

            ht = computeGraph.EltMul(output_gate, computeGraph.Tanh(ct2));

            return ht;
        }

Todo List

If you are interested in below items, please let me know. Becuase African proverb says "If you want to go fast, go alone. If you want to go far, go together" :)
Transformer Components
Support Tensor Cores in CUDA
Support Half-Float Type (FP16)
And More...

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
DataProcess/FilterLowFreqTerm		DataProcess/FilterLowFreqTerm
Seq2SeqConsole		Seq2SeqConsole
Seq2SeqSharp		Seq2SeqSharp
TensorSharp.CUDA		TensorSharp.CUDA
TensorSharp		TensorSharp
dll		dll
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Seq2SeqSharp.sln		Seq2SeqSharp.sln
Seq2SeqSharp_Snapshot.JPG		Seq2SeqSharp_Snapshot.JPG

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataProcess/FilterLowFreqTerm

DataProcess/FilterLowFreqTerm

Seq2SeqConsole

Seq2SeqConsole

Seq2SeqSharp

Seq2SeqSharp

TensorSharp.CUDA

TensorSharp.CUDA

TensorSharp

TensorSharp

dll

dll

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

Seq2SeqSharp.sln

Seq2SeqSharp.sln

Seq2SeqSharp_Snapshot.JPG

Seq2SeqSharp_Snapshot.JPG

Repository files navigation

Seq2SeqSharp

Features

Usage

Data Format

Build Your Neural Networks

Todo List

About

Releases

Packages

Languages

License

theolivenbaum/Seq2SeqSharp

Folders and files

Latest commit

History

Repository files navigation

Seq2SeqSharp

Features

Usage

Data Format

Build Your Neural Networks

Todo List

About

Resources

License

Stars

Watchers

Forks

Languages