CoNLL. You can find more info here.
#Projects
##Corpus Builder
####References
This project uses SCICT.PersianTools, which was built by SCICT and is open source under the GPL.
You can find the source code here.
####What it does
We use Program.cs to run RefineAllFiles on the files inside the specified directory.
Then we separate all sentences and words and create new files in new directories. You can overwrite your current files by passing the same directory for both arguments, like SeparateAllSentencesAndWords(dir, dir);
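The separation step can be pictured with a minimal sketch. This is not the project's C# code and does not use SCICT.PersianTools. It is a Python illustration that assumes sentences end at `.`, `!`, `?` or `؟`, that words are separated by whitespace, and that input files use a `.txt` extension. The names `split_sentences_and_words` and `separate_all` are hypothetical.

```python
import re
from pathlib import Path

# Assumed sentence delimiters; the real project relies on SCICT.PersianTools.
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def split_sentences_and_words(text):
    """Return a list of sentences, each a list of whitespace-separated words."""
    sentences = []
    for raw in _SENTENCE_END.split(text.strip()):
        words = raw.split()
        if words:
            sentences.append(words)
    return sentences


def separate_all(in_dir, out_dir):
    """Write one word per line, with an empty line after each sentence."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # sorted() materialises the file list first, so in_dir == out_dir is safe.
    for src in sorted(Path(in_dir).glob("*.txt")):
        sentences = split_sentences_and_words(src.read_text(encoding="utf-8"))
        body = "".join("\n".join(words) + "\n\n" for words in sentences)
        (out / src.name).write_text(body, encoding="utf-8")
```

Calling `separate_all(d, d)` mirrors the overwrite case: each file is read in full before it is rewritten in place. Punctuation stays attached to the preceding word in this sketch; a real tokenizer would handle it separately.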
####Example Output
Check here.
##Tagger
####References
This project uses several SCICT.NLP DLLs, which were built by SCICT and are open source under the GPL.
You can find the source code here. It also uses YAXLib.
Tagger.cs and Token.cs were written by Mohammad Hedayati.
####What it does
The output of the Corpus Builder project has one word per line, and each sentence ends with an empty line.
Each word is then tagged by the tagger, and its lemma, POS tag, person and number are extracted. This information is placed on the same line as the word, separated by tabs.
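To make the layout concrete, here is a minimal sketch of reading such a file back. It is written in Python rather than the project's C#, and it assumes the columns appear in the order listed above (word, lemma, POS tag, person, number). The names `TaggedToken` and `parse_tagged` are hypothetical and not part of the Tagger project.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    # Column order is an assumption based on the description above.
    word: str
    lemma: str
    pos: str
    person: str
    number: str


def parse_tagged(text):
    """Parse tab-separated tagger output into a list of sentences."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        # Pad or trim to exactly five columns so short lines still parse.
        fields = (fields + [""] * 5)[:5]
        current.append(TaggedToken(*fields))
    if current:
        sentences.append(current)
    return sentences
```

Each returned sentence is a list of `TaggedToken` values, so a field can be read as `sentences[0][0].lemma`. Columns missing from a line come back as empty strings.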
####Example Output
Check here.
This repository has been archived by the owner on May 24, 2019. It is now read-only.
yassersouri/Corpus-Builder
About
Builds a Persian corpus from text files generated in the Crawler project. Format: CoNLL