public static void RegularExpress()
{
    // Extract every value enclosed in Chinese quotation marks (“ ... ”).
    var d0 = "宏润建设集团股份有限公司(以下简称“公司”)于2014年1月7日收到西安市建设工程中标通知书,“西安市地铁四号线工程(航天东路站—北客站)土建施工D4TJSG-5标”项目由公司中标承建,工程中标价49,290万元。";
    var x0 = RegularTool.GetMultiValueBetweenMark(d0, "“", "”");

    // Parse a date written with Chinese numerals (30 November 2012).
    var d1 = DateUtility.GetDate("河北先河环保科技股份有限公司董事会二○一二年十一月三十日");
    Console.WriteLine(d1);

    // Extract the text between "与" (with) and "签订" (to sign).
    var d2 = "公司第五届董事会第七次会议审议通过了《关于公司与神华铁路货车运输有限责任公司签订企业自用货车购置供货合同的议案》,2014年1月20日,公司与神华铁路货车运输有限责任公司签署了《企业自用货车购置供货合同》。";
    var x2 = RegularTool.GetValueBetweenString(d2, "与", "签订");

    // Extract every span between "与" (with) and "签署" (signed).
    var s0 = "2010年12月3日,中工国际工程股份有限公司与委内瑞拉农业土地部下属的委内瑞拉农业公司签署了委内瑞拉农副产品加工设备制造厂工业园项目商务合同,与委内瑞拉农签署了委内瑞拉奥里合同。";
    var x = RegularTool.GetMultiValueBetweenString(s0, "与", "签署");

    // Four variants: leading "收到" or "接到" (received), trailing "通知" or "告知" (notice).
    var s1 = "收到贵州高速公路开发总公司发出的通知";
    var s2 = "接到贵州高速公路开发总公司发出的通知";
    var s3 = "收到贵州高速公路开发总公司发出的告知";
    var s4 = "接到贵州高速公路开发总公司发出的告知";

    // Lookbehind for the leading keyword, lazy match, lookahead for the trailing keyword.
    Regex rg = new Regex("(?<=(" + "收到|接到" + "))[.\\s\\S]*?(?=(" + "通知|告知" + "))", RegexOptions.Multiline | RegexOptions.Singleline);

    // Each call prints the text between the two keywords.
    Console.WriteLine(rg.Match(s1).Value);
    Console.WriteLine(rg.Match(s2).Value);
    Console.WriteLine(rg.Match(s3).Value);
    Console.WriteLine(rg.Match(s4).Value);
}
// Extract content wrapped in marker symbols
void ExtractByMarkFeature(MyRootHtmlNode root)
{
    foreach (var word in MarkFeature)
    {
        Func<String, List<String>> ExtractMethod = (x) =>
        {
            var strlist = new List<String>();
            foreach (var strContent in RegularTool.GetMultiValueBetweenMark(x, word.MarkStartWith, word.MarkEndWith))
            {
                // Skip candidates that lack the required inner prefix, if one is configured.
                if (word.InnerStartWith != null && !strContent.StartsWith(word.InnerStartWith))
                {
                    continue;
                }
                // Skip candidates that lack the required inner suffix, if one is configured.
                if (word.InnerEndWith != null && !strContent.EndsWith(word.InnerEndWith))
                {
                    continue;
                }
                strlist.Add(strContent);
            }
            return strlist;
        };
        SearchNormalContent(root, ExtractMethod);
    }
}
public struWordSRL(string element)
{
    // Collect the double-quoted attribute values of the element.
    var x = RegularTool.GetMultiValueBetweenMark(element, "\"", "\"");
    if (x.Count != 6)
    {
        // Unexpected field count: log the element and fall back to empty content,
        // reading the remaining fields one position earlier.
        Console.WriteLine(element);
        id = int.Parse(x[0]);
        cont = String.Empty;
        pos = x[1];
        ne = x[2];
        parent = x[3];
        relate = x[4];
    }
    else
    {
        id = int.Parse(x[0]);
        cont = x[1];
        pos = x[2];
        ne = x[3];
        parent = x[4];
        relate = x[5];
    }
    args = new List<struWordSRLARG>();
}
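public struWordNER(string element)
{
    // Collect the double-quoted attribute values of the element.
    var x = RegularTool.GetMultiValueBetweenMark(element, "\"", "\"");
    if (x.Count != 4)
    {
        if (x.Count == 3)
        {
            // One field short: the content is set to a literal double quote.
            //Console.WriteLine(element);
            id = int.Parse(x[0]);
            cont = "\"";
            pos = x[1];
            ne = x[2];
        }
        else
        {
            // Any other field count: keep the id and leave the rest empty.
            id = int.Parse(x[0]);
            cont = "";
            pos = "";
            ne = "";
        }
    }
    else
    {
        id = int.Parse(x[0]);
        cont = x[1];
        pos = x[2];
        ne = x[3];
    }
}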
public struWordSRLARG(string element)
{
    var x = RegularTool.GetMultiValueBetweenMark(element, "\"", "\"");
    if (x.Count == 3)
    {
        // Three values: the type is treated as empty.
        id = int.Parse(x[0]);
        type = "";
        Begin = int.Parse(x[1]);
        End = int.Parse(x[2]);
    }
    else
    {
        id = int.Parse(x[0]);
        type = x[1];
        Begin = int.Parse(x[2]);
        End = int.Parse(x[3]);
    }
    cont = string.Empty;
}
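public struWordDP(string element)
{
    // Collect the double-quoted attribute values of the element.
    var x = RegularTool.GetMultiValueBetweenMark(element, "\"", "\"");
    if (x.Count != 5)
    {
        // Unexpected field count: the content is set to a literal double quote
        // and the remaining fields are read one position earlier.
        //Console.WriteLine(element);
        id = int.Parse(x[0]);
        cont = "\"";
        pos = x[1];
        parent = int.Parse(x[2]);
        relate = x[3];
    }
    else
    {
        id = int.Parse(x[0]);
        cont = x[1];
        pos = x[2];
        parent = int.Parse(x[3]);
        relate = x[4];
    }
}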