/// <summary>
/// Fetches the hashes that need to be added to dcthashpair.
/// Hashes added after this method starts are ignored,
/// but they will be picked up on the next run, so that is not a problem.
/// </summary>
/// <param name="SaveTime">UNIX time appended to the name of the file to save</param>
/// <param name="BeginTime">Fetch hashes whose downloaded_at is at or after this value</param>
public async Task<HashSet<long>> NewerMediaHash(long SaveTime, long BeginTime)
{
    string FilePath = HashFile.NewerHashFilePathBase(SaveTime.ToString());
    try
    {
        var ret = new HashSet<long>();
        const int QueryRangeSeconds = 600;
        // Each block loads one QueryRangeSeconds-wide window of downloaded_at,
        // retrying the query until ExecuteReader reports success.
        var LoadHashBlock = new ActionBlock<long>(async (i) =>
        {
            var Table = new List<long>();
            while (true)
            {
                using (MySqlCommand cmd = new MySqlCommand(@"SELECT dcthash FROM media_downloaded_at NATURAL JOIN media WHERE downloaded_at BETWEEN @begin AND @end;"))
                {
                    cmd.Parameters.Add("@begin", MySqlDbType.Int64).Value = BeginTime + QueryRangeSeconds * i;
                    cmd.Parameters.Add("@end", MySqlDbType.Int64).Value = BeginTime + QueryRangeSeconds * (i + 1) - 1;
                    if (await ExecuteReader(cmd, (r) => Table.Add(r.GetInt64(0)), IsolationLevel.ReadUncommitted).ConfigureAwait(false)) { break; }
                    else { Table.Clear(); }
                }
            }
            lock (ret) { foreach (long h in Table) { ret.Add(h); } }
        }, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount });

        for (long i = 0; i < Math.Max(0, DateTimeOffset.UtcNow.ToUnixTimeSeconds() - BeginTime) / QueryRangeSeconds + 1; i++)
        {
            LoadHashBlock.Post(i);
        }
        LoadHashBlock.Complete();
        await LoadHashBlock.Completion.ConfigureAwait(false);

        // Write the collected hashes to a temp file, then move it into place.
        using (var writer = new UnbufferedLongWriter(HashFile.TempFilePath(FilePath)))
        {
            writer.WriteDestructive(ret.ToArray(), ret.Count);
        }
        File.Move(HashFile.TempFilePath(FilePath), FilePath);
        return ret;
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        return null;
    }
}
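// A minimal sketch (hypothetical helper, not part of the original source) that makes the
// windowing above explicit: block i of NewerMediaHash scans downloaded_at over one
// QueryRangeSeconds-wide range starting at BeginTime.
static (long Begin, long End) NewerHashWindow(long BeginTime, long i, long QueryRangeSeconds = 600)
    => (BeginTime + QueryRangeSeconds * i, BeginTime + QueryRangeSeconds * (i + 1) - 1);
// e.g. NewerHashWindow(1600000000, 0) is (1600000000, 1600000599)
// and  NewerHashWindow(1600000000, 1) is (1600000600, 1600001199).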
/// <summary>Writes the hashes read from the DB straight out to a file.</summary>
/// <param name="SaveTime">UNIX time appended to the name of the file to save</param>
public async Task<long> AllMediaHash(long SaveTime)
{
    try
    {
        long TotalHashCount = 0;
        string HashFilePath = HashFile.AllHashFilePathBase(SaveTime.ToString());
        using (var writer = new BufferedLongWriter(HashFile.TempFilePath(HashFilePath)))
        {
            // Each block loads one 2^HashUnitBits-wide window of the dcthash space,
            // retrying the query until ExecuteReader reports success.
            var LoadHashBlock = new TransformBlock<long, AddOnlyList<long>>(async (i) =>
            {
                var table = new AddOnlyList<long>(TableListSize);
                while (true)
                {
                    using var cmd = new MySqlCommand(@"SELECT DISTINCT dcthash FROM media WHERE dcthash BETWEEN @begin AND @end GROUP BY dcthash;");
                    cmd.Parameters.Add("@begin", MySqlDbType.Int64).Value = i << HashUnitBits;
                    cmd.Parameters.Add("@end", MySqlDbType.Int64).Value = ((i + 1) << HashUnitBits) - 1;
                    if (await ExecuteReader(cmd, (r) => table.Add(r.GetInt64(0)), IsolationLevel.ReadUncommitted).ConfigureAwait(false)) { break; }
                    else { table.Clear(); }
                }
                return table;
            }, new ExecutionDataflowBlockOptions()
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = Environment.ProcessorCount << 1,
                SingleProducerConstrained = true
            });
            // A single writer appends each loaded window to the file as it arrives.
            var WriterBlock = new ActionBlock<AddOnlyList<long>>(async (table) =>
            {
                await writer.Write(table.InnerArray, table.Count).ConfigureAwait(false);
                TotalHashCount += table.Count;
                table.Dispose();
            }, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = 1 });
            LoadHashBlock.LinkTo(WriterBlock, new DataflowLinkOptions() { PropagateCompletion = true });

            for (int i = 0; i < 1 << (64 - HashUnitBits); i++)
            {
                await LoadHashBlock.SendAsync(i).ConfigureAwait(false);
            }
            LoadHashBlock.Complete();
            await WriterBlock.Completion.ConfigureAwait(false);
        }
        File.Move(HashFile.TempFilePath(HashFilePath), HashFilePath);
        return TotalHashCount;
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        return -1;
    }
}
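// A minimal sketch (hypothetical helper, not part of the original source) that makes the
// partitioning above explicit: block i of AllMediaHash scans one 2^HashUnitBits-wide
// window of the 64-bit dcthash space, so the SendAsync loop posts 2^(64 - HashUnitBits) windows.
static (long Begin, long End) AllHashWindow(long i, int hashUnitBits)
    => (i << hashUnitBits, ((i + 1) << hashUnitBits) - 1);
// e.g. with hashUnitBits = 45, AllHashWindow(1, 45) is (0x200000000000, 0x3FFFFFFFFFFF),
// and the loop enumerates 2^19 = 524288 such windows.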
public DBHandler(HashFile hashfile)
    : base(config.database.Address, config.database.Protocol, 20, (uint)Math.Min(Environment.ProcessorCount, 40), 86400)
{
    // Shrink the per-query hash window as the stored hash count grows,
    // and size the per-window buffer accordingly.
    HashUnitBits = Math.Min(48, 64 + 11 - (int)Math.Log(Math.Max(1, hashfile.LastHashCount), 2));
    TableListSize = (int)Math.Max(4096, hashfile.LastHashCount >> (63 - HashUnitBits) << 2);
}
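// A worked example with a hypothetical hash count (not a value from the source):
// for hashfile.LastHashCount = 1L << 30,
//   HashUnitBits  = min(48, 64 + 11 - 30)                   = 45
//   TableListSize = max(4096, (1L << 30) >> (63 - 45) << 2) = 16384
// so AllMediaHash issues 2^(64 - 45) = 524288 window queries, each buffered in an
// AddOnlyList<long> pre-sized to 16384 entries.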
static async Task Main(string[] args)
{
    //Console.WriteLine(System.Reflection.Assembly.GetEntryAssembly().Location);
    //var aaa = new MergedEnumerator(0, 2, 0x0000FFFF00000000);
    //aaa.Enumerator.Read();
    var config = Config.Instance;
    var hashfile = new HashFile();
    var db = new DBHandler(hashfile);
    var sw = Stopwatch.StartNew();

    HashSet<long> NewHash = null;
    long NewLastUpdate = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
    long MinDownloadedAt = await db.Min_downloaded_at().ConfigureAwait(false);
    Directory.CreateDirectory(config.hash.TempDir);

    // Delete files left over from a previous run that did not exit normally.
    hashfile.DeleteNewerHash(true);
    hashfile.DeleteAllHash(true);
    foreach (var filePath in Directory.EnumerateFiles(config.hash.TempDir, Path.GetFileName(SplitQuickSort.SortingFilePath("*"))).ToArray())
    {
        File.Delete(filePath);
    }

    if (MinDownloadedAt < hashfile.LastUpdate)
    {
        // Read the hashes added since the last update (in principle).
        Console.WriteLine("Loading New hash.");
        // Start 60 seconds before the last update, just to be safe.
        NewHash = await db.NewerMediaHash(NewLastUpdate, hashfile.LastUpdate - 60).ConfigureAwait(false);
        if (NewHash == null) { Console.WriteLine("New hash load failed."); Environment.Exit(1); }
        Console.WriteLine("{0} New hash", NewHash.Count);
    }
    else
    {
        // If too much time has passed since hashes were last added, redo the whole hash fetch.
        hashfile.DeleteAllHash();
        hashfile.DeleteNewerHash();
    }

    // Fetch all hashes only when no file from a previous run remains.
    if (HashFile.AllHashFilePath == null)
    {
        Console.WriteLine("Loading All hash.");
        // The contents of NewHash will be included in AllHash anyway, so delete it.
        hashfile.DeleteNewerHash();
        long Count = await db.AllMediaHash(NewLastUpdate).ConfigureAwait(false);
        if (Count < 0) { Console.WriteLine("Hash load failed."); Environment.Exit(1); }
        Console.WriteLine("{0} Hash loaded.", Count);
        hashfile.LastHashCount = Count;
    }
    sw.Stop();
    Console.WriteLine("Hash Load: {0}ms", sw.ElapsedMilliseconds);
    sw.Restart();

    MediaHashSorter media = new MediaHashSorter(NewHash, db,
        config.hash.MaxHammingDistance,
        // Capped because MergeSorterBase cannot cope with SortMask zeroing only the most significant bit.
        Math.Min(config.hash.ExtraBlocks, 32 - config.hash.MaxHammingDistance));
    await media.Proceed().ConfigureAwait(false);
    sw.Stop();
    Console.WriteLine("Multiple Sort, Store: {0}ms", sw.ElapsedMilliseconds);

    hashfile.LastUpdate = NewLastUpdate;
}