// Fire-and-forget entry point: async void is kept so this can be wired up as an
// event handler, but note that exceptions escape to the synchronization context.
private async void AggregateFeed(List<CrawlerSet> sets, CancellationToken feedCtsToken, bool force)
{
    try
    {
        var volatileParameters = new VolatileParametersBase
        {
            Page = 1,
            UseCache = !force
        };

        var tasks = new List<Task>();
        foreach (var crawlingSet in sets)
        {
            tasks.Add(CrawlDescriptor(
                crawlingSet.Descriptors,
                crawlingSet,
                volatileParameters,
                feedCtsToken));
        }

        await Task.WhenAll(tasks);

        // Branched tasks can queue further branched tasks (follow-up result
        // pages), so drain the list until it stops growing instead of awaiting
        // a single Task.WhenAll snapshot, which would miss late additions.
        while (true)
        {
            Task[] pending;
            lock (_branchedTasks)
            {
                if (_branchedTasks.Count == 0)
                {
                    break;
                }
                pending = _branchedTasks.ToArray();
                _branchedTasks.Clear();
            }
            await Task.WhenAll(pending);
        }
    }
    finally
    {
        _isAggregating = false;
    }

    Finished?.Invoke(this, EventArgs.Empty);
}
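Because `CrawlDescriptor` queues follow-up page crawls into `_branchedTasks` while the first round is still being awaited, the branched-task list can grow during the await, and a plain `Task.WhenAll` enumerates its input once up front. The drain pattern can be isolated into a small sketch; `TaskDrain` is a hypothetical helper, not part of the original code, and it uses a `ConcurrentQueue<Task>` instead of a `List<Task>` to sidestep locking:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class TaskDrain
{
    // Awaits every task in the queue, including tasks that are enqueued
    // while earlier tasks are being awaited (e.g. a page-2 crawl that
    // enqueues a page-3 crawl). Task.WhenAll alone would miss those.
    public static async Task DrainAsync(ConcurrentQueue<Task> tasks)
    {
        while (tasks.TryDequeue(out var task))
        {
            await task; // awaiting may cause further tasks to be enqueued
        }
    }
}
```

Swapping `_branchedTasks` to a concurrent collection like this would also remove the need to lock around `Add` in `CrawlDescriptor`.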
private async Task CrawlDescriptor(
    List<CrawlerDescriptor> descriptors,
    CrawlerSet crawlingSet,
    VolatileParametersBase volatileParameters,
    CancellationToken feedCtsToken)
{
    foreach (var descriptor in descriptors)
    {
        var semaphore = _domainSemaphores[descriptor.CrawlerDomain];
        try
        {
            // The Release in the inner finally runs only if the wait actually
            // acquired the semaphore; a cancelled WaitAsync previously fell
            // through to a Release that corrupted the semaphore count.
            await semaphore.WaitAsync(feedCtsToken);
            try
            {
                var crawler = _crawlerManager.GetCrawler(descriptor.CrawlerDomain);

                // Awaiting the async call directly; the previous Task.Run
                // wrapper added a thread-pool hop without any benefit.
                var result = await crawler.Crawl(
                    new CrawlerParameters(descriptor.CrawlerSourceParameters, volatileParameters),
                    feedCtsToken);

                if (result.Success)
                {
                    NewCrawlerBatch?.Invoke(this, new FeedBatch
                    {
                        CrawlerResult = result,
                        SetOfOrigin = crawlingSet
                    });
                }

                if (result.HasMore)
                {
                    // Queue the next page as a branched task. The task is
                    // already hot, so the old fire-and-forget Task.Run wrapper
                    // (and its CS4014 pragma) were redundant.
                    var task = CrawlDescriptor(
                        new List<CrawlerDescriptor> { descriptor },
                        crawlingSet,
                        new VolatileParametersBase
                        {
                            Page = volatileParameters.Page + 1,
                            UseCache = volatileParameters.UseCache
                        },
                        feedCtsToken);

                    // List<T> is not thread-safe and sibling crawls may add
                    // concurrently, so synchronize the mutation.
                    lock (_branchedTasks)
                    {
                        _branchedTasks.Add(task);
                    }
                }
            }
            finally
            {
                semaphore.Release();
            }
        }
        catch (OperationCanceledException)
        {
            // Base type of TaskCanceledException; also covers the
            // OperationCanceledException thrown by a cancelled WaitAsync,
            // which the original TaskCanceledException filter let escape.
            _logger.LogInformation("Cancelled descriptor crawling.");
        }
    }
}
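The per-domain throttling above hinges on one discipline: release the semaphore only when `WaitAsync` actually completed, since a cancelled wait never acquires it. That acquire/release shape can be distilled into a sketch; `ThrottledRunner` is a hypothetical helper for illustration, not part of the original code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Runs work under a SemaphoreSlim gate. A cancelled WaitAsync throws
    // before acquiring, so the finally (and its Release) is never entered
    // with an unacquired semaphore.
    public static async Task<T> RunAsync<T>(
        SemaphoreSlim gate, Func<Task<T>> work, CancellationToken token)
    {
        await gate.WaitAsync(token); // may throw OperationCanceledException
        try
        {
            return await work();
        }
        finally
        {
            gate.Release(); // reached only after a successful acquire
        }
    }
}
```

With the original layout (wait inside `try`, release in `finally`), a cancelled crawl would have incremented the semaphore count past its intended limit, gradually defeating the per-domain throttle.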