public TypedKeyIndex(InterGraphDataSink<TRecord> relation, Controller controller, Func<TRecord, TKey> keySelector, Func<TRecord, TValue> valueSelector)
{
    using (var computation = controller.NewComputation())
    {
        // Pair each record's key with its value and route pairs by key hash,
        // so all values for a key reach the same IndexBuilder vertex.
        var stream = computation.NewInput(relation.NewDataSource())
                                .Select(x => keySelector(x).PairWith(valueSelector(x)))
                                .NewUnaryStage((i, s) => new IndexBuilder(i, s), x => x.First.GetHashCode(), null, "IndexBuilder");

        result = new InterGraphDataSink<Fragment>(stream);

        computation.Activate();
        computation.Join();
    }
}
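// Sketch (hypothetical helper, not part of Naiad): an in-memory model of the
// routing used by TypedKeyIndex. The partition function
// x => x.First.GetHashCode() sends every (key, value) pair with the same key to
// the same IndexBuilder vertex, so each vertex can build its slice of the index
// without coordination. This sketch assumes a fixed worker count, uses
// KeyValuePair in place of Naiad's Pair, and maps hash codes to workers with a
// non-negative modulus; Naiad's own mapping from hash code to vertex may differ.
public static class HashPartitionedIndexSketch
{
    // Maps a (possibly negative) hash code to a worker index in [0, workers).
    public static int WorkerFor(int hashCode, int workers)
    {
        return ((hashCode % workers) + workers) % workers;
    }

    // Groups values by key within each worker's fragment.
    public static System.Collections.Generic.Dictionary<TKey, System.Collections.Generic.List<TValue>>[] Build<TKey, TValue>(
        System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey, TValue>> pairs,
        int workers)
    {
        var fragments = new System.Collections.Generic.Dictionary<TKey, System.Collections.Generic.List<TValue>>[workers];
        for (int w = 0; w < workers; w++)
        {
            fragments[w] = new System.Collections.Generic.Dictionary<TKey, System.Collections.Generic.List<TValue>>();
        }

        foreach (var pair in pairs)
        {
            // All pairs with equal keys land in the same fragment.
            var fragment = fragments[WorkerFor(pair.Key.GetHashCode(), workers)];

            System.Collections.Generic.List<TValue> values;
            if (!fragment.TryGetValue(pair.Key, out values))
            {
                values = new System.Collections.Generic.List<TValue>();
                fragment[pair.Key] = values;
            }
            values.Add(pair.Value);
        }

        return fragments;
    }
}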
public static InterGraphDataSink<TOutput> NewInterGraphStream<TRecord, TOutput>(this Controller controller, IEnumerable<TRecord> source, Func<Stream<TRecord, Epoch>, Stream<TOutput, Epoch>> transformation)
{
    InterGraphDataSink<TOutput> result;
    using (var computation = controller.NewComputation())
    {
        result = new InterGraphDataSink<TOutput>(transformation(source.AsNaiadStream(computation)));

        computation.Activate();
        computation.Join();
    }
    return result;
}
public DenseIntKeyIndex(string format, Controller controller, int parts)
{
    using (var computation = controller.NewComputation())
    {
        // One record per fragment index 0..parts-1, partitioned by the index itself,
        // so each fragment file (named by string.Format(format, x)) is loaded by one vertex.
        var stream = Enumerable.Range(0, parts)
                               .AsNaiadStream(computation)
                               .PartitionBy(x => x)
                               .Select(x => string.Format(format, x))
                               .NewUnaryStage((i, s) => new FragmentLoader(i, s, parts), null, null, "IndexLoader");

        result = new InterGraphDataSink<Fragment>(stream);

        computation.Activate();
        computation.Join();
    }
}
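// Sketch (hypothetical helper, not part of Naiad): which fragment files each
// worker would load under DenseIntKeyIndex. Fragment indices 0..parts-1 are
// partitioned by identity (PartitionBy(x => x)) and then formatted into file
// names, so each index is loaded by exactly one vertex. This sketch assumes
// index x is assigned to worker x % workers; that placement is an illustration,
// not Naiad's documented behavior.
public static class FragmentAssignmentSketch
{
    public static System.Collections.Generic.List<string>[] Assign(string format, int parts, int workers)
    {
        var assignment = new System.Collections.Generic.List<string>[workers];
        for (int w = 0; w < workers; w++)
        {
            assignment[w] = new System.Collections.Generic.List<string>();
        }

        for (int x = 0; x < parts; x++)
        {
            // Same naming scheme as the constructor: string.Format(format, x).
            assignment[x % workers].Add(string.Format(format, x));
        }

        return assignment;
    }
}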
public static InterGraphDataSink<TOutput> NewNaiadAzureInterGraphStream<TOutput>(this Controller controller, string containerName, string prefix)
{
    InterGraphDataSink<TOutput> result;
    using (var computation = controller.NewComputation())
    {
        var graphContainer = computation.DefaultBlobContainer(containerName);
        var data = computation.ReadBinaryFromAzureBlobs<TOutput>(graphContainer, prefix);

        result = new InterGraphDataSink<TOutput>(data);

        computation.Activate();
        computation.Join();
    }
    return result;
}
public static InterGraphDataSink<TOutput> NewBinaryAzureInterGraphStream<TInput, TOutput>(this Controller controller, string containerName, string prefix, Func<System.IO.Stream, IEnumerable<TInput>> transformation, Func<Stream<TInput, Epoch>, Stream<TOutput, Epoch>> trans)
{
    InterGraphDataSink<TOutput> result;
    using (var computation = controller.NewComputation())
    {
        // transformation decodes a blob's System.IO.Stream into TInput records;
        // trans is the dataflow applied to the decoded records.
        var graphContainer = computation.DefaultBlobContainer(containerName);
        var data = computation.ReadFromAzureBlobs(graphContainer, prefix, transformation);

        result = new InterGraphDataSink<TOutput>(trans(data));

        computation.Activate();
        computation.Join();
    }
    return result;
}
public static InterGraphDataSink<TOutput> NewAzureInterGraphStream<TOutput>(this Controller controller, string containerName, string prefix, Func<Stream<string, Epoch>, Stream<TOutput, Epoch>> transformation)
{
    InterGraphDataSink<TOutput> result;
    using (var computation = controller.NewComputation())
    {
        // Send console output and errors to blobs in the "naiad-output" container.
        controller.SetConsoleOut(computation.DefaultBlobContainer("naiad-output"), "output-{0}.txt");
        controller.SetConsoleError(computation.DefaultBlobContainer("naiad-output"), "error-{0}.txt");

        var source = computation.ReadTextFromAzureBlobs(computation.DefaultBlobContainer(containerName), prefix);
        result = new InterGraphDataSink<TOutput>(transformation(source));

        computation.Activate();
        computation.Join();
    }
    return result;
}
public EmptyKeyIndex(InterGraphDataSink<TRecord> relation, Controller controller, Func<TRecord, TValue> valueSelector)
{
    using (var computation = controller.NewComputation())
    {
        // Route values by hash so equal values reach the same HashSetBuilder vertex.
        var stream = computation.NewInput(relation.NewDataSource())
                                .Select(valueSelector)
                                .NewUnaryStage((i, s) => new HashSetBuilder(i, s), x => x.GetHashCode(), null, "HashSetBuilder");

        IndexFragmentStream = new InterGraphDataSink<HashSet<TValue>>(stream);

        // Sum the fragment sizes into a single total, then send one copy of
        // the total to every vertex of the HashSetBuilder stage.
        var localcount = 0L;
        stream.Select(x => x.Count)
              .Aggregate(x => true, x => (long)x, (x, y) => x + y, (k, c) => c)
              .SelectMany(c => stream.ForStage.Placement.Select(p => p.VertexId.PairWith(c)))
              .PartitionBy(x => x.First)
              .Subscribe((a, b, c) => { localcount = c.Single().Second; });

        computation.Activate();
        computation.Join();

        count = localcount;
    }
}
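// Sketch (hypothetical helper, not part of Naiad): the counting step in
// EmptyKeyIndex, modeled in memory. Because HashSetBuilder receives values
// partitioned by x.GetHashCode(), equal values meet at the same vertex; assuming
// each vertex emits one hash set, the sets are disjoint and the sum of their
// sizes is the number of distinct values. The total is then paired with every
// vertex id (the role of SelectMany over ForStage.Placement) so each vertex
// receives one copy. Plain arrays stand in for Naiad streams here.
public static class CountBroadcastSketch
{
    public static System.Collections.Generic.KeyValuePair<int, long>[] Broadcast(int[] fragmentSizes, int[] vertexIds)
    {
        // Global aggregation: one total for the whole stage.
        long total = 0;
        foreach (var size in fragmentSizes)
        {
            total += size;
        }

        // Broadcast: one (vertexId, total) pair per vertex.
        var messages = new System.Collections.Generic.KeyValuePair<int, long>[vertexIds.Length];
        for (int i = 0; i < vertexIds.Length; i++)
        {
            messages[i] = new System.Collections.Generic.KeyValuePair<int, long>(vertexIds[i], total);
        }
        return messages;
    }
}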
/// <summary>
/// Builds an index with an arbitrary type of key.
/// </summary>
/// <typeparam name="TKey">The type of the keys</typeparam>
/// <typeparam name="TValue">The type of the values</typeparam>
/// <typeparam name="TRecord">The type of the records</typeparam>
/// <param name="source">The source of records</param>
/// <param name="controller">The controller used to run the index-building computation</param>
/// <param name="keySelector">A function from record to key</param>
/// <param name="valueSelector">A function from record to value</param>
/// <returns>An index of values keyed by <typeparamref name="TKey"/></returns>
public static TypedKeyIndex<TKey, TValue, TRecord> ToTypedIndex<TKey, TValue, TRecord>(this InterGraphDataSink<TRecord> source, Controller controller, Func<TRecord, TKey> keySelector, Func<TRecord, TValue> valueSelector)
    where TValue : IComparable<TValue>
{
    return new TypedKeyIndex<TKey, TValue, TRecord>(source, controller, keySelector, valueSelector);
}
/// <summary>
/// Creates an index with a dense integer key.
/// </summary>
/// <typeparam name="TValue">The type of the values</typeparam>
/// <typeparam name="TRecord">The type of the source records</typeparam>
/// <param name="source">The source of records</param>
/// <param name="controller">The controller used to run the index-building computation</param>
/// <param name="keySelector">A function from record to dense integer key</param>
/// <param name="valueSelector">A function from record to value</param>
/// <returns>An index of values keyed by a dense integer</returns>
public static DenseKeyIndex<TValue, TRecord> ToDenseIndex<TValue, TRecord>(this InterGraphDataSink<TRecord> source, Controller controller, Func<TRecord, int> keySelector, Func<TRecord, TValue> valueSelector)
    where TValue : IComparable<TValue>
{
    return new DenseKeyIndex<TValue, TRecord>(source, controller, keySelector, valueSelector);
}
public Extender(InterGraphDataSink<Fragment> index, Func<TPrefix, int> tKey)
{
    this.Index = index;
    this.keySelector = tKey;
}
/// <summary>
/// Creates an index with a dense integer key and integer values.
/// </summary>
/// <typeparam name="TRecord">The type of the source records</typeparam>
/// <param name="source">The source of records</param>
/// <param name="controller">The controller used to run the index-building computation</param>
/// <param name="keySelector">A function from record to dense integer key</param>
/// <param name="valueSelector">A function from record to integer value</param>
/// <returns>An index of integer values keyed by a dense integer</returns>
public static DenseIntKeyIndex<TRecord> ToDenseKeyIndex<TRecord>(this InterGraphDataSink<TRecord> source, Controller controller, Func<TRecord, int> keySelector, Func<TRecord, int> valueSelector)
{
    return new DenseIntKeyIndex<TRecord>(source, controller, keySelector, valueSelector);
}
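// Sketch (hypothetical, not the layout of Naiad's DenseIntKeyIndex or Fragment):
// one common way to store an index whose keys are dense integers 0..keyCount-1
// is a compressed sparse row (CSR) layout: an offsets array plus a values array,
// where the values for key k occupy values[offsets[k] .. offsets[k + 1]). The
// IComparable<TValue> constraints on the other index builders suggest sorted
// values, so this sketch sorts each key's slice; that is an assumption.
// It also assumes keys.Length == vals.Length and every key is in range.
public sealed class DenseIndexSketch
{
    private readonly int[] offsets;
    private readonly int[] values;

    public DenseIndexSketch(int[] keys, int[] vals, int keyCount)
    {
        // Count values per key, shifted by one so a prefix sum yields start offsets.
        offsets = new int[keyCount + 1];
        foreach (var key in keys)
        {
            offsets[key + 1]++;
        }
        for (int k = 0; k < keyCount; k++)
        {
            offsets[k + 1] += offsets[k];
        }

        // Scatter each value into its key's slice.
        values = new int[vals.Length];
        var cursor = (int[])offsets.Clone();
        for (int i = 0; i < keys.Length; i++)
        {
            values[cursor[keys[i]]++] = vals[i];
        }

        // Sort each slice so lookups return values in ascending order.
        for (int k = 0; k < keyCount; k++)
        {
            System.Array.Sort(values, offsets[k], offsets[k + 1] - offsets[k]);
        }
    }

    // Returns a copy of the (sorted) values stored under key.
    public int[] Lookup(int key)
    {
        var result = new int[offsets[key + 1] - offsets[key]];
        System.Array.Copy(values, offsets[key], result, 0, result.Length);
        return result;
    }
}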
static void ExecuteNaiad(string[] args, string dataDir, string uriBase)
{
    string ukFile = Path.Combine(dataDir, @"uk-2007-05");
    string twitterFile = Path.Combine(dataDir, @"twitter_rv.bin");
    string livejournalFile = Path.Combine(dataDir, @"livejournal.bin");

    var configuration = Configuration.FromArgs(ref args);

    // args[1] selects the algorithm and args[2] the dataset; later arguments are algorithm-specific.
    var algorithm = args[1];
    var dataset = args[2];

    #region file partitioning
    if (algorithm == "partition" && dataset == "twitter")
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();

        using (var computation = NewComputation.FromConfig(configuration))
        {
            int parts = Int32.Parse(args[3]);
            var format = Path.Combine(dataDir, @"twitter-part-{0}-of-" + (parts * parts).ToString());

            computation.LoadGraph(twitterFile)
                       .Partition(parts, parts)
                       .WriteBinaryToFiles(format);

            computation.Activate();
            computation.Join();
        }

        Console.WriteLine(stopwatch.Elapsed);
    }

    if (algorithm == "repartition" && dataset == "twitter")
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();

        using (var computation = NewComputation.FromConfig(configuration))
        {
            int parts = Int32.Parse(args[3]);

            computation.ReadHdfsBinaryCollection<Edge>(new Uri(uriBase + "twitter-10"))
                       .Partition(parts, parts)
                       .WriteHdfsBinary(new Uri(uriBase + "twitter-" + parts), 1024 * 1024, -1L, 100L * 1024L * 1024L * 1024L);

            computation.Activate();
            computation.Join();
        }

        Console.WriteLine(stopwatch.Elapsed);
    }

    if (algorithm == "compact" && dataset == "twitter")
    {
        using (var computation = NewComputation.FromConfig(configuration))
        {
            var edges = System.IO.File.OpenRead(twitterFile)
                                      .ReadEdges()
                                      .AsNaiadStream(computation);

            using (var renamer = new AutoRenamer<Int32>())
            {
                var newEdges = edges.RenameUsing(renamer, edge => edge.source)
                                    .Select(x => new Edge(x.node, x.value.target))
                                    .RenameUsing(renamer, edge => edge.target)
                                    .Select(x => new Edge(x.value.source, x.node));

                edges = newEdges.FinishRenaming(renamer);
            }

            computation.Activate();
            computation.Join();
        }
    }
    #endregion

    #region page rank
    if (algorithm == "pagerank" && dataset == "twitter")
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();

        using (var computation = NewComputation.FromConfig(configuration))
        {
            computation.OnFrontierChange += (x, y) =>
            {
                Console.WriteLine(System.DateTime.Now + "\t" + string.Join(", ", y.NewFrontier));

                // Forces a full garbage collection at each frontier change.
                System.GC.GetTotalMemory(true);
            };

            var edges = System.IO.File.OpenRead(twitterFile)
                                      .ReadEdges()
                                      .AsNaiadStream(computation);

            edges.PageRank(20, "twitter").Subscribe();

            computation.Activate();
            computation.Join();
        }

        Console.WriteLine(stopwatch.Elapsed);
    }

    if (algorithm == "pagerank" && dataset == "livejournal")
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();

        using (var computation = NewComputation.FromConfig(configuration))
        {
            computation.OnFrontierChange += (x, y) =>
            {
                Console.WriteLine(System.DateTime.Now + "\t" + string.Join(", ", y.NewFrontier));
            };

            var edges = System.IO.File.OpenRead(livejournalFile)
                                      .ReadEdges()
                                      .AsNaiadStream(computation);

            edges.PageRank(20, "livejournal").Subscribe();

            computation.Activate();
            computation.Join();
        }

        Console.WriteLine(stopwatch.Elapsed);
    }
    #endregion

    #region connected components
    if (algorithm == "connectedcomponents" && dataset == "uk-2007-05")
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();

        using (var computation = NewComputation.FromConfig(configuration))
        {
            var format = Path.Combine(dataDir, @"uk-2007-05-part-{0}-of-{1}");

            var extraInput = new[] { string.Format(format, 3, 4) }.AsNaiadStream(computation)
                                                                  .PartitionBy(x => 3)
                                                                  .ReadGraph();

            computation.LoadGraph(format, 3, 4)
                       .UnionFind(106000000)
                       .PartitionBy(x => 3)
                       .Concat(extraInput)
                       .UnionFind(106000000);

            computation.Activate();
            computation.Join();
        }

        Console.WriteLine(stopwatch.Elapsed);
    }

    if (algorithm == "connectedcomponents" && dataset == "twitter")
    {
        using (Microsoft.Research.Peloponnese.Hdfs.HdfsInstance hdfs = new Microsoft.Research.Peloponnese.Hdfs.HdfsInstance(new Uri(uriBase)))
        {
            // HDFS needs to be initialized from the main thread before distributed use
            bool exists = hdfs.IsFileExists("/dummy");
        }

        var readWatch = System.Diagnostics.Stopwatch.StartNew();

        using (var controller = NewController.FromConfig(configuration))
        {
            using (var readComputation = controller.NewComputation())
            {
                // args[3] selects the edge grouping (sp, pp, op, hp or hhp);
                // args[4], args[5] and args[6] give parts, machines and another (default 1).
                int parts = (args.Length > 4) ? Int32.Parse(args[4]) : 1;
                int machines = (args.Length > 5) ? Int32.Parse(args[5]) : 1;
                int another = (args.Length > 6) ? Int32.Parse(args[6]) : 1;

                var format = new Uri(@uriBase + "twitter-40");
                var collection = readComputation.ReadHdfsBinaryCollection<Edge>(format);

                Stream<int[], Epoch> readStuff = null;
                switch (args[3])
                {
                    case "sp":
                        readStuff = collection.GroupEdgesSingleProcess(parts, parts);
                        break;
                    case "pp":
                        readStuff = collection.GroupEdgesPartsPerProcess(parts, parts, 16);
                        break;
                    case "op":
                        readStuff = collection.GroupEdgesOnePerProcess(parts, parts, 16);
                        break;
                    case "hp":
                        readStuff = collection.GroupEdgesHierarchyPerProcess(parts, machines, 16);
                        break;
                    case "hhp":
                        readStuff = collection.GroupEdgesHierarchyPerProcess(parts, machines * another, 16);
                        break;
                    default:
                        throw new ApplicationException("Grouping type must be sp, pp, op, hp or hhp");
                }

                var sink = new InterGraphDataSink<int[]>(readStuff);

                readComputation.Activate();
                readComputation.Join();

                Console.WriteLine("Reading done: " + readWatch.Elapsed);

                for (int i = 0; i < 20; ++i)
                {
                    var stopwatch = System.Diagnostics.Stopwatch.StartNew();

                    using (var computation = controller.NewComputation())
                    {
                        var firstStage = computation.NewInput(sink.NewDataSource())
                                                    .ReformatInts();

                        if (parts * machines * another > 1)
                        {
                            firstStage = firstStage.UnionFindStruct(65000000, parts * machines * another, machines * another);
                        }

                        switch (args[3])
                        {
                            case "sp":
                                firstStage.PartitionBy(x => parts * parts)
                                          .UnionFind(65000000);
                                break;
                            case "pp":
                                firstStage.PartitionBy(x => 16 * parts)
                                          .UnionFind(65000000);
                                break;
                            case "op":
                                firstStage.PartitionBy(x => 16 * (parts * parts))
                                          .UnionFind(65000000);
                                break;
                            case "hp":
                                if (parts * parts < 16)
                                {
                                    firstStage.PartitionBy(x => 16 * x.destination + (parts * parts))
                                              .UnionFindStruct(65000000, 0, 0)
                                              .PartitionBy(x => 16 * (machines * machines))
                                              .UnionFind(65000000);
                                }
                                else
                                {
                                    firstStage.PartitionBy(x => 16 * (x.destination + (machines * machines)))
                                              .UnionFindStruct(65000000, 0, 0)
                                              .PartitionBy(x => 16 * ((machines * machines) + (machines * machines)))
                                              .UnionFind(65000000);
                                }
                                break;
                            case "hhp":
                                firstStage.PartitionBy(x => 16 * ((x.destination / (machines * machines)) + (machines * machines * another * another)) + (x.destination % (machines * machines)))
                                          .UnionFindStruct(65000000, -machines * another, another)
                                          .PartitionBy(x => 16 * (x.destination + (another * another) + (machines * machines * another * another)))
                                          .UnionFindStruct(65000000, -another, 1)
                                          .PartitionBy(x => 16 * ((another * another) + (another * another) + (machines * machines * another * another)))
                                          .UnionFind(65000000);
                                break;
                            default:
                                throw new ApplicationException("Grouping type must be sp, pp, op, hp or hhp");
                        }

                        computation.Activate();
                        computation.Join();
                    }

                    Console.WriteLine(stopwatch.Elapsed);
                }
            }

            controller.Join();
        }
    }

    if (algorithm == "hashtablecc" && dataset == "twitter")
    {
        using (Microsoft.Research.Peloponnese.Hdfs.HdfsInstance hdfs = new Microsoft.Research.Peloponnese.Hdfs.HdfsInstance(new Uri(uriBase)))
        {
            // HDFS needs to be initialized from the main thread before distributed use
            bool exists = hdfs.IsFileExists("/dummy");
        }

        var readWatch = System.Diagnostics.Stopwatch.StartNew();

        using (var controller = NewController.FromConfig(configuration))
        {
            using (var readComputation = controller.NewComputation())
            {
                int parts = (args.Length > 4) ? Int32.Parse(args[4]) : 1;
                int machines = (args.Length > 5) ? Int32.Parse(args[5]) : 1;
                int another = (args.Length > 6) ? Int32.Parse(args[6]) : 1;

                var format = new Uri(@uriBase + "twitter-40");
                var collection = readComputation.ReadHdfsBinaryCollection<Edge>(format);

                Stream<int[], Epoch> readStuff = null;
                switch (args[3])
                {
                    case "sp":
                        readStuff = collection.GroupEdgesSingleProcess(parts, parts);
                        break;
                    case "pp":
                        readStuff = collection.GroupEdgesPartsPerProcess(parts, parts, 16);
                        break;
                    case "op":
                        readStuff = collection.GroupEdgesOnePerProcess(parts, parts, 16);
                        break;
                    case "hp":
                        readStuff = collection.GroupEdgesHierarchyPerProcess(parts, machines, 16);
                        break;
                    case "hhp":
                        readStuff = collection.GroupEdgesHierarchyPerProcess(parts, machines * another, 16);
                        break;
                    default:
                        throw new ApplicationException("Grouping type must be sp, pp, op, hp or hhp");
                }

                var sink = new InterGraphDataSink<int[]>(readStuff);

                readComputation.Activate();
                readComputation.Join();

                Console.WriteLine("Reading done: " + readWatch.Elapsed);

                for (int i = 0; i < 20; ++i)
                {
                    var stopwatch = System.Diagnostics.Stopwatch.StartNew();

                    using (var computation = controller.NewComputation())
                    {
                        var firstStage = computation.NewInput(sink.NewDataSource())
                                                    .ReformatInts()
                                                    .UnionFindHashTable(65000000, parts * machines * another, machines * another);

                        switch (args[3])
                        {
                            case "sp":
                                firstStage.PartitionBy(x => parts * parts)
                                          .UnionFind(65000000);
                                break;
                            case "pp":
                                firstStage.PartitionBy(x => 16 * parts)
                                          .UnionFind(65000000);
                                break;
                            case "op":
                                firstStage.PartitionBy(x => 16 * (parts * parts))
                                          .UnionFind(65000000);
                                break;
                            case "hp":
                                if (parts * parts < 16)
                                {
                                    firstStage.PartitionBy(x => 16 * x.destination + (parts * parts))
                                              .UnionFindStruct(65000000, 0, 0)
                                              .PartitionBy(x => 16 * (machines * machines))
                                              .UnionFind(65000000);
                                }
                                else
                                {
                                    firstStage.PartitionBy(x => 16 * (x.destination + (machines * machines)))
                                              .UnionFindStruct(65000000, 0, 0)
                                              .PartitionBy(x => 16 * ((machines * machines) + (machines * machines)))
                                              .UnionFind(65000000);
                                }
                                break;
                            case "hhp":
                                firstStage.PartitionBy(x => 16 * ((x.destination / (machines * machines)) + (machines * machines * another * another)) + (x.destination % (machines * machines)))
                                          .UnionFindStruct(65000000, -machines * another, another)
                                          .PartitionBy(x => 16 * (x.destination + (another * another) + (machines * machines * another * another)))
                                          .UnionFindStruct(65000000, -another, 1)
                                          .PartitionBy(x => 16 * ((another * another) + (another * another) + (machines * machines * another * another)))
                                          .UnionFind(65000000);
                                break;
                            default:
                                throw new ApplicationException("Grouping type must be sp, pp, op, hp or hhp");
                        }

                        computation.Activate();
                        computation.Join();
                    }

                    Console.WriteLine(stopwatch.Elapsed);
                }
            }

            controller.Join();
        }
    }

    if (algorithm == "hashtableonlycc" && dataset == "twitter")
    {
        using (Microsoft.Research.Peloponnese.Hdfs.HdfsInstance hdfs = new Microsoft.Research.Peloponnese.Hdfs.HdfsInstance(new Uri(uriBase)))
        {
            // HDFS needs to be initialized from the main thread before distributed use
            bool exists = hdfs.IsFileExists("/dummy");
        }

        var readWatch = System.Diagnostics.Stopwatch.StartNew();

        using (var controller = NewController.FromConfig(configuration))
        {
            using (var readComputation = controller.NewComputation())
            {
                int parts = (args.Length > 4) ? Int32.Parse(args[4]) : 1;
                int machines = (args.Length > 5) ? Int32.Parse(args[5]) : 1;
                int another = (args.Length > 6) ? Int32.Parse(args[6]) : 1;

                var format = new Uri(@uriBase + "twitter-40");
                var collection = readComputation.ReadHdfsBinaryCollection<Edge>(format);

                Stream<int[], Epoch> readStuff = null;
                switch (args[3])
                {
                    case "sp":
                        readStuff = collection.GroupEdgesSingleProcess(parts, parts);
                        break;
                    case "pp":
                        readStuff = collection.GroupEdgesPartsPerProcess(parts, parts, 16);
                        break;
                    case "op":
                        readStuff = collection.GroupEdgesOnePerProcess(parts, parts, 16);
                        break;
                    case "hp":
                        readStuff = collection.GroupEdgesHierarchyPerProcess(parts, machines, 16);
                        break;
                    case "hhp":
                        readStuff = collection.GroupEdgesHierarchyPerProcess(parts, machines * another, 16);
                        break;
                    default:
                        throw new ApplicationException("Grouping type must be sp, pp, op, hp or hhp");
                }

                var sink = new InterGraphDataSink<int[]>(readStuff);

                readComputation.Activate();
                readComputation.Join();

                Console.WriteLine("Reading done: " + readWatch.Elapsed);

                for (int i = 0; i < 20; ++i)
                {
                    var stopwatch = System.Diagnostics.Stopwatch.StartNew();

                    using (var computation = controller.NewComputation())
                    {
                        var firstStage = computation.NewInput(sink.NewDataSource())
                                                    .ReformatInts();

                        if (parts * machines * another > 1)
                        {
                            firstStage = firstStage.UnionFindHashTable(65000000, parts * machines * another, machines * another);
                        }

                        switch (args[3])
                        {
                            case "sp":
                                firstStage.PartitionBy(x => parts * parts)
                                          .UnionFindHashTable(65000000);
                                break;
                            case "pp":
                                firstStage.PartitionBy(x => 16 * parts)
                                          .UnionFindHashTable(65000000);
                                break;
                            case "op":
                                firstStage.PartitionBy(x => 16 * (parts * parts))
                                          .UnionFindHashTable(65000000);
                                break;
                            case "hp":
                                if (parts * parts < 16)
                                {
                                    firstStage.PartitionBy(x => 16 * x.destination + (parts * parts))
                                              .UnionFindHashTable(65000000, 0, 0)
                                              .PartitionBy(x => 16 * (machines * machines))
                                              .UnionFindHashTable(65000000);
                                }
                                else
                                {
                                    firstStage.PartitionBy(x => 16 * (x.destination + (machines * machines)))
                                              .UnionFindHashTable(65000000, 0, 0)
                                              .PartitionBy(x => 16 * ((machines * machines) + (machines * machines)))
                                              .UnionFindHashTable(65000000);
                                }
                                break;
                            case "hhp":
                                firstStage.PartitionBy(x => 16 * ((x.destination / (machines * machines)) + (machines * machines * another * another)) + (x.destination % (machines * machines)))
                                          .UnionFindHashTable(65000000, -machines * another, another)
                                          .PartitionBy(x => 16 * (x.destination + (another * another) + (machines * machines * another * another)))
                                          .UnionFindHashTable(65000000, -another, 1)
                                          .PartitionBy(x => 16 * ((another * another) + (another * another) + (machines * machines * another * another)))
                                          .UnionFindHashTable(65000000);
                                break;
                            default:
                                throw new ApplicationException("Grouping type must be sp, pp, op, hp or hhp");
                        }

                        computation.Activate();
                        computation.Join();
                    }

                    Console.WriteLine(stopwatch.Elapsed);
                }
            }

            controller.Join();
        }
    }

    if (algorithm == "connectedcomponents" && dataset == "livejournal")
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();

        using (var computation = NewComputation.FromConfig(configuration))
        {
            var edges = System.IO.File.OpenRead(livejournalFile)
                                      .ReadEdges()
                                      .AsNaiadStream(computation);

            edges.UnionFind(5000000)
                 .PartitionBy(x => 0)
                 .UnionFind(5000000);

            computation.Activate();
            computation.Join();
        }

        Console.WriteLine(stopwatch.Elapsed);
    }
    #endregion
}
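// Sketch (hypothetical helpers, not Naiad's UnionFind operators): the
// connected-components runs in ExecuteNaiad apply union-find locally,
// re-partition the output to fewer workers, and apply union-find again (for
// example UnionFind(5000000).PartitionBy(x => 0).UnionFind(5000000) for
// livejournal). Feeding the output back into UnionFind suggests it emits edges,
// presumably the edges that merged two components. Such edges form a spanning
// forest of each partition, and the union of those forests has the same
// connected components as the full edge set. The numeric argument to UnionFind
// appears to be a node-id capacity; here it is the nodes parameter. Edges are
// modeled as int[] { source, target }.
public sealed class UnionFindSketch
{
    private readonly int[] parent;
    private readonly int[] rank;

    public UnionFindSketch(int nodes)
    {
        parent = new int[nodes];
        rank = new int[nodes];
        for (int i = 0; i < nodes; i++)
        {
            parent[i] = i;
        }
    }

    // Finds the root of x, halving the path as it goes.
    public int Find(int x)
    {
        while (parent[x] != x)
        {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Merges the components of a and b (union by rank); returns false if already joined.
    public bool Union(int a, int b)
    {
        int ra = Find(a), rb = Find(b);
        if (ra == rb)
        {
            return false;
        }
        if (rank[ra] < rank[rb])
        {
            var tmp = ra; ra = rb; rb = tmp;
        }
        parent[rb] = ra;
        if (rank[ra] == rank[rb])
        {
            rank[ra]++;
        }
        return true;
    }
}

public static class HierarchicalComponentsSketch
{
    // Local pass: keep only the edges that merged two components.
    public static System.Collections.Generic.List<int[]> LocalForest(System.Collections.Generic.IEnumerable<int[]> edges, int nodes)
    {
        var uf = new UnionFindSketch(nodes);
        var forest = new System.Collections.Generic.List<int[]>();
        foreach (var edge in edges)
        {
            if (uf.Union(edge[0], edge[1]))
            {
                forest.Add(edge);
            }
        }
        return forest;
    }

    // Final pass: union-find over the concatenated local forests.
    public static UnionFindSketch Components(System.Collections.Generic.IEnumerable<int[]>[] partitions, int nodes)
    {
        var global = new UnionFindSketch(nodes);
        foreach (var partition in partitions)
        {
            foreach (var edge in LocalForest(partition, nodes))
            {
                global.Union(edge[0], edge[1]);
            }
        }
        return global;
    }
}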
public EmptyPrefixExtender(InterGraphDataSink<HashSet<TValue>> stream, Int64 count)
{
    this.Stream = stream;
    this.Count = count;
}
/// <summary>
/// Creates an index with no key (only a collection of values).
/// </summary>
/// <typeparam name="TValue">The type of the values</typeparam>
/// <typeparam name="TRecord">The type of the source records</typeparam>
/// <param name="source">The source of records</param>
/// <param name="controller">The controller used to run the index-building computation</param>
/// <param name="valueSelector">A function from record to value</param>
/// <returns>An index of values with no key</returns>
public static EmptyKeyIndex<TValue, TRecord> ToEmptyIndex<TValue, TRecord>(this InterGraphDataSink<TRecord> source, Controller controller, Func<TRecord, TValue> valueSelector)
    where TValue : IComparable<TValue>
{
    return new EmptyKeyIndex<TValue, TRecord>(source, controller, valueSelector);
}