
Graph mining was "so hot right now" ten years ago. Remember GraphX (https://spark.apache.org/graphx/) and GraphLab (https://en.wikipedia.org/wiki/GraphLab)? Or graph databases?

I guess it coincided with the social network phenomenon. Much more recently geometric learning (ML on graphs and other structures) shone, until LLMs stole their thunder. I still think geometric learning has a lot of life left in it, and I would like to see it gain popularity.



There are "graph databases" which see graphs as a universal approach to data, see RDF and SPARQL and numerous pretenders. For that matter, think of a C program where the master data structure is a graph of pointers. In a graph like that there is usually a huge number of different edge types such as "is married to", "has yearly average temperature", ...

Then there are "graph algorithms" such as PageRank, graph centrality, and such. In a lot of those cases there is one edge type or a small number of edge types.

There are some generic algorithms you can apply to graphs with many typed edges, such as the magic SPARQL pattern

  ?s1 ?p ?o .
  ?s2 ?p ?o .
which finds ?s1 and ?s2 that share a relationship ?p with some ?o, and is the basis for a similarity metric between ?s1 and ?s2. Then there are the cases where you pick out nodes with some specific ?p and apply some graph algorithm to those.
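That shared-(?p, ?o) join can be sketched in plain Python with triples as tuples. The data and names here are illustrative, not from any real store:

```python
from collections import defaultdict

# A toy triple store: (subject, predicate, object) tuples.
triples = [
    ("alice", "livesIn", "berlin"),
    ("bob",   "livesIn", "berlin"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
    ("bob",   "worksAt", "globex"),
]

# Index subjects by the (predicate, object) pairs they appear with --
# the same join that ?s1 ?p ?o . ?s2 ?p ?o . performs.
by_po = defaultdict(set)
for s, p, o in triples:
    by_po[(p, o)].add(s)

# Count shared (p, o) pairs per subject pair as a crude similarity.
similarity = defaultdict(int)
for subjects in by_po.values():
    for s1 in subjects:
        for s2 in subjects:
            if s1 < s2:
                similarity[(s1, s2)] += 1

print(dict(similarity))  # {('alice', 'bob'): 1, ('alice', 'carol'): 1}
```

In a real engine the `by_po` index is what the query planner would build (or already have); the nested loop over each bucket is where the quadratic blowup the next paragraph complains about comes from.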

The thing about graphs is that, in general, they are amorphous and could have any structure (or lack of structure) at all, which can be a disaster from a memory latency perspective. Specific graphs usually do have some structure with some locality. There was a time I was using that magic SPARQL pattern and wrote a program that would have taken 100 years to run; then I repacked the data structures, discovered an approximation, and got the calculation to run in 20 minutes.

Thus practitioners tend to be skeptical about general purpose graph processing libraries: you may very well have a problem where I could code up a special-purpose answer, one that runs 1000x faster, in less time than you'll spend fighting with that library's build system.

----

If you really want to be fashionable though, arXiv today is just crammed with papers about "graph neural networks" that never seem to get hyped elsewhere. YOShInOn has made me a long queue of GNN papers to look at, but I've only skimmed a few. A lot of articles say they can be applied to the text analysis problems I do, but they don't seem to really perform better than the system YOShInOn and I use, so I haven't been in a hurry to get into them.


> a universal approach to data, see RDF and SPARQL and numerous pretenders. For that matter, think of a C program where the master data structure is a graph of pointers.

A graph of typed pointers. As you likely know, the basic element of RDF is not “foo has a relationship with bar”, but “foo has a relationship with bar of type baz”.

Also, the types themselves can be part of relationships, as in “baz has a relationship with quux of type foobar”.
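Using the hypothetical names from the comment, that point can be sketched with plain tuples (nothing here is a real vocabulary):

```python
# Triples as (subject, predicate, object) tuples.
triples = [
    ("foo", "baz", "bar"),      # foo has a relationship with bar, of type baz
    ("baz", "foobar", "quux"),  # the type baz itself is in a relationship
]

predicates = {p for _, p, _ in triples}
subjects = {s for s, _, _ in triples}

# "baz" is simultaneously an edge type and a node in the graph:
print("baz" in predicates and "baz" in subjects)  # True
```

This is the property that makes RDF graphs self-describing, and also what makes them awkward to map onto libraries that assume a fixed node/edge schema.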

> The thing about graphs is, in general, they are amorphous and could have any structure (or lack of structure) at all which can be a disaster from a memory latency perspective

But that’s an implementation detail ;-)

In theory, the engine you use to store the graph could automatically optimize memory layout for both the data and the types of query that are run on it.

Practice is different.

> Thus practitioners tend to be skeptical about general purpose graph processing libraries

I am, too. I think the thing they're mostly good for is producing PhDs, both on the theory of querying them (ignoring performance) and on improving the performance of implementations.


Funny, the core table of salesforce.com is triples but they got a patent circa 2000 on a system that builds indexes and materializes views based on query profiling so the performance is good (w/ gold plated hardware). That patent is one reason why graph databases sucked for a long time.

Now the Lotus Notes patents have long since expired, so I'd like to see some graph-database-based products that can do what Notes did 30 years ago, but it is lost technology, like the pyramids, Stonehenge, and how to make HTML form applications without React.


> foo has a relationship with bar of type baz

Nope, "of type baz" is not required.


Depends on the perspective. The predicate will always be an IRI. The object will either be an IRI or a literal, and all literals in RDF (as of RDF 1.1) are typed, though serialization formats like Turtle work with implied types.

There is also the option of blank nodes for objects, though in almost all implementations they are stand-ins for anonymous IRIs, so in some sense or another almost anything has "a" type.


God the word "type" is overloaded in common discourse. For a link

   ?s ?p ?o .
you could say this is a link of "type" ?p, but technically in RDF we say ?s is of type ?type if

   ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
which can be abbreviated as

   ?s a ?type .
If you have RDFS inference turned on, notably, just using ?p in a triple will imply

   ?p a rdf:Property .
In plain RDF you can say a property is of some other ?type, but I think you can get in trouble with that if you want to do OWL DL inference, in which case you might want to say something like

   ?p rdfs:subPropertyOf :SomeSpecialKindOfProperty .
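The "just using ?p in a triple implies ?p a rdf:Property" rule above can be sketched in a few lines of plain Python (IRIs abbreviated to strings, names illustrative):

```python
# Abbreviated IRIs as plain strings; a real store would use full IRIs.
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"

triples = {
    ("alice", "knows", "bob"),
    ("alice", RDF_TYPE, "Person"),
}

# The rule sketched above: anything used in predicate position
# is inferred to be an rdf:Property.
inferred = {(p, RDF_TYPE, RDF_PROPERTY) for _, p, _ in triples}

print(("knows", RDF_TYPE, RDF_PROPERTY) in inferred)  # True
```

Note that rdf:type itself appears in predicate position, so the rule also infers `rdf:type a rdf:Property`, which is consistent with how real RDFS reasoners behave.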


If baz is meant to be the relationship, then I was low-key wrong in my comment (I thought baz was the kind of bar, which can be untyped). But I guess the relationship must be a property at least.


1. Graph algorithms like the ones you mentioned are processed not by graph databases like Neo4j, but graph processing libraries like the titular Google library.

2. Geometric learning is the broader category that subsumes graph neural networks.

https://geometricdeeplearning.com/


Depends, some graph databases have some support for graph algorithms.

I’ll also say I think graph algorithms are overrated, I mean you know the diameter of some graph: who cares? Physicists (like me back in the day) are notorious for plotting some statistics on log-log paper, seeing that the points sorta kinda fall on a line if you squint and decide that three of the points are really bug splats and then yelling “UNIVERSIALITY” and sending it to Physical Review E but the only thing that is universal is that none of them have ever heard of a statistical estimator or a significance test for power law distributions. Node 7741 is the “most central” node, but does that make a difference? Maybe if you kill the top 1% central nodes that will disrupt the terrorist network but for most of us I don’t see high quality insights coming out of graph algorithms most of the time.


> Physicists (like me back in the day) are notorious for plotting some statistics on log-log paper...

For people who've missed it: So You Think You Have a Power Law — Well Isn't That Special? (http://bactra.org/weblog/491.html) :)


I still use NetworkX a lot when a problem is best solved with graph analysis, I really enjoy the DevEx of that package.



