In 2021 and 2022, when Amazon Science asked members of the Knowledge Discovery and Data Mining Conference (KDD) program committees to discuss the state of their field, the conversations turned to graph neural networks.
Graph learning remains the most popular topic at KDD 2023, but as Yizhou Sun (a professor of computer science at the University of California, Los Angeles; an Amazon Scholar; and a general chair of the conference) explains, that doesn’t mean the field has stood still.
Graph neural networks (GNNs) are machine learning models that produce embeddings, or vector representations, of graph nodes that capture information about the nodes’ relationships to other nodes. They can be used for graph-related tasks, such as predicting edges or labeling nodes, but they can also be used for any downstream processing task that simply takes advantage of the information encoded in graph structure.
But beyond that general definition, “what is meant by ‘graph neural network’ could be very different,” says Sun. “‘Graph neural network’ is a very broad term.”
For example, Sun explains, traditional GNNs use message passing to produce embeddings. Each node in the graph is embedded, and then each node receives the embeddings of its neighboring nodes (the passed messages), which it integrates into an updated embedding. Typically, this process is performed two to three times, so each node’s embedding captures information about its one- to three-hop neighborhood.
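As a rough illustration of the message-passing scheme Sun describes, here is a minimal sketch in plain NumPy; the toy chain graph, the mean-aggregation rule, and the function names are illustrative stand-ins rather than any particular library’s API.

```python
import numpy as np

def message_passing_layer(adj, H, W):
    """One round of message passing: each node aggregates (here, averages)
    its neighbors' embeddings, then applies a learned transform."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-9    # node degrees
    messages = adj @ H / deg                       # mean of neighbor embeddings
    return np.tanh((H + messages) @ W)             # combine with own embedding, transform

# Toy graph: 4 nodes in a chain 0-1-2-3 (symmetric adjacency matrix)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))                        # initial node embeddings (e.g., from node features)
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))

# Two rounds of message passing: each node's final embedding now reflects its
# two-hop neighborhood (node 0 "knows about" node 2, but not yet node 3).
H = message_passing_layer(adj, H, W1)
H = message_passing_layer(adj, H, W2)
```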
“If I pass a message, I can only collect information from my immediate neighbors,” Sun explains. “I have to go through many, many layers to model long-range dependencies. For some specific applications, such as software analysis or the simulation of physical systems, long-range dependency becomes critical.
“So people asked how we can change this architecture. They were inspired by the transformer” (the attention-based neural architecture that underlies today’s large language models) “because the transformer can be considered a special case of a graph neural network, where, within the input window, every token can attend to every other token.
“If every node can communicate with every other node in the graph, you can easily solve this long-range-dependency problem. But there will be two limitations. One is efficiency. Some graphs have many millions or even billions of nodes.”
The other concern, Sun explains, is that too much connection undermines the very point of graph representation. Graphs are useful because they capture meaningful relationships between nodes, which means omitting the meaningless ones. If every node in the graph communicates with every other node, the meaningful connections are diluted.
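To make the efficiency concern concrete, here is a hedged sketch of transformer-style full attention applied to node embeddings, in the same illustrative NumPy style as above: the score matrix has one entry for every pair of nodes, so memory and compute grow quadratically with the node count, which is workable for thousands of nodes but not for millions or billions.

```python
import numpy as np

def full_attention(H, Wq, Wk, Wv):
    """Transformer-style attention in which every node attends to every other node.
    The N x N score matrix is what makes this quadratic in the number of nodes."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])           # shape (N, N): one score per node pair
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over all nodes
    return weights @ V                               # every node mixes in every other node

rng = np.random.default_rng(0)
N, d = 1000, 16                                      # manageable at N = 1,000 nodes; the
H = rng.normal(size=(N, d))                          # N x N matrix is hopeless at billions
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = full_attention(H, Wq, Wk, Wv)                  # shape (N, d)
```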
To combat this problem, “people are trying to find a way to mimic the positional encodings of the text setting or the image setting,” says Sun. “In the text setting, we just turn the position into some encoding. And later, in the computer vision domain, people said, ‘Okay, let’s do that with image patches, too.’ So, for example, we can divide each image into six-by-six patches, and the relative locations of those patches can be turned into positional encodings.
“So the next question is, in the graph setting, how can we get that natural kind of relative position? There are different ways to do it, such as a random walk, which is a very simple one. And people also try to do eigendecomposition, where we utilize eigenvectors to encode the relative positions of these nodes. That is very time consuming, so again, it comes down to the efficiency issue.”
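One common instantiation of that eigendecomposition idea, sketched here under the assumption that the graph Laplacian’s low-frequency eigenvectors serve as the positional code (other variants exist), also shows where the cost comes from: a full eigendecomposition scales cubically with the number of nodes.

```python
import numpy as np

def laplacian_positional_encoding(adj, k):
    """Use the eigenvectors of the graph Laplacian associated with the k smallest
    nonzero eigenvalues as per-node positional encodings."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                              # (unnormalized) graph Laplacian
    _, eigvecs = np.linalg.eigh(lap)             # full eigendecomposition: O(N^3) time,
    return eigvecs[:, 1:k + 1]                   # which is the efficiency concern

# Chain graph 0-1-2-3-4: the encoding varies smoothly along the chain,
# so each node's code reflects roughly where it sits in the graph.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
pos = laplacian_positional_encoding(adj, k=2)    # one 2-dimensional position per node
```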
Efficiency
Indeed, Sun explains, improving the efficiency of GNNs is itself an active area of research, from high-level algorithmic design all the way down to the level of chip design.
“At the algorithm level, you can try some kind of sampling technique, just to make the number of operations smaller,” she says. “Or you can design more efficient algorithms to sparsify the graphs. For example, let’s say we do some kind of similarity search to keep only the most similar nodes for each target node. Then people can design some smart indexing techniques to do that very quickly.”
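A brute-force sketch of that sparsification step appears below; in practice, the exhaustive all-pairs similarity computation is exactly what the smart indexing (for instance, an approximate-nearest-neighbor index) would replace, but the keep-only-the-top-k-neighbors logic is the same. The feature dimensions and node counts are illustrative.

```python
import numpy as np

def sparsify_by_similarity(features, k):
    """Keep, for each node, only edges to its k most similar nodes
    (cosine similarity on node features), discarding all other edges."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T                          # brute-force all-pairs similarity
    np.fill_diagonal(sim, -np.inf)                   # no self-edges
    topk = np.argsort(-sim, axis=1)[:, :k]           # indices of k most similar nodes
    adj = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, topk] = 1.0                            # sparse adjacency: k edges per node
    return adj

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))                # 100 nodes, 16-dim features
sparse_adj = sparsify_by_similarity(features, k=5)   # at most 5 outgoing edges per node
```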
“And at the inference stage, we can do knowledge distillation to distill a very complicated model, let’s say a graph neural network, into a very simple graph neural network, or not necessarily a graph neural network, maybe just a very simple kind of structure, like an MLP [multilayer perceptron]. Then we can make inference much faster. Quantization can also be used at the inference stage to make the computation much faster.”
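The distillation idea can be sketched roughly as follows, assuming a teacher GNN has already been trained and the student is a plain MLP that learns to match the teacher’s softened predictions from node features alone; the class counts, dimensions, and random stand-in tensors are purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentMLP(nn.Module):
    """Graph-free student: at inference time it needs only node features,
    with no adjacency matrix and no message passing."""
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_classes))

    def forward(self, x):
        return self.net(x)

def distillation_step(student, optimizer, features, teacher_logits, T=2.0):
    """One training step: the student mimics the teacher GNN's softened predictions."""
    student_logits = student(features)
    loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random stand-ins for node features and the teacher's outputs.
features = torch.randn(64, 16)                  # 64 nodes, 16-dim features
teacher_logits = torch.randn(64, 7)             # teacher GNN's logits for 7 classes
student = StudentMLP(16, 32, 7)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss = distillation_step(student, opt, features, teacher_logits)
```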
“So that’s at the algorithm level. But nowadays people go deeper. Sometimes, if you want to solve the problem, you go to the system level. So people say, let’s see how we can design distributed systems to speed up training and speed up inference.
“For example, in some cases, memory becomes the biggest limitation. In that case, the only thing we can do is distribute the workload. Then the natural question is how we can coordinate or synchronize the model parameters trained by each computation node. If we need to distribute the data across 10 machines, how do we coordinate those 10 machines to make sure we end up with just one final version?”
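One simple answer to that coordination question is synchronous averaging: each machine computes gradients on its own data shard, the gradients are averaged across machines (an all-reduce), and every machine applies the same update, so the replicas never drift apart. The single-process simulation below is only a sketch of that idea; the 10-worker setup and the toy gradient function are assumptions, and a real system would use a collective-communication library.

```python
import numpy as np

NUM_WORKERS = 10
rng = np.random.default_rng(0)

# Shared model parameters, replicated on every "machine".
params = rng.normal(size=(8,))

def local_gradient(worker_id, params):
    """Stand-in for the gradient each worker computes on its own data shard."""
    shard_rng = np.random.default_rng(worker_id)
    return params * 0.1 + shard_rng.normal(scale=0.01, size=params.shape)

for step in range(100):
    # Each worker computes a gradient on its local shard ...
    grads = [local_gradient(w, params) for w in range(NUM_WORKERS)]
    # ... then the workers synchronize: average the gradients (an all-reduce)
    # so every machine applies the same update and keeps identical parameters.
    avg_grad = np.mean(grads, axis=0)
    params -= 0.05 * avg_grad                # one synchronized SGD step
```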
“And people are even going further and doing the acceleration on the hardware side. So software-hardware co-design is also becoming more and more popular. It requires people to really know many different fields.
“By the way, at KDD, compared to many other machine learning conferences, real-world applications are always our top focus. In many cases, to solve real-world problems, we have to talk to people with different backgrounds, because we can’t just reduce them to the idealized problems we solved when we were in school.”
Applications
In addition to such general efforts to improve GNNs’ versatility and accuracy, however, there is also new research on specific uses of GNN technology.
“There is some work on how we can do causal analysis in the graph setting, which means that the objects are actually interfering with each other,” explains Sun. “This is very different from the traditional framework, where, for example, the patients in a drug trial are independent of each other.
“There is also a new trend of combining deep representation learning with causal inference. For example, how can we handle a treatment that is a continuous vector rather than a binary treatment? Can we make the treatment continuous in time, so that it is not just a static, one-time treatment?
“Graphs can also be considered a good data structure for describing multi-agent dynamical systems: how these objects interact with each other in a dynamic-network setting. And then, how can we incorporate the generative idea into graphs? Graph generation is very useful in many fields, such as the pharmaceutical industry.
“And then there are so many applications where we can benefit from large language models [LLMs]. For example, knowledge graph reasoning. We know that LLMs hallucinate, while reasoning over KGs is very rigorous. What would be a good way to combine the two?
“With GNNs there are always new things. Graphs are just a very useful data structure to model our interconnected world.”