adesso Blog

A practical application example of a Neo4J graph database

Cobol? Copybook? Assembler? JCL? Many stop listening when they hear these terms. It’s hardly surprising considering that the history of Cobol dates back to 1959.

When you think of legacy code, frustration, a fear of change and having to deal with disproportionate amounts of complexity immediately come to mind. For a company, however, it also means the opportunity for digital transformation. A chance to transform a system that has grown for decades into one built on an important cornerstone: clean software architecture. Having an in-depth analysis is a must if you want to get an overview of your legacy system.

Using Neo4J with at|analyze

In this post, I explain how to get an overview of a legacy system using the analysis tool at|analyze and a Neo4J graph database.

Neo4J belongs to the class of NoSQL databases. NoSQL stands for ‘Not only SQL’ and refers to database systems that break with the characteristics of typical relational databases. Because they do not impose the rigid schemas of relational databases, NoSQL systems are very flexible in how they can be used and are well suited to large amounts of data.

For the at|analyze use case, a graph database has one major advantage over relational database systems: hierarchical and networked structures are much simpler to map. This makes Neo4J a very good fit for mapping the complex program structures of legacy systems.

What is Cypher?

Just as SQL is the standard query language for relational databases, Cypher is an open, multi-vendor query language for graph technologies. The Cypher query language is used in Neo4J. It is a declarative graph query language that enables users to create expressive and efficient queries as well as update and manage graphs.

Cypher was inspired by different languages. Many of the keywords such as WHERE and ORDER BY are inspired by SQL. Pattern matching borrows expression approaches from SPARQL. Some of the list semantics are borrowed from languages such as Haskell and Python.

Structure of Neo4J nodes, relations, labels and properties

The starting point of a graph is a node. Nodes represent the entities of a domain. A node can have zero or more labels and zero or more properties. Labels are used to group (in other words, classify) nodes into sets. A node can likewise have no relations or several.

The simplest graph to map would be a single node with no relation.

Structure of a Neo4J node

The labels are Program and CobolProgram; the properties are the name 'CblProgram001' and linesOfCode: 2319.
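As a minimal sketch, a node like the one in the figure could be created and read back with Cypher as follows. The property key `name` is an assumption taken from the queries later in this post; the figure description only gives the values.

```cypher
// Create one node with two labels and two properties.
// The property key `name` is assumed; `linesOfCode` comes from the figure.
CREATE (p:Program:CobolProgram {name: 'CblProgram001', linesOfCode: 2319})
RETURN labels(p) AS labels, p.name AS name, p.linesOfCode AS linesOfCode;
```

A node without any labels or properties would simply be `CREATE ()`.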

A slightly more complex graph would be two Cobol programs connected by several nodes and relations:

Complex graph with multiple nodes and relations

This example shows that there is always a relation between a source node and a target node. However, a node can also have several relations to other nodes. For example, the Cobol program ‘CobolProgram001’ could not only have a relation to a call, but also to other calls/Usings/Records.

The graph shows only that the call named ‘CobolProgram001’ calls the Cobol program ‘CobolProgram001’ with the record ‘AZ200’. This record is also used by the Cobol program ‘IE600’.
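The graph described above could be built with a handful of CREATE clauses. This is only a sketch: the labels `Call` and `Record` and the relationship type `USES` are assumptions made for illustration, while `CALLS` is the relationship type that appears in the queries later in this post.

```cypher
// Assumed labels: Call, Record. Assumed relationship type: USES.
CREATE (target:Program:CobolProgram {name: 'CobolProgram001'})
CREATE (other:Program:CobolProgram {name: 'IE600'})
CREATE (call:Call {name: 'CobolProgram001'})
CREATE (rec:Record {name: 'AZ200'})
CREATE (call)-[:CALLS]->(target)   // the call invokes the program
CREATE (call)-[:USES]->(rec)       // ... and passes the record
CREATE (other)-[:USES]->(rec);     // the record is shared with IE600

// Which Cobol programs use the record AZ200?
MATCH (p:CobolProgram)-[:USES]->(:Record {name: 'AZ200'})
RETURN p.name;
```

Run against an otherwise empty database, the second statement returns the single program ‘IE600’.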

Cypher versus SQL

Now that we know the data model, let us compare some queries in SQL with their Cypher equivalents.

Simple reading of program names (first SQL, then Cypher):

	
		SELECT cobol_program.name
		FROM cobol_program;
	
	
		MATCH (cobol:CobolProgram)
		RETURN cobol.name;
	

In Cypher, the pattern (cobol:CobolProgram) binds every node with the label ‘CobolProgram’ to the variable ‘cobol’. The RETURN clause can then refer to ‘cobol’ and access the properties of the node.
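Filtering and sorting work much as they do in SQL. The following sketch assumes that every program node carries the `linesOfCode` property shown in the node example; the threshold of 1000 and the limit of 10 are arbitrary.

```cypher
// Programs with more than 1000 lines of code, largest first.
MATCH (cobol:CobolProgram)
WHERE cobol.linesOfCode > 1000
RETURN cobol.name, cobol.linesOfCode
ORDER BY cobol.linesOfCode DESC
LIMIT 10;
```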

Simple join of calls that call a Cobol program:

	
		SELECT cobol_program.name, call.name
		FROM cobol_program
		JOIN calls ON calls.program_id = cobol_program.id
		JOIN call ON calls.call_id = call.id;
	
	
		MATCH (cobol:CobolProgram)<-[:CALLS]-(call:Call) 
		RETURN cobol.name, call.name
	

In the join example, nodes with the label ‘CobolProgram’ are bound to the variable ‘cobol’ and nodes with the label ‘Call’ to the variable ‘call’. Both variables can then be used in the RETURN clause to access the properties of the nodes. The pattern ‘<-[:CALLS]-’ specifies the type of the relationship and the direction in which it runs.
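The direction of the arrow follows the relationship, not the reading order, so the same pattern can also be written from the calling side. As a small sketch, this variant additionally counts the calls per program; the alias `numberOfCalls` is chosen here for illustration.

```cypher
// Same relationship as above, written left to right from the call.
// Cypher groups implicitly by the non-aggregated column cobol.name.
MATCH (call:Call)-[:CALLS]->(cobol:CobolProgram)
RETURN cobol.name, count(call) AS numberOfCalls
ORDER BY numberOfCalls DESC;
```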

The second example already shows a trend: SQL is optimised for relational data models, and as soon as it has to express complex, relationship-oriented queries, the statements grow with every additional join. The fundamental problem in these cases is not SQL itself but the relational model, which is not designed for highly connected, graph-like data.
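To illustrate the point, here is a sketch of a query that would need a recursive common table expression or a chain of self-joins in SQL. It uses the program name ‘IE600’ from the example graph; the upper bound of four hops is an arbitrary choice, and the pattern deliberately ignores relationship type and direction.

```cypher
// All Cobol programs connected to IE600 by one to four relationships,
// regardless of relationship type or direction.
MATCH (start:CobolProgram {name: 'IE600'})-[*1..4]-(other:CobolProgram)
WHERE other <> start
RETURN DISTINCT other.name;
```

On large graphs an unbounded or very deep variable-length pattern can become expensive, so keeping an explicit upper bound is usually sensible.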

The graph model is recommended for domains with highly interconnected data, and with it a graph query language such as Cypher. Cypher is also easy to learn if you have some experience with SQL.

What does at|analyze do exactly?

The Assembler, JCL and Cobol programs are first imported into at|analyze. At the heart of the import are various parsers that analyse the source code and separate the actual code from other content such as comments.

After the parsing phase, the resolvers step in and resolve the dependencies/relationships between programs or program sections.
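How the resolvers work internally is not described in this post. Purely to illustrate the idea, a resolution step could be expressed in Cypher like this, assuming the parsers have already created `Call` and `CobolProgram` nodes and that a call carries the name of the program it targets (as in the example graph, where the call and the program are both named ‘CobolProgram001’).

```cypher
// Hypothetical resolver step, not the at|analyze implementation:
// link every call to the program that has the same name.
MATCH (call:Call)
MATCH (target:CobolProgram {name: call.name})
MERGE (call)-[:CALLS]->(target);
```

MERGE is used instead of CREATE so that running the step twice does not duplicate relationships.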

We can then get a general overview in the at|analyze dashboard.

at|analyze dashboard

Along with the dashboard, we can also view the detail page of a program that has been analysed. The call graph tab on this page shows the complex structure of the program. The following example shows a Cobol program with about 16,000 lines of code.

Program detail page in at|analyze

at|analyze contains even more features. These include:

Artefact details
  • Overview of the artefact call graph
  • Grouping of the program sections
  • Grouping of the program data structures
  • Grouping of the program CopyBooks
  • Grouping of the database tables addressed
  • Grouping of the called SQL queries
Reporting
  • Inventory report: Overview, statistics and diagrams of the analysis
  • Missing objects report: Overview of all programs found without source code
  • Migration report: Overview of the current migration status
  • Call tree report: Overview of a specific call tree of a program
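Reports of this kind map naturally onto graph queries. As an illustration only (the actual report logic of at|analyze is not shown in this post), calls whose target program is absent from the graph could be listed as follows, assuming the `Call` label and the `CALLS` relationship from the join example.

```cypher
// Calls that are not connected to any Cobol program node,
// i.e. candidates for a "missing objects" style report.
MATCH (call:Call)
WHERE NOT (call)-[:CALLS]->(:CobolProgram)
RETURN call.name
ORDER BY call.name;
```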

Conclusion

at|analyze allows you to analyse a legacy system or legacy code in detail using various parsers, a graph database (Neo4J) and a rich portfolio of features. For Cobol and/or Assembler software projects that have grown over decades, analysing the legacy code is unavoidable.

Legacy systems slow down the digital transformation. Are you looking for the right solution? Then take a look at our website and talk to our experts at adesso Transformer GmbH.

Author Moritz Michel

Moritz Michel is a software development trainee for the Line of Business Public at adesso in Düsseldorf. He is currently developing features for the analysis tool "at|analyze" and thereby supports the product development of the adesso Transformer.