Problem Statement

Concurrency was approached by a manufacturing client of ours requesting help with giving human understandable meaning to legacy code within their ERP systems. This client has more recently defined a standard engineering specification to define their product constraints for manufacturing. It’s this specification that is now used to specify new code added to their ERP systems.

The problems Concurrency was asked to help with were:

Reverse engineer the legacy ERP code into a product specification for a target audience like business analysts.
Automate or improve the process of this reverse engineering effort.

They provided a Product Specifications business document and a corresponding ERP code-dump, for one of their products.

At a high level, the legacy code wasn’t very clear to understand by a common software engineer, with condensed variable naming conventions, brief comments, external dependencies, and non-modular code resembling scripts. The system code was from a legacy BaaN ERP written in BaaN’s custom language. This language developed in the 90’s was popular up until the early 2000’s (link).

Throughout the rest of this article, we’ll cover:

Prompting Options
Large Language Model Choices
Primary Research Objective – Convert BaaN code to specifications
- Experiment 1 – Basic Prompting
- Experiment 2 – Prompting with templates
- Experiment 3 – One-shot Prompting (In-context learning)
Secondary Research Objective – Convert specifications to BaaN code
Concluding thoughts

Prompting Options

Pros and cons for techniques applicable for this use-case are summarized in the following flowchart we created at Concurrency. The brown boxes denote the general technique one could try, while the navy-blue boxes describe the experiment implementation for that technique:

Prompt engineering revolves around articulating questions or instructions as prompts to an AI language model, that guide it towards generating a desired response. A templated response, along with in-context learning for the task at hand, seemed the logical prompt engineering approach.

Fine-tuning an existing model was not explored here, since we didn’t have a training task (or data) defined clearly yet. Scraping the unstructured Baanboard website forum as unclean training data could be an intensive activity both in terms of research and standardization efforts, and an interesting point to note- “Unfortunately there is no (open-sourced) book to learn Baan” link. The efforts for curating and annotating a huge dataset for this supervised task, might render this automation invaluable in the long run.

All in all, prompt-engineering seemed a better fit, rather than fine-tuning an existing model.

Large Language Model Choices

We could also try out more versatile models than GPT’s, that were trained with more specialized programming languages (link). Relevant shortlisted candidates include:

HuggingFace’s StarCoder model (trained on 83GB of 86 programming languages, 54GB GitHub Issues, 13GB Jupyter notebooks, 32GB GitHub commits): demo.
HuggingFace’s WizardCoder: link.
OpenAI + Github Copilot’s Codex proprietary model: link.
Google’s Vertex AI Codey models (show capabilities in Prolog, Fortran, Verilog): link.
Google Deepmind’s AlphaCode (715GB of Github + CodeForce problems): link.

The LLM-Humaneval benchmark metrics on HuggingFace indicate GPT-4 is the best for code-related tasks, while WizardCoder seems to be the best open-sourced model (link) for code-related tasks, as of 25th August:

With access to Azure OpenAI’s easy to use interface for prompting, we decided to deploy and experiment with an instance of GPT-4 throughout this research activity.

Primary Research Objective

Translate the legacy ERP code into human readable text for a business analyst to review.

It became essential to evaluate the capabilities of a State-of-the-Art LLM (ie- GPT4), in terms of understanding a legacy programming language, which might have comprised a very small to no portion of its training corpus.

Experiment 1

A simple prompt to an OpenAI GPT-4 model, asking to explain a chunk of the code, returned a fairly vague and generic response.
Note- The following is a partial snippet just to show what the code looks like.

Prompt:

Can you tell me what this code is doing?
```
| Input Variables
string gsprtnmbr global | gid part number
string gsnotfund global | part number not found
string gssizupdt global | Size for Update
| Output Variables
long llerror9040 local | dll error
long lldattype local | Item Data Type
string lsitmcode local | item code
string lscompnr local | current company
string gsprdtype global | Product Type from GID
string lsprditem local | Product item
| Import item code and item field to be updated.
!#pragma used dll oawpcfdll9040
!#pragma used dll oawpcfdll9990
! table ttipcf300
! import (\"tipcf300.item\",lsitmcode)
! import (\"tipcf300.koid\",lldattype)
| Read product type for current item code when item field is Item Code
| If #2
! if str$(lldattype) is {\"1\"} then
! lscompnr = sprintf$(\"%03d\",get.compnr())
lsitmcode = [sls_var1] & [sls_var2] &
[sls_comp] & [sls_year]
! endif | If #2
if val([unt_mlid]) > 0 then
! lscompnr = sprintf$(\"%03d\",get.compnr())
lsitmcode = [sls_var1] & [sls_var2] &
lscompnr & [sls_year]
!table ttipcf300
!import (\"tipcf300.item\",lsprditem)
! llerror9040 = awpcf.dll9040.lookup.prod.type(
! lsprditem,gsprdtype)
if llerror9040 is not {0} then
gsprdtype = \"999\"
endif
```

Response:

Experiment 2

The second experiment here shows results from a prompt with templates for getting a human readable description in an expected format.

System Message:

You are an expert programmer in Baan 4GL language.

Prompt:

Comments are marked with '|'
Each Variable declaration is followed by a comment for its business meaning.       
Your task is to interpret the following Baan code from an ERP system into concise layman's terms with a business understanding. 
Your response should be as follows: 
" 
[Variables]:  
[Business Context]: 
" 
```Baan 
Code to be explained is here.
```

Response:

The output seems quite general and not really informative of the code’s behavior.

Experiment 3.1

The next experiment uses a 1-shot prompting technique where we use an example of code and human readable text with explanations for in-context training of the LLM, rather than fine-tuning the model.

System Message:

You are an expert programmer in 4GL Baan language.

Example 1 Prompt (User):

```Baan
Example 1 code here.
```

Example 1 Response (Assistant):

This Baan code is a part of an ERP system, dealing with the processing of sales orders and product data. Let's break down the code's business logic step by step:

**Variable Declarations:**

**Importing Data:**

**Generating Item Codes:**

**Processing Based on Conditions:**

**DLL Function Calls:**

**Sequence Number Handling:**

**Product Type Handling:**

**Final Item Data:**This code processes various product and sales order-related data. It generates item codes based on different conditions, including product types, sequence numbers, and other attributes. It interacts with external DLLs to perform lookups and retrieve additional data. The final result is the generation of an item code that is stored in the `item_data` variable.

Prompt:

Comments are marked with '|'.
Each Variable declaration is followed by a comment for its business meaning.     
Your task is to interpret the following Baan code from an ERP system into concise layman's terms with a business understanding.
```Baan
            New Code to be explained is here.
```

Response:

The output shows a much more descriptive and meaningful response. We could even structure its output with prompt templates, to ensure a standardized response.

Experiment 3.2

A response for the same setting as in Experiment 3, but with “Baan” replaced by “Bash” instead to check if the model can generalize to a programming language present in abundance during its training phase.

Response:

No new information gained than in Experiment 3; only that most variable declarations were identified and described in a structured manner.

Secondary Research Objective

Instead of trying to parse a Functional Document Specification artifact directly into Baan code, we also tried passing in human readable input prompts (like a specification) and expected the LLM to generate the Baan code.

It was essential to evaluate the capabilities of a State-Of-The-Art LLM (ie- GPT4), in generating a legacy programming language, based on token patterns.

Results from a 1-shot prompting technique with Chain-of-Thought reasoning for generating Baan code, shows syntax that resembles the ERP code, but variables are named in a Pythonic way (for example- glass_glazing_panel1, glass_glazing_panel2, etc.).

In the absence of Baan documentation, it’s not clear if such global/local variables will break the ERP system in Production. There could be specific variable naming conventions that must be followed.

More importantly, although the syntax looks fairly acceptable (a plausibly hallucinated programming language), we can’t verify the code executability or correctness, like ChatGPT does for Python/Java in a sandbox environment. Baan code seems to be specifically catered to an ERP ecosystem.

Concluding thoughts

This POC, although simplistic at first glance (convert code into docstring-like text), has nuances when it comes to legacy code that an LLM might not have encountered before.

Prompt-engineering (mainly in-context learning) does show promise when we demonstrate the expected output, for pulling out the essential functionality of legacy code.

However, there are limitations, refer to the “In-Context Learning” block in the theoretical summary flowchart:

LLMs have a fixed context-window size for input tokens (standard GPT-4 allows 8,000). By specifying task examples (ie- how we want the model to convert code into readable text), we’re subsequently restricting GPT4’s context-window size for the actual code to convert into text. There’s an unavoidable tradeoff.
We didn’t yet know if the code is going to be a huge monolithic script with randomly placed variable declarations and functionality, or will it be in chunks like self-contained functions.
- If monolithic codebase, same issue as #1 above of context-window limitations.If modular code chunks, we’d require someone to create at least 3 “code -> text” examples for us to experiment further with.
- In either case, we could propagate variable definitions with each prompt, by using a json-like mapping for a big picture of the ERP ecosystem:

        {
                “var1” : “This variable does ABC. It updates XYZ in the ERP.”,
                …
        }

Migrating the Baan ERP code to more modern frameworks like Python/Java/JavaScript would be beneficial to:

Make the code more readable and usable for future use-cases.
Then, easily integrate an LLM-based solution (like GitHub-Copilot) to summarize code into text, or convert docstring-like text into working code on the fly.

But, directly using LLMs for such code-migration has certain questionable aspects:

There are elements of randomness for the tokens generated: temperature, top likelihoods of next token prediction, etc. that would lead to inaccurate design patterns and code smells.
Even if we can ensure that the generated syntax looks accurate through static type checks, ensuring the code performs its required business functionality is a hard challenge revolving around unit testing and regression testing.
Executing the LLM generated code in a sandbox ERP-like environment is an option but could be a huge manual code-migration activity in itself.

We’ll be sure to publish a Part-2 to this blog for more approaches that show valuable promise. Reach out on Concurrency’s LinkedIn to get in touch and discuss more!

Delivering the Future – Build on Azure

Exploring Beyond Today, Shaping Tomorrow’s Success

What’s New With Concurrency

Discover Our Story

ERP Code to Product Specification Conversion with OpenAI

Problem Statement

Prompting Options

Large Language Model Choices

Primary Research Objective

Experiment 1

Experiment 2

Experiment 3.1

Experiment 3.2

Secondary Research Objective

Concluding thoughts