SAP Data Services receives an update
SAP Data Services is an ETL tool for extracting raw data, transforming it into business-relevant information, and loading the result into a target for reporting, Business Intelligence tools, data migration, or any other data integration requirement. Building such dataflows and orchestrating them into workflows is the daily task of SAP Data Services users.
A data integration project can require hundreds of dataflows with complex business logic, and it is easy to forget something. Data Services has just received an update, not from SAP itself but from SAP partner Wiiisdom, that adds the functionality needed to increase project resilience.
When I was a product manager for Data Services, we added some basic functionality in this regard, but not as much as I had always hoped. The main problem is that getting at the repository metadata is harder than it looks, because most of it is stored as text in the repository in the so-called ATL format.
Example: the ATL description of a flat file is stored as free text in a database table, so finding all files with a certain parameter requires interpreting that text.
Recently I learned that there is a product called 360Eyes for Data Services, which does what I always imagined for Data Services and more.
360Eyes reads the repository, parses the ATL language, and stores the extracted data in database tables ready for consumption. This has multiple advantages:
- Data can be queried and viewed with any reporting tool, including SAP BI tools.
- All kinds of health checks can be created using the reporting tool.
- Since the data is never overwritten, comparisons can be made to determine what has changed.
- Regulatory business requirements can be validated as another type of control.
As I can attest from my own client projects, it improves the quality of Data Services projects and makes developers more efficient.
Another way to look at it is as part of the unit and integration testing that is common in professional software development. To give an idea of the savings, here are some examples from past projects I was involved in.
User story: initial loading
A go-live is a stressful project phase, all the more so in a data integration project, because the first task is to perform the initial load: for example, taking the complete SAP material master data with millions of rows, converting each row to the target data model, and loading the result into the target system.
In this particular project, the initial load had been tested over and over again, and it had to happen within a single weekend. And it did, with a few minor hiccups. On Monday morning, people complained that some documents existed twice for some unknown reason. Further analysis showed that all dataflows related to the initial load truncated their target table before loading, except one. There, the flag had been forgotten.
The integrity check “Select all dataflows with name DF_*_INITIAL where the table loader is not set to truncate-table” would have discovered this problem. However, such functionality does not exist in Data Services, and checking these settings manually takes a lot of time. The 360Eyes database makes this information readily available. One query is all it takes to identify the problematic dataflows immediately.
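To make the integrity check concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table `dataflow_loaders` and its columns are hypothetical stand-ins invented for illustration; they are not the actual 360Eyes schema, which would have to be looked up in the product documentation.

```python
import sqlite3

# Hypothetical, simplified stand-in for a table of extracted loader settings.
# Table and column names are assumptions, not the real 360Eyes schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dataflow_loaders (
        dataflow_name        TEXT,
        target_table         TEXT,
        truncate_before_load INTEGER  -- 1 = truncate flag set, 0 = not set
    )
    """
)
conn.executemany(
    "INSERT INTO dataflow_loaders VALUES (?, ?, ?)",
    [
        ("DF_MATERIAL_INITIAL", "MATERIAL", 1),
        ("DF_CUSTOMER_INITIAL", "CUSTOMER", 1),
        ("DF_DOCUMENT_INITIAL", "DOCUMENT", 0),  # the forgotten flag
        ("DF_DOCUMENT_DELTA", "DOCUMENT", 0),    # delta loads must not truncate
    ],
)


def find_non_truncating_initial_loads(connection):
    """Return (dataflow, target table) pairs for initial loads that do not truncate."""
    # In LIKE patterns "_" matches any single character, so the literal
    # underscores of DF_*_INITIAL are escaped with "!".
    query = """
        SELECT dataflow_name, target_table
        FROM dataflow_loaders
        WHERE dataflow_name LIKE 'DF!_%!_INITIAL' ESCAPE '!'
          AND truncate_before_load = 0
        ORDER BY dataflow_name, target_table
    """
    return connection.execute(query).fetchall()


for name, table in find_non_truncating_initial_loads(conn):
    print(f"{name} loads {table} without truncating it first")
```

Against a real repository extract, only the query would change; run as part of a pre-go-live checklist, a check like this turns a Monday-morning surprise into a failed test.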
User story: version control
Not all Data Services implementations use the versioning functionality provided by central repositories and check-in/check-out processes. This is especially true in development environments, where changes are frequent. The dataflow compare feature allows users to quickly compare two different versions and see what has changed.
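The idea behind such a comparison can be sketched in a few lines of Python. This is an illustration only, not how the product implements it: it assumes that each stored version of an object is available as a flat dictionary of settings, and all setting names below are made up.

```python
def diff_settings(old, new):
    """Return {setting: (old_value, new_value)} for every setting that differs.

    A setting that exists in only one version is reported with None
    on the side where it is absent.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


# Two hypothetical snapshots of the same table loader, taken on different days.
version_1 = {
    "target_table": "DOCUMENT",
    "truncate_before_load": True,
    "bulk_load": False,
}
version_2 = {
    "target_table": "DOCUMENT",
    "truncate_before_load": False,
    "bulk_load": False,
    "rows_per_commit": 1000,
}

changes = diff_settings(version_1, version_2)
for setting, (before, after) in changes.items():
    print(f"{setting}: {before!r} -> {after!r}")
```

Because every snapshot is kept, the same comparison works for any two points in time, not just the latest version against the previous one.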
User story: reduced performance
Once I received a call from an IT department. They said the delta load job was taking too long, so much so that it was impacting other SAP ERP jobs scheduled at the same time. The delta load took one hour when it was developed; now it was taking three hours! To uncover the root cause, we opted for a chart visualizing the delta load execution time for each day.
We mainly focused on three questions:
- Did this happen by accident, for example because of a slow database?
- Has the delta load gradually slowed down because the data volume has increased?
- Was there any particular day when the runtime went from one hour to three hours?
Thanks to the data made available in the 360Eyes database, such a graph can be created easily.
It clearly shows that something changed on a particular day. With this information, the 360Eyes database can be queried to find out which code change was made at that time.
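The third question, whether the runtime jumped on one particular day, can also be answered without a chart. The following Python sketch uses a simple least-squares split: it tries every possible day as the boundary and keeps the one where the runtimes before and after are each closest to their own average. The dates and runtimes are invented sample data, and the list of (date, minutes) pairs is an assumed format for the exported job statistics.

```python
from statistics import mean

# Invented sample data: (execution date, delta load runtime in minutes).
daily_runs = [
    ("2024-03-01", 60), ("2024-03-02", 62), ("2024-03-03", 58),
    ("2024-03-04", 61), ("2024-03-05", 59), ("2024-03-06", 180),
    ("2024-03-07", 178), ("2024-03-08", 183), ("2024-03-09", 181),
    ("2024-03-10", 179),
]


def find_runtime_jump(runtimes):
    """Return (split, mean_before, mean_after) for the best two-segment fit.

    `split` is the index of the first run that belongs to the second segment.
    """
    if len(runtimes) < 2:
        raise ValueError("need at least two runs to look for a jump")
    best = None
    for split in range(1, len(runtimes)):
        before, after = runtimes[:split], runtimes[split:]
        mean_before, mean_after = mean(before), mean(after)
        # Sum of squared deviations of each segment from its own mean.
        cost = sum((x - mean_before) ** 2 for x in before)
        cost += sum((x - mean_after) ** 2 for x in after)
        if best is None or cost < best[0]:
            best = (cost, split, mean_before, mean_after)
    _, split, mean_before, mean_after = best
    return split, mean_before, mean_after


minutes = [runtime for _, runtime in daily_runs]
split, mean_before, mean_after = find_runtime_jump(minutes)
print(f"Runtime changed on {daily_runs[split][0]}: "
      f"about {mean_before:.0f} min before, {mean_after:.0f} min after")
```

A gradual slowdown caused by growing data volume would not produce such a clean split; the two averages would be closer together and the boundary would be less distinct, which is a hint to look at data volumes rather than code changes.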
If I had only had Data Services, I would have had to manually compare the runtimes of each job execution, note the execution times of each dataflow before and after, and then study the details of the dataflow.
These are just a few examples of the potential that opens up once the internal data of SAP Data Services is accessible. In fact, I find the approach quite clever. While SAP has built visualizations into the SAP Data Services management dashboard, which may or may not meet customer needs, the 360Eyes approach is to make the data available in a regular database.
Thus, users can either use the provided reports or create additional visualizations according to their needs. They are no longer limited to predefined visualizations but can instead leverage the full power of BI tools already used in-house.
The data available ranges from individual Data Services objects and their settings to operational statistics and impact/lineage information, all with full history to answer questions like “What changed?” or “Is there a trend?”. I can’t wait to see what else customers come up with now that the data is finally available.