Solution
Traditional methods for reporting on SMF data rely on periodically extracting it to off-platform tools. The resulting data is hours or days old and gives only a historical view, so operators rarely use SMF for diagnosis. Querying real-time SMF data in natural language through generative AI makes SMF practical for troubleshooting. Giving LLMs access to real-time SMF data allows operators to quickly query, analyze, compare and explain system behavior.

The system log is a critical source of information about both normal and unexpected system behavior. The logs often contain millions of messages per day, which makes it difficult to review them and locate errors. By reading SYSLOG/OPERLOG, generative AI can detect errors, explain the root cause of issues, recognize anomalies and recommend optimizations. This enables operations teams to quickly understand what’s happening in the system and find a resolution, reducing errors and outages.
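Because of that volume, a log would usually need to be condensed before it is handed to a model. The following is a minimal Python sketch of one way this might be done: it counts non-informational messages by message ID. The log lines, the message IDs and the line layout are all made up for illustration and do not reflect the real SYSLOG/OPERLOG record format. The sketch also assumes that a message ID ends in a one-letter type code and that "I" means informational, which may not hold for every product.

```python
import re
from collections import Counter

# Hypothetical, simplified log lines: "<time> <system> <message-id> <text>".
# Real SYSLOG/OPERLOG records have a different, richer layout.
SAMPLE_LOG = """\
10:01:02 SYSA ABC100I STARTED TASK INITIALIZED
10:01:05 SYSA ABC200E BUFFER SHORTAGE DETECTED
10:01:07 SYSA ABC100I STARTED TASK INITIALIZED
10:01:09 SYSA ABC200E BUFFER SHORTAGE DETECTED
10:01:11 SYSA XYZ300A REPLY REQUIRED FROM OPERATOR
"""

# Assumption: a message ID is 3-5 letters, 3-5 digits and a one-letter type
# code, and any type code other than "I" is worth surfacing to the model.
MESSAGE_ID = re.compile(r"\b([A-Z]{3,5}\d{3,5})([A-Z])\b")


def summarize(log_text: str) -> Counter:
    """Count non-informational messages by message ID."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MESSAGE_ID.search(line)
        if match and match.group(2) != "I":
            counts[match.group(1) + match.group(2)] += 1
    return counts


summary = summarize(SAMPLE_LOG)
for message_id, count in summary.most_common():
    print(f"{message_id}: {count}")
```

A compact summary like this could be placed in the prompt alongside a small sample of the raw messages, rather than sending the full log.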

Generative AI can simplify interpretation of job output. Rather than sifting through lengthy spool data sets, operators can use generative AI to quickly determine the job's execution status. Furthermore, generative AI can synthesize the key information from JCL, job messages, and log records to explain why a failure occurred, identify the root cause and suggest remediation steps, significantly reducing debugging time and improving overall operational efficiency.

In high-availability (HA) sysplex environments, maintaining symmetric configurations for critical subsystems such as CICS and DB2 is crucial for consistent performance and failover integrity. With generative AI, operators can use natural language to compare configuration attributes and system activity between sites, based on real-time SMF data. This quickly identifies discrepancies and workload imbalances across the sysplex, so operators can proactively fix asymmetries before they cause service degradation or failover problems.
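The comparison itself can be thought of as a simple attribute-by-attribute diff. The Python sketch below illustrates the idea under the assumption that configuration attributes for two systems have already been extracted into dictionaries. The attribute names, the values and the system name SYSB are hypothetical and are not taken from any real subsystem definition.

```python
MISSING = "<missing>"


def diff_config(site_a: dict, site_b: dict) -> dict:
    """Return attributes whose values differ or that exist on only one side."""
    differences = {}
    for key in sorted(site_a.keys() | site_b.keys()):
        a_value = site_a.get(key, MISSING)
        b_value = site_b.get(key, MISSING)
        if a_value != b_value:
            differences[key] = (a_value, b_value)
    return differences


# Hypothetical attribute names and values, for illustration only.
sysa = {"max_tasks": 250, "storage_limit_mb": 800, "trace": "off"}
sysb = {"max_tasks": 150, "storage_limit_mb": 800, "dump": "on"}

asymmetries = diff_config(sysa, sysb)
for name, (a_value, b_value) in asymmetries.items():
    print(f"{name}: SYSA={a_value} SYSB={b_value}")
```

In practice the model, rather than a fixed script, would decide which attributes to compare and how to explain the differences. The sketch only shows the kind of result an operator would expect back.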

Imagine an operator who sees their monitor turning red because of increasing CICS response time. They could ask: “Retrieve the last 5 minutes of SMF data and OPERLOG from SYSA and check why CICS TRN1 is slow”. The same applies to other subsystems such as DB2 and MQ. The table below shows examples of using natural language to troubleshoot developing system issues. Various data sources, such as SMF records, OPERLOG messages and job log output, may be combined to provide the best answer.
[Table: example natural-language prompts for z/OS, CICS, DB2, IMS and MQ]
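As a rough illustration of how several data sources might be combined into a single request, the Python sketch below assembles a prompt from an operator question and labeled data extracts. The function name, the section layout and the placeholder extracts are assumptions made for this example. How the data is retrieved and which model receives the prompt are outside its scope.

```python
def build_prompt(question: str, sources: dict) -> str:
    """Combine an operator question with labeled data extracts."""
    sections = [f"Question: {question}"]
    for name, text in sources.items():
        # Skip sources that returned no data so the prompt stays compact.
        if text.strip():
            sections.append(f"=== {name} ===\n{text.strip()}")
    return "\n\n".join(sections)


# Placeholder extracts; real content would come from the systems themselves.
sources = {
    "SMF records": "(SMF extract placeholder)",
    "OPERLOG messages": "(OPERLOG extract placeholder)",
    "Job log": "",
}

prompt = build_prompt(
    "Retrieve the last 5 minutes of SMF data and OPERLOG from SYSA "
    "and check why CICS TRN1 is slow",
    sources,
)
print(prompt)
```

Labeling each extract by its source lets the answer refer back to where a piece of evidence came from.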
