HERE platform pipelines generate certain standard metrics that can be used to track their status over time. The standard metrics are listed in the Logs, Monitoring, and Alerts User Guide. Custom metrics can also be inserted into the pipeline's code. These metrics are all displayed in a Pipeline Status dashboard and are used to generate alerts for specific events associated with a pipeline job, like a failed job.
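For example, in a Flink pipeline a custom counter can be registered through Flink's standard metrics API. The sketch below is only an illustration: the operator and metric names (CountingMapper, processedMessages) are hypothetical, and how the metric surfaces in the dashboard depends on the metric reporters configured for your pipeline.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Hypothetical operator used only to illustrate a custom metric.
public class CountingMapper extends RichMapFunction<String, String> {

    private transient Counter processedMessages;

    @Override
    public void open(Configuration parameters) {
        // Register a custom counter under this operator's metric group.
        processedMessages = getRuntimeContext()
                .getMetricGroup()
                .counter("processedMessages");
    }

    @Override
    public String map(String value) {
        // Count every record the operator processes.
        processedMessages.inc();
        return value;
    }
}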
Monitor Pipeline Status
To monitor the status of a pipeline, a Pipeline Status dashboard is available in Grafana. From the platform portal, open the Launcher menu and select Primary monitoring and alerts (No. 1 in Figure 1). This takes you to the Grafana home page.
Figure 1. Primary monitoring and Alerts
Note
Launcher menu item 2 is also Grafana, but only for high availability catalogs. Menu item 3 is the link to Splunk for reviewing event and error logs.
The home page looks something like this:
Figure 2. Grafana Home Page
The home page provides access to several dashboards. The default dashboards are listed on the left side of the page, and the available User Defined Alerts are listed on the right side of the page.
From the list of default dashboards, locate the Pipeline Status dashboard. Click on the dashboard name to open it.
Figure 3. Choose Pipeline Status dashboard.
The Pipeline Status dashboard displays pipeline jobs with the following statuses:
Failed
Completed
Canceled
Submitted
Running
Each Pipeline Status is color-coded to allow quick identification. The dashboard can also be filtered by Pipeline Status and Pipeline Type (Flink or Spark). For more details, see Pipeline Status definitions.
Figure 4. Pipeline Status Dashboard
Note: Default Dashboard Settings
Default Time Period: Last 24 hours
Default Refresh Interval: 30 minutes
Configure Pipeline Job Failure Alerts
Click on the Grafana logo in the top left corner of the screen. This opens the side menu bar.
Locate the Alerting item on the menu bar. Then, locate Notification Channels on the submenu, as shown here.
Figure 5. Open Alerts
Click on “Notification Channels” and the screen will change to show something like the image below.
Figure 6. List of Notification channels
Locate the Notification Channel named "Pipeline Failure Notification." Click the channel's Edit button to change the configuration of the notification channel.
Specify the list of email addresses that will receive failure alerts as shown here.
Figure 8. Specify Email Addresses.
To test your alert changes, click the Send Test button. This will send a test message to each email on the alert list.
Click the Save button to save your notification changes.
With these changes, the listed email addresses will start receiving alerts when Pipeline Jobs fail. You can also see the alerts on the Pipeline Status dashboard.
Note: Default Alert Settings
Default Alert Interval: Last 1 minute
Default Alert Frequency for Failed Jobs: Every 60 seconds
Dashboard and Failure Alert Limitations
Caution: Dashboard Sampling
When a larger time period is selected in Grafana, the dashboard samples the data and displays fewer data points than actually exist. This allows for quicker responses, but to see more accurate data, shorten the time period being investigated.
Failure Email Alert Behavior
Failure emails are only sent when the alert's state changes. For example, if a pipeline job fails, the alert goes to the Alerting state and a failure email is sent to the specified recipients. If another pipeline job fails within the default alert interval of 1 minute, a second email is not sent. The alert must first transition to the “No Data” state, at the end of the 1 minute interval, before any subsequent failures can trigger alert emails. This behavior results in the following two emails being sent:
[Alerting] - For Pipeline Jobs that failed within the last 1 minute period, including details about the failed pipeline jobs. Sent when the alert is first reported.
[No Data] - For Pipeline Jobs that failed within the last 1 minute period, with an empty email body. Sent at the end of the 1 minute interval.
This is an inherent behavior of Grafana and not a limitation of the HERE platform. Figure 9 illustrates what is happening and how Fault 2 is not processed.
Figure 9. Grafana Alert Handling
Note: Splunk Dashboard
Click on the Logs menu item to get to the Splunk Dashboard. This will not focus on any one specific job; see the Error Logs section below for how to access the logs for a specific job.
Error Logs
There are four levels of logging available for platform pipelines: Debug, Info, Warn, and Error. The logging level can be set using the platform portal, the CLI, or the API, or you can use the default logging level of Warn.
To examine the logs for running pipeline jobs, click on View Jobs for a Pipeline Version to display the jobs history.
Then, click on the Logging URL button for the job you wish to troubleshoot. This will open the Splunk dashboard where the logs for the selected Pipeline Version can be viewed.
Different levels of logging are available for different purposes. HERE platform pipelines support the following levels of logging, illustrated in the sketch after this list:
Debug — Includes fine-grained informational events that are most useful to troubleshoot a pipeline.
Info — Includes informational messages that highlight the progress of the pipeline at a coarse-grained level.
Warn — Includes information on potentially harmful situations, including run-time situations that are undesirable or unexpected, but not necessarily "wrong". This is the default logging level.
Error — Includes run-time errors or unexpected conditions, such as error events that might still allow the pipeline to continue running.
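As a hedged sketch of how these levels typically appear in pipeline code, the example below assumes SLF4J, which Spark and Flink applications commonly use; the class name and messages are hypothetical.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical pipeline class used only to illustrate the logging levels.
public class PartitionProcessor {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionProcessor.class);

    void process(String partitionId) {
        LOG.debug("Decoding partition {}", partitionId);           // fine-grained troubleshooting detail
        LOG.info("Started processing partition {}", partitionId);  // coarse-grained progress

        if (partitionId.isEmpty()) {
            LOG.warn("Received an empty partition id");            // undesirable, but not necessarily wrong
            return;
        }
        try {
            // ... processing logic ...
        } catch (RuntimeException e) {
            LOG.error("Failed to process partition {}", partitionId, e); // error event; the pipeline may continue
        }
    }
}

With the default root level of Warn, only the warn and error messages above would reach the logs; lowering the level to Info or Debug makes the progress and troubleshooting messages visible as well.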
Setting the logging level from the platform portal can be done from the Pipeline Version Details page. An example is shown in Figure 11.
Figure 11. Change Pipeline Version logging level
The Logging Configuration panel is outlined in red here. The current logging level for this pipeline is displayed. To change the level, click the Edit button. This displays the dialog box shown in Figure 12.
Figure 12. Edit logging level dialog box
Info: Loggers and Levels
A Logging Level is set by a Logger for a specific Pipeline Version and all of the Jobs it executes. The default logger is set at the root level for the entire pipeline, but a logger can also be set for a specific pipeline class. Because you can have multiple loggers, different loggers can be set to different logging levels. This allows different parts of the executing pipeline code to be monitored at different logging levels, if set up correctly.
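To make the note above concrete, here is a minimal sketch with a hypothetical package and class name: a logger declared from a class is named after that class's fully qualified name, and adding a logger with that name lets you raise its level, for example to Debug, while the root logger stays at Warn.

package com.example.pipeline;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DecodeStage {

    // The logger name resolves to the fully qualified class name,
    // "com.example.pipeline.DecodeStage". Adding a logger with that name
    // and setting it to Debug affects only this class, while the root
    // logger for the rest of the pipeline can remain at Warn.
    private static final Logger LOG = LoggerFactory.getLogger(DecodeStage.class);

    void decode(byte[] payload) {
        LOG.debug("Decoding {} bytes", payload.length);
    }
}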
To change the root logging level, use the drop-down list at the top of the dialog box to select the new logging level. Additional loggers can be added or deleted using the controls shown in Figure 13. To change the logging level of one of these loggers, click the indicated control and select the new level from the drop-down list.
Figure 13. Control identification
If adding a new logger, the dialog box will change to provide a place to enter the information for the new logger, as shown in Figure 14. The logger name is normally the name of the class in the pipeline code to which it should be linked. The logging level can be set as needed and does not have to match the root logging level.
Figure 14. Add a new logger
Click Add to close the add function. Then, click Save to save the addition.
When adding a new logger, if you choose a logger name that already exists, you will get an error message like the one shown in Figure 15.
Figure 15. Error: Logger already exists
Figure 16 shows the result of adding a new logger and how it is displayed on the Pipeline Version Details page.
Figure 16. Added logger displayed
Caution
If you create a logger that cannot be linked to a class in the pipeline code, there will be no logging entries from that logger.