Here follow some sample SRE dashboards in Grafana:
http://demo.robustperception.io:9090/consoles/index.html
https://www.atlassian.com/br/incident-management/devops/sre
https://landing.google.com/sre/workbook/chapters/slo-engineering-case-studies/#the-valet-dashboard
What does the ideal SRE dashboard look like? Make sure it has these KPIs:
- SLO violation duration graph, response time (99th percentile), and load for your critical API calls
- Error rate
- Database response time
- End-user response time (99th percentile)
- Requests per minute
- Availability
- Session duration
https://www.appdynamics.com/blog/product/software-reliability-metrics/
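As a rough illustration of how a few of the KPIs above (99th percentile response time, error rate, availability, requests per minute) could be derived from raw request records, here is a minimal Python sketch. The `Request` fields, the `summarize` helper, the nearest-rank percentile method, and the choice to count only HTTP 5xx responses as errors are all assumptions made for this example. A real dashboard would normally compute these in its metrics backend rather than in application code.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One request record (hypothetical shape, for illustration only)."""
    timestamp: float    # seconds since the start of the window
    duration_ms: float  # response time in milliseconds
    status: int         # HTTP status code


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(requests, window_seconds):
    """Compute a handful of dashboard KPIs for one time window."""
    if not requests:
        raise ValueError("no requests in window")
    total = len(requests)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "p99_ms": percentile([r.duration_ms for r in requests], 99),
        "error_rate": errors / total,
        # Assumption: request-based availability (share of successful requests).
        "availability": (total - errors) / total,
        "requests_per_minute": total / (window_seconds / 60),
    }


if __name__ == "__main__":
    sample = [
        Request(timestamp=0.0, duration_ms=120.0, status=200),
        Request(timestamp=15.0, duration_ms=340.0, status=200),
        Request(timestamp=30.0, duration_ms=95.0, status=503),
        Request(timestamp=45.0, duration_ms=210.0, status=200),
    ]
    for name, value in summarize(sample, window_seconds=60).items():
        print(f"{name}: {value}")
```

Availability here is request-based. A time-based definition (fraction of minutes in which the service met its SLO) is equally common and would need a different calculation.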