Explore the basics of observability with Ray and Anyscale in this foundations course.
Learn the fundamentals of observability—metrics, logs, and traces—and how they help you monitor and debug distributed systems. Then set up local Ray observability by installing Ray, launching Prometheus and Grafana, starting a two-node Ray cluster, and using the Ray Dashboard to verify and explore collected metrics.
Learn the core observability concepts in Ray—logs, metrics, and events—and how to use the Ray Dashboard to monitor application and cluster behavior. Then compare Ray’s native tooling with Anyscale’s managed, contextualized observability (persistent metrics, workload context, and post-failure visibility) through an example job that triggers memory pressure and OOM failures.
Learn how to monitor and debug Ray workloads using Ray and Anyscale observability dashboards, with hands-on examples that show when to use each view. You’ll explore execution status, logs, and metrics for Ray Data pipelines, along with Anyscale-specific workload dashboards, to identify bottlenecks and operational issues.
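
As a rough illustration of the local setup step (Ray Dashboard, Prometheus, and Grafana running side by side), the sketch below checks whether each service is accepting connections on your machine. It is not part of the course material. The ports are assumptions based on each tool's usual defaults (8265 for the Ray Dashboard, 9090 for Prometheus, 3000 for Grafana) and may differ in your environment; the names `DEFAULT_ENDPOINTS`, `port_is_open`, and `check_endpoints` are hypothetical helpers made up for this example.

```python
import socket

# Assumed default local ports; adjust these if your setup uses different ones.
DEFAULT_ENDPOINTS = {
    "ray_dashboard": ("127.0.0.1", 8265),
    "prometheus": ("127.0.0.1", 9090),
    "grafana": ("127.0.0.1", 3000),
}


def port_is_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints=DEFAULT_ENDPOINTS):
    """Map each service name to whether its port currently accepts connections."""
    return {
        name: port_is_open(host, port)
        for name, (host, port) in endpoints.items()
    }


if __name__ == "__main__":
    for name, ok in check_endpoints().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

An open port only shows that something is listening there. Confirming that metrics are actually being collected still means opening the Ray Dashboard and Grafana views, as the course walks through.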