We previously discussed monitoring cloud-native platforms and the need to introduce it earlier in the development lifecycle. The monitoring platform space has been dominated by public companies such as Datadog and New Relic. Prometheus, however, has been an exciting open-source entrant into this space since it was developed at SoundCloud in 2012. Closed-source observability platforms, by contrast, are proprietary and typically require a license fee. From an executive standpoint, it is worth enumerating the challenges of deploying open-source monitoring in a large enterprise.
Open-source observability platforms like Prometheus offer powerful monitoring and alerting capabilities. When deployed at scale, however, they introduce several pain points, which vendors such as AWS (https://aws.amazon.com/prometheus/) and Grafana Labs (https://grafana.com/) are trying to address through managed offerings.
The common pitfalls in setting up and managing open-source monitoring platforms are:
- Complexity and learning curve: While an open-source platform like Prometheus can be deployed at no cost or at a much lower cost when compared with the proprietary vendors, these platforms are complex to set up and manage, particularly for those who are new to the technology. Users must invest time and effort to learn the platform’s architecture, query language (PromQL), and best practices for configuration and alerting.
- Limited out-of-the-box functionality: While Prometheus offers extensive capabilities, it may lack some of the features found in proprietary platforms in terms of dashboards and usability. Users may need to develop custom solutions or integrate additional open-source tools (e.g., Grafana for visualization) to achieve a comprehensive monitoring and alerting solution.
- Scalability challenges: A single Prometheus server is designed to run on one node, and scaling it horizontally can be complex. Doing so may require sharding, federation, or additional open-source projects like Thanos or Cortex. Each of these adds complexity to the deployment, along with staffing and management overhead when deployments are globally distributed.
- Integration and compatibility issues: A key advantage of any open-source platform is that the source code is fully available and free. However, integrating Prometheus with other open-source tools or existing monitoring systems can be challenging, whereas proprietary platforms typically provide these integrations out of the box. Users may need to develop custom exporters or adapt existing ones to ensure compatibility with their existing infrastructure.
- Maintenance burden: Open-source platforms generally lack the support and maintenance services provided by commercial vendors. This means that users must be prepared to handle updates, patches, and troubleshooting independently or rely on community support, which may not always be timely or sufficient. Vendors such as AWS and Grafana Labs build their managed Prometheus offerings around easing this burden.
- Limited support for long-term storage: By default, Prometheus has limited support for long-term storage of historical data, and its local storage retains data for 15 days unless configured otherwise. Users must implement external storage solutions like Thanos or Cortex to store data beyond the retention period, which can add complexity to the deployment.
- Alert fatigue: As with other observability platforms, users of Prometheus may experience alert fatigue due to a high number of generated alerts. Container-based applications generate high-cardinality monitoring data with a large volume of tags and dimensions, and the platform should be specifically architected to handle it. Proper configuration of alert rules and thresholds is essential to minimize false positives and focus on the most critical issues.
- New security features: Historically, the Prometheus open-source project did not focus on built-in security features like authentication, encryption, or access control. While this is changing, with TLS and basic authentication support added in recent releases, enterprise users must still implement many security measures themselves, which can be challenging and time-consuming.
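To make the alert fatigue point concrete, here is a minimal Python sketch of one common mitigation: requiring a threshold breach to persist before an alert fires. It is loosely modeled on the pending and firing states that the `for` clause produces in Prometheus alerting rules, but it is a simplification that counts consecutive evaluations rather than elapsed time. The `HoldRule` class, the `count_firing` helper, and the sample values are hypothetical names and data chosen for illustration, not part of Prometheus.

```python
from dataclasses import dataclass


@dataclass
class HoldRule:
    """Hypothetical alert rule (illustration only, not a Prometheus API).

    The rule fires only after the threshold has been breached for
    `hold` consecutive evaluations.
    """
    threshold: float
    hold: int
    breaches: int = 0  # consecutive breaches seen so far

    def evaluate(self, value: float) -> str:
        """Return 'inactive', 'pending', or 'firing' for one sample."""
        if value <= self.threshold:
            # Condition cleared: reset the streak.
            self.breaches = 0
            return "inactive"
        self.breaches += 1
        return "firing" if self.breaches >= self.hold else "pending"


def count_firing(samples, threshold, hold):
    """Count how many evaluations end in the 'firing' state."""
    rule = HoldRule(threshold=threshold, hold=hold)
    return sum(1 for value in samples if rule.evaluate(value) == "firing")


if __name__ == "__main__":
    # Made-up CPU utilization samples with one brief spike and one
    # sustained breach of the 0.9 threshold.
    cpu = [0.2, 0.95, 0.3, 0.92, 0.94, 0.97, 0.4]
    print("hold=1:", count_firing(cpu, threshold=0.9, hold=1))
    print("hold=3:", count_firing(cpu, threshold=0.9, hold=3))
```

With these made-up samples, a hold of one evaluation counts four firing evaluations, including the brief spike, while a hold of three counts only one, for the sustained breach. The trade-off is that a longer hold also delays notification of real incidents, so the value needs tuning per alert.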
None of the above points should cause a rethink of open-source monitoring. To address them, users should invest in proper training, carefully plan their deployment, actively participate in the community, and consider adopting complementary tools to create a more robust and scalable observability solution.