Talk to an Expert ›My story is not unique. Years ago, I saw the benefit of capturing system logs across all of my systems and being able to search through them. It started out pretty inexpensive: a few thousand dollars for a reasonable Splunk license, and a box to run it on. I was using the data to troubleshoot problems and maybe see some trends, so I had a fairly easy time balancing performance and retention – I didn’t need to retain most of my data for very long. The system got popular, and I needed to ingest more and more data. Finally, the security and compliance teams caught on, and suddenly I had to retain all of this data.
My logging infrastructure costs started exploding, and my performance got steadily worse – I went from retaining a month of data to retaining 13 months. Moreover, I was now ingesting a lot of data that was never used in the analysis, simply to get it retained. That felt like a waste of my available capacity. But why was I retaining all of that data in my analysis system? Because I really didn’t have another choice at the time. As the old saying goes, when all you have is a hammer, everything looks like a nail. Log Analytics was my hammer, and every requirement looked like a nail.
Over the last 10 years, with the move to cloud services such as AWS, the cost of archival storage has been racing to the bottom; with services like Glacier Deep Archive, storing a terabyte of data now costs less than $1/month. It's time to separate retention from analysis and serve both needs better. Analysis and troubleshooting of system problems benefit greatly from the fast response you get when "extra" data isn't in the system. Retention benefits from a well-organized archival capability, but the data does not have to be at your fingertips. In all my years in IT, I have *never* seen a request for year-old log data that had to be available *right now*…
By inserting a routing mechanism into the environment, instead of feeding all of the data that has to be retained through the log analytics system, the entire "firehose" of logs can be retained by sending it to low-cost archival storage like Amazon S3, Glacier/Deep Archive, or Azure Block Blobs. The data forwarded to the log analytics system can then be filtered down to only what is needed for analysis. If that routing system can also transform, enrich, and/or aggregate the data, simple metrics about the data can be fed to the data lake for use in normal business reporting. This separates the data retention requirement from all of our analysis needs.
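The routing idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Cribl configuration; the severity-based filter is an assumed example of an "analysis-worthy" rule.

```python
# Hypothetical sketch of the routing step in an observability pipeline.
# Every event goes to cheap archival storage; only events matching the
# analysis filter continue on to the log analytics system.

def route_event(event, archive, analytics):
    """Archive every event; forward only analysis-worthy events."""
    archive.append(event)  # the full "firehose" is retained cheaply
    if event.get("severity") in ("ERROR", "WARN"):  # example analysis filter
        analytics.append(event)

archive, analytics = [], []
for event in [
    {"severity": "INFO", "msg": "health check ok"},
    {"severity": "ERROR", "msg": "db connection refused"},
]:
    route_event(event, archive, analytics)

print(len(archive), len(analytics))  # archive keeps both events, analytics only one
```

In a real pipeline the two destinations would be an object-store writer and an analytics forwarder rather than lists, but the shape of the decision is the same.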
Let's take an example: say we've got an environment ingesting 2TB a day with roughly a 2:1 compression ratio on stored data, which works out to about 1TB/day of storage consumption. Add a compliance requirement of 18 months of retention and, averaging 30 days per month, that comes to about 540TB of storage. The table below shows four retention-management scenarios, and the difference in infrastructure cost is drastic. An aggressive move from keeping all of the data on General Purpose (GP) Elastic Block Store (EBS) volumes to keeping a minimum (30 days) in EBS while moving the rest to archival storage (with automated lifecycle management handling migration between the "tiers") reduces storage costs from $54K/month to about $4.4K/month, a net savings of almost $600K/year. And this looks only at storage costs, without taking compute resources, software licensing, etc. into account. You could go even more aggressive and send all retention directly to Deep Archive, but this scenario balances retention cost against the likelihood that more recent data is more likely to be retrieved; Deep Archive's retrieval time is considerably longer than S3's.
Storage quantities are in TB (at 1TB/day of ingest); prices are list cost per GB/month.

| Scenario | EBS General Purpose ($0.10) | EBS Hard Drive ($0.045) | S3 ($0.022) | Glacier ($0.004) | Deep Archive ($0.00099) | Monthly Cost |
|---|---|---|---|---|---|---|
| Retain all in EBS GP volumes | 540 | 0 | 0 | 0 | 0 | $54,000.00 |
| Retain 90 days in EBS GP volumes, remainder in EBS HDD volumes | 90 | 450 | 0 | 0 | 0 | $29,250.00 |
| Retain 30 days in EBS GP volumes, full retention period in S3 | 30 | 0 | 540 | 0 | 0 | $14,880.00 |
| Retain 30 days in EBS GP volumes, plus 30 days in S3, 90 days in Glacier, and the remainder in Deep Archive | 30 | 0 | 30 | 90 | 420 | $4,435.80 |
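The monthly costs in the table can be reproduced with a quick calculation. This is a sketch assuming 1TB = 1,000GB and the list prices shown above; the tier names are just labels for this example.

```python
# Reproduce the table's monthly storage costs.
# Prices are $/GB-month; quantities are TB, converted at 1 TB = 1,000 GB.
PRICE = {
    "ebs_gp": 0.10,        # EBS General Purpose
    "ebs_hdd": 0.045,      # EBS Hard Drive
    "s3": 0.022,
    "glacier": 0.004,
    "deep_archive": 0.00099,
}

def monthly_cost(tb_by_tier):
    """Sum the monthly cost of the given TB quantities across tiers."""
    return sum(tb * 1000 * PRICE[tier] for tier, tb in tb_by_tier.items())

all_ebs = monthly_cost({"ebs_gp": 540})
tiered = monthly_cost({"ebs_gp": 30, "s3": 30, "glacier": 90, "deep_archive": 420})

print(f"${all_ebs:,.2f}")   # $54,000.00
print(f"${tiered:,.2f}")    # $4,435.80
print(f"${(all_ebs - tiered) * 12:,.0f}/year saved")
```

The last line shows the roughly $595K/year difference between the first and last scenarios.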
The example is based on cloud services, but a similar approach can be followed in an on-premises data center environment. The economics are a bit harder to model, due to differences in hardware and operating costs, but it is possible using standard storage options.
The simple act of eliminating a massive amount of the data in a given analysis system will likely yield immediate, measurable improvement in query performance, especially for queries that are not properly constrained by time ranges (we all have users who do this and then don't understand why their queries take forever). With a much smaller footprint, you can also optimize the storage for its purpose: for example, by moving to provisioned IOPS EBS volumes, where the added cost is far less prohibitive on 30TB than on 540TB.
Developers are incentivized to put every field in the log entry, since it's far more expensive to go back and add logging statements later than to include them initially. As a result, a lot of the log entries that applications generate contain empty fields or extraneous data. Without the need to retain the data in its original form, we can easily remove fields we don't need, drop empty fields, and even aggregate repeated log entries (great examples are port status lines on switches or Windows Event Log login notifications). Cleaner data leads to quicker resolution and cleaner metrics.
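As a hypothetical illustration of that cleanup (not LogStream syntax; the field names are assumed examples), dropping empty or unneeded fields and rolling up repeated entries might look like:

```python
# Hypothetical sketch of pipeline-side cleanup: drop empty/unneeded fields,
# then roll up identical repeated entries (e.g., login notifications) into counts.
from collections import Counter

DROP_FIELDS = {"debug_context"}  # example fields not needed downstream

def clean(event):
    """Remove empty values and fields on the drop list."""
    return {k: v for k, v in event.items()
            if v not in ("", None) and k not in DROP_FIELDS}

events = [
    {"user": "alice", "action": "login", "debug_context": "trace", "note": ""},
    {"user": "alice", "action": "login", "debug_context": "trace", "note": ""},
    {"user": "bob",   "action": "login", "debug_context": "trace", "note": ""},
]

cleaned = [clean(e) for e in events]
# Aggregate identical cleaned events into counts instead of raw repeats.
rollup = Counter(tuple(sorted(e.items())) for e in cleaned)
print(rollup)
```

Instead of three near-identical events, the analytics system receives two aggregated records with counts, which is both smaller and easier to chart.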
At the heart of this approach is building what we at Cribl call an Observability Pipeline. It can be built from open source components (see Clint Sharp's post on this topic for details), but we believe our product, Cribl LogStream, is the best way to do it: LogStream can be a drop-in replacement for components in your logging environment, like Splunk Heavy Forwarders or LogStash instances, and configuring it to do exactly this takes just a few clicks.
The fastest way to get started with Cribl LogStream is to sign up at Cribl.Cloud. You can process up to 1TB of throughput per day at no cost. Sign up and start using LogStream within minutes.