The Data Security Duo: Data Encryption and Vulnerability Scans
By Lulu Cheng on 2024-07-28

With the rise of tools such as Wiz and Dig, data vulnerability scans have become more accessible than ever, allowing companies to quickly identify potential security issues. However, identifying vulnerabilities is just the first step. Addressing them effectively requires solutions that teams can actually put in place.

While scans can pinpoint weaknesses, encryption is what ultimately protects sensitive information. Used together, they let teams find exposed data and keep it protected even when another control fails, which makes for a safer data infrastructure.

Vulnerability scans have long been a staple in software development. Developers have used tools like Sonar to ensure their builds don't have dependencies with reported CVEs (Common Vulnerabilities and Exposures). This process pushes developers to update dependencies over time, so their software is not vulnerable to known issues. Over the years, vulnerability scanning has extended from code to infrastructure and, most recently, to data. AI/ML is likely next.
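To make the dependency check concrete, here is a minimal sketch of the idea in Python. It is not how Sonar or any particular scanner works. The package names, CVE IDs, and the `Advisory` structure are all made up for illustration, and real scanners handle far richer version schemes than dotted integers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Advisory:
    cve_id: str                 # placeholder ID, not a real CVE
    package: str                # hypothetical package name
    fixed_in: Tuple[int, ...]   # first version that contains the fix


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(pinned: Dict[str, str],
                    advisories: List[Advisory]) -> List[Tuple[str, str, str]]:
    """Return (package, pinned_version, cve_id) for every pin older than the fix."""
    findings = []
    for advisory in advisories:
        version = pinned.get(advisory.package)
        if version is not None and parse_version(version) < advisory.fixed_in:
            findings.append((advisory.package, version, advisory.cve_id))
    return findings


# Illustrative data only: both packages and both advisories are invented.
PINNED = {"examplelib": "1.4.2", "otherlib": "2.0.0"}
ADVISORIES = [
    Advisory("CVE-0000-0001", "examplelib", (1, 5, 0)),
    Advisory("CVE-0000-0002", "otherlib", (1, 9, 3)),
]

for package, version, cve_id in find_vulnerable(PINNED, ADVISORIES):
    print(f"{package} {version} is affected by {cve_id}")
```

In this sample only `examplelib` is flagged, because the pinned `otherlib` version is already past the fixed release.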

Early cloud adoption saw many instances of misconfigured S3 buckets accessible to the public, along with other misconfigurations in databases (for example, an exposed Facebook database and an Elasticsearch server that exposed personal data), Kubernetes (the Tesla hack via Kubernetes), and so on. While Infrastructure as Code (IaC) tools like Terraform have greatly simplified infrastructure deployment and management, ensuring secure configurations remains complex for many app developers. This complexity has led many companies to centralize abstractions: platform teams handle the intricate security configurations, and app developers focus on building features without worrying about permissions, networking, secret rotation, encryption, and so on.
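An infrastructure misconfiguration scan boils down to checking declared settings against a small set of rules. The sketch below assumes a simplified, hypothetical bucket description. The field names `public_read` and `encrypted_at_rest` are invented for this example and do not match Terraform or any cloud provider's API.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified bucket settings. Real IaC files and cloud APIs
# use different field names and many more options.
BUCKETS: List[Dict] = [
    {"name": "app-logs", "public_read": False, "encrypted_at_rest": True},
    {"name": "user-exports", "public_read": True, "encrypted_at_rest": True},
    {"name": "legacy-backups", "public_read": False},  # encryption setting missing
]


def scan_buckets(buckets: List[Dict]) -> List[Tuple[str, str]]:
    """Return (bucket_name, finding) pairs for settings that look unsafe."""
    findings = []
    for bucket in buckets:
        if bucket.get("public_read", False):
            findings.append((bucket["name"], "publicly readable"))
        # Treat a missing setting as unsafe rather than assuming a secure default.
        if not bucket.get("encrypted_at_rest", False):
            findings.append((bucket["name"], "no at-rest encryption"))
    return findings


for name, finding in scan_buckets(BUCKETS):
    print(f"{name}: {finding}")
```

A platform team can run checks like this once, centrally, which is the kind of abstraction that spares app developers from knowing every setting.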

However, abstraction is only possible when implementations are homogeneous or generic enough across application or product teams. Data infrastructure is often an exception because of the growing number of use cases, vendors, implementation patterns, integrations, and data sources. AI/ML pipelines are also complex, but they typically vary along a narrower set of dimensions (training iterations, models, and endpoint publishing), which makes them slightly more streamlined than the myriad data sources and processing steps in general data infrastructure. Platforms like Wiz and Dig give data infrastructure and platform teams visibility into exposed or vulnerable data storage. Without proper tools and abstractions, though, both app developers and data infrastructure teams remain stuck in a reactive mode, unable to fundamentally address data security issues or identify undesirable implementations.

Data Vulnerability Scans: Knowing the Problem is Only Half the Battle

For package dependencies, addressing vulnerabilities has become more manageable with the increasing adoption of microservices. Fixing individual package dependencies in a microservice is much easier than in a monolithic codebase. Extensive unit and integration tests also reduce the risk associated with dependency upgrades.

In cloud infrastructure, addressing vulnerabilities is slightly more complex. App developers are often outside their comfort zone when working with infrastructure, but platform engineers are well-versed in this area. A centralized, abstracted cloud infrastructure stack and deployment process gives platform and infrastructure teams significant authority to enforce security best practices.

When it comes to data, the situation becomes more nuanced. Data usage patterns are highly custom, with diverse requirements ranging from real-time to batch processing and varying performance needs. This diversity makes it difficult for data infrastructure teams to implement generic, yet flexible, security and governance abstractions.

Three Approaches to Data Security Abstraction

When abstractions are challenging, simplifying basic security implementations becomes essential. This is where data encryption shines, providing a robust solution to data security that vulnerability scans alone cannot achieve. Here are the three main approaches to implementing data encryption:

  1. Infrastructure Abstraction: Modern data infrastructure typically implements, at a minimum, at-rest and in-transit encryption, along with permissions control, audit logs, and other security measures. However, this is often insufficient, because data is still stored in its raw format in data stores such as Snowflake or S3. If any security control fails at the data store level, for example through excessive admin access permissions or an internal threat, infrastructure security alone cannot address the issue. In 2024, for example, attackers used stolen credentials to access Snowflake customer accounts that did not have MFA enabled.
  2. Sidecar/Proxy Approach: Passing sensitive data through a proxy for additional encryption processing is another approach. While this can be effective, deploying sidecars or proxies can be challenging depending on the infrastructure setup. Additionally, data security often needs to be schema-aware, which is difficult for a sidecar or proxy layer to handle without additional client-side implementation. Despite these challenges, this approach is framework- and client-agnostic, making it easier to roll out across diverse data ecosystems. Conduktor is one example of such an offering.
  3. Client-Side Implementation: This approach requires more effort to adopt, but it is the most flexible and the simplest to implement. Equipped with the right tools for data encryption, developers can take ownership of security implementation. Providing common tooling allows small platform teams to support a large number of application or product engineering teams. This is similar to giving application developers proper tools such as Sonar so that they can more effectively address security and vulnerability issues on their own.
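To illustrate what "schema-aware" means on the client side, here is a rough sketch of a helper that transforms only the fields a schema marks as sensitive. Everything in it is hypothetical: `protect_record`, `reveal_record`, and the sample record are invented for this post and are not the Keyper API. The stand-in transform is NOT encryption. It exists only so the sketch runs with the standard library, and in practice the two callables would be backed by a vetted authenticated cipher or a KMS call.

```python
import base64
from typing import Callable, Dict, Set

# A transform takes bytes and returns ASCII-safe bytes (or the reverse).
Transform = Callable[[bytes], bytes]


def protect_record(record: Dict[str, str], sensitive: Set[str],
                   encrypt: Transform) -> Dict[str, str]:
    """Apply `encrypt` only to fields marked sensitive; leave the rest readable."""
    protected = {}
    for field, value in record.items():
        if field in sensitive:
            protected[field] = encrypt(value.encode("utf-8")).decode("ascii")
        else:
            protected[field] = value
    return protected


def reveal_record(record: Dict[str, str], sensitive: Set[str],
                  decrypt: Transform) -> Dict[str, str]:
    """Reverse protect_record for callers that hold the decryption capability."""
    revealed = {}
    for field, value in record.items():
        if field in sensitive:
            revealed[field] = decrypt(value.encode("ascii")).decode("utf-8")
        else:
            revealed[field] = value
    return revealed


# WARNING: stand-ins only. Reversing and base64-encoding is NOT encryption.
# Swap these for a real authenticated cipher before using this pattern.
def standin_encrypt(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data[::-1])


def standin_decrypt(token: bytes) -> bytes:
    return base64.urlsafe_b64decode(token)[::-1]


SENSITIVE_FIELDS = {"email"}  # in practice this would come from the data schema
record = {"user_id": "42", "email": "a@example.com", "plan": "pro"}

protected = protect_record(record, SENSITIVE_FIELDS, standin_encrypt)
restored = reveal_record(protected, SENSITIVE_FIELDS, standin_decrypt)
```

The point of the design is that the field list and the cipher are both supplied from outside. A platform team can own them as shared tooling while application teams simply call the helper where records are written and read.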

Data Security Abstraction Comparison

| Approach | Description | Pros | Cons |
| --- | --- | --- | --- |
| Infrastructure Abstraction | Encryption by default at infrastructure level. | Comprehensive default encryption, centralized control. | Raw data may still be vulnerable, not sufficient for all needs. |
| Sidecar/Proxy Approach | Passing data through a proxy for processing. | Framework and client-agnostic, easier to implement. | Challenging deployment, lacks schema awareness. |
| Client-Side Implementation | Developers use provided tools to encrypt data on the client side. | Developers take more ownership with less platform engineering support. | More upfront work, requires comprehensive toolsets. |

Improving Data Security Through Encryption

At Jarrid, we aim to create data security standards that are easy to understand and implement for popular data handling frameworks. We believe creating better security tooling and embedding data security implementation into common languages and frameworks will have a long-lasting impact. For example, most API frameworks now include TLS middleware by default, making adoption much easier for most backend developers.

Our first step in this direction is Keyper, a suite of crypto key management APIs designed to simplify key creation, management, deployment, and encryption/decryption. Keyper integrates with cloud KMS services like AWS KMS and GCP KMS. We believe in making security simple and accessible, allowing developers to focus on building great products without compromising on data security.

Our roadmap includes adding advanced cryptographic capabilities and making these tools attractive to developers by reducing the friction for adoption. By incorporating the latest features in encryption, we enable companies to adopt the highest standards of data security while continuing to unlock the value in their data.

Trying to improve data security implementation in your internal data infrastructure? We'd love to help. Talk to us now.


Conclusion

In summary, while vulnerability scans are necessary, data encryption provides the right tools for infrastructure and data teams to address these vulnerabilities effectively. By making encryption accessible and easy to implement, we can fundamentally solve data security challenges and create a safer, more secure digital landscape.

Combining data encryption with vulnerability scans offers a comprehensive approach to data security. Scans identify potential weaknesses, while encryption ensures that sensitive information remains protected even if vulnerabilities are exploited. The combination of both approaches empowers teams to proactively safeguard data, enhancing overall security and fostering trust in digital systems.