Cloud Data Platform

What is a cloud data platform?

A cloud data platform is a data centre hosted in the cloud, including servers and data storage. It provides virtualised access to data from multiple sources in multiple locations.

What does a cloud data platform do?

One step in an organisation’s digital transformation involves migrating its data ecosystem and enterprise data from its traditional on-prem data centres or warehouses to the cloud. These resources are relocated to a cloud data platform, allowing enterprises to create a data lake that can be accessed anywhere, at any time. With this “democratised” data, both structured and unstructured data can be rapidly ingested to empower analytics. The platform can also scale quickly as data and analytics needs change.

Why do enterprises use a cloud data platform?

By using a cloud data platform, enterprises gain an easier way to leverage their data. It allows the data to be managed, secured and viewed from any location, both remotely and on-prem. These virtual data platforms offer the reliability of an on-prem data warehouse with an affordability that physical hardware cannot match. Organisations use these platforms to gain a much more flexible data exchange, which then empowers more informed business decisions.

Cloud data platform elasticity

Cloud data platforms are far more elastic than their on-prem counterparts and provide an integrated view into the data hosted on the platform. These platforms enable full observability of everything running on them, including CPU and memory utilisation, as well as insights into what queries are running and how they can be optimised. 

Data is stored in clusters, and by observing actual workload behaviour, an enterprise can grow or shrink a cluster to avoid underutilised capacity.
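As a rough illustration of growing or shrinking a cluster from observed behaviour, the sketch below sizes a cluster so that the peak observed CPU load would run near a chosen target utilisation. The function name, the 65% target and the sample values are assumptions made for this example, not part of any particular platform's API.

```python
import math


def recommend_node_count(current_nodes, cpu_samples, target_utilisation=0.65, min_nodes=1):
    """Suggest a node count so that peak observed load runs near the target.

    cpu_samples holds average cluster CPU utilisation readings between 0 and 1.
    All names and thresholds here are hypothetical.
    """
    if not cpu_samples:
        # No observations yet, so leave the cluster as it is.
        return current_nodes
    peak = max(cpu_samples)
    # Work done at peak, expressed as a number of fully busy nodes.
    busy_nodes = current_nodes * peak
    needed = math.ceil(busy_nodes / target_utilisation)
    return max(min_nodes, needed)


# A lightly used 10-node cluster can shrink; a busy 4-node cluster should grow.
shrink_to = recommend_node_count(10, [0.20, 0.30, 0.26])
grow_to = recommend_node_count(4, [0.90, 0.95])
```

Sizing against the peak rather than the average is a conservative choice: it trades some idle capacity for headroom during the busiest observed period.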

Moving to a cloud data platform

CIOs often find it difficult to predict the peak usage of their enterprise, making it likely that they will overprovision their data warehouses to avoid performance problems. As a result, the case for modernising data resources and moving them to a cloud data platform that can scale quickly seems obvious.

However, many CIOs are slow to give up on more than six decades of running and maintaining their workloads on-prem. To stay on top of their data, enterprises need to do a cost-benefit analysis for a potential switch to a cloud data platform. Fundamentally, they need to decide whether the cost of migration and new licences outweighs the cost of overprovisioning and long-term operations.
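The comparison can be sketched as a simple break-even calculation. This is a minimal model under stated assumptions: costs are flat per year, migration is a one-off charge, and every figure in the usage lines is hypothetical. A real analysis would include many more factors.

```python
def cumulative_costs(years, onprem_annual, cloud_annual, migration_one_off):
    """Return (on_prem_total, cloud_total) after the given number of years."""
    on_prem_total = onprem_annual * years
    cloud_total = migration_one_off + cloud_annual * years
    return on_prem_total, cloud_total


def break_even_year(onprem_annual, cloud_annual, migration_one_off, horizon=10):
    """First whole year in which the cloud option is no more expensive, else None."""
    for year in range(1, horizon + 1):
        on_prem_total, cloud_total = cumulative_costs(
            year, onprem_annual, cloud_annual, migration_one_off
        )
        if cloud_total <= on_prem_total:
            return year
    return None


# Hypothetical figures: an overprovisioned on-prem estate versus cloud running
# costs (including new licences) plus a one-off migration cost.
year = break_even_year(onprem_annual=500_000, cloud_annual=350_000, migration_one_off=400_000)
```

If the function returns None within the chosen horizon, the migration does not pay for itself under those assumptions.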

What is the architecture of a cloud data platform?

A typical data platform is made up of several components that handle different aspects of data management. The architecture is layered into:

  • data lineage
  • data security and audit logging
  • metadata, business glossary, data catalogue and data search
  • storage and compute
  • data governance
  • data quality and data trust.

The cloud itself allows users to decouple all components of data platforms, which helps enterprises scale applications and avoid getting locked into any vendor’s proprietary tools. And most cloud data platform providers separate compute and storage for better data control and agility. 

Data is first imported and then cleaned in data pipelines. As for storage, cloud data platforms store data in two tiers: one for “hot” data and the other for “cold” data. The first tier is memory, which holds the data index and the most frequently accessed data. The second tier is local or persistent disk (often a solid-state disk) or, typically, basic cloud object storage. This tier usually delivers slower performance.

To store data, the cloud data platform first writes updates to the fastest in-memory tier and then copies them out to the cloud object storage tier to help improve overall performance. When queried, the hot data tier pulls data up from the cold data tier and examines it at a very deep, granular level, which eases the path toward business-critical insights.
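The two-tier behaviour described above can be sketched in a few lines. This is a toy model, not any vendor's implementation: the class name is hypothetical, a plain dictionary stands in for cloud object storage, the copy to the cold tier happens synchronously for simplicity, and least-recently-used eviction is an assumed policy for the hot tier.

```python
from collections import OrderedDict


class TieredStore:
    """Toy hot/cold store: a small in-memory tier in front of a larger cold tier."""

    def __init__(self, hot_capacity=2):
        self.hot = OrderedDict()  # in-memory tier, ordered from least to most recently used
        self.cold = {}            # stand-in for cloud object storage (assumption)
        self.hot_capacity = hot_capacity

    def _place_in_hot(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            # Evict the least recently used entry; the cold tier still holds a copy.
            self.hot.popitem(last=False)

    def put(self, key, value):
        self._place_in_hot(key, value)  # write to the fastest tier first
        self.cold[key] = value          # then copy out to the cold tier

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.cold[key]          # raises KeyError for unknown keys
        self._place_in_hot(key, value)  # pull the data up into the hot tier
        return value


store = TieredStore(hot_capacity=2)
for name, reading in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(name, reading)
value = store.get("a")  # "a" was evicted from the hot tier, so it is pulled up from cold
```

Because every write also lands in the cold tier, evicting from the hot tier never loses data; it only makes the next read of that key slower.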

What are the advantages and disadvantages of cloud data platforms?

As workloads fluctuate and unstructured data volumes continue to rise, the pressure to modernise IT is accelerating. However, organisations need to carefully consider whether and how to incorporate cloud infrastructure, such as cloud data platforms, into their IT ecosystem. 

Advantages

  • Flexibility: As data and analytics needs evolve, cloud data platforms can scale capacity quickly and easily.
  • Visibility: Cloud data platforms rapidly ingest structured and unstructured data to empower faster analytics.
  • Access: Moving resources to the cloud facilitates the creation of a data lake to democratise data and share it anywhere and at any time.
  • Right-sized costs: Rather than paying for an overprovisioned system, using a cloud data platform with its consumption-based model allows enterprises to pay only for what they use, as they use it.

Disadvantages

  • Utilisation: Data centre utilisation can quickly drop from full capacity to two-thirds as workloads are moved to the cloud. Dropping a single server-refresh cycle will create that scenario.
  • Complexity: Shifting workloads can increase the complexity of IT operations – decisions to ramp up/down are made on a case-by-case basis due to changes in business priorities or portfolio and workload shifts.
  • Increased compliance pressure: Data privacy and data residency regulations continue to evolve, so the need to move workloads can change with them.

How are cloud data platforms used?

The elastic nature of cloud data platforms makes them an ideal tool for responding to changing workloads, business goals and markets. But how exactly do businesses use them? Read below for a few use cases:

  • Data consolidation: Rather than using multiple spreadsheets and other flat-file data sources, analysts use cloud data platforms to build a “data mart”. There, they can easily load and optimise data from multiple sources for analysis and actionable insights.
  • Operational insight: Data on a cloud data platform can be easily integrated with business-critical applications, offering a simple way for results to be operationalised and fed back into applications to enable data-driven decisions.
  • Versatile analysis: Data analysts all have their own favourite tools, particularly open-source tools, which can be incompatible with fixed data platforms. Cloud data platforms offer full interoperability, which enables subscribers to plug in their own tools and use them within the platform. This way, they can migrate insights to another tool if needed and prevent vendor lock-in.
  • Streaming data processing: A cloud data platform combines the abilities of a data lake and a data warehouse to process streaming data and other unstructured enterprise data, enabling machine learning (ML).
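To make the data consolidation use case concrete, here is a minimal sketch that merges rows from several flat-file sources into a single table keyed on a shared column. The function name, column names and sample data are hypothetical, and in-memory CSV text stands in for the spreadsheets or files a real platform would load.

```python
import csv
import io


def consolidate(sources, key):
    """Merge rows from several CSV texts into one list of records, joined on `key`."""
    merged = {}
    for text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            # Rows sharing the same key value are combined into one record.
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Two hypothetical flat-file exports that share a customer_id column.
sales = "customer_id,revenue\n1,100\n2,250\n"
crm = "customer_id,region\n1,EMEA\n2,APAC\n"
records = consolidate([sales, crm], key="customer_id")
```

Values stay as strings here because the csv module does not infer types; a real load step would also cast and validate them.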

HPE and cloud data platforms

Organisations face many challenges in managing their data – not just how to optimise data workloads on the cloud but also how to optimise them in hybrid environments that comprise edge, data centre, cloud and multicloud infrastructure. HPE offers an edge-to-cloud platform for users to run applications and services on-prem and in the cloud, along with services to manage the workload. For example, the growing portfolio of HPE GreenLake cloud services includes:

  • Analytics: Open and unified analytics cloud services to modernise all data and applications everywhere – on-prem, at the edge and in the cloud.
  • Data protection: Disaster recovery and backup cloud services to help customers tackle ransomware head-on and secure data from edge to cloud.
  • HPE Edge-to-Cloud Adoption Framework and automation tools: A comprehensive, proven set of methodologies, expertise and automation tools to accelerate and de-risk the path to a cloud experience everywhere.
  • HPE Ezmeral Data Fabric Object Store: A Kubernetes-based storage technology that will run across hybrid environments. It enables users to combine different types of data from files, object event streams and databases into the same data fabric.

HPE also recently introduced Ezmeral Unified Analytics, a cloud data lakehouse platform built with a group of open-source technologies that provide a data fabric for users to run data analytics and business intelligence workloads without being locked into any single vendor’s technologies.