Unstructured data
What is unstructured data?

Unstructured data, in the context of data storage, is information that varies widely in format and content and does not fit neatly into predefined data models, which makes it challenging to store, retrieve, and analyze. It includes file and object data and plays a major role in artificial intelligence (AI). Unstructured data is often qualitative and is usually stored in its native format. Examples include emails, social media posts, articles, photos, graphics, recordings, podcasts, movies, logs, and Internet of Things (IoT) streams. It accounts for a significant portion of the world's data, and deriving meaningful insights from it requires advanced tools such as natural language processing (NLP), image recognition, and AI-driven analytics.

What are examples of unstructured data?

Data without a predefined format or organizational framework is difficult to store and manage in traditional databases. It comes from many sources and takes many forms:

Emails, social media posts, blog articles, customer reviews, chat logs, PDFs, and Word and Excel files: This data can reveal useful insights, but examining it requires NLP technologies.

Multimedia: Photos, YouTube videos, podcasts, and voice recordings. These formats are increasingly processed with image recognition, video analysis, and speech-to-text transcription.

Sensors and IoT devices: Data from fitness trackers, smart home temperature and activity sensors, and machine records from industrial equipment. This data typically needs real-time processing and complex analytics.

Internet: HTML pages, clickstream navigation patterns, and web-scraped data. These sources can be used to monitor user behavior, optimize websites, or gain market insight.

Contact center transcripts, open-ended survey replies, and legal filings: This data is crucial for customer service, market research, and legal analytics, but analyzing it requires complex algorithms.
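To make the processing step more concrete, the sketch below shows one simple way a free-text machine log line might be turned into structured fields. The log format, the field names, and the `parse_log_line` helper are all hypothetical and chosen only for illustration. Real logs vary widely, and production pipelines typically rely on far more robust parsing than a single regular expression.

```python
import re
from typing import Optional

# Hypothetical log format, invented for this example:
# "<ISO timestamp> <device id> temp=<number>C <free-text note>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<device>[\w-]+)\s+"
    r"temp=(?P<temp_c>-?\d+(?:\.\d+)?)C"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Extract structured fields from one log line, or return None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the temperature from text to a number so it can be filtered or aggregated.
    record["temp_c"] = float(record["temp_c"])
    return record


sample = "2024-05-01T12:00:03 pump-07 temp=81.5C vibration high, check bearing"
print(parse_log_line(sample))
```

Once fields such as the device and temperature are extracted, the record can be stored and queried like structured data. The free-text note at the end of the line would still need text analysis to interpret.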

Unstructured data vs. structured data

| Feature | Unstructured data | Structured data |
|---|---|---|
| Format | Lacks a predefined format or organizational structure | Organized into a predefined schema (e.g., rows and columns in a database) |
| Storage | Stored in data lakes, NoSQL databases, or file or object storage systems | Stored in relational databases (e.g., SQL) |
| Examples | Social media posts; images, videos, audio files; email content | Customer data (name, age, email) in a CRM; inventory data in Excel |
| Querying | Requires a file or object storage system and specialized tools such as AI, NLP, or machine learning for analysis | Easy to query using SQL or similar tools |
| Volume | Usually larger in size and growing rapidly in number of files and/or objects | Typically smaller and more manageable |
| Analysis | Requires advanced analytics techniques, including AI and machine learning | Straightforward to analyze with conventional BI tools |
| Applications | Sentiment analysis, image recognition, video analytics, trend forecasting | Financial reporting, inventory management, operational databases |
| Flexibility | Highly flexible: Can handle diverse and evolving data formats | Inflexible: Schema changes require significant adjustments |
| Data sources | Social media platforms, IoT devices, emails, multimedia content | Transactional systems, structured surveys |
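The difference in querying can be illustrated with a minimal sketch. The customer table, the sample emails, and the `looks_negative` helper below are all made up for this example. The keyword check is only a toy stand-in for real NLP or machine learning, used here to show that free text has no columns to filter on.

```python
import sqlite3

# Structured data: rows follow a fixed schema, so a single SQL query can filter them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", 34, "ana@example.com"), ("Ben", 52, "ben@example.com")],
)
over_40 = [row[0] for row in conn.execute("SELECT name FROM customers WHERE age > 40")]

# Unstructured data: free text has no columns, so meaning has to be inferred.
# This keyword set is a toy assumption, not a real sentiment model.
NEGATIVE_WORDS = {"late", "broken", "refund"}


def looks_negative(text: str) -> bool:
    """Return True if the text contains any word from the toy negative-word list."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return bool(words & NEGATIVE_WORDS)


emails = [
    "Hi, my order arrived late and the box was broken.",
    "Thanks, everything works great!",
]
flagged = [email for email in emails if looks_negative(email)]

print(over_40)
print(flagged)
```

The SQL query depends only on the schema, while the text check depends on assumptions about wording. That dependence is why unstructured data usually calls for more advanced analysis than a keyword list.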
