
Unstructured data
What is unstructured data?
In data storage, unstructured data is information that varies widely in format and content and does not fit neatly into a predefined data model, which makes it challenging to store, retrieve, and analyze. It includes file and object data and plays a major role in artificial intelligence (AI). Unstructured data is often qualitative, is often stored in its native format, and comes in many forms: emails, social media posts, articles, photos, graphics, recordings, podcasts, movies, logs, and Internet of Things (IoT) streams. It accounts for a significant portion of the world's data, and deriving meaningful insights from it requires advanced tools such as natural language processing (NLP), image recognition, and AI-driven analytics.

- What are examples of unstructured data?
- What are AI opportunities for unstructured data?
- How can HPE help with unstructured data?
What are examples of unstructured data?
Information that lacks a predefined format or organizational framework is difficult to store and manage in typical databases. It comes from many sources and takes many forms:
- Emails, social media posts, blog articles, customer reviews, chat logs, PDFs, and Word and Excel files: This data can reveal useful insights, but examining it requires NLP technologies.
- Multimedia: Photos, YouTube videos, podcasts, and voice recordings. Image recognition, video analysis, and speech-to-text transcription increasingly draw on these formats.
- Sensors and IoT devices: Readings from fitness trackers, smart home temperature and activity sensors, and machine records from industrial equipment. This data typically needs real-time processing and complex analytics.
- Internet: HTML pages, clickstream navigation patterns, and web-scraped data. These sources can be used to monitor user behavior, optimize websites, or gain market insight.
- Contact center transcripts, open-ended survey replies, and legal filings: This data is crucial for customer service, market research, and legal analytics, but analyzing it requires complex algorithms.
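To make the sensor and log examples above concrete, the following minimal Python sketch shows one way raw device log lines might be turned into structured records before analysis. The log format, the field names, and the `parse_reading` helper are all assumptions invented for illustration; real devices emit many different formats, and production pipelines typically use far more robust parsing.

```python
import re
from typing import Optional

# Hypothetical log format, invented for this sketch:
#   "<timestamp> <device-id> temp=<number>C <free text>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<device>[\w-]+)\s+temp=(?P<temp_c>-?\d+(?:\.\d+)?)C"
)


def parse_reading(line: str) -> Optional[dict]:
    """Return a structured record, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "device": match.group("device"),
        "temp_c": float(match.group("temp_c")),
    }


raw_lines = [
    "2024-05-01T12:00:00Z thermostat-7 temp=21.5C status ok",
    "device rebooted unexpectedly",  # free text with no recognizable fields
]

# Keep only the lines that could be mapped onto the assumed structure.
records = [r for r in (parse_reading(line) for line in raw_lines) if r is not None]
```

Lines that do not match the assumed pattern are skipped here rather than guessed at, which illustrates why unstructured sources usually need dedicated extraction logic before conventional tools can use them.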
Unstructured data vs. structured data
Features | Unstructured data | Structured data |
---|---|---|
Format | Lacks a predefined format or organizational structure | Organized into a predefined schema (e.g., rows and columns in a database) |
Storage | Stored in data lakes, NoSQL databases, or file or object storage systems | Stored in relational databases (e.g., SQL databases) |
Examples | Social media posts; images, videos, and audio files; email content | Customer data (name, age, email) in a CRM; inventory data in Excel |
Querying | Requires a file or object storage system and specialized tools such as AI, NLP, or machine learning for analysis | Easy to query using SQL or similar tools |
Volume | Usually larger in size and growing rapidly in number of files and/or objects | Typically smaller and more manageable |
Analysis | Requires advanced analytics techniques, including AI and machine learning | Straightforward to analyze with conventional BI tools |
Applications | Sentiment analysis, image recognition, video analytics, trend forecasting | Financial reporting, inventory management, operational databases |
Flexibility | Highly flexible: Can handle diverse and evolving data formats | Inflexible: Schema changes require significant adjustments |
Data sources | Social media platforms, IoT devices, emails, multimedia content | Transactional systems, structured surveys |
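
The querying contrast in the table can be sketched in a few lines of Python. This is an illustrative example only: the `customers` table, the sample reviews, and the `NEGATIVE_WORDS` keyword list are hypothetical, and the keyword check is a deliberately naive stand-in for the NLP techniques the table refers to, not a real sentiment model.

```python
import sqlite3

# Structured: rows fit a predefined schema, so SQL answers the question directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", 34, "ana@example.com"), ("Ben", 27, "ben@example.com")],
)
over_30 = [row[0] for row in conn.execute("SELECT name FROM customers WHERE age > 30")]
conn.close()

# Unstructured: free text has no columns, so meaning must be extracted first.
NEGATIVE_WORDS = {"slow", "broken", "late"}  # hypothetical keyword list


def looks_negative(text: str) -> bool:
    """Naive check: does the text contain any word from NEGATIVE_WORDS?"""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & NEGATIVE_WORDS)


reviews = [
    "Delivery was late and the box arrived broken.",
    "Great product, works as described!",
]
negative_reviews = [r for r in reviews if looks_negative(r)]
```

The structured query needs only a `WHERE` clause, while the review text needs an interpretation step before it can be filtered at all. That interpretation step is where the specialized analysis tools come in.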