Data Processing
Data processing is the method of converting raw data into meaningful insights. The processing method determines how businesses handle and analyse data from files, databases, APIs, and streaming applications. The three primary types of data processing are real-time, batch, and hybrid, each addressing different business requirements.

Real-Time
Real-time data processing handles data as soon as it is received, allowing businesses to act immediately. It is essential in situations where delays could lead to missed opportunities or risks, such as fraud detection or stock updates. IoT devices, like fitness trackers or smart thermostats, also rely on real-time processing to provide instant updates to users. Businesses typically use APIs, IoT devices, and system logs as sources for real-time data.
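To make the idea concrete, here is a minimal Python sketch of per-event handling. It assumes a simulated thermostat feed; the names `sensor_stream`, `process_event`, and `run_realtime`, the sample readings, and the `MAX_TEMP_C` threshold are all hypothetical and chosen only for illustration. A production system would read from a real API, device, or log stream instead.

```python
from typing import Iterable, Iterator

# Hypothetical alert threshold, assumed for this example only.
MAX_TEMP_C = 30.0


def sensor_stream() -> Iterator[dict]:
    """Simulate readings arriving one at a time from an IoT device."""
    readings = [21.5, 22.0, 31.2, 23.1]  # illustrative values
    for reading_id, temp in enumerate(readings):
        yield {"reading_id": reading_id, "temp_c": temp}


def process_event(event: dict) -> str:
    """Decide on an action for a single event as soon as it is received."""
    if event["temp_c"] > MAX_TEMP_C:
        return f"ALERT reading {event['reading_id']}: {event['temp_c']} C"
    return f"ok reading {event['reading_id']}"


def run_realtime(stream: Iterable[dict]) -> list[str]:
    """Handle each event immediately instead of waiting for more data."""
    results = []
    for event in stream:
        results.append(process_event(event))
    return results


realtime_results = run_realtime(sensor_stream())
```

The key property is that each reading is evaluated the moment it is yielded, so the out-of-range reading produces an alert without waiting for the rest of the data.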
Batch
Batch data processing involves collecting data over a period and processing it in large chunks. This method is more efficient for handling high volumes of data that do not require immediate action. Batch processing typically works with files, databases, and documents, where data can be stored and processed later in bulk.
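As a rough illustration of batch processing, the Python sketch below aggregates a set of records that have already been collected. The `accumulated_sales` data and the `run_batch` function are hypothetical; in practice the records would come from a file or database and the job would be started by a scheduler.

```python
from collections import defaultdict

# Hypothetical sales records accumulated over a period (illustrative data only).
accumulated_sales = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 50.0},
    {"region": "south", "amount": 20.0},
]


def run_batch(records: list[dict]) -> dict[str, float]:
    """Process all stored records in one pass and return totals per region."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


batch_totals = run_batch(accumulated_sales)
```

Nothing happens while the records accumulate; all of the work is done in a single run over the stored data, which is why this approach suits tasks such as sales analysis that have no immediate urgency.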
Hybrid
Hybrid data processing combines the strengths of real-time and batch processing, giving businesses flexibility to handle data based on its urgency and scale. Hybrid processing is often used across diverse data sources, including APIs, databases, and CRM systems, allowing businesses to balance speed and efficiency effectively.
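One possible way to picture a hybrid setup is a processor with two paths: an immediate check on every incoming reading, and a buffer that is summarised later in bulk. The Python sketch below assumes a patient-monitoring scenario; the `HybridProcessor` class, the `HEART_RATE_LIMIT` threshold, and the sample readings are hypothetical and are not taken from any specific system.

```python
from statistics import mean

# Hypothetical alert threshold in beats per minute, assumed for illustration.
HEART_RATE_LIMIT = 120


class HybridProcessor:
    """Route each reading through a real-time check and a batch buffer."""

    def __init__(self) -> None:
        self.buffer: list[int] = []
        self.alerts: list[str] = []

    def on_reading(self, bpm: int) -> None:
        # Real-time path: act on urgent values as soon as they arrive.
        if bpm > HEART_RATE_LIMIT:
            self.alerts.append(f"high heart rate: {bpm}")
        # Batch path: keep every reading for later analysis.
        self.buffer.append(bpm)

    def run_batch(self) -> dict:
        """Summarise the buffered readings, then empty the buffer."""
        if not self.buffer:
            return {"count": 0, "average": None, "max": None}
        summary = {
            "count": len(self.buffer),
            "average": mean(self.buffer),
            "max": max(self.buffer),
        }
        self.buffer.clear()
        return summary


processor = HybridProcessor()
for bpm in [72, 130, 80, 78]:  # illustrative readings
    processor.on_reading(bpm)
hybrid_summary = processor.run_batch()
```

Here the urgent reading triggers an alert immediately, while the summary is only computed when `run_batch` is called, for example on a schedule.
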
| Characteristic | Real-Time Processing | Batch Processing | Hybrid Processing |
|---|---|---|---|
| Definition | Processes data instantly as it is generated. | Collects and processes data in large chunks at scheduled intervals. | Combines real-time and batch processing for flexibility. |
| Speed | Immediate, near-instant results. | Delayed; depends on batch schedules. | Real-time for urgent tasks, delayed for batch processes. |
| Data Volume | Handles smaller, continuous data streams. | Processes large volumes of accumulated data. | Balances both small and large datasets. |
| Use Cases | Fraud detection, stock updates, IoT device monitoring. | Payroll, sales analysis, marketing reports. | Patient monitoring and historical analysis, e-commerce trends. |
| Resource Requirements | High computing power for real-time data streams. | Less intensive; suitable for periodic processing. | Moderate; balances resources between real-time and batch. |
| Complexity | Requires advanced systems for streaming and processing. | Simple; uses scheduled jobs or workflows. | Moderate; integrates both real-time and batch systems. |
| Examples of Data Sources | APIs, IoT devices, system logs. | Files, databases, documents. | APIs for real-time tasks, databases for batch processing. |
| Best for | Time-sensitive tasks requiring immediate action. | Tasks with no immediate urgency. | Complex operations needing both instant action and periodic insights. |
| Cost | Higher due to infrastructure for continuous processing. | Lower; efficient for periodic tasks. | Moderate; balances cost across both methods. |
Now that we have explored the different types of data processing, the next step is understanding how businesses organize and use this data effectively. In the next section, we will explore data normal forms, their levels, and how they affect the design and performance of a data warehouse.