The volume of data that businesses can accumulate today is staggering, and it can provide a level of business intelligence that was only dreamed of two or three decades ago. However, a traditional data warehouse can only provide hindsight: what happened a few hours ago, yesterday or (in some cases) last week. In a competitive economy, businesses need access to data as quickly as possible, and mastering real-time data is one aspect of making technology a strength rather than a weakness. Many enterprises have already begun moving to real-time data aggregation to make data available sooner, but the process is not without challenges.
- Real-time ETL: Even with a nightly batch update, the “extract, transform and load” (ETL) operation is one of the most challenging aspects of managing a data warehouse. The warehouse often must deny users access while the operation is underway. That practice is not acceptable with real-time ETL, because data is loading more or less constantly. Possible solutions include opting for “near real-time” ETL, using an external data cache for real-time data, using a direct trickle feed or taking a “trickle and flip” approach.
- Synchronizing Fact Tables: Introducing real-time information into a warehouse can lead to inaccuracies in reports built on your fact tables, because metrics queried at different moments are frequently out of sync. For example, suppose you need sales data broken down by product category in one report and by sales territory in another. In the few seconds that elapse while the category query is running, additional sales can occur, so the totals on the two reports will not match. Discrepancies can normally be avoided by applying either the “trickle and flip” or the direct trickle feed approach. Other options are to use separate warehouse fact tables for real-time data or to keep real-time data in an external data cache.
- Scalability and Query Contention: Data warehouses have traditionally been kept separate from transactional systems because heavy analytical queries and continuous updates compete for the same resources. Even when the data is fixed, the complexity of queries and the number of simultaneous users can slow processing dramatically. Real-time loading multiplies the strain on the system, and the contention between continuous loading and complex queries can limit scalability. Real-time loading may become bottlenecked, or queries may take an unacceptable amount of time to execute. Possible ways to address the issue include limiting real-time reporting, upgrading the database, using a separate data cache or applying and managing a “just in time” data merge.
- Alerting Applications: Alerting applications are designed to run on a fixed schedule or to be triggered by an event. Many third-party alerting applications were not engineered to work in real time; their most common use was to trigger an email alert immediately following a nightly batch update. One issue that can arise with real-time data is the transmission of multiple alerts for a single event. There are possible workarounds, such as setting the cycle threshold to the minimum increment possible, but these solutions are often relatively complex and normally require careful management.
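To make the “trickle and flip” idea from the list above more concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not a description of any particular warehouse product: the class `TrickleAndFlipStore`, its methods and the helper `total_by` are hypothetical names, and plain Python lists stand in for the staging and fact tables. New rows trickle into a staging area continuously, a periodic “flip” publishes a frozen copy, and every report reads from that copy.

```python
import threading


class TrickleAndFlipStore:
    """Hypothetical stand-in for a staging table plus a query-facing fact table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._staging = []    # receives the continuous "trickle" of new rows
        self._published = ()  # frozen snapshot that reports read from

    def trickle(self, row):
        """Append a newly arrived row to staging without touching the snapshot."""
        with self._lock:
            self._staging.append(row)

    def flip(self):
        """Publish a frozen copy of staging as the new query-facing snapshot."""
        with self._lock:
            self._published = tuple(self._staging)

    def snapshot(self):
        """Return the most recently published snapshot."""
        return self._published


def total_by(rows, key):
    """Sum the 'amount' field of each row, grouped by the given key."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row["amount"]
    return totals


store = TrickleAndFlipStore()
store.trickle({"category": "books", "territory": "east", "amount": 20})
store.trickle({"category": "toys", "territory": "west", "amount": 35})
store.flip()

# Both reports read the same snapshot, so their grand totals agree
# even though a new sale arrives between the two queries.
rows = store.snapshot()
by_category = total_by(rows, "category")
store.trickle({"category": "toys", "territory": "east", "amount": 50})
by_territory = total_by(rows, "territory")

print(sum(by_category.values()), sum(by_territory.values()))
```

In this sketch both totals come to 55, because the sale that arrives between the two queries stays in staging until the next call to `flip()`. The trade-off is freshness: reports are only as current as the most recent flip, which is why the approach is usually described as near real-time.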
Businesses are becoming increasingly aware of the benefits of real-time data aggregation. With today’s technology, two statements about real-time aggregation hold true:
- Implementing and managing real-time data aggregation is challenging.
- Current technology can overcome the challenges when wielded by experienced, knowledgeable developers.
EX Squared has the expertise needed to implement your real-time data aggregation project. If you would like to learn more about our approach, philosophy and qualifications or discuss your development needs, feel free to contact us today.