Data Warehouse
To give the clearest possible picture, our applications report a wide range of activity within the client-side product, from simple events, such as when the user opened the product, to more complex sequences that track user patterns over several sessions. However, with millions of events logged each week per product, the volume of data can quickly outgrow the ability to mine it. Enter the Data Warehouse. With its powerful cross-database analysis functionality and automated generation engine, it can carry the bulk of the information load without burdening the production databases.
Features and Benefits
- Automated data processing
- The hands-off processing engine runs daily to collect and store queued events from the main production server, keeping warehouse data fresh and accurate.
- Production database load assistance
- Combing through hundreds of millions of records can be a server-intensive process. Off-loading event analysis and storage to the Data Warehouse frees precious CPU cycles on the production server, leaving them for client/server activity.
- Data integrity validation
- Advanced validation functionality analyzes and checks all incoming data to ensure its integrity, and with it the accuracy of the reports subsequently disseminated to content owners and marketers.
- Cross-database aggregation
- To best understand user patterns and behavior, it is often helpful to analyze activity across multiple products, and the Data Warehouse's cross-database aggregation capabilities make this possible. In addition, marketers can sponsor segments in different products and receive cumulative results on their potential audience's activity.
- Efficient storage techniques
- With hundreds of millions of data points, the volume of data could quickly bog down the analysis functionality. With scheduled data recalculation, however, segmented read-only partitions allow for increased parallelism and therefore faster reporting and analysis.
- Event sampling
- To further increase efficiency and capacity, the Data Warehouse employs a sampling technique that groups like events into sampled hourly snapshots alongside extended daily queues. This reduces the volume of events transmitted, and therefore the network and server resources consumed, while still maintaining the integrity of the reported data.
- Production database maintenance
- As events are collected, recorded, and analyzed, the Data Warehouse can optionally purge previously processed events from the production servers, freeing up gigabytes of space and ensuring that the production databases stay lean and efficient.
- Full data redundancy
- By employing both full database mirroring and weekly offsite database backups, the data stays safe and quickly re-deployable in the event of corruption or disaster.
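
To illustrate the idea behind event sampling, the following is a minimal sketch of grouping like events into hourly snapshots. It is not the Data Warehouse's actual implementation: the function name `hourly_snapshots`, the event names, and the `(event_name, timestamp)` input shape are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def hourly_snapshots(events):
    """Group like events into per-hour counts.

    `events` is an iterable of (event_name, timestamp) pairs (an assumed
    shape for this sketch). Returns a dict that maps
    (event_name, start_of_hour) to the number of matching events.
    """
    counts = Counter()
    for name, timestamp in events:
        # Truncate the timestamp to the start of its hour.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(name, hour_start)] += 1
    return dict(counts)


# Hypothetical raw events as they might be queued on the client.
events = [
    ("product_opened", datetime(2024, 1, 1, 9, 5)),
    ("product_opened", datetime(2024, 1, 1, 9, 40)),
    ("segment_viewed", datetime(2024, 1, 1, 9, 59)),
    ("product_opened", datetime(2024, 1, 1, 10, 2)),
]

snapshots = hourly_snapshots(events)
for (name, hour_start), count in sorted(snapshots.items()):
    print(name, hour_start.isoformat(), count)
```

Here four raw events collapse into three snapshot rows, so fewer records need to be transmitted while the per-hour totals are preserved.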