At AISPEECH, we provide conversational AI services and natural language interaction solutions for a wide range of entities. This post describes the data tools and techniques that power those services. We use Apache Hive and Apache Kylin to build our offline data warehouse, and Apache Spark with MySQL as our real-time analytics data warehouse. Data comes from three sources: messages delivered to Kafka over MQTT/HTTP, binlogs from our business databases, and logs collected by Filebeat. From there it is split into two pipelines, real-time and offline. Real-time data is buffered in Kafka, computed by Spark, and written to MySQL for further analysis. Offline data is processed by Apache Hive and Apache Kylin, then stored in the offline data warehouse.
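To make the topology above concrete, here is a minimal sketch of the two pipelines as plain Python data. It does not talk to Kafka, Spark, Hive, Kylin, or MySQL; it only models which stages a record passes through. The source identifiers, the `Record` class, and the `route` function are hypothetical names introduced for illustration, and the sketch assumes every record is fanned out to both pipelines, which the description above does not state explicitly.

```python
from dataclasses import dataclass

# The three sources and the stage order of each pipeline, as named in the post.
# The identifiers themselves are illustrative, not real topic or table names.
SOURCES = ("kafka_mqtt_http", "business_db_binlog", "filebeat_logs")

PIPELINES = {
    "real_time": ("Kafka", "Spark", "MySQL"),
    "offline": ("Apache Hive", "Apache Kylin", "offline data warehouse"),
}


@dataclass(frozen=True)
class Record:
    source: str   # one of SOURCES
    payload: str  # raw event body; its format is not specified in the post


def route(record: Record) -> list[tuple[str, tuple[str, ...]]]:
    """Return the (pipeline, stages) pairs a record passes through.

    Assumption: every record is fanned out to both pipelines.
    """
    if record.source not in SOURCES:
        raise ValueError(f"unknown source: {record.source}")
    return [(name, stages) for name, stages in PIPELINES.items()]


if __name__ == "__main__":
    for name, stages in route(Record("filebeat_logs", "example log line")):
        print(f"{name}: " + " -> ".join(stages))
```

If only some sources feed a given pipeline, `route` could instead look up a per-source list of pipelines; the fan-out here is just the simplest reading of the description.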