“Strategies depend on the particular projects and approaches undertaken by each company”
“For big companies with huge amounts of data to work on, the cloud is not profitable from an economic point of view”
At the Teradata Universe event we had the opportunity to talk to Stephen Brobst, Teradata CTO, together with several members of the international press. We discussed several current topics in Big Data and the world of data analytics, an ever-changing world in which companies must be ready to react. We also had the opportunity to learn about the company’s position in terms of competitiveness and current technological trends.
SCASC.- Teradata focuses its business on appliances involving its Active Data Warehouse solutions. However, technologies such as Hadoop or cloud computing are becoming increasingly popular. What is Teradata’s approach to staying at the top of its field over the next five years?
S.B.- We are currently offering cloud services in the USA, and the cloud is one of our research areas. Nevertheless, in our opinion cloud technologies are a good solution for small and medium enterprises (SMEs), but for big companies with huge amounts of data to work on, the cloud is not profitable from an economic point of view. Companies such as Netflix are an exception to the rule; in a sense, they are our biggest SME client in the cloud. In any case, we do not dismiss that strategy. When approaching cloud solutions, whether public or private, we use a bare-metal strategy instead of a generic one; that means a server may be used for Teradata solutions one day and work as a print server some days later. Appliances are still very interesting and they are more cost-effective. In practice, there is nothing to prevent other approaches to deploying Teradata solutions; for instance, Teradata solutions may be deployed on Amazon EC2 (Elastic Compute Cloud). But it should be noted that the costs of the virtualization layer are not as good as those of appliance-based systems.
SCASC.- From a CIO’s point of view, which strategies should be adopted right now to properly face such phenomena as the Internet of Things?
S.B.- The answer depends heavily on the specific industrial sector. Nowadays, everything is in the cloud and everything is included in the Internet of Things. Therefore, strategies depend on the particular projects and approaches undertaken by each company. From a CIO’s point of view, the question to ask is the following: “Which specific steps am I going to take in order to obtain value from the Internet of Things in the context of my own business?” Let’s take the example of a car insurance company. IoT may be appealing to them as a way to detect driving habits that benefit clients, so that they pay less for their insurance if they drive in a predictable and responsible way, or more if they do not. Systems that gather driving data using smartphones are already widespread in the United Kingdom. Regardless of the results obtained by the insurance company, drivers tend to improve their driving habits when they know they are being watched. Another sector that could benefit from IoT is oil and gas, for the analysis of exploration data in order to decide whether or not it is worthwhile to exploit a certain deposit. These are just two examples in an area with a huge number of applications, and progress is being made as required. Indeed, Teradata 15 has already introduced compatibility with JSON, which makes it possible to work specifically with IoT scenarios, using data that change fast and are not always homogeneous.
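The appeal of JSON for IoT data, as described above, is that records need no fixed schema: fields can appear, disappear, or vary per device. A minimal sketch of that idea in Python (the device names, field names, and records here are invented for illustration, not taken from any Teradata API):

```python
import json

# Hypothetical IoT telemetry: fast-changing and not homogeneous, as the
# interview describes -- each device reports different fields over time.
RAW_RECORDS = [
    '{"device": "car-17", "speed_kmh": 92, "ts": 1}',
    '{"device": "car-17", "speed_kmh": 88, "hard_brake": true, "ts": 2}',
    '{"device": "rig-04", "pressure_bar": 310.5, "ts": 2}',
]

def parse(record: str) -> dict:
    """Schema-on-read: keep whatever fields each record happens to carry."""
    return json.loads(record)

def hard_brake_events(records):
    """Filter the heterogeneous stream for one attribute of interest,
    tolerating records where that attribute is absent."""
    return [parse(r)["device"] for r in records
            if parse(r).get("hard_brake")]

print(hard_brake_events(RAW_RECORDS))  # ['car-17']
```

The point of the sketch is the `.get()` lookup: a relational system would force all three records into one rigid table, while schema-on-read lets the query ignore fields it does not ask about.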
SCASC.- Teradata has a big client community: banks, telcos… Does it make sense to use Hadoop for small enterprises using commodity hardware?
S.B.- A small enterprise does not have the resources required to introduce a commodity solution. Working with Open Source is by no means free. Imagine you were given a puppy as a present: the cost does not lie in the puppy itself, but in food, vaccination, time… The entry point from which a company may work with Teradata is lower in the cloud, but creating a Teradata cloud service for each country is not an appealing option. It would be if we were to create a Europe-wide cloud system; that would indeed be viable. Developing a cloud system for a single country would be worthwhile in some cases, such as the United Kingdom and Germany, but not all countries are like these.
SCASC.- Teradata solutions are costly, but they work well and have been tested. Hadoop, on the other hand, is an open tool, but it is difficult to use and has not been totally validated. By putting these arguments in perspective, what do you think the situation will be in a few years?
S.B.- This is not about being expensive or cheap: it is about TCV (Total Contract Value). In some cases Hadoop is more suitable than Teradata’s technology (voice, text, etc. are not SQL-friendly and do not work well with relational systems). In any case, Hadoop is already part of Teradata’s unified solution. For relational data, Teradata systems may be configured with an economic dimension similar to that of Hadoop. Our Unified Data Architecture (UDA) involves choosing the right technology for the problem to be solved; we should not assume that a single technology can solve everything. This includes commercial solutions by Teradata, Hadoop (Open Source) and other technologies that are not our own. From a general point of view, the ecosystem rests on database computing, which contains different technologies. SAP BO (Business Objects) and SAP HANA are not ideal for Big Data analytics. They are great for Operational Data Store (ODS) applications and for SAP BW, but it makes no sense to do Big Data analytics in a system that keeps everything in memory. Ultimately, SAP with HANA is moving towards online transaction processing (OLTP). Remarkably, in many cases SAP is deployed directly on Oracle, which is trying to get rid of SAP. Besides, IBM is an Oracle competitor in the database field, and SAP is an Oracle competitor in the application field. Larry Ellison (Oracle) was straightforward about his intentions: his aim is to get rid of SAP, no misunderstandings. Which means that Hasso Plattner is sleeping, so to speak, next to somebody holding a knife, ready to strike as soon as he falls asleep. Thus, it would not be a wrong strategy to implement SAP solutions on IBM DB2. That is what I would do if I were the manager at SAP.
Rumour has it that a conversation about this has already taken place and no strategy was agreed upon; the fact is no agreement was reached. SAP acquired technology from Sybase and from a South Korean startup (Transact In Memory Inc.), and reaped the benefits of research done by a German university and funded by Hasso Plattner; all of that became HANA. After all, what they should do is sever ties with Oracle. It is a matter of life and death for the company. Larry Ellison (Oracle) left no doubt about his intentions; Larry may be liked or disliked, but he is extremely good at what he does.
SCASC.- Oracle is present in big companies. Is HANA an alternative to Oracle solutions?
S.B.- SAP is improving its design with HANA, evolving and solving problems. But this is a process that takes years to finish. SAP has always been straightforward in its promises, never resorting to marketing arguments. But lately it is somewhat betraying its principles as an engineering-based company by trying to buy more time.
SCASC.- Everybody is talking about Big Data and “In Memory”. Teradata has chosen a conservative approach here.
S.B.- Teradata is an engineering company, and good engineering is precisely our added value. When working in analytics, keeping all data in memory is not a good strategy; from an economic point of view, it is irrational. OLTP applications use relatively small databases, and there keeping all data in memory makes sense. However, in data analytics, data grow faster than memory gets cheaper. According to research, more than 90% of input/output operations depend exclusively on 20% of the data, not on 100%. The challenge lies in the fact that this 20% keeps constantly changing, and software intelligence must be used to identify which data should be brought into RAM. These are the data that can and should be kept in memory or in solid-state storage: the rest are not needed. It is ideal to keep only the right data in memory; putting everything in memory is just “easy engineering”. We use smart memory technologies in Teradata, which are required for a real Enterprise Data Warehouse.
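The placement policy described above can be sketched in a few lines: track access frequency per block and keep only the most-accessed blocks in the fast tier. This is a toy illustration of the general idea, not Teradata’s actual algorithm; the class name, block identifiers, and slot count are all invented:

```python
from collections import Counter

class HotDataTracker:
    """Toy frequency-based placement: since ~90% of I/O hits ~20% of the
    data, only the most-accessed blocks deserve RAM/SSD residence."""

    def __init__(self, fast_slots: int):
        self.fast_slots = fast_slots   # capacity of the fast tier, in blocks
        self.hits = Counter()          # access count per block id

    def access(self, block: str):
        """Record one read of a block; counts shift as workloads change."""
        self.hits[block] += 1

    def hot_set(self):
        """Blocks currently worth keeping in the fast tier."""
        return {b for b, _ in self.hits.most_common(self.fast_slots)}

tracker = HotDataTracker(fast_slots=2)
for block in ["a", "b", "a", "c", "a", "b"]:
    tracker.access(block)
print(tracker.hot_set())  # 'a' and 'b' are the hot blocks; 'c' stays cold
```

Because the counters keep updating, the hot set drifts with the workload, which is exactly the “constantly changing 20%” problem the answer describes; a production system would also age out stale counts rather than let history accumulate forever.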
SCASC.- QueryGrid has just been introduced. What is its place in Teradata’s portfolio? Which clients use databases from Teradata and Aster, and how do you expect the adoption of such heterogeneous technological structures to unfold?
S.B.- The essential point of QueryGrid is that no single technology can solve all problems. We want to make it possible to transparently run queries across the different engines of a UDA ecosystem. Some clients are already using Aster, Hadoop and Teradata in a beta phase. That business area has been growing steadily since 2013. Before 2013, some extremely relevant Silicon Valley companies, as well as some “.com” companies, were in it, but it is now reaching conventional businesses.
Traditionally, disruptive technologies are first adopted in Silicon Valley, then move to the East Coast, then the United Kingdom and finally reach the European continent. Nevertheless, the adoption of Big Data technologies is more aggressive in Germany than in the UK, for example. I have not yet found a plausible explanation for that pattern.
SCASC.- Back to Hadoop…
S.B.- Hadoop is a file system that was built for a specific purpose. Hadoop is extremely efficient for certain uses, such as voice-to-text conversion. Relational databases are better for other uses. We are moving towards a future where several options coexist. “And” is better than “or”.
SCASC.- What about parallelism and GPGPU?
S.B.- Some approaches already exist to manufacture tailor-made hardware to solve certain problems, but it is not a good idea to work on proprietary solutions instead of standard solutions. By the time a proprietary solution is working well, standard solutions have already reached a higher maturity level. I think that building super-computers such as Cray or Fujitsu on proprietary technologies would not be the best solution when compared to working with standard server farms, as is the case with Hadoop. As time goes by, tailor-made technologies will become standard and have a significant impact, but right now our systems are based on “classic” architectures such as x86.
SCASC.- Software Defined Architectures (SDA) are already on the map, but classic “appliances” are still being used. Will we reach convergence one day?
S.B.- There is much added value in the integration of hardware and software in an “appliance” that is delivered to our clients in optimal working condition. It is possible to build a system using components obtained from different sources and install the software afterwards, but it does not make much sense. Several parameters, such as firmware or BIOS settings, may be a hindrance when trying to deploy Teradata software on equipment that is not from Teradata itself. We chose appliances from the very beginning, and it seems to be the right way to go. Perhaps the hardware is cheaper, but the total cost of ownership is not better. There is no added value in it. Dell, an important client of Teradata, started producing Massive Parallel Processing systems involving Teradata Aster, and to date they are the only manufacturer that has managed to do it successfully. Even so, after one generation they reached the conclusion that buying the whole appliance from Teradata was the wisest thing to do. Clients who have been with us for a long time do not even consider what I mentioned as a real option. New clients are the ones exploring that kind of solution, based on cheaper hardware on which the chosen software is installed. But eventually the same conclusion is reached every time.
SCASC.- Judging by the economic figures, it seems that Teradata’s investment in R&D has been lower than last year’s. Is this correct?
S.B.- The amount of resources used to buy technology was lower.
SCASC.- Which are the technological challenges currently faced by Teradata?
S.B.- We made huge investments in the Multi-Temperature Data Management area in order to offer high performance without increasing costs. A distinction should be made between different kinds of data (hot or cold); the cost per terabyte should be in accordance with the value of the data, and the right infrastructure should be used. Data grow exponentially, but we cannot allow an exponential increase in costs. We have been working on fields such as compression, as well as on Hadoop, so that we may adapt to the economics of new, non-traditional kinds of data, such as video or voice. The field of non-traditional data is growing more than any other. Some clients in the fashion business use social networks such as YouTube in order to detect trends as soon as they emerge. These data could be stored in a Teradata database, but that would make no sense. We should distinguish between High Value Density Data and Low Value Density Data. Trends appear on YouTube before they do in shops.
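The economics of matching cost per terabyte to data value can be made concrete with simple arithmetic. The tier names and prices below are entirely made up for illustration; only the structure of the calculation reflects the multi-temperature idea:

```python
# Illustrative (invented) storage prices per terabyte for each temperature
# tier -- real figures depend on vendor, year, and hardware generation.
TIER_COST_PER_TB = {"hot": 500.0, "warm": 150.0, "cold": 30.0}

def storage_cost(tb_by_tier: dict) -> float:
    """Total cost when data volume is split across temperature tiers."""
    return sum(TIER_COST_PER_TB[tier] * tb for tier, tb in tb_by_tier.items())

# Keeping 100 TB entirely on the hot tier vs. tiering it by value density:
all_hot = storage_cost({"hot": 100})
tiered  = storage_cost({"hot": 20, "warm": 30, "cold": 50})
print(all_hot, tiered)  # 50000.0 16000.0
```

Under these made-up prices, moving low-value-density data to cheap tiers cuts the bill by more than two thirds while the small high-value fraction keeps its fast infrastructure, which is the trade-off the answer describes.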
SCASC.- What about the skills required to make the most of Teradata 15, with new functionalities in fields such as IoT?
S.B.- One of Teradata’s purposes is to provide technology that makes it easier for data professionals to work with the IT systems themselves, so that they may focus on their abilities as data scientists (with no need to be computer experts as well). Hadoop requires advanced technical knowledge. Teradata intends to reduce the level of technical knowledge required to work with data, so that it becomes easier to find good professionals in the field. Right now, these professionals are also expected to be good at working with computers, which significantly hinders recruitment (the number of available candidates is limited).
SCASC.- Which are the challenges in such fields as Industry 4.0 or M2M?
S.B.- The challenge will be processing data in a more efficient manner. This is not only about sheer processing power, but also about processing data in a highly sophisticated way. M2M requires making decisions almost in real time, as is the case with autonomous cars, where situations need to be forecast right as they happen. M2M is about using data in a sophisticated manner, predicting and making decisions over high volumes of data processed with Active Data Warehouses. The same applies to industrial manufacturing: an assembly line cannot be stopped carelessly, so any unexpected events should be prevented or solved in real time.