Autonomous Vehicles are a Huge Market for Data Analytics – Cloudera

Cloudera’s Dave Shuman talks about the value of big data from autonomous vehicles, and using analytics to counter new and emerging cyber threats.

Autonomous Vehicles Are a Big Market for Data

Data can be collected and processed at several points in an autonomous vehicle system. One is the edge, which in this case is the vehicle itself. While the vehicle is driving you around, it cannot send data to the cloud and wait for a decision to be sent back. If a child chasing a ball runs across the street, the vehicle has to make a decision on the spot, so it needs local compute to make that decision. You can see this today with Google, Uber and others driving autonomous vehicles around and collecting data. Right now, the focus is on building and training models, which is a big data application in which we play a big role.
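To put the latency argument in numbers, here is a small Python sketch of how far a car travels while it waits for a decision. The 200 ms cloud round trip and 20 ms on-board inference time are illustrative assumptions, not measurements from any vehicle or platform.

```python
def distance_during_delay(speed_kmh: float, delay_ms: float) -> float:
    """Metres travelled while waiting `delay_ms` milliseconds for a decision."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0  # km/h -> m/s
    return speed_m_per_s * delay_ms / 1000.0     # ms -> s, then distance = speed * time


# Hypothetical latencies, for illustration only.
CLOUD_ROUND_TRIP_MS = 200.0  # assumed network + cloud processing round trip
ON_BOARD_MS = 20.0           # assumed local inference time

for label, delay in [("cloud", CLOUD_ROUND_TRIP_MS), ("on-board", ON_BOARD_MS)]:
    print(f"{label}: {distance_during_delay(50.0, delay):.2f} m travelled at 50 km/h")
```

At 50 km/h this prints 2.78 m for the assumed cloud round trip versus 0.28 m for on-board processing, before the vehicle has even started to brake.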

Vehicles will have their own compute and storage capabilities so they can make autonomous decisions; these may run on Cloudera or on other data storage and compute platforms.

The Combination of Data Analytics and IoT Helps in Building Strong Cyber Defense Capabilities

Today, cyber protection is mainly based on signatures, which work well for threats we already know about but not for new and emerging ones. Running predictive analytics models on top of the data lets you detect abnormal behavior, and the model can be trained to recognize it. I can pass a detection to an operator, who analyzes it and confirms whether it is a threat. That human feedback validates the detection and helps the model detect other instances more accurately.
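A minimal Python sketch of this feedback loop, under loud assumptions: events are hypothetical (source IP, destination port) pairs, "abnormal" simply means statistically rare relative to a learned baseline, and the 4-bit threshold is arbitrary. It is a toy illustration of the idea, not Apache Spot's model or any Cloudera API. The operator's verdict is recorded so that a confirmed threat stays flagged even after it becomes common, and a confirmed false positive stops being raised.

```python
import math
from collections import Counter


class FeedbackDetector:
    """Toy rarity-based detector with an analyst feedback loop (illustrative only)."""

    def __init__(self, threshold_bits: float = 4.0):
        self.counts = Counter()          # how often each (source, port) pair has been seen
        self.total = 0
        self.threshold = threshold_bits  # hypothetical cut-off, in bits of "surprise"
        self.confirmed = set()           # pairs an analyst confirmed as threats
        self.dismissed = set()           # pairs an analyst confirmed as benign

    def observe(self, event) -> None:
        """Add an event to the behavioural baseline."""
        self.counts[event] += 1
        self.total += 1

    def surprise(self, event) -> float:
        # -log2 of the add-one-smoothed empirical probability: rarer pairs score higher
        p = (self.counts[event] + 1) / (self.total + len(self.counts) + 1)
        return -math.log2(p)

    def is_suspicious(self, event) -> bool:
        if event in self.confirmed:      # human-validated threats stay flagged
            return True
        if event in self.dismissed:      # human-validated false positives are suppressed
            return False
        return self.surprise(event) >= self.threshold

    def analyst_feedback(self, event, is_threat: bool) -> None:
        """Record an operator's verdict on a flagged event."""
        if is_threat:
            self.confirmed.add(event)
            self.dismissed.discard(event)
        else:
            self.dismissed.add(event)
            self.confirmed.discard(event)


detector = FeedbackDetector()
for _ in range(50):
    detector.observe(("10.0.0.5", 443))      # routine HTTPS traffic builds the baseline

beacon = ("10.0.0.9", 4444)
print(detector.is_suspicious(beacon))        # True: a rare pair, sent to an operator
detector.analyst_feedback(beacon, is_threat=True)
```

Without the operator's confirmation, the beacon would stop being flagged once it had repeated often enough to look "normal"; the feedback is what keeps it visible.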

Cloudera has partnered with Intel and others in the open source community to create an open source project, Apache Spot, to handle such scenarios. It is a new model for handling cyber threats that leverages open source to tackle more than existing solutions can economically handle today.

We also have to think about the numerous sensors around us. We have a partnership with Intel and another company to use facial recognition to identify bad actors in near real time.

Early Warning System Benefits

Developing an early warning system presumably involves data signals that you don’t have today. Suppose you have decided to deploy sensor networks in different areas or to use image processing in a new way: you are essentially collecting and curating data to build a new level of functionality. This ability to collect and curate data without having to model it before ingestion is one of the things the Apache ecosystem brings to the table. You can collect and curate all of the data accessible to you, much as you would copy files to the C: drive on your computer.
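Not modeling data before ingestion is often called schema-on-read. A minimal Python sketch, assuming newline-delimited JSON records with hypothetical field names: raw records are stored exactly as they arrive, and each analysis applies its own schema only when it reads them.

```python
import json

# Raw records land exactly as the sensors emitted them: no schema is imposed at ingestion.
# Sensor names and field names are hypothetical.
raw_store = [
    '{"sensor": "mic-01", "ts": 1700000000, "rms_db": 42.5}',
    '{"sensor": "cam-07", "ts": 1700000003, "objects": ["person"]}',
    '{"sensor": "mic-01", "ts": 1700000005, "rms_db": 71.0, "battery": 0.8}',
]


def read_with_schema(lines, fields):
    """Apply a schema at query time: keep only the requested fields, None when absent."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Two different views over the same raw data, each defined only when it is needed.
audio_view = [r for r in read_with_schema(raw_store, ["sensor", "ts", "rms_db"])
              if r["rms_db"] is not None]
image_view = [r for r in read_with_schema(raw_store, ["sensor", "ts", "objects"])
              if r["objects"]]
```

Because nothing was discarded at ingestion, a new hypothesis (for example, one that needs the `battery` field) only requires a new read-time view, not a reload of the data.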

Once that is in place, you can explore your data to build both a data model and a processing model that match your hypothesis. With a new sensor network, you can use the audio data it produces to detect an intrusion, use image processing to identify troop movements, or use a cyber model to discover activity that is unnatural for that environment. It involves looking for patterns, and for anomalies within those patterns, to identify the kinds of events you want to detect; you then build an application that uses those detections. All of this happens within the same ecosystem.
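Looking for anomalies within a pattern can be as simple as comparing each new reading against a rolling baseline. The sketch below assumes a stream of loudness readings in dB from a hypothetical acoustic sensor, a 20-sample window and a 3-standard-deviation threshold; real intrusion detection would use far richer features, so treat it as an illustration of the technique only.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(samples, window=20, z_threshold=3.0):
    """Yield (index, value) for samples far outside the recent rolling baseline."""
    history = deque(maxlen=window)   # keeps only the most recent `window` readings
    for i, x in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip when the baseline has no variance, to avoid dividing by zero.
            if sigma > 0 and abs(x - mu) / sigma > z_threshold:
                yield i, x
        history.append(x)            # every reading, anomalous or not, joins the baseline


# Hypothetical readings: steady background noise with one loud event at index 30.
readings = [40.0 + (i % 3) for i in range(30)] + [90.0] + [40.0 + (i % 3) for i in range(5)]
print(list(detect_anomalies(readings)))
```

This prints `[(30, 90.0)]`. Because the spike is added to the baseline, it briefly widens the window's spread, which suppresses repeat alarms right after an event at the cost of some sensitivity.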


We are part of, and contribute to, the vibrant Apache open source community. Anyone who wants to start working on an analytics workload, using data and building analytics, can go to our website, download our community edition, and start using it for free. Millions of people have done this.

For a commercial entity, the process is a bit different. For instance, Navistar, a manufacturer of large industrial vehicles such as school buses and military vehicles, analyzed its business and wanted to help its customers benefit from the IoT data coming off those trucks by creating a more efficient maintenance model. The company turned to the cloud and started ingesting the data coming in from edge devices. The cost was low and the impact on operations was minimal. By experimenting to find out what they could do with the data, they identified a use case: determining when and where to apply maintenance to vehicles in a timely manner, which made maintenance more economical for their customers.

Machine learning helped them apply predictive maintenance by taking into account the vehicles themselves, available parts inventory, drivers’ available driving hours, and the service contacts available to do the work. All of this was accomplished with a very small footprint, yet delivered significant value.
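The trade-off described above can be sketched as a small scheduling problem. Everything here is hypothetical: the `Vehicle` fields, the failure-risk score (assumed to come from some predictive model), the 0.5 risk threshold, and the rule that a driver needs at least an hour of driving time left to bring the truck in. It is a greedy heuristic for illustration, not Navistar’s actual method.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    vin: str
    failure_risk: float       # assumed output of a predictive model, 0..1
    part_needed: str
    driver_hours_left: float  # hours the assigned driver can still drive this shift


def plan_service(vehicles, inventory, service_slots, risk_threshold=0.5, min_drive_hours=1.0):
    """Greedy sketch: service the riskiest vehicles first, subject to parts, slots and driver hours."""
    stock = dict(inventory)   # copy so the caller's inventory is not mutated
    plan = []
    for v in sorted(vehicles, key=lambda veh: veh.failure_risk, reverse=True):
        if len(plan) >= service_slots:
            break             # no service bays left
        if v.failure_risk < risk_threshold:
            break             # list is sorted, so every remaining vehicle is lower risk
        if stock.get(v.part_needed, 0) <= 0:
            continue          # required part is not in inventory
        if v.driver_hours_left < min_drive_hours:
            continue          # driver cannot legally bring the truck in this shift
        stock[v.part_needed] -= 1
        plan.append(v.vin)
    return plan


fleet = [
    Vehicle("A", 0.9, "brake-pad", 5.0),
    Vehicle("B", 0.8, "injector", 4.0),
    Vehicle("C", 0.7, "brake-pad", 0.5),
    Vehicle("D", 0.6, "brake-pad", 6.0),
    Vehicle("E", 0.2, "filter", 8.0),
]
print(plan_service(fleet, {"brake-pad": 2, "filter": 10}, service_slots=3))  # ['A', 'D']
```

Vehicle B is skipped for lack of an injector, C because its driver is out of hours, and E because its risk is low. A greedy pass is easy to explain to fleet operators; once the constraints multiply, an optimization solver would be the more natural choice.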

State of Analytics Today

“The value of the information from IoT for the enterprise is huge,” said Rocky DeStefano, Cloudera’s subject matter expert on cybersecurity. “It can transform the way products work, services are delivered, and also help us secure things better. That growing demand is not going anywhere, and will double every few years in terms of quantity of data. What will get better is the application of that data. As a community, what we are doing with Apache Spot is changing the way cybersecurity analytics work. From each organization working independently and only doing the best they can do, we have migrated to making it a community approach, so we can benefit as a community and improve together. This raises the bar, and allows us to share machine learning across companies.”

Cloudera’s Mission

“We make what is impossible today possible tomorrow, with data,” continued Shuman. “We have various objectives we want to achieve as a society: serve our customers better, create better models of behavior within our supply chain in different environments, address human trafficking, and make food production sustainable so that we can feed everybody. Other problems we tackle include protecting people, through either cyber or physical security, and protecting the environment. All of this can be enhanced through data: insights we cannot see today can become visible through data and the application of machine learning and analytics.”


Profile of Speakers:

Main Speaker: Dave Shuman, industry lead for IoT & manufacturing at Cloudera

Dave Shuman is industry lead for IoT & manufacturing at Cloudera. He is a recognized retail and consumer products industry expert in leveraging explicit and implicit data to derive actionable insights.

Second Speaker: Rocky DeStefano, Cloudera’s subject matter expert on cybersecurity


Rocky DeStefano is Cloudera’s subject matter expert on cybersecurity. He has supported cybersecurity advancement in the public and private sectors for almost a quarter of a century and was a finalist for the Cybersecurity Professional of the Year award this year.


About Cloudera: 

Cloudera delivers the modern platform for machine learning and advanced analytics built on the latest open source technologies. The world’s leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest and most secure data platform available for the modern world. Cloudera’s customers efficiently capture, store, process and analyze vast amounts of data, empowering them to use advanced analytics and machine learning to drive business decisions quickly, flexibly and at lower cost than has been possible before. To ensure their customers are successful, Cloudera offers comprehensive support, training and professional services. Learn more at