Big Data is a collection of data that is huge in size and yet growing exponentially with time. Big data can exist in structured, semi-structured and unstructured form. Structured data, when presented in a given context, is called information. There are five characteristics of Big Data-
- Volume- The size of the available data plays a significant role in determining its value. Huge volumes of data cannot be processed and analyzed using traditional means of data processing. So a data set can be considered big data only if it is huge in volume.
- Variety- Variety refers to the heterogeneous sources and nature of the available data. These data sets can be a combination of text, images, videos and other formats.
- Velocity- Big data flows in continuously and at massive speed. In today's world big data is generated and collected at a very fast pace and needs to be stored and analyzed at that same speed.
- Variability- This refers to the inconsistency that can be shown by the data, which hampers effective handling and processing.
- Veracity- Big data sets contain data from various sources, and hence the quality and reliability of the available data vary. This is called veracity.
Big Data Testing-
Big data is a collection of huge amounts of data, and hence it cannot be processed using traditional computation methods. In big data testing, performance testing and functional testing are the most important parts.
Here we are going to discuss the performance testing of big data applications.
Performance Testing Approach
Performance testing for a big data application involves testing huge volumes of structured and unstructured data, and it requires a specific testing approach to handle such massive data.
Performance testing of big data includes three actions:
Data ingestion and throughput:
In this stage, the tester checks how quickly the framework can capture data from different data sources. Testing includes identifying how many distinct messages the queue can process in a given time frame. It also covers how rapidly data can be inserted into the underlying data store, for example the insertion rate into a MongoDB or Cassandra database.
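An ingestion throughput check can be sketched as below. This is a minimal illustration, not a production harness: the stdlib `sqlite3` in-memory database stands in for a real store such as MongoDB or Cassandra, and the record set is synthetic.

```python
import sqlite3
import time

def measure_insertion_rate(records):
    """Time a bulk insert and return records inserted per second."""
    # sqlite3 is a stand-in here; a real test would target the
    # actual data store (e.g. MongoDB or Cassandra).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
    start = time.perf_counter()
    conn.executemany("INSERT INTO events VALUES (?, ?)", records)
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return len(records) / elapsed

# Synthetic workload: 10,000 small records
records = [(i, f"payload-{i}") for i in range(10_000)]
rate = measure_insertion_rate(records)
print(f"Inserted {len(records)} records at {rate:,.0f} records/sec")
```

The same pattern, timing a bulk write and dividing by record count, carries over to any store's bulk-insert API.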
Data processing:
It includes checking the speed with which queries or MapReduce jobs are executed. It also involves testing the data processing in isolation once the underlying data store is populated with the data sets, for instance by running MapReduce jobs on the underlying HDFS (Hadoop Distributed File System).
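The map/reduce style of processing being timed here can be illustrated with a tiny in-process word count. This is only a sketch of the programming model; real jobs would run on a Hadoop cluster over HDFS.

```python
from collections import Counter
from functools import reduce

def map_phase(line):
    # Map step: emit word counts for one input line
    return Counter(line.lower().split())

def reduce_phase(acc, partial):
    # Reduce step: merge partial counts from mappers
    acc.update(partial)
    return acc

# Toy input standing in for files stored on HDFS
lines = ["big data testing", "performance testing of big data"]
counts = reduce(reduce_phase, (map_phase(l) for l in lines), Counter())
print(counts["big"], counts["testing"])  # → 2 2
```

In a real performance test, the tester would wrap the job submission in a timer and compare execution time against a baseline as data volume grows.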
Sub-component performance:
These systems are made up of multiple components, and it is important to test each of these components in isolation: for example, how rapidly messages are indexed and consumed, MapReduce job execution, query performance, search, and so on.
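Testing components in isolation can be done with a simple timing harness like the one below. The component functions here (`index_messages`, `run_query`) are hypothetical stand-ins for real subsystems such as an indexer or query engine.

```python
import time

def time_component(name, fn, *args):
    """Run one component in isolation and report its elapsed time."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed * 1000:.2f} ms")
    return result, elapsed

def index_messages(msgs):
    # Stand-in for message indexing
    return {m: i for i, m in enumerate(msgs)}

def run_query(index, key):
    # Stand-in for query execution against the index
    return index.get(key)

msgs = [f"msg-{i}" for i in range(1_000)]
index, t_index = time_component("indexing", index_messages, msgs)
hit, t_query = time_component("query", run_query, index, "msg-42")
```

Timing each stage separately makes it possible to pinpoint which component is the bottleneck before testing the pipeline end to end.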
So, from the above discussion we can conclude that performance testing of big data cannot be achieved with traditional performance testing methods.