Mining information in unstructured text can be a real challenge. Patent documents, for example, provide a rich source of technological and scientific knowledge that can reveal technological trends as well as information on the legal landscape of the market. This makes analysis of the vast and ever-growing number of patents an important part of corporate business strategies.
Principal component analysis, or PCA, is a statistical procedure that allows you to summarize the information content in large data tables by means of a smaller set of “summary indices” that can be more easily visualized and analyzed. The underlying data can be, for example, measurements describing properties of production samples, chemical compounds or reactions, process time points of a continuous process, batches from a batch process, biological individuals, or trials of a design of experiments (DOE) protocol.
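To make the idea of "summary indices" concrete, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The function name `pca_scores`, the variable names, and the small data table are hypothetical and chosen purely for illustration; a production analysis would more likely rely on a dedicated statistics package.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Summarize the rows of X with n_components principal component scores."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # mean-center each column (variable)
    # SVD of the centered table: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # the "summary indices"
    explained = (S ** 2) / np.sum(S ** 2)            # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]

# Hypothetical data table: 6 samples (rows) described by 3 measured variables (columns)
X = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 1.0],
])

scores, loadings, explained = pca_scores(X, n_components=2)
print(scores.shape)  # one row of scores per sample, one column per component
print(explained)     # share of total variance captured by each component
```

Each sample is now described by two scores instead of three original variables, and the explained-variance fractions indicate how much of the table's variation those two indices retain.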
What do we mean by pre-processing of data, and why is it needed? Let's take a look at some data pre-processing methods and how they help create better models when using Principal Component Analysis (PCA) and other methods of data analytics.
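As one illustration of why pre-processing matters, PCA is driven by variance, so a variable recorded in large numerical units can dominate the model simply because of its scale. The sketch below shows one common remedy, mean-centering followed by scaling to unit variance. This particular method is an assumption chosen for illustration, not necessarily the one the text goes on to discuss, and the function name `autoscale` and the example values are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit (sample) variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation per column
    std[std == 0] = 1.0          # constant columns are only centered, not scaled
    return (X - mean) / std

# Hypothetical table: two variables on very different numerical scales
X = np.array([
    [1200.0, 0.012],
    [1500.0, 0.015],
    [ 900.0, 0.011],
    [1100.0, 0.019],
])

Z = autoscale(X)
print(Z.mean(axis=0))         # approximately zero for every column
print(Z.std(axis=0, ddof=1))  # one for every column
```

After scaling, both variables contribute on an equal footing, so a subsequent PCA reflects the correlation structure rather than the choice of measurement units.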