Reduce dimensionality using PCA in Amazon SageMaker Data Wrangler

Today, we are excited to announce support for dimensionality reduction using principal component analysis (PCA) in Amazon SageMaker Data Wrangler. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With Data Wrangler, you can simplify data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization, from a single visual interface. PCA is a popular statistical technique for analyzing datasets with a large number of dimensions per observation. It reduces the dimensionality of a dataset by projecting it onto a smaller set of uncorrelated components that retain most of the variance, which makes the data easier to use with popular ML algorithms like XGBoost and random forest. Previously, to perform PCA on a dataset, data scientists had to find appropriate libraries and write their own code to reduce high-dimensional data.
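For context, the kind of hand-written code this feature replaces might look like the sketch below. It is a minimal illustration of PCA using NumPy's singular value decomposition, not Data Wrangler's implementation. The function name `pca_reduce` and the synthetic dataset are assumptions made up for this example.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction, and the share kept by the chosen components.
    explained_variance = S**2 / (X.shape[0] - 1)
    explained_ratio = explained_variance[:n_components] / explained_variance.sum()
    # Coordinates of every row in the reduced space.
    X_reduced = X_centered @ components.T
    return X_reduced, components, explained_ratio


# Synthetic data (assumed for the example): 10 correlated features
# generated from 2 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

X_reduced, components, ratio = pca_reduce(X, n_components=2)
print(X_reduced.shape)  # (200, 2)
print(ratio.sum())      # share of total variance retained by 2 components
```

Because the ten synthetic features are driven by only two latent factors, two components retain almost all of the variance. On real data, the retained share usually guides how many components to keep.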
