Google Cloud Data Fusion is a managed service that allows users to visually build and run data pipelines without coding. It converts the graphical pipeline design into a directed acyclic graph (DAG) to run batch or streaming jobs on an ephemeral Dataproc cluster. The presenter demoed building a sample pipeline using the drag and drop interface and explored the tool's monitoring and logging capabilities. While the tool aims to save labor hours, its beta limitations include slow instance creation, lack of input validation, and Java stacktraces in errors. Pricing is based on hourly development costs and Dataproc execution costs.
Related topics: