Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration (ISBN 9780470635179)

Description:
A complete guide to Pentaho Kettle, the Pentaho Data Integration toolset for ETL
This practical book is a complete guide to installing, configuring, and managing Pentaho Kettle. If you’re a database administrator or developer, you’ll first get up to speed on Kettle basics and how to apply Kettle to create ETL solutions, before progressing to specialized concepts such as clustering, extensibility, and data vault models. Learn how to design and build every phase of an ETL solution.
- Shows developers and database administrators how to use the open-source Pentaho Kettle for enterprise-level ETL processes (Extracting, Transforming, and Loading data)
- Assumes no prior knowledge of Kettle or ETL, and brings beginners thoroughly up to speed at their own pace
- Explains how to get Kettle solutions up and running, then follows the 34 ETL subsystems model, as created by the Kimball Group, to explore the entire ETL lifecycle, including all aspects of data warehousing with Kettle
- Goes beyond routine tasks to explore how to extend Kettle and scale Kettle solutions using a distributed “cloud”
Get the most out of Pentaho Kettle and your data warehousing with this detailed guide, from simple single-table data migration to complex multisystem clustered data integration tasks.
From the Back Cover
The ultimate resource on building and deploying data integration solutions with Kettle
Kettle is a scalable and extensible open source ETL and data integration tool that lets you extract data from databases, flat and XML files, web services, ERP systems, and OLAP cubes. It provides over 120 built-in transformation steps to validate, cleanse, and conform data, as well as numerous options to load data into data warehouses and many other targets. Kettle is a comprehensive, low-cost alternative to traditional data integration tools like Informatica PowerCenter, IBM InfoSphere DataStage, and BusinessObjects Data Integrator.
This book explains in detail how to use Kettle to create, test, and deploy your own ETL and data integration solutions. You'll learn to use Kettle's programs to create transformations and jobs, use version control, audit data, and schedule your ETL solution. Then you'll progress to more advanced concepts such as clustering and cloud computing, real-time data integration, loading a Data Vault model, and extending Kettle by building your own plugins. In addition, you'll find hands-on examples and case studies that show exactly how to put Kettle's features into practice.
- Explore the components of the Kettle ETL toolset
- Discover how to install and configure Kettle and connect it to various data sources and targets
- Design and build every aspect of an ETL solution using Kettle
- Learn how to load a data warehouse with Kettle
- Understand the steps for deploying and scheduling ETL solutions
- Gain the skills to integrate Kettle with third-party products
- Learn to extend Kettle and build your own plugins
- Use clustering and cloud computing to scale and improve the performance of your Kettle ETL solutions
- Find out how to use Kettle for real-time data integration
Table of contents:
Introduction
Part I: Getting Started
Chapter 1: ETL Primer
OLTP versus Data Warehousing
What Is ETL?
The Evolution of ETL Solutions
ETL Building Blocks
ETL, ELT, and EII
ELT
EII: Virtual Data Integration
Data Integration Challenges
Methodology: Agile BI
ETL Design
Data Acquisition
Beware of Spreadsheets
Design for Failure
Change Data Capture
Data Quality
Data Profiling
Data Validation
ETL Tool Requirements
Connectivity
Platform Independence
Scalability
Design Flexibility
Reuse
Extensibility
Data Transformations
Testing and Debugging
Lineage and Impact Analysis
Logging and Auditing
Summary
Chapter 2: Kettle Concepts
Design Principles
The Building Blocks of Kettle Design
Transformations
Steps
Transformation Hops
Parallelism
Rows of Data
Data Conversion
Jobs
Job Entries
Job Hops
Multiple Paths and Backtracking
Parallel Execution
Job Entry Results
Transformation or Job Metadata
Database Connections
Special Options
The Power of the Relational Database
Connections and Transactions
Database Clustering
Tools and Utilities
Repositories
Virtual File Systems
Parameters and Variables
Defining Variables
Named Parameters
Using Variables
Visual Programming
Getting Started
Creating New Steps
Putting It All Together
Summary
Chapter 3: Installation and Configuration
Kettle Software Overview
Integrated Development Environment: Spoon
Command-Line Launchers: Kitchen and Pan
Job Server: Carte
Encr.bat and encr.sh
Installation
Java Environment
Installing Java Manually
Using Your Linux Package Management System
Installing Kettle
Versions and Releases
Archive Names and Formats
Downloading and Uncompressing
Running Kettle Programs
Creating a Shortcut Icon or Launcher for Spoon
Configuration
Configuration Files and the .kettle Directory
The Kettle Shell Scripts
General Structure of the Startup Scripts
Adding an Entry to the Classpath
Changing the Maximum Heap Size
Managing JDBC Drivers
Summary
Chapter 4: An Example ETL Solution - Sakila
Sakila
The Sakila Sample Database
DVD Rental Business Process
Sakila Database Schema Diagram
Sakila Database Subject Areas
General Design Considerations
Installing the Sakila Sample Database
The Rental Star Schema
Rental Star Schema Diagram
Rental Fact Table
Dimension Tables
Keys and Change Data Capture
Installing the Rental Star Schema
Prerequisites and Some Basic Spoon Skills
Setting Up the ETL Solution
Creating Database Accounts
Working with Spoon
Opening Transformation and Job Files
Opening the Step's Configuration Dialog
Examining Streams
Running Jobs and Transformations
The Sample ETL Solution
Static, Generated Dimensions
Loading the dim-date Dimension Table
Loading the dim-time Dimension Table
Recurring Load
The load-rentals Job
The load-dim-staff Transformation
Database Connections
The load-dim-customer Transformation
The load-dim-store Transformation
The fetch-address Subtransformation
The load-dim-actor Transformation
The load-dim-film Transformation
The load-fact-rental Transformation
Summary
Part II: ETL
Chapter 5: ETL Subsystems
Introduction to the 34 Subsystems
Extraction
Subsystems 1-3: Data Profiling, Change Data Capture, and Extraction
Cleaning and Conforming Data
Subsystem 4: Data Cleaning and Quality Screen Handler System
Subsystem 5: Error Event Handler
Subsystem 6: Audit Dimension Assembler
Subsystem 7: Deduplication System
Subsystem 8: Data Conformer
Data Delivery
Subsystem 9: Slowly Changing Dimension Processor
Subsystem 10: Surrogate Key Creation System
Subsystem 11: Hierarchy Dimension Builder
Subsystem 12: Special Dimension Builder
Subsystem 13: Fact Table Loader
Subsystem 14: Surrogate Key Pipeline
Subsystem 15: Multi-Valued Dimension Bridge Table Builder
Subsystem 16: Late-Arriving Data Handler
Subsystem 17: Dimension Manager System
Subsystem 18: Fact Table Provider System
Subsystem 19: Aggregate Builder
Subsystem 20: Multidimensional (OLAP) Cube Builder
Subsystem 21: Data Integration Manager
Managing the ETL Environment
Summary
Chapter 6: Data Extraction
Kettle Data Extraction Overview
File-Based Extraction
Working with Text Files
Working with XML Files
Special File Types
Database-Based Extraction
Web-Based Extraction
Text-Based Web Extraction
HTTP Client
Using SOAP
Stream-Based and Real-Time Extraction
Working with ERP and CRM Systems
ERP Challenges
Kettle ERP Plugins
Working with SAP Data
ERP and CDC Issues
Data Profiling
Using eobjects.org DataCleaner
Adding Profile Tasks
Adding Database Connections
Doing an Initial Profile
Working with Regular Expressions
Profiling and Exploring Results
Validating and Comparing Data
Using a Dictionary for Column Dependency Checks
Alternative Solutions
Text Profiling with Kettle
CDC: Change Data Capture
Source Data-Based CDC
Trigger-Based CDC
Snapshot-Based CDC
Log-Based CDC
Which CDC Alternative Should You Choose?
Delivering Data
Summary
Chapter 7: Cleansing and Conforming
Data Cleansing
Data-Cleansing Steps
Using Reference Tables
Conforming Data Using Lookup Tables
Conforming Data Using Reference Tables
Data Validation
Applying Validation Rules
Validating Dependency Constraints
Error Handling
Handling Process Errors
Transformation Errors
Handling Data (Validation) Errors
Auditing Data and Process Quality
Deduplicating Data
Handling Exact Duplicates
The Problem of Non-Exact Duplicates
Building Deduplication Transforms
Step 1: Fuzzy Match
Step 2: Select Suspects
Step 3: Lookup Validation Value
Step 4: Filter Duplicates
Scripting
Formula
JavaScript
User-Defined Java Expressions
Regular Expressions
Summary
Chapter 8: Handling Dimension Tables
Managing Keys
Managing Business Keys
Keys in the Source System
Keys in the Data Warehouse
Business Keys
Storing Business Keys
Looking Up Keys with Kettle
Generating Surrogate Keys
The "Add sequence" Step
Working with auto-increment or IDENTITY Columns
Keys for Slowly Changing Dimensions
Loading Dimension Tables
Snowflaked Dimension Tables
Top-Down Level-Wise Loading
Sakila Snowflake Example
Sample Transformation
Database Lookup Configuration
Sample Job
Star Schema Dimension Tables
Denormalization
Denormalizing to 1NF with the "Database lookup" Step
Change Data Capture
Slowly Changing Dimensions
Types of Slowly Changing Dimensions
Type 1 Slowly Changing Dimensions
The Insert / Update Step
Type 2 Slowly Changing Dimensions
The "Dimension lookup / update" Step
Other Types of Slowly Changing Dimensions
Type 3 Slowly Changing Dimensions
Hybrid Slowly Changing Dimensions
More Dimensions
Generated Dimensions
Date and Time Dimensions
Generated Mini-Dimensions
Junk Dimensions
Recursive Hierarchies
Summary
Chapter 9: Loading Fact Tables
Loading in Bulk
STDIN and FIFO
Kettle Bulk Loaders
MySQL Bulk Loading
LucidDB Bulk Loader
Oracle Bulk Loader
PostgreSQL Bulk Loader
Table Output Step
General Bulk Load Considerations
Dimension Lookups
Maintaining Referential Integrity
The Surrogate Key Pipeline
Using In-Memory Lookups
Stream Lookups
Late-Arriving Data
Late-Arriving Facts
Late-Arriving Dimensions
Fact Table Handling
Periodic and Accumulating Snapshots
Introducing State-Oriented Fact Tables
Loading Periodic Snapshots
Loading Accumulating Snapshots
Loading State-Oriented Fact Tables
Loading Aggregate Tables
Summary
Chapter 10: Working with OLAP Data
OLAP Benefits and Challenges
OLAP Storage Types
Positioning OLAP
Kettle OLAP Options
Working with Mondrian
Working with XML/A Servers
Working with Palo
Setting Up the Palo Connection
Palo Architecture
Reading Palo Data
Writing Palo Data
Summary
Part III: Management and Deployment
Chapter 11: ETL Development Lifecycle
Solution Design
Best and Bad Practices
Data Mapping
Naming and Commentary Conventions
Common Pitfalls
ETL Flow Design
Reusability and Maintainability
Agile Development
Testing and Debugging
Test Activities
ETL Testing
Test Data Requirements
Testing for Completeness
Testing Data Transformations
Test Automation and Continuous Integration
Upgrade Tests
Debugging
Documenting the Solution
Why Isn't There Any Documentation?
Myth 1: My Software Is Self-Explanatory
Myth 2: Documentation Is Always Outdated
Myth 3: Who Reads Documentation Anyway?
Kettle Documentation Features
Generating Documentation
Summary
Chapter 12: Scheduling and Monitoring
Scheduling
Operating System-Level Scheduling
Executing Kettle Jobs and Transformations from the Command Line
UNIX-Based Systems: cron
Windows: The at utility and the Task Scheduler
Using Pentaho's Built-in Scheduler
Creating an Action Sequence to Run Kettle Jobs and Transformations
Kettle Transformations in Action Sequences
Creating and Maintaining Schedules with the Administration Console
Attaching an Action Sequence to a Schedule
Monitoring
Logging
Inspecting the Log
Logging Levels
Writing Custom Messages to the Log
E-mail Notifications...
About the authors:
Matt Casters is the founder of Kettle and works as Chief of Data Integration at Pentaho, where he leads Kettle software development.
Roland Bouman is an application developer focusing on open source web technology, databases, and business intelligence.
Jos van Dongen is an independent business intelligence consultant and well-known author, analyst, and presenter.
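Book ratings
Plot: 7 points
Characterization: 6 points
Thematic depth: 7 points
Writing style: 3 points
Use of language: 7 points
Fluency of writing: 3 points
Conveying ideas: 8 points
Depth of knowledge: 8 points
Breadth of knowledge: 8 points
Practicality: 6 points
Chapter division: 3 points
Structure and layout: 7 points
Novelty and originality: 6 points
Emotional resonance: 7 points
Engagement: 7 points
Real-world relevance: 7 points
Immersion: 3 points
Factual accuracy: 6 points
Cultural contribution: 6 points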