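This edition is a major revision of the first edition, published in 1999. The core concepts are unchanged, but the book has been updated to reflect the developments of the past five years, and the number of references has nearly doubled. Highlights of the new edition include 30 new technical sections; an enhanced Weka machine learning workbench with an interactive interface; comprehensive coverage of neural networks; a new section on Bayesian networks; and more.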
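This book provides a thorough grounding in machine learning concepts, along with practical advice on applying the tools and techniques in real-world work. Inside you will find: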
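●The core algorithms behind successful data mining, including tried-and-true techniques as well as leading-edge methods.
●Methods for transforming the input or output to improve performance.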
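●Downloadable Weka software: a collection of machine learning algorithms for data mining tasks, including tools for data preprocessing, classification, regression, clustering, association rules, and visualization in a new interactive interface.
"This book presents this new discipline in a very accessible form: it is both a textbook for training the next generation of practitioners and researchers and a source of insight for professionals like me who need to keep recharging. Witten and Frank have a passion for simple and elegant solutions. They never lose sight of grounding every concept in concrete examples, and they urge the reader to consider simple techniques first, moving on to more advanced and sophisticated ones only if the simple techniques prove inadequate.
If you want to analyze and understand data, this book and the associated Weka toolkit will be very useful."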
From the foreword by Jim Gray, Microsoft Research, Turing Award winner
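Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand and a fellow of the ACM and of the Royal Society of New Zealand. He received the 2004 IFIP Namur Award, a biennial honor recognizing outstanding contributions with international impact to the social application of information and communication technology. His books include Managing Gi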
Foreword
Preface
Part I: Machine learning tools and techniques
1. What's it all about?
1.1 Data mining and machine learning
1.2 Simple examples: the weather problem and others
1.3 Fielded applications
1.4 Machine learning and statistics
1.5 Generalization as search
1.6 Data mining and ethics
1.7 Further reading
2. Input: Concepts, instances, attributes
2.1 What's a concept?
2.2 What's in an example?
2.3 What's in an attribute?
2.4 Preparing the input
2.5 Further reading
3. Output: Knowledge representation
3.1 Decision tables
3.2 Decision trees
3.3 Classification rules
3.4 Association rules
3.5 Rules with exceptions
3.6 Rules involving relations
3.7 Trees for numeric prediction
3.8 Instance-based representation
3.9 Clusters
3.10 Further reading
4. Algorithms: The basic methods
4.1 Inferring rudimentary rules
4.2 Statistical modeling
4.3 Divide-and-conquer: constructing decision trees
4.4 Covering algorithms: constructing rules
4.5 Mining association rules
4.6 Linear models
4.7 Instance-based learning
4.8 Clustering
4.9 Further reading
5. Credibility: Evaluating what's been learned
5.1 Training and testing
5.2 Predicting performance
5.3 Cross-validation
5.4 Other estimates
5.5 Comparing data mining schemes
5.6 Predicting probabilities
5.7 Counting the cost
5.8 Evaluating numeric prediction
5.9 The minimum description length principle
5.10 Applying MDL to clustering
5.11 Further reading
6. Implementations: Real machine learning schemes
6.1 Decision trees
6.2 Classification rules
6.3 Extending linear models
6.4 Instance-based learning
6.5 Numeric prediction
6.6 Clustering
6.7 Bayesian networks
7. Transformations: Engineering the input and output
7.1 Attribute selection
7.2 Discretizing numeric attributes
7.3 Some useful transformations
7.4 Automatic data cleansing
7.5 Combining multiple models
7.6 Using unlabeled data
7.7 Further reading
8. Moving on: Extensions and applications
8.1 Learning from massive datasets
8.2 Incorporating domain knowledge
8.3 Text and Web mining
8.4 Adversarial situations
8.5 Ubiquitous data mining
8.6 Further reading
Part II: The Weka machine learning workbench
9. Introduction to Weka
9.1 What's in Weka?
9.2 How do you use it?
9.3 What else can you do?
9.4 How do you get it?
10. The Explorer
10.1 Getting started
10.2 Exploring the Explorer
10.3 Filtering algorithms
10.4 Learning algorithms
10.5 Meta-learning algorithms
10.6 Clustering algorithms
10.7 Association-rule learners
10.8 Attribute selection
11. The Knowledge Flow interface
11.1 Getting started
11.2 Knowledge Flow components
11.3 Configuring and connecting the components
11.4 Incremental learning
12. The Experimenter
12.1 Getting started
12.2 Simple setup
12.3 Advanced setup
12.4 The Analyze panel
12.5 Distributing processing over several machines
13. The command-line interface
13.1 Getting started
13.2 The structure of Weka
13.3 Command-line options
14. Embedded machine learning
……
15. Writing new learning schemes
References
Index