Applied Data Mining: Statistical Methods for Business and Industry

In the University of Cantabria Library I’ve found a book on data mining that I think is well suited for an overview of real-world applications, especially business applications (rather than the somewhat theoretical view of the main course book for the Data Mining course of the Master’s Degree in AI).

Applied Data Mining: Statistical Methods for Business and Industry (Statistics in Practice) by Paolo Giudici, Faculty of Economics, University of Pavia, Italy. Wiley.


Data mining can be defined as the process of selection, exploration and modelling of large databases, in order to discover models and patterns. The increasing availability of data in the current information society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract such knowledge from data. Applications occur in many different fields, including statistics, computer science, machine learning, economics, marketing and finance.

This book is the first to describe applied data mining methods in a consistent statistical framework, and then show how they can be applied in practice. All the methods described are either computational, or of a statistical modelling nature. Complex probabilistic models and mathematical tools are not used, so the book is accessible to a wide audience of students and industry professionals. The second half of the book consists of nine case studies, taken from the author’s own work in industry, that demonstrate how the methods described can be applied to real problems.

  • Provides a solid introduction to applied data mining methods in a consistent statistical framework
  • Includes coverage of classical, multivariate and Bayesian statistical methodology
  • Includes many recent developments such as web mining, sequential Bayesian analysis and memory based reasoning
  • Each statistical method described is illustrated with real life applications
  • Features a number of detailed case studies based on applied projects within industry
  • Incorporates discussion on software used in data mining, with particular emphasis on SAS
  • Supported by a website featuring data sets, software and additional material
  • Includes an extensive bibliography and pointers to further reading within the text
  • Author has many years’ experience teaching introductory and multivariate statistics and data mining, and working on applied projects within industry

A valuable resource for advanced undergraduate and graduate students of applied statistics, data mining, computer science and economics, as well as for professionals working in industry on projects involving large volumes of data – such as in marketing or financial risk management.

Data sets used in the case studies are available


One comment on “Applied Data Mining: Statistical Methods for Business and Industry”

  1. By Jiang, Wei. Publication: IIE Transactions. Date: Friday, December 1 2006.

     Applied Data Mining: Statistical Methods for Business and Industry. Paolo Giudici. John Wiley & Sons, 2003, 364 pages, ISBN 0-470-84678-X.

     Applied Data Mining by Paolo Giudici is a good attempt to “establish a bridge between data mining methods and applications in the fields of business and industry by adopting a coherent and rigorous approach to statistical modeling.” The text contains not only useful data mining methods from machine learning and statistics, but also describes them in relation to the business goals. To achieve this compromise, the book is naturally divided into two complementary parts: (i) methodologies; and (ii) applications.

     Following the first chapter, which briefly elaborates relationships between data mining, computer science, machine learning, and statistics, Part I, comprising Chapters 2 to 6, presents the main methodologies used in data mining. Chapter 2 illustrates data aspects of data mining. Data structure, data summarization, and data transformation are briefly discussed. Chapter 3 builds quality of data mining upon exploratory tools for statistical analysis. Many elementary statistical measurements are introduced, which provide a useful reference for readers without a solid statistical background. Dimension reduction techniques such as principal component analysis are also discussed.

     Chapters 4 and 5 examine the main data mining methodologies. Computational data mining methods such as cluster analysis, linear regression models, trees, neural networks, nearest-neighbor methods and association rules are first introduced from the computational point of view, while statistical data mining methods such as generalized linear models, graphical models, and non-parametric methods are then discussed under the probabilistic framework. However, the distinction between the two groups of methods is not rigid, since many methods in the first group can be formulated by probability models as well. The argument to differentiate the two groups, as pointed out by the author, is based on computational efforts and mathematical foundations. In fact, it is more apparent to distinguish the methods in terms of problem formulations, i.e., either distance/utility-based or probability/likelihood-based. However, it seems that this natural link is easily missed when reading the chapters.

     Chapter 6 is an important component of the data mining process since the introduction of so many data analysis models means that the evaluation of data mining methods is crucial in practice. The author discusses discrepancy criteria based on statistical tests, loss-based criteria based on various score functions, and computational criteria based on cross-validation, etc. The discussion of each criterion is quite brief but transparent. Since data mining evaluation is not only a problem of statistical/mathematical justifications of various methods, but also a matter of data itself, it would be better to provide a clear link between different criteria to help practitioners select appropriate criteria in practice.

     Part II, from Chapters 7 to 12, covers six business cases including market basket analysis, web clickstream analysis, profiling website visitors, customer relationship management, credit scoring, and business forecasting. The book provides a detailed presentation of these case studies, including business background, exploratory data analysis, model building, and project summaries. Many methods presented in Part I are illustrated in detail by the projects. However, it would be fruitful if the readers were to be able to access the data in these cases.

     The text’s references and index are complementary, which justifies the text as a compact introduction and useful reference for data mining practitioners. Links to advanced data mining texts and software are also provided. Generally, the book is a useful reference for researchers and practitioners who are interested in applying data mining but do not possess a background in statistics and machine learning.

     Reviewed by Wei Jiang, Department of Systems Engineering and Engineering Management, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ 07030, USA

