Speaker: Isola Ajiferuke, University of Western Ontario
Sponsor: ASLS

Data mining is the process of discovering and interpreting previously unknown patterns among data.

Techniques of data mining: classification, estimation, prediction, affinity grouping, clustering, and description (Berry & Linoff, 2004)

Classification examines a newly presented object and assigns it to one of a set of predefined classes (present).

Classification also deals with discrete outcomes.

Estimation is the same as classification but it deals with possible or continuous outcomes (present).

Prediction – same as classification or estimation but deals with the future.

Affinity grouping – determine which things go together (retail stores do this, e.g. graham crackers, chocolate bars, and marshmallows on the same display).

Clustering – no predefined grouping but similar to classification – let the data show what groups are needed.

Description – just describing what you see.
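As a rough illustration of affinity grouping, the sketch below counts how often pairs of items turn up in the same transaction. The baskets and the function name pair_counts are invented for this example and were not part of the talk; real tools use more refined measures than raw pair counts.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions, invented for illustration
baskets = [
    ["graham crackers", "chocolate bars", "marshmallows"],
    ["graham crackers", "marshmallows"],
    ["chocolate bars", "milk"],
]

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Pairs with the highest counts are candidates for shelving or displaying together.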

Major applications of Data Mining:

Health care
Retail/Marketing (customer buying patterns, “reward” cards can track this)
Financial sector
Online sellers (Amazon.com's “Customers who bought this also bought” feature is data mining)

Related Concepts:

text mining
web mining

Bibliomining refers to the use of data mining techniques to examine library data records (Nicholson, 2003).

Bibliomining can be used to understand patterns of behavior among library users and staff members as well as patterns of information resource use.

Uses of Bibliomining:

Improve library services (similar to Amazon.com's recommendations). That kind of function could be linked to your OPAC.

Predict how many copies of a book you should buy.

Aid decision making within the library (staffing decisions and determining the circulation dates for certain patrons, e.g. undergrad vs. grad vs. faculty)

Assist in policy or budget justification

Steps in Bibliomining:

Identify the problem or determine the area of focus.

Identify the source of required data: Bibliographic information (OPACs), Acquisitions information, Patron information, Circulation information, Searching and navigation information, Reference Desk Interactions (both face to face and virtual), In-house use information, ILL.

Prepare data for analysis/create data warehouse. (Make a separate database with information from the other databases.)

Analyze data using appropriate software packages: Excel (Pivot tables and chart reports), SAS Enterprise Miner ($$$), SPSS Clementine ($$$), Insightful Miner ($$$), WEKA (Open Source), RapidMiner (Open Source). [The latter two are written in Java and are not quite as user friendly as the ones that you would pay for.]

Interpret Results
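The preparation and analysis steps above can be sketched in a few lines. This is only a minimal illustration under assumptions: the patron and circulation records, the field names, and the functions build_warehouse and pivot are all hypothetical, and a real project would pull these extracts from the library system and likely use one of the packages listed above.

```python
from collections import Counter

# Hypothetical extracts from two separate library systems
patrons = {"p1": "undergrad", "p2": "grad", "p3": "faculty"}
circulation = [
    {"patron_id": "p1", "call_class": "QA"},
    {"patron_id": "p1", "call_class": "PS"},
    {"patron_id": "p2", "call_class": "QA"},
    {"patron_id": "p3", "call_class": "QA"},
    {"patron_id": "p2", "call_class": "QA"},
]


def build_warehouse(patrons, circulation):
    """Join circulation rows to patron type, keeping only non-identifying fields."""
    return [
        {"patron_type": patrons[row["patron_id"]], "call_class": row["call_class"]}
        for row in circulation
        if row["patron_id"] in patrons  # skip loans with no matching patron record
    ]


def pivot(rows):
    """Count loans per (patron type, call number class), like a simple pivot table."""
    return Counter((r["patron_type"], r["call_class"]) for r in rows)


warehouse = build_warehouse(patrons, circulation)
table = pivot(warehouse)
for (patron_type, call_class), loans in sorted(table.items()):
    print(patron_type, call_class, loans)
```

The resulting counts are the raw material for the interpretation step, for example comparing which subject areas each patron group borrows from most.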

Privacy Concerns:

How can we protect a patron’s identifiable info?
Seek consent of the patron.
Delete or replace personally identifiable information during the data extraction and cleaning process.
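One possible way to replace identifiers during extraction is keyed hashing, sketched below. This technique was not named in the talk; the record fields, the key, and the function names are assumptions for illustration. Keyed hashing is pseudonymization rather than full anonymization, so the key must be kept secret and the remaining fields still need review.

```python
import hashlib
import hmac


def pseudonymize(patron_id: str, secret_key: bytes) -> str:
    """Replace a patron ID with a stable token derived via HMAC-SHA256."""
    return hmac.new(secret_key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


def clean_record(record: dict, secret_key: bytes) -> dict:
    """Keep only the fields needed for analysis; the name is dropped entirely."""
    return {
        "patron": pseudonymize(record["patron_id"], secret_key),
        "patron_type": record["patron_type"],
        "item": record["item"],
    }


# Hypothetical key and record, invented for illustration
key = b"example-key-kept-outside-the-warehouse"
raw = {"patron_id": "p1", "name": "Jane Doe", "patron_type": "grad", "item": "QA76.9"}

cleaned = clean_record(raw, key)
print(cleaned)
```

Because the same ID always maps to the same token, borrowing patterns can still be linked across records without storing the original identifier.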

http://www.bibliomining.com – Bibliomining information center.