The remit of the National Centre for Text Mining, believed to be the first such publicly funded centre in the world, is to contribute to the associated national and international research agenda, to establish a service for the wider academic community, and to make connections with industry. The centre will initially be focused on biological and biomedical science.

National Centre for Text Mining


Start date: 1 June 2004

End date: 1 March 2009

Funding programme: Support for e-Research programme

Project website: http://www.nactem.ac.uk/

JISC theme(s): e-Research, Information environment

Committees: JISC Support of Research committee

The National Centre for Text Mining (NaCTeM) is funded by the JISC, the BBSRC and the EPSRC. NaCTeM officially began on 1 June 2004 as the successful outcome of a response to JISC Circular 7/03 by a consortium comprising UMIST and the Universities of Manchester, Liverpool and Salford. Self-funded international partners include the University of California at Berkeley, the University of Geneva, the University of Tokyo and the San Diego Supercomputer Center.


What is Text Mining

Text mining attempts to discover new, previously unknown information by applying techniques from data mining, information retrieval, and natural language processing:

  • To identify and gather relevant textual sources
  • To analyse these to extract facts involving key entities and their properties
  • To combine the extracted facts to form new facts or to gain valuable insights
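The three steps above can be illustrated with a deliberately simplified Python sketch. It is not NaCTeM's software. The toy corpus, the entity dictionary `ENTITIES` and the function names `gather`, `extract_facts` and `combine` are all hypothetical. Keyword filtering stands in for information retrieval, and dictionary lookup with sentence-level co-occurrence stands in for fact extraction. The combination step follows the "ABC" pattern of literature-based discovery: if A is linked to B and B to C, but A and C never appear together, A-C is flagged as a candidate. The sentences are invented, loosely modelled on the classic fish oil / Raynaud's syndrome example.

```python
import re
from itertools import combinations

# Hypothetical toy corpus; a real system would draw on large collections of abstracts.
DOCUMENTS = [
    "Fish oil reduces blood viscosity in patients.",
    "High blood viscosity is common in Raynaud's syndrome.",
    "Stock prices rose sharply on Monday.",
]

# Hypothetical entity dictionary: lowercase surface string -> canonical entity name.
ENTITIES = {
    "fish oil": "FISH_OIL",
    "blood viscosity": "BLOOD_VISCOSITY",
    "raynaud's syndrome": "RAYNAUDS",
}


def gather(documents, query_terms):
    """Step 1: keep documents containing at least one (lowercase) query term."""
    return [d for d in documents if any(t in d.lower() for t in query_terms)]


def extract_facts(documents):
    """Step 2: treat two entities co-occurring in one sentence as a 'fact'."""
    facts = set()
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            text = sentence.lower()
            found = sorted({name for surface, name in ENTITIES.items() if surface in text})
            for a, b in combinations(found, 2):
                facts.add((a, b))
    return facts


def combine(facts):
    """Step 3: if A-B and B-C are known but A-C is not, propose (A, B, C)."""
    links = {}
    for a, b in facts:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    hypotheses = set()
    for b, neighbours in links.items():
        for a, c in combinations(sorted(neighbours), 2):
            if c not in links[a]:
                hypotheses.add((a, b, c))
    return hypotheses


if __name__ == "__main__":
    relevant = gather(DOCUMENTS, ENTITIES.keys())
    facts = extract_facts(relevant)
    print(combine(facts))  # {('FISH_OIL', 'BLOOD_VISCOSITY', 'RAYNAUDS')}
```

Practical systems replace each stage with far more robust components (ranked retrieval, named-entity recognition, relation extraction), and a co-occurrence hypothesis of this kind is only a lead for a researcher to check, not an established fact.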

Text mining has applications in many areas, such as drug discovery and predictive toxicology, protein interaction, competitive intelligence, protection of the citizen, identification of new product possibilities, and detection of links between lifestyle and states of health.

The Present Need

Within the research community there is growing interest in developing and applying text mining tools to help researchers cope with both the expanding academic literature and the increasing volume of data held in electronic text format. This interest is especially strong in biology and biomedicine. At the same time, developments in computational linguistics have led to a better understanding of the scientific and technical problems involved in text mining.

The Centre will initially focus on biological and biomedical science. This area of science has the largest user community and the fastest-growing literature, and is the area where most applications research in text mining is being undertaken. At the same time, the tools developed by the Centre will also be relevant to the needs of the wider academic community. A major challenge for the Centre will be to handle very large volumes of text, and the intermediate data produced during processing, efficiently and robustly.

The Centre will be housed in the £34M Manchester Interdisciplinary Biocentre, currently under construction, to facilitate interaction between text mining researchers and users in the biological domain. In addition, the North-West Development Agency, the National Centre for e-Social Science, the Consortium for Post-Genome Science and e-Science Northwest have been very supportive of the initiative.

Key Staff

Dr Sophia Ananiadou (Computing, Science and Engineering, Salford) Co-Director

Ms Julia Chruszcz (MIMAS, Manchester Computing) Associate Director

Professor John Keane (Computation, UMIST) Co-Director

Mr John McNaught (Computation, UMIST) Associate Director

Dr Paul Watry (Library, University of Liverpool) Associate Director

  • Last updated on 07/01/09 by Kerry Ann Down