Reconceptualising searching and screening: How new technologies might change the way that we identify studies
Poster presentation at the 2011 Cochrane Colloquium.
James Thomas & Alison O’Mara-Eves, EPPI-Centre, Institute of Education
The research problem
A typical review deals with the ‘information explosion’ by narrowing the search for studies (e.g., by applying search filters), but this approach can miss relevant evidence. The alternative that minimises the risk of missing relevant studies is to search broadly and screen potentially tens of thousands of records, which is not always practical. Resource-efficient approaches that maximise sensitivity are needed.

Figure 1. Screening during ‘active learning’. Reviewers screen the title and abstract of each item, selecting the appropriate criterion on the left and hitting ‘next’ to advance to the next item. When the count of screened items reaches 25, the classifier is run automatically and a new list of items to screen is generated, with the most informative items at the beginning of the list.
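The re-ranking step in Figure 1 can be illustrated with a minimal sketch. It assumes that ‘most informative’ means the records whose classifier score lies closest to the decision boundary (uncertainty sampling), which is one common active learning strategy and not necessarily the one used in the software. The function name, record ids and scores are hypothetical.

```python
from typing import Dict, List, Set

BATCH_SIZE = 25  # the poster reruns the classifier after every 25 screened items


def next_batch(scores: Dict[str, float], screened: Set[str],
               batch_size: int = BATCH_SIZE) -> List[str]:
    """Return the unscreened record ids that the classifier is least sure about.

    `scores` maps a record id to a signed distance from the decision boundary
    (hypothetical values here; a trained SVM would normally supply them).
    """
    candidates = [rid for rid in scores if rid not in screened]
    # Assumption: a smaller absolute score means a less certain prediction,
    # and a less certain prediction is treated as more informative.
    candidates.sort(key=lambda rid: (abs(scores[rid]), rid))
    return candidates[:batch_size]


# Toy example with made-up scores for five records, one already screened.
example_scores = {"r1": 2.3, "r2": -0.1, "r3": 0.4, "r4": -1.7, "r5": 0.05}
print(next_batch(example_scores, screened={"r5"}, batch_size=3))
# prints ['r2', 'r3', 'r4']
```

In a full loop, the reviewers' decisions on each batch would be added to the training data, the classifier retrained, and the scores recomputed before the next batch is drawn.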
Objectives
To evaluate whether semi-automated screening approaches allow us to search broadly without increasing the screening workload.
To evaluate two types of text mining: TerMine term recognition (http://www.nactem.ac.uk/software/termine/) for prioritising records for screening, and a support vector machine using active learning (Wallace et al., 2010) for automatically classifying records as relevant or not relevant to the review.
Approach

The database
Text mining techniques were employed in an ongoing review on young people’s access to tobacco (Sutcliffe et al., 2011). Our broad, sensitive searches yielded over 38,000 unique titles and abstracts.

Screening prioritisation
We used the small number of relevant studies that we already knew about (8 in total) as the basis of a search to identify similar papers. Using term recognition software (TerMine), we identified the key terms in these studies and then searched for those terms in the titles and abstracts of all the studies in the database, giving more weight in the search to terms with higher c-values (which represent how significant a given term is in the document in question; see Frantzi et al., 2000 for further details).

Classification of records
We adapted the active learning approach described by Wallace and colleagues (2010). The process started with a purposive search for a representative set of ‘includes’ that provided a ‘training dataset’ for the classifier. We supplemented the active learning process with a screening prioritisation approach, using the TerMine term recognition software. From the set of training includes, the classifier learnt key terms that were common to included studies (e.g., “tobacco” and “young people”). The reviewers were then asked to manually screen the titles and abstracts that the software identified as most informative, given the records it had already learnt from. See Figure 1 for a screenshot of the EPPI-Reviewer 4 software used to conduct this process. The classifier then ‘learnt’ what an include ‘looks like’ and classified records as includes or excludes accordingly.

Assessing performance
Screening prioritisation was assessed by comparison with a ‘baseline inclusion rate’ and through power calculations. We established a baseline inclusion rate of 1.81% from a random sample of 661 titles and abstracts that we screened manually. Standard power calculation methods indicated that this was an appropriate sample size.
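The term-weighted search described under ‘Screening prioritisation’ can be sketched roughly as follows. This is an illustration only: the weights are made-up stand-ins for c-values, the matching is plain substring matching, and the function names are hypothetical. It does not call TerMine and is not the implementation used in the review.

```python
from typing import Dict, List, Tuple


def priority_score(text: str, term_weights: Dict[str, float]) -> float:
    """Sum the weights of every term that occurs in a title/abstract.

    Substring matching is a simplification; real term recognition would
    handle tokenisation and term variants.
    """
    lowered = text.lower()
    return sum(weight for term, weight in term_weights.items()
               if term.lower() in lowered)


def rank_records(records: Dict[str, str],
                 term_weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order records from highest to lowest score (ties broken by record id)."""
    scored = [(rid, priority_score(text, term_weights))
              for rid, text in records.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical weights standing in for c-values, and three toy records.
weights = {"young people": 3.0, "tobacco": 2.0, "retail": 1.0}
toy_records = {
    "a": "Young people and tobacco sales in retail outlets",
    "b": "Tobacco taxation policy",
    "c": "Dietary habits of adults",
}
print(rank_records(toy_records, weights))
# prints [('a', 6.0), ('b', 2.0), ('c', 0.0)]
```

Records at the top of the ranked list would be screened first, so that likely includes surface early in the manual screening.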
Classification was assessed through the stability of the classifier and the calculation of performance metrics (precision, recall, F-values). See www.eppi.ioe.ac.uk/cms/er4/ for more details on the software used.

Results

Screening prioritisation
From our baseline inclusion rate, we expected that 1.81%, or about 652, of our 36,000 studies would be relevant. After using the prioritisation method described above to screen a little over 9,100 titles and abstracts manually, we had marked 656 as being potentially relevant: a rate of 7.16%.

Classification of records
The classifier achieved a good level of stability: most documents were consistently classified as either an include or an exclude across 8 classification runs. The proposed performance metrics were not suitable for assessing the accuracy of classification in an in-progress review.

Conclusions
Text mining enabled us to identify the expected number of relevant studies with only about 25% of the usual manual screening work. Prioritised screening also allows the full-text document retrieval process to begin sooner, which can help prevent disruptions to workflow caused by delays in accessing copies of documents. One limitation is that it is impossible to know whether everything that was relevant has been found, short of reading all 36,000 titles and abstracts. Further evaluative work is needed before we are able to be more definitive. Work on developing performance assessment methods for use in ‘live’ reviews is also needed. This method is highly promising and may save significant time and money, enabling research to be made available to policy and practice in a more timely way than can be achieved currently. Text mining shifts the emphasis of identification from the searching stage to the screening stage. The bespoke nature of text mining tools allows greater control over the reasons for potentially missing relevant studies than can be achieved by narrowing the search process.
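The figures reported under Results can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the poster, except that the number of records screened is taken as exactly 9,100, which is an assumption (the poster says ‘a little over 9,100’). The function names are illustrative.

```python
def expected_relevant(inclusion_rate: float, total_records: int) -> float:
    """Number of relevant records expected if the baseline rate holds overall."""
    return inclusion_rate * total_records


def observed_rate(included: int, screened: int) -> float:
    """Proportion of screened records marked as potentially relevant."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return included / screened


BASELINE_RATE = 0.0181   # from the random sample of 661 records
TOTAL_RECORDS = 36_000   # figure used in the Results section
INCLUDED = 656           # records marked potentially relevant
SCREENED = 9_100         # assumption: the exact count is not given

expected = expected_relevant(BASELINE_RATE, TOTAL_RECORDS)
rate = observed_rate(INCLUDED, SCREENED)

print(f"Expected relevant records: {expected:.0f}")            # about 652
print(f"Observed inclusion rate:   {rate:.1%}")                # about 7.2%
print(f"Enrichment over baseline:  {rate / BASELINE_RATE:.1f}x")  # about 4.0x
print(f"Share of records screened: {SCREENED / TOTAL_RECORDS:.0%}")  # about 25%
```

With exactly 9,100 screened the rate comes out at about 7.2%; the reported 7.16% corresponds to a screened total slightly above 9,100, which is consistent with the poster's wording.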
References
Frantzi, K. T., Ananiadou, S., & Mima, H. (2000). Automatic recognition of multi-word terms: the C-value/NC-value method. International Journal on Digital Libraries, 3, 115–130.
Sutcliffe, K., Brunton, G., Twamley, K., Hinds, K., O’Mara-Eves, A., & Thomas, J. (2011). Young people’s access to tobacco: a mixed-method systematic review. London: EPPI-Centre, Social Science Research Unit, Institute of Education, University of London.
Wallace, B. C., Small, K., Brodley, C. E., & Trikalinos, T. A. (2010). Active learning for biomedical citation screening. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).

Funding
The tobacco sources review (Sutcliffe et al., 2011) was funded by the UK Department of Health.