The Closer the Better: Similarity of Publication Pairs at Different Cocitation Levels
Corresponding Author
Giovanni Colavizza
Digital Humanities Laboratory, École Polytechnique Fédérale de Lausanne (CH)
E-mail: [email protected]
Nees Jan van Eck
Centre for Science and Technology Studies, Leiden University (NL)
Ludo Waltman
Centre for Science and Technology Studies, Leiden University (NL)
Abstract
We investigated the similarity of pairs of articles that are cocited at different cocitation levels: journal, article, section, paragraph, sentence, and bracket. Our results indicate that textual similarity, intellectual overlap (shared references), author overlap (shared authors), and proximity in publication time all rise monotonically as the cocitation level gets lower (from journal to bracket). While the main gain in similarity happens when moving from journal to article cocitation, all level changes entail an increase in similarity, especially from section to paragraph and from paragraph to sentence/bracket levels. We compared the results from four journals over the years 2010–2015: Cell, the European Journal of Operational Research, Physics Letters B, and Research Policy, with consistent general outcomes and some interesting differences. Our findings motivate the use of granular cocitation information as defined by meaningful units of text, with implications for, among others, the elaboration of maps of science and the retrieval of scholarly literature.