RT Journal Article
T1 A Collection of Samples for Research in Google: Design and Application of a Sample Selection Method: Results and Problems of Research
A1 Pedro Carañana, Joan
AB This article examines the use and validity of Google's search engine for collecting corpora of materials for research. To this end, it develops two interrelated themes. The first section develops a methodology for identifying universes in Google that meet the criteria and parameters required by an academic study. This methodology makes use of the search engine's own logic and is applicable to most online document searches. The second section discusses the limitations and skewed results that arise from Google's mode of operation and that affect the scientific validity of the universes it generates. This part focuses on the completeness and representativeness of the Google universes with regard to the full range of content available on the Internet.
YR 2012
FD 2012
LK https://hdl.handle.net/20.500.14352/44068
UL https://hdl.handle.net/20.500.14352/44068
LA eng
NO Document linked to the R&D&I project "La producción social de la comunicación y la reproducción social en la era de la globalización" (ref. CSO2010-22104-C03-01). The project was funded by the Ministerio de Ciencia e Innovación (competitive call of the Plan Nacional de I+D+i 2008-2011 - Programa de Proyectos de Investigación Fundamental) and carried out by the Universidad Complutense de Madrid research group "Identidades sociales y comunicación" from 2011 to 2014. The works related to the project have been deposited in E-Prints (see "Trabajos relacionados con el proyecto de I+D+i La producción social de la comunicación y la reproducción social en la era de la globalización", http://eprints.ucm.es/24131/).
NO Ministerio de Ciencia e Innovación (competitive call of the Plan Nacional de I+D+i 2008-2011 - Programa de Proyectos de Investigación Fundamental)
DS Docta Complutense
RD 28 Apr 2024