Build a web corpus with Hyphe!

Hyphe was designed to offer researchers and students a web corpus curation tool built around a research-driven web crawler.

Mapping a controversy becomes much easier when it is studied through the prism of the web. Analysing the websites of the actors in a controversy and mapping the links between them can yield valuable insight, although doing so can be quite complex, especially for social scientists.

Built as free software available on GitHub, Hyphe was designed to offer researchers and students a web corpus curation tool featuring a research-driven web crawler. It provides users with a method for building web corpora that combines granularity, flexibility and simple curation principles.

Rather than websites, Hyphe handles WebEntities, each of which can be defined as a single page, a subdomain, a combination of websites, and so on. The web pages lying within a WebEntity can then be crawled to collect all the outbound links and text they contain. The most cited of the newly discovered WebEntities can then be prospected to enlarge your corpus, before you visualize it as a network and export it for refinement in Gephi and publication with manylines.

Today we are releasing a new version of Hyphe that handles multiple corpora simultaneously and offers a brand new web front end implemented as an HTML5 user interface.

Discover Hyphe on its dedicated website, try our demo or install it on your lab's servers!
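
To make the WebEntity idea described above more concrete, here is a minimal Python sketch. It is not Hyphe's code or API: the function names (`normalize`, `entity_for`, `most_cited_discovered`), the plain URL-prefix matching and the example URLs are all assumptions chosen for illustration. It shows how an entity could be defined by one or more URL prefixes (a page, a subdomain, several sites together) and how outbound links pointing outside the corpus could be counted to suggest candidates for enlarging it.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to 'host/path', dropping the scheme, 'www.' and trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return (host + parts.path).rstrip("/")


def entity_for(url, entities):
    """Return the name of the entity with the longest matching prefix, or None.

    `entities` maps a (hypothetical) entity name to a list of URL prefixes.
    """
    target = normalize(url)
    best_name, best_len = None, -1
    for name, prefixes in entities.items():
        for prefix in prefixes:
            p = normalize(prefix)
            # Match the prefix itself or anything below it in the path.
            if (target == p or target.startswith(p + "/")) and len(p) > best_len:
                best_name, best_len = name, len(p)
    return best_name


def most_cited_discovered(links, entities):
    """Count links going from corpus entities to hosts outside the corpus.

    `links` is an iterable of (source_url, target_url) pairs, as a crawl
    might produce. Results are grouped by target host, most cited first.
    """
    counts = Counter()
    for source, target in links:
        if entity_for(source, entities) is None:
            continue  # ignore links that do not start inside the corpus
        if entity_for(target, entities) is None:
            counts[normalize(target).split("/")[0]] += 1
    return counts.most_common()


# Illustrative corpus: one entity spans two sites, another is a sub-path.
corpus = {
    "Lab": ["http://example.org/lab", "http://lab.example.net"],
    "Alice": ["http://example.org/lab/alice"],
}
```

Choosing the longest matching prefix lets a narrow entity (a single researcher's pages) coexist with a broader one (the whole lab) without ambiguity, which is one simple way to get the granularity the post describes.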