Web Scraping With Golang

Web scraping is a technique for parsing the HTML output of a website. Most online bots rely on the same technique to extract the information they need about a particular website or page.


We can parse an HTML page with an XML parser and extract the required information. However, jQuery-style selectors are a more convenient way to query an HTML page. So, in this tutorial we will use goquery, a jQuery-like library for Golang, to parse the HTML document.
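
To make the comparison concrete, here is a minimal sketch of the XML-parser route using only Go's standard encoding/xml package. The sample markup, the link type, and the extractLinks helper are made up for illustration, and the approach assumes reasonably well-formed markup. Notice how much bookkeeping is needed just to collect links, which is the kind of work a selector hides.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// samplePage is hypothetical markup, not taken from any real site.
const samplePage = `<div id="main">
<article><h2 class="entry-title"><a href="/first-post/">First post</a></h2></article>
<article><h2 class="entry-title"><a href="/second-post/">Second post</a></h2></article>
</div>`

// link holds the text and href of one <a> element.
type link struct {
	Text string
	Href string
}

// extractLinks walks the token stream and collects every <a> element.
func extractLinks(page string) ([]link, error) {
	dec := xml.NewDecoder(strings.NewReader(page))
	// Relax the decoder so common HTML quirks do not abort parsing.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var links []link
	var cur *link // non-nil while we are inside an <a> element
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return links, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "a" {
				cur = &link{}
				for _, attr := range t.Attr {
					if attr.Name.Local == "href" {
						cur.Href = attr.Value
					}
				}
			}
		case xml.CharData:
			if cur != nil {
				cur.Text += string(t)
			}
		case xml.EndElement:
			if t.Name.Local == "a" && cur != nil {
				cur.Text = strings.TrimSpace(cur.Text)
				links = append(links, *cur)
				cur = nil
			}
		}
	}
}

func main() {
	links, err := extractLinks(samplePage)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, l := range links {
		fmt.Printf("Post #%d: %s - %s\n", i, l.Text, l.Href)
	}
}
```

With a selector library, the whole token loop collapses into a single query string.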

Project Setup and Dependencies

As mentioned above, we will use the goquery library as our parser. Fetch the library with the following command:
go get github.com/PuerkitoBio/goquery

Create a file named webscraper.go and open it in your favorite text editor.

Web Scraper Code to Get Posts From a Website
package main

import (
	// import standard libraries
	"fmt"
	"log"

	// import third party libraries
	"github.com/PuerkitoBio/goquery"
)

// postScrape fetches the home page and prints the title and link of each post.
func postScrape() {
	// download the page and parse it into a queryable document
	doc, err := goquery.NewDocument("http://code2succeed.com")
	if err != nil {
		log.Fatal(err)
	}

	// use the CSS selector found with the browser inspector;
	// Each calls the function once per match with its index and selection
	doc.Find("#main article .entry-title").Each(func(index int, item *goquery.Selection) {
		title := item.Text()
		linkTag := item.Find("a")
		// the second return value reports whether href exists; ignored here
		link, _ := linkTag.Attr("href")
		fmt.Printf("Post #%d: %s - %s\n", index, title, link)
	})
}

func main() {
	postScrape()
}

Output
Post #0:
Getting started with ReactJs
- http://www.code2succeed.com/getting-started-with-reactjs/
Post #1:
Intro to React
- http://www.code2succeed.com/intro-to-react/
Post #2:
Caesar Decryption of string using javascript
- http://www.code2succeed.com/caesar-decryption-of-string-using-javascript/
Post #3:
Caesar encryption of string using JavaScript
- http://www.code2succeed.com/caesar-encryption-of-string-using-javascript/
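
The line breaks around each title in the output most likely come from whitespace surrounding the heading text in the page's markup, since the scraper prints the element text as-is. A small helper can normalize it before printing. This is only a sketch: cleanTitle is a hypothetical helper name, and the raw value below is invented to resemble the titles above.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanTitle collapses every run of whitespace (spaces, tabs, newlines)
// into a single space and drops leading and trailing whitespace.
func cleanTitle(raw string) string {
	return strings.Join(strings.Fields(raw), " ")
}

func main() {
	// Hypothetical raw value, shaped like the titles in the output above.
	raw := "\n\t\tGetting started with ReactJs\n\t"
	fmt.Printf("Post #%d: %s\n", 0, cleanTitle(raw))
}
```

In the scraper, the same idea would apply to the value returned by item.Text() before it is passed to fmt.Printf.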
