Scraping Amazon Data with Java (Including Signature Generation)


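Amazon now offers a great many services; this post covers only one of them, the Product Advertising API. With it you can access Amazon's product database and build many useful features: retrieving product information, reading buyer and seller reviews, searching for items, getting promotion information, and so on. This data is useful for building your own e-commerce site.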

Below is a short walkthrough of accessing the Product Advertising API from Java.

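First, register an account at http://aws.amazon.com. After registering, note the Access Key ID and Secret Access Key that the system generates; both keys are used in the code. You can then use the Product Advertising API Signed Requests Helper to check that the two keys work, and at the same time get familiar with the request and response structures Amazon defines.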
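The Signed Requests Helper takes an unsigned REST request and turns it into a signed URL. The sketch below shows, under stated assumptions, the steps such a signer performs: add AWSAccessKeyId and an ISO 8601 UTC Timestamp, sort the parameters by name, percent-encode them per RFC 3986, sign the string `GET\n<host>\n/onca/xml\n<canonical query>` with HMAC-SHA256 using the Secret Access Key, and append the Base64 signature. The host `ecs.amazonaws.com`, the class name `RestRequestSigner`, and the use of `java.util.Base64` (Java 8+, in place of commons-codec) are assumptions for illustration; the demos later in this post use SOAP instead of REST.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.text.SimpleDateFormat;
import java.util.Base64;
import java.util.Date;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of REST request signing (host, path and class name are assumptions).
final class RestRequestSigner {
    static final String HOST = "ecs.amazonaws.com"; // assumed endpoint
    static final String PATH = "/onca/xml";

    private final String accessKeyId;
    private final String secretKey;

    RestRequestSigner(String accessKeyId, String secretKey) {
        this.accessKeyId = accessKeyId;
        this.secretKey = secretKey;
    }

    /** RFC 3986 encoding: space becomes %20, '*' becomes %2A, '~' is left as is. */
    static String percentEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20").replace("*", "%2A").replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Sorts parameters by name (byte order for ASCII names) and joins them as name=value pairs. */
    static String canonicalQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<String, String>(params).entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(percentEncode(e.getKey())).append('=').append(percentEncode(e.getValue()));
        }
        return sb.toString();
    }

    /** ISO 8601 UTC timestamp, e.g. 1970-01-01T00:00:00Z. */
    static String timestamp(Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    /** Adds AWSAccessKeyId and Timestamp, signs with HMAC-SHA256 and returns the full URL. */
    String signedUrl(Map<String, String> params, Date when) {
        Map<String, String> all = new TreeMap<String, String>(params);
        all.put("AWSAccessKeyId", accessKeyId);
        all.put("Timestamp", timestamp(when));
        String query = canonicalQuery(all);
        String stringToSign = "GET\n" + HOST + "\n" + PATH + "\n" + query;
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            String signature = Base64.getEncoder().encodeToString(
                    mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8)));
            return "http://" + HOST + PATH + "?" + query + "&Signature=" + percentEncode(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `new RestRequestSigner("<Your Access ID>", "<Your Secret Key>").signedUrl(params, new Date())`, with Service, Operation, ItemId and ResponseGroup in `params`, should produce a URL of the same shape as the Helper's output; because the timestamp differs, the signature will differ too.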

Next, get the Product Advertising API Java Client Side Library (requires Java 6), which simplifies development. The client stubs are generated from the service WSDL as follows:

  1. Go to the directory where you want to generate the stubs and create a “build” directory and a “src” directory.

    All of the generated source code will go under the “src” folder.

  2. If you are using Eclipse 3.2, create a custom binding file, jaxws-custom.xml (the file passed to wsimport with -b in step 3), to disable “Wrapper Style” code generation:

    <jaxws:bindings wsdlLocation="http://ecs.amazonaws.com/AWSECommerceService/AWSECommerceService.wsdl" xmlns:jaxws="http://java.sun.com/xml/ns/jaxws">
      <jaxws:enableWrapperStyle>false</jaxws:enableWrapperStyle>
    </jaxws:bindings>

    This step is necessary because Eclipse 3.2 does not support wrapper-style generated code. If you are using an IDE that does support it, such as NetBeans, this step is not required.

  3. Run the command:

    wsimport -d ./build -s ./src -p com.ECS.client.jax http://ecs.amazonaws.com/AWSECommerceService/AWSECommerceService.wsdl -b jaxws-custom.xml

    You can find the generated stubs in the package com.ECS.client.jax (sources under ./src, compiled classes under ./build).

Since August 2009, Amazon has required a stricter security mechanism on every request, a Signature, so you also need two helper files:

  1. awshandlerresolver.java (4.2 K)
  2. commons-codec-1.3.jar (45.6 K)
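For SOAP requests like the ones in the demos below, the signature that a signing handler attaches is, as I understand the Product Advertising API developer guide, Base64(HMAC-SHA256(SecretKey, Action + Timestamp)), where Action is the operation name (e.g. ItemLookup). It is sent in the SOAP header together with AWSAccessKeyId and Timestamp. The sketch below only computes those two values; it does not build or modify a SOAP message, and the class name `SoapSignature` is hypothetical. It uses `java.util.Base64` (Java 8+); on Java 6 the Base64 class from commons-codec would fill the same role.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.text.SimpleDateFormat;
import java.util.Base64;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: computes the Timestamp and Signature header values for a SOAP call.
// It does not build the SOAP message itself.
final class SoapSignature {

    /** Formats a date as an ISO 8601 UTC timestamp such as 2009-12-30T08:26:10Z. */
    static String timestamp(Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    /** Signature = Base64(HMAC-SHA256(secretKey, action + timestamp)), e.g. action "ItemLookup". */
    static String sign(String secretKey, String action, String timestamp) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal((action + timestamp).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            // HmacSHA256 ships with every standard JRE, so this should not happen
            throw new IllegalStateException(e);
        }
    }
}
```

The same timestamp string must go into both the Timestamp header and the signed string; if the two differ, the service cannot reproduce the signature.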

Now you can write the demos. Remember to import the generated client package and the helper code you just downloaded.

DEMO1. First, a lookup example: fetch the item whose ASIN is 0596007124.

Sample code:

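import com.ECS.client.jax.AWSECommerceService;
import com.ECS.client.jax.AWSECommerceServicePortType;
import com.ECS.client.jax.Item;
import com.ECS.client.jax.ItemLookup;
import com.ECS.client.jax.ItemLookupRequest;
import com.ECS.client.jax.ItemLookupResponse;
// AwsHandlerResolver comes from awshandlerresolver.java; import it if it lives in another package

public class Info {

	public static void main(String[] args) {
		new Info();
	}

	public Info() {

		// Initialize the web service; the handler resolver signs every outgoing request
		AWSECommerceService service = new AWSECommerceService();
		service.setHandlerResolver(new AwsHandlerResolver("<Your Secret Key>"));

		// Create the web service connection (port)
		AWSECommerceServicePortType port = service.getAWSECommerceServicePort();

		// Parameters for the ItemLookup: look the item up by its ASIN
		ItemLookupRequest request = new ItemLookupRequest();
		request.getItemId().add("0596007124");
		request.setIdType("ASIN");
		request.getResponseGroup().add("Large");

		// Wrap the request in the ItemLookup body together with the Access Key ID
		ItemLookup body = new ItemLookup();
		body.setAWSAccessKeyId("<Your Access ID>");
		body.setShared(request);

		// Call the service and take the first item of the first result set
		ItemLookupResponse response = port.itemLookup(body);
		Item item = response.getItems().get(0).getItem().get(0);
		System.out.println(item.getASIN() + ": " + item.getItemAttributes().getTitle());
	}
}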

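In the example above, ResponseGroup contains only "Large". Despite the name, "Large" is not just the large product image: it is a composite response group that returns a broad set of item data, including item attributes and images. You can also add other groups to control what is returned, such as BrowseNodes, EditorialReview, ItemAttributes, and so on.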
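With the generated SOAP stubs, each response group is added to request.getResponseGroup() separately, as in the demos. In a REST request (the kind the Signed Requests Helper builds), the same choice is, to my understanding, sent as a single comma-separated ResponseGroup parameter, e.g. ResponseGroup=Large,BrowseNodes. The hypothetical helper below builds that value, keeping the caller's order and dropping blanks and duplicates.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: builds the comma-separated ResponseGroup value for a REST request.
final class ResponseGroups {

    /** Joins group names in the given order, trimming them and skipping blanks and duplicates. */
    static String join(String... groups) {
        Set<String> unique = new LinkedHashSet<String>(); // preserves insertion order
        for (String g : groups) {
            String trimmed = g.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return String.join(",", unique);
    }
}
```

For example, `ResponseGroups.join("Large", "BrowseNodes", "EditorialReview")` gives `Large,BrowseNodes,EditorialReview`.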

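DEMO2. Scrape all the items in a given Amazon category. This requires the ID of the category you want to scrape, the node ID, which you can get by looking at Amazon's URLs. For example, the category Books > Professional & Technical > Architecture > Materials has the URL

http://www.amazon.com/gp/search/ref=sr_nr_n_10?rh=i:stripbooks,n:!1000,n:173507,n:173508,n:1058&bbn=173508&ie=UTF8&qid=1262157489&rnid=173508

and its node ID is 1058, the last n: value in the rh parameter.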
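Picking the node ID out of the URL by hand is easy for one category; for many, a small parser helps. The hypothetical helper below assumes, based only on the example URL above, that the category path sits in the rh parameter as comma-separated n:<id> entries with the most specific category last. Amazon does not document this URL format, so treat it as a heuristic that may break.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: pulls the browse node ID out of an Amazon search URL.
// Assumes the "rh" parameter lists "n:<id>" entries with the most specific node last.
final class NodeIdExtractor {

    /** Returns the last n: value of the rh parameter, or null if there is none. */
    static String extractNodeId(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return null;
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0 || !pair.substring(0, eq).equals("rh")) {
                continue;
            }
            String rh = decode(pair.substring(eq + 1)); // ':' and ',' may arrive as %3A / %2C
            String lastNode = null;
            for (String part : rh.split(",")) {
                if (part.startsWith("n:")) {
                    // Strip a leading '!' as seen in "n:!1000"
                    lastNode = part.substring(2).replace("!", "");
                }
            }
            return lastNode;
        }
        return null;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Applied to the example URL above, `extractNodeId` returns "1058".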

Sample code:

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

import com.ECS.client.jax.AWSECommerceService;
import com.ECS.client.jax.AWSECommerceServicePortType;
import com.ECS.client.jax.Item;
import com.ECS.client.jax.ItemSearch;
import com.ECS.client.jax.ItemSearchRequest;
import com.ECS.client.jax.ItemSearchResponse;
import com.ECS.client.jax.Items;
// AwsHandlerResolver comes from awshandlerresolver.java; import it if it lives in another package

public class InfoScrap {

	public static void main(String[] args) {
		new InfoScrap();
	}

	public InfoScrap() {

		// Initialize the web service; the handler resolver signs every outgoing request
		AWSECommerceService service = new AWSECommerceService();
		service.setHandlerResolver(new AwsHandlerResolver("<Your Secret Key>"));

		// Create the web service connection (port); it is reused for every page
		AWSECommerceServicePortType port = service.getAWSECommerceServicePort();

		// Parameters for the ItemSearch: response groups to return and the node to search
		ItemSearchRequest request = new ItemSearchRequest();
		List<String> groups = new ArrayList<String>();
		groups.add("Request");
		groups.add("BrowseNodes");
		groups.add("EditorialReview");
		groups.add("ItemAttributes");
		groups.add("Large");
		request.getResponseGroup().addAll(groups);
		request.setBrowseNode("1058");
		request.setSearchIndex("Books"); // only search books

		// First call (page 1): read how many result pages the node has
		int pages = 0;
		ItemSearch itemSearch = new ItemSearch();
		itemSearch.setAWSAccessKeyId("<Your Access ID>");
		itemSearch.getRequest().add(request);
		ItemSearchResponse response = port.itemSearch(itemSearch);
		for (Items items : response.getItems()) {
			if (items.getTotalPages() != null) {
				pages = items.getTotalPages().intValue();
			}
		}

		// Scrape every page from 1 to pages (inclusive)
		for (int i = 1; i <= pages; i++) {
			request.setItemPage(BigInteger.valueOf(i));
			itemSearch = new ItemSearch();
			itemSearch.setAWSAccessKeyId("<Your Access ID>");
			itemSearch.getRequest().add(request);
			response = port.itemSearch(itemSearch);
			for (Items items : response.getItems()) {
				for (Item item : items.getItem()) {
					// callback(item); process each item here
				}
			}
			System.out.println("Complete " + i + "/" + pages);
		}
	}
}
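DEMO2 sends its page requests back to back. Amazon's efficiency guidelines for this API limited how fast a client may call it (commonly cited at the time as about one request per second; treat that number as an assumption and check the current terms). The hypothetical `RequestThrottle` below is a minimal fixed-interval throttle: it blocks until at least the given interval has passed since the previous call. It sleeps while holding its lock, which is fine for a single-threaded scraper like DEMO2 but would serialize callers in a multi-threaded one.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: enforces a minimum interval between consecutive API calls.
final class RequestThrottle {
    private final long minIntervalNanos;
    private long lastCallNanos;
    private boolean hasCalled = false;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
    }

    /** Blocks until at least the configured interval has passed since the previous acquire(). */
    synchronized void acquire() {
        long now = System.nanoTime();
        if (hasCalled) {
            long waitNanos = lastCallNanos + minIntervalNanos - now;
            if (waitNanos > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(waitNanos);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so the caller can notice and stop
                    Thread.currentThread().interrupt();
                }
                now = System.nanoTime();
            }
        }
        hasCalled = true;
        lastCallNanos = now;
    }
}
```

In DEMO2 you would create `new RequestThrottle(1000)` once and call `throttle.acquire()` right before each `port.itemSearch(itemSearch)`.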

Related links:
Product Advertising API
amazon-product-advertising-api-sample

http://docs.amazonwebservices.com/AWSECommerceService/2009-10-01/GSG/index.html?ImplementinganA2SRequest.html

2 Comments

  1. Ting
    Dec 30, 2009 @ 16:26:10

    I really need a plugin that displays code better~ I'll look for one when I have time.


  2. hzhjun
    Dec 31, 2009 @ 09:36:10

    runcode is pretty good, give it a try!

