Fast Storage and Indexing Method of Big Data in Forest Ecological Station

doi:10.6041/j.issn.1000-1298.2021.08.019

Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(8): 195-204, 212

Fund Project: Fundamental Research Funds for the Central Universities (BLX201923) and National Natural Science Foundation of China (32071775)



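Abstract:

Forest ecological stations produce large volumes of unstructured data, such as images, videos and GIS data, as well as structured data, such as ecological indicators. Both suffer from low storage efficiency and poor retrieval performance. To address this, a big data storage framework for forest ecological stations was proposed based on Hadoop and HBase. On top of this framework, the storage workflow for forest ecological data was defined, and the core technologies of the forest ecological big data platform were optimized as follows: (1) a pre-partitioning algorithm was designed to distribute data evenly across the cluster; (2) the RowKey was designed around the characteristics of ecological data to enable fast retrieval; (3) because native HBase does not support multi-condition queries, an ElasticSearch index shard placement strategy based on index data and server performance evaluation was designed, and ElasticSearch-based secondary (non-primary-key) indexing was used to optimize multi-condition retrieval over the HBase ecological database; (4) to address the difficulty of storing massive numbers of small images at ecological stations, a packing and merging strategy based on the correlation of data site and time was proposed; (5) GIS data were parsed so that they could be stored efficiently. These methods were verified experimentally.

The results showed that the ElasticSearch index shard placement strategy reduced query time by 20 ms on average compared with the default shard placement strategy, and by 20 ms on average compared with a strategy based on modifying the ElasticSearch scoring policy. With 1×10⁸ structured records, the retrieval time of the system was 1.045 s, 3.99 times faster than native HBase. With 1×10⁷ unstructured records, the small-image packing strategy based on site and time correlation achieved 1.15 times the merging efficiency of SequenceFile-based merging and 1.79 times that of native HBase. With 1×10⁴ concurrent users, the optimized system handled 1.88 times as many queries per second and 1.74 times the throughput per second, and its response time was 69.5% lower than before optimization. The proposed scheme therefore clearly improves cluster load balancing, retrieval efficiency for massive structured and unstructured data, and system throughput. It provides a theoretical basis and a technical implementation for the storage and management of forest ecological data.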

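The abstract says only that shard placement combines index data with a server performance evaluation. One plausible reading is a greedy placement:

- give each node a performance score;
- handle shards from largest to smallest;
- put each copy on the node whose load-to-performance ratio would stay lowest;
- never put a replica on the same node as its primary.

This is a hypothetical illustration, not the authors' algorithm. The node names, scoring weights and shard sizes are assumptions. Elasticsearch does not provide a function like this; a computed layout would have to be applied through its own allocation mechanisms, such as the cluster reroute API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_cores: int
    mem_gb: float
    disk_iops: float
    load_gb: float = 0.0                          # index data already placed here
    shards: list = field(default_factory=list)

    def perf(self, w_cpu: float = 0.4, w_mem: float = 0.3, w_io: float = 0.3) -> float:
        # Hypothetical performance score: weighted sum of capacities, each divided
        # by a reference value (8 cores, 32 GB, 10k IOPS) so the weights are comparable.
        return w_cpu * self.cpu_cores / 8 + w_mem * self.mem_gb / 32 + w_io * self.disk_iops / 10_000


def place(shards: dict[str, float], replicas: int, nodes: list[Node]) -> dict[str, list[str]]:
    """Greedy placement: largest shards first; each copy goes to the node whose
    (load + shard size) / performance would be smallest. Copies of one shard
    never share a node. Returns shard id -> [primary node, replica nodes...]."""
    if replicas + 1 > len(nodes):
        raise ValueError("not enough nodes to separate primary and replicas")
    layout: dict[str, list[str]] = {}
    for shard_id, size in sorted(shards.items(), key=lambda kv: kv[1], reverse=True):
        used: list[str] = []
        for _ in range(replicas + 1):             # primary first, then replicas
            candidates = [n for n in nodes if n.name not in used]
            best = min(candidates, key=lambda n: (n.load_gb + size) / n.perf())
            best.load_gb += size
            best.shards.append(shard_id)
            used.append(best.name)
        layout[shard_id] = used
    return layout


if __name__ == "__main__":
    cluster = [Node("es-1", 16, 64, 20_000), Node("es-2", 8, 32, 10_000), Node("es-3", 8, 32, 10_000)]
    print(place({"s0": 30.0, "s1": 20.0, "s2": 12.0}, replicas=1, nodes=cluster))
    # {'s0': ['es-1', 'es-2'], 's1': ['es-3', 'es-1'], 's2': ['es-1', 'es-3']}
```

In this example the node with twice the score ends up holding about twice the data (62 GB against 30 GB and 32 GB).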

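The abstract names a small-image packing strategy keyed on data site and time but does not describe its format. The sketch below is a hypothetical illustration of the general idea, similar in spirit to SequenceFile-style containers. Small images from the same site and the same day are concatenated into one pack. An index then records each image's pack, offset and length, so a single image can be read back with one slice.

In a real deployment the packs would live in HDFS or HBase and the index in HBase. The one-pack-per-site-per-day granularity and all names here are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def pack_key(site_id: str, taken_at: datetime) -> str:
    # Hypothetical grouping key: one pack per site per UTC day (expects aware datetimes).
    return f"{site_id}_{taken_at.astimezone(timezone.utc):%Y%m%d}"


def pack_images(images):
    """images: iterable of (image_id, site_id, taken_at, data_bytes).

    Returns (packs, index): packs maps pack key -> concatenated bytes;
    index maps image_id -> (pack key, byte offset, length).
    """
    buffers = defaultdict(bytearray)
    index = {}
    # Sort by site, then time, so images that are likely queried together sit side by side.
    for image_id, site_id, taken_at, data in sorted(images, key=lambda r: (r[1], r[2])):
        key = pack_key(site_id, taken_at)
        buf = buffers[key]
        index[image_id] = (key, len(buf), len(data))
        buf.extend(data)
    packs = {k: bytes(v) for k, v in buffers.items()}
    return packs, index


def read_image(packs, index, image_id) -> bytes:
    """Fetch one image by slicing its pack at the recorded offset."""
    key, offset, length = index[image_id]
    return packs[key][offset:offset + length]


if __name__ == "__main__":
    t = datetime(2021, 2, 8, 6, 0, tzinfo=timezone.utc)
    packs, index = pack_images([("a", "ST1", t, b"AAA"), ("b", "ST1", t, b"BB")])
    print(index)
```

Grouping by site and time means a query such as "all images from one site on one day" touches a single pack instead of thousands of small files.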
Cite this article:

WANG Xinyang, JIA Xiangyu, CHEN Zhibo, CUI Xiaohui, XU Fu. Fast Storage and Indexing Method of Big Data in Forest Ecological Station[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(8): 195-204, 212.



History

Received: 2021-02-08
Published online: 2021-08-10
