
1 Introduction

The capabilities of data storage (cloud-based services), hardware technology (high-speed Central Processing Units (CPUs) and Graphics Processing Units (GPUs)) and computer vision algorithms (deep neural networks (DNNs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs)) have made it possible to solve difficult tasks. As a specific subfield of machine learning, deep learning is a way to decode information from data by applying useful transformations through many layers of representation, which are learned for specific tasks such as classification [21, 3, 12]. In this context, deep learning means learning multiple levels of representation [21]. Although conventional machine learning techniques have been used in data mining for a long time [21], they are not able to handle unprocessed and complex data with higher dimensions [21]. Support vector machines and decision trees are machine learning techniques, sometimes called shallow learning, which tend to focus on learning only one or two layers of representations of the data by applying simple transformations [19, 25]. Support vector machines are a classification technique that finds good decision boundaries between two sets of points belonging to two different categories by high-dimensional non-linear projections [19, 25]. Decision trees are flowchart-like structures that let us classify input data points based on feature values or predict output values given inputs [19, 25].

In deep learning, the multiple-layered representations are learned via models called neural networks. Neural networks, or artificial neural networks (ANNs), are systems of processing units that mimic the complex learning ability of the brain [13]. ANNs are able to deal with complex data via different forms of linear and non-linear transformations by adjusting the number of parameters and the connection weights [25]. A CNN generally consists of three main types of neural layers (convolutional layers, pooling layers, and fully connected layers) [12].
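As a rough illustration of these three layer types, the following minimal Python sketch (standard library only) passes a tiny 5x5 image through one convolution with ReLU, one max-pooling step and one fully connected layer. The image, the edge-detecting kernel and the dense weights are hand-picked assumptions for illustration; in a real CNN they would be learned from data.

```python
# Minimal sketch of the three CNN layer types on a tiny grayscale image.
# All names, sizes and weights are illustrative assumptions, not a trained model.

def conv2d_relu(image, kernel):
    """'Valid' 2-D cross-correlation (what CNN libraries call convolution), then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))  # ReLU non-linearity
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; any ragged border is dropped."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

def dense(features, weights, biases):
    """Fully connected layer: one weighted sum (score) per output class."""
    return [sum(w * x for w, x in zip(ws, features)) + b
            for ws, b in zip(weights, biases)]

# Hypothetical input: a bright vertical stripe on a dark background.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = conv2d_relu(image, vertical_edge)      # 4x4 map
pooled = max_pool(feature_map, size=2)               # 2x2 map
flat = [v for row in pooled for v in row]            # flatten for the dense layer
scores = dense(flat, weights=[[1, 0, 1, 0], [0, 1, 0, 1]], biases=[0, 0])
predicted_class = scores.index(max(scores))
print(feature_map, pooled, scores, predicted_class)
```

The kernel responds only where intensity increases from left to right, pooling keeps the strongest response in each 2x2 block, and the dense layer turns the pooled features into one score per (hypothetical) class.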
Figure 1 illustrates the architecture of a typical CNN, which is structured as a series of stages. An essential advantage of a deep CNN is the automatic learning of task-specific representations of the input data [2].

Methods of processing image-based data are considered suitable for species identification [9, 17]. Several workers have used CNNs in plant identification using vein morphological patterns [11], via optimization of transfer learning parameters [10] and for taxon identification [2]. Other more advanced methods used in plant classification are discriminative deep feature learning [40, 34] and very deep convolutional networks that incorporate species and organ features to solve the multi-organ plant classification problem [31, 24, 33]. Deep neural networks have been shown to successfully identify, count and describe animal behaviours in camera-trap images [26].

During the period of rapid globalisation, many plant species have escaped from their native ranges to new lands. Their impacts on these foreign environments are diverse but mostly harmful, such as competition with native plant species for water and soil resources, and they become invasive species [4]. Knowledge of how to identify and report species without requiring an expert’s involvement is important for rapid identification and successful control of invasive plant species and weeds. By simply collecting and processing imaging data from surveys, ecologists and environmental managers could accurately find the locations, developmental stages and density of those species [8]. This information is important for the application of herbicides or other control measures. Kaggle (https://www.kaggle.com/), among others, has hosted competitions to develop deep learning algorithms that accurately identify invasive species in photos.
It has been found that this problem is very tractable with deep CNNs.

Figure 1: Representation of deep learning for face image analysis (Source: https://www.edureka.co/blog/what-is-deep-learning)

2 Objectives

The project will focus on exploring CNN models for the plant classification task, including their implications for the open-set recognition task, by testing them on good, available data files.

3 Literature Review

In this section, we briefly introduce the way deep learning works with data and then review studies relating to its application in ecology.

3.1 How does deep learning work?

To understand how deep learning works, let us take an example of a CNN algorithm used for human face detection (Figure 1). In this illustrative example, four input units, two hidden units and one output unit are used. As with other machine learning techniques, the data are separated into a training set used for model fitting and a test set used for checking the performance of the model(s). Local features of images are identified automatically from the input based on patterns of local contrast. Hidden layer 1 separates the local contrast patterns into distinct parts of the face. Finally, hidden layer 2 is able to apply those identified features to each face [21]. In practice, these layers can contain many more units.

To assess the performance of deep learning models for classification tasks, a confusion matrix and accuracy are normally reported. A confusion matrix is an illustrative representation of the predictions and actual outcomes (labels) of a classification model (or “classifier”) based on test data [27]. There are four outcomes: (1) true positives (TP): positive prediction and positive label; (2) false positives (FP): positive prediction and negative label; (3) true negatives (TN): negative prediction and negative label; and (4) false negatives (FN): negative prediction and positive label.
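The four outcomes can be tallied directly from paired labels and predictions. The minimal Python sketch below does this for a binary classifier and then computes accuracy as the share of correct predictions, (TP + TN) over all four counts. The label and prediction lists are made-up illustrative data (1 might stand for "invasive", 0 for "not invasive"), and the function names are our own.

```python
# Minimal sketch: confusion-matrix counts and accuracy for a binary classifier.
# The data below are hypothetical and only illustrate the bookkeeping.

def confusion_counts(labels, predictions):
    """Count TP, FP, TN and FN from paired 0/1 labels and predictions."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for label, pred in zip(labels, predictions):
        if pred == 1 and label == 1:
            counts["TP"] += 1      # positive prediction, positive label
        elif pred == 1 and label == 0:
            counts["FP"] += 1      # positive prediction, negative label
        elif pred == 0 and label == 0:
            counts["TN"] += 1      # negative prediction, negative label
        else:
            counts["FN"] += 1      # negative prediction, positive label
    return counts

def accuracy(counts):
    """Correct predictions (TP + TN) divided by all predictions."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total

labels      = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical test-set labels
predictions = [1, 0, 1, 0, 1, 0, 0, 1]   # hypothetical classifier output

counts = confusion_counts(labels, predictions)
acc = accuracy(counts)
print(counts, acc)
```

On this toy data the classifier makes one false positive and one false negative out of eight predictions, so six of eight predictions are correct.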
Among these outcomes, an FP corresponds to a “Type I error” and an FN to a “Type II error” in the statistical context [27]. Accuracy is calculated as the sum of true positives and true negatives divided by all four counts [6, 27]:

Accuracy = (TP + TN)/(TP + FP + FN + TN).

3.2 Deep learning techniques used in ecology

Applications of deep learning algorithms in large-scale biodiversity monitoring [18] are of public interest [15] but they are still challenging. The obstacles include the dependence on data augmentation, which is used to reduce model overfitting [28], and on expert knowledge [2]. Taxon identification is an important step in many plant ecological studies. Plant identification systems developed by computer vision researchers have helped ecologists to recognise and identify unknown plant species more rapidly [22]. Recent work by [36] is the first systematic literature review comparing primary studies on computer vision approaches for plant species identification. Several workers such as [23], [1] and [8] focus only on images of leaves of some species.

Many imaging datasets on plants are available for public access for research (Kaggle’s Invasive Plant Monitoring, Leaf Classification and Plant Seedlings competitions (https://www.kaggle.com/); Flavia (http://flavia.sourceforge.net/); ImageNet (http://www.image-net.org/); Google Image Search (https://images.google.com/) and iNaturalist (https://www.inaturalist.org/)), and competitions (Kaggle (https://www.kaggle.com/competitions) and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (http://www.image-net.org/challenges/LSVRC/)) have run since 2010.

Krizhevsky et al. [20] developed a deep CNN algorithm to classify large-scale images in the ImageNet LSVRC-2010 competition [20]. By using a technique called “dropout” [16], they could reduce overfitting by randomly omitting hidden units during training [20].
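To make the idea of dropout concrete, here is a minimal Python sketch. It uses the "inverted dropout" variant common in current libraries, which rescales the surviving activations during training so that nothing needs to change at test time; this is an assumption chosen for simplicity and not necessarily the exact formulation used in [16] or [20]. The function name, activation values and drop probability are illustrative.

```python
import random

# Minimal sketch of (inverted) dropout on one layer's activations.
# Variant, names and values are illustrative assumptions.

def dropout(activations, drop_prob, training, rng):
    """Zero each unit with probability drop_prob during training.

    Surviving units are scaled by 1 / (1 - drop_prob) so the expected
    activation is unchanged; at test time the input passes through as is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)                       # seeded for repeatability
hidden = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]      # hypothetical hidden-unit outputs

train_out = dropout(hidden, drop_prob=0.5, training=True, rng=rng)
test_out = dropout(hidden, drop_prob=0.5, training=False, rng=rng)
print(train_out)
print(test_out)
```

Because a different random subset of units is silenced on every training step, no unit can rely on the presence of particular other units, which is the co-adaptation that dropout is meant to discourage.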
On the test dataset, their model achieved top-1 and top-5 error rates of 37.5% and 17.0%, a better performance than previous algorithms such as [29], who used a Fisher kernel framework.

Lee et al. [23] presented a CNN approach to taxon identification based on leaf images and reported an average accuracy of 99.7% on a dataset covering 44 species. Zhang et al. [39] used CNNs to classify the Flavia dataset with data augmentation and obtained an accuracy of 94.69%. Reyes et al. [30] pre-trained a CNN using 1.8 million images and used a fine-tuning strategy to transfer learned recognition capabilities from general domains to the specific challenge of the plant identification task, with an average accuracy of 48.6%.

Mehdipour Ghazi et al. [10] used deep CNNs with fine-tuning to identify the plant species captured in a photograph and evaluated different factors affecting the performance of these networks. Three powerful and popular deep learning architectures, namely GoogLeNet, AlexNet, and VGGNet, were used for this purpose. The most important factor affecting fine-tuning performance was the number of iterations.

A successful application by Barre et al. [2] is LeafNet, a computer vision system for automatic plant species identification that learns discriminative features from leaf images along with a classifier for species identification of plants [2]. By comparing their results with customized systems like LeafSnap, they showed that learning the features with a CNN can provide better feature representation for leaf images than hand-crafted features [2]. Sugata and Yang [32] developed Leaf App, a mobile leaf recognition system that avoids explicit leaf feature extraction by using a well-known deep CNN architecture, VGG16, with data augmentation. Their algorithm obtained an accuracy of 99.9% [32], higher than the deep CNN of [23] (99.6%), the CNN of [39] (94.6%), the pulse-coupled neural network of [37] (96.67%), the shape recognition method based on a radial basis probabilistic neural network of [7] (94%) and the probabilistic neural network of [38] (90%).

Lee et al. [22] investigated the use of deep learning to harvest discriminatory features from leaf images by learning, and applied them as classifiers for plant identification. They found that classification performance is affected when constraining the varieties of leaf data seen by the CNN during training [22].

Ubbens and Stavness [35] developed an open-source deep learning software for image-based plant phenotyping called Deep Plant Phenomics [35]. Their application provides pre-trained neural networks for several common plant phenotyping tasks such as leaf counting. It would be a useful platform for plant scientists to train models for their own phenotyping applications [35].

In an effort to use deep neural networks to classify large-scale plant datasets (Google Image Search, Portuguese Flora and iNaturalist), Heredia [15] applied the CNN architecture developed by [14], a 50-layer residual network (ResNet50) [15]. The modified algorithm was able to classify nearly 50% of the observations in the iNaturalist dataset. Using the same set of image data, the ResNet architecture predicted with higher accuracy than PlantNet, the most widespread online public plant classification algorithm, by around a factor of 2 in top-1 and top-5 accuracies [15].

Zuo and Wang [40] indicated that by combining discriminative information with the deep Independent Subspace Analysis feature learning framework, they could significantly improve the performance of recognizing similar tree species at a distance. Sun et al. [34] designed a 26-layer deep learning model consisting of 8 residual building blocks for large-scale plant classification in the natural environment; their model achieved a recognition rate of 91.78% on the BJFU100 dataset, demonstrating that deep learning is a promising technology for smart forestry [34].

Most other plant classification problems, such as in [23], [30] and [5], involve full-frame plant leaf photos with the background removed, rather than in a natural environment.
One paper that investigated classifying plants in their natural environment used residual networks of various depths [34], but the plants were almost always in full frame. Literature on the need for classification without regional detection is limited.

In conclusion, there are many CNN models used for the plant classification task. The project will try to develop an algorithm based on current models with data augmentation and apply it to several good, available datasets.

References

[1] Aimen Aakif and Muhammad Faisal Khan. Automatic classification of plants based on their leaves. Biosystems Engineering, 139:66–75, 2015.
[2] Pierre Barre, Ben C. Stover, Kai F. Muller, and Volker Steinhage. LeafNet: A computer vision system for automatic plant species identification. Ecological Informatics, 40:50–56, 2017.
[3] B. Chandra and Rajesh K. Sharma. Fast learning in deep neural networks. Neurocomputing, 171:1205–1215, 2016.
[4] Jeffrey D. Corbin and Carla M. D’Antonio. Gone but not forgotten? Invasive plants’ legacies on community and ecosystem properties. Invasive Plant Science and Management, 5(1):117–124, 2012.
[5] İlke Çuğu, Eren Şener, Çağrı Erciyes, Burak Balcı, Emre Akın, Itır Önal, and Ahmet Oğuz Akyüz. Treelogy: A novel tree classifier utilizing deep and hand-crafted representations. arXiv preprint arXiv:1701.08291, 2017.
[6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
[7] Jixiang Du, Deshuang Huang, Xiaofeng Wang, and Xiao Gu. Shape recognition based on radial basis probabilistic neural network and application to plant species identification. In Jun Wang, Xiao-Feng Liao, and Zhang Yi, editors, Advances in Neural Networks – ISNN 2005, pages 281–285, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
[8] Mads Dyrmann, Henrik Karstoft, and Henrik Skov Midtiby. Plant species classification using deep convolutional neural network. Biosystems Engineering, 151:72–80, 2016.
[9] Kevin J. Gaston and Mark A. O’Neill. Automated species identification: why not? Philosophical Transactions of the Royal Society B: Biological Sciences, 359(1444):655–667, 2004.
[10] Mostafa Mehdipour Ghazi, Berrin Yanikoglu, and Erchan Aptoula. Plant identification using deep neural networks via optimization of transfer learning parameters. Neurocomputing, 235:228–235, 2017.
[11] Guillermo L. Grinblat, Lucas C. Uzal, Monica G. Larese, and Pablo M. Granitto. Deep learning for plant identification using vein morphological patterns. Computers and Electronics in Agriculture, 127:418–424, 2016.
[12] Yanming Guo, Yu Liu, Ard Oerlemans, Songyang Lao, Song Wu, and Michael S. Lew. Deep learning for visual understanding: A review. Neurocomputing, 187:27–48, 2016. Recent Developments on Deep Big Vision.
[13] Simon S. Haykin. Neural networks and learning machines, volume 3. Pearson, Upper Saddle River, NJ, USA, 2009.
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[15] Ignacio Heredia. Large-scale plant classification with deep neural networks. In Proceedings of the Computing Frontiers Conference, pages 259–262. ACM, 2017.
[16] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.
[17] Alexis Joly, Henning Muller, Herve Goeau, Herve Glotin, Concetto Spampinato, Andreas Rauber, Pierre Bonnet, Willem-Pier Vellinga, Robert B. Fisher, and Robert Planque. LifeCLEF: Multimedia life species identification. In EMR@ICMR, pages 7–13, 2014.
[18] David J. Klein, Matthew W. McKown, and Bernie R. Tershy. Deep learning for large scale biodiversity monitoring. In Bloomberg Data for Good Exchange Conference, 2015.
[19] Sotiris B. Kotsiantis, Ioannis D. Zaharakis, and Panayiotis E. Pintelas. Machine learning: a review of classification and combining techniques. Artificial Intelligence Review, 26(3):159–190, 2006.
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[21] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[22] Sue Han Lee, Chee Seng Chan, Simon Joseph Mayo, and Paolo Remagnino. How deep learning extracts and learns leaf features for plant classification. Pattern Recognition, 71:1–13, 2017.
[23] Sue Han Lee, Chee Seng Chan, Paul Wilkin, and Paolo Remagnino. Deep-plant: Plant identification with convolutional neural networks. In Image Processing (ICIP), 2015 IEEE International Conference on, pages 452–456. IEEE, 2015.
[24] Sue Han Lee, Yang Loong Chang, Chee Seng Chan, and Paolo Remagnino. Plant identification system based on a convolutional neural network for the LifeCLEF 2016 plant classification task. In CLEF (Working Notes), pages 502–510, 2016.
[25] Ana C. Lorena, Luis F.O. Jacintho, Marinez F. Siqueira, Renato De Giovanni, Lucia G. Lohmann, Andre C.P.L.F. de Carvalho, and Missae Yamamoto. Comparing machine learning classifiers in potential distribution modelling. Expert Systems with Applications, 38(5):5268–5275, 2011.
[26] Mohammad Sadegh Norouzzadeh, Anh Nguyen, Margaret Kosmala, Ali Swanson, Craig Packer, and Jeff Clune. Automatically identifying wild animals in camera trap images with deep learning. CoRR, abs/1703.05830, 2017.
[27] Josh Patterson and Adam Gibson. Deep Learning: A Practitioner’s Approach. O’Reilly Media, Inc., 2017.
[28] Luis Perez and Jason Wang. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621, 2017.
[29] Florent Perronnin, Jorge Sanchez, and Thomas Mensink. Improving the Fisher kernel for large-scale image classification. In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, Computer Vision – ECCV 2010, pages 143–156, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
[30] Angie K. Reyes, Juan C. Caicedo, and Jorge E. Camargo. Fine-tuning deep convolutional networks for plant recognition. In CEUR Workshop Proceedings, 2015.
[31] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[32] T. L. I. Sugata and C. K. Yang. Leaf App: Leaf recognition with deep convolutional neural networks. IOP Conference Series: Materials Science and Engineering, 273(1):012004, 2017.
[33] Milan Sulc, Dmytro Mishkin, and Jiri Matas. Very deep residual networks with maxout for plant identification in the wild. In CLEF, 2016.
[34] Yu Sun, Yuan Liu, Guan Wang, and Haiyan Zhang. Deep learning for plant identification in natural environment. Computational Intelligence and Neuroscience, 2017: 6 pages, 2017.
[35] Jordan R. Ubbens and Ian Stavness. Deep plant phenomics: a deep learning platform for complex plant phenotyping tasks. Frontiers in Plant Science, 8:1190, 2017.
[36] Jana Waldchen and Patrick Mader. Plant species identification using computer vision techniques: A systematic literature review. Archives of Computational Methods in Engineering, Jan 2017.
[37] Zhaobin Wang, Xiaoguang Sun, Yaonan Zhang, Zhu Ying, and Yide Ma. Leaf recognition based on PCNN. Neural Computing and Applications, 27(4):899–908, May 2016.
[38] Stephen Gang Wu, Forrest Sheng Bao, Eric You Xu, Yu-Xuan Wang, Yi-Fan Chang, and Qiao-Liang Xiang. A leaf recognition algorithm for plant classification using probabilistic neural network. In Signal Processing and Information Technology, 2007 IEEE International Symposium on, pages 11–16. IEEE, 2007.
[39] Chaoyun Zhang, Pan Zhou, Chenghua Li, and Lijun Liu. A convolutional neural network for leaves recognition using data augmentation. In Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on, pages 2143–2150. IEEE, 2015.
[40] Zhen Zuo and Gang Wang. Recognizing trees at a distance with discriminative deep feature learning. In Information, Communications and Signal Processing (ICICS) 2013 9th International Conference on, pages 1–5. IEEE, 2013.