Large width limits of neural networks - Wikipedia, the free encyclopedia

Artificial neural networks are a class of models used in machine learning, inspired by biological neural networks. They are the core component of modern deep learning algorithms. Computation in artificial neural networks is usually organized into sequential layers of artificial neurons. The number of neurons in a layer is called the layer width. Theoretical analysis of artificial neural networks sometimes considers the limiting case in which layer width becomes large or infinite. This limit allows simple analytic statements to be made about neural network predictions, training dynamics, generalization, and loss surfaces. The wide-layer limit is also of practical interest, since finite-width neural networks often perform better as layer width is increased.^[1]^[2]^[3]^[4]^[5]^[6]

Theoretical approaches based on a large width limit

The neural network Gaussian process (NNGP) corresponds to the infinite-width limit of Bayesian neural networks, and to the distribution over functions realized by non-Bayesian neural networks after random initialization.^[7]^[8]^[9]^[10]
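As a rough illustration of the NNGP correspondence, the sketch below assumes a fully connected ReLU network whose weights are drawn from N(0, `sigma_w2` / fan_in) and biases from N(0, `sigma_b2`); these parameter values and the function names are hypothetical choices for illustration, not taken from any library. `relu_nngp_kernel` computes the limiting covariance recursively using the closed-form Gaussian expectation of a product of ReLUs (the arc-cosine kernel), and `empirical_kernel` estimates the same one-layer covariance from a single wide, randomly initialized hidden layer.

```python
import math
import random


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def relu_nngp_kernel(x1, x2, depth, sigma_w2=2.0, sigma_b2=0.1):
    """Return (K(x1,x1), K(x1,x2), K(x2,x2)) for pre-activations after `depth` ReLU layers.

    Assumes weights ~ N(0, sigma_w2 / fan_in) and biases ~ N(0, sigma_b2).
    """
    d = len(x1)
    # Covariance of the first-layer pre-activations.
    k11 = sigma_b2 + sigma_w2 * _dot(x1, x1) / d
    k22 = sigma_b2 + sigma_w2 * _dot(x2, x2) / d
    k12 = sigma_b2 + sigma_w2 * _dot(x1, x2) / d
    for _ in range(depth):
        norm = math.sqrt(k11 * k22)
        cos_t = max(-1.0, min(1.0, k12 / norm))  # clip round-off before acos
        theta = math.acos(cos_t)
        # E[relu(u) relu(v)] for zero-mean Gaussian (u, v) with this covariance.
        e12 = norm * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)
        k12 = sigma_b2 + sigma_w2 * e12
        k11 = sigma_b2 + sigma_w2 * k11 / 2.0  # E[relu(u)^2] = Var(u) / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2.0
    return k11, k12, k22


def empirical_kernel(x1, x2, width, sigma_w2=2.0, sigma_b2=0.1, seed=0):
    """Monte Carlo estimate of the depth-1 kernel from one wide random ReLU hidden layer."""
    rng = random.Random(seed)
    d = len(x1)
    std_w, std_b = math.sqrt(sigma_w2 / d), math.sqrt(sigma_b2)
    s11 = s12 = s22 = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, std_w) for _ in range(d)]
        b = rng.gauss(0.0, std_b)
        h1 = max(0.0, _dot(w, x1) + b)
        h2 = max(0.0, _dot(w, x2) + b)
        s11 += h1 * h1
        s12 += h1 * h2
        s22 += h2 * h2
    # Next-layer pre-activation covariance, averaged over the hidden units.
    return (sigma_b2 + sigma_w2 * s11 / width,
            sigma_b2 + sigma_w2 * s12 / width,
            sigma_b2 + sigma_w2 * s22 / width)
```

For example, with `x1 = [1.0, 0.5, -0.3]` and `x2 = [0.2, -1.0, 0.8]`, `relu_nngp_kernel(x1, x2, depth=1)` and `empirical_kernel(x1, x2, width=50000)` should agree closely; the Monte Carlo error is expected to shrink roughly like one over the square root of the width.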
The same underlying computations that are used to derive the NNGP kernel are also used in deep information propagation to characterize how information about gradients and inputs propagates through a deep network.^[11] This characterization is used to predict how model trainability depends on architecture and initialization hyperparameters.
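The mean-field recursion used in deep information propagation can be sketched numerically. Assuming a tanh network with weights drawn from N(0, `sigma_w2` / fan_in) and biases from N(0, `sigma_b2`) (illustrative choices), the sketch iterates the length map q ↦ σ_w² E[tanh(√q z)²] + σ_b², with z ~ N(0, 1), to its fixed point q*, then evaluates χ₁ = σ_w² E[tanh′(√q* z)²]. In this analysis χ₁ < 1 marks the ordered phase (input correlations converge, gradients vanish), χ₁ > 1 the chaotic phase (gradients explode), and χ₁ ≈ 1 the "edge of chaos" where deep networks tend to be trainable. The Gaussian expectations use a plain midpoint rule; the function names are hypothetical.

```python
import math


def gaussian_expectation(f, n=2001, limit=10.0):
    """E[f(z)] for z ~ N(0, 1), approximated by a midpoint rule on [-limit, limit]."""
    h = 2.0 * limit / n
    total = 0.0
    for i in range(n):
        z = -limit + (i + 0.5) * h
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def length_map(q, sigma_w2, sigma_b2):
    """q^{l+1} = sigma_w2 * E[tanh(sqrt(q^l) z)^2] + sigma_b2."""
    s = math.sqrt(q)
    return sigma_w2 * gaussian_expectation(lambda z: math.tanh(s * z) ** 2) + sigma_b2


def fixed_point(sigma_w2, sigma_b2, q0=1.0, iters=100):
    """Iterate the length map from q0 to approximate the fixed point q*."""
    q = q0
    for _ in range(iters):
        q = length_map(q, sigma_w2, sigma_b2)
    return q


def chi_1(sigma_w2, sigma_b2):
    """chi_1 = sigma_w2 * E[tanh'(sqrt(q*) z)^2]; < 1 ordered, > 1 chaotic."""
    s = math.sqrt(fixed_point(sigma_w2, sigma_b2))
    # tanh'(x) = 1 - tanh(x)^2
    return sigma_w2 * gaussian_expectation(lambda z: (1.0 - math.tanh(s * z) ** 2) ** 2)
```

With zero bias variance, `chi_1(0.5, 0.0)` falls in the ordered phase (q* collapses to 0, so χ₁ is close to 0.5), while `chi_1(4.0, 0.0)` falls in the chaotic phase.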

The neural tangent kernel (NTK) describes the evolution of neural network predictions during gradient descent training. In the infinite-width limit the NTK usually becomes constant, which often allows closed-form expressions for the function computed by a wide neural network throughout gradient descent training.^[12] The training dynamics essentially become linearized.^[13]
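The statement that the NTK becomes nearly constant as width grows can be probed on a toy problem. The sketch assumes a hypothetical one-input, two-layer ReLU network in the NTK parametrization, f(x) = n^(-1/2) Σᵢ aᵢ relu(wᵢ x + bᵢ) with every parameter drawn from N(0, 1), trained by full-batch gradient descent on the squared loss ½ Σ (f(x) − y)². It computes the empirical NTK Gram matrix Θ(x, x′) = ∇θ f(x) · ∇θ f(x′) before and after training and reports its relative change in Frobenius norm. The data points, step size, and function names are illustrative assumptions.

```python
import math
import random


def forward_and_grads(params, x):
    """f(x) and its gradients w.r.t. (a, w, b) for f = n**-0.5 * sum(a_i * relu(w_i x + b_i))."""
    a, w, b = params
    scale = 1.0 / math.sqrt(len(a))
    out = 0.0
    ga, gw, gb = [], [], []
    for ai, wi, bi in zip(a, w, b):
        u = wi * x + bi
        active = 1.0 if u > 0 else 0.0  # relu'(u)
        h = u * active  # relu(u)
        out += ai * h
        ga.append(h * scale)
        gw.append(ai * active * x * scale)
        gb.append(ai * active * scale)
    return out * scale, (ga, gw, gb)


def ntk_gram(params, xs):
    """Empirical NTK Gram matrix: inner products of parameter gradients."""
    grads = [forward_and_grads(params, x)[1] for x in xs]

    def inner(g1, g2):
        return sum(p * q for v1, v2 in zip(g1, g2) for p, q in zip(v1, v2))

    return [[inner(gi, gj) for gj in grads] for gi in grads]


def relative_ntk_change(width, xs, ys, lr=0.1, steps=50, seed=0):
    """||Theta_T - Theta_0||_F / ||Theta_0||_F after full-batch gradient descent."""
    rng = random.Random(seed)
    params = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(3)]  # a, w, b
    theta0 = ntk_gram(params, xs)
    for _ in range(steps):
        total = [[0.0] * width for _ in range(3)]
        for x, y in zip(xs, ys):
            f, grads = forward_and_grads(params, x)
            for acc, g in zip(total, grads):
                for i, gi in enumerate(g):
                    acc[i] += (f - y) * gi  # chain rule through 0.5 * (f - y)**2
        for p, acc in zip(params, total):
            for i, gi in enumerate(acc):
                p[i] -= lr * gi
    theta_t = ntk_gram(params, xs)
    diff = math.sqrt(sum((t - s) ** 2 for rt, rs in zip(theta_t, theta0) for t, s in zip(rt, rs)))
    base = math.sqrt(sum(s ** 2 for rs in theta0 for s in rs))
    return diff / base
```

Comparing, say, width 20 against width 2000 on a few points such as `xs = [-1.0, -0.3, 0.4, 1.0]` should show a noticeably smaller relative change for the wider network; the kernel is exactly constant only in the limit.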

In the mean-field limit, infinite-width neural networks are studied with a different initial weight scaling and suitably scaled learning rates, which leads to qualitatively different, nonlinear training dynamics than those described by a fixed neural tangent kernel.^[14]^[15]
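To contrast the two regimes, the sketch below trains the same hypothetical two-layer ReLU network under two scalings: the NTK scaling (output multiplied by n^(-1/2), fixed step size) and a mean-field scaling (output multiplied by 1/n, step size multiplied by n), the latter being the kind of two-layer setup studied in the cited mean-field work. It reports how far the first-layer weights move on average. Under the NTK scaling individual weights are expected to move less and less as width grows, which is what keeps the kernel fixed; under the mean-field scaling their movement is not expected to shrink with width, so the learned features evolve and the dynamics are nonlinear. The data, step size, and function name are illustrative assumptions.

```python
import random


def mean_first_layer_movement(width, parametrization, xs, ys, lr=0.1, steps=50, seed=0):
    """Average |w_i(T) - w_i(0)| over hidden units after full-batch gradient descent.

    "ntk": f = n**-0.5 * sum_i a_i * relu(w_i * x + b_i), step size lr.
    "mf":  f = n**-1   * sum_i a_i * relu(w_i * x + b_i), step size lr * n.
    """
    if parametrization == "ntk":
        scale, step = width ** -0.5, lr
    elif parametrization == "mf":
        scale, step = 1.0 / width, lr * width
    else:
        raise ValueError(f"unknown parametrization: {parametrization}")
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    b = [rng.gauss(0.0, 1.0) for _ in range(width)]
    w_init = list(w)
    for _ in range(steps):
        ga, gw, gb = [0.0] * width, [0.0] * width, [0.0] * width
        for x, y in zip(xs, ys):
            pre = [wi * x + bi for wi, bi in zip(w, b)]
            f = scale * sum(ai * u for ai, u in zip(a, pre) if u > 0)
            r = f - y  # derivative of 0.5 * (f - y)**2 with respect to f
            for i, u in enumerate(pre):
                if u > 0:  # only active units receive gradient through relu
                    ga[i] += r * scale * u
                    gw[i] += r * scale * a[i] * x
                    gb[i] += r * scale * a[i]
        for i in range(width):
            a[i] -= step * ga[i]
            w[i] -= step * gw[i]
            b[i] -= step * gb[i]
    return sum(abs(wi - w0) for wi, w0 in zip(w, w_init)) / width
```

At a width of about 1000, the mean-field run should move the first-layer weights far more than the NTK run on the same data and initialization.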

Catapult dynamics describe the training dynamics of neural networks in the case where the logits diverge to infinity as the layer width is taken to infinity, and characterize qualitative properties of the early training dynamics.^[16]
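A minimal illustration of the catapult mechanism uses a simplified two-layer linear model of the kind analysed in the cited paper: f = u·v/√m with u, v ∈ ℝ^m drawn from N(0, 1), a single training example with input 1 and target 0, and loss f²/2. For this model the NTK at the training point is λ = (|u|² + |v|²)/m, which is about 2 at initialization, and one gradient descent step with step size η updates f and λ exactly as written in the docstring. Three regimes are expected: lazy for ηλ₀ < 2, catapult for 2 < ηλ₀ < 4 (the loss first grows, λ drops below 2/η, then training converges), and divergent for ηλ₀ > 4. The width, step sizes, divergence threshold, and function name are illustrative assumptions.

```python
import math
import random


def catapult_run(eta, width=1000, steps=500, seed=0):
    """Full-batch GD on f = u.v / sqrt(m) with one example (x = 1, y = 0) and loss f**2 / 2.

    One GD step gives exactly
        f_next   = f * (1 - eta * lam + eta**2 * f**2 / m)
        lam_next = lam + eta * f**2 * (eta * lam - 4) / m
    where lam = (|u|^2 + |v|^2) / m is the NTK at the training point.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(width)]
    v = [rng.gauss(0.0, 1.0) for _ in range(width)]
    root_m = math.sqrt(width)

    def output():
        return sum(ui * vi for ui, vi in zip(u, v)) / root_m

    def ntk():
        return (sum(ui * ui for ui in u) + sum(vi * vi for vi in v)) / width

    f = output()
    result = {"lam0": ntk(), "loss0": 0.5 * f * f, "diverged": False}
    max_loss = result["loss0"]
    for _ in range(steps):
        c = eta * f / root_m  # dL/du = f * v / sqrt(m), dL/dv = f * u / sqrt(m)
        # Simultaneous update: both right-hand sides use the old u and v.
        u, v = [ui - c * vi for ui, vi in zip(u, v)], [vi - c * ui for ui, vi in zip(u, v)]
        f = output()
        loss = 0.5 * f * f
        max_loss = max(max_loss, loss)
        if loss > 1e12:  # treat as divergence and stop before floats overflow
            result["diverged"] = True
            break
    result.update(max_loss=max_loss, final_loss=0.5 * f * f, final_lam=ntk())
    return result
```

Running it with `eta = 0.5`, `1.5`, and `2.5` (so ηλ₀ is roughly 1, 3, and 5) should show the three regimes; only the middle case shows a loss spike followed by convergence at a point of smaller curvature λ.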

References

[1] Novak, Roman; Bahri, Yasaman; Abolafia, Daniel A.; Pennington, Jeffrey; Sohl-Dickstein, Jascha (2018-02-15). "Sensitivity and Generalization in Neural Networks: an Empirical Study". International Conference on Learning Representations. arXiv:1802.08760. Bibcode:2018arXiv180208760N.
[2] Canziani, Alfredo; Paszke, Adam; Culurciello, Eugenio (2016-11-04). "An Analysis of Deep Neural Network Models for Practical Applications". arXiv:1605.07678. Bibcode:2016arXiv160507678C.
[3] Novak, Roman; Xiao, Lechao; Lee, Jaehoon; Bahri, Yasaman; Yang, Greg; Abolafia, Dan; Pennington, Jeffrey; Sohl-Dickstein, Jascha (2018). "Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes". International Conference on Learning Representations. arXiv:1810.05148. Bibcode:2018arXiv181005148N.
[4] Neyshabur, Behnam; Li, Zhiyuan; Bhojanapalli, Srinadh; LeCun, Yann; Srebro, Nathan (2019). "Towards understanding the role of over-parametrization in generalization of neural networks". International Conference on Learning Representations. arXiv:1805.12076. Bibcode:2018arXiv180512076N.
[5] Lawrence, Steve; Giles, C. Lee; Tsoi, Ah Chung (1996). "What size neural network gives optimal generalization? Convergence properties of backpropagation". CiteSeerX 10.1.1.125.6019.
[6] Bartlett, P.L. (1998). "The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network". IEEE Transactions on Information Theory. 44 (2): 525–536. doi:10.1109/18.661502. ISSN 1557-9654.
[7] Neal, Radford M. (1996). "Priors for Infinite Networks". Bayesian Learning for Neural Networks. Lecture Notes in Statistics, 118. Springer New York. pp. 29–53. doi:10.1007/978-1-4612-0745-0_2. ISBN 978-0-387-94724-2.
[8] Lee, Jaehoon; Bahri, Yasaman; Novak, Roman; Schoenholz, Samuel S.; Pennington, Jeffrey; Sohl-Dickstein, Jascha (2017). "Deep Neural Networks as Gaussian Processes". International Conference on Learning Representations. arXiv:1711.00165. Bibcode:2017arXiv171100165L.
[9] G. de G. Matthews, Alexander; Rowland, Mark; Hron, Jiri; Turner, Richard E.; Ghahramani, Zoubin (2017). "Gaussian Process Behaviour in Wide Deep Neural Networks". International Conference on Learning Representations. arXiv:1804.11271. Bibcode:2018arXiv180411271M.
[10] Hron, Jiri; Bahri, Yasaman; Novak, Roman; Pennington, Jeffrey; Sohl-Dickstein, Jascha (2020). "Exact posterior distributions of wide Bayesian neural networks". ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning. arXiv:2006.10541.
[11] Schoenholz, Samuel S.; Gilmer, Justin; Ganguli, Surya; Sohl-Dickstein, Jascha (2016). "Deep information propagation". International Conference on Learning Representations. arXiv:1611.01232.
[12] Jacot, Arthur; Gabriel, Franck; Hongler, Clement (2018). "Neural tangent kernel: Convergence and generalization in neural networks". Advances in Neural Information Processing Systems. arXiv:1806.07572.
[13] Lee, Jaehoon; Xiao, Lechao; Schoenholz, Samuel S.; Bahri, Yasaman; Novak, Roman; Sohl-Dickstein, Jascha; Pennington, Jeffrey (2020). "Wide neural networks of any depth evolve as linear models under gradient descent". Journal of Statistical Mechanics: Theory and Experiment. 2020 (12): 124002. arXiv:1902.06720. doi:10.1088/1742-5468/abc62b. S2CID 62841516.
[14] Mei, Song; Montanari, Andrea; Nguyen, Phan-Minh (2018-04-18). "A Mean Field View of the Landscape of Two-Layers Neural Networks". OCLC 1106295873.
[15] Nguyen, Phan-Minh; Pham, Huy Tuan (2020). "A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks". arXiv:2001.11443 [cs.LG].
[16] Lewkowycz, Aitor; Bahri, Yasaman; Dyer, Ethan; Sohl-Dickstein, Jascha; Gur-Ari, Guy (2020). "The large learning rate phase of deep learning: the catapult mechanism". arXiv:2003.02218 [stat.ML].
