Open Access Paper
Automatic conversion system of digital music score image based on image recognition algorithm
28 December 2022
Yi Yao
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 1250637 (2022) https://doi.org/10.1117/12.2662200
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
The development of object detection and recognition algorithms in image processing has driven the development of automatic conversion systems for digital music score images and their related algorithms. This is crucial for note recognition and encoding and for converting musical scores into readable digital formats. This paper studies an automatic conversion system for digital music score images based on image recognition algorithms. Deep learning techniques from computer vision are used to recognize music score images, and an automatic conversion system for digital music score images is designed and built. The harmonic peak method and the confidence method are introduced, and their shortcomings are analyzed. Combining the advantages of these two algorithms, an improved normal harmonic method is proposed, and its ideas and steps are described in detail. The extracted fundamental frequency and timing information are then converted into pitch and note values using basic music theory. The recognition accuracy of the proposed normal harmonic method is 92% or higher, which greatly improves recognition accuracy while retaining the strong noise robustness of the traditional harmonic peak method.

1. INTRODUCTION

Nowadays, with the development of computers, people have gradually entered the information age, music has begun to merge with computing, and the digitization of music is accelerating. In the current state of music development, one of the main challenges is the digitization of music information [1, 2]. Making music involves two stages, composition and performance, which are often separated, and a computer can only process music that has been encoded. It is therefore necessary to break this barrier and let the computer recognize images of musical scores [3, 4].

How to combine the art of music organically with computer science, that is, how to convert images of paper scores automatically so that computers can recognize and perform them, has become an urgent problem [5]. Meng, F. introduced a new notation called bootleg notation, which records the positions of noteheads relative to the staff. MIDI representations can be converted into scores in this notation using the rules of Western music, and sheet music images can be converted into it using classical computer vision techniques for recognizing geometric shapes. Once the MIDI data and cell-phone images have been converted into sheet music, software can be used to simulate the chords [6]. Brahmia, Z. provides another approach that integrates the outputs of two systems, optical music recognition (OMR) and automatic music transcription (AMT). Several experimental conditions were evaluated on monophonic pieces to assess the performance of the individual transcription systems; on average, the different integration methods were 40% more accurate than simple detection methods [7]. Because score images are complex, containing not only notes but also graphics that easily interfere with recognition, there has been considerable research on processing and recognition algorithms for score images [8, 9].

This paper is interdisciplinary and builds on results from several fields: its background involves the development and application of common algorithms; its technology involves the construction and basic usage of deep neural networks; and its data analysis involves the role of traditional music theory in the statistical classification of music data, as well as the degree to which computer language functions can automate the processing of music data. This paper is therefore an exploratory combination of research results from these fields. It is both an exploration of theory in the field of algorithmic composition and a summary of how a deep-learning music composition system is constructed and operated in practice.

2. RESEARCH ON AUTOMATIC CONVERSION SYSTEM OF DIGITAL MUSIC SCORE IMAGE BASED ON IMAGE RECOGNITION ALGORITHM

2.1 Basic knowledge of musical scores

Sound is produced by the vibration of objects. All sounds that can be perceived by human hearing form a set S, and each element (i.e., sound) of S can become a musical sound in a musical work [10]. Pitch, intensity, duration, and timbre are the four characteristics of sound. Pitch is determined by the frequency of the vibration, duration by how long the vibration lasts, intensity by the amplitude of the vibration, and timbre by the shape of the vibration waveform. Music is organized into regular beats, and the meter is indicated by a time signature written like a fraction: the numerator gives the number of beats in each measure, and the denominator gives the note value that receives one beat. On the staff, the fraction bar is omitted and the third (middle) line of the staff separates the two numbers.
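Because pitch follows from vibration frequency and note values follow from duration and meter, the final conversion step (fundamental frequency and timing to pitch and note value) can be sketched as below. This is a minimal illustration, not the paper's implementation; it assumes twelve-tone equal temperament with A4 = 440 Hz (MIDI note 69), a known tempo in beats per minute, and that the time-signature denominator names the beat unit. The function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz: float, a4_hz: float = 440.0) -> str:
    """Map a fundamental frequency to the nearest equal-tempered note name."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note 69 is A4; each semitone is a frequency factor of 2 ** (1 / 12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1  # MIDI 60 -> C4
    return f"{NOTE_NAMES[midi % 12]}{octave}"


def duration_to_beats(duration_s: float, tempo_bpm: float) -> float:
    """Convert a duration in seconds to a number of beats at the given tempo."""
    return duration_s * tempo_bpm / 60.0


def beats_to_note_value(beats: float, beat_unit: int = 4) -> float:
    """Express a length in beats as a fraction of a whole note.

    beat_unit is the time-signature denominator (4 means a quarter note gets one beat).
    """
    return beats / beat_unit
```

For example, `frequency_to_note(440.0)` gives `"A4"`, and at 120 BPM in 4/4 a 0.5 s note lasts one beat, i.e. a quarter note (0.25 of a whole note).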

2.2 Image preprocessing

2.2.1 Binarization.

Score images are usually grayscale images and contain no color channels. In a typical standard score image, the foreground pixels (staff lines and notation symbols) are pure black and the background pixels are pure white. The foreground and background therefore need to be separated, and binarizing the score image separates them as cleanly as possible.
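The paper does not state which binarization rule it uses. As one common choice, Otsu's method picks the global threshold that maximizes the between-class variance of foreground and background; OpenCV exposes it through `cv2.threshold` with `cv2.THRESH_BINARY + cv2.THRESH_OTSU`. The NumPy sketch below shows the computation itself, assuming an 8-bit grayscale input with dark ink on a light page.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold (0-255) of a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0               # thresholds that leave one class empty
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map dark foreground (staff lines, notes) to 0 and background to 255."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

A global threshold suits clean scans; unevenly lit photographs of scores may need an adaptive (local) threshold instead.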

2.2.2 Noise Reduction.

An adaptive filter noticeably increases the complexity of the algorithm, although it gives very good results and a clearer image. Because the original score image is usually corrupted by salt-and-pepper noise, this paper adopts a median filter for noise reduction.
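A median filter suits salt-and-pepper noise because an isolated black or white pixel is an outlier in its neighborhood and is replaced by the neighborhood median. The sketch below is a plain NumPy 3×3 version for illustration, assuming a 2-D grayscale array; OpenCV's `cv2.medianBlur(img, 3)` performs the same operation.

```python
import numpy as np


def median_filter3(img: np.ndarray) -> np.ndarray:
    """Apply a 3x3 median filter, replicating border pixels at the edges."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the 9 shifted views of the image and take the per-pixel median.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    return np.median(windows, axis=0).astype(img.dtype)
```

One design caveat: a 3×3 median also erases features only one pixel thick, so the kernel size should be chosen relative to the staff-line thickness of the scanned score.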

2.3 Image recognition algorithm

2.3.1 Harmonic Peak Method.

The harmonic peak method is a classical algorithm based on the fast Fourier transform (FFT), which gives the relationship between the frequency and amplitude of a signal and is therefore widely used to compute a signal's frequency spectrum. The method assumes that the peak with the largest amplitude in the spectrum corresponds to the fundamental frequency of the audio signal, so the frequency of that peak is taken as the fundamental frequency. Its biggest advantage is simplicity: it requires very little time and memory. In practice, however, for many musical instruments, and especially in the bass register, the fundamental is not necessarily the strongest component of the spectrum, so the method can return a harmonic instead of the fundamental.
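The harmonic peak method reduces to "take the FFT and report the frequency of the largest bin". The sketch below illustrates it with NumPy, assuming a mono signal array and a Hann window to limit spectral leakage (the paper does not specify a window).

```python
import numpy as np


def harmonic_peak_f0(signal: np.ndarray, sample_rate: float) -> float:
    """Estimate f0 as the frequency of the largest magnitude in the FFT spectrum."""
    window = np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[0] = 0.0  # ignore the DC component
    return float(freqs[np.argmax(spectrum)])
```

With a 1 s signal at 8 kHz (1 Hz bin spacing), a tone at 220 Hz plus a weaker 440 Hz harmonic is estimated correctly as 220 Hz. If instead a 110 Hz fundamental is weaker than its 220 Hz second harmonic, the function also returns 220 Hz, which is the octave error described above.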

2.3.2 Confidence Method.

When the peak amplitude of a harmonic is higher than that of the fundamental, the confidence-based method assumes that the component with the largest peak amplitude is either the fundamental or its nth harmonic, where n is usually no more than 5. It therefore takes the frequency of the largest peak divided by n, for n = 1 to 5, as candidate fundamental frequencies and selects among these candidates using a confidence score. The confidence-based algorithm partly solves the problem of a harmonic being the largest-amplitude component. However, for low-frequency sounds the fundamental component in the spectrum is often very weak or even absent, while the higher harmonics are more numerous and have larger amplitudes.
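The exact confidence formula of the confidence method is not recoverable from the text, so the sketch below is an assumption-laden illustration of the idea: it divides the strongest peak frequency by n = 1 to 5 and scores each candidate by the average spectral magnitude at its first five multiples (a hypothetical scoring rule), allowing ±1 FFT bin of tolerance.

```python
import numpy as np


def magnitude_spectrum(signal: np.ndarray, sample_rate: float):
    """Hann-windowed magnitude spectrum (DC removed) and its bin width in Hz."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    spectrum[0] = 0.0
    return spectrum, sample_rate / len(signal)


def confidence_f0(signal, sample_rate, max_divisor=5, n_harmonics=5):
    """Pick f_max / n (n = 1..max_divisor) whose harmonics carry the most energy."""
    spectrum, df = magnitude_spectrum(signal, sample_rate)
    f_max = np.argmax(spectrum) * df          # frequency of the strongest peak
    best_f, best_score = f_max, -np.inf
    for n in range(1, max_divisor + 1):
        candidate = f_max / n                 # strongest peak assumed to be harmonic n
        score = 0.0
        for m in range(1, n_harmonics + 1):
            b = int(round(m * candidate / df))
            if b >= len(spectrum):
                break
            # Largest magnitude within +/-1 bin tolerates FFT bin error.
            score += spectrum[max(b - 1, 0):b + 2].max()
        score /= n_harmonics                  # average magnitude at the first harmonics
        if score > best_score:                # ties keep the smaller divisor
            best_f, best_score = candidate, score
    return float(best_f)
```

Unlike the pure harmonic peak rule, this recovers a 110 Hz fundamental whose 220 Hz second harmonic is the strongest peak. It still fails when the strongest peak is above the fifth harmonic, which matches the low-frequency weakness described above.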

3. INVESTIGATION AND RESEARCH ON AUTOMATIC CONVERSION SYSTEM OF DIGITAL MUSIC SCORE IMAGE BASED ON IMAGE RECOGNITION ALGORITHM

3.1 System development environment

OpenCV is a widely used computer vision library that contains a large number of well-established algorithms, including object detection, object tracking, contour detection, OCR, distortion correction, image denoising, image enhancement, and image binarization. The library is easy to use: most of these functions can be invoked with a single call to a simple interface.

The cv2.imread(img, 1) function reads an image: img is the path of the image file to be read, the flag 1 (cv2.IMREAD_COLOR) loads it as a color image, and 0 (cv2.IMREAD_GRAYSCALE) loads it as a grayscale image. The cv2.fastNlMeansDenoising(img, None, 10, 7, 21) function denoises the image with the non-local means algorithm; 10 is the filter strength h, 7 is the template window size, and 21 is the search window size. The cv2.imwrite(save_name, img) function saves an image, where save_name is the output file name and img is the image to be saved. The cv2.getTextSize(text, font, fontScale=0.7, thickness=1) function computes the size of a text string, which is used to size the rectangular box on which the text is displayed; text is the string, font is the font face, fontScale is the font scale factor, and thickness is the stroke thickness of the text.

3.2 Experimental setup

The recognition experiments were run under Windows with Visual Studio 2010 and OpenCV 2.4.8. Recordings of all 88 piano keys, each 2 seconds long and recorded in an essentially noise-free environment, were selected as test samples (40 dB to 70 dB).
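For readers reproducing a comparable test set, the sketch below gives the 88 key frequencies and a way to add Gaussian noise at a chosen signal-to-noise ratio, as used in the accuracy comparison. It assumes equal temperament with key 49 = A4 = 440 Hz, and an SNR defined as the power ratio over the whole clip; the paper does not state its exact noise procedure.

```python
import numpy as np


def piano_key_frequencies() -> np.ndarray:
    """Equal-tempered frequencies of the 88 piano keys (key 49 = A4 = 440 Hz)."""
    keys = np.arange(1, 89)
    return 440.0 * 2.0 ** ((keys - 49) / 12.0)


def add_noise_at_snr(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

The key frequencies run from 27.5 Hz (A0) to about 4186.01 Hz (C8), so the lowest keys have fundamentals only a few FFT bins wide, which is where harmonic-based estimators are most stressed.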

3.3 Improved normal harmonic method

In this paper, an improved normal harmonic method is used to extract the fundamental frequency. First, the discrete first and second derivatives of the spectrum are used to find its n highest local maxima x1, x2, …, xn-1, xn, which serve as candidate frequencies. A confidence function h(xi) is then constructed to reflect the likelihood that xi is the fundamental frequency:

$$h(x_i) = \sum_{k=1}^{n} g(x_k)\, t(x_k, x_i) \qquad (1)$$

Here, g(xk) is the energy value corresponding to candidate frequency xk, and t(xk, xi) measures how close xk is to an integer multiple of xi. For any candidate fundamental frequency xi, t(xk, xi) is defined as follows:

[Equation (2): definition of t(xk, xi); image not reproduced]

By equation (1), the confidence h(xi) of candidate fundamental frequency xi is the sum, over all candidate frequencies xk, of the products of the energy value g(xk) and the proximity t(xk, xi) of xk to an integer multiple of xi. For a given k, the energy g(xk) is a constant, so the closer xk is to an integer multiple of xi, the larger t(xk, xi) is and the more likely xk is a harmonic of the candidate fundamental frequency xi. Therefore, h(xi) effectively reflects the likelihood that xi is the fundamental frequency.

Because the ratio between a harmonic frequency and the fundamental frequency contains some error after the FFT, the rate at which the curve falls off around each positive integer can be adjusted through the scale parameter σ in equation (2). By the properties of the normal distribution, increasing σ makes the curve flatter and increases its error tolerance, while decreasing σ makes the curve steeper and reduces its error tolerance. If xk is a harmonic of xi, then xk/xi is very likely close to a positive integer and t(xk, xi) is high; if xk is a noise peak, xk/xi is usually far from any positive integer and t(xk, xi) is low. In this way, most of the influence of noise on the fundamental frequency confidence is removed.
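The steps above can be sketched end to end as follows. This is a hedged illustration, not the author's code: equation (2) is not recoverable, so the sketch assumes the Gaussian form t(xk, xi) = exp(-(xk/xi - m)² / (2σ²)), where m is the positive integer nearest to xk/xi. It also assumes that g is the squared FFT magnitude, and it adds a hypothetical `min_rel_energy` filter that drops negligible leakage peaks before ranking. Local maxima are found from the sign change of the first difference together with a negative second difference.

```python
import numpy as np


def derivative_peaks(values: np.ndarray) -> np.ndarray:
    """Indices of local maxima found with discrete first and second differences."""
    d1 = np.diff(values)          # d1[j] = v[j+1] - v[j]
    d2 = np.diff(values, n=2)     # d2[j] = v[j+2] - 2 v[j+1] + v[j], centred at j+1
    rising_then_falling = (d1[:-1] > 0) & (d1[1:] <= 0)
    return np.flatnonzero(rising_then_falling & (d2 < 0)) + 1


def normal_harmonic_f0(signal, sample_rate, n_candidates=5, sigma=0.1,
                       min_rel_energy=1e-3):
    """Return the candidate x_i maximizing h(x_i) = sum_k g(x_k) t(x_k, x_i)."""
    mag = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    mag[0] = 0.0                                   # ignore DC
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    energy = mag ** 2                              # assumed energy g(x)
    peaks = derivative_peaks(energy)
    if peaks.size == 0:
        raise ValueError("no spectral peaks found")
    # Assumption: discard negligible peaks (numerical leakage) before ranking.
    peaks = peaks[energy[peaks] >= min_rel_energy * energy[peaks].max()]
    top = peaks[np.argsort(energy[peaks])[::-1][:n_candidates]]
    x, g = freqs[top], energy[top]                 # candidates x_k and energies g(x_k)
    ratio = x[:, None] / x[None, :]                # ratio[k, i] = x_k / x_i
    nearest = np.maximum(np.rint(ratio), 1.0)      # nearest positive integer
    # Assumed form of equation (2): Gaussian in the distance to that integer.
    t = np.exp(-((ratio - nearest) ** 2) / (2.0 * sigma ** 2))
    h = (g[:, None] * t).sum(axis=0)               # equation (1), one value per x_i
    return float(x[np.argmax(h)])
```

With a weak 110 Hz fundamental and stronger harmonics at 220, 330, and 440 Hz, every candidate's ratio to 110 Hz is an exact integer, so h(110) collects all of the energy and wins. The 220 Hz candidate loses the 110 Hz and 330 Hz peaks, whose ratios of 0.5 and 1.5 are far from any integer. Raising σ tolerates larger FFT ratio errors, at the cost of letting near-miss noise peaks contribute more.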

4. ANALYSIS AND RESEARCH ON AUTOMATIC CONVERSION SYSTEM OF DIGITAL MUSIC SCORE IMAGE BASED ON IMAGE RECOGNITION ALGORITHM

4.1 Automatic conversion system of digital music score images

The schematic diagram of the image processing algorithm is shown in Figure 1. The whole algorithm includes four main parts: image preprocessing, note detection, detection and classification, and reconstruction. Image preprocessing removes information that interferes with recognition so that the subsequent algorithms can work reliably; it includes operations such as image enhancement and image fusion. Note detection aims to locate each note in the score accurately: because all notes are drawn on the staff lines, identifying the correct position of each note relative to the staff is a fundamental task.
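The paper does not describe how staff positions are found. A common technique in optical music recognition is the horizontal projection profile: in a binarized page, rows belonging to staff lines are almost entirely black. The sketch below uses that swapped-in technique to locate staff lines and then express a note's vertical position as a staff step. It assumes a clean, unskewed binary image (ink = 0), and the names and the `min_fill` value are hypothetical.

```python
import numpy as np


def detect_staff_lines(binary: np.ndarray, min_fill: float = 0.5) -> list:
    """Return centre rows of horizontal lines whose black-pixel ratio >= min_fill."""
    profile = (binary == 0).mean(axis=1)       # horizontal projection profile
    rows = np.flatnonzero(profile >= min_fill)
    lines, start, prev = [], None, None
    for r in rows:                             # merge runs of adjacent rows
        if start is None:
            start = prev = r
        elif r == prev + 1:
            prev = r
        else:
            lines.append(float(start + prev) / 2)
            start = prev = r
    if start is not None:
        lines.append(float(start + prev) / 2)
    return lines


def staff_step(y: float, lines: list) -> int:
    """Lines/spaces above the bottom staff line: 0 = bottom line, 1 = first space."""
    spacing = (lines[-1] - lines[0]) / (len(lines) - 1)
    return int(round((lines[-1] - y) / (spacing / 2)))
```

In treble clef, step 0 corresponds to E4 and step 1 to F4. Because the system described here does not recognize clefs, any such mapping would have to be supplied externally.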

Figure 1. System workflow framework diagram.

4.2 Music recognition

The fundamental frequency recognition accuracy of the algorithms is shown in Figure 2, where the abscissa is the signal-to-noise ratio (SNR) of the added Gaussian noise and the ordinate is the fundamental frequency identification accuracy of each algorithm.

Figure 2. Comparison of accuracy rates of fundamental frequency extraction algorithms for piano 88-key recordings.

In all cases, the normal harmonic method has the highest recognition accuracy, 92% or higher, and its recognition rate does not decrease significantly as the noise increases, which shows that the method is insensitive to noise (Table 1). At high SNR, the confidence method reaches an accuracy of up to 80%, higher than the 75% maximum of the harmonic peak method. However, as the noise increases and the SNR falls below 60 dB, the accuracy of the confidence method drops sharply: at 50 dB it is only 58%, while the accuracy of the harmonic peak method remains stable at around 70%.

Table 1. Comparison of recognition accuracy (%) of the three algorithms.

SNR (dB) | Normal harmonic method (%) | Confidence method (%) | Harmonic peak method (%)
50 | 95 | 58 | 70
60 | 92 | 70 | 75
70 | 98 | 80 | 73
80 | 94 | 72 | 72
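The claims in the text can be checked directly against the values in Table 1. The short sketch below stores the table (accuracies in %, copied from Table 1) and computes each method's minimum, maximum, and mean accuracy over the four SNR levels.

```python
# SNR (dB) -> (normal harmonic, confidence, harmonic peak) accuracy in %, from Table 1.
TABLE_1 = {
    50: (95, 58, 70),
    60: (92, 70, 75),
    70: (98, 80, 73),
    80: (94, 72, 72),
}
METHODS = ("Normal harmonic", "Confidence", "Harmonic peak")


def summarize(table: dict) -> dict:
    """Map each method to (min, max, mean) accuracy across the SNR levels."""
    summary = {}
    for j, name in enumerate(METHODS):
        values = [row[j] for row in table.values()]
        summary[name] = (min(values), max(values), sum(values) / len(values))
    return summary
```

The normal harmonic method spans 92% to 98% (mean 94.75%), the confidence method 58% to 80% (mean 70%), and the harmonic peak method 70% to 75% (mean 72.5%). The confidence method therefore has the widest spread across noise levels.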

5. CONCLUSIONS

In recent years, with the rapid development of computer technology, music information has been digitized at an ever faster pace. This paper implements an automatic conversion system for digital music score images based on image recognition algorithms. The main work is as follows: the overall process of the score image recognition system was designed; the hardware required to build the system was selected based on a comprehensive analysis of the system; Windows, with its good openness and high operating efficiency, was chosen as the software environment; and the OpenCV open-source vision library was used to process the collected data, applying operations such as size transformation and brightness transformation to the digital score images. The system still has many shortcomings that need to be addressed. At present it recognizes only the notes in a score and is not designed to recognize more complex score information such as clefs, accidentals, and rests. Its functionality is therefore incomplete, and its ability to recognize complex scores requires further research.

REFERENCES

[1] Mueller, M., Arzt, A., Balke, S., Dorfer, M. and Widmer, G., “Cross-modal music retrieval and applications: An overview of key methodologies,” IEEE Signal Processing Magazine, 36(1), 52–62 (2018). https://doi.org/10.1109/MSP.2018.2868887

[2] Dar, S. A. and Madhusudhan, M., “Digital nomadism: Students experience of using mobile devices in delhi metro,” Library Hi Tech News, 35(7), 5–10 (2018). https://doi.org/10.1108/LHTN-04-2018-0027

[3] Bostoen, F. and Vanherpe, J., “Competition law in the digitized music industry: the winners take it all - but should they?,” Competition Policy International, (2), 30–38 (2021).

[4] Tsai, T. J., Yang, D., Shan, M., Tanprasert, T. and Jenrungrot, T., “Using cell phone pictures of sheet music to retrieve midi passages,” IEEE Transactions on Multimedia, 22(12), 3115–3127 (2020). https://doi.org/10.1109/TMM.2020.2973831

[5] Mathew, P., Vijayakumar, R., Kuriakose, A. T., Sunny, J. and Ramani, B. V., “Optical music recognition using image processing and machine learning,” International Journal of Computer Sciences and Engineering, 6(10), 18–23 (2018). https://doi.org/10.26438/ijcse

[6] Meng, F., Lu, T. and Li, F., “Stabilization of solvent to α-sheet structure and conversion between α-sheet and β-sheet in the fibrillation process of amyloid peptide,” The Journal of Physical Chemistry. B, 123(45), 9576–9583 (2019). https://doi.org/10.1021/acs.jpcb.9b07903

[7] Brahmia, Z., Grandi, F. and Bouaziz, R., “Conversion of xml schema design styles with style volution,” International Journal of Web Information Systems, 16(1), 23–64 (2020). https://doi.org/10.1108/IJWIS-05-2019-0022

[8] Gomez-Alanis, A., Gonzalez-Lopez, J. A., Dubagunta, S. P., Peinado, A. M. and Magimai-Doss, M., “On joint optimization of automatic speaker verification and anti-spoofing in the embedding space,” IEEE Transactions on Information Forensics and Security, 16, 1579–1593 (2020). https://doi.org/10.1109/TIFS.10206

[9] Khairnar, K. and Khan, S., “Automatic early leaf spot disease segmentation on cotton plant leaf,” International Journal of Recent Technology and Engineering, 9(2), 2277–3878 (2020).

[10] Chung, M. J., Hirose, T., Ono, T. and Chen, P. H., “A 115× conversion-ratio thermoelectric energy-harvesting battery charger for the internet of things,” IEEE Transactions on Circuits and Systems I: Regular Papers, 67(11), 1–12 (2020). https://doi.org/10.1109/TCSI.8919
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yi Yao "Automatic conversion system of digital music score image based on image recognition algorithm", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 1250637 (28 December 2022); https://doi.org/10.1117/12.2662200
KEYWORDS
Detection and tracking algorithms

Digital imaging

Image processing

Algorithm development

Digital image processing

Signal to noise ratio

Image enhancement
