AUDIO FINGERPRINTING AND DETECTION VIA MATLAB



                    This project is named Cryo. It relies on fingerprints of music constructed from spectrograms. The application fingerprints a catalogue of music and saves these fingerprints in its database. A user interacting with the application then records a 10-second sample of a song through the device's microphone; the sample is fingerprinted, and that fingerprint is checked against the database for a match. If a match is found, the details of the song are displayed to the user. This article explains what fingerprinting means and how a spectrogram is computed. The main aim of the work is to show how straightforward algorithms and mathematical calculations can be combined into what looks like a comprehensive commercial application.


                    Basic working:
                             1. Construct a database of features for each full-length song.
                             2. When a clip (hopefully part of one of the songs in the database) is to be identified, calculate the corresponding features of the clip.
                             3. Search the database for a match with the features of the clip.

Matching a clip to a song this way can lead to computational challenges. To mitigate this, the features are simplified and preprocessed: pairs of peaks that are close in both time and frequency are identified, so that each pair is described by the frequencies f1 and f2 of its two peaks and their respective times t1 and t2.


For each song in the database, these feature pairs are stored in a hash table for easy access. The hash value is calculated from the vector (f1, f2, t2 − t1), so that peak pairs with the same frequencies and the same separation in time are considered a match. The timing t1 and the songid are stored in the hash table. When a clip is to be identified, its list of peak pairs is produced just as it would be for a song in the database, and the hash table is searched for each pair in the clip. This produces a list of matches, each with different stored values of t1 and songid. Some of these matches will be accidental, either because the same peak pair occurred at another time or in another song, or because the hash table had a collision. However, we expect the correct song match to have a consistent timing offset from the clip. That is, the difference between t1 for the song and t1 for the clip should be the same for all correct matches. The song with the most matches at a single timing offset is declared the winner.
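One common way to realize such a hash value is to pack (f1, f2, t2 − t1) into a single integer key. Below is a minimal MATLAB sketch of such a packing (saved as pairKey.m); the bit widths are illustrative assumptions, not values taken from Cryo itself.

    % pairKey.m -- pack (f1, f2, dt) into one uint32 hash key.
    % The bit widths (9 bits per frequency bin, 14 bits for the time
    % delta) are illustrative assumptions, not Cryo's actual values.
    function key = pairKey(f1, f2, dt)
        f1 = bitand(uint32(f1), uint32(2^9 - 1));   % frequency bin of first peak
        f2 = bitand(uint32(f2), uint32(2^9 - 1));   % frequency bin of second peak
        dt = bitand(uint32(dt), uint32(2^14 - 1));  % time-bin separation t2 - t1
        key = bitor(bitshift(f1, 23), bitor(bitshift(f2, 14), dt));
    end

Two peak pairs then hash to the same key exactly when both frequencies and the time separation agree, which is what makes the lookup fast.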



Explanation of the process:

                         Fingerprinting: We first resample the given song, both to reduce the computational load and to standardize the sampling frequency regardless of the sampling rate of the input audio.
                         Then we take the spectrogram of the resampled audio and store the magnitude (the modulus of the complex amplitude) along with its time and frequency bins. Next we find local peaks in the magnitudes using the circshift function: each magnitude is compared with its neighbors within a defined range, and those that exceed all of their neighbors are taken as peaks. Finally, all the peaks found are sorted by magnitude, as in the sketch below.
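A minimal MATLAB sketch of this front end follows. The target sampling rate, window length, and neighborhood radius are assumptions chosen for illustration, and 'song.wav' is a hypothetical input file.

    % Resample to a standard rate and compute the magnitude spectrogram.
    [x, fsOrig] = audioread('song.wav');   % hypothetical input file
    x = mean(x, 2);                        % mix down to mono
    fsTarget = 8000;                       % assumed standard rate
    x = resample(x, fsTarget, fsOrig);

    win = 1024; overlap = 512; nfft = 1024;
    [S, F, T] = spectrogram(x, hamming(win), overlap, nfft, fsTarget);
    M = abs(S);                            % mod of the complex amplitude

    % Local peaks via circshift: a bin is a peak if it exceeds every
    % neighbor within radius r in both frequency (rows) and time (cols).
    % (circshift wraps at the edges; acceptable for a sketch.)
    r = 2;
    isPeak = true(size(M));
    for df = -r:r
        for dt = -r:r
            if df == 0 && dt == 0, continue, end
            isPeak = isPeak & (M > circshift(M, [df, dt]));
        end
    end
    [fIdx, tIdx] = find(isPeak);           % frequency/time bins of each peak
    mags = M(isPeak);                      % their magnitudes, sorted next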

                          For instance, if we have 400 local peaks for a 10-second clip, we sort them by magnitude and take the 300th-largest magnitude as the threshold; eliminating all magnitudes below it leaves only the 300 strongest peaks for the 10 seconds of audio.

                          This way we can set the peaks-per-second density as we see fit, which reduces and simplifies the amount of data we have to store, as in the sketch below.
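A small sketch of this thresholding, continuing from the fIdx, tIdx, and mags arrays above and assuming a budget of 30 peaks per second:

    % Keep only the strongest peaks, capped at a chosen density.
    clipDur = 10;                          % clip duration in seconds
    peaksPerSec = 30;                      % assumed density budget
    nKeep = min(numel(mags), peaksPerSec * clipDur);

    [~, order] = sort(mags, 'descend');    % strongest first
    keep = order(1:nKeep);
    fIdx = fIdx(keep);
    tIdx = tIdx(keep);
    mags = mags(keep);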

                          Now we pair the peaks that fall within a certain search window so that we have a correspondence between them. Each peak pair is associated with its respective time stamps so that it can be compared with the peaks and time stamps obtained while detecting a sample clip. If more than two candidate pairs fall within a given window, we keep the two with the largest magnitudes, as sketched below.
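A sketch of the pairing step, with the window sizes as assumptions; it keeps at most the two largest-magnitude partners per anchor peak, per the rule above.

    % Pair each peak with peaks ahead of it inside a search window.
    winT = 32;  winF = 64;                 % assumed window, in bins
    maxPairs = 2;                          % keep at most the largest two
    pairs = [];                            % rows of [f1 f2 t1 t2]
    for i = 1:numel(tIdx)
        cand = find(tIdx > tIdx(i) & tIdx <= tIdx(i) + winT ...
                  & abs(fIdx - fIdx(i)) <= winF);
        if isempty(cand), continue, end
        [~, byMag] = sort(mags(cand), 'descend');
        cand = cand(byMag(1:min(maxPairs, numel(cand))));
        for j = cand(:)'
            pairs(end+1, :) = [fIdx(i), fIdx(j), tIdx(i), tIdx(j)]; %#ok<AGROW>
        end
    end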

                          The frequencies of the paired peaks and the difference in their respective time stamps are then stored in the database. This is done using hash tables, as sketched below.
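A minimal sketch of the store, using containers.Map as the hash table and the pairKey helper from earlier; songID is assumed to be the integer identifier of the song being fingerprinted.

    % Insert every peak pair of one song into the hash table.
    db = containers.Map('KeyType', 'uint32', 'ValueType', 'any');
    for k = 1:size(pairs, 1)
        key = pairKey(pairs(k,1), pairs(k,2), pairs(k,4) - pairs(k,3));
        entry = [songID, pairs(k,3)];      % store song ID and anchor time t1
        if isKey(db, key)
            db(key) = [db(key); entry];    % append on collision
        else
            db(key) = entry;
        end
    end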

                          This program follows a very intuitive approach to computing matches. It first fingerprints the sound clip as discussed earlier and extracts its features, then feeds them into the hash function to get the indices. The hash table at these indices may contain potential matches from many different songs. If the clip truly belongs to a song stored at one of those indices, the clip is a time-shifted version of that song, so its local peaks occur at a constant time offset from the song's; the time therefore has to be offset from the original song with respect to the clip. After looking up every pair we get a list of time-offset values for different songs. Taking the mode of the time offsets is the first level of filtering, singling out the songs that contain a similar or identical clip. For the second level of filtering we take the mode of the song IDs among those matches to determine a single definitive match, as sketched below.
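The two-level filtering might look like the sketch below, assuming clipPairs holds the clip's [f1 f2 t1 t2] rows and db is the hash table built above.

    % Collect (songID, time-offset) votes from every clip pair.
    offsets = [];                          % rows of [songID, tSong - tClip]
    for k = 1:size(clipPairs, 1)
        key = pairKey(clipPairs(k,1), clipPairs(k,2), ...
                      clipPairs(k,4) - clipPairs(k,3));
        if ~isKey(db, key), continue, end
        hits = db(key);                    % each row: [songID, t1 in the song]
        for h = 1:size(hits, 1)
            offsets(end+1, :) = [hits(h,1), hits(h,2) - clipPairs(k,3)]; %#ok<AGROW>
        end
    end

    % First level: the true match repeats one timing offset, while
    % accidental matches scatter. Second level: mode of the song IDs
    % voting at that offset gives the definitive match.
    bestOffset = mode(offsets(:,2));
    voters = offsets(offsets(:,2) == bestOffset, 1);
    matchedSong = mode(voters);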


Conclusion 

This application uses simple calculations, requiring only basic logic and statistics to understand, yet it makes it possible for millions of users to recognize the song they hear no matter where they are and what kind of environment they are in. The algorithm can be used in many applications besides music recognition. Because it can dig deep into noise, it can identify music hidden behind a loud narration, such as in a radio announcement. Moreover, the algorithm is relatively fast, so it can be used for copyright monitoring at high search speeds, and it is also suitable for content-based cueing and indexing for libraries and archives.


Future Scope

This program works fine as long as the magnitude of the noise does not exceed the magnitude of the local peaks. Matched filtering of the clip against all the songs in the database could be implemented as a fallback. Machine learning could also be used to detect the genre of the song and then matched-filter the clip against only that genre, making the search much more efficient.





Co-Authors: Pradeep Pasula, Vivek Reddy


 
