How to automatically convert CSV files into LIBSVM-format data files with Python 3

1. Introduction

Most of the methods you can find for automatically converting CSV files into the LIBSVM data format still rely on MATLAB.

However, downloading and installing the corresponding LIBSVM package and writing its configuration file tends to produce errors. The reason is that Windows 10 or even Windows 11 is what most people use now, while these tutorials were written for Windows 7 and are out of date.

↑ Only applicable to Windows 7!

If you still want that resource, here is the link: https://pan.baidu.com/s/12vnI3RdcX7PAIARVc2xLyg
Extraction code: 7yj7

---------------------------------------------------------------------------------------------------------------------------------

2. Specific methods

Since Python is so widely used in machine learning, I will introduce a Python method here.

First, create a file called csv2libsvm.py and open it in an editor (for example, right-click it and open it in Notepad++).

If the Python installed on your computer is version 3.0 or above, write the following code in it and save it:

#!/usr/bin/env python3

"""
Convert CSV file to libsvm format. Works only with numeric variables.
Put -1 as label index (argv[3]) if there are no labels in your file.
Expecting no headers. If present, headers can be skipped with argv[4] == 1.
"""

import sys
import csv

def construct_line( label, line ):
	new_line = []
	if float( label ) == 0.0:
		label = "0"
	new_line.append( label )

	for i, item in enumerate( line ):
		if item == '' or float( item ) == 0.0:
			continue
		new_item = "%s:%s" % ( i + 1, item )
		new_line.append( new_item )
	new_line = " ".join( new_line )
	new_line += "\n"
	return new_line

# ---

input_file = sys.argv[1]
output_file = sys.argv[2]

try:
	label_index = int( sys.argv[3] )
except IndexError:
	label_index = 0

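try:
	# argv[4] is a string, so compare it explicitly; otherwise "0" would also count as true
	skip_headers = sys.argv[4].lower() in ( '1', 'true', 'yes' )
except IndexError:
	skip_headers = False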

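# newline='' lets the csv module handle line endings itself;
# the with statement closes both files even if a row fails to convert
with open( input_file, 'rt', newline='' ) as i, open( output_file, 'wb' ) as o:
	reader = csv.reader( i )

	if skip_headers:
		headers = next( reader )

	for line in reader:
		if not line:
			# skip completely empty lines instead of failing on pop()
			continue
		if label_index == -1:
			label = '1'
		else:
			label = line.pop( label_index )

		new_line = construct_line( label, line )
		o.write( new_line.encode() )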
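
To make the output format concrete, here is a minimal sketch that runs the same construct_line logic on a single row. The sample row is made up for illustration; the function body is repeated only so that the snippet runs on its own.

```python
def construct_line(label, line):
    """Same logic as construct_line in csv2libsvm.py, repeated so this runs alone."""
    new_line = []
    if float(label) == 0.0:
        label = "0"
    new_line.append(label)
    for i, item in enumerate(line):
        # empty cells and zeros are left out; feature indices start at 1
        if item == '' or float(item) == 0.0:
            continue
        new_line.append("%s:%s" % (i + 1, item))
    return " ".join(new_line) + "\n"

# hypothetical row: the label in column 0, followed by four feature cells
row = ["1", "0.5", "0", "", "3"]
label = row.pop(0)
result = construct_line(label, row)
print(repr(result))  # '1 1:0.5 4:3\n'
```

Features 2 and 3 are missing from the output because their cells were zero and empty, so only the non-zero features are written.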

If you still have Python 2 installed on your computer instead, write the following code and save it:

#!/usr/bin/env python

"""
Convert CSV file to libsvm format. Works only with numeric variables.
Put -1 as label index (argv[3]) if there are no labels in your file.
Expecting no headers. If present, headers can be skipped with argv[4] == 1.
"""

import sys
import csv

def construct_line( label, line ):
	new_line = []
	if float( label ) == 0.0:
		label = "0"
	new_line.append( label )

	for i, item in enumerate( line ):
		if item == '' or float( item ) == 0.0:
			continue
		new_item = "%s:%s" % ( i + 1, item )
		new_line.append( new_item )
	new_line = " ".join( new_line )
	new_line += "\n"
	return new_line

# ---

input_file = sys.argv[1]
output_file = sys.argv[2]

try:
	label_index = int( sys.argv[3] )
except IndexError:
	label_index = 0

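try:
	# argv[4] is a string, so compare it explicitly; otherwise "0" would also count as true
	skip_headers = sys.argv[4].lower() in ( '1', 'true', 'yes' )
except IndexError:
	skip_headers = False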

i = open( input_file, 'rb' )
o = open( output_file, 'wb' )

reader = csv.reader( i )

if skip_headers:
	headers = reader.next()

for line in reader:
	if label_index == -1:
		label = '1'
	else:
		label = line.pop( label_index )

	new_line = construct_line( label, line )
	o.write( new_line )

Next, put csv2libsvm.py in your Python installation path. For example, I put it in the Scripts folder of the Anaconda base environment (D:\anaconda\Scripts\csv2libsvm.py). If you also use Anaconda, you can do the same.

Then find the location of the CSV file you want to convert. Mine is on the desktop, so its path is: C:\Users\Lucy\Desktop\12.csv

Next, decide where the converted data file should go. I also want it on the desktop, so its path is: C:\Users\Lucy\Desktop\libsvm.data

Then enter the following command on the command line:

python D:\anaconda\Scripts\csv2libsvm.py C:\Users\Lucy\Desktop\12.csv C:\Users\Lucy\Desktop\libsvm.data 0 True

The fourth argument (0 here) is the index of the label column; use -1 if the file has no labels.

The fifth argument indicates whether the CSV file has a header row. If it does, write True; if not, leave it out.

(My file has no header row, so I did not actually pass the fifth argument. True appears in the command above only to show the complete form.)

If the command finishes without an error, the conversion succeeded. Go and check the output path you specified.
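
Beyond opening the file by eye, a small check can confirm that each output line has the expected shape. The sketch below is not part of csv2libsvm.py; the function names check_libsvm_line and check_libsvm_file are made up for this example. It assumes the usual LIBSVM convention of a numeric label followed by index:value pairs with strictly increasing indices.

```python
def check_libsvm_line(text):
    """Return True if text looks like '<label> <index>:<value> ...' with ascending indices."""
    parts = text.split()
    if not parts:
        return False
    try:
        float(parts[0])  # the label must be numeric
        previous = 0
        for pair in parts[1:]:
            index, value = pair.split(":")  # raises ValueError unless there is exactly one ':'
            index = int(index)
            float(value)
            if index <= previous:
                return False
            previous = index
    except ValueError:
        return False
    return True

def check_libsvm_file(path):
    """Return the 1-based numbers of non-blank lines that fail the check."""
    with open(path) as f:
        return [n for n, text in enumerate(f, start=1)
                if text.strip() and not check_libsvm_line(text)]
```

For example, check_libsvm_file(r"C:\Users\Lucy\Desktop\libsvm.data") would return an empty list when every line passes.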

---------------------------------------------------------------------------------------------------------------------------------

3. Results

Finally, compare the file before and after the conversion.

CSV data before conversion:

The converted .data file:
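
As a text stand-in for the before and after views, the following sketch converts a small in-memory CSV using the same rules as the script. The sample data and the helper name csv_text_to_libsvm are assumptions made for illustration; they are not taken from the files used in this article.

```python
import csv
import io

def csv_text_to_libsvm(text, label_index=0, skip_headers=False):
    """Convert CSV text to libsvm text using the same rules as csv2libsvm.py."""
    reader = csv.reader(io.StringIO(text, newline=''))
    if skip_headers:
        next(reader)
    out = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        label = '1' if label_index == -1 else row.pop(label_index)
        if float(label) == 0.0:
            label = "0"
        # keep only non-empty, non-zero cells, numbered from 1
        pairs = ["%s:%s" % (i + 1, item) for i, item in enumerate(row)
                 if item != '' and float(item) != 0.0]
        out.append(" ".join([label] + pairs))
    return "\n".join(out) + "\n"

# hypothetical data: a header row, the label in column 0, three features
before = "label,f1,f2,f3\n1,0.5,0,2\n0,0,1.5,\n"
after = csv_text_to_libsvm(before, label_index=0, skip_headers=True)
print(after)
# 1 1:0.5 3:2
# 0 2:1.5
```

The header row is dropped, the label moves to the front of each line, and zero or empty cells disappear from the output.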

 -------------------------------------------------------------------------------------------------------------------------------

4. Conclusion

The main content of this article is based on a reference article: "Convert CSV files to LIBSVM compatible data files using python".

I wrote this article because the Python 2 method introduced in the reference article does not work under Python 3, so I made some modifications to csv2libsvm.py.

Origin blog.csdn.net/m0_72663423/article/details/126312977