How can I combine data from a mssql database and a mysql database if I am querying in Java?

sparta_ :

I am trying to perform some analysis on data that is stored in two separate databases, where one is a MySQL server and the other is MSSQL. They need to be joined on one of the columns, so that I end up with one data structure.

I have tried loading the data separately into pandas DataFrames in Python, joining them in pandas, then writing the result to a CSV and loading it back into Java. But this is very cumbersome and not very scalable.

In essence, I have two queries like this:

MySQL

String myDriver = "org.gjt.mm.mysql.Driver";
String myUrl = "jdbc:mysql://localhost/test";
Class.forName(myDriver);
Connection conn = DriverManager.getConnection(myUrl, "root", "");
String query = "SELECT * FROM users";
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery(query);

mssql

String url = "jdbc:sqlserver://someMSsqlserver";
Connection conn = DriverManager.getConnection(url,"","");
Statement stmt = conn.createStatement();
ResultSet rs;
rs = stmt.executeQuery("SELECT * FROM people");

And I want to have them joined together into one data structure. Is there any way this can be done natively in Java?

O. Jones :

Various strategies you could try:

  1. Using your Java program, create a temporary table on server A, then copy the data you need from server B (using SELECT on server B and INSERT on server A). Then run appropriate queries on server A to JOIN the tables already on that server with the temporary table. You probably have permission to create temporary tables on either server.

  2. Use a permanent table on server A if you have permission to create one. Then, copy the data from server B to server A whenever it's changed with one Java program, and query it with another Java program.

  3. Slurp the data from the smaller of your two tables into a HashMap in your Java program, where the HashMap's key is the join variable. Then process the result set from the larger table row by row, looking up the joined entry in your HashMap.

  4. Switch to MariaDB and use the CONNECT storage engine to make your SQL Server table available to your MySQL queries.
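
Strategy 3 can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name HashJoinSketch, the helper names toRows and hashJoin, and the idea of representing each row as a Map from column label to value are all assumptions made for the example. It also assumes both drivers return the join column as the same Java type (for example Integer on both sides); if one side yields Long and the other Integer, the lookups will silently miss.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of JDBC.
class HashJoinSketch {

    /** Copies every remaining row of a ResultSet into column-label -> value maps. */
    static List<Map<String, Object>> toRows(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                row.put(meta.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }

    /**
     * Inner join: index the smaller side by its key column, then probe the
     * index once for each row of the larger side.
     */
    static List<Map<String, Object>> hashJoin(List<Map<String, Object>> small,
                                              Iterable<Map<String, Object>> large,
                                              String smallKey,
                                              String largeKey) {
        // Build phase: key value -> all small-side rows carrying that value.
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> row : small) {
            Object key = row.get(smallKey);
            if (key == null) {
                continue; // mirror SQL: NULL keys never match
            }
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }

        // Probe phase: emit one merged row per matching pair.
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> row : large) {
            List<Map<String, Object>> matches = index.get(row.get(largeKey));
            if (matches == null) {
                continue; // no partner on the small side
            }
            for (Map<String, Object> match : matches) {
                Map<String, Object> out = new LinkedHashMap<>(row);
                for (Map.Entry<String, Object> e : match.entrySet()) {
                    // On a column-name clash, keep the large-side value.
                    if (!out.containsKey(e.getKey())) {
                        out.put(e.getKey(), e.getValue());
                    }
                }
                joined.add(out);
            }
        }
        return joined;
    }
}
```

With the two result sets from the question, one might call toRows on the smaller one and pass the other side in as the large input, with the join column names filled in. For a table too big to hold in memory, the probe loop could instead iterate the larger ResultSet directly and handle each joined row as it is produced.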

How do you choose a strategy? It depends on a lot of things. How much cooperation can you get from your DBA krewe? How big are your tables? Do you always process all the rows, or sometimes a subset? (Your example queries didn't have WHERE clauses so maybe you're processing everything.) Can you get enough RAM in your JVM instances to hold a whole table? Do you need to do this multiple times an hour, or once a week? How much time can it take each time you do it?

Pro tip: For queries like your examples, give the SQL command SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; first (on both kinds of server) so you don't block other programs from accessing your tables while you retrieve your result sets.
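
From Java, the same setting can usually be requested through the standard JDBC call Connection.setTransactionIsolation rather than by sending the SQL text yourself. The sketch below is a minimal illustration; the class and method names are made up for the example, and how a given driver or server honors the request may vary. Keep in mind that READ UNCOMMITTED permits dirty reads, so the rows you retrieve may include changes that are later rolled back.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper class for illustration; not part of JDBC.
class IsolationSketch {

    /** Asks the driver to run later statements on this connection at READ UNCOMMITTED. */
    static void useReadUncommitted(Connection conn) throws SQLException {
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
    }
}
```

Call it once on each connection, right after DriverManager.getConnection and before creating the statements.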
