
python - Extract 4 million rows from database in chunks of 1 million rows each - Stack Overflow

  • I have a table in a Snowflake database which contains 4 to 8 million rows.

  • I want to extract 4 million rows in chunks to reduce the load on the database.

  • On each iteration of the loop, the fetched rows should be appended to the array r (as shown in the last line of the code) in chunks, and at the end all the data should be saved to a CSV file.

I am new to Python. I have seen other answers, but they are not working as expected.

Can anyone help me?

chunksize = 1000000
query = "select * from emp_table;"
cursor = conn.cursor()
cursor.execute(query)
r = []

asked Nov 17, 2024 at 5:20 by Logan; edited Nov 20, 2024 at 21:01 by Christopher Jones
  • Your code doesn't help to qualify your problem. Please elaborate. – Adon Bilivit, Nov 17, 2024 at 7:57
  • If you can sort the data, I would add an OFFSET/FETCH to your SQL query, so that you aren't selecting the entire table into memory. – Mike Walton, Nov 17, 2024 at 17:26
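For reference, the OFFSET/FETCH pagination suggested in the second comment might look like the sketch below. This is a hypothetical illustration, not code from the post: it assumes conn is an open snowflake.connector connection and that emp_table has a column usable as a stable sort key (emp_id here is a made-up name; substitute your own).

import csv

chunksize = 1000000

with open('extracted_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    offset = 0
    while True:
        cursor = conn.cursor()
        # Page through the table in a stable order; the connector's
        # default pyformat binding substitutes %s client-side.
        cursor.execute(
            "SELECT * FROM emp_table ORDER BY emp_id "
            "LIMIT %s OFFSET %s",
            (chunksize, offset),
        )
        rows = cursor.fetchall()
        if not rows:
            break  # past the last page
        if offset == 0:
            # Column names come from the cursor metadata.
            writer.writerow([col[0] for col in cursor.description])
        writer.writerows(rows)
        offset += chunksize

Note that a large OFFSET still makes the database scan and sort the skipped rows on every query, so for a one-off export the single-query fetchmany approach in the answer below is usually simpler.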

1 Answer


As you mentioned, due to the high volume the data should be loaded in chunks to reduce load on the database. However, loading every chunk and appending it to r would make this a very memory-intensive task.

To avoid that, you can load one chunk, write it out, then load the next chunk and repeat, so that memory is never overloaded.

Here is a sample:

import csv

import snowflake.connector


def extract_and_write_data(conn, query, chunksize=1000000,
                           filename='extracted_data.csv'):
    cursor = conn.cursor()
    cursor.execute(query)

    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        # cursor.description holds one metadata tuple per column;
        # the first element of each tuple is the column name.
        writer.writerow([col[0] for col in cursor.description])

        # The full result set sits behind the cursor; fetchmany pulls
        # it down chunksize rows at a time: first 1000000 rows, then
        # the next 1000000, and so on, writing each chunk to the CSV
        # before fetching the next. An empty fetch means there are no
        # more rows to write.
        while True:
            chunk = cursor.fetchmany(chunksize)
            if not chunk:
                break
            writer.writerows(chunk)


# Connect to Snowflake
conn = snowflake.connector.connect(
    user='your_user',
    password='your_password',
    account='your_account',
    warehouse='your_warehouse',
    database='your_database',
    schema='your_schema'
)

# SQL query to extract all rows from the table
query = "SELECT * FROM emp_table"

# Extract and write data to CSV
extract_and_write_data(conn, query)

# Close the connection
conn.close()
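If pandas is installed, the connector can also stream the same result as batches of DataFrames via fetch_pandas_batches(). A minimal sketch, assuming the open connection conn and the query above (it would run before conn.close(), and requires the pandas extra of snowflake-connector-python); batch sizes are chosen by the connector, not the caller:

cursor = conn.cursor()
cursor.execute(query)

first = True
for batch in cursor.fetch_pandas_batches():
    # Write the header with the first batch, then append the rest.
    batch.to_csv('extracted_data.csv',
                 mode='w' if first else 'a',
                 header=first,
                 index=False)
    first = False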